My dataframe data1 has more than 1.5 million rows and is structured as follows:
data1 <- data.frame(NEW_UPC=c(11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005991,11820005992,11820005992,11820005992,11820005992,11820005992,11820005992,11820005992,11820005992,11820005992,11820005993,11820005993,11820005993,11820005993,11820005993,11820005993,11820005993,11820005993,11820005993,11820005994,11820005994,11820005994,11820005994,11820005994,11820005994,11820005995,11820005995,11820005995,11820005995,11820005995,11820005995,11820005995,11820005995,11820005995),
                IRI_KEY=c(1073521,1073521,1073521,1073525,1073525,1073525,1078106,1078106,1078106,1078107,1078107,1078107,1073521,1073521,1073521,1073525,1073525,1073525,1078106,1078106,1078106,1073521,1073521,1073521,1073525,1073525,1073525,1078106,1078106,1078106,1073521,1073521,1073525,1073525,1078106,1078106,1073521,1073521,1073521,1073525,1073525,1073525,1078106,1078106,1078106),
                WEEK = c(1229,1230,1232,1218,1224,1229,1282,1285,1287,1229,1230,1232,1229,1230,1232,1218,1224,1229,1282,1285,1287,1229,1230,1232,1217,1221,1227,1270,1272,1273,1273,1274,1270,1272,1217,1221,1229,1230,1232,1218,1224,1229,1282,1285,1287),
                END=c(1232,1232,1232,1229,1229,1229,1287,1287,1287,1232,1232,1232,1232,1232,1232,1229,1229,1229,1287,1287,1287,1232,1232,1232,1227,1227,1227,1273,1273,1273,1274,1274,1272,1272,1221,1221,1232,1232,1232,1229,1229,1229,1287,1287,1287))

I need to add a column Exit.time, computed from the values in the WEEK and END columns and a cutoff value (1287). Exit.time should be 0 or 1 according to the following logic:
If WEEK = 1287, then Exit.time = 0.
If WEEK is not 1287 but WEEK = END, then Exit.time = 1; otherwise Exit.time = 0.
To do this, I tried the for loop below, which does what is needed on the dummy dataset above.
# Walk over every row of data1 and fill Exit.time one element at a time
for(i in seq_len(nrow(data1))){
  if (data1$WEEK[i]==1287) {
    data1$Exit.time[i] <- 0
  } else if(data1$WEEK[i]==data1$END[i]) {
    data1$Exit.time[i] <- 1
  } else {
    data1$Exit.time[i] <- 0
  }
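}

The problem is that when I run the loop above on the actual dataset, I get no output even after an hour. Given the size of the dataset, I assume the loop is inefficient. Is there another way to do what I want? I would prefer to keep the row order in data1, because I need to do some merge operations later.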
Posted on 2018-06-27 18:49:41
Since Exit.time is 1 when (WEEK == END) & WEEK != 1287 is true and 0 otherwise, you can call as.numeric on the result of (WEEK == END) & WEEK != 1287, which turns TRUE into 1 and FALSE into 0.
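As a quick illustration of that coercion, here is a minimal sketch on a few made-up values (the vectors below are hypothetical and not taken from data1):

```r
# Hypothetical toy values, chosen only to exercise each branch of the rule
WEEK <- c(1287, 1229, 1230, 1287)
END  <- c(1287, 1229, 1232, 1290)

# Element-wise logical test: WEEK equals END, but WEEK is not the cutoff
flag <- (WEEK == END) & WEEK != 1287
print(flag)

# as.numeric() maps TRUE to 1 and FALSE to 0
exit_time <- as.numeric(flag)
print(exit_time)
```

The comparison runs over the whole vector at once, so no explicit loop over rows is needed.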
data1$Exit.time <- with(data1, as.numeric(WEEK != 1287 & WEEK == END))

Posted on 2018-06-27 18:51:20
There are several ways to code this. They differ mainly in how they are written and are essentially doing the same thing.
Base R:
data1$Exit.time <- (data1$WEEK != 1287 & data1$WEEK == data1$END)*1

This involves typing data1 a lot, so there is a shortcut:
data1 <- within(data1, {
  Exit.time <- (WEEK != 1287 & WEEK == END)*1
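})

Tidyverse: The tidyverse is a set of packages that are good at working with data. We are using the dplyr package, which is part of the tidyverse, so you can load the whole collection or just dplyr.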
library(tidyverse)
data1 <- data1 %>%
   mutate(
     Exit.time = (WEEK != 1287 & WEEK == END)*1
   )

(I convert TRUE/FALSE to 0/1 by multiplying by 1, which is less typing.)
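To check that these conversions agree with the row-by-row rule from the question, a small sketch along the following lines can help. The data frame df and the by_* names are hypothetical and exist only for this illustration; only the column names WEEK and END come from the question.

```r
# Small hypothetical data frame using the question's column names
df <- data.frame(
  WEEK = c(1229, 1232, 1287, 1285, 1229),
  END  = c(1232, 1232, 1287, 1287, 1229)
)

# Vectorized condition: WEEK equals END and WEEK is not the cutoff
cond <- df$WEEK != 1287 & df$WEEK == df$END

by_mult    <- cond * 1          # numeric 0/1
by_numeric <- as.numeric(cond)  # numeric 0/1
by_integer <- as.integer(cond)  # integer 0/1

# Row-by-row reference result, following the rule stated in the question
by_loop <- numeric(nrow(df))
for (i in seq_len(nrow(df))) {
  if (df$WEEK[i] != 1287 && df$WEEK[i] == df$END[i]) by_loop[i] <- 1
}

# Every vectorized variant should match the loop element by element
stopifnot(
  all(by_mult == by_loop),
  all(by_numeric == by_loop),
  all(by_integer == by_loop)
)
print(by_loop)
```

Each variant only adds a column computed element by element, so the rows stay in their original order.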
Posted on 2018-06-27 19:19:48
Using data.table:
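library(data.table)
# setDT() converts data1 to a data.table by reference; := adds the column in place
setDT(data1)[, Exit.time := ifelse(WEEK == 1287, 0, ifelse(WEEK != 1287 & WEEK == END, 1, 0))]

https://stackoverflow.com/questions/51069534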