My data looks like this:
ROW ID DATE
1 1 09/20/2014
2 1 09/21/2014
3 1 09/22/2014
4 1 09/22/2014
5 2 09/19/2014
6 2 09/20/2014
7 2 09/21/2014
8 2 09/21/2014
9 2 09/21/2014
10 3 09/18/2014
11 3 09/19/2014
12 3 09/20/2014
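13 3 09/20/2014
I want to create an FL column that flags duplicated dates within each ID. I know that duplicated() flags the subsequent duplicate values, but I also want to flag the first value, before it starts repeating.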
My data should then look like this:
ROW ID DATE FL
1 1 09/20/2014 0
2 1 09/21/2014 0
3 1 09/22/2014 1
4 1 09/22/2014 1
5 2 09/19/2014 0
6 2 09/20/2014 0
7 2 09/21/2014 1
8 2 09/21/2014 1
9 2 09/21/2014 1
10 3 09/18/2014 0
11 3 09/19/2014 0
12 3 09/20/2014 1
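13 3 09/20/2014 1
So, within each ID, non-duplicated dates should get FL = 0 and duplicated dates should get FL = 1. It would be great if you could help me do this with R code. Thanks.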
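A minimal sketch of the behaviour described above, using the dates of ID 2 typed in by hand as a plain character vector (an assumption made only to keep the example short). duplicated() alone marks only the second and later copies; OR-ing it with duplicated(fromLast = TRUE) also marks the first copy of any repeated value.

```r
# Dates of ID 2 from the question, entered by hand as a character vector
x <- c("09/19/2014", "09/20/2014", "09/21/2014", "09/21/2014", "09/21/2014")

# duplicated() alone flags only the second and later copies
first_pass <- duplicated(x)

# Scanning from the end as well also flags the first copy of a repeated value
both_ways <- duplicated(x) | duplicated(x, fromLast = TRUE)

print(as.integer(first_pass))  # -> 0 0 0 1 1
print(as.integer(both_ways))   # -> 0 0 1 1 1
```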
EDIT
Here is the dput of the data:
structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
DATE = structure(c(3L, 4L, 5L, 5L, 2L, 3L, 4L, 4L, 4L, 1L, 2L, 3L, 3L),
.Label = c("9/18/2014", "9/19/2014", "9/20/2014", "9/21/2014", "9/22/2014"),
class = "factor")),
.Names = c("ID", "DATE"), class = "data.frame", row.names = c(NA, -13L))
Posted 2014-09-25 02:19:19
With the data.table package, use something like this:
library(data.table)
setDT(dat)[,FL := (duplicated(DATE) | duplicated(DATE, fromLast = TRUE))*1,ID]
ID DATE FL
1: 1 9/20/2014 0
2: 1 9/21/2014 0
3: 1 9/22/2014 1
4: 1 9/22/2014 1
5: 2 9/19/2014 0
6: 2 9/20/2014 0
7: 2 9/21/2014 1
8: 2 9/21/2014 1
9: 2 9/21/2014 1
10: 3 9/18/2014 0
11: 3 9/19/2014 0
12: 3 9/20/2014 1
13: 3 9/20/2014 1
Or in base R (using @akrun's suggestion):
transform(dat, FL = ave(as.numeric(factor(DATE)), ID,
FUN = function(x) duplicated(x) | duplicated(x, fromLast = TRUE)))
Posted 2014-09-25 05:58:55
FWIW, here is a rudimentary approach to this problem.
# your original data frame
dat <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
DATE = structure(c(3L, 4L, 5L, 5L, 2L, 3L, 4L, 4L, 4L, 1L, 2L, 3L, 3L),
.Label = c("9/18/2014", "9/19/2014", "9/20/2014", "9/21/2014", "9/22/2014"),
class = "factor")),
.Names = c("ID", "DATE"), class = "data.frame", row.names = c(NA, -13L))
# glue the columns to avoid need of grouping by ID first
dat2 <- paste(dat$ID, dat$DATE, sep='/')
# alternatively, you can use following for string comparison, if needed.
# dat2<-paste(as.character(dat$ID),as.character(dat$DATE),sep='/')
# create a lookup table for counts of each ID+DATE combo
lookup<-table(dat2)
# add a column based on counts. If count is 1 then ID+DATE is not duplicated.
dat$FL <- sapply(dat2,FUN = function(x) { if (lookup[x] == 1) 0 else 1})
# output
print(dat)
This should give you what you want:
ID DATE FL
1 1 9/20/2014 0
2 1 9/21/2014 0
3 1 9/22/2014 1
4 1 9/22/2014 1
5 2 9/19/2014 0
6 2 9/20/2014 0
7 2 9/21/2014 1
8 2 9/21/2014 1
9 2 9/21/2014 1
10 3 9/18/2014 0
11 3 9/19/2014 0
12 3 9/20/2014 1
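13 3 9/20/2014 1
There are more sophisticated ways to do this, and table() has its limitations, but in most cases this approach is simple, easy to read, and should do the job for you.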
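As a hedged variation on the table() lookup above: the per-row sapply() call can be replaced by a single vectorized subscript into the counts, because a one-dimensional table can be indexed by its names. The data frame is rebuilt by hand with DATE as character (an assumption for brevity), and `key` is a hypothetical name standing in for dat2.

```r
# Question's data, rebuilt by hand (DATE kept as character for simplicity)
dat <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3),
  DATE = c("9/20/2014", "9/21/2014", "9/22/2014", "9/22/2014",
           "9/19/2014", "9/20/2014", "9/21/2014", "9/21/2014", "9/21/2014",
           "9/18/2014", "9/19/2014", "9/20/2014", "9/20/2014"),
  stringsAsFactors = FALSE
)

# Glue ID and DATE into one key, as in the answer above
key <- paste(dat$ID, dat$DATE, sep = "/")

# Count each key once, then look up every row's count in a single step
lookup <- table(key)
dat$FL <- as.integer(lookup[key] > 1)

print(dat)
```

The design choice here is only about avoiding an R-level loop; the counting logic is the same as in the sapply() version.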
Posted 2014-09-26 13:57:45
A dplyr + magrittr alternative:
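library(dplyr)  # dplyr also re-exports the %>% pipe from magrittr
dat %>%
group_by(ID, DATE) %>%
mutate(FL = ifelse(n() > 1, 1, 0))
One minor drawback: it sets FL to 1 for every duplicated date, not just the latest one. Check whether that is inconvenient for you.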
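For comparison, a sketch of the same group-size test (n() > 1 per ID and DATE) in base R, assuming dplyr is not available. ave() accepts several grouping vectors and, with FUN = length, returns each row's group size; the data frame is rebuilt by hand with DATE as character for brevity.

```r
# Question's data, rebuilt by hand (DATE kept as character for simplicity)
dat <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3),
  DATE = c("9/20/2014", "9/21/2014", "9/22/2014", "9/22/2014",
           "9/19/2014", "9/20/2014", "9/21/2014", "9/21/2014", "9/21/2014",
           "9/18/2014", "9/19/2014", "9/20/2014", "9/20/2014"),
  stringsAsFactors = FALSE
)

# Size of each (ID, DATE) group, repeated on every row of that group;
# this plays the role of n() inside group_by(ID, DATE)
group_size <- ave(seq_len(nrow(dat)), dat$ID, dat$DATE, FUN = length)

# 1 where the (ID, DATE) pair occurs more than once, else 0
dat$FL <- as.integer(group_size > 1)

print(dat)
```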
https://stackoverflow.com/questions/26023681