This is my dataset:
structure(list(Date = structure(c(14609, 14609, 14609, 14609, 14699, 14699, 14699, 14699, 14790, 14790, 14790, 14790), class = "Date"),
ID = structure(c(5L, 4L, 6L, 10L, 9L, 3L, 10L, 8L, 7L, 1L,
10L, 2L), .Label = c("B00NYQ2", "B03J9L7", "B05DZD1", "B06HC42",
"B09V3X7", "B09YCC8", "X6114659", "X6478816", "X6556701",
"X6812555"), class = "factor"), Name = structure(c(10L, 4L,
9L, 8L, 7L, 3L, 8L, 6L, 2L, 5L, 8L, 1L), .Label = c("AIRA",
"BOUS", "CSCS", "EVF", "GTB", "JER", "MGB", "MPR", "NVB",
"TTNP"), class = "factor"), Score = c(55.075, 54.5, 53.325,
52.175, 70.275, 69.825, 60.15, 60.025, 56.175, 52.65, 52.175,
52.125), Score.rank = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L)), .Names = c("Date", "ID", "Name", "Score", "Score.rank"), row.names = c(1L, 2L, 3L, 4L, 71L, 72L, 73L, 74L, 156L, 157L, 158L, 159L), class = "data.frame")
When we move into a new period, I want to find out which IDs come in and which go out.
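What I mean is: I want to check whether each ID was present in the previous period, where the period is given by "Date".
If it was present in the previous period (Date), nothing should be returned.
If it was not present in the previous period, it should return "IN".
I would also like it to return "OUT" if the ID is not present in the next period. For this check, the period being compared against is the next one.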
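The rule described above can be sketched in base R by comparing each row's ID with the set of IDs recorded at the previous and the next Date. This is a minimal sketch, assuming the data frame is called `dft` (as in the answer) and that each distinct `Date` is one period; `flag_in_out()` is a hypothetical helper name, and only the `Date` and `ID` columns are rebuilt here. Following the written rule, X6556701 (2010-03-31) gets both IN and OUT, whereas the expected output in the question shows only IN for that row.

```r
# Sample data: the Date and ID columns from the question's dput() output
dft <- data.frame(
  Date = as.Date(rep(c("2009-12-31", "2010-03-31", "2010-06-30"), each = 4)),
  ID = c("B09V3X7", "B06HC42", "B09YCC8", "X6812555",
         "X6556701", "B05DZD1", "X6812555", "X6478816",
         "X6114659", "B00NYQ2", "X6812555", "B03J9L7"),
  stringsAsFactors = FALSE
)

# flag_in_out() is a hypothetical helper, not part of any package.
# "IN":  the ID is missing from the previous period (never set in the first period).
# "OUT": the ID is missing from the next period (never set in the last period).
flag_in_out <- function(df) {
  periods <- sort(unique(df$Date))                      # distinct periods, oldest first
  ids_by_period <- lapply(seq_along(periods),
                          function(k) df$ID[df$Date == periods[k]])
  pos <- match(df$Date, periods)                        # period index of each row
  n_periods <- length(periods)
  this_period <- character(nrow(df))                    # "" means no flag
  next_period <- character(nrow(df))
  for (i in seq_len(nrow(df))) {
    p <- pos[i]
    if (p > 1 && !(df$ID[i] %in% ids_by_period[[p - 1]])) this_period[i] <- "IN"
    if (p < n_periods && !(df$ID[i] %in% ids_by_period[[p + 1]])) next_period[i] <- "OUT"
  }
  df$This_period <- this_period
  df$Next_period <- next_period
  df
}

flag_in_out(dft)
```

The same function works on the full data frame from `dput()`, since `%in%` and `match()` also accept the factor `ID` column.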
I expect the resulting data frame to look like this:
Date        ID        Name  Score   Score.rank  THIS PERIOD  NEXT PERIOD
31/12/2009  B09V3X7   TTNP  55.075  1                        OUT
31/12/2009  B06HC42   EVF   54.5    2                        OUT
31/12/2009  B09YCC8   NVB   53.325  3                        OUT
31/12/2009  X6812555  MPR   52.175  4
31/3/2010   X6556701  MGB   70.275  1           IN
31/3/2010   B05DZD1   CSCS  69.825  2           IN           OUT
31/3/2010   X6812555  MPR   60.15   3
31/3/2010   X6478816  JER   60.025  4           IN           OUT
30/6/2010   X6114659  BOUS  56.175  1           IN
30/6/2010   B00NYQ2   GTB   52.65   2           IN
30/6/2010   X6812555  MPR   52.175  3
30/6/2010   B03J9L7   AIRA  52.125  4           IN

Can anyone point me in the right direction? Thanks in advance.
Posted on 2017-01-13 05:32:41
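Unfortunately, your description and your example don't match. Going by your description, it seems you want to flag each ID's entry and exit.
That can be achieved with: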
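library(dplyr)

# dft: the data frame from the question; flag each ID's first and last Date
dft %>%
  group_by(ID) %>%
  mutate(This_period = if_else(Date == min(Date), "IN", NA_character_)) %>%   # if_else() needs a typed NA, not NULL
  mutate(Next_period = if_else(Date == max(Date), "OUT", NA_character_))
which returns: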
#Source: local data frame [12 x 7]
#Groups: ID [10]
#
# Date ID Name Score Score.rank This_period Next_period
# <date> <fctr> <fctr> <dbl> <int> <chr> <chr>
#1 2009-12-31 B09V3X7 TTNP 55.075 1 IN OUT
#2 2009-12-31 B06HC42 EVF 54.500 2 IN OUT
#3 2009-12-31 B09YCC8 NVB 53.325 3 IN OUT
#4 2009-12-31 X6812555 MPR 52.175 4 IN <NA>
#5 2010-03-31 X6556701 MGB 70.275 1 IN OUT
#6 2010-03-31 B05DZD1 CSCS 69.825 2 IN OUT
#7 2010-03-31 X6812555 MPR 60.150 3 <NA> <NA>
#8 2010-03-31 X6478816 JER 60.025 4 IN OUT
#9 2010-06-30 X6114659 BOUS 56.175 1 IN OUT
#10 2010-06-30 B00NYQ2 GTB 52.650 2 IN OUT
#11 2010-06-30 X6812555 MPR 52.175 3 <NA> OUT
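#12 2010-06-30 B03J9L7 AIRA 52.125 4 IN OUT

However, your example suggests excluding the earliest Date from the This_period check and the latest Date from the Next_period check. Is that right? If so, is Score.rank somehow linked to Date? Please clarify.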
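If that interpretation is right, the code above only needs to skip the overall first and last Date. A minimal sketch, assuming dplyr is installed and rebuilding just the `Date` and `ID` columns from the question; `first_date`, `last_date` and `res` are illustrative names, not from the question.

```r
library(dplyr)

# Sample data: the Date and ID columns from the question's dput() output
dft <- data.frame(
  Date = as.Date(rep(c("2009-12-31", "2010-03-31", "2010-06-30"), each = 4)),
  ID = c("B09V3X7", "B06HC42", "B09YCC8", "X6812555",
         "X6556701", "B05DZD1", "X6812555", "X6478816",
         "X6114659", "B00NYQ2", "X6812555", "B03J9L7"),
  stringsAsFactors = FALSE
)

first_date <- min(dft$Date)   # earliest period overall: nothing can come "IN" here
last_date  <- max(dft$Date)   # latest period overall: nothing can go "OUT" here

res <- dft %>%
  group_by(ID) %>%
  mutate(
    # first appearance of the ID, unless that is the very first period
    This_period = if_else(Date == min(Date) & Date != first_date, "IN", NA_character_),
    # last appearance of the ID, unless that is the very last period
    Next_period = if_else(Date == max(Date) & Date != last_date, "OUT", NA_character_)
  ) %>%
  ungroup()

res
```

Because this still looks only at each ID's first and last appearance, it differs from a period-by-period comparison when an ID drops out and later returns: such an ID is not flagged OUT before the gap or IN after it. On the question's data no ID does that.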
https://stackoverflow.com/questions/41627091