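Sample data:
library(dplyr)
df <- data.frame(loc.id = rep(1:2, each = 11),
                 x = c(35,51,68,79,86,90,92,93,95,98,100,35,51,68,79,86,90,92,92,93,94,94))
For each loc.id, I want to filter out x <= 95, that is, keep each group's rows up to and including the first one where x >= 95.
df %>% group_by(loc.id) %>% filter(row_number() <= which.max(x >= 95))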
loc.id x
<int> <dbl>
1 1 35
2 1 51
3 1 68
4 1 79
5 1 86
6 1 90
7 1 92
8 1 93
9 1 95
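10 2 35
However, the problem is that every value in group 2 is less than 95, so I want to keep all x values for group 2. The line of code above does not do that: when no element satisfies x >= 95, which.max() returns 1, so only the first row of group 2 survives.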
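The cutoff positions that which.max produces can be checked in base R, without dplyr. This is only a minimal sketch: the vectors g1 and g2 are illustrative names that copy the two groups from the sample data.

```r
# x values of the two groups, copied from the sample data
g1 <- c(35, 51, 68, 79, 86, 90, 92, 93, 95, 98, 100)
g2 <- c(35, 51, 68, 79, 86, 90, 92, 92, 93, 94, 94)

# which.max() returns the position of the first maximum.
# For a logical vector that is the first TRUE, when one exists.
cut1 <- which.max(g1 >= 95)

# When every element is FALSE, the maximum is FALSE itself,
# and its first position is 1.
cut2 <- which.max(g2 >= 95)

print(c(group1 = cut1, group2 = cut2))
```

With these inputs the cutoff is 9 for the first group and 1 for the second, which matches the ten rows returned by the filter above.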
Answered 2018-05-31 05:39:55
You can use match to get the index of the first TRUE; its nomatch argument lets you return the size of the group when no match is found:
df %>%
group_by(loc.id) %>%
filter(row_number() <= match(TRUE, x >= 95, nomatch=n()))
# A tibble: 20 x 2
# Groups: loc.id [2]
# loc.id x
# <int> <dbl>
# 1 1 35
# 2 1 51
# 3 1 68
# 4 1 79
# 5 1 86
# 6 1 90
# 7 1 92
# 8 1 93
# 9 1 95
#10 2 35
#11 2 51
#12 2 68
#13 2 79
#14 2 86
#15 2 90
#16 2 92
#17 2 92
#18 2 93
#19 2 94
#20 2 94
Alternatively, use a negated, lagged cumsum as the filter condition:
df %>% group_by(loc.id) %>% filter(!lag(cumsum(x >= 95), default=FALSE))
Answered 2018-05-31 05:48:00
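A solution using all with the dplyr package. This version keeps the rows with x > 95, plus every row of a group in which all values are <= 95: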
library(dplyr)
df %>% group_by(loc.id) %>%
filter((x > 95) | all(x<=95)) # All x in group are <= 95 OR x > 95
# # Groups: loc.id [2]
# loc.id x
# <int> <dbl>
# 1 1 98.0
# 2 1 100
# 3 2 35.0
# 4 2 51.0
# 5 2 68.0
# 6 2 79.0
# 7 2 86.0
# 8 2 90.0
# 9 2 92.0
# 10 2 92.0
# 11 2 93.0
# 12 2 94.0
# 13 2 94.0
https://stackoverflow.com/questions/50613621
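To see why the match and cumsum conditions from the first answer both keep a group that never reaches 95, the per-group logic can be traced in base R. This is a rough sketch rather than anything from the original answers: keep_match and keep_cumsum are hypothetical helper names, g1 and g2 copy the sample data, and dplyr's lag is emulated by hand.

```r
# x values of the two groups, copied from the sample data
g1 <- c(35, 51, 68, 79, 86, 90, 92, 93, 95, 98, 100)
g2 <- c(35, 51, 68, 79, 86, 90, 92, 92, 93, 94, 94)

# Hypothetical helper mirroring
# row_number() <= match(TRUE, x >= 95, nomatch = n())
keep_match <- function(x, limit = 95) {
  # position of the first TRUE, or the group size if there is none
  cutoff <- match(TRUE, x >= limit, nomatch = length(x))
  seq_along(x) <= cutoff
}

# Hypothetical helper mirroring
# !lag(cumsum(x >= 95), default = FALSE)
keep_cumsum <- function(x, limit = 95) {
  hits <- cumsum(x >= limit)        # running count of values >= limit
  lagged <- c(0, head(hits, -1))    # shift down by one row, pad with 0
  lagged == 0                       # TRUE until just after the first hit
}

kept1 <- g1[keep_match(g1)]
kept2 <- g2[keep_cumsum(g2)]

print(kept1)
print(identical(keep_match(g1), keep_cumsum(g1)))
print(length(kept2) == length(g2))
```

Lagging the running count is what keeps the first row that reaches 95: that row still sees a count of zero from the row before it, while every later row sees at least one.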