I want to group by v1 and keep only the rows of groups where the difference between max(v3) and min(v3) is at least 6.
v1 v2 v3
a 2 13
b 5 3
c 2 1
d 2 1
e 1 2
a 2 4
a 8 1
e 1 9
b 0 1
c 2 8
d 1 5
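If we work out the difference for each group, we get:
a: 13 - 1 = 12
b: 3 - 1 = 2
c: 8 - 1 = 7
d: 5 - 1 = 4
e: 9 - 2 = 7
So the expected result keeps the rows of groups a, c and e, because their differences are all >= 6: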
v1 v2 v3
a 2 13
c 2 1
e 1 2
a 2 4
a 8 1
e 1 9
c 2 8
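The per-group arithmetic above can be checked with a short base R sketch. It assumes the table is stored in a data frame named df (the name is an assumption; the question does not give one) and uses tapply to compute max minus min of v3 within each level of v1.

```r
# Rebuild the question's table; the name `df` is assumed for illustration.
df <- data.frame(
  v1 = c("a", "b", "c", "d", "e", "a", "a", "e", "b", "c", "d"),
  v2 = c(2L, 5L, 2L, 2L, 1L, 2L, 8L, 1L, 0L, 2L, 1L),
  v3 = c(13L, 3L, 1L, 1L, 2L, 4L, 1L, 9L, 1L, 8L, 5L)
)

# Spread of v3 (max minus min) within each level of v1.
spread <- tapply(df$v3, df$v1, function(x) max(x) - min(x))
print(spread)

# Names of the groups whose spread is at least 6.
keep <- names(spread)[spread >= 6]
print(keep)
```

With the data shown, spread holds 12, 2, 7, 4, 7 for a through e, and keep contains a, c and e.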
Answered 2020-01-09 02:57:11
We can group_by v1 and take the diff of the range of the v3 values.
library(dplyr)
df %>% group_by(v1) %>% filter(diff(range(v3)) >= 6)
# v1 v2 v3
# <fct> <int> <int>
#1 a 2 13
#2 c 2 1
#3 e 1 2
#4 a 2 4
#5 a 8 1
#6 e 1 9
#7 c 2 8
Or we can use max - min:
df %>% group_by(v1) %>% filter(max(v3) - min(v3) >= 6)
We can do the same in base R with ave:
subset(df, ave(v3, v1, FUN = function(x) diff(range(x))) >= 6)
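The ave approach works because ave returns a vector as long as its input, with each row carrying the value computed for its own group. That vector can then be compared against 6 row by row. A minimal sketch of the intermediate step, assuming the question's data is stored in df (row_spread and result are names introduced here for illustration):

```r
# Rebuild the question's table; `df` matches the name used in this answer.
df <- data.frame(
  v1 = c("a", "b", "c", "d", "e", "a", "a", "e", "b", "c", "d"),
  v2 = c(2L, 5L, 2L, 2L, 1L, 2L, 8L, 1L, 0L, 2L, 1L),
  v3 = c(13L, 3L, 1L, 1L, 2L, 4L, 1L, 9L, 1L, 8L, 5L)
)

# One value per row: the spread of v3 within that row's v1 group.
row_spread <- ave(df$v3, df$v1, FUN = function(x) diff(range(x)))
print(row_spread)

# Keep the rows whose group spread is at least 6; row order is preserved.
result <- df[row_spread >= 6, ]
print(result)
```

Because the filter is a plain logical index, the surviving rows stay in their original order, which matches the expected output in the question.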
Answered 2020-01-09 02:57:55
We can use dplyr to group by 'v1' and keep the groups where the diff of the range of 'v3' is greater than or equal to 6.
library(dplyr)
df1 %>%
group_by(v1)%>%
filter(abs(diff(range(v3))) >= 6)
# A tibble: 7 x 3
# Groups: v1 [3]
# v1 v2 v3
# <chr> <int> <int>
#1 a 2 13
#2 c 2 1
#3 e 1 2
#4 a 2 4
#5 a 8 1
#6 e 1 9
#7 c 2 8
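Or we can arrange by 'v1' and 'v3', then filter on the difference between the last and first values of 'v3' in each group. Note that this returns the rows sorted rather than in their original order.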
df1 %>%
arrange(v1, v3) %>%
group_by(v1) %>%
filter(last(v3) - first(v3) >=6)
Or use data.table:
library(data.table)
setDT(df1)[, .SD[abs(diff(range(v3))) >= 6], by = v1]
Another option is to use .I to get the row indices and subset with them:
setDT(df1)[df1[, .I[abs(diff(range(v3))) >= 6], by = v1]$V1]
Another option is ave from base R:
i1 <- with(df1, ave(v3, v1, FUN = function(x) abs(diff(range(x)))) >= 6)
df1[i1,]
Or use subset with tapply:
subset(df1, v1 %in% names(which(tapply(v3, v1,
function(x) diff(range(x))) >=6)))
Data
df1 <- structure(list(v1 = c("a", "b", "c", "d", "e", "a", "a", "e",
"b", "c", "d"), v2 = c(2L, 5L, 2L, 2L, 1L, 2L, 8L, 1L, 0L, 2L,
1L), v3 = c(13L, 3L, 1L, 1L, 2L, 4L, 1L, 9L, 1L, 8L, 5L)),
class = "data.frame", row.names = c(NA,
-11L))
https://stackoverflow.com/questions/59656597