Here is a test data frame:
test_df <- structure(list(plant_sp = c("plant_1", "plant_1", "plant_2", "plant_2", "plant_3",
"plant_3", "plant_3", "plant_3", "plant_3", "plant_4",
"plant_4", "plant_4", "plant_4", "plant_4", "plant_4",
"plant_5", "plant_5", "plant_5", "plant_5", "plant_5"),
site = c("a", "a", "a", "a", "a",
"b", "b", "b", "b", "b",
"a", "a", "a", "a", "a",
"b", "b", "b", "b", "b"),
sp_rich = c(5, 3, 5, 3, 5,
7, 8, 8, 8, 10,
1, 4, 5, 6, 3,
7, 3, 12, 12,11)),
row.names = c(NA, -20L), class = "data.frame",
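.Names = c("plant_sp", "site", "sp_rich"))
I would like to group_by plant_sp and extract 3 random rows if the number of rows in the group is greater than 3.
In other words: take each group, and if the group has more than 3 rows, keep only 3 randomly chosen rows from it.
I am trying to use if_else, but I can't get it to work: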
test_df <- test_df %>% group_by(plant_sp) %>%
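if_else(length(plant_sp) > 3, sample_n(size =3))
I guess I am not using the length() function correctly.
Can you help me?
Thank you, Ito
Posted on 2020-12-01 00:39:34
You can use slice_sample if you are using dplyr 1.0.0 or later. It keeps 3 randomly chosen rows in each group. If a group has fewer than 3 rows, all of its rows are kept.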
library(dplyr)
test_df %>% group_by(plant_sp) %>% slice_sample(n = 3)
# plant_sp site sp_rich
# <chr> <chr> <dbl>
# 1 plant_1 a 3
# 2 plant_1 a 5
# 3 plant_2 a 5
# 4 plant_2 a 3
# 5 plant_3 b 8
# 6 plant_3 b 8
# 7 plant_3 b 7
# 8 plant_4 b 10
# 9 plant_4 a 5
#10 plant_4 a 4
#11 plant_5 b 7
#12 plant_5 b 12
#13 plant_5 b 3
https://stackoverflow.com/questions/65073527
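As a side note on the answer above, here is a minimal sketch that checks the "at most 3 rows per group" behaviour and shows one possible fallback for dplyr versions before 1.0.0. The data frame demo_df and its values are made up for illustration (a smaller stand-in for test_df), and the slice() variant is an alternative sketch, not part of the original answer. The original attempt likely fails because if_else() is a vectorised, element-wise function, and piping the grouped data frame into it passes the whole data frame as its condition argument.

```r
library(dplyr)

# Hypothetical stand-in for test_df: groups of size 2, 5 and 6
demo_df <- data.frame(
  plant_sp = rep(c("plant_1", "plant_3", "plant_4"), times = c(2, 5, 6)),
  sp_rich  = c(5, 3, 5, 7, 8, 8, 8, 10, 1, 4, 5, 6, 3)
)

set.seed(42)  # fix the random draw so the result is repeatable

# dplyr >= 1.0.0: n is truncated to the group size for small groups
sampled <- demo_df %>%
  group_by(plant_sp) %>%
  slice_sample(n = 3) %>%
  ungroup()

# Rows kept per group: min(3, original group size)
count(sampled, plant_sp)

# Possible fallback for older dplyr: sample row positions within each group
sampled_old <- demo_df %>%
  group_by(plant_sp) %>%
  slice(sample(n(), min(3, n()))) %>%
  ungroup()
```

Calling set.seed() before sampling is optional. It only makes the random selection reproducible between runs.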