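I am trying to select the values that two data frames have in common. I have a big_df and a small_df.
What I want is a data frame containing only the rows whose "ID" value appears in both data frames, and I am only interested in keeping the columns of big_df, not those of small_df.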
library(dplyr)
df3 <- merge(big_df, small_df, by = "ID")
> df3
  ID Age Name Colour
1  1  21    a   blue
2  4  20    d  green
3  8  87    h    red
4  9   9    i  black
big_df <- data.frame("ID" = 1:10, "Age" = c(21,15,1,20,34,45,67,87,9,77), "Name" = c("a","b","c","d","e","f","g","h","i","l"))
> big_df
   ID Age Name
1   1  21    a
2   2  15    b
3   3   1    c
4   4  20    d
5   5  34    e
6   6  45    f
7   7  67    g
8   8  87    h
9   9   9    i
10 10  77    l
small_df <- data.frame("ID" = c(1,4,8,9), "Colour" = c("blue","green","red","black"))
> small_df
  ID Colour
1  1   blue
2  4  green
3  8    red
4  9  black
I would like to have the result without the colour information.
> df3
  ID Age Name
1  1  21    a
2  4  20    d
3  8  87    h
4  9   9    i
Posted on 2019-06-12 19:36:57
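dplyr's semi_join() is designed for exactly this: it keeps the rows of the first data frame that have a match in the second, without adding any columns from the second.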
big_df <- data.frame("ID" = 1:10, "Age" = c(21,15,1,20,34,45,67,87,9,77), "Name" = c("a","b","c","d","e","f","g","h","i","l"))
small_df <- data.frame("ID" = c(1,4,8,9), "Colour" = c("blue","green","red","black"))
library(dplyr)
semi_join(big_df, small_df, by = "ID")
#
#   ID Age Name
# 1  1  21    a
# 2  4  20    d
# 3  8  87    h
# 4  9   9    i
Posted on 2019-06-12 19:35:39
I think what you actually need is:
#check which big IDs exist in small IDs and subset
big_df[big_df$ID %in% unique(small_df$ID), ]
#  ID Age Name
#1  1  21    a
#4  4  20    d
#8  8  87    h
#9  9   9    i
So I don't think you need a join in this case.
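One practical difference between filtering and merging shows up when an ID occurs more than once in the smaller data frame. The following is a minimal sketch of that case, using base R only. Here small_dup is a hypothetical variant of small_df invented for illustration (it is not part of the question), in which ID 4 appears twice.

```r
# Example data copied from the question.
big_df <- data.frame(ID = 1:10,
                     Age = c(21, 15, 1, 20, 34, 45, 67, 87, 9, 77),
                     Name = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "l"))

# Hypothetical variant of small_df (an assumption for this sketch):
# ID 4 appears twice.
small_dup <- data.frame(ID = c(1, 4, 4, 8, 9),
                        Colour = c("blue", "green", "teal", "red", "black"))

# Filtering with %in% keeps each matching row of big_df exactly once.
filtered <- big_df[big_df$ID %in% small_dup$ID, ]

# merge() performs an inner join, so the row with ID 4 is repeated once
# per match; dropping Colour afterwards leaves a duplicated row behind.
merged <- merge(big_df, small_dup, by = "ID")[, names(big_df)]

nrow(filtered)  # 4
nrow(merged)    # 5
```

semi_join() behaves like the %in% filter in this respect: it never duplicates rows of its first argument, so either of the two answers avoids the extra row that the merge-then-drop approach produces.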
https://stackoverflow.com/questions/56561221