我有两个这样的数据帧
df1 <- data.frame(Colors = c("Yellow","Pink","Green","Blue","White","Red"
,"Cyan","Brown","Violet","Orange","Gray"))
df2 <- data.frame(Colors = c("Yellow,Pink","Green","Gold","White","Red,Cyan,Brown",
"Violet","Magenta","Gray"))
我正在尝试合并这两个数据帧并在df2中返回行,这也存在于df1中。我还需要确保
我的所需的输出是
Colors
Yellow,Pink
Green
White
Red,Cyan,Brown
Violet
Gray
如果我执行df <- inner_join(df2,df1)
,那么我不会得到行Yellow,Pink
& Red,Cyan,Brown
这里我漏掉了什么?谁能给我指个方向?
发布于 2018-06-02 01:21:33
在每个拆分项目上使用pmatch
的基本R
解决方案:
split_list <- strsplit(as.character(df2$Colors),",")
keep_lgl <- sapply(split_list,function(x) !anyNA(pmatch(x,df1$Colors)))
df2[keep_lgl,,drop=FALSE]
# Colors
# 1 Yellow,Pink
# 2 Green
# 4 White
# 5 Red,Cyan,Brown
# 6 Violet
# 8 Gray
注意:只有当所有颜色在df1
中都可用时,我才能匹配颜色序列。
一些tidyverse
方法:
library(tidyverse)
df2 %>% mutate(keep=Colors) %>%
separate_rows(Colors) %>%
add_count(keep) %>%
inner_join(df1) %>%
add_count(keep) %>% # doesn't do anything here but important in general
filter(n==nn) %>% # same
distinct(keep) %>%
rename(Colors=keep)
# # A tibble: 6 x 1
# Colors
# <fctr>
# 1 Yellow,Pink
# 2 Green
# 3 White
# 4 Red,Cyan,Brown
# 5 Violet
# 6 Gray
df2 %>% mutate(keep=Colors) %>%
separate_rows(Colors) %>%
left_join(df1 %>% mutate(Colors2=Colors,.)) %>%
group_by(keep) %>%
summarize(filt=anyNA(Colors2)) %>%
filter(!filt) %>%
select(-2)
# # A tibble: 6 x 1
# keep
# <fctr>
# 1 Gray
# 2 Green
# 3 Red,Cyan,Brown
# 4 Violet
# 5 White
# 6 Yellow,Pink
发布于 2018-06-02 02:55:52
您可以使用fuzzyjoin
包中的regex_inner_join
连接df1
和df2
。最后,从df2
列中选择唯一行。
library(dplyr)
library(fuzzyjoin)
regex_inner_join(df2, df1, by=c(Colors = "Colors")) %>%
select(Colors = Colors.x) %>% distinct()
# Colors
# 1 Yellow,Pink
# 2 Green
# 3 White
# 4 Red,Cyan,Brown
# 5 Violet
# 6 Gray
# Just to demonstrate, result of joined tables using regex_inner_join. One,
# can work-out to convert data in desired format afterwards.
regex_inner_join(df2, df1, by=c(Colors = "Colors"))
# Colors.x Colors.y
# 1 Yellow,Pink Yellow
# 2 Yellow,Pink Pink
# 3 Green Green
# 4 White White
# 5 Red,Cyan,Brown Red
# 6 Red,Cyan,Brown Cyan
# 7 Red,Cyan,Brown Brown
# 8 Violet Violet
# 9 Gray Gray
https://stackoverflow.com/questions/50648128
复制相似问题