Merging 2 data frames when one of them contains comma-separated values

Stack Overflow user
Asked 2018-06-02 01:06:14
Answers: 2 · Views: 152 · Follows: 0 · Votes: 1

I have two data frames like this:

Code language: R
df1 <- data.frame(Colors = c("Yellow","Pink","Green","Blue","White","Red"
                            ,"Cyan","Brown","Violet","Orange","Gray"))

df2 <- data.frame(Colors = c("Yellow,Pink","Green","Gold","White","Red,Cyan,Brown",
                             "Violet","Magenta","Gray"))

I am trying to merge these two data frames and return the rows of df2 that are also present in df1. I also need to make sure

My desired output:

Code language: R
          Colors
     Yellow,Pink
           Green
           White
  Red,Cyan,Brown
          Violet
            Gray

If I run df <- inner_join(df2,df1), I do not get the rows Yellow,Pink and Red,Cyan,Brown.

What am I missing here? Can anyone point me in the right direction?
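A minimal base R sketch of why the join misses those rows, assuming the df1 and df2 above (re-created here with stringsAsFactors = FALSE so the columns are plain character vectors, which is also the default since R 4.0). A join on Colors compares whole cell values for exact equality, so the string "Yellow,Pink" never equals "Yellow" or "Pink" until it is split on the comma.

```r
# Re-create the example data with character (not factor) columns
df1 <- data.frame(Colors = c("Yellow","Pink","Green","Blue","White","Red",
                             "Cyan","Brown","Violet","Orange","Gray"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Colors = c("Yellow,Pink","Green","Gold","White","Red,Cyan,Brown",
                             "Violet","Magenta","Gray"),
                  stringsAsFactors = FALSE)

# Whole-string comparison: this is effectively what joining on Colors does
whole_match <- df2$Colors %in% df1$Colors
print(whole_match)

# Splitting on "," first exposes the individual colors to the comparison
pieces <- strsplit(df2$Colors, ",")
print(pieces[[1]] %in% df1$Colors)
```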


2 Answers

Stack Overflow user

Accepted answer

Posted 2018-06-02 01:21:33

A base R solution that applies pmatch to each split item:

Code language: R
split_list <- strsplit(as.character(df2$Colors),",")
keep_lgl   <- sapply(split_list,function(x) !anyNA(pmatch(x,df1$Colors)))
df2[keep_lgl,,drop=FALSE]

#           Colors
# 1    Yellow,Pink
# 2          Green
# 4          White
# 5 Red,Cyan,Brown
# 6         Violet
# 8           Gray

Note: a sequence of colors is kept only if every color in it is present in df1.
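One caveat about pmatch(): it does partial matching, so a piece that is a unique prefix of a df1 color (say "Yel") is accepted as that color, and because duplicates.ok defaults to FALSE, a color repeated inside one cell (e.g. "Red,Red") gets NA on its second occurrence, which would drop the row. If exact membership is what you want, here is a minimal sketch of the same split-then-check idea using %in% instead (an alternative to the answer's code, not part of it; the data is re-created with character columns):

```r
df1 <- data.frame(Colors = c("Yellow","Pink","Green","Blue","White","Red",
                             "Cyan","Brown","Violet","Orange","Gray"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Colors = c("Yellow,Pink","Green","Gold","White","Red,Cyan,Brown",
                             "Violet","Magenta","Gray"),
                  stringsAsFactors = FALSE)

# pmatch() accepts a unique prefix, which exact matching would reject
print(pmatch("Yel", df1$Colors))           # 1, i.e. "Yellow"
# With duplicates.ok = FALSE, a table entry can only be matched once
print(pmatch(c("Red", "Red"), df1$Colors)) # 6 NA

split_list <- strsplit(df2$Colors, ",")
# Keep a row only if every comma-separated piece is an exact member of df1$Colors
keep_lgl <- vapply(split_list, function(x) all(x %in% df1$Colors), logical(1))
result <- df2[keep_lgl, , drop = FALSE]
print(result)
```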

Some tidyverse approaches:

Code language: R
library(tidyverse)
df2 %>% mutate(keep=Colors) %>%
  separate_rows(Colors) %>%
  add_count(keep) %>%
  inner_join(df1) %>%
  add_count(keep) %>% # nn = pieces that survived the join; changes nothing here, but needed in general
  filter(n==nn)   %>% # keep rows whose pieces all matched; likewise
  distinct(keep)  %>%
  rename(Colors=keep)

# # A tibble: 6 x 1
# Colors
# <fctr>
# 1    Yellow,Pink
# 2          Green
# 3          White
# 4 Red,Cyan,Brown
# 5         Violet
# 6           Gray
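The first pipeline reshapes to one color per row, inner-joins, and keeps an original value only when the number of its pieces that survived the join (nn) equals the number it started with (n). A base R sketch of the same counting logic, using merge() as the inner join; the object names long, n_total and n_matched are illustrative, and the data is re-created with character columns:

```r
df1 <- data.frame(Colors = c("Yellow","Pink","Green","Blue","White","Red",
                             "Cyan","Brown","Violet","Orange","Gray"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Colors = c("Yellow,Pink","Green","Gold","White","Red,Cyan,Brown",
                             "Violet","Magenta","Gray"),
                  stringsAsFactors = FALSE)

split_list <- strsplit(df2$Colors, ",")
# Long format: one row per color, remembering the original value in `keep`
long <- data.frame(keep   = rep(df2$Colors, lengths(split_list)),
                   Colors = unlist(split_list),
                   stringsAsFactors = FALSE)

# Inner join against df1 drops pieces that have no exact match
matched <- merge(long, df1, by = "Colors")

# n: pieces per original value; nn: pieces that survived the join
n_total   <- table(long$keep)
n_matched <- table(factor(matched$keep, levels = names(n_total)))
complete  <- names(n_total)[n_total == n_matched]

# Return the surviving values in df2's original row order
result <- df2[df2$Colors %in% complete, , drop = FALSE]
print(result)
```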

df2 %>% mutate(keep=Colors) %>%
  separate_rows(Colors) %>%
  left_join(df1 %>% mutate(Colors2 = Colors), by = "Colors") %>% # Colors2 is NA where a piece is not in df1
  group_by(keep) %>%
  summarize(filt=anyNA(Colors2)) %>%
  filter(!filt) %>%
  select(-filt)

# # A tibble: 6 x 1
#             keep
#           <fctr>
# 1           Gray
# 2          Green
# 3 Red,Cyan,Brown
# 4         Violet
# 5          White
# 6    Yellow,Pink
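The second pipeline uses a left join instead: a piece with no partner in df1 gets NA in Colors2, and any original value with at least one NA is discarded. A base R sketch of the same check, using match() (which returns NA for non-matches) and tapply() to group by the original row; unlike the dplyr version, it keeps df2's row order rather than sorting by value. The data is re-created with character columns:

```r
df1 <- data.frame(Colors = c("Yellow","Pink","Green","Blue","White","Red",
                             "Cyan","Brown","Violet","Orange","Gray"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Colors = c("Yellow,Pink","Green","Gold","White","Red,Cyan,Brown",
                             "Violet","Magenta","Gray"),
                  stringsAsFactors = FALSE)

split_list <- strsplit(df2$Colors, ",")
# Which row of df2 each piece came from
row_id <- rep(seq_along(split_list), lengths(split_list))
# match() plays the role of the left join: NA where a piece is absent from df1
hit <- match(unlist(split_list), df1$Colors)
# A row is dropped if any of its pieces failed to match
has_na <- as.vector(tapply(is.na(hit), row_id, any))
result <- df2[!has_na, , drop = FALSE]
print(result)
```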
Votes: 2

Stack Overflow user

Posted 2018-06-02 02:55:52

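You can join df2 and df1 with regex_inner_join from the fuzzyjoin package, which treats each color in df1 as a regular expression to search for in df2$Colors. Finally, select the distinct values of the df2 column.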

Code language: R
library(dplyr)
library(fuzzyjoin)

regex_inner_join(df2, df1, by=c(Colors = "Colors")) %>%
  select(Colors = Colors.x) %>% distinct()

#           Colors
# 1    Yellow,Pink
# 2          Green
# 3          White
# 4 Red,Cyan,Brown
# 5         Violet
# 6           Gray

# For illustration, the full result of regex_inner_join; it can then be
# reshaped into the desired format.

regex_inner_join(df2, df1, by=c(Colors = "Colors")) 
#         Colors.x Colors.y
# 1    Yellow,Pink   Yellow
# 2    Yellow,Pink     Pink
# 3          Green    Green
# 4          White    White
# 5 Red,Cyan,Brown      Red
# 6 Red,Cyan,Brown     Cyan
# 7 Red,Cyan,Brown    Brown
# 8         Violet   Violet
# 9           Gray     Gray
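One difference from the accepted answer is worth checking: regex_inner_join keeps a df2 row as soon as any one df1 color is found inside it (as a regular-expression match, so a substring also counts), whereas the pmatch approach requires every color in the cell to be present. A base R sketch of both rules using grepl(), with hypothetical extra values "Gold,Red" and "Reddish" that are not in the original data and are added only to show where the rules diverge:

```r
colors <- c("Yellow","Pink","Green","Blue","White","Red",
            "Cyan","Brown","Violet","Orange","Gray")
# Hypothetical test cells: the last two are NOT part of the original df2
cells <- c("Yellow,Pink", "Gold", "Gold,Red", "Reddish")

# Rule of a regex join: keep the cell if ANY color pattern occurs in it
any_rule <- vapply(cells, function(cell) {
  any(vapply(colors, grepl, logical(1), x = cell))
}, logical(1))

# Rule of the accepted answer: keep the cell only if ALL pieces are colors
all_rule <- vapply(strsplit(cells, ","), function(x) all(x %in% colors), logical(1))

print(any_rule)
print(all_rule)
```

With the question's data both rules give the same six rows; they only disagree on cells that mix known and unknown colors, or that contain a color name as a substring.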
Votes: 1
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's dedicated IT-domain engine.
Original link:

https://stackoverflow.com/questions/50648128
