Extract the common words between two columns in different tables (Python)

Stack Overflow user
Asked 2020-05-07 12:04:41
Answers: 2, Views: 537, Followers: 0, Votes: 4

I want to extract every word in df1 that matches a word in df2.

df1 = pd.DataFrame(['Dog has 4 legs.It has 2 eyes.','Fish has fins','Cat has paws.It eats fish','Monkey has tail'],columns=['Description'])

df2 = pd.DataFrame(['Fish','Legs','Eyes'],columns=['Parts'])


 Df1                                             Df2
|---------------------------------|             |---------------------------------|
|         **Description**         |             |          Parts                  |
|---------------------------------|             |---------------------------------|
|  Dog has 4 legs.It has 2 eyes.  |             | Fish                            |
|---------------------------------|             |---------------------------------|
|  Fish has fins                  |             | Legs                            |
|---------------------------------|             |---------------------------------|
|  Cat has paws.It eats fish      |             | Eyes                            |
|---------------------------------|             |---------------------------------|
|  Monkey has tail                |
|---------------------------------|

Expected output:

|---------------------------------|-----------|
|         **Description**         |Parts      |
|---------------------------------|-----------|
|  Dog has 4 legs.It has 2 eyes.  |Legs,Eyes  |
|---------------------------------|-----------|
|  Fish has fins                  |Fish       |
|---------------------------------|-----------|
|  Cat has paws.It eats fish      |Fish       |
|---------------------------------|-----------|
|  Monkey has tail                |           |
|---------------------------------|-----------|

Stack Overflow user

Answered 2020-05-07 12:23:29

@Datanovice's solution is better, since everything stays within Pandas. Here is an alternative that is also faster (string operations in Pandas are not that fast):

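from collections import defaultdict
from itertools import product

# Lowercase the parts once so the comparison is case-insensitive.
res = df2.Parts.str.lower().array

# Map each description to the list of parts it contains.
d = defaultdict(list)
for description, word in product(df1.Description, res):
    # Substring test: 'fish' would also match inside 'fishing'.
    if word in description.lower():
        d[description].append(word)

d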

defaultdict(list,
            {'Dog has 4 legs.It has 2 eyes.': ['legs', 'eyes'],
             'Fish has fins': ['fish'],
             'Cat has paws.It eats fish': ['fish']})

df1['parts'] = df1.Description.map(d).str.join(',')
    Description                     parts
0   Dog has 4 legs.It has 2 eyes.   legs,eyes
1   Fish has fins                   fish
2   Cat has paws.It eats fish       fish
3   Monkey has tail
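
For comparison, a version that stays inside Pandas might look like the sketch below. This is not @Datanovice's answer, which is not reproduced on this page; it is only one possible approach, assuming whole-word, case-insensitive matching is what is wanted and that the output should keep the spelling used in df2. The names `canonical`, `pattern`, and `matched_parts` are made up for illustration.

```python
import re

import pandas as pd

df1 = pd.DataFrame(
    ['Dog has 4 legs.It has 2 eyes.', 'Fish has fins',
     'Cat has paws.It eats fish', 'Monkey has tail'],
    columns=['Description'],
)
df2 = pd.DataFrame(['Fish', 'Legs', 'Eyes'], columns=['Parts'])

# Map lowercase -> original spelling so the result keeps df2's casing.
canonical = {part.lower(): part for part in df2['Parts']}

# One alternation pattern; \b restricts matches to whole words,
# so 'fish' would not match inside 'fishing'.
pattern = r'\b(?:' + '|'.join(re.escape(p) for p in canonical) + r')\b'


def matched_parts(found):
    # Deduplicate while preserving the order of first appearance.
    seen = dict.fromkeys(word.lower() for word in found)
    return ','.join(canonical[word] for word in seen)


df1['Parts'] = (
    df1['Description']
    .str.findall(pattern, flags=re.IGNORECASE)
    .map(matched_parts)
)

print(df1['Parts'].tolist())
# ['Legs,Eyes', 'Fish', 'Fish', '']
```

Unlike the loop above, the matches come back in the order they appear in each description rather than in df2's order, and a part that occurs twice in one description is listed once.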
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/61657342
