Question: How to find the most frequently used word across an entire column of strings in Python

Stack Overflow user

Asked 2020-10-29 22:51:55

Answers 2, Views 41, Followers 0, Votes 0

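I am new to Python and am learning it for data analysis.

I have run into a problem with a dataset that contains a column named tags. Each YouTube tags value is a single string containing various words. I need to find the most frequently used word across the entire column.

Dataset name: youtube_df

Column name: tags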

tags_split = youtube_df.tags.head(3)
tags_split

import re
from collections import Counter

for t in tags_split:
   #print(t)
   split_strng = re.findall(r"[\w]+",t)
   print(split_strng)
   counter = Counter(split_strng)
   most_common = counter.most_common(3)
   print(most_common)

Output

['Eminem', 'Walk', 'On', 'Water', 'Aftermath', 'Shady', 'Interscope', 'Rap']
[('Eminem', 1), ('Walk', 1), ('On', 1)]
['plush', 'bad', 'unboxing', 'unboxing', 'fan', 'mail', 'idubbbztv', 'idubbbztv2', 'things', 
'best', 'packages', 'plushies', 'chontent', 'chop']
[('unboxing', 2), ('plush', 1), ('bad', 1)]
['racist', 'superman', 'rudy', 'mancuso', 'king', 'bach', 'racist', 'superman', 'love', 'rudy', 
'mancuso', 'poo', 'bear', 'black', 'white', 'official', 'music', 'video', 'iphone', 'x', 'by', 
'pineapple', 'lelepons', 'hannahstocking', 'rudymancuso', 'inanna', 'anwar', 'sarkis', 'shots', 
'shotsstudios', 'alesso', 'anitta', 'brazil', 'Getting', 'My', 'Driver', 's', 'License', 'Lele', 
'Pons']
[('racist', 2), ('superman', 2), ('rudy', 2)]

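The code above counts words separately for each row. What I want is to count how many times each word is used across the whole column, so that I can tell which word is used most often in the tags.

Can anyone suggest the best way to do this? I would really appreciate your help.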

data-cleaning

python

pandas

data-analysis

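2 Answers

Stack Overflow user

Answered 2020-10-29 22:58:07

As I understand it, you are trying to use a single Counter across all the tags in tags_split, rather than creating a new one for each row.

Take a look at the update method: https://docs.python.org/2/library/collections.html#collections.Counter.update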

tags_split = youtube_df.tags.head(3)
tags_split

import re
from collections import Counter

counter = Counter()

for t in tags_split:
   #print(t)
   split_strng = re.findall(r"[\w]+",t)
   counter.update(split_strng)

most_common = counter.most_common(3)
print(most_common)
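
For anyone who wants to try this without the original dataset, here is a minimal self-contained sketch of the same idea. The list tags_column and the helper count_words are hypothetical names, and the tag strings are made up for illustration; with the real data you would pass youtube_df.tags instead.

```python
import re
from collections import Counter

# Hypothetical stand-in for youtube_df.tags: a plain list of tag strings.
tags_column = [
    "Eminem|Walk|On|Water|Rap",
    "plush|bad unboxing|unboxing|fan mail",
    "rap|unboxing|Rap music",
]

def count_words(tag_strings):
    """Accumulate word counts across every string in the column."""
    counter = Counter()  # one shared counter, created once outside the loop
    for t in tag_strings:
        # \w+ matches runs of letters, digits and underscores
        counter.update(re.findall(r"\w+", t))
    return counter

word_counts = count_words(tags_column)
print(word_counts.most_common(2))
```

Note that the counting is case-sensitive, so "Rap" and "rap" are tallied separately. If that is not what you want, lowercasing each string before calling re.findall is one option.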

Votes 0

Stack Overflow user

Answered 2020-10-29 22:58:40

You can try this:

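import re
from collections import Counter

# Collect the words from every tag string into one flat list
m = []
for t in tags_split:
   split_strng = re.findall(r"[\w]+", t)
   m.extend(split_strng)

# Count all words at once, then take the (word, count) pair with the highest count
l = Counter(m)
most_common = max(l.items(), key=lambda x: x[1])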
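
As a side note, the max call gives the same result as Counter.most_common(1)[0], which may read more directly. A small sketch, assuming a made-up word list in place of the list m built above:

```python
from collections import Counter

# Hypothetical word list, standing in for the list m built above.
words = ["unboxing", "plush", "unboxing", "rap", "plush", "unboxing"]
counts = Counter(words)

# Approach from this answer: scan the (word, count) pairs for the largest count.
top_by_max = max(counts.items(), key=lambda pair: pair[1])

# Built-in alternative: most_common(1) returns a one-element list of (word, count).
top_by_method = counts.most_common(1)[0]

print(top_by_max, top_by_method)
```

most_common(n) is also handy when you want the top few words rather than only the single most frequent one.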

Votes 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/64593557
