I have a DataFrame with a website, its categories, and the keywords for that website.
Url  | categories                                | keywords
ESPN | [sport, nba, nfl]                         | [half, touchdown, referee, player, goal]
TMZ  | [entertainment, sport]                    | [gossip, celebrity, player]
Goal | [sport, premier_league, champions_league] | [football, goal, stadium, player, referee]
It can be created with the following code:
import pandas as pd

# One row per website; both list columns hold plain Python lists.
data = [
    {'Url': 'ESPN', 'categories': ['sport', 'nba', 'nfl'],
     'keywords': ['half', 'touchdown', 'referee', 'player', 'goal']},
    {'Url': 'TMZ', 'categories': ['entertainment', 'sport'],
     'keywords': ['gossip', 'celebrity', 'player']},
    {'Url': 'Goal', 'categories': ['sport', 'premier_league', 'champions_league'],
     'keywords': ['football', 'goal', 'stadium', 'player', 'referee']},
]
df = pd.DataFrame(data)
For every word in the keywords column, I want to get the frequency of the categories associated with it. The result could look like this:
{half: {sport: 1, nba: 1, nfl: 1},
 touchdown: {sport: 1, nba: 1, nfl: 1},
 referee: {sport: 2, nba: 1, nfl: 1, premier_league: 1, champions_league: 1},
 player: {sport: 3, nba: 1, nfl: 1, premier_league: 1, champions_league: 1},
 gossip: {sport: 1, entertainment: 1},
 celebrity: {sport: 1, entertainment: 1},
 goal: {sport: 2, premier_league: 1, champions_league: 1, nba: 1, nfl: 1},
 stadium: {sport: 1, premier_league: 1, champions_league: 1}}
Answered 2022-10-21 18:11:54
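Since both columns contain lists, you can use explode to repeat each row once per list element, then group by the (keyword, category) pairs and count them: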
result = (
df.explode("keywords")
.explode("categories")
.groupby(["keywords", "categories"])
.size()
)
https://stackoverflow.com/questions/74157446