For each row, I want to keep only the name with the highest value according to the dictionary. Any suggestions on how to do this efficiently?
import pandas as pd

fruit_dict = {
    "Apple": 10,
    "Watermelon": 20,
    "Cherry": 30
}
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Apple, Watermelon",
            "Cherry, Watermelon",
            "Apple",
            "Cherry, Apple",
            "Cherry",
        ],
    }
)
ID name
0 1 Apple, Watermelon
1 2 Cherry, Watermelon
2 3 Apple
3 4 Cherry, Apple
4 5 Cherry
Expected output:
ID name
0 1 Watermelon
1 2 Cherry
2 3 Apple
3 4 Cherry
4 5 Cherry
Posted on 2021-12-02 13:55:47
One approach is to use apply with max, passing fruit_dict.get as the key:
new_df = (df.assign(name=df['name'].str.split(', ')
.apply(lambda l: max(l, key=fruit_dict.get)))
)
Or, if you expect some names to be missing from the dictionary:
new_df = (df.assign(name=df['name'].str.split(', ')
                      .apply(lambda l: max(l, key=lambda x: fruit_dict.get(x, float('-inf')))))
         )
Output:
ID name
0 1 Watermelon
1 2 Cherry
2 3 Apple
3 4 Cherry
4 5 Cherry
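To illustrate why the second variant passes a default to fruit_dict.get, here is a minimal sketch in plain Python. The name "ooo" is a made-up example of a key that is not in the dictionary.

```python
fruit_dict = {"Apple": 10, "Watermelon": 20, "Cherry": 30}

# "ooo" is a hypothetical name that is not a key of fruit_dict.
names = ["ooo", "Apple"]

# Without a default, fruit_dict.get returns None for "ooo",
# and max() cannot compare None with an int.
try:
    max(names, key=fruit_dict.get)
    raised = False
except TypeError:
    raised = True

# With a -inf default, an unknown name always loses to a known one.
best = max(names, key=lambda name: fruit_dict.get(name, float("-inf")))

print(raised)  # True
print(best)    # Apple
```

So the plain key=fruit_dict.get version is only safe when every name is guaranteed to be in the dictionary.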
Posted on 2021-12-02 13:53:18
Use:
df = (df.assign(name= df['name'].str.split(', '))
.explode('name')
.assign(new = lambda x: x['name'].map(fruit_dict))
.sort_values(['ID', 'new'], ascending=[True, False])
.drop_duplicates('ID')
)
print (df)
ID name new
0 1 Watermelon 20
1 2 Cherry 30
2 3 Apple 10
3 4 Cherry 30
4 5 Cherry 30
Or:
df['new'] = df['name'].apply(lambda x: max(x.split(', '), key=fruit_dict.get))
print (df)
ID name new
0 1 Apple, Watermelon Watermelon
1 2 Cherry, Watermelon Cherry
2 3 Apple Apple
3 4 Cherry, Apple Cherry
4 5 Cherry Cherry
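As a possible variant of the explode approach (not part of the original answer), the sort and drop_duplicates steps could be replaced with groupby and idxmax. This is only a sketch: it assumes every name is present in fruit_dict, and the names exploded, best_rows and result are made up for illustration.

```python
import pandas as pd

fruit_dict = {"Apple": 10, "Watermelon": 20, "Cherry": 30}
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Apple, Watermelon",
            "Cherry, Watermelon",
            "Apple",
            "Cherry, Apple",
            "Cherry",
        ],
    }
)

# One row per (ID, single name). explode repeats index labels,
# so reset the index to give every row a unique label.
exploded = (
    df.assign(name=df["name"].str.split(", "))
      .explode("name")
      .reset_index(drop=True)
)
exploded["value"] = exploded["name"].map(fruit_dict)

# For each ID, idxmax returns the label of the row with the largest value.
best_rows = exploded.loc[exploded.groupby("ID")["value"].idxmax()]
result = best_rows[["ID", "name"]].reset_index(drop=True)
print(result)
```

If some names can be missing from the dictionary, the value column will contain NaN, and how idxmax treats an all-NaN group depends on the pandas version, so the sort-based version above may be the safer choice in that case.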
Edit: if no name matches, the first value is returned:
fruit_dict = {
"Apple": 10,
"Watermelon": 20,
"Cherry": 30
}
df = pd.DataFrame(
{
"ID": [1, 2, 3, 4, 5],
"name": [
"Apple, Watermelon",
"Cherry, Watermelon",
"Apple",
"Cherry, Apple",
"ooo, Cherry2, aaaa",  # <- changed data
],
}
)
print (df)
df1 = (df.assign(name= df['name'].str.split(', '))
.explode('name')
.assign(new = lambda x: x['name'].map(fruit_dict))
.sort_values(['ID', 'new'], ascending=[True, False])
.drop_duplicates('ID')
)
print (df1)
ID name new
0 1 Watermelon 20.0
1 2 Cherry 30.0
2 3 Apple 10.0
3 4 Cherry 30.0
4 5 ooo NaN
If you need NaN when there is no match:
df1['name'] = df1['name'].mask(df1.pop('new').isna())
print (df1)
ID name
0 1 Watermelon
1 2 Cherry
2 3 Apple
3 4 Cherry
4 5 NaN
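The same "missing when nothing matches" behaviour can also be sketched with apply and a small helper. The function best_known_name below is hypothetical, not from the answer: it filters out unknown names first and returns None when none are left.

```python
import pandas as pd

fruit_dict = {"Apple": 10, "Watermelon": 20, "Cherry": 30}


def best_known_name(names, values=fruit_dict):
    """Return the highest-valued known name from a ', '-separated string, or None."""
    known = [name for name in names.split(", ") if name in values]
    return max(known, key=values.get) if known else None


df = pd.DataFrame(
    {
        "ID": [1, 2, 3],
        "name": ["Apple, Watermelon", "Cherry", "ooo, Cherry2, aaaa"],
    }
)
df["best"] = df["name"].apply(best_known_name)
print(df)
```

Because unknown names are removed before max is called, no default value is needed for the key function.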
df['new1'] = df['name'].apply(lambda x: max(x.split(', '), key=lambda x: fruit_dict.get(x, float('-inf'))))
df['new2'] = df['name'].apply(lambda x: max(x.split(', '), key=lambda x: fruit_dict.get(x, 0)))
df['new3'] = df['name'].apply(lambda x: max(x.split(', '), key=lambda x: fruit_dict.get(x, 1000)))
print (df)
ID name new1 new2 new3
0 1 Apple, Watermelon Watermelon Watermelon Watermelon
1 2 Cherry, Watermelon Cherry Cherry Cherry
2 3 Apple Apple Apple Apple
3 4 Cherry, Apple Cherry Cherry Cherry
4 5 ooo, Cherry2, aaaa ooo ooo ooo
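The three defaults give the same result on this data, but they are not interchangeable in general. A small plain-Python sketch of where they differ follows; the helper pick and the sample lists are made up for illustration.

```python
fruit_dict = {"Apple": 10, "Watermelon": 20, "Cherry": 30}


def pick(names, default):
    """Return the name with the largest value, using `default` for unknown names."""
    return max(names, key=lambda name: fruit_dict.get(name, default))


mixed = ["ooo", "Apple"]                # one unknown name, one known name
all_unknown = ["ooo", "Cherry2", "aaaa"]

print(pick(mixed, float("-inf")))  # Apple
print(pick(mixed, 0))              # Apple
print(pick(mixed, 1000))           # ooo (the default beats every real value)

# When every key is tied, max() returns the first item.
print(pick(all_unknown, 0))        # ooo
```

A default of float('-inf') is the only one of the three that can never outrank a real dictionary value, which matters if the dictionary could hold zero or negative numbers.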
Posted on 2022-11-24 02:13:28
fruit_dict = {
"Apple": 10,
"Watermelon": 20,
"Cherry": 30
}
df.assign(name=df.name.str.split(', ')).name.map(lambda x: pd.Series(fruit_dict)[x].nlargest().index.values[0])
0 Watermelon
1 Cherry
2 Apple
3 Cherry
4 Cherry
Name: name, dtype: object
https://stackoverflow.com/questions/70200649