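I want to remove strings from a cell based on their dictionary values. The rule: drop the strings with the smallest values in each cell, so that no cell in the name column keeps more than two strings. How can I do this?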
import pandas as pd

fruit_dict = {
    "Apple": 10,
    "Watermelon": 20,
    "Cherry": 30,
    "Orange": 40,
    "Lemon": 50,
}
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Apple, Watermelon, Cherry, Lemon",
            "Cherry, Watermelon, Orange",
            "Apple, Cherry, Watermelon",
            "Cherry",
            "Cherry, Orange",
        ],
    }
)
Expected output:
   ID                name
0   1       Cherry, Lemon
1   2      Cherry, Orange
2   3  Cherry, Watermelon
3   4              Cherry
4   5      Cherry, Orange
Posted on 2021-12-07 12:30:30
Since this amounts to taking the top 2 values per group, the solution can be simplified, for example:
df = (df.assign(name=df['name'].str.split(', '))  # split each cell into a list
        .explode('name')                          # one row per fruit
        .assign(new=lambda x: x['name'].map(fruit_dict))  # look up each fruit's value
        .sort_values(['ID', 'new'], ascending=[True, False])  # highest value first within each ID
        .groupby('ID')['name']
        .agg(lambda x: ','.join(x.head(2)))       # keep the two highest-valued fruits
        .reset_index()
     )
print(df)
   ID               name
0   1       Lemon,Cherry
1   2      Orange,Cherry
2   3  Cherry,Watermelon
3   4             Cherry
4   5      Orange,Cherry
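Note that the output above is ordered by descending value and joined with ',' rather than ', ', so it does not match the expected output in the question character for character. If the original order within each cell matters, one possible variant is sketched below. It is only a sketch under assumptions: the helper column names value and rank and the variable names exploded and result are made up for illustration, and names missing from fruit_dict are simply dropped.

```python
import pandas as pd

fruit_dict = {"Apple": 10, "Watermelon": 20, "Cherry": 30, "Orange": 40, "Lemon": 50}
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Apple, Watermelon, Cherry, Lemon",
            "Cherry, Watermelon, Orange",
            "Apple, Cherry, Watermelon",
            "Cherry",
            "Cherry, Orange",
        ],
    }
)

# One row per fruit, still in the order the fruits appeared in each cell.
exploded = (
    df.assign(name=df["name"].str.split(", "))
      .explode("name")
      .reset_index(drop=True)
)
exploded["value"] = exploded["name"].map(fruit_dict)

# Rank within each ID, 1 = highest value; ties are broken by position.
# Names missing from fruit_dict get a NaN rank and are filtered out below.
exploded["rank"] = exploded.groupby("ID")["value"].rank(method="first", ascending=False)

# Filter instead of sorting, so the original order inside each cell survives.
result = (
    exploded[exploded["rank"] <= 2]
    .groupby("ID")["name"]
    .agg(", ".join)
    .reset_index()
)
print(result)
```

Because the rows are filtered rather than sorted, row 0 comes out as "Cherry, Lemon" instead of "Lemon,Cherry".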
Alternatively, use sorted with key and reverse=True:
# Sort the fruits in a cell by value (highest first) and keep the first two;
# names missing from fruit_dict sort last.
f = lambda x: ','.join(sorted(x.split(', '),
                              key=lambda s: fruit_dict.get(s, float('-inf')),
                              reverse=True)[:2])
df['name'] = df['name'].apply(f)
print(df)
   ID               name
0   1       Lemon,Cherry
1   2      Orange,Cherry
2   3  Cherry,Watermelon
3   4             Cherry
4   5      Orange,Cherry
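The same per-cell idea can be adjusted to keep the original order and the ', ' separator from the question. The following is a minimal sketch, not part of the original answer: keep_top is a hypothetical helper name, and its parameters n and sep are assumptions added for illustration.

```python
import pandas as pd

fruit_dict = {"Apple": 10, "Watermelon": 20, "Cherry": 30, "Orange": 40, "Lemon": 50}


def keep_top(cell, values, n=2, sep=", "):
    """Keep the n highest-valued items of a delimited string, in original order."""
    items = cell.split(sep)
    # Positions of the items, sorted by value (highest first);
    # names missing from `values` sort last.
    by_value = sorted(
        range(len(items)),
        key=lambda i: values.get(items[i], float("-inf")),
        reverse=True,
    )
    # Take the top n positions, then restore the original left-to-right order.
    kept = sorted(by_value[:n])
    return sep.join(items[i] for i in kept)


df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Apple, Watermelon, Cherry, Lemon",
            "Cherry, Watermelon, Orange",
            "Apple, Cherry, Watermelon",
            "Cherry",
            "Cherry, Orange",
        ],
    }
)
df["name"] = df["name"].apply(keep_top, values=fruit_dict)
print(df)
```

Sorting positions rather than the names themselves is what lets the second sorted call put the kept items back in their original order.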
https://stackoverflow.com/questions/70260053