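Is there a neat way to break ties when mode() returns multiple values (i.e. for a multimodal distribution)? For context, I am implementing a voting system and I want to break ties in a custom, non-random way.
In the contrived example below, I want the mode of each row, but if a row has several modes, I want to pick the one with the shortest name.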
import pandas as pd
import random

random.seed(0)
fruits = ['apple', 'banana', 'cherry', 'date', 'elderberry', 'fig']
d = pd.DataFrame(
    {c: [random.choice(fruits) for _ in range(10)]
     for c in "ABCDEF"}
)
print(d, "\n")
print(d.mode(axis=1))
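The output shows the initial DataFrame followed by the default output of pandas.DataFrame.mode(), in which rows with fewer modes are padded with NaN.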
            A           B           C           D           E           F
0        date  elderberry         fig  elderberry        date         fig
1        date      banana  elderberry       apple  elderberry         fig
2       apple  elderberry      banana      cherry      cherry       apple
3      cherry      banana      cherry        date       apple  elderberry
4  elderberry      cherry       apple      cherry  elderberry        date
5        date      banana         fig  elderberry       apple      cherry
6        date       apple       apple         fig       apple      banana
7      cherry  elderberry         fig      banana         fig         fig
8        date      cherry      cherry  elderberry        date      cherry
9      cherry  elderberry        date        date         fig         fig

            0           1           2           3           4           5
0        date  elderberry         fig         NaN         NaN         NaN
1  elderberry         NaN         NaN         NaN         NaN         NaN
2       apple      cherry         NaN         NaN         NaN         NaN
3      cherry         NaN         NaN         NaN         NaN         NaN
4      cherry  elderberry         NaN         NaN         NaN         NaN
5       apple      banana      cherry        date  elderberry         fig
6       apple         NaN         NaN         NaN         NaN         NaN
7         fig         NaN         NaN         NaN         NaN         NaN
8      cherry         NaN         NaN         NaN         NaN         NaN
9        date         fig         NaN         NaN         NaN         NaN
I have posted my own attempt as an answer, but is there a neater way to do this? In any case, I hope it helps others with the same problem.
Answered 2021-09-20 08:42:58
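In my attempt, I use apply() on each row to collect all the mode values (dropping the NAs first), then do a custom sort using my own (name-length) score as the key argument to Python's sorted(). The first item is the one we want. Of course, this solution does not explicitly handle the case where two modal values have the same name length: since sorted() is stable, it simply keeps whichever of them comes first in the mode() output. For any given scenario, the tie-breaking function can be developed further as needed.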
def myscore(fruitname):
    # Score used for tie-breaking: shorter names win
    return len(fruitname)

def breakties(row):
    # Each row of d.mode(axis=1) holds that row's modes, padded with NaN
    modes = list(row.dropna())
    return sorted(modes, key=myscore)[0]

print("Mode with ties broken")
print(d.mode(axis=1).apply(breakties, axis=1))

print("OR, more succinctly")
print(
    d.mode(axis=1).apply(
        lambda row: sorted(list(row.dropna()), key=lambda v: len(v))[0],
        axis=1
    )
)
Mode with ties broken
0 fig
1 elderberry
2 apple
3 cherry
4 cherry
5 fig
6 apple
7 fig
8 cherry
9 fig
dtype: object
OR, more succinctly
0 fig
1 elderberry
2 apple
3 cherry
4 cherry
5 fig
6 apple
7 fig
8 cherry
9 fig
dtype: object
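
For the case the answer leaves open, two modes whose names have the same length, one possible sketch is to use a tuple as the key so that a second criterion decides. It assumes alphabetical order is an acceptable secondary rule; that choice, the `break_ties` name and the small `votes` frame are illustrative and not part of the original post.

```python
import pandas as pd

def break_ties(row):
    """Pick the shortest mode; on equal length, fall back to alphabetical order.

    The alphabetical fallback is an assumption made for illustration.
    """
    modes = row.dropna()  # drop the NaN padding added by mode(axis=1)
    return min(modes, key=lambda name: (len(name), name))

# Hypothetical votes: row 0 is tied with different lengths, row 1 is tied
# with equal lengths, and row 2 has a single mode.
votes = pd.DataFrame({
    "A": ["fig",  "pear", "apple"],
    "B": ["kiwi", "date", "apple"],
    "C": ["fig",  "pear", "cherry"],
    "D": ["kiwi", "date", "apple"],
})

winners = votes.mode(axis=1).apply(break_ties, axis=1)
print(winners)
```

Using min() with a key avoids sorting the whole list when only the best candidate is needed, and the tuple makes the secondary rule explicit instead of relying on the order in which mode() happens to return its values.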
https://stackoverflow.com/questions/69251461