Suppose the following list of tuples represents sentiment estimates from 3 different methods:
[('pos', 0.2), ('neu', 0.1), ('pos', 0.4)]
I would like to know the most efficient way to find the majority sentiment and compute the average of its scores, i.e.:
result = ('pos', 0.3)
Thanks
Posted on 2017-07-14 19:29:41
import itertools
l = [('pos', 0.2), ('neu', 0.1), ('pos', 0.4)]
You can first group the items by sentiment (note that they need to be sorted first, because groupby only merges adjacent items)
sentiments = [list(j[1]) for j in itertools.groupby(sorted(l), lambda i: i[0])]
# sentiments = [[('neu', 0.1)], [('pos', 0.2), ('pos', 0.4)]]
Then find which sentiment is the most common (i.e. has the longest group)
majority = max(sentiments, key=len)
# majority = [('pos', 0.2), ('pos', 0.4)]
Finally, compute the average
values = [i[1] for i in majority]
average = (majority[0][0], sum(values)/len(values))
# average = ('pos', 0.30000000000000004)
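The three steps above can be wrapped in a single function. This is only a sketch: the name majority_average is made up for illustration, and sorting by label alone (via operator.itemgetter) is my substitution for the plain sorted(l) used above.

```python
from itertools import groupby
from operator import itemgetter


def majority_average(estimates):
    """Return (label, mean score) for the most frequent label.

    Hypothetical helper, not part of the original answer.
    """
    label_of = itemgetter(0)
    # groupby only merges adjacent items, so sort by label first.
    ordered = sorted(estimates, key=label_of)
    groups = [list(group) for _, group in groupby(ordered, key=label_of)]
    # On a tie, max() keeps the first group, i.e. the alphabetically first label.
    majority = max(groups, key=len)
    scores = [score for _, score in majority]
    return majority[0][0], sum(scores) / len(scores)


result = majority_average([('pos', 0.2), ('neu', 0.1), ('pos', 0.4)])
print(result)  # ('pos', 0.30000000000000004)
```

An empty list makes max() raise ValueError, so check for that case first if it can occur in your data.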
Posted on 2017-07-14 19:31:01
Using the collections and statistics modules, you can do this:
from collections import Counter
from statistics import mean
lst = [('pos', 0.2), ('neu', 0.1), ('pos', 0.4)]
count = Counter(item[0] for item in lst) # Counter({'pos': 2, 'neu': 1})
maj = count.most_common(1)[0][0] # pos
mn = mean(item[1] for item in lst if item[0] == maj)
result = (maj, mn)
print(result) # ('pos', 0.30000000000000004)
That said, if you are looking for efficiency, I would prefer CoryKramer's answer.
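Since the question is about efficiency, one option is to keep a running count and sum per label, so the list is scanned once and nothing is sorted. This is a sketch under my own assumptions (the function name is hypothetical), and I have not benchmarked it against the other approaches.

```python
def majority_average_single_pass(estimates):
    """Return (label, mean score) for the most frequent label in one pass.

    Hypothetical helper for illustration only.
    """
    totals = {}  # label -> [count, running sum of scores]
    for label, score in estimates:
        entry = totals.setdefault(label, [0, 0.0])
        entry[0] += 1
        entry[1] += score
    # Pick the label with the highest count; on a tie the first one seen wins.
    label, (count, total) = max(totals.items(), key=lambda kv: kv[1][0])
    return label, total / count


lst = [('pos', 0.2), ('neu', 0.1), ('pos', 0.4)]
result = majority_average_single_pass(lst)
print(result)
```

For three estimates the difference is negligible; this only starts to matter with long lists.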
Posted on 2017-07-14 19:26:44
import collections
reports = [('pos', 0.2), ('neu', 0.1), ('pos', 0.4)]
oracle = collections.defaultdict(list)
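for mood, score in reports:
    oracle[mood].append(score)
counts = {mood: len(scores) for mood, scores in oracle.items()}
# max(counts) alone would compare the keys alphabetically; use the counts as the key
mood = max(counts, key=counts.get) # gives `'pos'`
sum(oracle[mood]) / len(oracle[mood]) # gives 0.30000000000000004, i.e. roughly 0.3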
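One case the snippet above does not address is a tie, where two labels are equally frequent and max silently picks one of them. If that matters for your data, a possible variation is to return every label that shares the top count. The function name and the sample data below are made up for illustration.

```python
from collections import defaultdict


def majority_candidates(reports):
    """Return {label: mean score} for every label that shares the top count.

    Hypothetical helper; raises ValueError on an empty list.
    """
    scores_by_mood = defaultdict(list)
    for mood, score in reports:
        scores_by_mood[mood].append(score)
    top = max(len(scores) for scores in scores_by_mood.values())
    return {
        mood: sum(scores) / len(scores)
        for mood, scores in scores_by_mood.items()
        if len(scores) == top
    }


tied = majority_candidates([('pos', 0.5), ('neg', 0.25), ('pos', 0.5), ('neg', 0.75)])
print(tied)  # {'pos': 0.5, 'neg': 0.5}
```

A result with more than one key means there is no single majority, and the caller can decide how to break the tie.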
https://stackoverflow.com/questions/45101690