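I am trying to count the products in a data file whose names contain a word from wordlist, and then find the average price of those products. My attempt: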
for word in wordlist:
    total_count += dframe.Product.str.contains(word, case=False).sum()
    total_price += dframe[dframe['Product'].str.contains(word)]['Price']
    print(dframe[dframe['Product'].str.contains(word)]['Price'])
average_price = total_price / total_count
This returns average_price as Series([], Name: Price, dtype: float64) instead of the float I expected.
What am I doing wrong?
Thanks!
Posted on 2018-02-18 11:42:41
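As a minimal sketch of why the question's code ends up with a Series: boolean indexing on the Price column returns a Series, and adding a Series to a scalar yields another Series. The small dframe below is made-up illustration data, not the asker's file.

```python
import pandas as pd

# Hypothetical illustration data (not the asker's file)
dframe = pd.DataFrame({
    'Product': ['a1', 'b', 'b2', 'c1'],
    'Price': [1, 2, 3, 6],
})

# Boolean indexing returns a Series of the matching prices, not a number
matched = dframe[dframe['Product'].str.contains('b')]['Price']

total_price = 0
total_price += matched            # scalar + Series gives a Series
print(type(total_price))

# Reducing with .sum() first keeps the running total a scalar
total_scalar = 0
total_scalar += matched.sum()
print(total_scalar)
```

Dividing a Series by the count still gives a Series, which is why average_price is not a float in the question.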
You need to sum the Price column for each condition so that you get a scalar value:
total_count, total_price = 0, 0
for word in wordlist:
    total_count += dframe.Product.str.contains(word, case=False).sum()
    total_price += dframe.loc[dframe['Product'].str.contains(word, case=False), 'Price'].sum()
average_price = total_price / total_count
Or cache the mask in a variable for better readability and performance:
total_count, total_price = 0, 0
for word in wordlist:
    mask = dframe.Product.str.contains(word, case=False)
    total_count += mask.sum()
    total_price += dframe.loc[mask, 'Price'].sum()
average_price = total_price / total_count
The solution can be simplified with the regex word1|word2|word3, where | means "or":
mask = dframe.Product.str.contains('|'.join(wordlist), case=False)
total_count = mask.sum()
total_price = dframe.loc[mask, 'Price'].sum()
average_price = total_price / total_count
Or take the mean directly:
mask = dframe.Product.str.contains('|'.join(wordlist), case=False)
average_price = dframe.loc[mask, 'Price'].mean()
Sample:
import pandas as pd

dframe = pd.DataFrame({
    'Product': ['a1', 'a2', 'a3', 'c1', 'c1', 'b', 'b2', 'c3', 'd2'],
    'Price': [1, 3, 5, 6, 3, 2, 3, 5, 2]
})
print(dframe)
   Price Product
0      1      a1
1      3      a2
2      5      a3
3      6      c1
4      3      c1
5      2       b
6      3      b2
7      5      c3
8      2      d2
wordlist = ['b','c']
mask = dframe.Product.str.contains('|'.join(wordlist), case=False)
average_price = dframe.loc[mask, 'Price'].mean()
print(average_price)
3.8

Posted on 2018-02-18 11:48:05
You can use the values attribute to work with the underlying array instead of a Series:
total_count += dframe.Product.str.contains(word, case=False).values.sum()
total_price += dframe[dframe['Product'].str.contains(word)]['Price'].values.sum()
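Building on the combined-regex mask from the first answer, here is a rough sketch that wraps it in a function. The name average_price_for_words is hypothetical, and the re.escape and na=False additions are my own assumptions: they guard against words containing regex metacharacters and against missing Product values, neither of which the original answers address.

```python
import re
import pandas as pd

def average_price_for_words(dframe, wordlist):
    """Mean Price of rows whose Product contains any word (case-insensitive)."""
    # re.escape keeps characters such as '+' or '.' literal in the pattern
    pattern = '|'.join(re.escape(word) for word in wordlist)
    # na=False treats missing Product values as non-matching
    mask = dframe['Product'].str.contains(pattern, case=False, na=False)
    return dframe.loc[mask, 'Price'].mean()

# Same sample data as in the first answer
dframe = pd.DataFrame({
    'Product': ['a1', 'a2', 'a3', 'c1', 'c1', 'b', 'b2', 'c3', 'd2'],
    'Price': [1, 3, 5, 6, 3, 2, 3, 5, 2],
})
print(average_price_for_words(dframe, ['b', 'c']))
```

Note that a combined pattern counts each product once even when it matches several words, whereas the per-word loop adds such a product once per matching word, so the two averages can differ on real data.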
https://stackoverflow.com/questions/48851238