我正在做一个项目,试图将函数和操作从Excel移植到pandas中的python。
我有很多SUMIF函数要在数据中复制。问题是,我不认为熊猫有特别类似的功能。我可能有一个excel表达式,如:
=(SUMIFS('Sheetname'!BI$146:BI$282,'Sheetname'!$B$146:$B$282,$I1836))
其中第一个参数是需要求和的区域。第二个区域是我们检查匹配条件的区域,最后一个参数是我们要查找的特定值。
我现在正在做的是运行一个嵌套循环,它迭代所有的行和列,并检查第一次迭代,找到匹配的行,而内部循环找到匹配的列。然后将这些值相加并输入到pandas函数中。
类似于:
table_dict_temp是我要填充的表,table_temp中的值是要引用的表
for i in range(len(table_dict_temp)):
cog_loss = table_temp.loc[table_temp[COLUMN OF COMPARISON]==table_dict_temp[COLUMN OF COMPARISON][i]]
for j in range(10, len(table_dict_temp.columns)):
cog_loss_temp = cog_loss[table_dict_temp.columns[j]].sum()
table_dict_temp.iloc[i,j]=cog_loss_temp我遇到的问题是,这似乎是一种非pythonic方式,而且也需要很多时间。如果有任何建议能让我写的函数更快,我将不胜感激!
发布于 2020-12-09 13:38:17
Excel示例数据:
https://support.microsoft.com/en-us/office/sumifs-function-c9e748f5-7ea7-455d-9406-611cebce642b
Quantity Sold Product Salesperson
5 Apples Tom
4 Apples Sarah
15 Artichokes Tom
3 Artichokes Sarah
22 Bananas Tom
12 Bananas Sarah
10 Carrots Tom
33 Carrots Sarah
Description
=SUMIFS(A2:A9, B2:B9, "=A*", C2:C9, "Tom")
Adds the number of products that begin with A and were sold by Tom.
It uses the wildcard character * in Criteria1, "=A*" to look for matching product names in Criteria_range1 B2:B9,
and looks for the name "Tom" in Criteria_range2 C2:C9.
It then adds the numbers in Sum_range A2:A9 that meet both conditions.
The result is 20.
=SUMIFS(A2:A9, B2:B9, "<>Bananas", C2:C9, "Tom")
Adds the number of products that aren’t bananas and are sold by Tom.
It excludes bananas by using <> in the Criteria1, "<>Bananas",
and looks for the name "Tom" in Criteria_range2 C2:C9.
It then adds the numbers in Sum_range A2:A9 that meet both conditions.
The result is 30.pythonic解决方案:
import io
import pandas as pd
data_str = '''
Quantity Sold Product Salesperson
5 Apples Tom
4 Apples Sarah
15 Artichokes Tom
3 Artichokes Sarah
22 Bananas Tom
12 Bananas Sarah
10 Carrots Tom
33 Carrots Sarah
'''.strip()
df = pd.read_csv(io.StringIO(data_str), sep='\t')
# =SUMIFS(A2:A9, B2:B9, "=A*", C2:C9, "Tom")
cond = True
cond &= df['Product'].str.startswith('A')
cond &= df['Salesperson'] == 'Tom'
df.loc[cond, 'Quantity Sold'].sum()
# =SUMIFS(A2:A9, B2:B9, "<>Bananas", C2:C9, "Tom")
cond = True
cond &= df['Product'] != 'Bananas'
cond &= df['Salesperson'] == 'Tom'
df.loc[cond, 'Quantity Sold'].sum()https://stackoverflow.com/questions/65210027
复制相似问题