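Hi, I'm a beginner trying to understand how to use nested loops in Python. I am trying to work out how to sum values that share the same keys, and to learn the groupby function (based on another question I saw on Stack Overflow today). I would like to learn the Pythonic DataFrame way of doing this.
Right now I want to sum up the working hours as follows. I aggregate pairs of units per scenario, for example: Scenario = 1, Company = A, Country = USA, Units = HR + Corporate Client, summed working hours = 65 + 63 = 128, and so on. After the raw data I have included what the output should look like. I am not sure whether groupby fits here as well; it looks more like a pivot to me.
I started with a nested loop but ran into problems indexing the dates. As a result, my code only filters on the date, which is inefficient but works. I have learned that nested loops are not well suited to DataFrames, but I am not sure which route to take. The code looks like this: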
import pandas as pd

working_date_start = '2017-07-14'
working_date_end = '2017-07-15'
flag_scenario = 0
Scenario = 0

df = pd.read_csv('C:/Comapny_WorkingHours.csv', encoding='cp1252', sep=';', index_col=None).dropna()
df = df[(df['working_date'] >= working_date_start) & (df['working_date'] < working_date_end) & (df['flag'] == flag_scenario) & (df['Scenario'] >= Scenario)]
pd_date = pd.DatetimeIndex(df['working_date'].values)
df['working_date'] = pd_date
index_data = df.set_index('working_date')

for current_date in index_data.index.unique():
    print('calculating date: ' + str(current_date))
    for i in range(0, len(df)):
        for j in range(i + 1, len(df)):
            if df.iloc[i]['Scenario'] == df.iloc[j]['Scenario'] and df.iloc[i]['Unit'] != df.iloc[j]['Unit'] and df.iloc[i]['Company'] == 'Company A' and df.iloc[j]['Company'] == 'Company A' and df.iloc[i]['Country'] == 'USA' and df.iloc[j]['Country'] == 'USA':
                print(df.iloc[i]['Scenario'], df.iloc[j]['Scenario'])
                print(df.iloc[i]['Unit'], df.iloc[j]['Unit'])
The raw data looks like this:
working_date flag Scenario Working Hours Country Company Unit
2017-07-14 0 1 65 USA Company A HR
2017-07-14 0 2 75 USA Company A HR
2017-07-14 0 3 73 USA Company A HR
2017-07-14 0 4 66 USA Company A HR
2017-07-14 0 1 63 USA Company A Corporate Client
2017-07-14 0 2 51 USA Company A Corporate Client
2017-07-14 0 3 60 USA Company A Corporate Client
2017-07-14 0 4 55 USA Company A Corporate Client
2017-07-14 0 1 71 USA Company A Controlling
2017-07-14 0 2 45 USA Company A Controlling
2017-07-14 0 3 76 USA Company A Controlling
2017-07-14 0 4 62 USA Company A Controlling
2017-07-14 0 1 57 USA Company A Corporate Center
2017-07-14 0 2 64 USA Company A Corporate Center
2017-07-14 0 3 68 USA Company A Corporate Center
2017-07-14 0 4 69 USA Company A Corporate Center
2017-07-14 0 1 54 USA Company B Private and Business Customers
2017-07-14 0 2 62 USA Company B Private and Business Customers
2017-07-14 0 3 47 USA Company B Private and Business Customers
2017-07-14 0 4 62 USA Company B Private and Business Customers
2017-07-14 0 1 45 USA Company B Marketing
2017-07-14 0 2 78 USA Company B Marketing
2017-07-14 0 3 59 USA Company B Marketing
2017-07-14 0 4 78 USA Company B Marketing
2017-07-14 0 1 49 USA Company B IT
2017-07-14 0 2 74 USA Company B IT
2017-07-14 0 3 78 USA Company B IT
2017-07-14 0 4 55 USA Company B IT
2017-07-14 0 1 66 USA Company B Project Management
2017-07-14 0 2 76 USA Company B Project Management
2017-07-14 0 3 53 USA Company B Project Management
2017-07-14 0 4 58 USA Company B Project Management
2017-07-15 0 1 56 USA Company A HR
2017-07-15 0 2 54 USA Company A HR
2017-07-15 0 3 77 USA Company A HR
2017-07-15 0 4 58 USA Company A HR
2017-07-15 0 1 78 USA Company A Corporate Client
2017-07-15 0 2 76 USA Company A Corporate Client
2017-07-15 0 3 59 USA Company A Corporate Client
2017-07-15 0 4 56 USA Company A Corporate Client
2017-07-15 0 1 57 USA Company A Controlling
2017-07-15 0 2 54 USA Company A Controlling
2017-07-15 0 3 56 USA Company A Controlling
2017-07-15 0 4 74 USA Company A Controlling
2017-07-15 0 1 71 USA Company A Corporate Center
2017-07-15 0 2 75 USA Company A Corporate Center
2017-07-15 0 3 79 USA Company A Corporate Center
2017-07-15 0 4 78 USA Company A Corporate Center
2017-07-15 0 1 74 USA Company B Private and Business Customers
2017-07-15 0 2 72 USA Company B Private and Business Customers
2017-07-15 0 3 66 USA Company B Private and Business Customers
2017-07-15 0 4 66 USA Company B Private and Business Customers
2017-07-15 0 1 69 USA Company B Marketing
2017-07-15 0 2 69 USA Company B Marketing
2017-07-15 0 3 63 USA Company B Marketing
2017-07-15 0 4 59 USA Company B Marketing
2017-07-15 0 1 57 USA Company B IT
2017-07-15 0 2 67 USA Company B IT
2017-07-15 0 3 77 USA Company B IT
2017-07-15 0 4 60 USA Company B IT
2017-07-15 0 1 55 USA Company B Project Management
2017-07-15 0 2 57 USA Company B Project Management
2017-07-15 0 3 80 USA Company B Project Management
2017-07-15 0 4 59 USA Company B Project Management
The output I would like looks like this:
working_date Scenario Units Working Hours Summed Up
2017-07-14 1 HR_Corporate Client 128
2017-07-14 1 HR_Controlling 136
2017-07-14 1 HR_Corporate Center 122
2017-07-14 2 HR_Corporate Client 126
2017-07-14 2 HR_Controlling 120
2017-07-14 2 HR_Corporate Center 139
2017-07-14 3 HR_Corporate Client 133
2017-07-14 3 HR_Controlling 149
2017-07-14 3 HR_Corporate Center 141
2017-07-14 4 HR_Corporate Client 121
2017-07-14 4 HR_Controlling 128
2017-07-14 4 HR_Corporate Center 135
2017-07-14 1 Corporate Client_Controlling 134
2017-07-14 1 Corporate Client_Corporate Center 120
2017-07-14 2 Corporate Client_Controlling 96
2017-07-14 2 Corporate Client_Corporate Center 115
2017-07-14 3 Corporate Client_Controlling 136
2017-07-14 3 Corporate Client_Corporate Center 128
2017-07-14 4 Corporate Client_Controlling 117
2017-07-14 4 Corporate Client_Corporate Center 124
2017-07-14 1 Controlling_Corporate Center 128
2017-07-14 2 Controlling_Corporate Center 109
2017-07-14 3 Controlling_Corporate Center 144
2017-07-14 4 Controlling_Corporate Center 131
Answered on 2018-09-04 00:29:55
import pandas as pd
# combinations gives the unique pairs of two units
from itertools import combinations

df = pd.read_csv('C:/Comapny_WorkingHours.csv', encoding='cp1252', sep=';', index_col=None).dropna()
df = df.reset_index(drop=False)

# Assumption: df has already been restricted to a single working_date (as the
# filter in the question does); otherwise hours from several dates are summed together.
scenario_list = df['Scenario'].unique().tolist()

# build a dict mapping each scenario to the pairs of units it contains
combos_dict = {}
for scene in scenario_list:
    units_list = df[df['Scenario'] == scene]['Unit'].unique().tolist()
    combos_dict[scene] = list(combinations(units_list, 2))

pieces = []
for key in combos_dict.keys():
    # filter the dataframe by the scenario matched in combos_dict
    filter_df = df[df['Scenario'] == key]
    for combo in combos_dict[key]:
        # iterate through the combos_dict values to create a sub_filter
        # that is used to build one row of the final dataframe
        sub_filter = filter_df[(filter_df['Unit'] == combo[0]) |
                               (filter_df['Unit'] == combo[1])]
        sub_df = pd.DataFrame(data=[[sub_filter['working_date'].iloc[0],
                                     key,
                                     '{}_{}'.format(combo[0], combo[1]),
                                     sum(sub_filter['Working Hours'])]],
                              columns=['working_date',
                                       'Scenario',
                                       'Units',
                                       'Working Hours Summed Up'])
        pieces.append(sub_df)

# DataFrame.append was removed in pandas 2.0, so concatenate the pieces once
# to create the new dataframe with the desired output
new_df = pd.concat(pieces, ignore_index=True)
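As a possible alternative, here is a minimal sketch that groups by working_date and Scenario first, so hours from different dates are never mixed, and restricts the data to one company and country as the question's loop does. The function name pairwise_unit_sums and its parameters are hypothetical, and the inline sample is just a small subset of the rows from the question so that the sketch runs without the CSV file.

```python
from itertools import combinations

import pandas as pd

# A few rows copied from the question's raw data (2017-07-14, scenarios 1 and 2).
df = pd.DataFrame(
    [
        ("2017-07-14", 0, 1, 65, "USA", "Company A", "HR"),
        ("2017-07-14", 0, 2, 75, "USA", "Company A", "HR"),
        ("2017-07-14", 0, 1, 63, "USA", "Company A", "Corporate Client"),
        ("2017-07-14", 0, 2, 51, "USA", "Company A", "Corporate Client"),
        ("2017-07-14", 0, 1, 71, "USA", "Company A", "Controlling"),
        ("2017-07-14", 0, 2, 45, "USA", "Company A", "Controlling"),
        ("2017-07-14", 0, 1, 45, "USA", "Company B", "Marketing"),
    ],
    columns=["working_date", "flag", "Scenario", "Working Hours",
             "Country", "Company", "Unit"],
)


def pairwise_unit_sums(frame, company="Company A", country="USA"):
    """Sum 'Working Hours' for every pair of units, per date and scenario.

    Hypothetical helper: the name and the default filters are assumptions
    based on the conditions used in the question's nested loop.
    """
    subset = frame[(frame["Company"] == company) & (frame["Country"] == country)]
    rows = []
    # sort=False keeps the groups in order of first appearance
    for (date, scenario), group in subset.groupby(
            ["working_date", "Scenario"], sort=False):
        # total hours per unit inside this date/scenario group
        hours = group.groupby("Unit", sort=False)["Working Hours"].sum()
        for left, right in combinations(hours.index, 2):
            rows.append({
                "working_date": date,
                "Scenario": scenario,
                "Units": "{}_{}".format(left, right),
                "Working Hours Summed Up": hours[left] + hours[right],
            })
    return pd.DataFrame(
        rows,
        columns=["working_date", "Scenario", "Units", "Working Hours Summed Up"],
    )


result = pairwise_unit_sums(df)
print(result)
```

With the real file, the same function could be called on the DataFrame returned by read_csv. The rows come out ordered by date and scenario rather than by unit pair, so a final sort_values may be needed to match the layout shown in the question.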
https://stackoverflow.com/questions/52153390