首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >如何在Pandas中使用Groupby在聚合后解栈

如何在Pandas中使用Groupby在聚合后解栈
EN

Stack Overflow用户
提问于 2020-10-24 06:36:51
回答 1查看 50关注 0票数 0

你好,数据科学家和熊猫专家们,

我需要一些帮助来弄清楚如何在应用groupby聚合方法后更好地组织我的数据。我尝试过对新的数据帧进行拆分,但没有产生预期的结果。

这是我的数据框:

代码语言:javascript
复制
df = [{'Store': 's1', 'Date': Timestamp('2020-08-01 00:00:00'), 'Employee': 'a', 'Department': 'd1', 'ID': 's1ad1', 'Level': 2, 'duties': 'O'},\
 {'Store': 's1', 'Date': Timestamp('2020-08-02 00:00:00'), 'Employee': 'a', 'Department': 'd2', 'ID': 's1ad2', 'Level': 2, 'duties': 'C'},\
 {'Store': 's1', 'Date': Timestamp('2020-08-03 00:00:00'), 'Employee': 'a', 'Department': 'd1', 'ID': 's1ad1', 'Level': 2, 'duties': 'O'},\
 {'Store': 's1', 'Date': Timestamp('2020-08-04 00:00:00'), 'Employee': 'a', 'Department': 'd1', 'ID': 's1ad1', 'Level': 2, 'duties': 'O'},\
 {'Store': 's1', 'Date': Timestamp('2020-08-05 00:00:00'), 'Employee': 'a', 'Department': 'd2', 'ID': 's1ad2', 'Level': 2, 'duties': 'C'},\
 {'Store': 's2', 'Date': Timestamp('2020-08-08 00:00:00'), 'Employee': 'a', 'Department': 'd1', 'ID': 's2ad1', 'Level': 2, 'duties': 'O'},\
 {'Store': 's2', 'Date': Timestamp('2020-08-09 00:00:00'), 'Employee': 'a', 'Department': 'd3', 'ID': 's2ad3', 'Level': 2, 'duties': 'C'},\
 {'Store': 's2', 'Date': Timestamp('2020-08-10 00:00:00'), 'Employee': 'a', 'Department': 'd1', 'ID': 's2ad1', 'Level': 2, 'duties': 'O'},\
 {'Store': 's2', 'Date': Timestamp('2020-08-11 00:00:00'), 'Employee': 'a', 'Department': 'd1', 'ID': 's2ad1', 'Level': 2, 'duties': 'O'},\
 {'Store': 's2', 'Date': Timestamp('2020-08-12 00:00:00'), 'Employee': 'a', 'Department': 'd3', 'ID': 's3ad1', 'Level': 2, 'duties': 'C'},\
 {'Store': 's1', 'Date': Timestamp('2020-08-01 00:00:00'), 'Employee': 'b', 'Department': 'd1', 'ID': 's1bd1', 'Level': 1, 'duties': 'O'},\
 {'Store': 's1', 'Date': Timestamp('2020-08-02 00:00:00'), 'Employee': 'b', 'Department': 'd2', 'ID': 's1bd2', 'Level': 1, 'duties': 'C'},\
 {'Store': 's1', 'Date': Timestamp('2020-08-03 00:00:00'), 'Employee': 'b', 'Department': 'd1', 'ID': 's1bd1', 'Level': 1, 'duties': 'O'},\
 {'Store': 's1', 'Date': Timestamp('2020-08-04 00:00:00'), 'Employee': 'b', 'Department': 'd1', 'ID': 's1bd1', 'Level': 1, 'duties': 'O'},\
 {'Store': 's1', 'Date': Timestamp('2020-08-05 00:00:00'), 'Employee': 'b', 'Department': 'd2', 'ID': 's1bd2', 'Level': 1, 'duties': 'C'},\
 {'Store': 's2', 'Date': Timestamp('2020-08-08 00:00:00'), 'Employee': 'c', 'Department': 'd1', 'ID': 's2ac1', 'Level': 3, 'duties': 'O'},\
 {'Store': 's2', 'Date': Timestamp('2020-08-09 00:00:00'), 'Employee': 'c', 'Department': 'd3', 'ID': 's2cd3', 'Level': 3, 'duties': 'C'},\
 {'Store': 's2', 'Date': Timestamp('2020-08-10 00:00:00'), 'Employee': 'c', 'Department': 'd1', 'ID': 's2cd1', 'Level': 3, 'duties': 'O'},\
 {'Store': 's2', 'Date': Timestamp('2020-08-11 00:00:00'), 'Employee': 'c', 'Department': 'd1', 'ID': 's2cd1', 'Level': 3, 'duties': 'O'},\
 {'Store': 's2', 'Date': Timestamp('2020-08-12 00:00:00'), 'Employee': 'c', 'Department': 'd3', 'ID': 's3cd1', 'Level': 3, 'duties': 'C'},\
 {'Store': 's3', 'Date': Timestamp('2020-08-08 00:00:00'), 'Employee': 'd', 'Department': 'd1', 'ID': 's3cd1', 'Level': 3, 'duties': 'O'},\
 {'Store': 's3', 'Date': Timestamp('2020-08-09 00:00:00'), 'Employee': 'd', 'Department': 'd3', 'ID': 's3dd3', 'Level': 3, 'duties': 'C'},\
 {'Store': 's3', 'Date': Timestamp('2020-08-10 00:00:00'), 'Employee': 'd', 'Department': 'd1', 'ID': 's3dd1', 'Level': 3, 'duties': 'O'},\
 {'Store': 's3', 'Date': Timestamp('2020-08-11 00:00:00'), 'Employee': 'd','Department': 'd1', 'ID': 's3dd1', 'Level': 3, 'duties': 'O'},\
 {'Store': 's3', 'Date': Timestamp('2020-08-12 00:00:00'), 'Employee': 'd', 'Department': 'd3', 'ID': 's3dd1', 'Level': 3, 'duties': 'C'}]

我想组织我的输出,这样它就可以很好地堆叠在Store --> Department --> Employee中。如下所示(抱歉,输出行不是很好):

代码语言:javascript
复制
Store           s1                      s2                                     s3
Department      d1          d2          d1          d3          d1          d3
Employee    ID   Level duties   first shift   last shift #ofshift   first shift   last shift #ofshift   first shift   last shift #ofshift   first shift   last shift #ofshift   first shift   last shift #ofshift   first shift   last shift #ofshift
a   s1ad1 2 O   2020-08-01 2020-08-04 3 
a   s1ad2 2 C               2020-08-02 2020-08-05 2
a   s2ad1 2 O                           2020-08-08 2020-08-11 3
a   s2ad3 2 O                                       2020-08-09 2020-08-12 2
b   s1bd1 1 O   2020-08-01 2020-08-04 3 
b   s1bd2 1 C               2020-08-02 2020-08-05 2
c   s2cd1 3 O                           2020-08-08 2020-08-11 3
c   s3ad3 3 O                                       2020-08-09 2020-08-12 2
d   s3dd1 3 O                                                   2020-08-08 2020-08-11 3
d   s3dd3 3 O                                                               2020-08-09 2020-08-12 2

因此,我尝试使用Group by表达式:

代码语言:javascript
复制
df = df.groupby(['Employee', 'Store', 'Department'])\
                .agg({'Date':['first', 'last', 'size'],
                      'ID': 'first',
                      'Level': 'first',
                      'duties': 'first'})

# Join the Each Column with its operation.
df.columns = df.columns.map('_'.join)     

# Reset the Index
df = df.reset_index().set_index('Employee')

# Renaming Columns of Dataframe.
df.rename(columns={'Date_first':'First Shift',
        'Date_last':'Last Shift',
        'Date_size':'# of shift',
        'ID_first':'ID',
        'Level_first':'Level',
        'duties_first':'duties'},
        inplace=True)

这将打印以下结果:

代码语言:javascript
复制
            Store  Department  First Shift       Last Shift      # of shift    ID_first     Level    duties
Employee
a               s1         d1                    2020-08-01    2020-08-04           3          s1ad1             2      O
a               s1         d2                    2020-08-02   2020-08-05           2           s1ad2            2      C
a               s2         d1                    2020-08-08   2020-08-11            3           s2ad1            2      O
a               s2         d3                   2020-08-09    2020-08-12           2           s2ad3           2      C
b               s1         d1                    2020-08-01     2020-08-04           3          s1bd1            1      O
b               s1         d2                   2020-08-02     2020-08-05           2          s1bd2           1      C
c               s2         d1                   2020-08-08     2020-08-11            3           s2ac1           3      O
c               s2         d3                  2020-08-09     2020-08-12           2           s2cd3          3      C
d               s3         d1                   2020-08-08    2020-08-11           3           s3cd1           3      O
d               s3         d3                  2020-08-09     2020-08-12           2         s3dd3           3      C

然后我应用了unstack表达式,如下所示:

代码语言:javascript
复制
df = df.groupby(['Employee', 'Store', 'Department', 'First Shift', 'Last Shift', '# of shift', 'ID_first',\
                     'Level', 'duties'])\
            .size()\
            .unstack(['Store', 'Department']).fillna(0)

它按如下方式输出结果:

代码语言:javascript
复制
Store                                                              s1        s2        s3     
Department                                                         d1   d2   d1   d3   d1   d3
Employee First Shift Last Shift # of shift ID_first Level duties
a        2020-08-01  2020-08-04 3          s1ad1    2     O       1.0  0.0  0.0  0.0  0.0  0.0
         2020-08-02  2020-08-05 2          s1ad2    2     C       0.0  1.0  0.0  0.0  0.0  0.0
         2020-08-08  2020-08-11 3          s2ad1    2     O       0.0  0.0  1.0  0.0  0.0  0.0
         2020-08-09  2020-08-12 2          s2ad3    2     C       0.0  0.0  0.0  1.0  0.0  0.0
b        2020-08-01  2020-08-04 3          s1bd1    1     O       1.0  0.0  0.0  0.0  0.0  0.0
         2020-08-02  2020-08-05 2          s1bd2    1     C       0.0  1.0  0.0  0.0  0.0  0.0
c        2020-08-08  2020-08-11 3          s2ac1    3     O       0.0  0.0  1.0  0.0  0.0  0.0
         2020-08-09  2020-08-12 2          s2cd3    3     C       0.0  0.0  0.0  1.0  0.0  0.0
d        2020-08-08  2020-08-11 3          s3cd1    3     O       0.0  0.0  0.0  0.0  1.0  0.0
         2020-08-09  2020-08-12 2          s3dd3    3     C       0.0  0.0  0.0  0.0  0.0  1.0

我想我错误地使用了Size和unstack。然而,我似乎想不出如何重新组织数据。

我非常感谢专家对如何正确组织我的数据的意见。

再次感谢您的帮助和体贴。

谢谢。

EN

回答 1

Stack Overflow用户

发布于 2020-10-24 09:01:06

问题的症结在于,您需要在使用.unstack()之前重新构造数据,因为您需要的格式是一个矩阵,其值是三个重复的列。因此,您需要将数据帧从wide更改为long,并在一个列Values中创建一个包含这三个值的新列,另一个列将它们分类为Shift

代码语言:javascript
复制
# Step 1: Named Groupby Agregation naming columns ins specific format required for `pd.wide_to_long`. Must end with integer.
df = (df.groupby(['Employee', 'Store', 'Department'])
        .agg(Shift_1=('Date','first'),
             Shift_2=('Date','last'),
             Shift_3=('Date','size'),
             ID=('ID', 'first'),
             Level=('Level', 'first'),
             duties=('duties', 'first'))
        .reset_index())


# Step 2: In preparation for a matrix The data must be transformed so that the three columns that are values in the matrix must be in long format.
df = pd.wide_to_long(df, stubnames='Shift_', i='ID', j='Shift').reset_index().rename(columns={'Shift_':'Values'})

# Step 3: 1,2,3 were required integer suffixes for wide_to_long but now let's change to what we want the columns to be called.
df['Shift'] = df['Shift'].replace([1,2,3],['First Shift','Last Shift','# of shift'])

# Step 4: Create the matrix be setting index and unstacking to columns
df = (df.sort_values(['Employee', 'Store', 'Department']) #values must be sorted in order for how we want columns to appear in matrix format
        .set_index(['Employee', 'ID', 'Level', 'duties', 'Store', 'Department', 'Shift'])
        .unstack(['Store', 'Department', 'Shift']).fillna(0)) 

# Step 5: Cleanup of Multi-index into desred format
df.columns = df.columns.reorder_levels([1,2,3,0]).droplevel(3)
df = df.reset_index()
df
Out[1]: 
Store      Employee     ID Level duties                   s1  \
Department                                                d1   
Shift                                            First Shift   
0                 a  s1ad1     2      O  2020-08-01 00:00:00   
1                 a  s1ad2     2      C                    0   
2                 a  s2ad1     2      O                    0   
3                 a  s2ad3     2      C                    0   
4                 b  s1bd1     1      O  2020-08-01 00:00:00   
5                 b  s1bd2     1      C                    0   
6                 c  s2ac1     3      O                    0   
7                 c  s2cd3     3      C                    0   
8                 d  s3cd1     3      O                    0   
9                 d  s3dd3     3      C                    0   

Store                                                            \
Department                                                   d2   
Shift                Last Shift # of shift          First Shift   
0           2020-08-04 00:00:00          3                    0   
1                             0          0  2020-08-02 00:00:00   
2                             0          0                    0   
3                             0          0                    0   
4           2020-08-04 00:00:00          3                    0   
5                             0          0  2020-08-02 00:00:00   
6                             0          0                    0   
7                             0          0                    0   
8                             0          0                    0   
9                             0          0                    0   

Store                                       ...         s2  \
Department                                  ...         d1   
Shift                Last Shift # of shift  ... # of shift   
0                             0          0  ...          0   
1           2020-08-05 00:00:00          2  ...          0   
2                             0          0  ...          3   
3                             0          0  ...          0   
4                             0          0  ...          0   
5           2020-08-05 00:00:00          2  ...          0   
6                             0          0  ...          3   
7                             0          0  ...          0   
8                             0          0  ...          0   
9                             0          0  ...          0   

Store                                                            \
Department                   d3                                   
Shift               First Shift           Last Shift # of shift   
0                             0                    0          0   
1                             0                    0          0   
2                             0                    0          0   
3           2020-08-09 00:00:00  2020-08-12 00:00:00          2   
4                             0                    0          0   
5                             0                    0          0   
6                             0                    0          0   
7           2020-08-09 00:00:00  2020-08-12 00:00:00          2   
8                             0                    0          0   
9                             0                    0          0   

Store                        s3                                  \
Department                   d1                                   
Shift               First Shift           Last Shift # of shift   
0                             0                    0          0   
1                             0                    0          0   
2                             0                    0          0   
3                             0                    0          0   
4                             0                    0          0   
5                             0                    0          0   
6                             0                    0          0   
7                             0                    0          0   
8           2020-08-08 00:00:00  2020-08-11 00:00:00          3   
9                             0                    0          0   

Store                                                            
Department                   d3                                  
Shift               First Shift           Last Shift # of shift  
0                             0                    0          0  
1                             0                    0          0  
2                             0                    0          0  
3                             0                    0          0  
4                             0                    0          0  
5                             0                    0          0  
6                             0                    0          0  
7                             0                    0          0  
8                             0                    0          0  
9           2020-08-09 00:00:00  2020-08-12 00:00:00          2  

[10 rows x 22 columns]
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/64508095

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档