Q: Counting the number of active items per month from each item's start and end dates in a pandas DataFrame

Stack Overflow user

Asked 2018-06-03 06:05:07

Answers 4 · Views 252 · Followers 0 · Votes 3

Suppose we have the following DataFrame, which describes the bugs in a bug tracking system:

import pandas as pd

bugs = pd.DataFrame([
    {'key': 'ABC-1', 'priority': 'high', 'start': pd.Timestamp(2018, 1, 1), 'end': pd.Timestamp(2018,3,20)},
    {'key': 'ABC-2', 'priority': 'med',  'start': pd.Timestamp(2018, 1, 2), 'end': pd.Timestamp(2018,1,20)},
    {'key': 'ABC-3', 'priority': 'high', 'start': pd.Timestamp(2018, 2, 3), 'end': pd.Timestamp(2018,3,20)},
    {'key': 'ABC-4', 'priority': 'med',  'start': pd.Timestamp(2018, 1, 4), 'end': pd.Timestamp(2018,3,20)},
    {'key': 'ABC-5', 'priority': 'high', 'start': pd.Timestamp(2018, 2, 5), 'end': pd.Timestamp(2018,2,20)},
    {'key': 'ABC-6', 'priority': 'med',  'start': pd.Timestamp(2018, 3, 6), 'end': pd.Timestamp(2018,3,20)}
], columns=['key', 'priority', 'start', 'end'])

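Here, start and end are the date the bug was first found and the date it was closed.

How can we count the number of "open" bugs in each month, broken down by priority? That is, the output should look like this: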

           High   Med
Month
January       1   2
February      3   1
March         2   2

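The challenge is to take both the start and end dates into account. For example, a "high" priority bug opened on January 5 and closed on February 3 should count toward the "high" bugs for January and February, but not March, and so on.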
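To make the overlap rule concrete, here is a minimal sketch for the single bug described above. The variable names are illustrative only, and it assumes that `pd.period_range` with monthly frequency is an acceptable way to list the calendar months an interval touches.

```python
import pandas as pd

# Hypothetical bug from the example: opened January 5, closed February 3.
start = pd.Timestamp(2018, 1, 5)
end = pd.Timestamp(2018, 2, 3)

# With freq='M', period_range yields every calendar month between the
# month containing `start` and the month containing `end`, inclusive.
open_months = pd.period_range(start=start, end=end, freq='M')

month_labels = [str(p) for p in open_months]
print(month_labels)  # ['2018-01', '2018-02']
```

March is absent from the list, so this bug would count toward January and February only.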

python

pandas

datetime

pandas-groupby

4 Answers

Stack Overflow user

Posted 2018-06-03 06:50:15

I use stack together with resample:

from pandas.tseries.offsets import MonthEnd

# Flatten the frame so each bug's start and end become rows of a single Series
s = bugs.set_index(['key', 'priority']).stack()
# Roll every date forward to its month end, since only monthly resolution is needed
# (MonthEnd(0) leaves a date that is already a month end unchanged; MonthEnd(1) would push it into the next month)
s = pd.to_datetime(s) + MonthEnd(0)
# If start and end fall in the same month, keep only one of them
s = s.reset_index().drop_duplicates(['key', 0])

# Per bug, resample monthly and forward-fill to add a row for every month between start and end
# (newer pandas versions spell the month-end alias 'ME' instead of 'M')
s = s.groupby('key').apply(lambda x: x.set_index(0).resample('M').ffill()).reset_index(level=1)

# Count the bugs per month and priority
pd.crosstab(s[0].dt.month, s['priority'])
Out[149]: 
priority  high  med
0                  
1            1    2
2            3    1
3            2    2

Votes 1

Stack Overflow user

Posted 2018-06-03 07:04:45

Simple and short :)

The idea is to select, for each month, the rows whose bug overlaps that month.

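months = ['January', 'February', 'March',
          'April']  # extend this list to cover more months

# One 0/1 column per month: 1 if the bug is open at any point in month i.
# Only month numbers are compared, so this assumes all dates fall within the same year.
bugs[months] = pd.concat([((bugs['start'].dt.month <= i) &
                           (i <= bugs['end'].dt.month)).astype(int)
                          for i in range(1, len(months) + 1)], axis=1)

# Sum the flags per priority
bugs.groupby('priority')[months].sum()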

Result:

          January  February  March  April
priority                                 
high            1         3      2      0
med             2         1      2      0
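The month-number comparison above only holds while every date falls in the same calendar year. As a possible variant (a sketch, not part of the original answer), the same overlap test can be done on monthly periods, which carry the year. The names `start_m`, `end_m`, `flags` and `counts` are illustrative, and the frame is rebuilt here from the question's data so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question.
bugs = pd.DataFrame({
    'key': ['ABC-1', 'ABC-2', 'ABC-3', 'ABC-4', 'ABC-5', 'ABC-6'],
    'priority': ['high', 'med', 'high', 'med', 'high', 'med'],
    'start': pd.to_datetime(['2018-01-01', '2018-01-02', '2018-02-03',
                             '2018-01-04', '2018-02-05', '2018-03-06']),
    'end': pd.to_datetime(['2018-03-20', '2018-01-20', '2018-03-20',
                           '2018-03-20', '2018-02-20', '2018-03-20']),
})

# Convert both dates to monthly periods, which keep the year as well as the month.
start_m = bugs['start'].dt.to_period('M')
end_m = bugs['end'].dt.to_period('M')

# Every month between the earliest start and the latest end.
months = pd.period_range(start_m.min(), end_m.max(), freq='M')

# One 0/1 column per month: 1 where the bug's interval touches that month.
flags = pd.DataFrame(
    {str(m): ((start_m <= m) & (m <= end_m)).astype(int) for m in months},
    index=bugs.index,
)

# Sum the flags per priority; transpose so that months are the rows.
counts = flags.groupby(bugs['priority']).sum().T
print(counts)
```

On the question's data this gives 1, 3, 2 for high and 2, 1, 2 for med across January to March, matching the requested output.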

Votes 1

Stack Overflow user

Posted 2018-06-03 07:08:52

from pandas.tseries.offsets import MonthEnd

# Last day of the previous month
bugs['start1'] = bugs.start + MonthEnd(-1)

# The first days of the months that > start1 and <= end
bugs['months']  = bugs[['start1', 'end']].apply(lambda x: tuple(pd.date_range(x[0], x[1], freq='MS')), axis=1, raw=True)

# Create dummy columns
dummies = bugs.months.apply(lambda x: pd.Series({k:1 for k in x})).fillna(0)
bugs = bugs.join(dummies)

# Aggregation
bugs.groupby('priority')[dummies.columns].sum().T


priority    high  med
2018-01-01   1.0  2.0
2018-02-01   3.0  1.0
2018-03-01   2.0  2.0
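The same "list the months each bug touches" idea can also be sketched with `DataFrame.explode` and `pd.crosstab`, which avoids the dummy columns and yields integer counts. This is an alternative, not the answer's own method. It assumes pandas 0.25 or later (for `explode`), and the `month` column and the `per_month` and `counts` names are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
bugs = pd.DataFrame({
    'key': ['ABC-1', 'ABC-2', 'ABC-3', 'ABC-4', 'ABC-5', 'ABC-6'],
    'priority': ['high', 'med', 'high', 'med', 'high', 'med'],
    'start': pd.to_datetime(['2018-01-01', '2018-01-02', '2018-02-03',
                             '2018-01-04', '2018-02-05', '2018-03-06']),
    'end': pd.to_datetime(['2018-03-20', '2018-01-20', '2018-03-20',
                           '2018-03-20', '2018-02-20', '2018-03-20']),
})

# For each bug, list every monthly period between its start and end.
bugs['month'] = [
    list(pd.period_range(s, e, freq='M'))
    for s, e in zip(bugs['start'], bugs['end'])
]

# Give each (bug, month) pair its own row.
per_month = bugs.explode('month')

# Count rows per month and priority.
counts = pd.crosstab(per_month['month'].astype(str), per_month['priority'])
print(counts)
```

Each bug contributes one row per month it is open, so the crosstab counts are the open-bug totals per month and priority.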

Votes 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/50661541
