
Q: pandas groupby on column values with conditions

Stack Overflow user

Asked 2020-09-28 02:11:01

1 answer · 43 views · 0 followers · 3 votes

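I have a DataFrame and need to group it by the "sev" column using the groups 1&2, 3, and 4&5, then find the count for each group.

Is there a way to do this?

I tried the following, but it gives a count for each individual value in the sev column: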

df.groupby(['sev']).ids.agg('count').to_frame('count').reset_index()

pandas DataFrame:

df = pd.DataFrame({'ids': {0: 'D1791272223',  1: 'V25369085223',  2: 'V25117230523',  3: 'V25104327323',  4: 'V24862169823',  5: 'P3944221523',  6: 'V24776335823',  7: 'V24722584123',  8: 'V24716191923',  9: 'V24575876123',  10: 'V24791923'}, 'status': {0: 'Resolved',  1: 'Resolved',  2: 'Resolved',  3: 'Resolved',  4: 'Open',  5: 'Open',  6: 'Closed',  7: 'Resolved',  8: 'Resolved',  9: 'Open',  10: 'Resolved'}, 'action': {0: 'Comment',  1: 'Implementation',  2: 'Comment',  3: 'Implementation',  4: 'Comment',  5: 'Implementation',  6: 'Comment',  7: 'Comment',  8: 'Implementation',  9: 'Comment',  10: 'Implementation'}, 'sev': {0: 3, 1: 2, 2: 1, 3: 3, 4: 4, 5: 4, 6: 3, 7: 2, 8: 2, 9: 1, 10: 5}})

| ids          | status   | action         | sev |
|--------------|----------|----------------|-----|
| D1791272223  | Resolved | Comment        | 3   |
| V25369085223 | Resolved | Implementation | 2   |
| V25117230523 | Resolved | Comment        | 1   |
| V25104327323 | Resolved | Implementation | 3   |
| V24862169823 | Open     | Comment        | 4   |
| P3944221523  | Open     | Implementation | 4   |
| V24776335823 | Closed   | Comment        | 3   |
| V24722584123 | Resolved | Comment        | 2   |
| V24716191923 | Resolved | Implementation | 2   |
| V24575876123 | Open     | Comment        | 1   |
| V24791923    | Resolved | Implementation | 5   |

Expected output

| sev    | count | Open count | Closed and Resolved count   |
|--------|-------|------------|-----------------------------|
| 1&2    | 5     | 1          | 4                           |
| 3      | 3     | 0          | 3                           |
| 4&5    | 3     | 2          | 1                           |

python

pandas

1 Answer

Stack Overflow user

Accepted answer

Posted 2020-09-28 02:44:59

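The main issue is that you need to aggregate the severity levels into fewer categories. This can be done with pd.cut, because sev is numeric and you want to bin by contiguous intervals. If it were not numeric, or the intervals were not contiguous (e.g. 1&4, 2, 3&5), you would need df.replace with a mapping dictionary.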
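For the non-contiguous case mentioned above, a minimal sketch might look like the following. The grouping 1&4, 2, 3&5 and the names `sev_to_group` and `toy` are made up for illustration and are not part of the question. `Series.map` is swapped in for `df.replace` here because it writes to a new column and turns any unmapped value into NaN, which makes gaps in the dictionary easy to spot.

```python
import pandas as pd

# Hypothetical mapping for non-contiguous groups (illustration only).
sev_to_group = {1: '1&4', 4: '1&4', 2: '2', 3: '3&5', 5: '3&5'}

# Same sev values as in the question, without the other columns.
toy = pd.DataFrame({'sev': [3, 2, 1, 3, 4, 4, 3, 2, 2, 1, 5]})

# Look each sev value up in the dictionary; unmapped values would become NaN.
toy['sev_group'] = toy['sev'].map(sev_to_group)

# Count rows per group label.
group_counts = toy['sev_group'].value_counts().sort_index()
print(group_counts)
```

The resulting `sev_group` column can then be used as a groupby key in the same way as the one produced by pd.cut.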

The reshaping can then be done with df.pivot_table, or "manually" with groupby/unstack. I prefer groupby because it is more flexible in other situations.

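# Bin sev into the right-closed intervals (0, 2], (2, 3], (3, 5] and label them.
df['sev_group'] = pd.cut(df['sev'], bins=[0, 2, 3, 5],
                         labels=['1&2', '3', '4&5'])

# Count rows per (sev_group, status) pair, then spread status into columns.
# fill_value=0 keeps absent pairs (e.g. sev 3 with status Open) as 0, not NaN.
summary = (df.groupby(['sev_group', 'status'], observed=True)
             .size()
             .unstack(fill_value=0))
# or
# summary = df.pivot_table(values='ids', index='sev_group',
#                          columns='status', aggfunc='count', fill_value=0)

# Total per group, then merge Closed and Resolved into a single column.
summary['count'] = summary.sum(axis=1)
summary['Closed/Resolved'] = summary['Closed'] + summary['Resolved']
summary = summary[['count', 'Open', 'Closed/Resolved']]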

Output

status     count  Open  Closed/Resolved
sev_group
1&2            5     1                4
3              3     0                3
4&5            3     2                1
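As a cross-check on the table above, here is a self-contained sketch of a variant that merges Closed and Resolved before counting. It uses `pd.crosstab` rather than the answer's groupby/unstack, and the column name `status_group` is an assumption introduced for this example. Only the `status` and `sev` columns from the question are reproduced.

```python
import pandas as pd

# status and sev copied from the question's DataFrame; ids and action omitted.
df = pd.DataFrame({
    'status': ['Resolved', 'Resolved', 'Resolved', 'Resolved', 'Open', 'Open',
               'Closed', 'Resolved', 'Resolved', 'Open', 'Resolved'],
    'sev': [3, 2, 1, 3, 4, 4, 3, 2, 2, 1, 5],
})

# Same binning as in the answer.
df['sev_group'] = pd.cut(df['sev'], bins=[0, 2, 3, 5],
                         labels=['1&2', '3', '4&5'])

# Collapse Closed and Resolved into one label before counting
# (status_group is a hypothetical helper column).
df['status_group'] = df['status'].replace(
    {'Closed': 'Closed/Resolved', 'Resolved': 'Closed/Resolved'})

# Frequency table of sev_group against status_group; absent pairs are 0.
summary = pd.crosstab(df['sev_group'], df['status_group'])

# Add the per-group total and put the columns in the requested order.
summary.insert(0, 'count', summary.sum(axis=1))
summary = summary[['count', 'Open', 'Closed/Resolved']]
print(summary)
```

Merging the status labels first avoids adding the Closed and Resolved columns by hand, which would raise a KeyError if one of those statuses happened to be absent from the data.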

0 votes

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/64091744
