首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >迭代两只熊猫之间的日期范围以进行类别计数

迭代两只熊猫之间的日期范围以进行类别计数
EN

Stack Overflow用户
提问于 2015-12-23 10:45:30
回答 1查看 373关注 0票数 1

我有两只熊猫(df1和df2):

df1有12列,其中a1、a2、.、a9是空列。下面是df1的示例:

代码语言:javascript
运行
复制
Stock Start_Date          End_Date        a1 a2 a3 a4 .... a9
A   09-12-2015 20:04    10-12-2015 23:04                
B   09-12-2015 10:04    09-12-2015 20:14                
A   11-12-2015 00:22    11-12-2015 08:04                
C   08-12-2015 06:56    10-12-2015 20:54                

df2有4列。以下是一个样本:

代码语言:javascript
运行
复制
Stock   date_time     Opening   closing
A   09-12-2015 21:24    144.3   10
A   09-12-2015 21:27    225.51  24
B   09-12-2015 10:20    134.42  11
A   09-12-2015 20:04    231.22  17
B   09-12-2015 10:24    399.55  32
A   09-12-2015 20:04    246.77  21
B   09-12-2015 14:22    76.23   8
C   08-12-2015 09:44    232.22  15
C   09-12-2015 20:04    222.91  12
A   11-12-2015 02:06    93.21   7
B   09-12-2015 20:04    211.36  26
C   09-12-2015 20:04    111.21  8

现在,我希望输出是这样的,df1:

代码语言:javascript
运行
复制
Stock   Start_Date       End_Date          a1   a2  a3  a4 ....a9
A   09-12-2015 20:04    10-12-2015 23:04    0   2   2   0      0
B   09-12-2015 10:04    09-12-2015 20:14    1   1   2   0      0
A   11-12-2015 00:22    11-12-2015 08:04    1   0   0   0      0
C   08-12-2015 06:56    10-12-2015 20:54    0   0   0   1      0

也就是说,对于每个股票,Start_Date和End_Date组合的df1,结果应该有每个类别的日期-时间范围内从df2计数。

在最终输出中,a1 =计数(0-100)和闭锁(0-10),a2 =计数(101-200)和闭锁(11-20),a3 =计数(201-400)和闭锁(21-50),a4 =计数(0-100)和闭锁(11-20)等等。

我有这方面的R代码,但对于一个更大的数据,set.Can,任何人都会帮助我如何在python/大熊猫中做到这一点。任何帮助都将不胜感激!

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2015-12-23 12:26:41

您可以尝试此解决方案,其中删除df1的空列,但它也适用于它们:

代码语言:javascript
运行
复制
#merge dataframes by Stock, select datetimes between start and end
df = df1.merge(df2,on='Stock', how='left')
df = df[(df.date_time >= df.Start_Date) & (df.date_time <= df.End_Date)]
#remove column date_time
df = df.drop(['date_time'], axis=1)
print df
#   Stock          Start_Date            End_Date  Opening  closing
#0      A 2015-09-12 20:04:00 2015-10-12 23:04:00   144.30       10
#1      A 2015-09-12 20:04:00 2015-10-12 23:04:00   225.51       24
#2      A 2015-09-12 20:04:00 2015-10-12 23:04:00   231.22       17
#3      A 2015-09-12 20:04:00 2015-10-12 23:04:00   246.77       21
#5      B 2015-09-12 10:04:00 2015-09-12 20:14:00   134.42       11
#6      B 2015-09-12 10:04:00 2015-09-12 20:14:00   399.55       32
#7      B 2015-09-12 10:04:00 2015-09-12 20:14:00    76.23        8
#8      B 2015-09-12 10:04:00 2015-09-12 20:14:00   211.36       26
#13     A 2015-11-12 00:22:00 2015-11-12 08:04:00    93.21        7
#14     C 2015-08-12 06:56:00 2015-10-12 20:54:00   232.22       15
#15     C 2015-08-12 06:56:00 2015-10-12 20:54:00   222.91       12
#16     C 2015-08-12 06:56:00 2015-10-12 20:54:00   111.21        8

#values to new columns by conditions - cast boolean to integers
df['a1'] = ((df.Opening.between(0,100)) & (df.closing.between(0,10))).astype(int)
df['a2'] = ((df.Opening.between(100,200)) & (df.closing.between(11,20))).astype(int)
#add other columns like a1 and a2
print df
#   Stock          Start_Date            End_Date  Opening  closing  a1  a2
#0      A 2015-09-12 20:04:00 2015-10-12 23:04:00   144.30       10   0   0
#1      A 2015-09-12 20:04:00 2015-10-12 23:04:00   225.51       24   0   0
#2      A 2015-09-12 20:04:00 2015-10-12 23:04:00   231.22       17   0   0
#3      A 2015-09-12 20:04:00 2015-10-12 23:04:00   246.77       21   0   0
#5      B 2015-09-12 10:04:00 2015-09-12 20:14:00   134.42       11   0   1
#6      B 2015-09-12 10:04:00 2015-09-12 20:14:00   399.55       32   0   0
#7      B 2015-09-12 10:04:00 2015-09-12 20:14:00    76.23        8   1   0
#8      B 2015-09-12 10:04:00 2015-09-12 20:14:00   211.36       26   0   0
#13     A 2015-11-12 00:22:00 2015-11-12 08:04:00    93.21        7   1   0
#14     C 2015-08-12 06:56:00 2015-10-12 20:54:00   232.22       15   0   0
#15     C 2015-08-12 06:56:00 2015-10-12 20:54:00   222.91       12   0   0
#16     C 2015-08-12 06:56:00 2015-10-12 20:54:00   111.21        8   0   0

#groupby and sum rows
df= df.groupby(['Stock', 'Start_Date', 'End_Date']).sum()
df = df.drop(['Opening', 'closing'], axis=1)
print df.reset_index()
#  Stock          Start_Date            End_Date  a1  a2
#0     A 2015-09-12 20:04:00 2015-10-12 23:04:00   0   0
#1     A 2015-11-12 00:22:00 2015-11-12 08:04:00   1   0
#2     B 2015-09-12 10:04:00 2015-09-12 20:14:00   1   1
#3     C 2015-08-12 06:56:00 2015-10-12 20:54:00   0   0
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/34433886

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档