I have a set of 50 dates; seven of them are shown here as an example:
import pandas as pd
df4 = pd.DataFrame({"CreatedDate": ['09-08-16 0:00','22-08-16 0:00','23-08-16 0:00','28-08-16 0:00','29-08-16 0:00','30-08-16 0:00','31-08-16 0:00']})
df4["CreatedDate"] = pd.to_datetime(df4.CreatedDate, format="%d-%m-%y %H:%M")  # strings are day-month-year
df4["DAY"] = df4.CreatedDate.dt.day
How can I find the runs of consecutive days and count them in the streak ranges 1-3, 4-7, 8-15 and >=16?
Streak  Count
1-3     3      # (9), (22, 23) are in range [1-3]
4-7     1      # (28, 29, 30, 31) are in range [4-7]
8-15    0
>=16    0
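Suppose a product (a pen) was launched two years ago and we start working with the data set today. What I want to find out is whether people bought the pen on 1, 2 or 3 consecutive days, in which case the streak is counted under 1-3; if they bought it on 4, 5, 6 or 7 consecutive days, it is counted under 4-7, and so on.
I don't know which condition to specify to match these criteria.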
Answered 2018-06-01 09:42:08
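Here is an attempt. The binning is the same as @jezrael's (except for the last bin, which I am not sure should be capped at 31. Is there a way to use an open interval with pd.cut?)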
import pandas as pd
df = pd.DataFrame({ "CreatedDate": ['09-08-16 0:00','22-08-16 0:00','23-08-16 0:00','28-08-16 0:00','29-08-16 0:00','30-08-16 0:00','31-08-16 0:00']})
df["CreatedDate"] = pd.to_datetime(df.CreatedDate, format="%d-%m-%y %H:%M")  # strings are day-month-year
# sort by date
df = df.sort_values("CreatedDate")
# group consecutive dates
oneday = pd.Timedelta("1 day")
df["groups"] = (df["CreatedDate"].diff() > oneday).cumsum()  # a gap of more than one day starts a new group
counts = df.groupby("groups").count()["CreatedDate"]
# bin
streaks = (pd.cut(counts, bins=[0,3,7,15,1000000], labels=['1-3', '4-7','8-15', '>=16'])
           .value_counts()
           .rename_axis("streak")
           .reset_index(name="count"))
print(streaks)
  streak  count
0    1-3      2
1    4-7      1
2   >=16      0
3   8-15      0
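On the open-interval question: one common approach is to pass `np.inf` as the last bin edge, so the final bin has no upper cap. The following is a minimal sketch, not the answerer's code; the `lengths` values are hypothetical streak lengths chosen only to exercise every bin.

```python
import numpy as np
import pandas as pd

# Hypothetical streak lengths (days per run), not taken from the question's data.
lengths = pd.Series([1, 2, 4, 16, 400])

labels = ["1-3", "4-7", "8-15", ">=16"]
# Bins are right-closed by default: (0, 3], (3, 7], (7, 15], (15, inf]
binned = pd.cut(lengths, bins=[0, 3, 7, 15, np.inf], labels=labels)

table = (binned.value_counts()
               .sort_index()             # order rows by bin, not by count
               .rename_axis("streak")
               .reset_index(name="count"))
print(table)
```

Because the result of `pd.cut` is categorical, empty bins still appear with a count of 0, and `sort_index()` keeps the rows in bin order regardless of how ties are broken.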
Answered 2018-06-01 09:23:38
I think you need:
df4 = pd.DataFrame({'CreatedDate':['09-08-16 0:00','22-08-16 0:00','23-08-16 0:00','28-08-16 0:00','29-08-16 0:00','30-08-16 0:00','31-08-16 0:00']})
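df4["CreatedDate"] = pd.to_datetime(df4.CreatedDate, format="%d-%m-%y %H:%M")  # strings are day-month-year
df4 = df4.sort_values("CreatedDate")
# a gap of more than one day starts a new group; size() gives each streak's length
count = df4.groupby((df4["CreatedDate"].diff().dt.days > 1).cumsum()).size()
print (count)
CreatedDate
0    1
1    2
2    4
dtype: int64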
a = (pd.cut(count, bins=[0,3,7,15,31], labels=['1-3', '4-7','8-15', '>=16'])
       .value_counts()
       .sort_index()
       .rename_axis('Streak')
       .reset_index(name='Count'))
print (a)
  Streak  Count
0    1-3      2
1    4-7      1
2   8-15      0
3   >=16      0
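To make the grouping step easier to follow, here is a sketch that spells out the intermediate values behind the `diff()` / `cumsum()` idiom. It assumes the strings are day-month-year, and it also collapses repeated purchases on the same day, which the question does not mention; the column names `gap_days`, `new_streak` and `streak_id` and the duplicated last date are hypothetical, added for illustration.

```python
import pandas as pd

raw = ['09-08-16 0:00', '22-08-16 0:00', '23-08-16 0:00', '28-08-16 0:00',
       '29-08-16 0:00', '30-08-16 0:00', '31-08-16 0:00',
       '31-08-16 0:00']  # last entry repeats a day on purpose (hypothetical)

steps = pd.DataFrame({"CreatedDate": pd.to_datetime(raw, format="%d-%m-%y %H:%M")})
# Keep one row per calendar day, in chronological order.
steps["CreatedDate"] = steps["CreatedDate"].dt.normalize()
steps = (steps.drop_duplicates("CreatedDate")
              .sort_values("CreatedDate")
              .reset_index(drop=True))

steps["gap_days"] = steps["CreatedDate"].diff().dt.days   # NaN for the first row
steps["new_streak"] = steps["gap_days"] > 1               # True where a run breaks
steps["streak_id"] = steps["new_streak"].cumsum()         # running streak number

lengths = steps.groupby("streak_id").size()               # days per streak
print(steps)
print(lengths.tolist())
```

Each `True` in `new_streak` bumps the cumulative sum by one, so all days in the same run share a `streak_id`; `lengths` is then what gets passed to `pd.cut`.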
https://stackoverflow.com/questions/50637603