I have a DataFrame. I want to create a unique ID number for each person, and a column based on person and date (weekly).
import pandas as pd
df = pd.DataFrame({ 'name':['one','one','two','two','two','three','four'],
'date':['2019-05-01','2019-05-08','2019-05-01','2019-05-08','2019-05-15','2019-05-01','2019-05-15'],
"a":range(7)})
df['date'] = pd.to_datetime(df['date'],yearfirst=True)
df = df.sort_values(['name','date'])
print(df)
Here is the data:
name date a
6 four 2019-05-15 6
0 one 2019-05-01 0
1 one 2019-05-08 1
5 three 2019-05-01 5
2 two 2019-05-01 2
3 two 2019-05-08 3
4 two 2019-05-15 4
The expected result is:
name date a id week
6 four 2019-05-15 6 1 3
0 one 2019-05-01 0 2 1
1 one 2019-05-08 1 2 2
5 three 2019-05-01 5 3 1
2 two 2019-05-01 2 4 1
3 two 2019-05-08 3 4 2
4 two 2019-05-15 4 4 3
How can I get "id" and "week"? Thanks!
Posted on 2019-05-24 13:41:14
As @cs95 commented, use GroupBy.ngroup for the id, and for the week divide the day of the month by 7 and round up with numpy.ceil:
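import numpy as np

# Number each distinct name (groups are sorted alphabetically), starting at 1
df["Id"] = df.groupby("name").ngroup() + 1
# Week of the month: days 1-7 -> 1, 8-14 -> 2, 15-21 -> 3, ...
df['week'] = np.ceil(df.date.dt.day / 7).astype(int)
print(df)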
name date a Id week
6 four 2019-05-15 6 1 3
0 one 2019-05-01 0 2 1
1 one 2019-05-08 1 2 2
5 three 2019-05-01 5 3 1
2 two 2019-05-01 2 4 1
3 two 2019-05-08 3 4 2
4 two 2019-05-15 4 4 3
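Note that `np.ceil(day / 7)` yields the week *of the month*, so it reproduces the expected output here because all the question's dates fall in May 2019 and start on the 1st. A minimal sketch on hypothetical dates (not from the question) shows the numbering restarting in a new month, plus an integer-only equivalent that avoids numpy:

```python
import numpy as np
import pandas as pd

# Hypothetical dates for illustration; the last one falls in June.
dates = pd.to_datetime(pd.Series(['2019-05-01', '2019-05-07', '2019-05-08',
                                  '2019-05-31', '2019-06-01']))

# Week of month via ceil: days 1-7 -> 1, 8-14 -> 2, ..., 29-31 -> 5
week_of_month = np.ceil(dates.dt.day / 7).astype(int)

# Same numbering with integer floor division only
week_int = (dates.dt.day - 1) // 7 + 1

print(week_of_month.tolist())  # [1, 1, 2, 5, 1]
```

If the week should keep counting across month boundaries, this formula is not the right fit.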
Or, number the distinct dates with ngroup as well:
df["Id"] = df.groupby("name").ngroup() + 1
df['week'] = df.groupby("date").ngroup() + 1
print (df)
name date a Id week
6 four 2019-05-15 6 1 3
0 one 2019-05-01 0 2 1
1 one 2019-05-08 1 2 2
5 three 2019-05-01 5 3 1
2 two 2019-05-01 2 4 1
3 two 2019-05-08 3 4 2
4 two 2019-05-15 4 4 3
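`groupby("date").ngroup()` numbers the *distinct dates present* in sorted order (a dense rank), so it equals the week number only if every week in the range has at least one row. A minimal sketch on hypothetical data with a missing week compares it with counting whole weeks elapsed since the earliest date (an alternative not from the original answers):

```python
import pandas as pd

# Hypothetical data: there are no rows for the week of 2019-05-15.
df = pd.DataFrame({
    'name': ['one', 'one', 'two'],
    'date': pd.to_datetime(['2019-05-01', '2019-05-08', '2019-05-22']),
})

# Dense rank of distinct dates: the empty week is skipped
df['week_ngroup'] = df.groupby('date').ngroup() + 1

# Whole weeks elapsed since the earliest date in the frame, starting at 1
df['week_elapsed'] = (df['date'] - df['date'].min()).dt.days // 7 + 1

# week_ngroup  -> [1, 2, 3]
# week_elapsed -> [1, 2, 4]
print(df[['date', 'week_ngroup', 'week_elapsed']])
```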
Posted on 2019-05-24 13:45:58
I use cumsum to get df['id'] (this relies on the frame being sorted by name, as in the question), and groupby on df.date to get df['week']:
df['id'] = df.name.ne(df.name.shift()).cumsum()
df['week'] = df.date.groupby(df.date).ngroup() + 1
Out[408]:
name date a id week
6 four 2019-05-15 6 1 3
0 one 2019-05-01 0 2 1
1 one 2019-05-08 1 2 2
5 three 2019-05-01 5 3 1
2 two 2019-05-01 2 4 1
3 two 2019-05-08 3 4 2
4 two 2019-05-15 4 4 3
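The `ne(shift()).cumsum()` trick starts a new id every time the name differs from the previous row, so a person whose rows are not contiguous gets more than one id. A minimal sketch on a hypothetical unsorted Series compares it with `groupby(...).ngroup()` (ids in alphabetical order) and `pd.factorize` (ids in order of first appearance):

```python
import pandas as pd

# Hypothetical unsorted names: 'one' reappears after 'two'.
names = pd.Series(['one', 'two', 'one', 'three'])

# Change-point cumsum: every row here differs from the previous one
ids_cumsum = names.ne(names.shift()).cumsum()

# One id per distinct name, numbered in sorted (alphabetical) order
ids_ngroup = names.groupby(names).ngroup() + 1

# One id per distinct name, numbered in order of first appearance
codes, uniques = pd.factorize(names)
ids_factorize = codes + 1

print(ids_cumsum.tolist())     # [1, 2, 3, 4]
print(ids_ngroup.tolist())     # [1, 3, 1, 2]
print(ids_factorize.tolist())  # [1, 2, 1, 3]
```

On the sorted frame from the question all three give the same ids; they only diverge when rows for the same person are not adjacent.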
https://stackoverflow.com/questions/56286315