我有以下数据框架(包含更多列)
ID CAT ACTION D1
1 PRE opened 111
1 PRE phone 111
1 PRE opened 111
1 PRE phone 111
2 POS opened 222
2 POS navi 222
1 PRE opened 111
1 PRE phone 111
3 POS opened 111
3 POS navi 222
3 POS phone 222
3 POS navi 222
2 POS opened 222
2 POS phone 222
5 PRE opened 111
6 PRE opened 111
6 PRE opened 111
6 PRE phone 111
6 PRE navi 111
这个表有活动的注册表,我想为每个打开的操作标记操作,如下所示
ID CAT ACTION D1 flg
1 PRE opened 111 1
1 PRE phone 111 1
1 PRE opened 111 2
1 PRE phone 111 2
2 POS opened 222 3
2 POS navi 222 3
1 PRE opened 111 4
1 PRE phone 111 4
3 POS opened 111 5
3 POS navi 222 5
3 POS phone 222 5
3 POS navi 222 5
2 POS opened 222 6
2 POS phone 222 6
5 PRE opened 111 7
6 PRE opened 111 8
6 PRE opened 111 9
6 PRE phone 111 9
6 PRE navi 111 9
我使用这个代码,但它需要很长时间才能完成
i=0
for y in range(1, len(df)):
if(df.loc[y,'ACTION']=='opened'):
i+=1
df.loc[y,'new_column']=i
else:
df.loc[y,'new_column']=i
我的最终目标是拥有类似这样的东西,我假装使用pivot
flg ID CAT D1 Int ACTION
1 1 PRE 111 phone opened-phone
2 1 PRE 111 phone opened-phone
3 2 POS 222 navi opened-navi
4 1 PRE 111 phone opened-phone
5 3 POS 222 phone opened-navi-phone-navi
6 2 POS 222 phone opened-phone
7 5 PRE 111 opened opened
8 6 PRE 111 opened opened
9 6 PRE 111 phone opened-phone-navi
发布于 2018-06-13 06:45:57
您可以将cumsum
与Boolean一起使用
df['flag']=df.ACTION.eq('opened').cumsum()
df1=df.groupby(['flag','ID','CAT']).agg({'ACTION':['last',lambda x : x.str.cat(sep='-')],'D1':'last'})# you can rename it by df1.columns=['Your columns name']
df1
Out[358]:
ACTION D1
last <lambda> last
flag ID CAT
1 1 PRE phone opened-phone 111
2 1 PRE phone opened-phone 111
3 2 POS navi opened-navi 222
4 1 PRE phone opened-phone 111
5 3 POS navi opened-navi-phone-navi 222
6 2 POS phone opened-phone 222
7 5 PRE opened opened 111
8 6 PRE opened opened 111
9 6 PRE navi opened-phone-navi 111
更多信息
df.ACTION.eq('opened').cumsum()
Out[352]:
0 1
1 1
2 2
3 2
4 3
5 3
6 4
7 4
8 5
9 5
10 5
11 5
12 6
13 6
14 7
15 8
16 9
17 9
18 9
Name: ACTION, dtype: int32
https://stackoverflow.com/questions/50826832
复制相似问题