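I have a pandas DataFrame in which one column contains lists of different lengths. The solutions I have found for exploding lists in pandas all assume that the lists to be exploded have the same length.
This is my df: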
Dep Exp Fl-No Shared Codes
0 20:58 20:55 LX 736 [No shared codes]
1 21:23 20:55 LX 818 [Dummy, LH 5809]
2 21:27 21:00 JU 375 [No shared codes]
4 21:28 21:00 LX 770 [Dummy, SN 5102]
7 21:31 21:10 LX 1842 [Dummy, LH 5880, TP 8184, A3 1985]
This is what I am looking for:
Dep Exp Fl-No Shared Codes
0 20:58 20:55 LX 736 No shared codes
1 21:23 20:55 LX 818 Dummy
1 21:23 20:55 LX 818 LH 5809
2 21:27 21:00 JU 375 No shared codes
4 21:28 21:00 LX 770 Dummy
4 21:28 21:00 LX 770 SN 5102
7 21:31 21:10 LX 1842 Dummy
7 21:31 21:10 LX 1842 LH 5880
7 21:31 21:10 LX 1842 TP 8184
7 21:31 21:10 LX 1842 A3 1985
Does anyone have a suggestion?
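For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame shown above. It assumes the time and flight-number columns are plain strings and that each `Shared Codes` cell is a Python list of strings; the original dtypes are not stated.

```python
import pandas as pd

# Rebuild the question's frame; the index labels 0, 1, 2, 4, 7 are kept as shown.
# Assumption: all cells are strings, and "Shared Codes" holds Python lists.
df = pd.DataFrame(
    {
        "Dep": ["20:58", "21:23", "21:27", "21:28", "21:31"],
        "Exp": ["20:55", "20:55", "21:00", "21:00", "21:10"],
        "Fl-No": ["LX 736", "LX 818", "JU 375", "LX 770", "LX 1842"],
        "Shared Codes": [
            ["No shared codes"],
            ["Dummy", "LH 5809"],
            ["No shared codes"],
            ["Dummy", "SN 5102"],
            ["Dummy", "LH 5880", "TP 8184", "A3 1985"],
        ],
    },
    index=[0, 1, 2, 4, 7],
)

# Number of list elements per row; these differ, which is the crux of the question.
lengths = df["Shared Codes"].str.len()
print(lengths.tolist())  # [1, 2, 1, 2, 4]
```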
Answered on 2017-08-26 00:19:59
Very similar to @coldspeed's answer. I just took a few different steps.
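import numpy as np
s = df['Shared Codes']
# Repeat each row position once per element in that row's list
i = np.arange(len(df)).repeat(s.str.len())
# Take the repeated rows (all columns but the last) and attach the flattened list values
df.iloc[i, :-1].assign(**{'Shared Codes': np.concatenate(s.values)})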
Dep Exp Fl-No Shared Codes
0 20:58 20:55 LX 736 No shared codes
1 21:23 20:55 LX 818 Dummy
1 21:23 20:55 LX 818 LH 5809
2 21:27 21:00 JU 375 No shared codes
4 21:28 21:00 LX 770 Dummy
4 21:28 21:00 LX 770 SN 5102
7 21:31 21:10 LX 1842 Dummy
7 21:31 21:10 LX 1842 LH 5880
7 21:31 21:10 LX 1842 TP 8184
7 21:31 21:10 LX 1842 A3 1985
Answered on 2017-08-25 23:59:36
One possibility is to use np.repeat and np.hstack.
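As a quick illustration of those two building blocks, here is a small sketch on toy arrays. The names `rows`, `counts`, and `values` are hypothetical and not taken from the question's data.

```python
import numpy as np

# Hypothetical toy data: two "rows" and how many times each should be repeated.
rows = np.array([["a", "x"], ["b", "y"]])
counts = [1, 3]

# np.repeat with axis=0 duplicates whole rows according to counts.
repeated = np.repeat(rows, counts, axis=0)

# One flattened value per repeated row, shaped as a single column.
values = np.array(["v0", "v1", "v2", "v3"]).reshape(-1, 1)

# np.hstack glues the repeated rows and the new column side by side.
combined = np.hstack((repeated, values))
print(combined.shape)        # (4, 3)
print(combined[1].tolist())  # ['b', 'y', 'v1']
```

The same pattern applied to the real frame repeats the scalar columns by each list's length and appends the flattened list values as the last column.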
print(df)
Dep Exp Fl-No Shared Codes
0 20:58 20:55 LX 736 [No shared codes]
1 21:23 20:55 LX 818 [Dummy, LH 5809]
2 21:27 21:00 JU 375 [No shared codes]
4 21:28 21:00 LX 770 [Dummy, SN 5102]
7 21:31 21:10 LX 1842 [Dummy, LH 5880, TP 8184, A3 1985]
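import numpy as np
import pandas as pd
# Repeat the non-list columns once per list element, then flatten the lists into one column
x = df.iloc[:, :-1].values.repeat(df['Shared Codes'].apply(len), 0)
y = df['Shared Codes'].apply(pd.Series).stack().values.reshape(-1, 1)
out = pd.DataFrame(np.hstack((x, y)), columns=df.columns)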
print(out)
Dep Exp Fl-No Shared Codes
0 20:58 20:55 LX 736 No shared codes
1 21:23 20:55 LX 818 Dummy
2 21:23 20:55 LX 818 LH 5809
3 21:27 21:00 JU 375 No shared codes
4 21:28 21:00 LX 770 Dummy
5 21:28 21:00 LX 770 SN 5102
6 21:31 21:10 LX 1842 Dummy
7 21:31 21:10 LX 1842 LH 5880
8 21:31 21:10 LX 1842 TP 8184
9 21:31 21:10 LX 1842 A3 1985
Answered on 2017-08-26 00:54:20
OK, I'll post this one more time. For more information and other brilliant solutions, see link1 and link2.
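# Index by the scalar columns, expand each list into columns, stack back into rows, then name the value column
df.set_index(['Dep','Exp','Fl-No'])['Shared Codes'].apply(pd.Series).stack().reset_index().drop(columns='level_3').rename(columns={0: 'Shared Codes'})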
Dep Exp Fl-No Shared Codes
0 20:58 20:55 LX 736 No shared codes
1 21:23 20:55 LX 818 Dummy
2 21:23 20:55 LX 818 LH 5809
3 21:27 21:00 JU 375 No shared codes
4 21:28 21:00 LX 770 Dummy
5 21:28 21:00 LX 770 SN 5102
6 21:31 21:10 LX 1842 Dummy
7 21:31 21:10 LX 1842 LH 5880
8 21:31 21:10 LX 1842 TP 8184
9 21:31 21:10 LX 1842 A3 1985
You can also use pd.wide_to_long, though personally I would not recommend it; it is overkill here.
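# Expand the lists into columns named sur0, sur1, ... so wide_to_long can match the 'sur' stub
df1 = df['Shared Codes'].apply(pd.Series)
df1.columns = 'sur' + df1.columns.astype(str)
df = pd.concat([df, df1], axis=1)
# Reshape to long form, drop the helper columns and the NaN padding left by shorter lists
pd.wide_to_long(df, ['sur'], ['Dep','Exp','Fl-No'], 'lol').reset_index().drop(['lol','Shared Codes'], axis=1).dropna().rename(columns={'sur': 'Shared Codes'})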
Dep Exp Fl-No Shared Codes
0 20:58 20:55 LX 736 No shared codes
1 21:23 20:55 LX 818 Dummy
2 21:23 20:55 LX 818 LH 5809
3 21:27 21:00 JU 375 No shared codes
4 21:28 21:00 LX 770 Dummy
5 21:28 21:00 LX 770 SN 5102
6 21:31 21:10 LX 1842 Dummy
7 21:31 21:10 LX 1842 LH 5880
8 21:31 21:10 LX 1842 TP 8184
9 21:31 21:10 LX 1842 A3 1985
https://stackoverflow.com/questions/45885143
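On newer pandas (0.25 or later, as far as I know), `DataFrame.explode` handles lists of different lengths directly, so the workarounds above may not be needed. A minimal sketch on a hypothetical two-row subset of the question's data:

```python
import pandas as pd

# Hypothetical two-row subset of the question's data, kept short for brevity.
df = pd.DataFrame(
    {
        "Fl-No": ["LX 736", "LX 818"],
        "Shared Codes": [["No shared codes"], ["Dummy", "LH 5809"]],
    }
)

# explode emits one row per list element and repeats the original index labels.
out = df.explode("Shared Codes")
print(out["Shared Codes"].tolist())  # ['No shared codes', 'Dummy', 'LH 5809']
print(out.index.tolist())            # [0, 1, 1]
```

Chaining `.reset_index(drop=True)` gives a fresh 0..n-1 index if the repeated labels are not wanted.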