I have a DataFrame like this:
IndividualID Trip1 Trip2 Trip3 Trip4 Trip5 Trip6 Trip7 Trip8 Trip9
200100001 23 1 2 4 4 1 5 5 5
200100002 21 1 12 3 1 55 7 7
200100003 12 3 3 6 3
200100004 4
200100005 6 5 3 9 3 5 6
200100005 23 4 4 2 4 3 6 5
I want to know how many trips each individual made, so I would like to add a new column. The resulting table might look like this:
IndividualID Trip1 Trip2 Trip3 Trip4 Trip5 Trip6 Trip7 Trip8 Trip9 Chains
200100001 23 1 2 4 4 1 5 5 5 9
200100002 21 1 12 3 1 55 7 7 8
200100003 12 3 3 6 3 5
200100004 4 1
200100005 6 5 3 9 3 5 6 7
200100005 23 4 4 2 4 3 6 5 8
Is there a way to do this? I would really appreciate any help. Thanks in advance!
Posted on 2018-08-21 08:52:29
Replace all empty values with NaN, then use sum(1) to count the notnull values in each row:
df['Chains'] = df.iloc[:,1:].replace('',np.nan).notnull().sum(1)
>>> df
IndividualID Trip1 Trip2 Trip3 Trip4 Trip5 Trip6 Trip7 Trip8 \
0 200100001 23 1.0 2.0 4.0 4.0 1.0 5.0 5.0
1 200100002 21 1.0 12.0 3.0 1.0 55.0 7.0 7.0
2 200100003 12 3.0 3.0 6.0 3.0 NaN NaN NaN
3 200100004 4 NaN NaN NaN NaN NaN NaN NaN
4 200100005 6 5.0 3.0 9.0 3.0 5.0 6.0 NaN
5 200100005 23 4.0 4.0 2.0 4.0 3.0 6.0 5.0
Trip9 Chains
0 5.0 9
1 NaN 8
2 NaN 5
3 NaN 1
4 NaN 7
5 NaN 8
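The one-liner above can be tried end to end with the sketch below. It assumes the blank cells in the question are stored as empty strings, which the question does not confirm; the `rows` literal is a reconstruction of the sample table, not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
# Assumption: blank cells are empty strings rather than NaN.
rows = [
    [200100001, 23, 1, 2, 4, 4, 1, 5, 5, 5],
    [200100002, 21, 1, 12, 3, 1, 55, 7, 7, ""],
    [200100003, 12, 3, 3, 6, 3, "", "", "", ""],
    [200100004, 4, "", "", "", "", "", "", "", ""],
    [200100005, 6, 5, 3, 9, 3, 5, 6, "", ""],
    [200100005, 23, 4, 4, 2, 4, 3, 6, 5, ""],
]
columns = ["IndividualID"] + [f"Trip{i}" for i in range(1, 10)]
df = pd.DataFrame(rows, columns=columns)

# Look only at the trip columns, turn empty strings into NaN,
# then count the non-null cells in each row.
trips = df.iloc[:, 1:].replace("", np.nan)
df["Chains"] = trips.notnull().sum(axis=1)

print(df[["IndividualID", "Chains"]])
```

Because `iloc[:, 1:]` skips the IndividualID column, no correction is needed after summing.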
Posted on 2018-08-21 08:57:53
Use
df.ne('').sum(1)-1
Out[287]:
0 9
1 8
2 5
3 1
4 7
5 8
dtype: int64
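A minimal sketch of why this works and where it may not: `ne('')` marks every cell that is not the empty string, including the always-filled IndividualID column, which is why 1 is subtracted. The small frames below are hypothetical and exist only to illustrate the behaviour.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame; blanks are stored as empty strings.
df = pd.DataFrame(
    {
        "IndividualID": [1, 2, 3],
        "Trip1": [23, 12, 4],
        "Trip2": [1, 3, ""],
        "Trip3": [2, "", ""],
    }
)

mask = df.ne("")                # True wherever a cell is not the empty string
chains = mask.sum(axis=1) - 1   # subtract 1 for the IndividualID column

# Caveat: NaN is also "not equal to ''", so NaN blanks would be counted as trips.
nan_frame = pd.DataFrame({"Trip1": [1.0, np.nan]})
nan_counts = nan_frame.ne("").sum(axis=1)
```

In other words, this approach appears to fit only data where missing trips are empty strings.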
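If the blanks are NaN instead, transpose the trip columns and use info, which reports the non-null count for each individual: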
df.iloc[:,1:].T.info()
<class 'pandas.core.frame.DataFrame'>
Index: 9 entries, Trip1 to Trip9
Data columns (total 6 columns):
0 9 non-null float64
1 8 non-null float64
2 5 non-null float64
3 1 non-null float64
4 7 non-null float64
5 8 non-null float64
dtypes: float64(6)
memory usage: 504.0+ bytes
Posted on 2018-08-21 08:51:56
Just find the non-null entries and sum across each row:
df['Chains'] = df.notnull().sum(axis=1) - 1
I had to subtract one to account for your IndividualID column. This is the result I got:
IndividualID Trip1 Trip2 Trip3 Trip4 Trip5 Trip6 Trip7 Trip8 Trip9 Chains
0 200100001 23 1.0 2.0 4.0 4.0 1.0 5.0 5.0 5.0 9
1 200100002 21 1.0 12.0 3.0 1.0 55.0 7.0 7.0 NaN 8
2 200100003 12 3.0 3.0 6.0 3.0 NaN NaN NaN NaN 5
3 200100004 4 NaN NaN NaN NaN NaN NaN NaN NaN 1
4 200100005 6 5.0 3.0 9.0 3.0 5.0 6.0 NaN NaN 7
5 200100005 23 4.0 4.0 2.0 4.0 3.0 6.0 5.0 NaN 8
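As a possible variant that avoids the manual subtraction, the trip columns can be selected by name before counting. This is a sketch, not the answerer's method: it assumes the blanks are already NaN and that every trip column name contains "Trip"; the two-row frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with NaN blanks (an assumption about the real data).
df = pd.DataFrame(
    {
        "IndividualID": [200100003, 200100004],
        "Trip1": [12, 4],
        "Trip2": [3.0, np.nan],
        "Trip3": [3.0, np.nan],
    }
)

# The approach from the answer: count every non-null cell, minus the ID column.
via_subtraction = df.notnull().sum(axis=1) - 1

# Variant: keep only columns whose name contains "Trip", then count non-NA cells per row.
trip_cols = df.filter(like="Trip")
df["Chains"] = trip_cols.count(axis=1)
```

Both give the same counts here; the name-based selection keeps working if more non-trip columns are added later.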
https://stackoverflow.com/questions/51940179