I have a df:
   orgs feature1 feature2 feature3
0  org1     True     True      NaN
1  org1      NaN     True      NaN
2  org2      NaN     True     True
3  org3     True     True      NaN
4  org4     True     True     True
5  org4     True     True     True
Now I want to count the number of distinct orgs for each feature. Basically, I want to end up with a df_Result like this:
   features  count_distinct_orgs
0  feature1                    3
1  feature2                    4
2  feature3                    2
Does anyone know how to do this?
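For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: the post shows printed output rather than construction code, so the column dtypes here are an assumption.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the printed output; NaN marks a missing flag.
# Assumption: the exact dtypes of the original columns are unknown.
df = pd.DataFrame({
    'orgs': ['org1', 'org1', 'org2', 'org3', 'org4', 'org4'],
    'feature1': [True, np.nan, np.nan, True, True, True],
    'feature2': [True, True, True, True, True, True],
    'feature3': [np.nan, np.nan, True, np.nan, True, True],
})
print(df)
```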
Posted on 2016-10-13 08:47:59
You can add sum to the previous solution:
# per org: number of distinct non-NaN values in each feature column,
# then summed over all orgs
df1 = (df.groupby('orgs')
         .apply(lambda x: x.iloc[:, 1:].apply(lambda y: y.nunique()))
         .sum()
         .reset_index())
df1.columns = ['features', 'count_distinct_orgs']
print(df1)
   features  count_distinct_orgs
0  feature1                    3
1  feature2                    4
2  feature3                    2
Another solution, aggregating with Series.nunique:
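import pandas as pd

# nunique ignores NaN, so each org contributes 0 or 1 per feature here
df1 = (df.groupby('orgs')
         .agg(lambda x: pd.Series.nunique(x))
         .sum()
         .astype(int)
         .reset_index())
df1.columns = ['features', 'count_distinct_orgs']
print(df1)
   features  count_distinct_orgs
0  feature1                    3
1  feature2                    4
2  feature3                    2
A solution using stack also works, but it returns a warning: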
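C:\Anaconda3\lib\site-packages\pandas\core\groupby.py:2937: FutureWarning: numpy not_equal will not check object identity in the future. The comparison did not return the same result as suggested by the identity (`is`)) and will change.
  inc = np.r_[1, val[1:] != val[:-1]]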
# stack to one value per (org, feature) pair, keeping NaN entries
df1 = df.set_index('orgs').stack(dropna=False)
# distinct non-NaN values per (org, feature), then summed per feature
df1 = df1.groupby(level=[0, 1]).nunique().unstack().sum().reset_index()
df1.columns = ['features', 'count_distinct_orgs']
print(df1)
   features  count_distinct_orgs
0  feature1                    3
1  feature2                    4
2  feature3                    2
https://stackoverflow.com/questions/40016097
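A further option, not part of the original answer, is to reshape to long format with melt and count distinct orgs per feature directly. The sketch below rebuilds the question's frame (dtypes assumed) so it runs on its own; long_df and df_result are illustrative names only.

```python
import numpy as np
import pandas as pd

# Example frame from the question (assumed dtypes; NaN marks a missing flag).
df = pd.DataFrame({
    'orgs': ['org1', 'org1', 'org2', 'org3', 'org4', 'org4'],
    'feature1': [True, np.nan, np.nan, True, True, True],
    'feature2': [True, True, True, True, True, True],
    'feature3': [np.nan, np.nan, True, np.nan, True, True],
})

# Long format: one row per (org, feature) pair; drop pairs whose value is missing.
long_df = df.melt(id_vars='orgs', var_name='features').dropna(subset=['value'])

# Count the distinct orgs that remain for each feature.
df_result = (long_df.groupby('features')['orgs']
                    .nunique()
                    .reset_index(name='count_distinct_orgs'))
print(df_result)
```

On the example data this gives 3, 4 and 2 for feature1 to feature3. It counts an org once per feature whenever any non-missing value is present, so it agrees with the nunique-and-sum approaches only when each org holds at most one distinct non-missing value per feature, as is the case here. A feature with no non-missing values at all would be absent from the result rather than listed with 0.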