Could this be a bug? I get different answers when I use describe() or std() on a groupby object:
import pandas as pd
import numpy as np
import random as rnd
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': 1*(np.random.randn(8) > 0.5),
                   'D': np.random.randn(8)})
df.head()
df[['C','D']].groupby(['C'],as_index=False).describe()
# this line gives me the standard deviation of 'C' to be 0,0. Within each group value of C is constant, so that makes sense.
df[['C','D']].groupby(['C'],as_index=False).std()
# This line gives me the standard deviation of 'C' to be 0,1. I think this is wrong.

Posted on 2018-04-12 18:07:38
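My friend mukherjees and I experimented with this some more, and we think std() really does have a problem here. At the link below you can see how "std() vs. .apply(np.std, ddof=1)" plays out. After noticing this, we also found the following related bug report: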
https://stackoverflow.com/questions/49420444
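
For anyone who wants to check this on their own pandas version, here is a minimal sketch of that comparison. It is not the code from the linked post: the frame below is hand-written so the result is reproducible, the variable names are only for illustration, and the apply call goes through a lambda on the raw NumPy values, which is a variant of the `.apply(np.std, ddof=1)` spelling. It only looks at column 'D', grouped by 'C'.

```python
import numpy as np
import pandas as pd

# Hand-written data (an assumption for illustration; the question used random values).
df = pd.DataFrame({'C': [0, 0, 0, 1, 1, 1, 0, 1],
                   'D': [1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 2.0, 6.0]})

grouped = df.groupby('C')['D']

# 1) Built-in groupby std (sample standard deviation, ddof=1 by default).
via_std = grouped.std()

# 2) NumPy std applied per group, with ddof=1 to match the pandas default.
via_apply = grouped.apply(lambda s: np.std(s.to_numpy(), ddof=1))

# 3) The 'std' column reported by describe().
via_describe = grouped.describe()['std']

comparison = pd.DataFrame({'std()': via_std,
                           'apply(np.std, ddof=1)': via_apply,
                           'describe()': via_describe})
print(comparison)
```

If the three columns disagree for 'D' on your installation, that points at the aggregation itself. If they agree, the difference seen in the question is probably limited to how the key column 'C' is reported when as_index=False is used.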