
This is expected. In the second case, you only compute the <code>std</code> of column <code>D</code>.

Why? That's just how <code>groupby</code> works. You:

<ol>
<li>slice on <code>C</code> and <code>D</code></li>
<li><code>groupby</code> on <code>C</code></li>
<li>call <code>GroupBy.std</code></li>
</ol>

At step 3 you did not select any column, so <code>std</code> is computed only on the columns that are not the grouper, i.e. column <code>D</code>.

As for why you see <code>C</code> with <code>0, 1</code>: you specified <code>as_index=False</code>, so the <code>C</code> column is inserted with the group keys taken from the original DataFrame, which in this case are <code>0</code> and <code>1</code>.

Run this and it'll become clear.

<pre><code>df[['C','D']].groupby(['C']).std()

          D
C
0  0.998201
1       NaN
</code></pre>

When you specify <code>as_index=False</code>, the index you see above is inserted as a column. Contrast this with:

<pre><code>df[['C','D']].groupby(['C'])[['C', 'D']].std()

     C         D
C
0  0.0  0.998201
1  NaN       NaN
</code></pre>

This is exactly what <code>describe</code> gives, and what you're looking for.
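The effect of <code>as_index=False</code> can be checked on a small hand-made frame. This is a minimal sketch, not the data from the question: the values in <code>toy</code> are made up so that group <code>1</code> has a single row, and it assumes a reasonably recent pandas in which <code>as_index=False</code> returns the group keys as an ordinary column.

```python
import pandas as pd

# Hypothetical toy data (not from the question):
# group 0 has three rows, group 1 has a single row.
toy = pd.DataFrame({"C": [0, 0, 0, 1], "D": [1.0, 2.0, 3.0, 4.0]})

# Group keys become the index; std is computed on the non-grouper column D only.
indexed = toy.groupby("C").std()

# With as_index=False the same keys are inserted as a regular column named C.
flat = toy.groupby("C", as_index=False).std()

print(indexed)
print(flat)
```

In <code>flat</code>, the <code>C</code> column holds the group labels, not a standard deviation of <code>C</code>, and the single-row group yields <code>NaN</code> for <code>D</code>.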


Even with std(), you get a zero standard deviation of C within each group. I only added a seed to your code to make it reproducible. I am not sure what the issue is:

<pre><code>import pandas as pd
import numpy as np
import random as rnd

# Seed the global NumPy RNG so the frame is reproducible
np.random.seed(1987)
df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                          'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three',
                          'two', 'two', 'one', 'three'],
                   'C' : 1*(np.random.randn(8)&gt;0.5),
                   'D' : np.random.randn(8)})
df

df[['C','D']].groupby(['C'],as_index=False).describe()
</code></pre>

<a href="https://i.stack.imgur.com/3GzOX.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/3GzOX.png" alt="enter image description here"></a>

<pre><code>df[['C','D']].groupby(['C'],as_index=False).std()
</code></pre>

<a href="https://i.stack.imgur.com/fr0p1.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/fr0p1.png" alt="enter image description here"></a>

Going deeper, if you look at the source code of describe for groupby, which inherits from DataFrame.describe:

<pre><code>def describe_numeric_1d(series):
    stat_index = (['count', 'mean', 'std', 'min'] +
                  formatted_percentiles + ['max'])
    d = ([series.count(), series.mean(), series.std(), series.min()] +
         [series.quantile(x) for x in percentiles] + [series.max()])
    return pd.Series(d, index=stat_index, name=series.name)
</code></pre>

The code above shows that describe simply reports the result of std().
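A quick way to check this claim on a single column is to compare the <code>std</code> row of <code>describe()</code> with a direct call to <code>std()</code>. The sketch below uses made-up sample values and also compares against NumPy with <code>ddof=1</code>, which is the pandas default.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
s = pd.Series([1.0, 2.0, 3.0, 4.0], name="D")

from_describe = s.describe()["std"]         # the 'std' row reported by describe
from_std = s.std()                          # pandas default: sample std, ddof=1
from_numpy = np.std(s.to_numpy(), ddof=1)   # NumPy with the same ddof

print(from_describe, from_std, from_numpy)
```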


My friend mukherjees and I ran more trials with this and concluded that there really is an issue with std(). You can see in the following link how we show that "std() is not the same as .apply(np.std, ddof=1)". After noticing this, we also found the following related bug report:

<a href="https://github.com/pandas-dev/pandas/issues/10355" rel="nofollow noreferrer">https://github.com/pandas-dev/pandas/issues/10355</a>
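A comparison of this kind can be sketched as follows. The data is made up, and every group has at least two rows so that <code>ddof=1</code> is well defined. The sketch only compares the two computations on the non-key column <code>D</code>; it does not reproduce the discrepancy reported for the grouping column <code>C</code>.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; each group has at least two rows.
toy = pd.DataFrame({"C": [0, 0, 0, 1, 1], "D": [1.0, 2.0, 3.0, 4.0, 6.0]})

grouped = toy.groupby("C")["D"]

# Built-in GroupBy.std (ddof=1 by default).
builtin = grouped.std()

# The same statistic computed per group with NumPy.
manual = grouped.apply(lambda g: np.std(g.to_numpy(), ddof=1))

print(pd.DataFrame({"builtin": builtin, "manual": manual}))
```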

Could this be a bug? When I use describe() or std() on a groupby object, I get different answers:

<pre><code>import pandas as pd
import numpy as np
import random as rnd

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                          'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three',
                          'two', 'two', 'one', 'three'],
                   'C' : 1*(np.random.randn(8)&gt;0.5),
                   'D' : np.random.randn(8)})
df.head()

df[['C','D']].groupby(['C'],as_index=False).describe()
# This gives a standard deviation of 0, 0 for 'C'. Within each group the value of C is constant, so that makes sense.

df[['C','D']].groupby(['C'],as_index=False).std()
# This gives 0, 1 in the 'C' column. I think this is wrong.
</code></pre>
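The expectation that <code>C</code> has zero spread inside each group can be checked directly by iterating over the groups. This is a minimal sketch on made-up data, not the random frame above.

```python
import pandas as pd

# Hypothetical toy data with two groups of constant C.
toy = pd.DataFrame({"C": [0, 0, 0, 1, 1], "D": [1.0, 2.0, 3.0, 4.0, 6.0]})

# Iterate over the groups and take the std of the key column inside each one.
per_group_c_std = {key: grp["C"].std() for key, grp in toy.groupby("C")}

print(per_group_c_std)
```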

std() groupby Pandas issue
