Do you mean something like this?
<pre><code>&gt;&gt;&gt; df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=lambda x: len(x.unique()))

Z   Z1  Z2  Z3
Y
Y1   1   1 NaN
Y2 NaN NaN   1
</code></pre>
Note that using <code>len</code> assumes there are no <code>NA</code> values in your DataFrame, because <code>unique()</code> keeps <code>NaN</code> as a value of its own. Otherwise, use <code>x.value_counts().count()</code> or <code>len(x.dropna().unique())</code>.
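The effect of missing values on these counting variants can be checked on a small Series. This is only a sketch: the Series <code>x</code> below is made-up illustration data, not part of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing value, for illustration only.
x = pd.Series(['X1', 'X1', np.nan, 'X2'])

with_nan = len(x.unique())                   # unique() keeps NaN as its own value
without_nan = len(x.dropna().unique())       # drop NaN before counting
via_value_counts = x.value_counts().count()  # value_counts() skips NaN by default
via_nunique = x.nunique()                    # nunique() also skips NaN by default

print(with_nan, without_nan, via_value_counts, via_nunique)
```

Only the plain <code>len(x.unique())</code> variant includes the missing value in its count.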

This is a good way of counting entries within <code>.pivot_table</code>:
<pre><code>&gt;&gt;&gt; df2.pivot_table(values='X', index=['Y','Z'], columns='X', aggfunc='count')

        X1  X2
Y   Z
Y1  Z1   1   1
    Z2   1  NaN
Y2  Z3   1  NaN
</code></pre>

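Since at least version 0.16 of pandas, <code>pivot_table</code> does not take the parameter <code>rows</code>; use <code>index</code> instead (and <code>columns</code> in place of <code>cols</code>).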

As of 0.23, the solution would be:

<pre><code>df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=pd.Series.nunique)
</code></pre>

which returns:

<pre><code>Z    Z1   Z2   Z3
Y
Y1  1.0  1.0  NaN
Y2  NaN  NaN  1.0
</code></pre>

<code>aggfunc=pd.Series.nunique</code> provides a distinct count. The full code is as follows:
<pre><code># older pandas versions spelled these arguments rows= and cols=
df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=pd.Series.nunique)
</code></pre>
Credit to @hume for this solution (see comment under the accepted answer). Adding as an answer here for better discoverability.

<ul>
<li>The <code>aggfunc</code> parameter in <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot_table.html" rel="nofollow noreferrer"><code>pandas.DataFrame.pivot_table</code></a> will take <code>'nunique'</code> as a <code>string</code>, or in a <code>list</code>
<ul>
<li><a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.nunique.html" rel="nofollow noreferrer"><code>pandas.Series.nunique</code></a> or <a href="https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.nunique.html" rel="nofollow noreferrer"><code>pandas.core.groupby.DataFrameGroupBy.nunique</code></a></li>
</ul>
</li>
<li>Tested in <code>pandas 1.3.1</code></li>
</ul>
<pre class="lang-py prettyprint-override"><code>out = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=['nunique', 'count', lambda x: len(x.unique()), len])

[out]:
             nunique           count           &lt;lambda&gt;            len
Z       Z1   Z2   Z3    Z1   Z2   Z3       Z1   Z2   Z3   Z1   Z2   Z3
Y
Y1     1.0  1.0  NaN   2.0  1.0  NaN      1.0  1.0  NaN  2.0  1.0  NaN
Y2     NaN  NaN  1.0   NaN  NaN  1.0      NaN  NaN  1.0  NaN  NaN  1.0


out = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc='nunique')

[out]:
Z    Z1   Z2   Z3
Y
Y1  1.0  1.0  NaN
Y2  NaN  NaN  1.0

out = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=['nunique'])

[out]:
             nunique
Z       Z1   Z2   Z3
Y
Y1     1.0  1.0  NaN
Y2     NaN  NaN  1.0
</code></pre>

You can construct a pivot table for each distinct value of <code>X</code>. With <code>g = df2.groupby('X')</code> as in the question, the loop

<pre><code>for xval, xgroup in g:
    # index/columns replace the rows/cols arguments of older pandas versions
    ptable = pd.pivot_table(xgroup, index='Y', columns='Z',
                            margins=False, aggfunc=numpy.size)
</code></pre>

builds one pivot table per value of <code>X</code>. Because <code>ptable</code> is overwritten on each iteration, you may want to store each table keyed by <code>xval</code>. With this code, I get (for <code>X1</code>):

<pre><code>     X
Z   Z1  Z2  Z3
Y
Y1   2   1 NaN
Y2 NaN NaN   1
</code></pre>
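One way to keep every per-<code>X</code> table is a dictionary keyed by the group value. This is a minimal sketch, not the answer's own code: the name <code>tables</code> is made up for illustration, and <code>aggfunc='count'</code> is swapped in for <code>numpy.size</code> as a row count.

```python
import pandas as pd

df2 = pd.DataFrame({
    'X': ['X1', 'X1', 'X1', 'X1'],
    'Y': ['Y2', 'Y1', 'Y1', 'Y1'],
    'Z': ['Z3', 'Z1', 'Z1', 'Z2'],
})

# One pivot table per distinct X, stored under that X value.
# `tables` is a hypothetical name; aggfunc='count' counts rows per (Y, Z) cell.
tables = {
    xval: xgroup.pivot_table(values='X', index='Y', columns='Z', aggfunc='count')
    for xval, xgroup in df2.groupby('X')
}

x1_table = tables['X1']
print(x1_table)
```

Note that these are row counts per cell, so the <code>(Y1, Z1)</code> cell holds 2 rather than the distinct count of 1 that the question asks for.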

Since none of the answers is up to date with the latest version of pandas, here is another solution to this problem:
<pre><code>import pandas as pd

# Set example
df2 = (
    pd.DataFrame({
        'X' : ['X1', 'X1', 'X1', 'X1'],
        'Y' : ['Y2', 'Y1', 'Y1', 'Y1'],
        'Z' : ['Z3', 'Z1', 'Z1', 'Z2']
    })
)

# Pivot
pd.crosstab(index=df2['Y'], columns=df2['Z'], values=df2['X'], aggfunc=pd.Series.nunique)
</code></pre>
which returns:
<pre><code>Z   Z1  Z2  Z3
Y
Y1  1.0 1.0 NaN
Y2  NaN NaN 1.0
</code></pre>

For best performance, I recommend calling <code>DataFrame.drop_duplicates</code> and then pivoting with <code>aggfunc='count'</code>.

Others are correct that <code>aggfunc=pd.Series.nunique</code> will work. This can be slow, however, if the number of <code>index</code> groups you have is large (>1000).

So instead of (to quote @Javier)

<pre><code>df2.pivot_table('X', 'Y', 'Z', aggfunc=pd.Series.nunique)
</code></pre>

I suggest

<pre><code>df2.drop_duplicates(['X', 'Y', 'Z']).pivot_table('X', 'Y', 'Z', aggfunc='count')
</code></pre>

This works because it guarantees that every subgroup (each combination of <code>('Y', 'Z')</code>) will have unique (non-duplicate) values of <code>'X'</code>.
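The equivalence can be checked on the question's data. This is a small sketch: the names <code>slow</code>, <code>fast</code> and <code>same</code> are illustrative only, and it does not measure the performance difference.

```python
import pandas as pd

df2 = pd.DataFrame({
    'X': ['X1', 'X1', 'X1', 'X1'],
    'Y': ['Y2', 'Y1', 'Y1', 'Y1'],
    'Z': ['Z3', 'Z1', 'Z1', 'Z2'],
})

# Distinct count computed per (Y, Z) cell.
slow = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=pd.Series.nunique)

# Remove duplicate (X, Y, Z) rows first, then a plain count is a distinct count.
fast = (
    df2.drop_duplicates(['X', 'Y', 'Z'])
       .pivot_table(values='X', index='Y', columns='Z', aggfunc='count')
)

# Compare cell by cell, treating empty cells (NaN) as 0 on both sides.
same = slow.fillna(0).eq(fast.fillna(0)).all().all()
print(same)
```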

<code>aggfunc=pd.Series.nunique</code>
counts only the unique values of a Series, in this case the unique values of a column. That makes it a different operation from <code>aggfunc='count'</code>, not a drop-in alternative to it.
For simple counting, it is better to use <code>aggfunc=pd.Series.count</code>.
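The difference shows up in any cell that contains repeated values. A minimal sketch using the question's data, where two rows share <code>Y1</code> and <code>Z1</code> (the variable names are illustrative):

```python
import pandas as pd

df2 = pd.DataFrame({
    'X': ['X1', 'X1', 'X1', 'X1'],
    'Y': ['Y2', 'Y1', 'Y1', 'Y1'],
    'Z': ['Z3', 'Z1', 'Z1', 'Z2'],
})

counts = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc='count')
uniques = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc='nunique')

rows_in_cell = counts.loc['Y1', 'Z1']       # two rows have Y1 and Z1
distinct_in_cell = uniques.loc['Y1', 'Z1']  # both rows carry the same X value
print(rows_in_cell, distinct_in_cell)
```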

This code:
<pre><code>df2 = (
    pd.DataFrame({
        'X' : ['X1', 'X1', 'X1', 'X1'],
        'Y' : ['Y2', 'Y1', 'Y1', 'Y1'],
        'Z' : ['Z3', 'Z1', 'Z1', 'Z2']
    })
)
g = df2.groupby('X')
pd.pivot_table(g, values='X', rows='Y', cols='Z', margins=False, aggfunc='count')
</code></pre>
returns the following error:
<pre><code>Traceback (most recent call last): ... 
AttributeError: 'Index' object has no attribute 'index'
</code></pre>
How do I get a pivot table with counts of unique values of one DataFrame column for two other columns?
Is there an <code>aggfunc</code> for counting unique values? Should I be using <code>np.bincount()</code>?
NB. I am aware of <code>pandas.Series.value_counts()</code>; however, I need a pivot table.
<hr />
EDIT: The output should be:
<pre><code>Z   Z1  Z2  Z3
Y
Y1   1   1 NaN
Y2 NaN NaN   1
</code></pre>

Python Pandas : pivot table with aggfunc = count unique distinct
