Use <code>groupby</code> and <code>count</code>:
<pre><code>In [37]:
df = pd.DataFrame({'a':list('abssbab')})
df.groupby('a').count()

Out[37]:

   a
a
a  2
b  3
s  2

[3 rows x 1 columns]
</code></pre>
See the online docs: <a href="https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html" rel="noreferrer">https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html</a>
Alternatively, use <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html" rel="noreferrer"><code>value_counts()</code></a>, as @DSM has commented; there are many ways to skin a cat here:
<pre><code>In [38]:
df['a'].value_counts()

Out[38]:

b    3
a    2
s    2
dtype: int64
</code></pre>
If you want to add the frequency back to the original dataframe, use <code>transform</code>, which returns a result aligned with the original index:
<pre><code>In [41]:
df['freq'] = df.groupby('a')['a'].transform('count')
df

Out[41]:

   a freq
0  a    2
1  b    3
2  s    2
3  s    2
4  b    3
5  a    2
6  b    3

[7 rows x 2 columns]
</code></pre>

If you want the counts for every column at once, you can use:

<pre><code>df.apply(pd.value_counts)
</code></pre>

This applies a column-wise aggregation function (in this case <code>value_counts</code>) to each of the columns.

<pre><code>df.category.value_counts()
</code></pre>

This one-liner gives you the output you want.

If your column name contains spaces, attribute access will not work; use bracket notation instead:

<pre><code>df['category'].value_counts()
</code></pre>

<pre><code>df.apply(pd.value_counts).fillna(0)
</code></pre>

<a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html" rel="noreferrer">value_counts</a> - returns an object containing the counts of unique values.

<a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html" rel="noreferrer">apply</a> - counts the frequencies in every column. If you set <code>axis=1</code>, you get the frequencies in every row instead.

<code>fillna(0)</code> - tidies up the output by replacing <code>NaN</code> (a value that does not occur in a given column) with 0.
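As a self-contained illustration of the one-liner above, here is a minimal sketch on a made-up frame (the column names `x` and `y` and their values are assumptions chosen for the demo). It passes `pd.Series.value_counts` to `apply` instead of the top-level `pd.value_counts`, which, as far as I know, recent pandas releases deprecate.

```python
import pandas as pd

# Hypothetical data: two columns that only partly share their values
df = pd.DataFrame({"x": ["a", "a", "b"], "y": ["b", "c", "c"]})

# Count each column separately; a value absent from a column shows up as NaN
counts = df.apply(pd.Series.value_counts)

# Replace NaN with 0 and go back to integer counts
counts = counts.fillna(0).astype(int)
print(counts)
```

The result has one row per distinct value and one column per original column, so `counts.loc["a", "x"]` is the number of times `"a"` occurs in column `x`.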

In pandas 0.18.1, <code>groupby</code> together with <code>count</code> does not give the frequency of unique values:

<pre><code>&gt;&gt;&gt; df
   a
0  a
1  b
2  s
3  s
4  b
5  a
6  b

&gt;&gt;&gt; df.groupby('a').count()
Empty DataFrame
Columns: []
Index: [a, b, s]
</code></pre>

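This is because <code>count</code> tallies the non-null values in the remaining columns, and here the grouping column is the only one. The unique values and their frequencies are, however, easily determined using <code>size</code>: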

<pre><code>&gt;&gt;&gt; df.groupby('a').size()
a
a    2
b    3
s    2
</code></pre>

By contrast, <code>df.a.value_counts()</code> returns the counts sorted in descending order by default, i.e. the most frequent value first.
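The difference in ordering can be seen in a small sketch using the same data as above (the variable names are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": list("abssbab")})

# groupby(...).size() counts rows per group; the index is sorted by group key
by_key = df.groupby("a").size()

# value_counts() sorts by count, most frequent value first
by_freq = df["a"].value_counts()

# sort_index() puts value_counts() into the same key order as groupby
by_key_again = df["a"].value_counts().sort_index()

print(by_key)
print(by_freq)
```

Both approaches yield the same counts; which one to use mostly depends on whether you want the result ordered by key or by frequency.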

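Use a list comprehension with <code>value_counts</code> to get the counts for several columns of a DataFrame at once (here, every object-dtype column of <code>my_series</code>, which despite its name is a DataFrame):

<pre><code>[my_series[c].value_counts() for c in my_series.select_dtypes(include=['O']).columns]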
</code></pre>

<a href="https://stackoverflow.com/a/28192263/786326">https://stackoverflow.com/a/28192263/786326</a>

If your DataFrame has values with the same type, you can also set <code>return_counts=True</code> in <a href="https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.unique.html" rel="noreferrer">numpy.unique()</a>.

<pre><code>index, counts = np.unique(df.values, return_counts=True)
</code></pre>

<a href="https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.bincount.html" rel="noreferrer">np.bincount()</a> could be faster if your values are integers.
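A minimal sketch of both NumPy calls, on small made-up arrays (the sample data and variable names are assumptions for illustration). Note that `np.bincount` only accepts non-negative integers and returns one slot for every integer from 0 up to the maximum value.

```python
import numpy as np

# np.unique with return_counts=True: sorted unique values plus their counts
letters = np.array(list("abssbab"))
values, counts = np.unique(letters, return_counts=True)
freq = dict(zip(values.tolist(), counts.tolist()))
print(freq)

# np.bincount: bins[i] is the number of times the integer i occurs
ints = np.array([1, 1, 1, 1, 2, 3, 4, 4])
bins = np.bincount(ints)
print(bins.tolist())
```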

As everyone has said, the quickest solution is:
<pre><code>df.column_to_analyze.value_counts()
</code></pre>
But if you want the counts as a new column in your dataframe, following this schema:
<pre><code>df input:

category
cat a
cat b
cat a

df output:

category   counts
cat a        2
cat b        1
cat a        2
</code></pre>
you can do this:
<pre><code>df['counts'] = df.category.map(df.category.value_counts())
df 
</code></pre>
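A runnable sketch of the `map` approach, using the three-row example from this answer. `map` looks up each row's category in the index of the `value_counts()` result, so every row receives the count of its own category.

```python
import pandas as pd

df = pd.DataFrame({"category": ["cat a", "cat b", "cat a"]})

# Series indexed by category, holding the number of occurrences
counts = df["category"].value_counts()

# Look up each row's category in that Series
df["counts"] = df["category"].map(counts)
print(df)
```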

Without any libraries, you could do this instead:

<pre><code>def to_frequency_table(data):
    frequencytable = {}
    for key in data:
        if key in frequencytable:
            frequencytable[key] += 1
        else:
            frequencytable[key] = 1
    return frequencytable
</code></pre>

Example:

<pre><code>to_frequency_table([1,1,1,1,2,3,4,4])
&gt;&gt;&gt; {1: 4, 2: 1, 3: 1, 4: 2}
</code></pre>
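If the standard library counts as "no libraries", `collections.Counter` does the same bookkeeping as the hand-written loop. A short sketch with the same input:

```python
from collections import Counter

table = Counter([1, 1, 1, 1, 2, 3, 4, 4])

print(dict(table))           # plain dict of value -> count
print(table.most_common(2))  # the two most frequent values with their counts
print(table[99])             # a value that never occurred counts as 0
```

Unlike a plain dict, a `Counter` returns 0 for missing keys instead of raising `KeyError`.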

You can also do this in pandas by first casting your columns to the category dtype (<code>dtype="category"</code>), e.g.

<pre><code>cats = ['client', 'hotel', 'currency', 'ota', 'user_country']

df[cats] = df[cats].astype('category')
</code></pre>

and then calling <code>describe</code>:

<pre><code>df[cats].describe()
</code></pre>

This will give you a nice summary table per column (the number of values, the number of unique values, the most frequent value and its frequency) :):

<pre><code>        client   hotel currency     ota user_country
count   852845  852845   852845  852845       852845
unique    2554   17477      132      14          219
top       2198   13202      USD   Hades           US
freq    102562    8847   516500  242734       340992
</code></pre>
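A small sketch of the same idea on made-up data (the two columns reuse names from the table above, but the values are invented for the demo). Note that `describe` only reports the single most frequent value per column; for the full frequency table of one column, `value_counts` is still needed.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "currency": ["USD", "EUR", "USD", "USD"],
    "ota": ["Hades", "Hades", "Hades", "Zeus"],
})

cats = ["currency", "ota"]
df[cats] = df[cats].astype("category")

# One column of summary statistics per categorical column
summary = df[cats].describe()
print(summary)

# Full frequency table for a single column
currency_counts = df["currency"].value_counts()
print(currency_counts)
```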

@metatoaster has already pointed this out: go for <code>Counter</code>. It's blazing fast.

<pre><code>import pandas as pd
from collections import Counter
import timeit
import numpy as np

df = pd.DataFrame(np.random.randint(1, 10000, (100, 2)), columns=["NumA", "NumB"])
</code></pre>

<h1>Timings</h1>

<pre><code>%timeit -n 10000 df['NumA'].value_counts()
# 10000 loops, best of 3: 715 µs per loop

%timeit -n 10000 df['NumA'].value_counts().to_dict()
# 10000 loops, best of 3: 796 µs per loop

%timeit -n 10000 Counter(df['NumA'])
# 10000 loops, best of 3: 74 µs per loop

%timeit -n 10000 df.groupby(['NumA']).count()
# 10000 loops, best of 3: 1.29 ms per loop
</code></pre>
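The `%timeit` lines above need IPython. A rough equivalent with the standard `timeit` module might look like the sketch below; the seed, the second (larger) row count and the repeat count are assumptions of mine, and no timings are claimed here since they depend on the machine, the pandas version and the data size. The measurements above were taken on only 100 rows, so it is worth checking whether the ranking holds for your own data.

```python
import timeit
from collections import Counter

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so runs are repeatable
df = pd.DataFrame({"NumA": rng.integers(1, 10000, size=100)})

# Sanity check: both approaches produce the same counts
via_counter = dict(Counter(df["NumA"]))
via_pandas = df["NumA"].value_counts().to_dict()
same = via_counter == via_pandas
print("identical counts:", same)

# Time both approaches on a small and on a larger column
for n_rows in (100, 100_000):
    s = pd.Series(rng.integers(1, 10000, size=n_rows))
    t_counter = timeit.timeit(lambda: Counter(s), number=5)
    t_pandas = timeit.timeit(lambda: s.value_counts(), number=5)
    print(f"{n_rows} rows: Counter {t_counter:.4f}s, value_counts {t_pandas:.4f}s")
```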

Cheers!

I believe this should work fine for the columns of any DataFrame.

<pre><code>import pandas as pd

def column_list(x):
    # Collect (column name, number of unique values) pairs
    column_list_df = []
    for col_name in x.columns:
        y = col_name, len(x[col_name].unique())
        column_list_df.append(y)
    # Build the result inside the function and name its columns
    return pd.DataFrame(column_list_df).rename(columns={0: "Feature", 1: "Value_count"})
</code></pre>

The function <code>column_list</code> goes through the column names and counts the number of unique values in each column.
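For comparison, pandas has a built-in `DataFrame.nunique` that yields the same kind of summary. A minimal sketch on made-up data (the frame and its column names are assumptions), reusing the "Feature" and "Value_count" labels from above:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"a": list("abssbab"), "n": [1, 1, 2, 2, 2, 3, 3]})

# nunique() returns one number per column; reshape it into a two-column frame
summary = df.nunique().rename_axis("Feature").reset_index(name="Value_count")
print(summary)
```

One difference to be aware of: `nunique` ignores missing values by default, whereas `len(x[col].unique())` counts `NaN` as a value of its own.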

The following code creates a frequency table for the values in a column called &quot;Total_score&quot; in a dataframe called &quot;smaller_dat1&quot;, and then returns the number of times the value 300 appears in that column.
<pre><code>valuec = smaller_dat1.Total_score.value_counts()
valuec.loc[300]
</code></pre>
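One caveat: `valuec.loc[300]` raises a `KeyError` if 300 never occurs in the column. A hedged sketch of two ways around that, on invented scores (the data is an assumption; the names follow the snippet above):

```python
import pandas as pd

# Hypothetical scores
smaller_dat1 = pd.DataFrame({"Total_score": [300, 250, 300, 275]})

valuec = smaller_dat1["Total_score"].value_counts()

# Series.get returns a default instead of raising for a missing label
n_300 = valuec.get(300, 0)
n_999 = valuec.get(999, 0)

# Counting a single value directly, without building the whole table
n_300_direct = int((smaller_dat1["Total_score"] == 300).sum())

print(n_300, n_999, n_300_direct)
```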

<pre><code>your data:

|category|
cat a
cat b
cat a
</code></pre>

Solution:

<pre><code>df['freq'] = df.groupby('category')['category'].transform('count')
df = df.drop_duplicates()
</code></pre>
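An alternative sketch that builds the two-column result in one step, without adding a column and dropping duplicates afterwards. It uses the example data from the question; `freq` as the name of the count column matches the desired output.

```python
import pandas as pd

df = pd.DataFrame({"category": ["cat a", "cat b", "cat a"]})

# Count rows per category, then turn the result into a regular two-column frame
freq = df.groupby("category").size().reset_index(name="freq")
print(freq)
```

The result has one row per unique category and a fresh 0-based index.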

I have a dataset:
<pre><code>category
cat a
cat b
cat a
</code></pre>
I'd like to be able to return something like this (showing the unique values and their frequencies):
<pre><code>category   freq
cat a       2
cat b       1
</code></pre>

Count the frequency that a value occurs in a dataframe column
