
It looks like you may have some nulls in the column. You can drop them with <code>df = df.dropna(subset=['item'])</code>. Then <code>df['item'].value_counts().max()</code> should give you the maximum count, and <code>df['item'].value_counts().idxmax()</code> should give you the most frequent value.
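A minimal sketch of that approach, assuming a DataFrame with a hypothetical <code>item</code> column that contains one missing value (the data is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the 'item' column contains one missing value.
df = pd.DataFrame({"item": ["apple", "pear", "apple", np.nan, "apple", "pear"]})

# Drop rows where 'item' is missing.
df = df.dropna(subset=["item"])

counts = df["item"].value_counts()
max_count = counts.max()         # how often the most frequent value occurs
most_frequent = counts.idxmax()  # the most frequent value itself
print(most_frequent, max_count)
```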


To build on @jonathanrocher's answer, you could use <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.mode.html" rel="noreferrer"><code>mode</code></a> on a pandas DataFrame. It gives the most frequent value(s) for each column (or each row, with <code>axis=1</code>); if several values tie, each one gets its own row:

<pre><code>import pandas as pd
import numpy as np
df = pd.DataFrame({"a": [1,2,2,4,2], "b": [np.nan, np.nan, np.nan, 3, 3]})

In [2]: df.mode()
Out[2]: 
   a    b
0  2  3.0
</code></pre>
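Because <code>mode</code> can return several values when there is a tie, a sketch for a single column might take the first mode and count its occurrences separately. The sample Series below is made up for illustration:

```python
import pandas as pd

s = pd.Series(["x", "x", "y", "y", "z"])

# Series.mode returns every value tied for the highest frequency,
# so here it contains both 'x' and 'y'.
modes = s.mode()
print(modes.tolist())

# Pick the first mode and count how often it occurs.
top = modes.iloc[0]
top_count = (s == top).sum()
print(top, top_count)
```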


You may also consider using SciPy's <code>mode</code> function, which (in the SciPy version used here) ignores NaN. A solution using it could look like:

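<pre><code>from scipy.stats import mode
from numpy import nan
from pandas import DataFrame
df = DataFrame({"a": [1,2,2,4,2], "b": [nan, nan, nan, 3, 3]})
# On newer SciPy releases NaN handling is controlled by nan_policy;
# nan_policy='omit' skips NaN values
print(mode(df))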
</code></pre>

The output would look like this (the exact formatting depends on the SciPy version):

<pre><code>(array([[ 2., 3.]]), array([[ 3., 2.]]))
</code></pre>

meaning that the most common values are <code>2</code> for the first column and <code>3</code> for the second, with frequencies <code>3</code> and <code>2</code> respectively.
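If you would rather stay within pandas, a rough equivalent that reports the most common value and its frequency per column can be sketched with <code>value_counts</code>, which drops NaN by default. This is a swapped-in pandas technique, not the SciPy call above, and the helper name <code>mode_and_count</code> is hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, 4, 2], "b": [np.nan, np.nan, np.nan, 3, 3]})

def mode_and_count(col: pd.Series) -> pd.Series:
    # value_counts() excludes NaN by default (dropna=True).
    counts = col.value_counts()
    return pd.Series({"mode": counts.idxmax(), "count": counts.max()})

# Applied column by column: the result has rows 'mode' and 'count'
# and one column per original column.
summary = df.apply(mode_and_count)
print(summary)
```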


Just take the first row of your <code>items_counts</code> series:

<pre><code>top = items_counts.head(1) # or items_counts.iloc[[0]]
value, count = top.index[0], top.iat[0]
</code></pre>

This works because <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html" rel="nofollow noreferrer"><code>pd.Series.value_counts</code></a> has <code>sort=True</code> by default and so is already ordered by counts, highest count first. Extracting a value from an index by location has O(1) complexity, while <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.idxmax.html" rel="nofollow noreferrer"><code>pd.Series.idxmax</code></a> has O(n) complexity where n is the number of categories.
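A small worked example of the <code>head(1)</code> approach, using made-up data in a hypothetical <code>item</code> column:

```python
import pandas as pd

df = pd.DataFrame({"item": ["b", "a", "b", "c", "b", "a"]})

# value_counts() sorts by count in descending order by default (sort=True).
items_counts = df["item"].value_counts()

# The first row is therefore the most frequent value and its count.
top = items_counts.head(1)
value, count = top.index[0], top.iat[0]
print(value, count)
```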

If you do specify <code>sort=False</code>, the counts are no longer ordered, so <code>idxmax</code> is recommended:

<pre><code>items_counts = df['item'].value_counts(sort=False)
top = items_counts.loc[[items_counts.idxmax()]]
value, count = top.index[0], top.iat[0]
</code></pre>

Notice that in this case you don't need to call <code>max</code> and <code>idxmax</code> separately: just extract the index label via <code>idxmax</code> and feed it to the <code>loc</code> label-based indexer.
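Using made-up data in a hypothetical <code>item</code> column, the <code>sort=False</code> variant can be checked like this; <code>idxmax</code> finds the top label regardless of the order the counts come back in:

```python
import pandas as pd

df = pd.DataFrame({"item": ["b", "a", "b", "c", "b", "a"]})

# With sort=False the counts are not guaranteed to be ordered by frequency,
# so taking position 0 would not be reliable.
items_counts = df["item"].value_counts(sort=False)

# idxmax returns the label of the largest count; wrapping it in a list
# keeps the .loc result as a one-element Series.
top = items_counts.loc[[items_counts.idxmax()]]
value, count = top.index[0], top.iat[0]
print(value, count)
```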


Add this line of code to find how many times the most frequent value occurs:

<pre><code>df["item"].value_counts().nlargest(n=1).values[0]
</code></pre>
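A short sketch, on made-up data in a hypothetical <code>item</code> column, of reading both the value and its count from <code>nlargest</code>, and of how its <code>keep</code> parameter handles ties:

```python
import pandas as pd

df = pd.DataFrame({"item": ["x", "x", "y", "y", "z"]})

counts = df["item"].value_counts()

# nlargest(n=1) keeps a single row: .index[0] is the value, .values[0] the count.
top = counts.nlargest(n=1)
print(top.index[0], top.values[0])

# With ties, keep="all" returns every value that shares the highest count.
tied = counts.nlargest(n=1, keep="all")
print(sorted(tied.index))
```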


NaN values are omitted when calculating frequencies.
<a href="https://i.stack.imgur.com/I8zsO.png" rel="nofollow noreferrer">Please check your code functionality here</a>.
Alternatively, you can use the code below to achieve the same result.

<pre class="lang-py prettyprint-override"><code># Importing required modules
import pandas as pd
from collections import Counter

# Creating a dataframe
df = pd.DataFrame({'A': ["jan", "jan", "jan", "mar", "mar", "feb", "jan", "dec",
                         "mar", "jan", "dec"]})

# Creating a Counter object
count = Counter(df['A'])

# Calling a method of the Counter object: the 3 most common values
print(count.most_common(3))

# Output:
# [('jan', 5), ('mar', 3), ('dec', 2)]
</code></pre>
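<code>Counter</code> has no notion of missing values, so NaN entries could end up counted as keys. A minimal sketch, with made-up data, that drops them first and unpacks the single most common value together with its frequency:

```python
import numpy as np
import pandas as pd
from collections import Counter

df = pd.DataFrame({"A": ["jan", "mar", "jan", np.nan, "jan", "mar"]})

# Drop missing entries so they are not counted.
count = Counter(df["A"].dropna())

# most_common(1) returns a one-element list of (value, frequency) pairs.
value, freq = count.most_common(1)[0]
print(value, freq)
```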

I have a data frame and I would like to know how many times the most frequent value occurs in a given column.

I tried to do it the following way:

<pre><code>items_counts = df['item'].value_counts()
max_item = items_counts.max()
</code></pre>

As a result I get:

<pre><code>ValueError: cannot convert float NaN to integer
</code></pre>

As far as I understand, the first line gives me a series in which the values from the column are used as keys and the frequencies of these values are used as values. So I just need to find the largest value in the series, but for some reason it does not work. Does anybody know how this problem can be solved?
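To illustrate the structure described above, a minimal sketch with made-up data in a hypothetical <code>item</code> column: <code>value_counts</code> returns the distinct values as the index and their frequencies as the values, and it drops NaN unless <code>dropna=False</code> is passed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"item": ["x", np.nan, "x", "y", np.nan, np.nan]})

# Index holds the distinct values; the Series values hold their frequencies.
# NaN is excluded by default (dropna=True).
items_counts = df["item"].value_counts()
print(items_counts.index.tolist(), items_counts.tolist())

# With dropna=False, NaN becomes a key of its own.
with_nan = df["item"].value_counts(dropna=False)
print(with_nan)
```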

How to get the number of the most frequent value in a column?
