
The lack of a NaN representation in integer columns is a <a href="http://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#support-for-integer-na" rel="noreferrer">pandas "gotcha"</a>.

The usual workaround is to simply use floats.
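A minimal sketch of why floats are the default fallback, using made-up data: as soon as a missing value appears, pandas stores the whole column as `float64`, because NaN is itself a floating-point value.

```python
import numpy as np
import pandas as pd

# A column of plain integers gets an integer dtype.
ints = pd.Series([1, 2, 3])

# Adding one missing value upcasts the whole column to float64.
with_nan = pd.Series([1, 2, np.nan])

print(ints.dtype)
print(with_nan.dtype)
```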


As of version 0.24, pandas can hold integer dtypes with missing values.

<a href="http://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html" rel="noreferrer">Nullable Integer Data Type</a>.

Pandas can represent integer data with possibly missing values using <a href="http://pandas.pydata.org/pandas-docs/stable/reference/pandas.arrays.IntegerArray.html" rel="noreferrer"><code>arrays.IntegerArray</code></a>. This is an extension type implemented within pandas. It is not the default dtype for integers and will not be inferred; you must explicitly pass the dtype into <a href="http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.array.html#pandas.array" rel="noreferrer"><code>array()</code></a> or <code>Series</code>:

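<pre><code>import numpy as np
import pandas as pd

arr = pd.array([1, 2, np.nan], dtype=pd.Int64Dtype())
pd.Series(arr)

# Output (pandas 0.24 prints NaN; pandas 1.0+ prints &lt;NA&gt;):
# 0      1
# 1      2
# 2    NaN
# dtype: Int64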
</code></pre>

To convert a column to nullable integers, use:

<pre><code>df['myCol'] = df['myCol'].astype('Int64')
</code></pre>
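As a rough illustration, assuming a float column named `myCol` that holds whole numbers plus one NaN (the data is invented), the conversion keeps the integers and marks the gap as missing:

```python
import numpy as np
import pandas as pd

# Hypothetical float column: whole numbers plus one missing value.
df = pd.DataFrame({"myCol": [1.0, 2.0, np.nan]})

# Capital "I": the nullable pandas dtype, not NumPy's int64.
df["myCol"] = df["myCol"].astype("Int64")

print(df["myCol"].dtype)
print(df["myCol"].isna().tolist())
print(df["myCol"].sum())  # missing values are skipped by default
```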


My use case is munging data prior to loading into a DB table:

<pre><code>df[col] = df[col].fillna(-1)
df[col] = df[col].astype(int)
df[col] = df[col].astype(str)
df[col] = df[col].replace('-1', np.nan)
</code></pre>

Remove the NaNs, convert to int, convert to str, and then reinsert the NaNs.

It's not pretty but it gets the job done!


If you absolutely want to combine integers and NaNs in a column, you can use the 'object' data type:

<pre><code>df['col'] = (
    df['col'].fillna(0)
    .astype(int)
    .astype(object)
    .where(df['col'].notnull())
)
</code></pre>

This will replace NaNs with an integer (doesn't matter which), convert to int, convert to object and finally reinsert NaNs.
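A small sketch of what this produces, using an invented Series in place of `df['col']`: the column ends up with the generic `object` dtype, the present values are integers rather than floats, and the gaps are NaN again. Keep in mind that arithmetic on object columns is generally slower than on numeric dtypes.

```python
import numpy as np
import pandas as pd

# Hypothetical float column with one gap.
s = pd.Series([1.0, np.nan, 3.0])

as_obj = (
    s.fillna(0)           # any integer placeholder works
    .astype(int)
    .astype(object)
    .where(s.notnull())   # put the gaps back
)

print(as_obj.dtype)
print(as_obj.tolist())
```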


I had the problem a few weeks ago with a few discrete features which were formatted as 'object'. This solution seemed to work.
<pre><code>for col in discrete:
    df[col] = pd.to_numeric(df[col], errors='coerce').astype(pd.Int64Dtype())
</code></pre>
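To make the behaviour concrete, here is a sketch with one invented object column called `feature`. Note that `errors='coerce'` silently turns anything unparseable into a missing value, so it is worth checking how many values were lost.

```python
import pandas as pd

# Hypothetical object column: numeric strings, junk, and a real gap.
df = pd.DataFrame({"feature": ["1", "2", "n/a", None]}, dtype=object)
discrete = ["feature"]

for col in discrete:
    # Unparseable strings become NaN, then the column becomes nullable Int64.
    df[col] = pd.to_numeric(df[col], errors="coerce").astype(pd.Int64Dtype())

print(df["feature"].dtype)
print(df["feature"].isna().sum())  # both "n/a" and None count as missing
```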


If you can modify your stored data, use a sentinel value for a missing <code>id</code>. Judging by the column name, a common case is that <code>id</code> is an integer strictly greater than zero. You can then use <code>0</code> as the sentinel value and write

<pre><code>if row['id']:
    regular_process(row)
else:
    special_process(row)
</code></pre>
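A sketch of the sentinel approach end to end. The data and the bodies of `regular_process` and `special_process` are placeholders made up for illustration; the guard line checks the assumption that `0` never occurs as a real id.

```python
import numpy as np
import pandas as pd

SENTINEL = 0  # assumed never to be a genuine id

df = pd.DataFrame({"id": [7, np.nan, 9]})

# Guard: the sentinel must not already occur as a real value.
assert not (df["id"] == SENTINEL).any()

df["id"] = df["id"].fillna(SENTINEL).astype(int)


def regular_process(row):
    # Placeholder for whatever is done with a valid id.
    return f"regular:{int(row['id'])}"


def special_process(row):
    # Placeholder for the handling of a missing id.
    return "special"


results = [
    regular_process(row) if row["id"] else special_process(row)
    for _, row in df.iterrows()
]
print(results)
```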


Most solutions here tell you how to use a placeholder integer to represent nulls. That approach isn't helpful if you can't be sure the placeholder won't show up in your source data. My method formats floats without their decimal part and converts nulls to <code>None</code>. The result is an object column that looks like an integer field with null values when written to a CSV.

<pre><code>keep_df[col] = keep_df[col].apply(lambda x: None if pandas.isnull(x) else '{0:.0f}'.format(pandas.to_numeric(x)))
</code></pre>


Use <code>.fillna()</code> to replace all <code>NaN</code> values with <code>0</code>, and then convert the column to <code>int</code> using <code>astype(int)</code>:
<pre><code>df['id'] = df['id'].fillna(0).astype(int)
</code></pre>


<pre><code>import pandas as pd

df= pd.read_csv("data.csv")
df['id'] = pd.to_numeric(df['id'])
</code></pre>


As of pandas 1.0.0 you can use <code>pandas.NA</code> values. These do not force integer columns with missing values to become floats.
When reading in your data, all you have to do is:
<pre><code>df= pd.read_csv(&quot;data.csv&quot;, dtype={'id': 'Int64'}) 
</code></pre>
Notice that 'Int64' is surrounded by quotes and the I is capitalized. This distinguishes pandas' 'Int64' from NumPy's int64.
As a side note, this also works with <code>.astype()</code>:
<pre><code>df['id'] = df['id'].astype('Int64')
</code></pre>
Documentation:
<a href="https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html" rel="nofollow noreferrer">https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html</a>
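For a self-contained check, the sketch below feeds a small in-memory CSV (invented contents standing in for `data.csv`) to `read_csv` with the nullable dtype. The empty `id` field is read as a missing value and the other ids stay integers.

```python
import io

import pandas as pd

# Stand-in for data.csv: the second row has an empty id.
csv_text = "id,name\n1,a\n,b\n3,c\n"

df = pd.read_csv(io.StringIO(csv_text), dtype={"id": "Int64"})

print(df["id"].dtype)
print(df["id"].isna().sum())
```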


If you want to use it when you chain methods, you can use assign:
<pre><code>df = (
    df.assign(col=lambda x: x['col'].astype('Int64'))
)
</code></pre>


For anyone who needs int values in columns that contain NULL/NaN, but who cannot use the nullable integer features of pandas 0.24.0 mentioned in other answers, I suggest converting the columns to object type using <code>df.where</code>:
<pre><code>df = df.where(pd.notnull(df), None)
</code></pre>
This converts all NaNs in the dataframe to None, treating mixed-type columns as objects, but leaving the int values as int, rather than float.


Whether your pandas Series has the <code>object</code> dtype or simply the <code>float</code> dtype, the method below will work:
<pre class="lang-py prettyprint-override"><code>df = pd.read_csv(&quot;data.csv&quot;) 
df['id'] = df['id'].astype(float).astype('Int64')
</code></pre>


I ran into this issue while working with PySpark. Because it is a Python frontend for code running on a JVM, it requires type safety, and using float instead of int is not an option. I worked around the issue by wrapping <code>pd.read_csv</code> in a function that fills user-defined columns with user-defined fill values before casting them to the required type. Here is what I ended up using:

<pre><code>def custom_read_csv(file_path, custom_dtype = None, fill_values = None, **kwargs):
    if custom_dtype is None:
        return pd.read_csv(file_path, **kwargs)
    else:
        assert 'dtype' not in kwargs.keys()
        df = pd.read_csv(file_path, dtype = {}, **kwargs)
        for col, typ in custom_dtype.items():
            if fill_values is None or col not in fill_values.keys():
                fill_val = -1
            else:
                fill_val = fill_values[col]
            df[col] = df[col].fillna(fill_val).astype(typ)
    return df
</code></pre>


Try this:
<code>df[['id']] = df[['id']].astype(pd.Int64Dtype())</code>
If you print its <code>dtypes</code>, you will get <code>id Int64</code> instead of the usual <code>int64</code>.


First, specify one of the newer nullable integer types, <code>Int8</code> through <code>Int64</code>, which can hold missing values (pandas &gt;= 0.24.0):
<code>df = df.astype('Int8')</code>
You may want to target only the columns that mix integer data with NaN/nulls:
<code>df = df.astype({'col1': 'Int8', 'col2': 'Int8', 'col3': 'Int8'})</code>
At this point the NaNs are converted into <code>&lt;NA&gt;</code>. If you want to replace that default null value with a non-integer value such as a string using <code>df.fillna()</code>, you first need to cast the columns you wish to change to the object dtype; otherwise you will see
<code>TypeError: &lt;U1 cannot be converted to an IntegerDtype</code>
You can do this with
<code>df = df.astype(object)</code> if you don't mind changing every column's dtype to object (each individual value keeps its own type), or
<code>df = df.astype({&quot;col1&quot;: object, &quot;col2&quot;: object})</code> if you prefer to target individual columns.
This should help keep integer columns that contain nulls formatted as integers while letting you change the null values to whatever you like. I can't speak to the efficiency of this method, but it worked for my formatting and printing purposes.
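A brief sketch of the two steps described above, on an invented single-column frame. The `"n/a"` fill value is just an example. Note that `Int8` only covers -128 to 127, so wider data needs `Int16`, `Int32`, or `Int64`.

```python
import pandas as pd

# Hypothetical column with a gap, stored as nullable Int8 (-128..127).
df = pd.DataFrame({"col1": pd.array([1, None, 3], dtype="Int8")})

# Filling with another integer works directly on the nullable dtype.
filled_int = df["col1"].fillna(0)

# Filling with a string requires the object dtype first.
filled_str = df.astype({"col1": object})["col1"].fillna("n/a")

print(filled_int.tolist())
print(filled_str.tolist())
```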


The following solution is the only one that serves my purpose, and I think it is the best solution when using a recent Pandas version.
<pre><code>df['A'] = (
    np.floor(pd.to_numeric(df['A'], errors='coerce'))
    .astype('Int64')
)
</code></pre>
I found this solution on Stack Overflow; see the link below for more information.
<a href="https://stackoverflow.com/a/67021201/9294498">https://stackoverflow.com/a/67021201/9294498</a>


First, remove the rows that contain NaN. Then do the integer conversion on the remaining rows.
Finally, insert the removed rows again.
I hope this works.
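One possible reading of these steps, sketched on an invented Series. It also shows a caveat: with a plain NumPy integer dtype, reinserting the missing rows upcasts the column to float again, so the round trip only stays integer if the final result uses the nullable `Int64` dtype.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0], name="id")

# Steps 1 and 2: drop the missing rows and convert the rest to int.
present = s.dropna().astype(int)

# Step 3: reinsert the dropped rows by restoring the original index.
restored = present.reindex(s.index)
print(restored.dtype)  # back to float64, because NaN was reintroduced

# Casting to the nullable dtype first keeps the values as integers.
restored_nullable = present.astype("Int64").reindex(s.index)
print(restored_nullable.dtype)
```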


Use <code>pd.to_numeric()</code>:
<pre><code>df[&quot;DateColumn&quot;] = pd.to_numeric(df[&quot;DateColumn&quot;])
</code></pre>
Simple and clean.


The issue with <code>Int64</code>, as with many of the other solutions, is that <code>null</code> values get replaced with <code>&lt;NA&gt;</code> values, which do not work with pandas' default 'NaN' functions, like <code>isnull()</code> or <code>fillna()</code>. Alternatively, if you convert missing values to <code>-1</code>, you may end up deleting information. My solution is a little lame, but it will provide <code>int</code> values with <code>np.nan</code>, allowing the <code>nan</code> functions to work without compromising your values.
<pre><code>def to_int(x):
    try:
        return int(x)
    except (TypeError, ValueError, OverflowError):
        # Missing or non-numeric values fall back to NaN
        return np.nan

df[column] = df[column].apply(to_int)
</code></pre>


I think the approach of <a href="https://stackoverflow.com/questions/21287624/convert-pandas-column-containing-nans-to-dtype-int/67929601#67929601">@Digestible1010101</a> is more appropriate for pandas 1.2 and later. Something like this should do the job:
<pre><code>df = df.astype({
    'col_1': 'Int64',
    'col_2': 'Int64',
    'col_3': 'Int64',
    'col_4': 'Int64',
})
</code></pre>


Since I didn't see this answer here, I might as well add it.
Here is a one-liner to convert NaNs to empty strings if, like me, you still can't handle np.nan or pd.NA because you rely on a library pinned to an older version of pandas:
<code>df.select_dtypes('number').fillna(-1).astype(str).replace('-1', '')</code>


With pandas 0.24 and later, the <code>Int64</code> type supports missing values.
You may run into an error if your floats haven't been rounded, floored, or ceiled to whole numbers first.
<pre><code>df['A'] = np.floor(pd.to_numeric(df['A'], errors='coerce')).astype('Int64')
</code></pre>
Source:
<a href="https://stackoverflow.com/a/67021201/1363742">https://stackoverflow.com/a/67021201/1363742</a>
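To illustrate the failure mode with made-up values: casting a float column that contains a fractional value straight to `Int64` is rejected, while flooring first succeeds. The exception type is caught broadly here because the exact class may differ between pandas versions.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.5, np.nan])

# Direct cast: 2.5 cannot be represented as an integer without loss.
try:
    s.astype("Int64")
    direct_cast_failed = False
except (TypeError, ValueError):
    direct_cast_failed = True

# Flooring first makes every present value a whole number.
floored = np.floor(s).astype("Int64")

print(direct_cast_failed)
print(floored.dropna().tolist())
```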


Assume your DateColumn holds values formatted like 3312018.0 that should be converted to the string 03/31/2018, and that some records are missing or 0.

<pre><code>df['DateColumn'] = df['DateColumn'].fillna(0).astype(int)  # treat missing records as 0
df['DateColumn'] = df['DateColumn'].astype(str)
df['DateColumn'] = df['DateColumn'].apply(lambda x: x.zfill(8))
df.loc[df['DateColumn'] == '00000000','DateColumn'] = '01011980'
df['DateColumn'] = pd.to_datetime(df['DateColumn'], format="%m%d%Y")
df['DateColumn'] = df['DateColumn'].apply(lambda x: x.strftime('%m/%d/%Y'))
</code></pre>

I read data from a .csv file to a Pandas dataframe as below. For one of the columns, namely <code>id</code>, I want to specify the column type as <code>int</code>. The problem is the <code>id</code> series has missing/empty values.

When I try to cast the <code>id</code> column to integer while reading the .csv, I get:

<pre><code>df= pd.read_csv("data.csv", dtype={'id': int}) 
error: Integer column has NA values
</code></pre>

Alternatively, I tried to convert the column type after reading as below, but this time I get:

<pre><code>df= pd.read_csv("data.csv") 
df[['id']] = df[['id']].astype(int)
error: Cannot convert NA to integer
</code></pre>

How can I tackle this?

Convert Pandas column containing NaNs to dtype `int`
