Good morning!
I'm trying to read an SPSS file (.sav) using Python.
This is my code:
import pandas as pd
df=pd.read_spss('C:/Users/bonif/Documents/CSALUD01.sav')
df.head()
I get this error:
df=pd.read_spss('C:/Users/bonif/Documents/CSALUD01.sav')
File "C:\Users\bonif\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\spss.py", line 44, in read_spss
df, _ = pyreadstat.read_sav(
File "pyreadstat\pyreadstat.pyx", line 342, in pyreadstat.pyreadstat.read_sav
File "pyreadstat\_readstat_parser.pyx", line 1034, in pyreadstat._readstat_parser.run_conversion
File "pyreadstat\_readstat_parser.pyx", line 845, in pyreadstat._readstat_parser.run_readstat_parser
File "pyreadstat\_readstat_parser.pyx", line 775, in pyreadstat._readstat_parser.check_exit_status
pyreadstat._readstat_parser.ReadstatError: Unable to convert string to the requested encoding (invalid byte sequence)
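I found that the error may occur because some words contain the letter "ñ" or accented characters such as "á". How can I solve this?
The database is in this Google Drive folder: https://drive.google.com/drive/folders/1P8v5NWE-GdAEJRZdmrp5KiL-DODClmfU?usp=sharing
Thank you very much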
Posted on 2021-04-25 21:27:02
As ti7 suggested, you need to specify the encoding, in this case latin1. Calling pyreadstat directly does the job:
>>> import pyreadstat
# This raises an error
>>> df, meta = pyreadstat.read_sav("CSALUD01.sav")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pyreadstat/pyreadstat.pyx", line 342, in pyreadstat.pyreadstat.read_sav
File "pyreadstat/_readstat_parser.pyx", line 1034, in pyreadstat._readstat_parser.run_conversion
File "pyreadstat/_readstat_parser.pyx", line 845, in pyreadstat._readstat_parser.run_readstat_parser
File "pyreadstat/_readstat_parser.pyx", line 775, in pyreadstat._readstat_parser.check_exit_status
pyreadstat._readstat_parser.ReadstatError: Unable to convert string to the requested encoding (invalid byte sequence)
# This is fine
>>> df, meta = pyreadstat.read_sav("CSALUD01.sav", encoding="latin1")
>>>
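To illustrate why the default read fails while latin1 works, here is a minimal standard-library sketch. It is only an analogy: pyreadstat converts strings through iconv rather than Python's codecs, and the sample string is made up for illustration, on the assumption that the file's text is stored as Latin-1.

```python
# "ñ" and "á" as a Latin-1 (ISO-8859-1) file would store them: one byte each.
# The sample text is hypothetical, not taken from CSALUD01.sav.
raw = "año, más".encode("latin1")

# Reading those bytes as UTF-8 fails: 0xF1 and 0xE1 are not valid on their own.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the encoding the bytes were written in recovers the text.
text = raw.decode("latin1")
```

The same kind of mismatch is what the "invalid byte sequence" message points at, which is why naming the source encoding explicitly resolves it.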
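Posted on 2021-04-23 00:41:35
pandas calls pyreadstat to read SPSS files (src).
You may have better luck using it directly, since it has an option to set the encoding.
From the docs: https://github.com/Roche/pyreadstat#other-options
You can set the encoding of the original file manually. The encoding must be an iconv-compatible encoding. This is absolutely necessary if you are handling old xport files with non-ascii characters. Those files do not have the encoding stamped in the file itself, so the encoding must be set manually.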
import pyreadstat
df, meta = pyreadstat.read_sav(path, encoding=my_encoding)
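It could also be that you don't have iconv installed at all (pyreadstat relies on it for encoding conversion), but I doubt it, since you would get a different error.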
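If the right encoding is not known in advance, one option is to try a few candidates in order. The sketch below is a hypothetical helper, not part of pandas or pyreadstat: `read_with_fallback` and its candidate list are assumptions made for illustration, and it only relies on the reader accepting an `encoding` keyword, as `pyreadstat.read_sav` does.

```python
def read_with_fallback(reader, path, encodings=(None, "latin1", "cp1252")):
    """Call reader(path, encoding=...) with each candidate until one succeeds.

    Hypothetical helper. None means "let the reader use its default encoding".
    Returns a tuple (reader_result, encoding_that_worked).
    """
    last_error = None
    for enc in encodings:
        kwargs = {} if enc is None else {"encoding": enc}
        try:
            return reader(path, **kwargs), enc
        except Exception as exc:  # in real use, narrow this to the reader's error type
            last_error = exc
    if last_error is None:
        raise ValueError("no encodings were given")
    raise last_error


# Possible usage, assuming pyreadstat is installed:
#   import pyreadstat
#   (df, meta), used = read_with_fallback(pyreadstat.read_sav, "CSALUD01.sav")
```

A candidate that reads without error is not necessarily the correct one, because a wrong single-byte encoding can decode silently into garbled text. It is worth checking a few values that contain "ñ" or "á" afterwards.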
https://stackoverflow.com/questions/67217341