I'm trying to open a .CSV file in pandas, but I keep getting an encoding error. I've tried every encoding I could find, and none of them work:
import pandas as pd

encode_list = ['ascii','big5','big5hkscs','cp037','cp273','cp424','cp437','cp500','cp720','cp737','cp775','cp850','cp852','cp855','cp856','cp857','cp858','cp860','cp861','cp862','cp863','cp864','cp865','cp866','cp869','cp874','cp875','cp932','cp949','cp950','cp1006','cp1026','cp1125','cp1140','cp1250','cp1251','cp1252','cp1253','cp1254','cp1255','cp1256','cp1257','cp1258','euc_jp','euc_jis_2004','euc_jisx0213','euc_kr','gb2312','gbk','gb18030','hz','iso2022_jp','iso2022_jp_1','iso2022_jp_2','iso2022_jp_2004','iso2022_jp_3','iso2022_jp_ext','iso2022_kr','latin_1','iso8859_2','iso8859_3','iso8859_4','iso8859_5','iso8859_6','iso8859_7','iso8859_8','iso8859_9','iso8859_10','iso8859_11','iso8859_13','iso8859_14','iso8859_15','iso8859_16','johab','koi8_r','koi8_t','koi8_u','kz1048','mac_cyrillic','mac_greek','mac_iceland','mac_latin2','mac_roman','mac_turkish','ptcp154','shift_jis','shift_jis_2004','shift_jisx0213','utf_32','utf_32_be','utf_32_le','utf_16','utf_16_be','utf_16_le','utf_7','utf_8','utf_8_sig']
for encode in encode_list:
    try:
        df = pd.read_csv("myFile.csv", encoding=encode)
        print(encode)  # only reached if this encoding both decoded and parsed
    except Exception as e:
        print(f"error: {e}")
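The log below mixes two different failures: codec errors ("can't decode byte ...") and "Error tokenizing data", which pandas raises while parsing text that did decode (at least the part it had read so far). Separating the two can narrow the search. Here is a minimal sketch, using a hypothetical helper codecs_that_decode, that only tries to decode the raw bytes with each candidate codec and never parses CSV. The sample bytes are made up, since the real file can't be shared; for the real file one would pass open("myFile.csv", "rb").read(), which reads the whole file into memory.

```python
# Hypothetical helper: report which codecs can decode the raw bytes at all,
# independently of whether pandas can then tokenize the result as CSV.
def codecs_that_decode(raw: bytes, candidates: list[str]) -> list[str]:
    ok = []
    for name in candidates:
        try:
            raw.decode(name)  # strict error handling: raises on any bad byte
        except (UnicodeDecodeError, LookupError):  # bad byte or unknown codec
            continue
        ok.append(name)
    return ok


if __name__ == "__main__":
    # Made-up sample: 0x92 is a curly apostrophe in cp1252 but invalid in UTF-8.
    sample = b"name,comment\nAnn,It\x92s fine\n"
    print(codecs_that_decode(sample, ["ascii", "utf_8", "cp1252", "latin_1"]))
```

Passing the check is necessary but not sufficient: latin_1 maps every possible byte to a character, so it always "succeeds" even when it is the wrong codec.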
Here are all the errors:
error: 'ascii' codec can't decode byte 0x92 in position 15: ordinal not in range(128)
error: 'big5' codec can't decode byte 0x92 in position 15: illegal multibyte sequence
error: 'big5hkscs' codec can't decode byte 0x92 in position 15: illegal multibyte sequence
error: Error tokenizing data. C error: Expected 1 fields in line 9, saw 4
error: Error tokenizing data. C error: Expected 1 fields in line 9, saw 4
error: 'charmap' codec can't decode byte 0x76 in position 12: character maps to <undefined>
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 9, saw 4
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: 'charmap' codec can't decode byte 0xad in position 49: character maps to <undefined>
error: 'charmap' codec can't decode byte 0xf2 in position 60: character maps to <undefined>
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: 'charmap' codec can't decode byte 0x9c in position 58: character maps to <undefined>
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: 'charmap' codec can't decode byte 0x94 in position 50: character maps to <undefined>
error: 'charmap' codec can't decode byte 0x9c in position 58: character maps to <undefined>
error: Error tokenizing data. C error: Expected 1 fields in line 9, saw 4
error: 'cp932' codec can't decode byte 0xf0 in position 22: illegal multibyte sequence
error: 'cp949' codec can't decode byte 0xf0 in position 22: illegal multibyte sequence
error: 'cp950' codec can't decode byte 0x92 in position 15: illegal multibyte sequence
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 9, saw 4
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 9, saw 4
error: 'charmap' codec can't decode byte 0x81 in position 116: character maps to <undefined>
error: 'charmap' codec can't decode byte 0x98 in position 145: character maps to <undefined>
error: 'charmap' codec can't decode byte 0x81 in position 116: character maps to <undefined>
error: 'charmap' codec can't decode byte 0x9c in position 58: character maps to <undefined>
error: 'charmap' codec can't decode byte 0x81 in position 116: character maps to <undefined>
error: 'charmap' codec can't decode byte 0x9c in position 58: character maps to <undefined>
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: 'charmap' codec can't decode byte 0x9c in position 58: character maps to <undefined>
error: 'charmap' codec can't decode byte 0x9a in position 67: character maps to <undefined>
error: 'euc_jp' codec can't decode byte 0x92 in position 15: illegal multibyte sequence
error: 'euc_jis_2004' codec can't decode byte 0x92 in position 15: illegal multibyte sequence
error: 'euc_jisx0213' codec can't decode byte 0x92 in position 15: illegal multibyte sequence
error: 'euc_kr' codec can't decode byte 0x92 in position 15: illegal multibyte sequence
error: 'gb2312' codec can't decode byte 0x92 in position 15: illegal multibyte sequence
error: 'gbk' codec can't decode byte 0xf0 in position 22: illegal multibyte sequence
error: 'gb18030' codec can't decode byte 0xf0 in position 22: illegal multibyte sequence
error: 'hz' codec can't decode byte 0x92 in position 15: illegal multibyte sequence
error: 'iso2022_jp' codec can't decode byte 0x92 in position 15: illegal multibyte sequence
error: 'iso2022_jp_1' codec can't decode byte 0x92 in position 15: illegal multibyte sequence
error: 'iso2022_jp_2' codec can't decode byte 0x92 in position 15: illegal multibyte sequence
error: 'iso2022_jp_2004' codec can't decode byte 0x92 in position 15: illegal multibyte sequence
error: 'iso2022_jp_3' codec can't decode byte 0x92 in position 15: illegal multibyte sequence
error: 'iso2022_jp_ext' codec can't decode byte 0x92 in position 15: illegal multibyte sequence
error: 'iso2022_kr' codec can't decode byte 0x92 in position 15: illegal multibyte sequence
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: 'charmap' codec can't decode byte 0xf0 in position 22: character maps to <undefined>
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: 'charmap' codec can't decode byte 0xb2 in position 17: character maps to <undefined>
error: 'charmap' codec can't decode byte 0xd2 in position 172: character maps to <undefined>
error: 'charmap' codec can't decode byte 0xc3 in position 53: character maps to <undefined>
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: 'charmap' codec can't decode byte 0xdb in position 104: character maps to <undefined>
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: 'johab' codec can't decode byte 0xf0 in position 22: illegal multibyte sequence
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: 'charmap' codec can't decode byte 0x9c in position 58: character maps to <undefined>
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: 'charmap' codec can't decode byte 0x98 in position 145: character maps to <undefined>
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
error: 'shift_jis' codec can't decode byte 0xf0 in position 22: illegal multibyte sequence
error: 'shift_jis_2004' codec can't decode byte 0xf0 in position 22: illegal multibyte sequence
error: 'shift_jisx0213' codec can't decode byte 0xf0 in position 22: illegal multibyte sequence
error: 'utf-32-le' codec can't decode bytes in position 0-3: code point not in range(0x110000)
error: 'utf-32-be' codec can't decode bytes in position 0-3: code point not in range(0x110000)
error: 'utf-32-le' codec can't decode bytes in position 0-3: code point not in range(0x110000)
error: 'utf-16-le' codec can't decode bytes in position 122-123: illegal UTF-16 surrogate
error: 'utf-16-be' codec can't decode bytes in position 80-81: illegal UTF-16 surrogate
error: 'utf-16-le' codec can't decode bytes in position 122-123: illegal UTF-16 surrogate
error: 'utf7' codec can't decode byte 0x92 in position 15: unexpected special character
error: 'utf-8' codec can't decode byte 0x92 in position 15: invalid start byte
error: 'utf-8' codec can't decode byte 0x92 in position 15: invalid start byte
If I open this particular .CSV in Notepad, the data is all garbled, but if I open it in Excel or Gnumeric, the data shows up perfectly in a table.
The file contains client information, so unfortunately I can't share it.
How can I open this file as a pandas DataFrame?
Posted on 2022-03-22 14:29:43
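If you look at the documentation for read_csv(), you'll see that you can pass the parameter encoding_errors='ignore' (available since pandas 1.3.0) to skip the bytes that fail to decode and carry on with the import. This should let you open the file with whichever codec fits it best, at the cost of silently dropping the bytes that codec can't decode.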
Other valid values for this parameter are the error handlers listed in the Python codecs documentation.
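To see what a few of those handlers do to an undecodable byte, here is a small standard-library sketch; pandas documents encoding_errors as taking these same Python error-handler names. The sample bytes are made up.

```python
# Made-up sample: 0x92 is not a valid UTF-8 start byte.
raw = b"It\x92s fine"

# Decode the same bytes with several standard codec error handlers.
results = {
    handler: raw.decode("utf-8", errors=handler)
    for handler in ("ignore", "replace", "backslashreplace")
}

for handler, text in results.items():
    print(f"{handler:>16}: {text!r}")

# "strict" (the default) raises instead of returning text.
try:
    raw.decode("utf-8", errors="strict")
except UnicodeDecodeError as exc:
    print(f"{'strict':>16}: {exc}")
```

While investigating, "replace" can be the more informative choice: it leaves a U+FFFD marker wherever data was lost, whereas "ignore" drops the byte without a trace.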
https://stackoverflow.com/questions/71573722