The file I am trying to import has an encoding problem (it contains Polish special characters). What should I do?
The error says:
Traceback (most recent call last):
  File "D:/Users/Denis/Dysk Google/scripts/python/napisy/napisy", line 6, in <module>
    str = inputfile.read() #name for the file
  File "D:\Python33\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 2: character maps to <undefined>
The problematic part is:
inputfilename = "a.txt"
outputfilename = inputfilename[0:-4]+"_fixed"+".txt"
inputfile = open(inputfilename, 'r')
str = inputfile.read() #name for the file
newstring = str.replace("œ", "s").replace("ê","e").replace("³","l").replace("¹","a").replace("¿","z").replace("ñ","n").replace("Ÿ","z").replace("æ","c")
outputfile = open(outputfilename, "w")
outputfile.write(newstring)
outputfile.close()
Posted on 2014-03-01 14:40:43
You should try using "cp1250" as the encoding:
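import codecs

# Decode the file as cp1250 (Windows Central European) rather than
# the platform default, which is cp1252 according to the traceback.
with codecs.open('file-name', 'r', encoding='cp1250') as f:
    content = f.read()
print(content)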
If that fails, you can also try the ISO-8859-2 encoding.
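The two suggestions above can be combined into one small sketch: try cp1250 first, fall back to ISO-8859-2, and then replace the Polish letters directly instead of the garbled "œ", "ê", "³" stand-ins from the question. The helper names `read_with_fallback` and `strip_polish_diacritics`, the sample sentence, and the temporary file are all made up for illustration; in Python 3 the built-in `open()` accepts `encoding=` directly, so `codecs.open` is not strictly needed.

```python
import os
import tempfile


def read_with_fallback(path, encodings=("cp1250", "iso-8859-2")):
    """Hypothetical helper: return the text of `path`, decoded with the
    first candidate encoding that does not raise UnicodeDecodeError."""
    for encoding in encodings:
        try:
            with open(path, "r", encoding=encoding) as f:
                return f.read()
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode " + path)


# Map each Polish letter to its closest ASCII letter (one-to-one, same order).
POLISH_TO_ASCII = str.maketrans("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ", "acelnoszzACELNOSZZ")


def strip_polish_diacritics(text):
    """Hypothetical helper: replace Polish diacritics with plain ASCII letters."""
    return text.translate(POLISH_TO_ASCII)


# Demo on a throwaway cp1250 file (assumed sample data, not the asker's file).
sample = "Źródło: zażółć gęślą jaźń"
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
    tmp.write(sample.encode("cp1250"))
    sample_path = tmp.name

text = read_with_fallback(sample_path)
ascii_text = strip_polish_diacritics(text)
os.remove(sample_path)

print(text)
print(ascii_text)
```

Byte 0x8f, the one named in the traceback, is "Ź" in cp1250 but has no mapping in cp1252, which is why the default decoder failed. Once the text is decoded correctly, the replacements can target the real Polish letters, and the output file can be opened with an explicit `encoding=` as well.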
https://stackoverflow.com/questions/22121108