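When I try to parse a CSV file exported from a query in MS SQL 2005 Express Edition, the strings Python gives me are completely unexpected. For example, if a line in the CSV file is "aaa,bbb,ccc,dddd", then once Python has read it as a string it becomes something like "a a a , b b b , c c c , d d d d", with a gap after every character. What is going on?
I tried stripping the spaces in my code, but that did not work.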
import os
import random

f1 = open('a.txt', 'r')
f2 = open('dec_sql.txt', 'w')
text = 'abc'
while(text != ''):
    text = f1.readline()
    if(text==''):
        break
    splited = text.split(',')
    # strip spaces from all 33 fields (indices 0..32)
    for i in range (0, 33):
        splited[i] = splited[i].replace(' ', '')
    # build one INSERT statement per input line
    sql = 'insert into dbo.INBOUND_RATED_DEC2010 values ('
    sql += '\'' + splited[0] + '\', '
    sql += '\'' + splited[1] + '\', '
    sql += '\'' + splited[2] + '\', '
    sql += '\'' + splited[3] + '\', '
    sql += '\'' + splited[4] + '\', '
    sql += '\'' + splited[5] + '\', '
    sql += '\'' + splited[6] + '\', '
    sql += '\'' + splited[7] + '\', '
    sql += '\'' + splited[8] + '\', '
    sql += '\'' + splited[9] + '\', '
    sql += '\'' + splited[10] + '\', '
    sql += '\'' + splited[11] + '\', '
    sql += '\'' + splited[12] + '\', '
    sql += '\'' + splited[13] + '\', '
    sql += '\'' + splited[14] + '\', '
    sql += '\'' + splited[15] + '\', '
    sql += '\'' + splited[16] + '\', '
    sql += '\'' + splited[17] + '\', '
    sql += '\'' + splited[18] + '\', '
    sql += '\'' + splited[19] + '\', '
    sql += '\'' + splited[20] + '\', '
    sql += '\'' + splited[21] + '\', '
    sql += '\'' + splited[22] + '\', '
    sql += '\'' + splited[23] + '\', '
    sql += '\'' + splited[24] + '\', '
    sql += '\'' + splited[25] + '\', '
    sql += '\'' + splited[26] + '\', '
    sql += '\'' + splited[27] + '\', '
    sql += '\'' + splited[28] + '\', '
    sql += '\'' + splited[29] + '\', '
    sql += '\'' + splited[30] + '\', '
    sql += '\'' + splited[31] + '\', '
    sql += '\'' + splited[32] + '\' '
    sql += ')'
    print sql
    f2.write(sql+'\n')
f2.close()
f1.close()
Posted on 2011-03-21 10:28:16
Using Python's built-in CSV reader might help. This looks like a Unicode problem, a kind of problem that has caused me a lot of frustration.
import re
import tkFileDialog
import csv

# UTF-8 byte sequences for typographic punctuation and their ASCII replacements
ENCODING_REGEX_REPLACEMENT_LIST = [(re.compile('\xe2\x80\x99'), "'"),
                                   (re.compile('\xe2\x80\x94'), "--"),
                                   (re.compile('\xe2\x80\x9c'), '"'),
                                   (re.compile('\xe2\x80\x9d'), '"'),
                                   (re.compile('\xe2\x80\xa6'), '...')]

def correct_encoding(csv_row):
    for key in csv_row.keys():
        # if there is a value for the current key
        if csv_row[key]:
            try:
                csv_row[key] = unicode(csv_row[key], errors='strict')
            except ValueError:
                # we have a bad encoding, try iterating through all the known
                # bad encodings in the ENCODING_REGEX_REPLACEMENT and replace
                # everything and then try again
                for (regex, replacement) in ENCODING_REGEX_REPLACEMENT_LIST:
                    csv_row[key] = regex.sub(replacement, csv_row[key])
                print(csv_row)
                csv_row[key] = unicode(csv_row[key])
        # if there is NOT a value for the current key
        else:
            csv_row[key] = unicode('')
    return csv_row

filename = tkFileDialog.askopenfilename()
csv_reader = csv.DictReader(open(filename, "rb"), dialect='excel')  # assuming similar dialect
for csv_row in csv_reader:
    csv_row = correct_encoding(csv_row)
    # your application logic here
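Posted on 2011-03-21 10:25:00
It looks to me like the output of the MS SQL 2005 query is a Unicode file. Python's csv module cannot handle Unicode files directly, but the documentation for the csv module contains some sample code describing how to work around the problem.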
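As a rough illustration of that diagnosis, the sketch below first shows why the text seems to have a gap after every character, then reads the file with an explicit encoding. It assumes Python 3 (the code elsewhere in this thread is Python 2), where `open` takes an `encoding` argument and the csv module works on decoded text. It also assumes the export is UTF-16 with a byte order mark. The helper name `read_utf16_csv` and the sample line are made up for the example.

```python
import csv
import os
import tempfile

# In UTF-16 (little endian) every ASCII character is followed by a NUL byte.
# Read as single-byte text, those NUL bytes look like gaps between characters.
raw = 'aaa,bbb'.encode('utf-16-le')
print(raw)


def read_utf16_csv(path):
    """Return all rows of a UTF-16 encoded CSV file as lists of strings."""
    # encoding='utf-16' consumes the byte order mark and decodes the
    # two-byte characters; newline='' is what the csv documentation asks for.
    with open(path, encoding='utf-16', newline='') as handle:
        return [row for row in csv.reader(handle)]


# Build a small sample file standing in for the real export (hypothetical data).
sample_path = os.path.join(tempfile.mkdtemp(), 'sample_utf16.csv')
with open(sample_path, 'w', encoding='utf-16', newline='') as handle:
    handle.write('aaa,bbb,ccc,dddd\r\n')

rows = read_utf16_csv(sample_path)
print(rows)
```

If the real export turns out to use a different encoding, only the `encoding` argument should need to change.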
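Alternatively, some text editors let you save the file with a different encoding. For example, I opened the result of an MS SQL 2005 query in Notepad++, which told me the file was UCS-2 encoded, and I was able to convert it to UTF-8 from the Encoding menu.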
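The same conversion can be scripted instead of done by hand in an editor. This is a minimal sketch, again assuming Python 3 and a UTF-16 export with a byte order mark (UCS-2 text decodes as UTF-16). The function name `convert_utf16_to_utf8` and the file names are hypothetical.

```python
import os
import tempfile


def convert_utf16_to_utf8(src_path, dst_path):
    """Rewrite a UTF-16 text file as UTF-8, leaving line endings untouched."""
    # newline='' on both sides stops Python from translating line endings.
    with open(src_path, encoding='utf-16', newline='') as src, \
         open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        for line in src:
            dst.write(line)


# Hypothetical sample standing in for the real export.
work_dir = tempfile.mkdtemp()
src_path = os.path.join(work_dir, 'export_utf16.txt')
dst_path = os.path.join(work_dir, 'export_utf8.txt')
with open(src_path, 'w', encoding='utf-16', newline='') as handle:
    handle.write('aaa,bbb,ccc,dddd\r\n')

convert_utf16_to_utf8(src_path, dst_path)

# The converted file holds one byte per ASCII character, with no NUL padding.
with open(dst_path, 'rb') as handle:
    converted_bytes = handle.read()
print(converted_bytes)
```

Once converted, the file can be split on commas or passed to the csv module without the stray gaps.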
Posted on 2011-03-21 10:23:16
Try opening the file in Notepad and using the Replace All function to replace ' ' with ''.
https://stackoverflow.com/questions/5373195