I am trying to parse a CSV file with the following contents:
# country,title1,title2,type
GB,Fast Friends,Burn Notice, S:4, E:2,episode,
SE,The Spiderwick Chronicles,"SPIDERWICK CHRONICLES, THE",movie,

The expected output is:
['SE', 'The Spiderwick Chronicles', '"SPIDERWICK CHRONICLES, THE"', 'movie']
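['GB', 'Fast Friends', 'Burn Notice, S:4, E:2', 'episode']

The problem is that the commas in the "title" fields are not escaped. I tried csvreader as well as string and regex parsing, but could not get unambiguous matches.
Is it possible at all to parse this file accurately when about half of the fields contain unescaped commas? Or does a new CSV need to be created?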
Posted on 2015-03-05 04:47:17
If you can assume that all the extra commas occur in title2, then you can play this trick. Otherwise, your data is ambiguous.
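strings = [
    'SE,The Spiderwick Chronicles,"SPIDERWICK CHRONICLES, THE",movie,',
    'GB,Fast Friends,Burn Notice, S:4, E:2,episode,',
]
for string in strings:
    xs = string.split(',')
    country = xs[0]
    title1 = xs[1]
    # Everything between title1 and type is treated as part of title2.
    # Strip each piece so the joined title has single spaces.
    title2 = ' '.join(x.strip() for x in xs[2:-2])
    # The trailing comma leaves an empty last element, so type is second to last.
    mtype = xs[-2]
    print([country, title1, title2, mtype])

Output:

['SE', 'The Spiderwick Chronicles', '"SPIDERWICK CHRONICLES THE"', 'movie']
['GB', 'Fast Friends', 'Burn Notice S:4 E:2', 'episode']

Posted on 2015-03-05 05:29:32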
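The split-based answer above joins the pieces of title2 with spaces, so the commas shown in the question's expected output are lost. Below is a minimal sketch that keeps them, under the same assumption that stray commas occur only in title2 and that every row ends with a trailing comma. The helper name parse_line is hypothetical, not part of the original answer.

```python
def parse_line(line):
    """Split one raw row, assuming extra commas occur only in title2."""
    xs = line.rstrip("\n").split(",")
    country = xs[0]
    title1 = xs[1]
    # The trailing comma yields an empty last element, so type is xs[-2].
    mtype = xs[-2]
    # Re-join the middle pieces with commas to restore title2 unchanged.
    title2 = ",".join(xs[2:-2])
    return [country, title1, title2, mtype]


lines = [
    'SE,The Spiderwick Chronicles,"SPIDERWICK CHRONICLES, THE",movie,',
    "GB,Fast Friends,Burn Notice, S:4, E:2,episode,",
]
rows = [parse_line(line) for line in lines]
for row in rows:
    print(row)
```

Because the pieces are re-joined with the same delimiter they were split on, title2 comes back exactly as it appeared in the file, including the surrounding quotes on the SE row. If title1 could also contain commas, this approach would silently misassign fields.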
Posted on 2015-03-05 06:26:19
If the fields contain commas, I would save the Excel file as a text file with tab-separated fields.
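As a rough illustration of the tab-separated suggestion, here is how such an export could be read with Python's csv module. The sample text is a hypothetical re-export of the two rows from the question, not data from the original file.

```python
import csv
import io

# Hypothetical tab-separated export of the two sample rows.
tsv_text = (
    "GB\tFast Friends\tBurn Notice, S:4, E:2\tepisode\n"
    "SE\tThe Spiderwick Chronicles\tSPIDERWICK CHRONICLES, THE\tmovie\n"
)

# With a tab delimiter, commas inside a field are ordinary characters.
reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
tsv_rows = list(reader)
for row in tsv_rows:
    print(row)
```

This only helps if the data can be re-exported from its source. An existing file with unescaped commas stays ambiguous regardless of how it is read.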
https://stackoverflow.com/questions/28870071