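I get a memory error every time I try to write the CSV: the first 5 GB of data are processed fine, but after that I get a MemoryError.
I don't understand why, because I clear each element from memory after processing it precisely so that this does not happen.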
import csv
import pandas as pd
from xml.etree.ElementTree import iterparse  # assumed; the question does not say which parser

def writeDataCSV(file):
    try:
        with open('Data/csv/'+file+'.csv', 'w', newline='') as fp:
            wr = csv.writer(fp, dialect='excel')  # one writer for the whole file
            for evt, elem in iterparse('dumpData/'+file+'.xml', events=('end',)):
                if elem.tag == 'row':
                    element_fields = elem.attrib
                    data = []
                    if file == "Comments":
                        data = commentsXML(element_fields)  # the asker's own helper (not shown)
                    wr.writerow(data)
                    elem.clear()
    except UnicodeEncodeError as uniError:
        print(uniError)
    try:
        if file == "Comments":
            # loads the entire CSV into memory just to add column names
            df = pd.read_csv('Data/csv/Comments.csv', names=["Id","PostId","Score","Text","Date","Time","UserID"])
            df.to_csv("Data/csv/Comments.csv")
    except UnicodeDecodeError as uniDeError:
        print(uniDeError)
MemoryError
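For context on the `elem.clear()` call in the question: with `xml.etree.ElementTree.iterparse`, `elem.clear()` empties a row, but the emptied element object stays attached to its parent, so the in-memory tree still grows by one node per row. The sketch below assumes the standard-library ElementTree parser and a flat `<comments><row .../>...</comments>` layout; `leftover_rows` is a hypothetical helper written only for this illustration. It counts how many rows are still hanging off the root, with and without also clearing the root after each row.

```python
import io
import xml.etree.ElementTree as ET


def leftover_rows(source, clear_root):
    """Stream <row> elements and report how many stay attached to the root.

    elem.clear() empties a row, but the empty element is still a child of
    the root, so the tree keeps one node per processed row. Clearing the
    root as well detaches the rows that were already processed.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            # ... process elem.attrib here ...
            elem.clear()
            if clear_root:
                root.clear()
    return len(root)


dump = b"<comments>" + b"".join(b'<row Id="%d"/>' % i for i in range(1000)) + b"</comments>"
print(leftover_rows(io.BytesIO(dump), clear_root=False))  # → 1000 (empty rows left behind)
print(leftover_rows(io.BytesIO(dump), clear_root=True))   # → 0
```

This pattern assumes every `<row>` is a direct child of the root; with deeper nesting, clearing the root could drop elements that have not been processed yet.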
Posted on 2018-05-30 06:43:49
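Your function has too many responsibilities: it is hard to read, hard to debug, and generally not an example to follow.
My best guess for avoiding the memory error is to split the reading and the writing parts of the code into separate functions, in this style: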
import csv
from xml.etree.ElementTree import iterparse

# FIXME: commentsXML is a global function defined elsewhere
def get_data(filename):
    for evt, elem in iterparse('dumpData/'+str(filename)+'.xml', events=('end',)):
        if elem.tag == 'row':
            yield commentsXML(elem.attrib)
            elem.clear()  # free the row once it has been consumed

def save_stream_to_csv_file(gen, target_csv_filename):
    with open('Data/csv/'+target_csv_filename+'.csv', 'w', newline='') as fp:
        wr = csv.writer(fp, dialect='excel')
        for data in gen:
            wr.writerow(data)

gen = get_data('your_source_filename')
save_stream_to_csv_file(gen, 'your_target_filename')

# WONTFIX: 'dumpData/'+str(filename)+'.xml' and
# 'Data/csv/'+target_csv_filename+'.csv' are a bit ugly;
# os.path.join() and str.format() are highly welcome
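Putting the answer's suggestions together, a minimal sketch might look like the following. It assumes the standard-library `xml.etree.ElementTree` parser, takes the row-to-fields conversion as a parameter (`to_fields`, a hypothetical stand-in for the asker's `commentsXML`, which is not shown), builds the paths with `os.path.join()` and `str.format()` as the WONTFIX comment suggests, and writes the header row up front. Writing the header directly means the finished CSV does not have to be loaded back with `pd.read_csv` just to name the columns, which for a multi-gigabyte file needs memory roughly proportional to the file size. The names `convert`, `get_rows`, `xml_dir` and `csv_dir` are illustrative.

```python
import csv
import os
import xml.etree.ElementTree as ET


def get_rows(xml_path, to_fields):
    """Yield one CSV record per <row> element of a flat XML dump.

    to_fields is a hypothetical stand-in for the asker's commentsXML helper.
    """
    context = ET.iterparse(xml_path, events=("start", "end"))
    _, root = next(context)  # first event: start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            yield to_fields(dict(elem.attrib))  # copy before clearing
            elem.clear()   # empty the row that was just written
            root.clear()   # and detach it from the root


def save_stream_to_csv_file(rows, csv_path, header):
    """Write the header once, then stream the records to csv_path."""
    # newline="" is what the csv module expects; utf-8 avoids relying on
    # the platform's default encoding.
    with open(csv_path, "w", newline="", encoding="utf-8") as fp:
        writer = csv.writer(fp, dialect="excel")
        writer.writerow(header)
        writer.writerows(rows)  # consumes the generator one row at a time


def convert(name, to_fields, header, xml_dir="dumpData", csv_dir="Data/csv"):
    """Convert <xml_dir>/<name>.xml into <csv_dir>/<name>.csv."""
    xml_path = os.path.join(xml_dir, "{}.xml".format(name))
    csv_path = os.path.join(csv_dir, "{}.csv".format(name))
    save_stream_to_csv_file(get_rows(xml_path, to_fields), csv_path, header)
    return csv_path
```

For the comments dump this would be called as `convert("Comments", commentsXML, ["Id", "PostId", "Score", "Text", "Date", "Time", "UserID"])`, assuming `commentsXML` returns one list of values per row in that column order.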
https://stackoverflow.com/questions/50593590