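Hey, I've run into a problem: my program stops iterating over a file at record 57802, and I can't figure out why. I added a heartbeat counter so I can see which line it is on, which helped, but now I'm stuck on why it stops there. I thought it was a memory issue, but I just ran it on my computer with 6GB of memory and it still stops.
Is there a better way to do what I'm doing below? My goal is to read the file (a 15MB text log; I can send it to you if you need it), find matches based on a regex, and print the matching lines. There will be more later, but this is as far as I've gotten. I'm using Python 2.6.
Any ideas and comments on the code would help! I'm a Python newbie and still learning.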

import sys, os, os.path, operator
import re, time, fileinput

infile = os.path.join("C:\\","Python26","Scripts","stdout.log")
start = time.clock()
filename = open(infile,"r")
match = re.compile(r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} +\w+ +\[([\w.]+)\] ((\w+).?)+:\d+ - (\w+)_SEARCH:(.+)')
count = 0
heartbeat = 0
for line in filename:
    heartbeat = heartbeat + 1
    print heartbeat
    lookup = match.search(line)
    if lookup:
        count = count + 1
        print line
end = time.clock()
elapsed = end-start
print "Finished processing at:",elapsed,"secs. Count of records =",count,"."
filename.close()

This is line 57802, where it fails:
2010-08-06 08:15:15,390 DEBUG [ah_admin] com.thg.struts2.SecurityInterceptor.intercept:46 - Action not SecurityAware; skipping privilege check.

This is a line that matches:
2010-08-06 09:27:29,545 INFO [patrick.phelan] com.thg.sam.actions.marketmaterial.MarketMaterialAction.result:223 - MARKET_MATERIAL_SEARCH:{"_appInfo":{"_appId":21,"_companyDivisionId":42,"_environment":"PRODUCTION"},"_description":"symlin","_createdBy":"","_fieldType":"GEO","_geoIds":["Illinois"],"_brandIds":[2883],"_archived":"ACTIVE","_expired":"UNEXPIRED","_customized":"CUSTOMIZED","_webVisible":"VISIBLE_ONLY"}

Sample data, first 5 lines only:
2010-08-06 00:00:00,035 DEBUG [] com.thg.sam.jobs.PlanFormularyLoadJob.executeInternal:67 - Entered into PlanFormularyLoadJob: executeInternal
2010-08-06 00:00:00,039 DEBUG [] com.thg.ftpComponent.service.JScapeFtpService.open:153 - Opening FTP connection to sdrive/hibbert@tccfp01.hibbertnet.com:21
2010-08-06 00:00:00,040 DEBUG [] com.thg.sam.email.EmailUtils.sendEmail:206 - org.apache.commons.mail.MultiPartEmail@446e79
2010-08-06 00:00:00,045 DEBUG [] com.thg.sam.services.OrderService.getOrdersWithStatus:121 - Orders list size=13
2010-08-06 00:00:00,045 DEBUG [] com.thg.ftpComponent.service.JScapeFtpService.open:153 - Opening FTP connection to sdrive/hibbert@tccfp01.hibbertnet.com:21

Posted on 2010-09-01 08:36:54
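You compile your regex, but never use the compiled object?
lookup = re.search(match,line)
should be
lookup = match.search(line)
and you should use os.path.join():
infile = os.path.join("C:\\","Python26","Scripts","stdout.log")

Update:
Your regex could be simpler. Just check for the date and time stamp. Otherwise, don't use a regex at all. Assuming your date and time start at the beginning of the line: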
for line in open("stdout.log"):
    s = line.split()
    if len(s) < 2:
        continue  # skip blank or truncated lines that have no date/time fields
    D, T = s[0], s[1]
    # use the time module and strptime to check valid date/time
    # or you can split "-" on D and T and do manual check using > or < and math

https://stackoverflow.com/questions/3614075
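
The split-and-strptime idea from the answer can be sketched a little further. This is only a minimal sketch, not the original poster's or the answerer's code: the function name `parse_search_line` is hypothetical, and it assumes every log line starts with `YYYY-MM-DD HH:MM:SS,mmm` and that the literal marker `_SEARCH:` (taken from the matching sample line) identifies the records of interest. It is also worth noting, as a likely but unconfirmed explanation of the stall, that the nested quantifier `((\w+).?)+` in the original pattern can be matched in a very large number of ways, so a line that gets past `[ah_admin]` but has no `_SEARCH:` may send the regex engine into heavy backtracking. Plain string operations avoid that entirely.

```python
import time

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # e.g. "2010-08-06 09:27:29"
MARKER = "_SEARCH:"                     # literal marker seen in the matching sample line


def parse_search_line(line):
    """Return (timestamp, search_type, payload) for a *_SEARCH line, else None.

    Hypothetical helper for illustration only.
    """
    parts = line.split(None, 2)  # date, time with milliseconds, rest of the line
    if len(parts) < 3:
        return None
    # Drop the ",545" milliseconds part before validating the timestamp.
    stamp = parts[0] + " " + parts[1].split(",")[0]
    try:
        time.strptime(stamp, TIMESTAMP_FORMAT)
    except ValueError:
        return None  # the line does not start with a valid timestamp
    head, sep, payload = parts[2].partition(MARKER)
    if not sep:
        return None  # no "_SEARCH:" marker on this line
    # The search type is the last word before the marker, e.g. "MARKET_MATERIAL".
    words = head.split()
    search_type = words[-1] if words else ""
    return stamp, search_type, payload.rstrip("\r\n")


def count_search_lines(path):
    """Count the lines in the file at `path` that parse as *_SEARCH records."""
    count = 0
    with open(path) as handle:
        for line in handle:
            if parse_search_line(line) is not None:
                count += 1
    return count
```

Calling `count_search_lines` on the log path would replace the loop in the question. Each line is scanned a fixed number of times, so lines that do not match are rejected quickly instead of being retried in many different ways.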