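I have a script that extracts all email addresses from a .txt file and prints them. I want to save them to list.txt and, if possible, remove duplicates, but it gives this error: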
Traceback (most recent call last):
  File "mail.py", line 44, in <module>
    notepad.write(email.read())
AttributeError: 'str' object has no attribute 'read'
The script:
from optparse import OptionParser
import os.path
import re

regex = re.compile(("([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
                    "{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
                    "\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"))

def file_to_str(filename):
    """Returns the contents of filename as a string."""
    with open(filename) as f:
        return f.read().lower() # Case is lowered to prevent regex mismatches.

def get_emails(s):
    """Returns an iterator of matched emails found in string s."""
    # Removing lines that start with '//' because the regular expression
    # mistakenly matches patterns like 'http://foo@bar.com' as '//foo@bar.com'.
    return (email[0] for email in re.findall(regex, s) if not email[0].startswith('//'))

if __name__ == '__main__':
    parser = OptionParser(usage="Usage: python %prog [FILE]...")
    # No options added yet. Add them here if you ever need them.
    options, args = parser.parse_args()
    if not args:
        parser.print_usage()
        exit(1)
    for arg in args:
        if os.path.isfile(arg):
            for email in get_emails(file_to_str(arg)):
                #print email
                notepad = open("list.txt","wb")
                notepad.write(email.read())
                notepad.close()
        else:
            print '"{}" is not a file.'.format(arg)
            parser.print_usage()
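Posted on 2018-02-28 13:36:31
When I remove .read(), list.txt contains only 1 email address, whereas print email shows several hundred. If I refresh list.txt while the extraction is running, the address changes, but there is still only 1.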
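That is because open() and close() are inside the loop: the file is opened in write mode, and therefore truncated, for every email, so only the last address survives. Change the loop to: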
notepad = open("list.txt", "wb")
for email in get_emails(file_to_str(arg)):
    #print email
    notepad.write(email + "\n")  # newline so each address gets its own line
notepad.close()
Or, better still:
with open("list.txt", "wb") as notepad:
    for email in get_emails(file_to_str(arg)):
        #print email
        notepad.write(email + "\n")  # newline so each address gets its own line
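The question also asked about removing duplicates. One way to do that is to track the addresses already written in a set. This is only a rough sketch: the helper names unique_in_order and write_unique_emails are made up for illustration and are not part of the original script, and it opens the file in text mode ("w") so that it runs on both Python 2 and Python 3.

```python
def unique_in_order(emails):
    """Yield each address once, keeping first-seen order (hypothetical helper)."""
    seen = set()
    for email in emails:
        if email not in seen:
            seen.add(email)
            yield email


def write_unique_emails(emails, path="list.txt"):
    """Write each distinct address to path, one per line; return how many were written."""
    count = 0
    # Open the file once, outside the loop, so earlier addresses are not overwritten.
    with open(path, "w") as notepad:
        for email in unique_in_order(emails):
            notepad.write(email + "\n")
            count += 1
    return count
```

In the script above, the inner loop could then presumably be replaced by write_unique_emails(get_emails(file_to_str(arg))). Note that list.txt would still be overwritten for each input file if several are passed on the command line.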
https://stackoverflow.com/questions/40929744