I'm new to Python, so I need some help. I'm building a small web scraper for my own personal use, and everything was going well until I tried to write the scraped data out to a file of its own for each page. Given a list of 80 URLs, the loop stops creating new files at some point but keeps collecting data. I've tested the loop by dumping all the data into a single file, and that works fine, but I really need separate files. The loop creates 38 separate files instead of the 80 I need. Can anyone help me figure out why? My code is below:
import hashlib
import urllib.request

from bs4 import BeautifulSoup

i = 0
while i < len(urls_to_scrape):
    with urllib.request.urlopen(urls_to_scrape[i]) as response:
        html = response.read()
    smashsoup = BeautifulSoup(html, 'html.parser')

    title = smashsoup.find('h1').get_text()
    author = smashsoup.find('a', {'itemprop': 'author'}).get_text()
    complete_title = title + ' By ' + author

    # Filenames are derived from an MD5 hash of the title/author string
    filename = hashlib.md5(complete_title.encode('utf-8')).hexdigest() + ".txt"
    imgname = hashlib.md5(complete_title.encode('utf-8')).hexdigest() + ".jpg"

    short_desc = smashsoup.find('div', {'itemprop': 'description'}).get_text()
    try:
        long_desc = smashsoup.find('div', {'id': 'longDescription'}).get_text()
    except AttributeError:
        long_desc = ""

    cats = smashsoup.find('div', {'itemprop': 'genre'})
    category = ""
    for cat in cats.find_all('a'):
        category += cat.get_text() + " - "

    img = smashsoup.find('img', {'itemprop': 'image'})
    source = img.get('src')
    nsource = source.replace('-thumb', '')

    # Compile everything into a single text document
    fo = open(filename, 'a')
    fo.write(str(complete_title.encode('ascii', 'ignore')) + "\n\n")
    fo.write(str(short_desc.encode('ascii', 'ignore')) + "\n\n")
    fo.write(str(long_desc.encode('ascii', 'ignore')) + "\n\n")
    fo.write(category + "\n\n")
    fo.flush()
    fo.close()
    i += 1
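One thing worth checking: MD5 is deterministic, so any two pages that yield the same `complete_title` string hash to the same `filename`, and `open(filename, 'a')` silently appends to the existing file instead of creating a new one. A minimal sketch of that effect (the title strings here are made up for illustration):

```python
import hashlib

# Hypothetical scraped results: two pages produce the same title/author string
titles = [
    "Some Book By Jane Doe",
    "Some Book By Jane Doe",
    "Another Book By John Roe",
]

# Same filename scheme as the scraper above
filenames = {hashlib.md5(t.encode('utf-8')).hexdigest() + ".txt" for t in titles}

# Three pages scraped, but only two distinct filenames: the duplicate
# title appends into an existing file rather than creating a new one.
print(len(titles), len(filenames))  # → 3 2
```

Printing the 80 computed filenames (or collecting them in a set and comparing its length to `len(urls_to_scrape)`) would confirm whether duplicate title/author pairs explain the 38-file count.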
https://stackoverflow.com/questions/50669484