I'm new to Python, so I need some help. I'm building a small web scraper for my own personal use, and everything was going well until I tried to write the scraped data out to a file of its own for each page. Given a list of 80 URLs, the loop stops creating new files at some point but keeps collecting data. I've tested the loop by dumping all the data into a single file, and that works fine, but I really need separate files. The loop creates 38 separate files instead of the 80 I need. Can anyone help me figure out why? My code is below:
import hashlib
import urllib.request

from bs4 import BeautifulSoup

i = 0
while i < len(urls_to_scrape):
    with urllib.request.urlopen(urls_to_scrape[i]) as response:
        html = response.read()
    smashsoup = BeautifulSoup(html, 'html.parser')

    title = smashsoup.find('h1').get_text()
    author = smashsoup.find('a', {'itemprop': 'author'}).get_text()
    complete_title = title + ' By ' + author

    # Filenames are derived from an MD5 hash of the title/author string
    filename = hashlib.md5(complete_title.encode('utf-8')).hexdigest() + ".txt"
    imgname = hashlib.md5(complete_title.encode('utf-8')).hexdigest() + ".jpg"

    short_desc = smashsoup.find('div', {'itemprop': 'description'}).get_text()
    try:
        long_desc = smashsoup.find('div', {'id': 'longDescription'}).get_text()
    except AttributeError:
        long_desc = ""

    cats = smashsoup.find('div', {'itemprop': 'genre'})
    category = ""
    for cat in cats.find_all('a'):
        category += cat.get_text() + " - "

    img = smashsoup.find('img', {'itemprop': 'image'})
    source = img.get('src')
    nsource = source.replace('-thumb', '')

    # Compile everything into a single text document
    fo = open(filename, 'a')
    fo.write(str(complete_title.encode('ascii', 'ignore')) + "\n\n")
    fo.write(str(short_desc.encode('ascii', 'ignore')) + "\n\n")
    fo.write(str(long_desc.encode('ascii', 'ignore')) + "\n\n")
    fo.write(category + "\n\n")
    fo.flush()
    fo.close()
    i += 1
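One thing worth checking: MD5 is deterministic, so any two pages that yield the same `complete_title` string hash to the same `filename`, and `open(filename, 'a')` silently appends to the existing file instead of creating a new one. A minimal sketch of that effect (the title strings here are made up for illustration):

```python
import hashlib

# Hypothetical scraped results: two pages produce the same title/author string
titles = [
    "Some Book By Jane Doe",
    "Some Book By Jane Doe",
    "Another Book By John Roe",
]

# Same filename scheme as the scraper above
filenames = {hashlib.md5(t.encode('utf-8')).hexdigest() + ".txt" for t in titles}

# Three pages scraped, but only two distinct filenames: the duplicate
# title appends into an existing file rather than creating a new one.
print(len(titles), len(filenames))  # → 3 2
```

Printing the 80 computed filenames (or collecting them in a set and comparing its length to `len(urls_to_scrape)`) would confirm whether duplicate title/author pairs explain the 38-file count.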
https://stackoverflow.com/questions/50669484