I'm trying to scrape data from howlongtobeat.com.
Everything works so far, but I'm running into a problem with the URL.
Here is my code so far:
import csv, re
from bs4 import BeautifulSoup as soup
import requests

flag = False
with open('filename.csv', 'w', newline='') as f:
    write = csv.writer(f)
    for i in range(1, 100):
        s = soup(requests.get(f'https://howlongtobeat.com/game.php?id={i}').text, 'html.parser')
        if not flag:  # write header to file once
            write.writerow(['Name', 'Length'] + [re.sub('[:\n]+', '', d.find('strong').text) for d in s.find_all('div', {'class': 'profile_info'})])
            flag = True
        content = s.find('div', {'class': 'profile_header shadow_text'})
        if content:
            name = content.text
            length = [[li.find('h5').text, li.find('div').text] for li in s.find_all('li', {'class': 'time_100'})]
            stats = [re.sub(r'\n+[\w\s]+:\n+', '', d.text) for d in s.find_all('div', {'class': 'profile_info'})]
My CSV isn't getting filled.
How can I make this work?
Posted on 2018-12-17 10:41:25
Some pages may not contain the expected tags, which is why s.find('div', {'class': 'profile_header shadow_text'}) returns None. Check id=3, for example.
You should check whether find() actually returned something before extracting its text:
content = s.find('div', {'class': 'profile_header shadow_text'})
if content:
    name = content.text
    length = [[li.find('h5').text, li.find('div').text] for li in s.find_all('li', {'class': 'time_100'})]
    stats = [re.sub(r'\n+[\w\s]+:\n+', '', d.text) for d in s.find_all('div', {'class': 'profile_info'})]
Another workaround is to use try/except to skip the pages where something goes wrong:
try:
    name = s.find('div', {'class': 'profile_header shadow_text'}).text
    length = [[li.find('h5').text, li.find('div').text] for li in s.find_all('li', {'class': 'time_100'})]
    stats = [re.sub(r'\n+[\w\s]+:\n+', '', d.text) for d in s.find_all('div', {'class': 'profile_info'})]
except AttributeError:  # find() returned None on this page
    continue
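Either way, the point is that find() returning None must be handled before .text is accessed. A minimal, self-contained sketch of that check, testable without hitting the network (the extract_profile helper and the inline HTML strings are hypothetical, written only to mirror the selectors used above, not real howlongtobeat.com markup):

```python
import re
from bs4 import BeautifulSoup

def extract_profile(html):
    """Parse one game page; return (name, length, stats), or None when
    the expected profile header div is missing (as on some ids)."""
    s = BeautifulSoup(html, 'html.parser')
    content = s.find('div', {'class': 'profile_header shadow_text'})
    if content is None:  # page lacks the expected tag -> caller can skip it
        return None
    name = content.text.strip()
    length = [[li.find('h5').text, li.find('div').text]
              for li in s.find_all('li', {'class': 'time_100'})]
    stats = [re.sub(r'\n+[\w\s]+:\n+', '', d.text)
             for d in s.find_all('div', {'class': 'profile_info'})]
    return name, length, stats

# Hypothetical pages: one with the header div, one without.
good = '<div class="profile_header shadow_text">Some Game</div>'
bad = '<div class="other"></div>'
print(extract_profile(good))  # ('Some Game', [], [])
print(extract_profile(bad))   # None
```

In the loop, a None result plays the same role as the if content / except AttributeError guards above: skip the page and move on instead of crashing on .text.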