
Web scraping multiple web addresses

Stack Overflow user
Asked 2019-08-14 02:16:38
Answers: 1 | Views: 29 | Followers: 0 | Votes: 1

I am trying to loop over several web pages in a single script. However, it only pulls back data from the last URL in my list.

Here is my current code:

from bs4 import BeautifulSoup # BeautifulSoup is in bs4 package 
import requests

URLS = ['https://sc2replaystats.com/replay/playerStats/11116819/1809336', 'https://sc2replaystats.com/replay/playerStats/11116819/1809336']

for URL in URLS:
  response = requests.get(URL)
soup = BeautifulSoup(response.content, 'html.parser')

tb = soup.find('table', class_='table table-striped table-condensed')
for link in tb.find_all('tr'):
    name = link.find('span')
    if name is not None:
        print(name['title'])

The result is:

Commandcenter
Supplydepot
Barracks
Refinery
Orbitalcommand
Commandcenter
Barracksreactor
Supplydepot
Factory
Refinery
Factorytechlab
Orbitalcommand
Starport
Bunker
Supplydepot
Supplydepot
Starporttechlab
Supplydepot
Barracks
Refinery
Supplydepot
Barracks
Engineeringbay
Refinery
Starportreactor
Factorytechlab
Supplydepot
Barracks
Supplydepot
Supplydepot
Supplydepot
Supplydepot
Supplydepot
Commandcenter
Barrackstechlab
Barracks
Barracks
Engineeringbay
Supplydepot
Barracksreactor
Barracksreactor
Supplydepot
Armory
Supplydepot
Supplydepot
Supplydepot
Orbitalcommand
Factory
Refinery
Refinery
Supplydepot
Factoryreactor
Supplydepot
Commandcenter
Barracks
Barrackstechlab
Planetaryfortress
Supplydepot
Supplydepot

But I was expecting:

Nexus
Pylon
Gateway
Assimilator
Cyberneticscore
Pylon
Assimilator
Nexus
Roboticsfacility
Pylon
Shieldbattery
Gateway
Gateway
Commandcenter
Supplydepot
Barracks
Refinery
Orbitalcommand
Commandcenter
Barracksreactor
Supplydepot
Factory
Refinery
Factorytechlab
Orbitalcommand
Starport
Bunker
Supplydepot
Supplydepot
Starporttechlab
Supplydepot
Barracks
Refinery
Supplydepot
Barracks
Engineeringbay
Refinery
Starportreactor
Factorytechlab
Supplydepot
Barracks
Supplydepot
Supplydepot
Supplydepot
Supplydepot
Supplydepot
Commandcenter
Barrackstechlab
Barracks
Barracks
Engineeringbay
Supplydepot
Barracksreactor
Barracksreactor
Supplydepot
Armory
Supplydepot
Supplydepot
Supplydepot
Orbitalcommand
Factory
Refinery
Refinery
Supplydepot
Factoryreactor
Supplydepot
Commandcenter
Barracks
Barrackstechlab
Planetaryfortress
Supplydepot
Supplydepot

1 Answer

Stack Overflow user

Answered 2019-08-14 02:23:29

As @RomanPerekhrest pointed out, only these lines are inside your for loop:

for URL in URLS:
  response = requests.get(URL) 

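This means you overwrite response on every iteration, so only the last response is left by the time the parsing code runs. One way to fix this is to create a list named responses and append each response to it, like so: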

responses = []
for URL in URLS:
    # Fetch every page and keep each response instead of overwriting it
    response = requests.get(URL)
    responses.append(response)

for response in responses:
    # Parse each stored response in turn
    soup = BeautifulSoup(response.content, 'html.parser')

    tb = soup.find('table', class_='table table-striped table-condensed')
    for link in tb.find_all('tr'):
        name = link.find('span')
        if name is not None:
            print(name['title'])
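
To see why only the last page was processed, here is a minimal sketch of the same loop shape with the network and parsing steps swapped out. The names fetch and handle and the page strings are hypothetical stand-ins (assumptions for illustration, not part of the original code): fetch plays the role of requests.get, and handle plays the role of the BeautifulSoup steps. It also shows an alternative fix, which is to indent the handling into the loop body so that no intermediate list is needed.

```python
# Hypothetical stand-ins: these names and strings are illustrative only.
URLS = ["page-a", "page-b", "page-c"]


def fetch(url):
    """Pretend to download a page; stands in for requests.get(url)."""
    return "content of " + url


def handle(page):
    """Pretend to parse a page; stands in for the BeautifulSoup steps."""
    return page.upper()


# Buggy shape: only the fetch is inside the loop, so `response` is rebound
# on every iteration and only the last value survives the loop.
for url in URLS:
    response = fetch(url)
buggy_results = [handle(response)]

# Fixed shape: the handling is indented into the loop body, so it runs
# once per URL rather than once after the loop has finished.
fixed_results = []
for url in URLS:
    response = fetch(url)
    fixed_results.append(handle(response))

print(buggy_results)  # ['CONTENT OF PAGE-C']
print(fixed_results)  # ['CONTENT OF PAGE-A', 'CONTENT OF PAGE-B', 'CONTENT OF PAGE-C']
```

Applied to the code in the question, this would mean indenting everything from the soup = ... line down to the print call so that it sits inside the for URL in URLS loop.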
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/57483339
