I am trying to build a for loop that, once it reaches the last search_result attribute on the scraped page, repeats the loop using the data from a newly scraped web page.
After the for loop reaches the last attribute, it should find a link on the page and repeat the loop for the newly scraped page.
I have written the code below, but the loop does not repeat for the new page obtained from the link on the original page.
import requests
from bs4 import BeautifulSoup

# URL and headers are defined earlier in the script.
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.text, 'lxml')

for search_result in soup.find_all(attrs={"search-result-index": True}):
    print(search_result.name.text)
    # if last search result, get link to new web page and repeat loop for the new web page.
    if search_result == soup.find_all(attrs={"search-result-index": True})[-1]:
        page = requests.get(soup.select_one('li.a-last [href]')['href'], headers=headers)
        soup = BeautifulSoup(page.text, 'lxml')

Do you have any ideas on how to do this?
Answered on 2019-08-16 20:13:50
Something like this?
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'}

def func_go(URL):
    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.text, 'lxml')
    for search_result in soup.find_all(attrs={"search-result-index": True}):
        print(search_result.name.text)
        # On the last search result, grab the "next page" link and recurse into it.
        if search_result == soup.find_all(attrs={"search-result-index": True})[-1]:
            URL = soup.select_one('li.a-last [href]')['href']
            func_go(URL)

func_go('https://www.example.com')
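A non-recursive variant may be easier to follow and avoids growing the call stack on long result lists. The sketch below makes the same assumptions as the answer above (the search-result-index attribute and the li.a-last next-page selector come from the question, and the start URL is a placeholder); it simply prints each result's text and uses urljoin in case the next-page href is relative:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'}

def scrape_all_pages(start_url):
    url = start_url
    while url:
        page = requests.get(url, headers=headers)
        soup = BeautifulSoup(page.text, 'lxml')
        # Handle every search result on the current page.
        for search_result in soup.find_all(attrs={"search-result-index": True}):
            print(search_result.get_text(strip=True))
        # Follow the "next page" link if there is one, otherwise stop.
        next_link = soup.select_one('li.a-last [href]')
        url = urljoin(url, next_link['href']) if next_link else None

scrape_all_pages('https://www.example.com')  # placeholder start URL

Either version works the same way conceptually: scrape a page, process its results, then move on to the page behind the next-page link until no such link is found.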