How do I insert a backoff script into my web scraping?

Content sourced from Stack Overflow; translated and used under the CC BY-SA 3.0 license.


I want to use the backoff package in my web scraper, but I can't get it to work. Where do I insert it? How do I get the "r = requests..." line to be recognized?

I have tried placing the statements in my code in various ways, but none of them work.

Code to insert:

@backoff.on_exception(backoff.expo,
                      requests.exceptions.RequestException,
                      max_time=60)
def get_url(what goes here?):
    return requests.get(what goes here?)

Existing code:

import os
import requests
import re
import backoff
from bs4 import BeautifulSoup  # needed for the BeautifulSoup call below

asin_list = ['B079QHML21']
urls = []
print('Scrape Started')
for asin in asin_list:
  product_url = f'https://www.amazon.com/dp/{asin}'
  urls.append(product_url)
  base_search_url = 'https://www.amazon.com'
  scraper_url = 'http://api.scraperapi.com'

  while len(urls) > 0:
    url = urls.pop(0)
    payload = {key: url}  # service-specific parameters (details redacted)
    r = requests.get(scraper_url, params=payload)
    print("we got a {} response code from {}".format(r.status_code, url))
    soup = BeautifulSoup(r.text, 'lxml')

    #Scraping Below#

I expect the backoff code built into the script to retry on 500 errors without failing out.

Answer:

Instead of calling this directly:

requests.get(scraper_url, params=payload)

change get_url to do the request, and call get_url:

@backoff.on_exception(backoff.expo,
                      requests.exceptions.RequestException,
                      max_time=60)
def get_url(scraper_url, payload):
    return requests.get(scraper_url, params=payload)
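
Here backoff.expo makes the delay between attempts grow exponentially, the second argument names the exception type that triggers a retry, and max_time=60 makes backoff give up once 60 seconds have elapsed in total.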

Then in your code, instead of:

r = requests.get(scraper_url, params=payload)

do:

r = get_url(scraper_url, payload)
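
For reference, here is a minimal end-to-end sketch of the two pieces wired together (the loop is flattened slightly, and the api_key name and payload keys are assumptions; check your scraping service's documentation for the real parameters). One caveat: requests does not raise an exception for an HTTP 500 response by itself, so RequestException alone only covers network-level failures. Calling raise_for_status() inside get_url raises HTTPError, a subclass of RequestException, which lets backoff retry 500s as the question intends:

import backoff
import requests
from bs4 import BeautifulSoup

@backoff.on_exception(backoff.expo,
                      requests.exceptions.RequestException,
                      max_time=60)
def get_url(scraper_url, payload):
    r = requests.get(scraper_url, params=payload)
    r.raise_for_status()  # raise HTTPError on 4xx/5xx so backoff also retries 500 responses
    return r

asin_list = ['B079QHML21']
urls = [f'https://www.amazon.com/dp/{asin}' for asin in asin_list]
scraper_url = 'http://api.scraperapi.com'
api_key = 'YOUR_API_KEY'  # hypothetical placeholder

while urls:
    url = urls.pop(0)
    payload = {'api_key': api_key, 'url': url}  # assumed parameter names; adjust to your service
    r = get_url(scraper_url, payload)
    print("we got a {} response code from {}".format(r.status_code, url))
    soup = BeautifulSoup(r.text, 'lxml')
    # scraping continues here

Note that raise_for_status() also turns 4xx responses into retries, which is usually wasted effort; backoff's giveup parameter can filter those out if needed.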
