I am trying to build a for loop that, once it reaches the last search_result attribute on the scraped page, repeats the loop using the data from a newly scraped web page.
After the for loop reaches the last attribute, it should find a link on the page and repeat the loop for the newly scraped page.
I have written the code below, but the loop does not repeat for the new page obtained from the link on the original page.
import requests
from bs4 import BeautifulSoup

# URL and headers are defined earlier in the script.
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.text, 'lxml')

for search_result in soup.find_all(attrs={"search-result-index": True}):
    print(search_result.name.text)
    # if last search result, get link to new web page and repeat loop for the new web page.
    if search_result == soup.find_all(attrs={"search-result-index": True})[-1]:
        page = requests.get(soup.select_one('li.a-last [href]')['href'], headers=headers)
        soup = BeautifulSoup(page.text, 'lxml')

Do you have any ideas on how to do this?
Answered on 2019-08-16 20:13:50
Something like this?
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'}

def func_go(URL):
    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.text, 'lxml')
    for search_result in soup.find_all(attrs={"search-result-index": True}):
        print(search_result.name.text)
        # On the last search result, grab the "next page" link and recurse into it.
        if search_result == soup.find_all(attrs={"search-result-index": True})[-1]:
            URL = soup.select_one('li.a-last [href]')['href']
            func_go(URL)

func_go('https://www.example.com')
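A non-recursive variant may be easier to follow and avoids growing the call stack on long result lists. The sketch below makes the same assumptions as the answer above (the search-result-index attribute and the li.a-last next-page selector come from the question, and the start URL is a placeholder); it simply prints each result's text and uses urljoin in case the next-page href is relative:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'}

def scrape_all_pages(start_url):
    url = start_url
    while url:
        page = requests.get(url, headers=headers)
        soup = BeautifulSoup(page.text, 'lxml')
        # Handle every search result on the current page.
        for search_result in soup.find_all(attrs={"search-result-index": True}):
            print(search_result.get_text(strip=True))
        # Follow the "next page" link if there is one, otherwise stop.
        next_link = soup.select_one('li.a-last [href]')
        url = urljoin(url, next_link['href']) if next_link else None

scrape_all_pages('https://www.example.com')  # placeholder start URL

Either version works the same way conceptually: scrape a page, process its results, then move on to the page behind the next-page link until no such link is found.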