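How can I use bs4 to scrape products that are hidden behind a "see more" button or that only load when you scroll down? In my case I'm trying to scrape all the search results from the link below, but I only get 20 books even though there are more than 20. How do I get all the search results here, and how would I do the same on other sites that behave this way?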
from bs4 import BeautifulSoup
# from Book import Book  # the asker's own module; not needed for this snippet
import requests


class BertrandScrapper:
    def get_prices(self, title, author):
        # Search URL: title and author joined, spaces replaced by "+"
        query = (str(title) + " " + str(author)).replace(" ", "+")
        page = requests.get("https://www.bertrand.pt/pesquisa/" + query)
        soup = BeautifulSoup(page.text, "html.parser")
        # Only the results present in the initial HTML are found here
        titles = soup.find_all(class_="title")
        for a in titles:
            print(a.text.strip())
        print(len(titles))

Link: https://www.bertrand.pt/pesquisa/os+maias+e%C3%A7a+de+queiroz
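The question's code builds the search URL by replacing spaces with "+" by hand. As a minimal sketch, the standard library's `urllib.parse.quote_plus` produces the same URL as the link above (including the UTF-8 percent-encoding of "ç") and also escapes characters such as "&" that would otherwise break the path. The helper name `build_search_url` is hypothetical. Note that this only changes how the URL is built; the page it returns still contains just the first batch of results.

```python
from urllib.parse import quote_plus

# Base URL taken from the question's code
SEARCH_BASE = "https://www.bertrand.pt/pesquisa/"


def build_search_url(title: str, author: str) -> str:
    # Hypothetical helper: quote_plus turns spaces into "+" and
    # percent-encodes everything else that is not URL-safe,
    # e.g. "ç" -> "%C3%A7", "&" -> "%26".
    return SEARCH_BASE + quote_plus(f"{title} {author}")


if __name__ == "__main__":
    print(build_search_url("os maias", "eça de queiroz"))
```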
Posted on 2021-03-21 06:27:41
The page loads its results through a POST request to https://www.bertrand.pt/pesquisando. You can retrieve all the titles like this:
import requests
from bs4 import BeautifulSoup


def get_results(page_nr):
    # Form fields of the site's own "load more" request
    data = {
        'requestArea': '',
        'pagina': str(page_nr),  # page number to fetch
        # Plain text: requests form-encodes the data dict itself
        'palavra': 'os maias eça de queiroz',
        'filterKey': '',
        'filterValue': '',
        'filterName': '',
        'filterMap': '',
        'filterOperation': '',
        'filterField': '',
        'filterOrder': '',
        'tab': 'livros'
    }
    response = requests.post('https://www.bertrand.pt/pesquisando', data=data)
    soup = BeautifulSoup(response.content, 'html.parser')
    titles = soup.find_all(class_='title')
    return [a.text.strip() for a in titles]


page_nr = 1
titles = []
while True:
    print("checking page nr", page_nr)
    title_results = get_results(page_nr)
    if not title_results:  # an empty page means there are no more results
        print("No more results")
        break
    titles.extend(title_results)
    page_nr += 1

Result (titles):
['Os Maias', 'Reler Eça de Queiroz', 'Os Maias', 'Os Maias', 'Maias\n\n\n(eBook)', 'Maias', 'Os Maias', 'MAIAS (OS) QUEIROZ, ECA DE', 'Os Maias', 'Os Maias', 'Os Maias\n\n\n(eBook)', 'Os Maias de Eça de Queiróz', 'Os Maias', 'The Maias', 'Os Maias\n\n\n(eBook)', 'Os Maias', 'Os Maias', 'Os Maias', 'Os Maias - Antologia Ilustrada', 'The Maias, The', 'Os Maias - Volume Ii', 'Os Maias - Volume I', 'Os Maias - Vol. 1 e 2', 'Os Maias - O Realismo']

Posted on 2021-03-21 06:25:37
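The page loads the additional results with Ajax requests, so BeautifulSoup, which only sees the HTML of the initial response, never sees them. You can simulate those Ajax requests, for example: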
import requests
from bs4 import BeautifulSoup

params = {
    "requestArea": "",
    "pagina": 2,  # overwritten in the loop below
    "palavra": "os maias eça de queiroz",
    "filterKey": "",
    "filterValue": "",
    "filterName": "",
    "filterMap": "",
    "filterOperation": "",
    "filterField": "",
    "filterOrder": "",
    "tab": "livros",
}
# Mark the request as an Ajax (XMLHttpRequest) call
headers = {"X-Requested-With": "XMLHttpRequest"}
url = "https://www.bertrand.pt/pesquisando"

i = 1
for page in range(1, 3):  # pages 1 and 2 cover the 24 results of this query
    params["pagina"] = page
    soup = BeautifulSoup(
        requests.post(url, headers=headers, data=params).content, "html.parser"
    )
    for t in soup.select(".title"):
        print(i, t.get_text(strip=True))
        i += 1

Prints:
1 Os Maias
2 Reler Eça de Queiroz
3 Os Maias
4 Os Maias
5 Maias(eBook)
6 Maias
7 Os Maias
8 MAIAS (OS) QUEIROZ, ECA DE
9 Os Maias
10 Os Maias
11 Os Maias(eBook)
12 Os Maias de Eça de Queiróz
13 Os Maias
14 The Maias
15 Os Maias(eBook)
16 Os Maias
17 Os Maias
18 Os Maias
19 Os Maias - Antologia Ilustrada
20 The Maias, The
21 Os Maias - Volume Ii
22 Os Maias - Volume I
23 Os Maias - Vol. 1 e 2
24 Os Maias - O Realismo

https://stackoverflow.com/questions/66726650
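Both answers use the same recipe, which also covers the "other sites" part of the question: find the request the page sends when you click "see more" or scroll (usually visible in the browser developer tools' Network tab, filtered to XHR/Fetch), then replay it page by page. The first answer loops with `while True` until a page comes back empty; the second hard-codes `range(1, 3)`. Below is a minimal sketch of a reusable loop, assuming the caller supplies a hypothetical `fetch_page(n)` that returns the raw titles of page n. It stops on an empty page or on a page identical to the previous one (an assumption about how some sites answer out-of-range page numbers), caps the number of requests, and collapses whitespace such as the newlines in `'Maias\n\n\n(eBook)'`. It needs neither `requests` nor bs4, so it can be tried offline.

```python
from typing import Callable, Iterator, List, Optional


def normalize(title: str) -> str:
    # Collapse whitespace runs: "Maias\n\n\n(eBook)" -> "Maias (eBook)"
    return " ".join(title.split())


def iter_all_results(
    fetch_page: Callable[[int], List[str]],
    start: int = 1,
    max_pages: int = 100,
) -> Iterator[str]:
    """Yield normalized results page by page until there are no more.

    fetch_page(n) is supplied by the caller (hypothetical here) and must
    return the raw result titles of page n, or an empty list past the end.
    """
    previous: Optional[List[str]] = None
    for page_nr in range(start, start + max_pages):  # hard cap on requests
        results = fetch_page(page_nr)
        # Stop on an empty page, or if the site repeats the previous page
        # (assumed behaviour of some sites for out-of-range page numbers).
        if not results or results == previous:
            return
        previous = results
        for title in results:
            yield normalize(title)


if __name__ == "__main__":
    # Offline stand-in for a real fetch function, using titles from above
    fake_pages = {1: ["Os Maias", "Maias\n\n\n(eBook)"], 2: ["The Maias"]}
    print(list(iter_all_results(lambda n: fake_pages.get(n, []))))
    # ['Os Maias', 'Maias (eBook)', 'The Maias']
```

With a network-backed function such as the first answer's `get_results`, `list(iter_all_results(get_results))` would collect every page; for another site only that fetch function changes.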