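I'm trying to scrape the table from the web page below:
https://postcodebijadres.nl/3800
It shows the first 25 results; to see the rest, you have to click the "next" button.
I have a Python script that uses requests and Beautiful Soup to scrape the table, but it only picks up the first 25 results directly from the HTML. I'm completely new to this, and after some googling I still can't figure out how to retrieve all the data from all the pages.
The problem is that the URL doesn't change when a new results page is selected.
Can anyone point me in the right direction?
Kind regards,
Ewoud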
Posted on 2021-05-25 02:30:50
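You don't need to handle pagination on this page. All of the data is already present in the <table> of the HTML that the server returns: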
import requests
from bs4 import BeautifulSoup

url = "https://postcodebijadres.nl/3800"
# Send a browser-like User-Agent header with the request.
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
}

soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

# Skip the first (header) row, then number the data rows starting at 1.
for idx, tr in enumerate(soup.select("#postcodes-table tr")[1:], 1):
    tds = [td.get_text(strip=True) for td in tr.select("td")]
    print(idx, *tds)
Prints:
1 3800 AA postbus 0 - 10000
2 3800 AB postbus 0 - 10000
3 3800 AC postbus 0 - 10000
4 3800 AD postbus 0 - 10000
5 3800 AE postbus 0 - 10000
6 3800 AG postbus 0 - 10000
7 3800 AH postbus 0 - 10000
8 3800 AJ postbus 0 - 10000
9 3800 AK postbus 0 - 10000
10 3800 AL postbus 0 - 10000
11 3800 AM postbus 0 - 10000
12 3800 AN postbus 0 - 10000
13 3800 AP postbus 0 - 10000
14 3800 AR postbus 0 - 10000
15 3800 AS postbus 0 - 10000
16 3800 AT postbus 0 - 10000
17 3800 AV postbus 0 - 10000
18 3800 AW postbus 0 - 10000
19 3800 AX postbus 0 - 10000
20 3800 AZ postbus 0 - 10000
21 3800 BA postbus 0 - 10000
22 3800 BB postbus 0 - 10000
23 3800 BC postbus 0 - 10000
24 3800 BD postbus 0 - 10000
25 3800 BE postbus 0 - 10000
26 3800 BG postbus 0 - 10000
27 3800 BH postbus 0 - 10000
28 3800 BJ postbus 0 - 10000
29 3800 BK postbus 0 - 10000
30 3800 BL postbus 0 - 10000
31 3800 BM postbus 0 - 10000
32 3800 BN postbus 0 - 10000
33 3800 BP postbus 0 - 10000
34 3800 BR postbus 0 - 10000
35 3800 BS postbus 0 - 10000
36 3800 BT postbus 0 - 10000
37 3800 BV postbus 0 - 10000
38 3800 CA postbus 0 - 10000
39 3800 CB postbus 0 - 10000
40 3800 CC postbus 0 - 10000
41 3800 CD postbus 0 - 10000
42 3800 CE postbus 0 - 10000
43 3800 DA postbus 0 - 10000
44 3800 DB postbus 0 - 10000
45 3800 EA postbus 0 - 10000
46 3800 GB postbus 0 - 10000
47 3800 GC postbus 0 - 10000
48 3800 GD postbus 0 - 10000
49 3800 GE postbus 0 - 10000
50 3800 GG postbus 0 - 10000
51 3800 GJ postbus 0 - 10000
52 3800 GK postbus 0 - 10000
53 3800 HA postbus 0 - 10000
54 3800 HD postbus 0 - 10000
55 3800 NA postbus 0 - 10000
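If you want to check the same idea without installing Beautiful Soup, here is a minimal sketch that uses only the standard library's `html.parser`. It is not the method from the answer above, just an alternative under a few assumptions: the table has the id `postcodes-table` (taken from the selector in the answer), data cells are `<td>` elements, and there are no nested tables. The names `TableRows` and `extract_rows` are made up for this illustration.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> inside the table with the given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.in_cell = False
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self._row = []
        elif self.in_table and tag == "td" and self._row is not None:
            self.in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif tag == "td":
            self.in_cell = False
        elif tag == "tr" and self._row is not None:
            self.in_cell = False
            if self._row:  # rows without <td> cells (e.g. a header row) are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self.in_cell and self._row is not None:
            self._row[-1] += data.strip()


def extract_rows(html, table_id="postcodes-table"):
    """Return all data rows of the table as lists of stripped cell strings."""
    parser = TableRows(table_id)
    parser.feed(html)
    return parser.rows
```

With the `url` and `headers` from the answer, `extract_rows(requests.get(url, headers=headers).text)` should return one list per table row. If the result has more than 25 entries, that confirms the "next" button only pages through rows that are already in the HTML.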
https://stackoverflow.com/questions/67677071