import re
import requests
from bs4 import BeautifulSoup

URL = "https://bitcointalk.org/index.php?board=1.0"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
links_with_text = []
# Topic links sit inside <span> elements whose id starts with "msg"
for random in soup.find_all("span", attrs={"id": re.compile("^msg")}):
    for b in random.find_all('a', href=True):
        print(b['href'])
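The code above works fine. I can get all the topic links from the first page, but I can't work out how to get the topic names. Any ideas?
Posted on 2021-02-28 18:59:57
The code below should work: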
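import re
import requests
from bs4 import BeautifulSoup

URL = "https://bitcointalk.org/index.php?board=1.0"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
links_with_text = []
# Topic links sit inside <span> elements whose id starts with "msg"
for random in soup.find_all("span", attrs={"id": re.compile("^msg")}):
    for b in random.find_all('a', href=True):
        print(b['href'])   # topic URL
        print(b.string)    # topic name (None if the <a> has more than one child node)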
Sample output:
...
https://bitcointalk.org/index.php?topic=5320354.0
Craig Wright's Latest Escapade -- Give me the bitcoins I stole from Mt. Gox!
https://bitcointalk.org/index.php?topic=5233719.0
Opera now lets US users buy crypto with Apple Pay or debit card
...
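If you want the names and links collected together rather than printed, something along these lines might help. It is only a sketch: the `extract_topics` helper and the `SAMPLE_HTML` markup are made up for illustration (not copied from the real page), and it parses a string so it can be tried without a network request. It uses `get_text()` instead of `.string`, because `.string` returns None when the link contains more than one child node.

```python
import re
from bs4 import BeautifulSoup


def extract_topics(html):
    """Return (title, href) pairs for links inside <span id="msg...">."""
    soup = BeautifulSoup(html, "html.parser")
    topics = []
    for span in soup.find_all("span", attrs={"id": re.compile("^msg")}):
        for a in span.find_all("a", href=True):
            # get_text() joins all nested text pieces; the " " separator
            # keeps words apart when the link contains inner tags.
            topics.append((a.get_text(" ", strip=True), a["href"]))
    return topics


# Hypothetical markup, invented for this example only.
SAMPLE_HTML = """
<span id="msg_1"><a href="index.php?topic=1.0">First topic</a></span>
<span id="msg_2"><a href="index.php?topic=2.0">Second <b>topic</b></a></span>
<span id="other"><a href="index.php?topic=3.0">Not a topic span</a></span>
"""

topics = extract_topics(SAMPLE_HTML)
for title, href in topics:
    print(href)
    print(title)
```

For the live board you would pass `page.content` from the `requests.get(URL)` call to the same function, assuming the page still uses the `msg`-prefixed span ids.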
https://stackoverflow.com/questions/66408044