Q: Cannot scrape a website with BeautifulSoup4

Stack Overflow user

Asked 2018-04-17 17:18:46

1 answer, 55 views, 0 followers, 1 vote

I want to scrape the 123rd Meeting page:

https://www.bcb.gov.br/en/#!/c/copomstatements/1724

To do this, I am using the following code:

import urllib.request           #get the HTML page from url 
import urllib.error

from bs4 import BeautifulSoup


# set page to read
with urllib.request.urlopen('https://www.bcb.gov.br/en/#!/c/copomstatements/1724') as response:
   page = response.read()

# parse the html using beautiful soup and store in variable `soup`
soup = BeautifulSoup(page, "html.parser")
print(soup)

# Inspect: <h3 class="BCTituloPagina ng-binding">123rd Meeting</h3>
title = soup.find("h3", attrs={"class": "BCTituloPagina ng-binding"})
print(title)

However, the command

print(soup)

returns neither the title ("123rd Meeting") nor the body text ("In view of ... 25 basis points").

python-3.x

beautifulsoup

1 Answer

Stack Overflow user

Accepted answer

Answered 2018-04-17 17:33:36

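You cannot extract the title with a plain HTTP request library in Python, because the element you are looking for is rendered by JavaScript after the page loads, so it is not in the HTML that the server returns. You need a tool that runs the page's scripts, such as Selenium, to get it.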
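A small illustration of why the raw response cannot contain the statement: everything after `#` in a URL is a fragment, which the client keeps to itself and does not send to the server. The sketch below uses only the standard library to split the question's URL; it does not contact the site.

```python
from urllib.parse import urldefrag

url = 'https://www.bcb.gov.br/en/#!/c/copomstatements/1724'

# Split the URL into the part that is actually requested and the client-side fragment.
requested, fragment = urldefrag(url)

print(requested)  # https://www.bcb.gov.br/en/
print(fragment)   # !/c/copomstatements/1724
```

So `urlopen` only ever fetches the `/en/` page; the `!/c/copomstatements/1724` route is interpreted by JavaScript in a browser.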

Code:

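from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get('https://www.bcb.gov.br/en/#!/c/copomstatements/1724')
    # Wait up to 10 seconds for JavaScript to render the heading
    WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, '//h3')))
    # find_element(By.XPATH, ...) replaces find_element_by_xpath,
    # which recent Selenium 4 releases no longer provide
    title = driver.find_element(By.XPATH, '//h3').text
    print(title)
except TimeoutException:
    print('Timed out waiting for the heading to render')
finally:
    driver.quit()  # close the browser and end the WebDriver session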

Output:

123rd Meeting
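If the goal is still to parse with BeautifulSoup, one option is to hand it the rendered markup (for example Selenium's `driver.page_source`) instead of the raw response. The sketch below shows the difference using two hard-coded HTML strings; both strings and the `find_title` helper are hypothetical stand-ins, not the site's actual markup, apart from the `h3` element quoted in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins: what a plain HTTP fetch might return
# versus the DOM after the page's JavaScript has run.
raw_html = '<html><body><div ng-view></div></body></html>'
rendered_html = (
    '<html><body><div ng-view>'
    '<h3 class="BCTituloPagina ng-binding">123rd Meeting</h3>'
    '</div></body></html>'
)

def find_title(html):
    """Return the text of the first h3 with class BCTituloPagina, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    # class_ matches one class at a time, so the extra ng-binding class does not matter
    heading = soup.find('h3', class_='BCTituloPagina')
    return heading.get_text(strip=True) if heading is not None else None

print(find_title(raw_html))       # None
print(find_title(rendered_html))  # 123rd Meeting
```

Matching on the single class `BCTituloPagina` is a little more robust than the exact string `"BCTituloPagina ng-binding"`, since the framework may add or reorder classes.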

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/49883971
