I'm trying to scrape some information from a page using Python and Beautiful Soup, but I can't seem to write the right path to the element I need. The HTML is:
<div class="operator active" data-operator_name="Etisalat" data-operator_id="5"><div class="operator_name_etisalat"></div></div>
I'm trying to get the operator name, "Etisalat", and this is what I have so far:
def list_contries():
    select = Select(driver.find_element_by_id('international_country'))
    select.select_by_visible_text('France')
    request = requests.get("https://mobilerecharge.com/buy/mobile_recharge?country=Afghanistan&operator=Etisalat")
    content = request.content
    soup = BeautifulSoup(content, "html.parser")
    # print(soup.prettify())
    prov = soup.find("div", {"class": "operator active"})['data-operator_name']
    # prov = soup.find("div", {"class": "operator deselected"})
    print(prov)
    operator = (prov.text.strip())
But this only returns a NoneType, so something is wrong. Can anyone tell me what I'm doing wrong? Thanks.
Answered 2018-07-27 20:42:01
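For some reason, when I open the link in a browser, I can't see the field you want unless I inspect the element. Presumably the page builds it with JavaScript, which requests does not run, so I used Selenium in my answer.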
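To illustrate the difference between the two views of the page, here is a minimal sketch that runs the same extraction on two inline HTML strings. Both strings and the helper name `operator_names` are assumptions made up for illustration: `static_html` stands in for what `requests` might receive, and `rendered_html` for the DOM once the browser has finished rendering.

```python
from bs4 import BeautifulSoup


def operator_names(html):
    """Return the data-operator_name value of every div.operator in html."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector: div elements with class "operator" that carry the attribute
    return [div["data-operator_name"]
            for div in soup.select("div.operator[data-operator_name]")]


# Hypothetical stand-ins, not the real page content.
static_html = '<div id="operators"></div>'
rendered_html = (
    '<div id="operators">'
    '<div class="operator active" data-operator_name="Etisalat" data-operator_id="5">'
    '<div class="operator_name_etisalat"></div></div></div>'
)

print(operator_names(static_html))    # []
print(operator_names(rendered_html))  # ['Etisalat']
```

If the list comes back empty for the HTML that `requests` downloaded, the element is most likely not in the raw response, and a browser-driven approach is one way to get at it.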
from bs4 import BeautifulSoup
from selenium import webdriver

scrapeLink = 'https://mobilerecharge.com/buy/mobile_recharge?country=Afghanistan&operator=Etisalat'

# Raw string so the backslash in the Windows path is not treated as an escape.
# (executable_path is the Selenium 3 style; newer versions take a Service object.)
driver = webdriver.Firefox(executable_path=r'C:\geckodriver.exe')
driver.get(scrapeLink)
# Grab the DOM after the browser has run the page's JavaScript
html = driver.execute_script('return document.body.innerHTML')
driver.close()

soup = BeautifulSoup(html, 'html.parser')
# Print the operator name stored on each operator div
for div in soup.find_all('div', class_='operator'):
    print(div.get('data-operator_name'))
Output:
Roshan
Etisalat
MTN
Wireless
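As for the original error, two things seem to be going on. `soup.find` returns `None` when nothing matches, so subscripting the result fails, and once the attribute has been read the value is a plain `str`, which has no `.text`. A minimal sketch of a guarded lookup, assuming a made-up `sample_html` and a hypothetical helper `active_operator` (neither is from the real page):

```python
from bs4 import BeautifulSoup


def active_operator(html):
    """Return the active operator's name, or None if no such div exists."""
    soup = BeautifulSoup(html, "html.parser")
    # Matches a div that has both classes, in any order
    div = soup.select_one("div.operator.active")
    if div is None:
        return None  # nothing matched; avoid subscripting None
    # .get returns the attribute value as a str (or None if it is absent)
    return div.get("data-operator_name")


# Hypothetical sample modelled on the snippet in the question.
sample_html = (
    '<div class="operator deselected" data-operator_name="Roshan"></div>'
    '<div class="operator active" data-operator_name="Etisalat" data-operator_id="5">'
    '<div class="operator_name_etisalat"></div></div>'
)

name = active_operator(sample_html)
print(name)  # Etisalat
```

Since `name` is already a string, `name.strip()` is enough if whitespace needs trimming.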
https://stackoverflow.com/questions/51546730