Q: Can't get rid of a NavigableString error raised by BeautifulSoup

Stack Overflow user

Asked 2018-06-16 02:16:52

Answers: 2 · Views: 60 · Followers: 0 · Votes: 1

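I've written a script in Python with BeautifulSoup to parse a certain address from a webpage. However, when I run the script below, it fails on the line starting with address = [item.find_next_sibling().get_text(strip=True) and raises AttributeError: 'NavigableString' object has no attribute 'text'. If I use the alternative shown further down, the error goes away. However, I would like to stick to the approach I have applied here. What can I do?

This is my attempt: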

import requests
from bs4 import BeautifulSoup

URL = "https://beta.companieshouse.gov.uk/officers/lX9snXUPL09h7ljtMYLdZU9LmOo/appointments"

def fetch_names(session,link):
    session.headers = {"User-Agent":"Mozilla/5.0"}
    res = session.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    for items in soup.select("#content-container dt"):

        #the error appears in the following line

        address = [item.find_next_sibling().get_text(strip=True) for item in items if "correspondence address" in item.text.lower()][0]
        print(address)

if __name__ == '__main__':
    with requests.Session() as session:
        fetch_names(session,URL)

I can get rid of the error by doing it like below, but I would like to stick to the approach I tried in my script:

items = soup.select("#content-container dt")
address = [item.find_next_sibling().get_text(strip=True) for item in items if "correspondence address" in item.text.lower()][0]
print(address)

Edit:

This is not an answer, but it is what I have been experimenting with (I'm still not sure how to apply .find_previous_sibling()):

import requests
from bs4 import BeautifulSoup

URL = "https://beta.companieshouse.gov.uk/officers/lX9snXUPL09h7ljtMYLdZU9LmOo/appointments"

def fetch_names(session,link):
    session.headers = {"User-Agent":"Mozilla/5.0"}
    res = session.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    for items in soup.select("#content-container dt"):
        address = [item for item in items.strings if "correspondence address" in item.lower()]
        print(address)

if __name__ == '__main__':
    with requests.Session() as session:
        fetch_names(session,URL)

It produces the following output (without the NavigableString error):

[]
['Correspondence address']
[]
[]

python

python-3.x

web-scraping

beautifulsoup

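2 Answers

Stack Overflow user

Accepted answer

Posted 2018-06-17 05:05:40

items is a single node, not a list of nodes, so you should not iterate over it with for item in items. Iterating over a tag yields its children, which here are NavigableString objects, and that is where the error comes from. Just replace the list comprehension with the following: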

for items in soup.select("#content-container dt"):
    if "correspondence address" in items.text.lower():
        address = items.find_next_sibling().get_text(strip=True)
        print(address)
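
To make the cause concrete, here is a minimal offline sketch. The HTML snippet is a hypothetical stand-in for the real page (the dl/dt/dd layout and the "Role" entry are assumptions, not copied from the site), and find_address is an illustrative helper name, not part of BeautifulSoup. It shows that iterating over a single dt tag yields its text node rather than further tags, and that testing the dt itself avoids the error.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (structure is assumed).
HTML = """
<dl id="content-container">
  <dt>Correspondence address</dt>
  <dd>21 Maes Y Llan, Conwy, Wales, LL32 8NB</dd>
  <dt>Role</dt>
  <dd>Director</dd>
</dl>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Iterating over one <dt> tag walks its children, here a single text node.
first_dt = soup.select_one("#content-container dt")
child_types = [type(child).__name__ for child in first_dt]
print(child_types)


def find_address(soup):
    """Return the text of the <dd> following the matching <dt>, or None."""
    for dt in soup.select("#content-container dt"):
        # Test the tag itself instead of iterating over its children.
        if "correspondence address" in dt.get_text().lower():
            dd = dt.find_next_sibling("dd")
            if dd is not None:
                return dd.get_text(strip=True)
    return None


print(find_address(soup))
```

Passing "dd" to find_next_sibling is an optional tightening: it makes the lookup independent of whatever other nodes happen to sit between the label and its value.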

Votes: 1

Stack Overflow user

Posted 2018-06-16 02:35:24

You can change the BeautifulSoup selector to look up the correspondence address directly by its id, #correspondence-address-value-1.

import requests
from bs4 import BeautifulSoup


URL = "https://beta.companieshouse.gov.uk/officers/lX9snXUPL09h7ljtMYLdZU9LmOo/appointments"

def fetch_names(session,link):
    session.headers = {"User-Agent":"Mozilla/5.0"}
    res = session.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    addresses = [a.text for a in soup.select("#correspondence-address-value-1")]
    print(addresses)

if __name__ == '__main__':
    with requests.Session() as session:
        fetch_names(session,URL)

Result:

13:32 $ python test.py
['21 Maes Y Llan, Conwy, Wales, LL32 8NB']
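
If the id might be missing on some pages, a slightly more defensive variant can help. The sketch below runs against a hypothetical inline HTML string rather than the live site, and address_by_id is an illustrative helper name, not something from the original answer. It uses select_one and returns None when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the id is taken from the answer above.
HTML = (
    '<dl><dt>Correspondence address</dt>'
    '<dd id="correspondence-address-value-1"> 21 Maes Y Llan, Conwy, Wales, LL32 8NB </dd></dl>'
)


def address_by_id(html, element_id="correspondence-address-value-1"):
    """Return the stripped text of the element with the given id, or None."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(f"#{element_id}")
    # select_one returns None when no element matches the selector.
    return node.get_text(strip=True) if node is not None else None


print(address_by_id(HTML))
print(address_by_id("<p>no address here</p>"))
```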

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/50880811
