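I am trying to scrape the text in the "Other areas of Wikipedia" section of the Wikipedia front page. However, I get the error ResultSet object has no attribute 'find'. What is wrong with my code, and how can I get it to work?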
import requests
from bs4 import BeautifulSoup
url = 'https://en.wikipedia.org/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml' )
otherAreasContainer = soup.find_all('div', class_='mp-bordered')
otherAreasContainerTexts = otherAreasContainer.find_all('li')
for otherAreasContainerText in otherAreasContainerTexts:
    print(otherAreasContainerText.text)
Posted on 2020-07-15 16:34:01
In your code, otherAreasContainer is of type ResultSet, and ResultSet does not have a .find_all() method.
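To make the distinction concrete, here is a minimal sketch that runs against a small inline HTML snippet instead of the live page (the markup and the variable names are made up for illustration, and it uses the built-in html.parser so lxml is not required). It shows that find_all() gives back a ResultSet while find() gives back a single Tag that can be searched further:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real Wikipedia page.
html = """
<div class="mp-bordered"><ul><li>first</li><li>second</li></ul></div>
<div class="mp-bordered"><ul><li>third</li></ul></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns every match, wrapped in a list-like ResultSet.
containers = soup.find_all("div", class_="mp-bordered")
# find() returns only the first match, as a single Tag.
first_container = soup.find("div", class_="mp-bordered")

print(type(containers).__name__)       # ResultSet
print(type(first_container).__name__)  # Tag

# A Tag can be searched again; a ResultSet has to be iterated or indexed first.
items_in_first = [li.text for li in first_container.find_all("li")]
print(items_in_first)
```

If only the first matching container matters, calling find() instead of find_all() is probably the smallest change to the original code.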
To select every <li> under "Other areas of Wikipedia", you can use the CSS selector h2:contains("Other areas of Wikipedia") + div li.
For example:
import requests
from bs4 import BeautifulSoup
url = 'https://en.wikipedia.org/'
soup = BeautifulSoup(requests.get(url).content, 'lxml')
for li in soup.select('h2:contains("Other areas of Wikipedia") + div li'):
    print(li.text)
Prints:
Community portal – Bulletin board, projects, resources and activities covering a wide range of Wikipedia areas.
Help desk – Ask questions about using Wikipedia.
Local embassy – For Wikipedia-related communication in languages other than English.
Reference desk – Serving as virtual librarians, Wikipedia volunteers tackle your questions on a wide range of subjects.
Site news – Announcements, updates, articles and press releases on Wikipedia and the Wikimedia Foundation.
Village pump – For discussions about Wikipedia itself, including areas for technical issues and policies.
More information about CSS selectors.
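One caveat about the selector: recent soupsieve releases (the library behind select()) deprecate the :contains spelling in favor of :-soup-contains. The sketch below assumes soupsieve 2.1 or later and uses made-up inline HTML in place of the live page, so the heading and list contents are illustrative only. The + combinator picks the div that immediately follows the matching h2:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two headings, each followed by a div holding a list.
html = """
<h2>In the news</h2>
<div><ul><li>skip me</li></ul></div>
<h2>Other areas of Wikipedia</h2>
<div><ul><li>Community portal</li><li>Help desk</li></ul></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Match the h2 by its text, step to the adjacent sibling div, then take its <li> items.
selector = 'h2:-soup-contains("Other areas of Wikipedia") + div li'
section_items = [li.text for li in soup.select(selector)]
print(section_items)
```

On an older soupsieve that does not know :-soup-contains, the original :contains form should be used instead.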
Posted on 2020-07-15 16:33:20
Running your code gives:
Traceback (most recent call last):
  File "h.py", line 7, in <module>
    otherAreasContainerTexts = otherAreasContainer.find_all('li')
  File "/home/td/anaconda3/lib/python3.7/site-packages/bs4/element.py", line 1620, in __getattr__
    "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
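This traceback should have been part of your question; it makes your problem easy for us to spot!
find_all returns a ResultSet, which is essentially a list of the elements found. You need to iterate over the elements before you can search inside each one.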
import requests
from bs4 import BeautifulSoup
url = 'https://en.wikipedia.org/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml' )
otherAreasContainer = soup.find_all('div', class_='mp-bordered')
for other in otherAreasContainer:
    otherAreasContainerTexts = other.find_all('li')
    for otherAreasContainerText in otherAreasContainerTexts:
        print(otherAreasContainerText.text)
Posted on 2020-07-15 16:45:25
The result of find_all is a list, and a list has no find or find_all attribute. You have to iterate over otherAreasContainer and call find_all on each element, like this:
import requests
from bs4 import BeautifulSoup
url = 'https://en.wikipedia.org/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
otherAreasContainer = soup.find_all('div', class_='mp-bordered')
for other in otherAreasContainer:
    otherAreasContainerTexts = other.find_all('li')
    for otherAreasContainerText in otherAreasContainerTexts:
        print(otherAreasContainerText.text)
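As a possible alternative to the nested loop, a single descendant selector can gather the same <li> elements in one call. This is only a sketch on made-up inline HTML (the class name mp-bordered comes from the question, everything else is illustrative), not a claim about the current structure of the Wikipedia front page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two matching containers and one unrelated div.
html = """
<div class="mp-bordered"><ul><li>Community portal</li><li>Help desk</li></ul></div>
<div class="other"><ul><li>not selected</li></ul></div>
<div class="mp-bordered"><ul><li>Site news</li></ul></div>
"""
soup = BeautifulSoup(html, "html.parser")

# "div.mp-bordered li" matches every <li> nested inside any div with that class.
texts = [li.get_text(strip=True) for li in soup.select("div.mp-bordered li")]
print(texts)
```

The items come back in document order, so the output matches what the loop-based version would print for this snippet.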
https://stackoverflow.com/questions/62919388