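I'm trying to trim down my code and make it more efficient, but I've run into this KeyError and can't figure out what went wrong. Could someone tell me why my selector doesn't work? I'm an amateur.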
With this code:
recommended = soup.select('table:has(font:contains("推荐主题")), '
                          'table:has(font:contains("版块主题"))')
for item in recommended:
    for i in item.select(".folder:has(a)"):
        print(i)
I get these elements:
<td class="folder"><a href="thread-10439294-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
<td class="folder"><a href="thread-10439293-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
<td class="folder"><a href="thread-10439292-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
<td class="folder"><a href="thread-10439290-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
But when I add one more line:
for item in recommended:
    for i in item.select(".folder:has(a)"):
        url_tail = i['href']
I get the following KeyError:
    return self.attrs[key]
KeyError: 'href'
What I want to extract is the href link. Thanks, everyone.
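The error can be reproduced in isolation. The sketch below assumes a single row copied from the output above (trimmed for brevity). It shows that the selected element is the td, whose only attribute is class, so looking up 'href' on it fails, while its a child does carry the link.

```python
from bs4 import BeautifulSoup

# One row from the output above, trimmed (the img tag is replaced by text).
html = '<td class="folder"><a href="thread-10439294-1-1.html" target="_blank">x</a></td>'
soup = BeautifulSoup(html, "html.parser")

td = soup.select_one(".folder:has(a)")  # matches the <td>, not the <a>
print(td.name)   # td
print(td.attrs)  # {'class': ['folder']}, so there is no 'href' key

try:
    td["href"]  # indexing a Tag looks the key up in its attrs dict
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'href'

print(td.get("href"))   # None: .get() returns None instead of raising
print(td.a["href"])     # thread-10439294-1-1.html: the <a> child has the href
```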
Answered 2019-05-04 04:55:19
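@facelessuser explained the error well (+1) and gave the selector I would prefer. There also seem to be two other attribute=value selectors you could use as a plan B.
Either of the following:
[href^="thread-"]
or:
[title="新窗口打开"]
can be used in a list comprehension, for example:
links = [item['href'] for item in soup.select('[href^="thread-"]')]
In your loop you would call select on item rather than soup. If matching on the title alone turns out to be too broad, you can always prefix the parent class: .folder [title="新窗口打开"]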
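Both attribute selectors can be tried on a small fragment. This is a sketch assuming a hypothetical table that wraps two of the rows from the question; soup.select("table") stands in for the question's recommended selector, which is not reproduced here.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment: one table wrapping two of the rows from the question.
html = """
<table>
  <tr><td class="folder"><a href="thread-10439294-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td></tr>
  <tr><td class="folder"><a href="thread-10439293-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Plan B 1: every element whose href starts with "thread-".
links = [a["href"] for a in soup.select('[href^="thread-"]')]
print(links)  # ['thread-10439294-1-1.html', 'thread-10439293-1-1.html']

# Plan B 2: match on the title attribute, prefixed with the parent class
# so that unrelated elements with the same title are not picked up.
links2 = [a["href"] for a in soup.select('.folder [title="新窗口打开"]')]
print(links2 == links)  # True

# Inside the question's loop, select from item instead of soup,
# so only the links inside that table are collected.
recommended = soup.select("table")  # stand-in for the question's selector
for item in recommended:
    item_links = [a["href"] for a in item.select('.folder [href^="thread-"]')]
    print(item_links)
```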
Answered 2019-05-04 04:12:10
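.folder:has(a) selects the td element, because that element has the class folder and contains an a element. It does not select the a element; it only checks whether the element with class folder has an a inside it.
Something like .folder a is probably what you want.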
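The difference between the two selectors is easy to see by printing the tag names each one returns. A short sketch using two of the rows from the question:

```python
from bs4 import BeautifulSoup

html = """
<td class="folder"><a href="thread-10439294-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
<td class="folder"><a href="thread-10439293-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
"""
soup = BeautifulSoup(html, "html.parser")

# ".folder:has(a)" returns the <td> elements themselves.
print([tag.name for tag in soup.select(".folder:has(a)")])  # ['td', 'td']

# ".folder a" (descendant combinator) returns the <a> elements inside them.
anchors = soup.select(".folder a")
print([tag.name for tag in anchors])  # ['a', 'a']

urls = [a["href"] for a in anchors]
print(urls)  # ['thread-10439294-1-1.html', 'thread-10439293-1-1.html']
```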
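Answered 2019-05-04 04:19:00
You can do it like this.
Since I don't have the full HTML or URL, I just retrieved the values from the text you pasted.
1) Import BeautifulSoup and create the soup object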
>>> from bs4 import BeautifulSoup
>>>
>>> html_text = """<td class="folder"><a href="thread-10439294-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
... <td class="folder"><a href="thread-10439293-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
... <td class="folder"><a href="thread-10439292-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
... <td class="folder"><a href="thread-10439290-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>"""
>>>
>>> soup = BeautifulSoup(html_text, "html.parser")
>>>
>>> soup
<td class="folder"><a href="thread-10439294-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
<td class="folder"><a href="thread-10439293-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
<td class="folder"><a href="thread-10439292-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
<td class="folder"><a href="thread-10439290-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
>>>
2) Find all the tds
>>> tds = soup.find_all("td", class_="folder")
>>> tds
[<td class="folder"><a href="thread-10439294-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>, <td class="folder"><a href="thread-10439293-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>, <td class="folder"><a href="thread-10439292-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>, <td class="folder"><a href="thread-10439290-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>]
>>>
3) Check the first one (just for testing)
>>> tds[0]
<td class="folder"><a href="thread-10439294-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
>>>
>>> tds[0].a
<a href="thread-10439294-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a>
>>>
>>> tds[0].a.get("href")
'thread-10439294-1-1.html'
>>>
4) Finally, retrieve the links (2 ways)
>>> # Using loop
...
>>> for td in tds:
... print(td.a.get("href"))
...
thread-10439294-1-1.html
thread-10439293-1-1.html
thread-10439292-1-1.html
thread-10439290-1-1.html
>>>
>>> for td in tds:
... print(td.a["href"])
...
thread-10439294-1-1.html
thread-10439293-1-1.html
thread-10439292-1-1.html
thread-10439290-1-1.html
>>>
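Steps 2 to 4 can also be collapsed into one list comprehension. This sketch assumes, hypothetically, that some folder cells may have no link at all (the empty middle cell below is not from the question); for such a cell td.a is None, so the guard skips it rather than letting td.a.get(...) fail on None.

```python
from bs4 import BeautifulSoup

# Hypothetical input: the middle cell has no <a>, to exercise the guard.
html = """
<td class="folder"><a href="thread-10439294-1-1.html">x</a></td>
<td class="folder"></td>
<td class="folder"><a href="thread-10439290-1-1.html">y</a></td>
"""
soup = BeautifulSoup(html, "html.parser")

tds = soup.find_all("td", class_="folder")

# td.a is None for a cell without a link; calling .get() on None would raise,
# so such cells are filtered out before the href is read.
urls = [td.a.get("href") for td in tds if td.a is not None]
print(urls)  # ['thread-10439294-1-1.html', 'thread-10439290-1-1.html']
```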
https://stackoverflow.com/questions/55979050