I'm trying to extract a specific link from a listing. The HTML looks like this:
<div class="icl-u-lg-hide">
<a href="https://www.indeed.com/rc/clk?jk=60f4f1ed9ea60a29&from=vj&pos=bottom&sjdu=YmZE5d5THV8u75cuc0H6Y26AwfY51UOGmh3Z9h4OvXiVStob9kU92ZXigtP-tVuXnUUKYs5yKqp3Fg7KgmoxhA&astse=65cca94a29bb2a04&assa=2009" referrerpolicy="origin" rel="noopener" target="_blank" role="link" class="icl-Button icl-Button--primary icl-Button--lg icl-Button--block">Apply Now</a>
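</div>
I accessed the div with the following code:
list = soup.find_all('div', attrs={'class':'icl-u-lg-hide'})
This returns a list containing all the elements in this div, but I'm not sure how to access or return only the href. Can anyone help? Thanks!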
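Posted on 2020-10-19 12:35:28
Just iterate over the list and print the href attribute of the a tag inside each div. One more important point: don't use built-in names such as list as variable names, because doing so shadows the built-in. Here is the code: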
div_lst = soup.find_all('div', class_="icl-u-lg-hide")
for div in div_lst:
    try:
        print(div.a['href'])
    except TypeError:
        # div.a is None when the div contains no <a> tag
        pass
Here is the complete code:
from bs4 import BeautifulSoup
html = """
<div class="icl-u-lg-hide">
<a href="https://www.indeed.com/rc/clk?jk=60f4f1ed9ea60a29&from=vj&pos=bottom&sjdu=YmZE5d5THV8u75cuc0H6Y26AwfY51UOGmh3Z9h4OvXiVStob9kU92ZXigtP-tVuXnUUKYs5yKqp3Fg7KgmoxhA&astse=65cca94a29bb2a04&assa=2009" referrerpolicy="origin" rel="noopener" target="_blank" role="link" class="icl-Button icl-Button--primary icl-Button--lg icl-Button--block">Apply Now</a>
</div>
"""
soup = BeautifulSoup(html,'html5lib')
div_lst = soup.find_all('div', class_ = "icl-u-lg-hide")
for div in div_lst:
    try:
        print(div.a['href'])
    except TypeError:
        # div.a is None when the div contains no <a> tag
        pass
Output:
https://www.indeed.com/rc/clk?jk=60f4f1ed9ea60a29&from=vj&pos=bottom&sjdu=YmZE5d5THV8u75cuc0H6Y26AwfY51UOGmh3Z9h4OvXiVStob9kU92ZXigtP-tVuXnUUKYs5yKqp3Fg7KgmoxhA&astse=65cca94a29bb2a04&assa=2009
Posted on 2020-10-19 12:55:20
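As @Sushil pointed out, you can simply use a for loop to iterate over the a tags and read their href attributes, narrowing the search to the specific class you are interested in:
from bs4 import BeautifulSoup
# the class is on the div, so find the divs first, then the links inside them
for div in soup.find_all('div', {'class': 'icl-u-lg-hide'}):
    for a in div.find_all('a', href=True):
        print('Your url: ', a['href'])
This prints every href found inside divs of the class you are looking for.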
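For comparison, the same lookup can probably be written more compactly with a CSS selector. This is only a minimal sketch and not part of either answer: it assumes bs4 is installed, uses the built-in html.parser instead of html5lib, and replaces the long href with a placeholder URL. The helper name extract_hrefs is hypothetical.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the markup in the question (the URL is a placeholder).
html = """
<div class="icl-u-lg-hide">
<a href="https://www.example.com/apply?jk=123" class="icl-Button">Apply Now</a>
</div>
<div class="icl-u-lg-hide">No link in this one</div>
"""


def extract_hrefs(markup, css_class="icl-u-lg-hide"):
    """Return the href of every <a> nested inside a div with the given class."""
    soup = BeautifulSoup(markup, "html.parser")
    # "a[href]" only matches anchors that have an href attribute,
    # so divs without a link are skipped without any try/except.
    return [a["href"] for a in soup.select(f"div.{css_class} a[href]")]


hrefs = extract_hrefs(html)
for href in hrefs:
    print(href)
```

If only the first match is needed, soup.select_one with the same selector returns a single tag, or None when nothing matches.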
https://stackoverflow.com/questions/64427433