Example
<div data-content="N(EX%hY-G47*@A8Ru%%c7@tG4mN3k/mebP631Y0B1A08s!Xn_sd#xGzJtF;^*03znN;-r6X8cu2;*+E%6l"></div>
How can I use BeautifulSoup to find this div and get the data between the quotes, i.e. the value of data-content="?????"?
Posted on 2020-04-04 08:47:44
This is easy with soup.findAll("div", attrs={"data-content": True}) (findAll is the legacy alias of find_all),
as shown below:
from bs4 import BeautifulSoup
html = """
<div data-content="N(EX%hY-G47*@A8Ru%%c7@tG4mN3k/mebP631Y0B1A08s!Xn_sd#xGzJtF;^*03znN;-r6X8cu2;*+E%6l" href="www.test1.com"></div>
<div data-content="2" href="www.test1.com"></div>
<div data-content="3" href="www.test2.com"></div>
<div data-content="4" href="www.test2.com"></div>
<div data-content="5" href="www.test3.com"></div>
<div data-content="6" href="www.test3.com"></div>
"""
soup = BeautifulSoup(html, 'html.parser')
# Collect the data-content value of every div that has the attribute
goal = [div.get("data-content")
        for div in soup.findAll("div", {'data-content': True})]
print(goal)
Output:
['N(EX%hY-G47*@A8Ru%%c7@tG4mN3k/mebP631Y0B1A08s!Xn_sd#xGzJtF;^*03znN;-r6X8cu2;*+E%6l', '2', '3', '4', '5', '6']
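If only the first matching div is needed, a minimal sketch along the same lines could use find instead and guard against a missing element. The helper name first_data_content and the sample markup are illustrative assumptions, not part of the original answer.

```python
from bs4 import BeautifulSoup


def first_data_content(html):
    """Return the data-content value of the first matching div, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # attrs={"data-content": True} matches any div that has the attribute at all
    div = soup.find("div", attrs={"data-content": True})
    # find() returns None when nothing matches, so guard before calling .get()
    return div.get("data-content") if div is not None else None


# Hypothetical sample markup for illustration only
sample = '<div class="x"></div><div data-content="abc%123"></div>'
print(first_data_content(sample))
print(first_data_content("<p>no div here</p>"))
```

The guard matters because calling .get() on the None that find returns for a page without such a div would raise an AttributeError.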
Posted on 2020-04-04 09:02:18
This is very easy with a CSS selector, as shown below:
from bs4 import BeautifulSoup
html = '<div data-content="N(EX%hY-G47*@A8Ru%%c7@tG4mN3k/mebP631Y0B1A08s!Xn_sd#xGzJtF;^*03znN;-r6X8cu2;*+E%6l"></div>'
soup = BeautifulSoup(html, 'lxml')
soup.select_one('div[data-content]')["data-content"]
Output:
'N(EX%hY-G47*@A8Ru%%c7@tG4mN3k/mebP631Y0B1A08s!Xn_sd#xGzJtF;^*03znN;-r6X8cu2;*+E%6l'
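To collect every match rather than only the first, the same selector can be passed to select instead of select_one. A minimal sketch, assuming the built-in html.parser so that lxml does not need to be installed; the sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: two divs with the attribute, one without
html = """
<div data-content="first"></div>
<div>no attribute</div>
<div data-content="second"></div>
"""

soup = BeautifulSoup(html, "html.parser")
# div[data-content] matches only divs that carry the attribute
values = [div["data-content"] for div in soup.select("div[data-content]")]
print(values)
```

Indexing with ["data-content"] is safe here because the selector already guarantees the attribute is present on every element it returns.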
https://stackoverflow.com/questions/61025637