I am trying to scrape the following HTML:
<select id="sizeShoe" name="attributes['size']" class="selectFld col-xs-12">
<option value="">Select Size</option>
<option value="025">2.5</option>
<option value="035">3.5</option>
<option value="040">4</option>
<option value="045">4.5</option>
<option value="050">5</option>
<option value="055">5.5</option>
<option value="060">6</option>
<option value="065">6.5</option>
<option value="070">7</option>
<option value="075">7.5</option>
<option value="080">8</option>
<option value="085" selected="selected">8.5</option>
<option value="090">9</option>
</select>
I need to build a dictionary with the following values:
argument = {"2.5": "025", "3.5": "035", "4": "040", ...}
My attempt was:
soup = BeautifulSoup(response.text, "lxml")
soup.prettify()
argument = {}
sizeShoe = soup.find("select", attrs={'id' : 'sizeShoe'})
for a in sizeShoe:
    valor = sizeShoe.get("value")
But valor turns out to be None.
How can I scrape this data and save it as a dictionary? Is there a library faster than BeautifulSoup?
Posted on 2020-09-21 11:48:15
Is there a library faster than BeautifulSoup?
Take a look at Scrapy. See "Difference between BeautifulSoup and Scrapy crawler?"
Try the following code to scrape the data into a dictionary:
from bs4 import BeautifulSoup, NavigableString

html = '''YOUR ABOVE CODE SNIPPET'''
soup = BeautifulSoup(html, 'lxml')
shoe_size = soup.select_one('#sizeShoe')
# Check that 'tag' is not an instance of 'NavigableString'
# Check that the value of 'value' is not an empty string
argument = {
    tag.text: tag['value']
    for tag in shoe_size
    if not isinstance(tag, NavigableString) and tag['value']
}
print(argument)
Output:
{'2.5': '025', '3.5': '035', '4': '040', '4.5': '045', '5': '050', '5.5': '055', '6': '060', '6.5': '065', '7': '070', '7.5': '075', '8': '080', '8.5': '085', '9': '090'}
Posted on 2020-09-21 10:55:44
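Here is the code:
from bs4 import BeautifulSoup

# html_data is assumed to hold the HTML from the question
result_dict = {}
soup = BeautifulSoup(html_data, 'html.parser')
for option in soup.find_all('option'):
    # Skip the "Select Size" placeholder, whose value is empty
    if option['value'] != '':
        result_dict[option.text] = option['value']
result_dict: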
{'2.5':'025','3.5':'035','4':'040','4.5':'045','5':'050','5.5':'055','6':'060','6.5':'065','7':'070','7.5':'075','8':'080','8.5':'085','9':'090'}
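On the question of alternatives to BeautifulSoup: for markup this small, the same dictionary can also be built with Python's standard-library html.parser, with no third-party package at all. The following is only a sketch; the class name OptionCollector and the shortened HTML string are made up for illustration, and whether this is actually faster on a real page would have to be measured.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Hypothetical helper: collects {option text: value attribute} pairs."""

    def __init__(self):
        super().__init__()
        self.result = {}
        self._value = None  # value of the <option> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._value = dict(attrs).get("value")

    def handle_data(self, data):
        # Keep text only while inside an <option> with a non-empty value
        if self._value and data.strip():
            self.result[data.strip()] = self._value

    def handle_endtag(self, tag):
        if tag == "option":
            self._value = None


# Shortened copy of the <select> from the question (assumption: a few options only)
html_data = '''
<select id="sizeShoe" name="attributes['size']" class="selectFld col-xs-12">
<option value="">Select Size</option>
<option value="025">2.5</option>
<option value="040">4</option>
<option value="085" selected="selected">8.5</option>
</select>
'''

parser = OptionCollector()
parser.feed(html_data)
parser.close()
print(parser.result)
```

This collects every option on the page, so a page with several select elements would need an extra check on the enclosing tag's id.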
Posted on 2020-09-21 07:40:05
You have to use soup.find_all() instead of soup.find(). bs4 is the best.
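To spell out why the original attempt gave None: the loop called .get("value") on the select tag itself, which has no value attribute, rather than on each option. A minimal sketch of the find_all() approach, assuming a shortened copy of the question's HTML and the built-in 'html.parser' backend (so lxml is not required), might look like this:

```python
from bs4 import BeautifulSoup

# Shortened copy of the <select> from the question (assumption: a few options only)
html = '''
<select id="sizeShoe" name="attributes['size']" class="selectFld col-xs-12">
<option value="">Select Size</option>
<option value="025">2.5</option>
<option value="035">3.5</option>
<option value="085" selected="selected">8.5</option>
</select>
'''

soup = BeautifulSoup(html, "html.parser")
size_shoe = soup.find("select", attrs={"id": "sizeShoe"})

# The <select> tag has no "value" attribute, so this prints None
print(size_shoe.get("value"))

# find_all("option") returns only the <option> tags and skips the
# whitespace text nodes that iterating over the tag directly would yield
argument = {}
for option in size_shoe.find_all("option"):
    value = option.get("value")
    if value:  # skip the empty "Select Size" placeholder
        argument[option.text] = value

print(argument)
```

Calling find_all() on the select tag rather than on soup keeps the result limited to this one dropdown if the page contains others.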
https://stackoverflow.com/questions/63984450