I'm scraping a site with Python and writing the result to a .json file:
from bs4 import BeautifulSoup
import json
import requests
url = 'https://storage.googleapis.com/infosimples-public/commercia/case/product.html#'
resposta_final = {}
response = requests.get(url)
parsed_html = BeautifulSoup(response.content, 'html.parser')
resposta_final['skus'] = [element.get_text(strip=True) for element in parsed_html.select(".skus-area")]
json_resposta_final = json.dumps(resposta_final)
with open('produto.json', 'w') as arquivo_json:
    arquivo_json.write(json_resposta_final)

produto.json:
"skus": [
"Rubber Duck MK Ultra - Original$ 7.95$ 9.95Rubber Duck MK Ultra - Summer VersionOut of stockRubber Duck MK Ultra - Batman Version$ 14.95"
]

But I need the following format:
"skus": [
{
"name": "Rubber Duck MK Ultra - Original$
Rubber Duck MK Ultra - Summer
Rubber Duck MK Ultra - Batman Version"
"current price": "7.95$
null
$ 14.95"
"old price": "9.95
null
null"
"available": "true
false
true"
}
]

Posted on 2022-08-22 18:47:51
If you want to keep all the details of a single product together, try something like this.
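resposta_final["skus"] = []  # resposta_final is a dict, so collect the SKUs under a key
for i in parsed_html.select(".card"):
    data = {}
    # select() takes a CSS selector and returns a list, so [0] is valid here
    data["name"] = i.select(".sku-name")[0].text.strip()
    data["current_price"] = i.select(".sku-current-price")[0].text.strip() if len(i.select(".sku-current-price")) else None
    data["old_price"] = i.select(".sku-old-price")[0].text.strip() if len(i.select(".sku-old-price")) else None
    # Treat a SKU as available when it shows any price
    data["availability"] = "true" if data.get("current_price") or data.get("old_price") else "false"
    resposta_final["skus"].append(data)

Otherwise, just create a separate list for each detail and add those lists to your response.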
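The "separate lists" alternative could look roughly like the sketch below. It assumes the per-SKU values have already been extracted: the `rows` sample is hard-coded from the output shown in the question rather than scraped, and the key names are only illustrative.

```python
# Sample values copied from the question's output; in the real script they
# would come from the parsed page instead of being hard-coded (assumption).
rows = [
    ("Rubber Duck MK Ultra - Original", "$ 7.95", "$ 9.95"),
    ("Rubber Duck MK Ultra - Summer Version", None, None),
    ("Rubber Duck MK Ultra - Batman Version", "$ 14.95", None),
]

# One list per detail; position i in every list describes the same SKU.
names, current_prices, old_prices, available = [], [], [], []
for name, current_price, old_price in rows:
    names.append(name)
    current_prices.append(current_price)
    old_prices.append(old_price)
    # A SKU with any price counts as available (stored as a bool here).
    available.append(current_price is not None or old_price is not None)

resposta_final = {
    "name": names,
    "current price": current_prices,
    "old price": old_prices,
    "available": available,
}
```

The drawback of this layout is that the lists only stay meaningful as long as they are kept in the same order and length.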
Posted on 2022-08-22 19:31:02
First, let me suggest that the format you are asking for is not ideal. Instead, consider
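"skus": [
    {"name": "Rubber Duck MK Ultra - Original",
     "current price": "$7.95",
     "old price": "$9.95",
     "available": True
    },
    # ... etc
]

This is a list of dicts, where each dict represents one SKU and contains the desired fields. It is a more natural representation of the data, and it makes each SKU's information easier to access.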
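To illustrate the point about easier access, here is a small sketch. The `skus` list is hard-coded sample data in the suggested shape, based on the values shown in the question, not the output of a real scrape.

```python
# Hard-coded sample in the suggested shape (assumption: values taken from
# the question's output, with whitespace removed from the prices).
skus = [
    {"name": "Rubber Duck MK Ultra - Original",
     "current price": "$7.95", "old price": "$9.95", "available": True},
    {"name": "Rubber Duck MK Ultra - Summer Version",
     "current price": None, "old price": None, "available": False},
    {"name": "Rubber Duck MK Ultra - Batman Version",
     "current price": "$14.95", "old price": None, "available": True},
]

# Every field of one SKU sits in one dict, so filtering is a one-liner.
in_stock = [sku["name"] for sku in skus if sku["available"]]

# Look up a single SKU by name without juggling parallel indexes.
by_name = {sku["name"]: sku for sku in skus}
batman_price = by_name["Rubber Duck MK Ultra - Batman Version"]["current price"]
```

With the format from the question, the same lookups would require splitting multi-line strings and matching positions by hand.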
Updated (taking into account that the individual fields are available in the parsed HTML).
skus = parsed_html.select(".card")

def sku_is_available(sku_tag):
    oos_content = "https://schema.org/OutOfStock"
    if sku_tag.find(content=oos_content) is not None:
        return False
    return True

def text_of_class(css_class, tag):
    '''Helper function to extract desired properties
    from single SKU'''
    try:
        text = tag.select_one(css_class).text
        text = text.strip()
    except AttributeError:  # select_one() returned None: no matching element
        text = None
    return text

def clean_price_string(price_string):
    '''Returns price_string without white spaces'''
    import re
    if price_string is not None:
        price_string = re.sub(r'\s+', '', price_string)
    return price_string

def compile_sku_dict(sku_tag):
    '''Takes BS tag of single SKU
    and returns desired properties in dict.'''
    name = text_of_class(".sku-name", sku_tag)
    old_price = text_of_class(".sku-old-price", sku_tag)
    old_price = clean_price_string(old_price)
    current_price = text_of_class(".sku-current-price", sku_tag)
    current_price = clean_price_string(current_price)
    available = sku_is_available(sku_tag)
    out = {"name": name,
           "current price": current_price,
           "old price": old_price,
           "available": available
           }
    return out

[compile_sku_dict(sku) for sku in skus]

https://stackoverflow.com/questions/73449010
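The list produced by `compile_sku_dict` can be passed to `json.dumps` as-is. The sketch below uses hard-coded sample dicts in place of the scraped result (an assumption, so that it runs without network access). It shows that Python's `True`, `False` and `None` are serialized as JSON `true`, `false` and `null`, which covers the `null` and `true`/`false` values the question asks for.

```python
import json

# Stand-in for the scraped list (assumption: sample values, not a live scrape).
skus = [
    {"name": "Rubber Duck MK Ultra - Original",
     "current price": "$7.95", "old price": "$9.95", "available": True},
    {"name": "Rubber Duck MK Ultra - Summer Version",
     "current price": None, "old price": None, "available": False},
]

resposta_final = {"skus": skus}

# indent=2 is optional; it only makes the file easier to read.
json_text = json.dumps(resposta_final, indent=2)

# The text can then be written out the same way as in the question:
# with open('produto.json', 'w') as arquivo_json:
#     arquivo_json.write(json_text)
```

Reading the file back with `json.loads` restores the same Python values, so nothing is lost in the round trip.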