I'm trying to scrape data from Vivino. So far I've used its API and read the data from the JSON response, following this post: https://stackoverflow.com/a/62224619/7575172
import pandas as pd
import requests

# Query Vivino's explore endpoint.
r = requests.get(
    "https://www.vivino.com/api/explore/explore",
    params={
        "country_code": "DK",
        "country_codes[]": "fr",
        "currency_code": "DKK",
        "grape_filter": "varietal",
        "min_rating": "1",
        "order_by": "price",
        "order": "asc",
        "page": 1,
        "price_range_max": "500",
        "price_range_min": "0",
        "wine_type_ids[]": "1",
    },
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0"
    },
)

# Build one tuple per match returned by the API.
results = [
    (
        t["vintage"]["wine"]["winery"]["name"],
        t["vintage"]["wine"]["name"],
        t["vintage"]["year"],
        t["vintage"]["wine"]["region"]["country"]["name"],
        # t["vintage"]["wine"]["region"]["name_en"],
        t["vintage"]["wine"]["style"]["region"]["name"],
        t["vintage"]["statistics"]["ratings_average"],
        t["vintage"]["statistics"]["ratings_count"],
        t["price"]["amount"],
    )
    for t in r.json()["explore_vintage"]["matches"]
]

dataframe = pd.DataFrame(
    results,
    columns=["Winery", "Wine", "Year", "Country", "Region", "Rating", "num_review", "Price"],
)
# print(dataframe)
print(dataframe[["Winery", "Year", "Region", "Rating", "num_review", "Price"]])
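However, I can't find data about the other available vintages of the same wine in any of the JSON responses. I'm looking at the 2019 vintage, but data for 2015-2020 exists as well.
I've used the network monitor in Firefox to inspect the other JSON files that are sent when you open the page below. As far as I can see, though, none of them contains the full list of available vintages?
An example of the part I want to scrape can be seen here: https://www.vivino.com/DK/en/pierre-amadieu-gigondas-romane-machotte-rouge/w/73846?ref=nav-search#vintageListSection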
Answered 2021-10-16 18:44:47
The data is in the page's JavaScript, under the window.__PRELOADED_STATE__.winePageInformation object, like this:
<script>
window.__PRELOADED_STATE__ = ....
window.__PRELOADED_STATE__.winePageInformation = { very long JSON here }
</script>
You can extract it with a regex; the result appears to be valid JSON:
import json
import re

import requests

url = "https://www.vivino.com/DK/en/pierre-amadieu-gigondas-romane-machotte-rouge/w/73846"
r = requests.get(
    url,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0"
    },
)

# this gets the javascript object
res = re.search(
    r"^.*window\.__PRELOADED_STATE__\.winePageInformation\s*=\s*(.*});",
    r.text,
    re.MULTILINE,
)
if res is None:
    # the assignment was not found, e.g. the page layout changed
    raise SystemExit("winePageInformation not found in the page source")

data = json.loads(res.group(1))
print("recommended vintages")
print(data["recommended_vintages"])
print("all vintages")
print(data["wine"]["vintages"])
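A variation on the same idea, as a minimal sketch rather than a definitive implementation: instead of capturing the object with a greedy regex, it locates the assignment and lets `json.JSONDecoder.raw_decode` parse exactly one JSON value from that position, ignoring the trailing `;`. The function names `extract_wine_page_info` and `vintage_ids_by_year` are hypothetical, and the assumption that each vintage entry carries `id` and `year` keys is not confirmed by the answer. The sample HTML and its ids are made up.

```python
import json
import re

# Marker taken from the answer above; the live page layout may differ.
_MARKER = re.compile(r"window\.__PRELOADED_STATE__\.winePageInformation\s*=\s*")


def extract_wine_page_info(html):
    """Return the winePageInformation object embedded in a wine page's HTML."""
    match = _MARKER.search(html)
    if match is None:
        raise ValueError("winePageInformation not found in page source")
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the ';' and the rest of the script).
    info, _end = json.JSONDecoder().raw_decode(html, match.end())
    return info


def vintage_ids_by_year(info):
    """Map year -> vintage id.

    ASSUMPTION: every entry in info["wine"]["vintages"] is a dict with
    "id" and "year" keys; check the real payload before relying on this.
    """
    return {v["year"]: v["id"] for v in info["wine"]["vintages"]}


# Made-up page fragment with the same shape as the snippet in the answer.
sample_html = (
    "<script>\n"
    "window.__PRELOADED_STATE__ = {};\n"
    "window.__PRELOADED_STATE__.winePageInformation = "
    '{"wine": {"vintages": [{"id": 11, "year": 2019}, {"id": 12, "year": 2015}]}};\n'
    "</script>"
)
print(vintage_ids_by_year(extract_wine_page_info(sample_html)))
```

With a real page you would pass `r.text` from the request above instead of `sample_html`.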
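Answered 2021-10-17 01:49:48
Bertrand has given you the best answer. Perhaps oddly, the endpoint you are hitting is not configured to let you pass in a wine id and get back all of its vintages. The existing parameters are as follows: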
country_code, country_codes, currency_code, discount_prices, food_ids,
grape_ids, grape_filter, max_rating, merchant_id, merchant_type, min_rating,
min_ratings_count, order_by, order, page, per_page, price_range_max,
price_range_min, region_ids, wine_style_ids, wine_type_ids, winery_ids,
vintage_ids, wine_years, excluding_vintage_id, wsa_year, top_list_filter
These are detailed in the JS file https://www.vivino.com/packs/common-8f26f13b0ac53f391471.js
You would need to determine the vintage ids and pass them to the API.
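As a rough sketch of what passing vintage ids might look like: the parameter name `vintage_ids` comes from the list above, but the `[]` suffix is an assumption based on how `country_codes[]` and `wine_type_ids[]` are written in the question, and whether the endpoint filters on it this way is not verified here. The helper name `build_explore_params` and the ids are made up. The snippet only builds and prints the query string; it sends nothing.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://www.vivino.com/api/explore/explore"


def build_explore_params(vintage_ids, page=1):
    """Build query parameters for the explore endpoint (hypothetical helper)."""
    return {
        "country_code": "DK",
        "currency_code": "DKK",
        "page": page,
        # ASSUMPTION: list-valued parameters take a "[]" suffix, as
        # "country_codes[]" and "wine_type_ids[]" do in the question.
        "vintage_ids[]": [str(v) for v in vintage_ids],
    }


params = build_explore_params([111, 222])  # made-up vintage ids
# doseq=True repeats the key once per list element.
print(EXPLORE_URL + "?" + urlencode(params, doseq=True))
```

The same dictionary can be handed to `requests.get(EXPLORE_URL, params=params, headers=...)`, which also repeats the key for each element of a list value.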
https://stackoverflow.com/questions/69577385