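I'm trying to scrape links on a website. When I click a link, it can lead to either a car ad or a regular ad. For both types of ad, the keys I need to scrape are the same: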
For car ads: `data = dict_keys(['header', 'description', 'currency', 'price', 'wanted', 'id', 'photos', 'section', 'age', 'spotlight', 'year', 'state', 'friendlyUrl', 'keyInfo', 'seller', 'displayAttributes', 'countyTown', 'breadcrumbs'])`
For regular ads: `data = dict_keys(['header', 'description', 'currency', 'price', 'wanted', 'id', 'photos', 'section', 'age', 'spotlight', 'year', 'state', 'friendlyUrl', 'keyInfo', 'seller', 'displayAttributes', 'countyTown', 'breadcrumbs'])`
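Because both ad types expose the same top-level keys, the shared fields could be read by one helper instead of two copies of the same assignment list. A minimal sketch, assuming `ad` is the already-parsed dict for one listing; `extract_common` and `SHARED_FIELDS` are hypothetical names, and `.get()` is used so a key missing from a listing yields `None` instead of raising `KeyError`:

```python
from typing import Any, Dict

# Hypothetical selection of top-level fields; every name here appears in the
# dict_keys listing above for both ad types.
SHARED_FIELDS = ("header", "id", "currency", "price", "friendlyUrl", "age", "spotlight")


def extract_common(ad: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the fields that car ads and regular ads have in common."""
    # .get() turns a missing key into None instead of raising KeyError.
    row = {field: ad.get(field) for field in SHARED_FIELDS}
    # 'seller' is a nested dict; fall back to an empty dict if it is absent.
    seller = ad.get("seller") or {}
    row["sellerId"] = seller.get("id")
    row["sellerName"] = seller.get("name")
    row["adCount"] = seller.get("adCount")
    return row


# Hypothetical, heavily trimmed listing used only to exercise the helper.
ad = {"header": "Fridge", "id": 42, "price": "100", "seller": {"id": 7, "name": "Ann"}}
print(extract_common(ad)["sellerName"])  # Ann
```

Section-specific extras (for example the motor breadcrumb name, or `xitiAdData` for regular ads) would then be added on top of this shared row in each branch.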
In the car-ad data, the `'breadcrumbs'` key gives me:
```
[{'name': 'motor',
  'displayName': 'Cars & Motor',
  'id': 1003,
  'title': 'Cars Motorbikes Trucks Caravans and More',
  'subdomain': 'www',
  'containsSubsections': True,
  'xtn2': 101},
 {'name': 'cars',
  'displayName': 'Cars',
  'id': 11,
  'title': 'Cars',
  'subdomain': 'cars',
  'containsSubsections': False,
  'xtn2': 142}]
```
whereas in a regular ad, `'breadcrumbs'` gives me:
```
[{'name': 'all',
  'displayName': 'All Sections',
  'id': 2066,
  'title': 'See Everything For Sale',
  'subdomain': 'www',
  'containsSubsections': True,
  'xtn2': 100},
 {'name': 'household',
  'displayName': 'House & DIY',
  'id': 1001,
  'title': 'House & DIY',
  'subdomain': 'www',
  'containsSubsections': True,
  'xtn2': 105},
 {'name': 'furniture',
  'displayName': 'Furniture & Interiors',
  'id': 3,
  'title': 'Furniture',
  'subdomain': 'www',
  'containsSubsections': True,
  'xtn2': 105},
 {'name': 'kitchenappliances',
  'displayName': 'Kitchen Appliances',
  'id': 1089,
  'title': 'Kitchen Appliances',
  'subdomain': 'www',
  'containsSubsections': False,
  'xtn2': 105}]
```
I tried to get the motor data by checking the `'xtn2'` key, using `data['breadcrumbs'][0]['xtn2'] == 101`, and named the result `motordata`:
```python
if data['breadcrumbs'][0]['xtn2'] == 101:
    motordata = data
    if motordata:
        motors = motordata['breadcrumbs'][0]['name']
        views = motordata['views']
        title = motordata['header']
        Adcounty = motordata['county']
        itemId = motordata['id']
        sellerId = motordata['seller']['id']
        sellerName = motordata['seller']['name']
        adCount = motordata['seller']['adCount']
        lifetimeAds = motordata['seller']['adCountStats']['lifetimeAdView']['value']
        currency = motordata['currency']
        price = motordata['price']
        adUrl = motordata['friendlyUrl']
        adAge = motordata['age']
        spotlight = motordata['spotlight']
```
and the regular data using `elif data['breadcrumbs'][0]['xtn2'] == 100:`, named `Allotherads`:
```python
elif data['breadcrumbs'][0]['xtn2'] == 100:
    Allotherads = alldata
    if Allotherads:
        views = Allotherads['views']
        title = Allotherads['header']
        itemId = Allotherads['id']
        Adcounty = Allotherads['county']
        # Adtown = alldata['countyTown']
        sellerId = Allotherads['seller']['id']
        sellerName = Allotherads['seller']['name']
        adCount = Allotherads['seller']['adCount']
        lifetimeAds = Allotherads['seller']['adCountStats']['lifetimeAdView']['value']
        currency = Allotherads['currency']
        price = Allotherads['price']
        adUrl = Allotherads['friendlyUrl']
        adAge = Allotherads['age']
        spotlight = Allotherads['spotlight']
        topSectionName = Allotherads['xitiAdData']['topSectionName']
        xtn2 = Allotherads['breadcrumbs'][2]['xtn2']
        subSection = Allotherads['breadcrumbs'][2]['displayName']
```
But it doesn't work. It only scrapes the regular ads, not the car ads. Where am I going wrong?
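The `if`/`elif` above decides the ad type from the first breadcrumb's `xtn2` value (101 for the motor section and 100 for "All Sections" in the two samples). A minimal sketch of that check as a standalone function, assuming those two values hold for every listing; `classify_ad` is a hypothetical name, and an empty or missing `breadcrumbs` list returns `None` rather than raising:

```python
from typing import Any, Dict, Optional

MOTOR_XTN2 = 101  # first breadcrumb of the car-ad sample
ALL_XTN2 = 100    # first breadcrumb of the regular-ad sample


def classify_ad(ad: Dict[str, Any]) -> Optional[str]:
    """Return 'motor', 'other', or None based on the top-level breadcrumb."""
    crumbs = ad.get("breadcrumbs") or []
    if not crumbs:
        return None
    top = crumbs[0].get("xtn2")
    if top == MOTOR_XTN2:
        return "motor"
    if top == ALL_XTN2:
        return "other"
    return None  # any other top-level section: neither branch applies


# Trimmed versions of the two breadcrumb samples from the question.
car = {"breadcrumbs": [{"name": "motor", "xtn2": 101}, {"name": "cars", "xtn2": 142}]}
regular = {"breadcrumbs": [{"name": "all", "xtn2": 100}, {"name": "household", "xtn2": 105}]}
print(classify_ad(car), classify_ad(regular))  # motor other
```

Printing `classify_ad(data)` for each scraped listing shows which branch that listing would take, which may help narrow down whether car ads fail at the classification step or later, for example on a key such as `'views'` or `'county'` that does not appear in the `dict_keys` listings above.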
Answered 2020-08-01 20:17:32
Can't you just do this (if there can be multiple motor entries):
```python
motordata = [x for x in data.get('breadcrumbs') if x.get('name') == "motor"]
```
Or, if there can only be one motor entry:
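```python
motordata = next(iter([x for x in data.get('breadcrumbs') if x.get('name') == "motor"]))
```
Here `next(iter(...))` returns the same element as indexing the list with `[0]`. Because the list comprehension is still built in full first, this saves no work over `[0]`; passing a generator expression straight to `next()` is what stops at the first match.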
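The answer's filter can be tried on the car-ad breadcrumbs from the question (trimmed to a few keys). This sketch also shows a variant of the single-result form that passes a generator expression and a default to the built-in `next()`, so a regular ad with no `motor` breadcrumb yields `None` instead of raising `StopIteration`:

```python
# Trimmed car-ad breadcrumbs from the question.
breadcrumbs = [
    {"name": "motor", "displayName": "Cars & Motor", "xtn2": 101},
    {"name": "cars", "displayName": "Cars", "xtn2": 142},
]
data = {"breadcrumbs": breadcrumbs}

# All matching entries (the list is empty when nothing matches).
motor_entries = [x for x in data.get("breadcrumbs", []) if x.get("name") == "motor"]

# First matching entry only: the generator stops at the first hit,
# and next() returns the default None when there is no match.
motordata = next((x for x in data.get("breadcrumbs", []) if x.get("name") == "motor"), None)

# A regular ad has no 'motor' breadcrumb, so the default is returned.
regular = {"breadcrumbs": [{"name": "all", "xtn2": 100}]}
no_match = next((x for x in regular.get("breadcrumbs", []) if x.get("name") == "motor"), None)

print(len(motor_entries), motordata["xtn2"], no_match)  # 1 101 None
```

With the `None` default, `if motordata:` then works as a plain branch test for car ads versus everything else.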
https://stackoverflow.com/questions/63204280