Trying to scrape a website, but I'm not getting the HTML content
Stack Overflow user
Asked 2022-07-27 21:35:44
Answers 3, views 302, followers 0, votes -1

I'm trying to scrape this website, but I'm not getting what I see in "Inspect Element". It seems as if the HTML content is hidden or something:

from bs4 import BeautifulSoup 
import requests

result = requests.get("https://groceries.asda.com/aisle/price-match/view-all-price-match/view-all-price-match/1215686354045-1215686354052-1215686354053")
src = result.content
soup = BeautifulSoup(src, 'html.parser')
print(soup)

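This is what I see in Inspect Element, and what I want:

But when I print the soup, I get something else (please try running this code, because the output is too long to paste here).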

Stack Overflow user

Answered 2022-07-27 22:34:05

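The page is loaded dynamically by JavaScript, so the HTML you see in the browser is not present in the response that bs4 parses. If your end goal is to scrape the data, you can get it from the site's API instead. That approach is robust, and it is also the easiest way to fetch the data using only the requests module.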
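One way to confirm that the markup really is missing from the raw response, rather than just hard to find, is to count how many elements carry a class you can see in Inspect Element. The following is a minimal sketch using only the standard library. The names `ClassCounter` and `count_class` and the two sample HTML strings are hypothetical and chosen for illustration; `co-product-list` is the class used in the CSS selector of the Selenium code in this answer.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many elements in `html` carry `class_name`."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical stand-ins: an empty JS shell and a browser-rendered page.
static_shell = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
rendered = '<html><body><div class="co-product-list"><ul><li>item</li></ul></div></body></html>'

print(count_class(static_shell, "co-product-list"))  # 0
print(count_class(rendered, "co-product-list"))      # 1
```

Passing `requests.get(url).text` as the first argument runs the same check on the real response. A count of zero for a class that is visible in the browser suggests the element is created by JavaScript after the page loads.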

Example using the API:

import requests

api_url = "https://groceries.asda.com/api/bff/graphql"
payload= {"requestorigin":"gi","contract":"web/cms/get-items","variables":{"user_segments":["1259","1194","1140","1141","1182","1130","1128","1124","1126","1119","1123","1117","1112","1116","1109","1111","1102","1110","1097","1105","1100","1107","1098","1038","1087","1099","1070","1082","1067","1047","1059","1057","1055","1053","1043","1041","1042","1027","1023","1024","1020","1019","1007","1242","1241","1262","1239","1256","1245","1237","1263","1264","1233","1249","1260","1247","1238","1236","1227","1208","1220","1210","1172","1178","1222","1231","1217","1179","1225","1207","1167","1221","1219","1160","1180","1152","1213","1206","1176","1224","1165","1159","1209","1169","1144","1214","1177","1216","1196","1173","1186","1147","1183","1204","1174","1191","1201","1202","1190","1157","1198","1189","1166","1197","1150","1170","1184","1271","1278","1279","1269","1283","1284","1285","rmp_enabled_user","dp-False","wapp","store_4565","vp_M","anonymous","clothing_store_enabled","checkoutOptimization","NAV_UI","T003","T014"],"store_id":"4565","page":2,"page_size":60,"request_origin":"gi","type":"content","ship_date":1658880000000,"payload":{"cacheable":True,"hierarchy_id":"1215686354045-1215686354052-1215686354053","filter_query":[]}}}
headers={
    'content-type': 'application/json',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    'request-origin': 'gi'
}
data = requests.post(api_url,headers=headers,json=payload).json()

for item in data['data']['tempo_items']['products']['items']:
    print(item['item']['name'])

Output:

Fixodent Complete Denture Adhesive Original
Surf Tropical Lily Concentrated Liquid Laundry Detergent 24 Washes
Always Maxi Profresh Night Sanitary Towels Without Wings
Pantene 3 Minute Miracle Repair&Protect Hair Conditioner
Garnier Ultimate Blends Coconut Oil Frizzy Hair Shampoo
Pedigree Schmackos Strips Adult Dog Treats Fish Mix
TRESemme Replenish & Cleanse Conditioner
Herbal Essences Hello Hydration Shampoo For Dry Hair
Blistex Relief Cream
Garnier Skin Active Micellar Cleansing Water Sensitive Skin       
TRESemme Rich Moisture Conditioner
Lemsip Max Day & Night Cold & Flu Relief Capsules
Lenor In-Wash Scent Booster Spring Awakening
Sudafed Congestion Headache Relief Day & Night Capsules
Halls Mentholyptus Extra Strong Lozenges 10 pack
Panadol Advance Paracetamol Tablets x16
Always Dailies Extra Protect Large Panty Liners
Simple Kind To Skin Purifying Cleansing Lotion
Nivea Gentle Exfoliating Face Scrub
Simple Kind to Skin Refreshing Facial Wash Gel
Pantene 3 Minute Miracle Smooth&Sleek Hair Conditioner
Olbas Oil Inhalant Decongestant
Johnson's Bedtime Shampoo
Huggies DryNites Pyjama Pants Girl 8-15 Years
Garnier Belle Color 6 Natural Light Brown Permanent Hair Dye
Westlab Pure Mineral Bathing Epsom Salt
Herbal Essences Ignite My Colour Hair Conditioner For Coloured Hair
Poligrip Denture Adhesive Ultra Fixative Cream
Garnier Ultimate Blends Argan Oil & Almond Cream Dry Hair Conditioner
Halls Original Sugar Free Lozenges 10 pack
Huggies DryNites Pyjama Pants Boy 8-15 Years
Westlab Sleep Epsom & Dead Sea Salts with Lavender & Jasmine
Herbal Essences Ignite My Colour Shampoo For Coloured Hair
Westlab Mindful Epsom & Himalayan Salts with Frankincense & Bergamot
Jolen Creme Bleach
Garnier Belle Color 7.1 Natural Dark Ash Blonde Permanent Hair Dye
Herbal Essences Dazzling Shine Hair Conditioner For All Hair Type
Dettol Antibacterial Disinfectant Multi Surface Spray Lemon & Lime
Lemsip Cold & Flu Lemon Flavour Sachets
Toplife Puppy Formula Milk
Westlab Pure Mineral Bathing Dead Sea Salt
Misfits Nasher Sticks Adult Medium Dog Treats with Chicken and Beef
Dove Deeply Nourishing Body Wash
Dreamies Cat Treat Biscuits with Chicken Mega Pack
Deep Freeze Cold Spray
Tena Lady Discreet Mini Pads
Pantene Pro-V Smooth & Sleek 3in1 Shampoo
Garnier Nutrisse 4.3 Dark Golden Brown Permanent Hair Dye
Fixodent Plus Dual Power Denture Adhesive
Beechams All In One Oral Solution 8 Doses 160ML
Panadol Extra Advance 500mg/65mg Tablets x14
Duck Fresh Brush Toilet Cleaning System Holder
Oral-B Allrounder Black Manual Toothbrush x 3
Dove Indulging Cream Bath Soak
Garnier Ultimate Blends Honey Treasures Strengthening Conditioner
Sudafed Sinus Max Strength Capsules
Johnson's Baby Shampoo
Halls Soothers Cherry Lozenges
Rennie Spearmint Heartburn & Indigestion Relief Tablets
Huggies DryNites Pyjama Pants Boy 4-7 Years
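The payload above carries `"page": 2` and `"page_size": 60`, so the request returns a single page of results. The sketch below shows one way to loop over several pages and to extract the names a little more defensively. It assumes, without confirmation from the API itself, that changing only `variables.page` selects another page. The helper names `with_page`, `product_names` and `fetch_names` are hypothetical; the key path into the response is the one used in the loop above.

```python
import copy


def with_page(payload, page):
    """Return a copy of `payload` with variables.page set to `page`."""
    new_payload = copy.deepcopy(payload)
    new_payload["variables"]["page"] = page
    return new_payload


def product_names(data):
    """Extract product names from a decoded response, skipping incomplete entries."""
    node = data
    for key in ("data", "tempo_items", "products"):
        node = (node or {}).get(key)
    items = (node or {}).get("items") or []
    return [entry["item"]["name"] for entry in items
            if "name" in (entry.get("item") or {})]


def fetch_names(api_url, headers, payload, pages):
    """POST one request per page number and collect the names (needs network)."""
    import requests  # third-party; imported here so the helpers above work without it

    names = []
    for page in pages:
        response = requests.post(api_url, headers=headers,
                                 json=with_page(payload, page), timeout=30)
        response.raise_for_status()  # fail loudly on HTTP errors
        names.extend(product_names(response.json()))
    return names


# Offline demonstration with a hand-made response of the same shape.
sample = {"data": {"tempo_items": {"products": {"items": [
    {"item": {"name": "Blistex Relief Cream"}},
    {"item": {}},  # entry without a name is skipped
]}}}}
print(product_names(sample))  # ['Blistex Relief Cream']
```

With the `api_url`, `headers` and `payload` defined above, a call such as `fetch_names(api_url, headers, payload, range(1, 4))` would request pages 1 to 3. How many pages exist is not stated in the answer, so the range is a guess to adjust.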

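Selenium with bs4:

The API does not return the page's HTML, so we cannot get the HTML content through the API. The page is dynamic, and bs4 cannot render JS. To get the HTML content, you can therefore use selenium together with bs4. The code below retrieves the rendered HTML from the page.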

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_experimental_option("detach", True)
# chrome_options.add_argument("--headless")

webdriver_service = Service("./chromedriver") #Your chromedriver path
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
url='https://groceries.asda.com/aisle/price-match/view-all-price-match/view-all-price-match/1215686354045-1215686354052-1215686354053'
driver.get(url)
driver.maximize_window()
time.sleep(5)
#accept cookie
driver.find_element(By.XPATH,'//*[@id="onetrust-button-group-parent"]/div/button[1]').click()
time.sleep(2)
soup=BeautifulSoup(driver.page_source,'lxml')
html=soup.select_one('div.co-product-list > ul:nth-child(1)')
print(html.prettify())

Output:

<li class="co-product__promo-icon-item">
       <div class="co-product__promo-icon-image-cntr">
        <button aria-label="show information on Smooth &amp; Frizz Free" class="asda-btn asda-btn--plain co-product__promo-icon-button" data-auto-id="btnPromo" type="button">
         <picture class="asda-image picture">
          <source srcset="https://ui.assets-asda.com/dm/_103_frizzfree?$icon-wapp$=&amp;$Icon-wapp$=">
           <img alt="Smooth &amp; Frizz Free" class="asda-img asda-image co-product__promo-icon-img" data-auto-id="" loading="lazy" src="https://ui.assets-asda.com/dm/_103_frizzfree?$icon-wapp$=&amp;$Icon-wapp$=" title="Smooth &amp; Frizz Free"/>
          </source>
         </picture>
        </button>
       </div>
      </li>
     </ul>
    </div>
   </div>
   <div class="co-item__col3">
    <div class="co-item__price-container">
     <span class="co-item__price-per-uom">
      <strong class="co-product__price">
       <span class="co-product__hidden-label">
        now
       </span>
       £1.99
      </strong>
      <p class="co-item__price-per-uom-msg">
       <span class="co-product__price-per-uom">
        (55.3p/100ml)
       </span>
      </p>
     </span>
    </div>
    <div class="co-item__quantity-container">
     <div class="unavailable-banner">
      <span class="asda-pill asda-pill--warning unavailable-banner__product-status" data-auto-id="">
       OUT OF STOCK
      </span>
      <button aria-disabled="false" class="asda-link asda-link--primary asda-link--standalone asda-link--button unavailable-banner__see-alternatives" data-auto-id="linkSeeAlternatives" type="button">
       See alternatives
      </button>
     </div>
    </div>
   </div>
  </div>
 </li>

... and so on
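The Selenium code above pauses with fixed `time.sleep` calls, which wait too long on a fast connection and not long enough on a slow one. A common alternative is to poll until the element appears. The sketch below hand-rolls that idea so the mechanism is visible; `wait_for_elements` is a hypothetical helper, not part of Selenium. In real code Selenium's own `WebDriverWait` is the usual choice for this.

```python
import time


def wait_for_elements(driver, by, value, timeout=10.0, poll=0.5):
    """Poll driver.find_elements(by, value) until it returns a non-empty list.

    Returns the matching elements, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        elements = driver.find_elements(by, value)  # empty list if nothing matches
        if elements:
            return elements
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no element matched {by}={value!r} within {timeout}s")
        time.sleep(poll)


# Possible usage with the driver created above. The strings "xpath" and
# "css selector" are the values of By.XPATH and By.CSS_SELECTOR.
#
# cookie_xpath = '//*[@id="onetrust-button-group-parent"]/div/button[1]'
# wait_for_elements(driver, "xpath", cookie_xpath)[0].click()
# wait_for_elements(driver, "css selector", "div.co-product-list")
```

This only checks that the element is present in the DOM, not that it is visible or clickable, so a click issued immediately afterwards can still fail on some pages.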

Votes 1
Page content originally provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/73144880
