I am just starting out with Python and Beautiful Soup.
I am practicing with the following code:
import requests
r = requests.get('https://www.autobarn.com.au/car-care-touring-accessories/car-care/washes?dir=asc&limit=48&order=name')
from bs4 import BeautifulSoup
soup = BeautifulSoup(r.text, 'lxml')
results = soup.find_all('div', class_='product-details')
records = []
for result in results:
    SKU = result.find('small', class_='text-muted').text.strip()
    DESC = result.find('strong').text.strip().upper()
    PRICE = result.find('span', class_='price')
    URL = result.find('a')['href']
    records.append((SKU, DESC, PRICE, URL))
import pandas as pd
df = pd.DataFrame(records, columns=['SKU','DESCRIPTION', 'RRP', 'URL'])
df.to_csv('d:\\WEB SCRAPE TEST 4.csv', index=False, encoding='utf-8')
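This works well for getting the information I want.
For the price, however, it drags in all of the surrounding HTML.
For example: <span class="price" id="product-price-1242"><span class="price">$6.99
This seems to be caused by two identical tags nested one directly inside the other: <span class="price"><span class="price">
I could clean up the price data in the CSV file afterwards, but is there a way to improve the code so that it extracts only the price?
Thanks in advance.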
Answered 2018-06-06 05:47:19
Try this. It should solve the problem:
import requests
from bs4 import BeautifulSoup
url = 'https://www.autobarn.com.au/car-care-touring-accessories/car-care/washes?dir=asc&limit=48&order=name'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
for result in soup.find_all('div', class_='product-details'):
    SKU = result.find('small', class_='text-muted').text.strip()
    DESC = result.find('strong').text.strip().upper()
    try:
        PRICE = result.select_one("[id^='product-price-'] span").text
    except AttributeError:
        # select_one() returned None, so this product has no price element
        PRICE = ""
    URL = result.find('a')['href']
    print(SKU, DESC, PRICE, URL)
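To see why the selector helps, here is a minimal offline sketch. The SAMPLE_HTML string is a hypothetical fragment modelled on the markup quoted in the question, not the live page, and it is parsed with the built-in 'html.parser' so that lxml is not required. It shows that the outer tag, when converted to a string, reproduces the markup, while the inner span selected by the id prefix yields just the price text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the snippet quoted in the question.
SAMPLE_HTML = """
<div class="product-details">
  <span class="price" id="product-price-1242">
    <span class="price">$6.99</span>
  </span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
result = soup.find("div", class_="product-details")

# find() returns a Tag object; turning it into a string (as happens when
# it is written to a CSV) gives back the markup, not just the price.
outer = result.find("span", class_="price")
outer_as_text = str(outer)

# The CSS selector matches a <span> nested inside any element whose id
# starts with "product-price-", i.e. the inner span only.
inner = result.select_one("[id^='product-price-'] span")
price = inner.get_text(strip=True)

print(outer_as_text)
print(price)
```

The id prefix is used because the numeric suffix presumably differs from product to product.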
Answered 2018-06-05 20:46:26
You can do it like this:
PRICE = result.find('span',class_='price').find('span',class_='price').text
You also have to decide how to handle the case where no price is available. Perhaps something like this:
if result.find('span', class_='price') is None:
    PRICE = "N/A"
else:
    PRICE = result.find('span', class_='price').find('span', class_='price').text
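The check above can also be wrapped in a small helper so that find() is called only once per product. This is a sketch under assumptions: extract_price and SAMPLE_HTML are hypothetical names introduced for illustration, and the sample markup only imitates the structure described in the question. It relies on get_text(strip=True), which collects the text of the outer span and everything nested in it, so the second find() is not needed.

```python
from bs4 import BeautifulSoup


def extract_price(result, default="N/A"):
    """Return the price text of one product block, or `default` if absent."""
    tag = result.find("span", class_="price")
    if tag is None:
        return default
    # get_text() gathers the text of the tag and all of its descendants,
    # so the nested <span class="price"> is covered automatically.
    return tag.get_text(strip=True)


# Hypothetical sample: one product with a price, one without.
SAMPLE_HTML = """
<div class="product-details">
  <span class="price" id="product-price-1242"><span class="price">$6.99</span></span>
</div>
<div class="product-details">
  <strong>No price listed</strong>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
prices = [extract_price(div) for div in soup.find_all("div", class_="product-details")]
print(prices)
```

In the original loop this would be used as PRICE = extract_price(result) before appending to records.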
https://stackoverflow.com/questions/50698698