Web scraping in Python with Selenium: selenium.common.exceptions.NoSuchElementException

Stack Overflow user
Asked 2022-10-30 00:03:15
Answers 1, Views 51, Followers 0, Votes 0

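I am trying to scrape a page, and I have tried changing the sleep time and time_to_wait_between_checking. The function returns a result on the first iteration, then fails inside the while loop. Why does the By.CSS_SELECTOR lookup succeed the first time get_first_listing is called and then fail the second time?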

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
import time
import os

chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_experimental_option('excludeSwitches', ['enable-logging'])

os.environ['WDM_LOG_LEVEL'] = '0'
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s, options=chrome_options)
# driver = webdriver.Chrome(s=path, options=chrome_options) # if you have problems with line 15

# Setting
classified_link = 'https://classifieds.ksl.com/search/Furniture'
time_to_wait_between_checking = 15


def get_first_listing():
    driver.get(classified_link)
    time.sleep(15)
    link = driver.find_element(By.CSS_SELECTOR, '#search-results > div > section > div > div:nth-child(1) > '
                                                'section:nth-child(4) > div.listing-item-info > h2 >'
                                                ' div > a').get_attribute('href')
    title = driver.find_element(By.CSS_SELECTOR, '#search-results > div > section > div > div:nth-child(1) >'
                                                 ' section:nth-child(4) > div.listing-item-info > h2 > div > a').text
    return (link, title)


listing_info = get_first_listing()
first_listing_link_temp = listing_info[0]
listing_title = listing_info[1]

print(f"First Listing Title: {listing_title}, Link: {first_listing_link_temp}")

check_count = 0
active = True
while active:
    check_count += 1
    time.sleep(time_to_wait_between_checking)
    print(f"Checking to see if new listing, this is attempt number {check_count}")
    new_listing_info = get_first_listing()
    first_listing_link = new_listing_info[0]
    title = new_listing_info[1]
    if first_listing_link_temp != first_listing_link:
        print(f"There is a new ad. Title {title}, Link: {first_listing_link}")
        active = False
        break

The output is as follows:

Traceback (most recent call last):
  File "C:PATH.py", line 46, in <module>
    new_listing_info = get_first_listing()
  File "C:PATH.py", line 26, in get_first_listing
    link = driver.find_element(By.CSS_SELECTOR, '#search-results > div > section > div > div:nth-child(1) >'
  File "C:PATH\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 856, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "C:PATH\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 429, in execute
    self.error_handler.check_response(response)
  File "C:PATH\anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 243, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"#search-results > div > section > div > div:nth-child(1) > section:nth-child(4) > div.listing-item-info > h2 > div > a"}
  (Session info: headless chrome=106.0.5249.119)
Stacktrace:
Backtrace:
...

Process finished with exit code 1

1 Answer

Stack Overflow user

Answered 2022-10-30 07:01:01

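There are several issues to fix here to make your code work:

  1. You have to close the cookie banner.
  2. You need to introduce WebDriverWait to wait for elements to become visible, clickable, and so on.
  3. The locators need to be improved.

The following code should work better: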
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
import time
import os

chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_experimental_option('excludeSwitches', ['enable-logging'])

os.environ['WDM_LOG_LEVEL'] = '0'
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s, options=chrome_options)
# driver = webdriver.Chrome(s=path, options=chrome_options) # if you have problems with line 15
wait = WebDriverWait(driver, 20)

# Setting
classified_link = 'https://classifieds.ksl.com/search/Furniture'
time_to_wait_between_checking = 15


def get_first_listing():
    driver.get(classified_link)
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#onetrust-close-btn-container button.onetrust-close-btn-handler"))).click() #close the cookies banner
    title_element = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".item-info-title-link a")))
    link = title_element.get_attribute('href')
    title = title_element.text
    return (link, title)


listing_info = get_first_listing()
first_listing_link_temp = listing_info[0]
listing_title = listing_info[1]

print(f"First Listing Title: {listing_title}, Link: {first_listing_link_temp}")

check_count = 0
active = True
while active:
    check_count += 1
    time.sleep(time_to_wait_between_checking)
    print(f"Checking to see if new listing, this is attempt number {check_count}")
    new_listing_info = get_first_listing()
    first_listing_link = new_listing_info[0]
    title = new_listing_info[1]
    if first_listing_link_temp != first_listing_link:
        print(f"There is a new ad. Title {title}, Link: {first_listing_link}")
        active = False
        break

Here I fixed only the get_first_listing() method, not the while loop that follows it.
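
For the loop itself, here is a minimal sketch of one way it could be hardened. It rests on an assumption I have not verified against this site: that the cookie banner does not reappear once it has been closed, in which case waiting for it on every call would time out from the second call onward. The helper names `dismiss_if_present` and `wait_for_new_listing` are hypothetical and not part of Selenium; the only driver call used is `find_elements`, which returns an empty list rather than raising when nothing matches.

```python
import time


def dismiss_if_present(driver, by, selector, timeout=5.0, poll=0.5):
    """Click the first displayed match for `selector` if one appears within `timeout` seconds.

    Returns True if an element was clicked and False otherwise, so a banner
    that never shows up is not treated as an error.
    """
    deadline = time.monotonic() + timeout
    while True:
        # find_elements returns [] instead of raising NoSuchElementException.
        for element in driver.find_elements(by, selector):
            if element.is_displayed():
                element.click()
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


def wait_for_new_listing(fetch_first_listing, interval=15, max_checks=None, sleep=time.sleep):
    """Poll `fetch_first_listing` until its link differs from the first link seen.

    `fetch_first_listing` is any callable returning a (link, title) tuple.
    Returns the new (link, title), or None if `max_checks` polls pass without a change.
    """
    baseline_link, _ = fetch_first_listing()
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        sleep(interval)
        link, title = fetch_first_listing()
        if link != baseline_link:
            return link, title
    return None
```

Under that assumption, the banner line in get_first_listing() could become `dismiss_if_present(driver, By.CSS_SELECTOR, "#onetrust-close-btn-container button.onetrust-close-btn-handler")`, and the while loop could be replaced by `wait_for_new_listing(get_first_listing, interval=time_to_wait_between_checking)`. Passing the fetch function in as an argument also lets the loop be exercised without a browser.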

Votes 1
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link: https://stackoverflow.com/questions/74249696
