Q: Web scraping in Python with Selenium. selenium.common.exceptions.NoSuchElementException

Stack Overflow user

Asked 2022-10-30 00:03:15

Answers: 1 | Views: 51 | Followers: 0 | Votes: 0

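I am trying to scrape a classifieds page. I have tried changing the sleep time and time_to_wait_between_checking. The first call returns a result, but the call inside the while loop fails. Why does the By.CSS_SELECTOR lookup succeed the first time get_first_listing() is called and then fail the second time?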

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
import time
import os

chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_experimental_option('excludeSwitches', ['enable-logging'])

os.environ['WDM_LOG_LEVEL'] = '0'
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s, options=chrome_options)
# driver = webdriver.Chrome(s=path, options=chrome_options) # if you have problems with line 15

# Setting
classified_link = 'https://classifieds.ksl.com/search/Furniture'
time_to_wait_between_checking = 15


def get_first_listing():
    driver.get(classified_link)
    time.sleep(15)
    link = driver.find_element(By.CSS_SELECTOR, '#search-results > div > section > div > div:nth-child(1) > '
                                                'section:nth-child(4) > div.listing-item-info > h2 >'
                                                ' div > a').get_attribute('href')
    title = driver.find_element(By.CSS_SELECTOR, '#search-results > div > section > div > div:nth-child(1) >'
                                                 ' section:nth-child(4) > div.listing-item-info > h2 > div > a').text
    return (link, title)


listing_info = get_first_listing()
first_listing_link_temp = listing_info[0]
listing_title = listing_info[1]

print(f"First Listing Title: {listing_title}, Link: {first_listing_link_temp}")

check_count = 0
active = True
while active:
    check_count += 1
    time.sleep(time_to_wait_between_checking)
    print(f"Checking to see if new listing, this is attempt number {check_count}")
    new_listing_info = get_first_listing()
    first_listing_link = new_listing_info[0]
    title = new_listing_info[1]
    if first_listing_link_temp != first_listing_link:
        print(f"There is a new ad. Title {title}, Link: {first_listing_link}")
        active = False
        break

The output is as follows:

Traceback (most recent call last):
  File "C:PATH.py", line 46, in <module>
    new_listing_info = get_first_listing()
  File "C:PATH.py", line 26, in get_first_listing
    link = driver.find_element(By.CSS_SELECTOR, '#search-results > div > section > div > div:nth-child(1) >'
  File "C:PATH\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 856, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "C:PATH\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 429, in execute
    self.error_handler.check_response(response)
  File "C:PATH\anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 243, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"#search-results > div > section > div > div:nth-child(1) > section:nth-child(4) > div.listing-item-info > h2 > div > a"}
  (Session info: headless chrome=106.0.5249.119)
Stacktrace:
Backtrace:
...

Process finished with exit code 1

Tags: selenium-webdriver, css-selectors, python-3.9, python, selenium

1 Answer

Stack Overflow user

Answered 2022-10-30 07:01:01

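Several things need fixing to get your code working:

You have to close the cookie banner.
You need to introduce WebDriverWait to wait for element visibility, clickability, and so on.
The locators need to be improved.

The following code should work better: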

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
import time
import os

chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_experimental_option('excludeSwitches', ['enable-logging'])

os.environ['WDM_LOG_LEVEL'] = '0'
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s, options=chrome_options)
# driver = webdriver.Chrome(s=path, options=chrome_options) # if you have problems with line 15
wait = WebDriverWait(driver, 20)

# Setting
classified_link = 'https://classifieds.ksl.com/search/Furniture'
time_to_wait_between_checking = 15


def get_first_listing():
    driver.get(classified_link)
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#onetrust-close-btn-container button.onetrust-close-btn-handler"))).click() #close the cookies banner
    title_element = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".item-info-title-link a")))
    link = title_element.get_attribute('href')
    title = title_element.text
    return (link, title)


listing_info = get_first_listing()
first_listing_link_temp = listing_info[0]
listing_title = listing_info[1]

print(f"First Listing Title: {listing_title}, Link: {first_listing_link_temp}")

check_count = 0
active = True
while active:
    check_count += 1
    time.sleep(time_to_wait_between_checking)
    print(f"Checking to see if new listing, this is attempt number {check_count}")
    new_listing_info = get_first_listing()
    first_listing_link = new_listing_info[0]
    title = new_listing_info[1]
    if first_listing_link_temp != first_listing_link:
        print(f"There is a new ad. Title {title}, Link: {first_listing_link}")
        active = False
        break
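
One caveat about the banner step in get_first_listing() above, offered as an assumption rather than something verified against this site: consent banners are often shown only once per browser session. If that holds here, the second call would wait the full 20 seconds for a close button that never appears, and wait.until would raise TimeoutException. A minimal sketch of treating the banner as optional follows. The helper name click_if_present and its timeout and poll parameters are hypothetical; it relies only on find_elements, which returns an empty list instead of raising when nothing matches.

```python
import time


def click_if_present(driver, locator, timeout=5.0, poll=0.5):
    """Click the first element matching `locator` if one appears in time.

    `driver` is any object with a Selenium-style find_elements(by, value)
    method; `locator` is a (by, value) tuple such as (By.CSS_SELECTOR, "...").
    Returns True if an element was found and clicked, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        # find_elements returns [] when nothing matches, so no exception
        # handling is needed for the "banner is absent" case.
        matches = driver.find_elements(*locator)
        if matches:
            matches[0].click()
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

Inside get_first_listing(), the banner line could then become click_if_present(driver, (By.CSS_SELECTOR, "#onetrust-close-btn-container button.onetrust-close-btn-handler")). Note that this sketch clicks as soon as the element exists in the DOM, so it may still fail if the button is present but not yet interactable.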

Note that I only fixed the get_first_listing() function here; I did not touch the while loop that follows it.
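
Since the loop was left as it was, here is one way it might be tidied up, as a sketch only. The function name wait_for_new_listing and the max_checks and sleep parameters are hypothetical additions; the fetch argument is expected to be a callable such as get_first_listing that returns a (link, title) tuple. Returning from the function replaces the original combination of active = False and break.

```python
import time


def wait_for_new_listing(fetch, baseline_link, interval=15.0,
                         max_checks=None, sleep=time.sleep):
    """Poll fetch() until the first listing's link differs from baseline_link.

    fetch must return a (link, title) tuple. Returns the (link, title) of the
    new listing, or None if max_checks polls pass without a change.
    With max_checks=None the loop runs until a new listing shows up.
    """
    check_count = 0
    while max_checks is None or check_count < max_checks:
        check_count += 1
        sleep(interval)
        print(f"Checking to see if new listing, this is attempt number {check_count}")
        link, title = fetch()
        if link != baseline_link:
            return link, title
    return None
```

With the script above, the call would look like wait_for_new_listing(get_first_listing, first_listing_link_temp, interval=time_to_wait_between_checking). Passing sleep as a parameter is only there so the loop can be exercised without real delays.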

Votes: 1

Original page content provided by Stack Overflow. Translation supported by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/74249696
