
Scraping w/ Selenium returns "None"

Stack Overflow user
Asked 2022-06-29 17:07:44
1 answer, 88 views, 0 followers, 1 vote

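I'm trying to scrape company profile pages from Capterra with Selenium. Capterra loads the profile links in batches of 25. My code gets the first 5 but returns "None" for the other 20 on the page.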

Code:

from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.common.by import By
from webdriver_manager.firefox import GeckoDriverManager


driver = webdriver.Firefox()
driver.get("https://www.capterra.com/waste-management-software/")

page = bs(driver.page_source, 'html.parser')
# Hits "Show More" button
driver.find_element(By.XPATH, "//*[contains(text(), 'Show More')]").click()
# Grabs Company portfolio page links
plinks = [div.a for div in page.findAll("div", attrs={"class" : "nb-mb-0"})]

for link in plinks:
    print(link)

driver.close()

Output:

<a class="nb-thumbnail nb-relative nb-thumbnail-medium nb-thumbnail-interactive" href="/p/81310/AMCS/"><img alt="" class="nb-max-h-full" loading="lazy" src="https://gdm-catalog-fmapi-prod.imgix.net/ProductLogo/946474e4-bd54-451d-bbaf-9c5602b2f399.png?auto=compress%2Cformat&amp;w=180&amp;h=180"/></a>
<a class="nb-thumbnail nb-relative nb-thumbnail-medium nb-thumbnail-interactive" href="/p/103755/HazMat-T-T/"><img alt="" class="nb-max-h-full" loading="lazy" src="https://gdm-catalog-fmapi-prod.imgix.net/ProductLogo/838db9d8-c251-4d78-aa69-a9cd745ef6b9.png?auto=compress%2Cformat&amp;w=180&amp;h=180"/></a>
<a class="nb-thumbnail nb-relative nb-thumbnail-medium nb-thumbnail-interactive" href="/p/79230/WAM-Hauler-Easy-Bill-Route/"><img alt="" class="nb-max-h-full" loading="lazy" src="https://gdm-catalog-fmapi-prod.imgix.net/ProductLogo/0820f6ea-9d9d-4062-987b-a3fcf25f2813.png?auto=compress%2Cformat&amp;w=180&amp;h=180"/></a>
<a class="nb-thumbnail nb-relative nb-thumbnail-medium nb-thumbnail-interactive" href="/p/152697/Waste-Management-Software/"><img alt="" class="nb-max-h-full" loading="lazy" src="https://gdm-catalog-fmapi-prod.imgix.net/ProductLogo/64597b5d-84e5-464c-ae60-84a1c5ad4976.png?auto=compress%2Cformat&amp;w=180&amp;h=180"/></a>
<a class="nb-thumbnail nb-relative nb-thumbnail-medium nb-thumbnail-interactive" href="/p/177472/Via-Analytics/"><img alt="" class="nb-max-h-full" loading="lazy" src="https://gdm-catalog-fmapi-prod.imgix.net/ProductLogo/c20bf8d6-88cc-49d5-8424-b724ba734d4a.png?auto=compress%2Cformat&amp;w=180&amp;h=180"/></a>
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None

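All I really need from the output are the hrefs that contain "/p/". After that, the script should click the "Show More" button on the page, collect the next 25 links, click the button again, and so on.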

Thanks!
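
A note on the `None` lines: in BeautifulSoup, `div.a` evaluates to `None` for any matched `div` that has no `<a>` descendant, so the list comprehension keeps one `None` per such `div`. The sketch below shows one way to keep only hrefs that contain "/p/". It is a minimal illustration, not a fix tested against Capterra: it swaps in the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra dependencies, the sample markup is hypothetical, and `ProfileLinkParser` and `extract_profile_links` are names made up for this example.

```python
from html.parser import HTMLParser


class ProfileLinkParser(HTMLParser):
    """Collects href values of <a> tags whose href contains "/p/"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip anchors without an href and anything that is not a profile link.
        if href and "/p/" in href:
            self.links.append(href)


def extract_profile_links(html):
    """Return every href containing "/p/" found in the given HTML string."""
    parser = ProfileLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical sample markup: two cards with a profile link, one empty
# placeholder div, and one link that is not a profile page.
sample_html = """
<div class="nb-mb-0"><a href="/p/81310/AMCS/"><img alt=""></a></div>
<div class="nb-mb-0"></div>
<div class="nb-mb-0"><a href="/p/103755/HazMat-T-T/"><img alt=""></a></div>
<div class="nb-mb-0"><a href="/about/">About</a></div>
"""

print(extract_profile_links(sample_html))
```

With Selenium, the same function could be fed `driver.page_source`, read after the "Show More" click has finished loading, since a page source captured before the click cannot contain the newly loaded entries.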


1 answer

Stack Overflow user

Accepted answer

Posted 2022-06-29 21:16:12

You don't need Selenium. The site exposes an API that you can query directly: a single request returns the 125 objects you need.

import requests

# Browser-like headers sent with the request.
headers = {
    'accept': '*/*',
    'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8,es;q=0.7,ru;q=0.6',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36',
}
# Query parameters for the waste-management-software category.
params = {'htmlName': 'waste-management-software', 'countryCode': 'ES'}
base_url = "https://www.capterra.com/p/"
response = requests.get('https://www.capterra.com/directoryPage/rest/v1/category', params=params, headers=headers)
response.raise_for_status()  # fail early on an HTTP error status
# Decode the JSON body into a new name so the json module is not shadowed.
data = response.json()

products = data["pageData"]["categoryData"]["products"]
print("Total elements: " + str(len(products)))
for product in products:
    print("Name: " + product["product_name"])
    # The profile URL is built from the product id and slug.
    print("URL: " + base_url + str(product["product_id"]) + "/" + product["product_slug"] + "/")
    print("Product url: " + product["product_url"])
    print("Image: " + product["logo_filepath"])
    print("Rating: " + str(product["rating"]))
    print()

Output:

Total elements: 125
Name: FAMA
URL: https://www.capterra.com/p/86768/FAMA/
Product url: https://info.gartnerdigitalmarkets.com/fama-es-gdm-lp
Image: https://gdm-catalog-fmapi-prod.imgix.net/ProductLogo/7a7a8467-9a2c-40d9-8488-7d6c3c0dec52.jpeg
Rating: 3.6

Name: Quentic
URL: https://www.capterra.com/p/127188/Quentic/
Product url: https://go.quentic.com/hazardous-materials-management-software
Image: https://gdm-catalog-fmapi-prod.imgix.net/ProductLogo/ba5e26a7-375d-4423-a1f2-68a27d5318c5.png
Rating: 4.8
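
If only the profile links are needed, the URL-building step can be pulled into a small helper. The following is a sketch under stated assumptions rather than part of the original answer: `profile_urls` is a hypothetical name, the sample dictionaries mirror the two products in the output above, and skipping entries with a missing id or slug is a defensive choice that the API response may never require.

```python
BASE_URL = "https://www.capterra.com/p/"


def profile_urls(products):
    """Build profile URLs from product dicts, skipping incomplete entries."""
    urls = []
    for product in products:
        product_id = product.get("product_id")
        slug = product.get("product_slug")
        # Assumption: an entry without an id or slug cannot form a valid URL.
        if product_id is None or not slug:
            continue
        urls.append(f"{BASE_URL}{product_id}/{slug}/")
    return urls


# Sample data modelled on the output above; the third entry is a
# hypothetical incomplete record used to show the skip behaviour.
sample_products = [
    {"product_id": 86768, "product_slug": "FAMA"},
    {"product_id": 127188, "product_slug": "Quentic"},
    {"product_id": 1, "product_slug": None},
]

for url in profile_urls(sample_products):
    print(url)
```

Passing the `products` list from the API response to `profile_urls` would yield one "/p/" link per complete product entry.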

Votes: 2

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/72805325
