Q: Selenium, Cloudflare, Colab, and JSON

Stack Overflow user

Asked 2022-10-25 01:45:51

Answers: 1 | Views: 122 | Followers: 0 | Votes: 1

I am trying to find an efficient way to extract the data displayed on this page:

https://www.kartanarusheniy.org/messages

The data comes from about 44k JSON files served from https://www.kartanarusheniy.org/api/messages/ by ID number (https://www.kartanarusheniy.org/api/messages/1, https://www.kartanarusheniy.org/api/messages/3, etc.). The task is to extract all 44k files. However, the server uses Cloudflare, which blocks me from simply downloading them.

I have made several attempts to get this working with Selenium running on Google Colab. Here is the code I ended up with:

!apt update
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
!pip install selenium

from selenium import webdriver
import pandas as pd

options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument('--no-sandbox')
options.add_argument("--enable-javascript")
options.add_argument('--disable-dev-shm-usage')
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options)

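# Try the first few message IDs as a test run
i = 1
while i < 10:
  # The ID must follow a slash: .../api/messages/1
  url = "https://www.kartanarusheniy.org/api/messages/" + str(i)
  filename = str(i) + ".json"
  #session_obj = requests.Session ()
  driver.get(url)  # get() returns None, so there is nothing to assign
  saver = driver.page_source
  print(saver)
  i += 1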

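This code gives me an HTML page with the usual Cloudflare messages: "Checking if the site connection is secure", "Enable JavaScript and cookies to continue", and "www.kartanarusheniy.org needs to review the security of your connection before proceeding".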
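When looping over thousands of IDs, it helps to tell a challenge page apart from a real response before saving anything. The following is a minimal sketch, not an official Cloudflare check: the marker strings are the English phrases quoted in the question, and the helper name `looks_like_challenge` is hypothetical. Cloudflare may change its wording at any time, so the list would need to be kept up to date.

```python
# Phrases taken from the challenge page quoted in the question.
# Assumption: the wording is stable enough to match on; it is not a documented API.
CHALLENGE_MARKERS = (
    "Checking if the site connection is secure",
    "Enable JavaScript and cookies to continue",
)


def looks_like_challenge(page_source: str) -> bool:
    """Return True if the HTML appears to be a Cloudflare challenge page."""
    lowered = page_source.lower()
    return any(marker.lower() in lowered for marker in CHALLENGE_MARKERS)
```

In the loop, this could be called on `driver.page_source` to decide whether to retry or to stop instead of writing the challenge HTML to disk.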

I have tried undetected_chromedriver and selenium_stealth (as in "Selenium headless: How to bypass Cloudflare detection using Selenium").

What other options do I have in this case?

Tags: cloudflare, python, selenium, selenium-webdriver

1 Answer

Stack Overflow user

Accepted answer

Answered 2022-10-25 03:17:54

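You may be able to use SeleniumBase's undetected-chromedriver mode, which has more features than the original undetected-chromedriver. Here is a simple example that bypasses Selenium detection, reaches the main site you wanted, and takes a screenshot, all in a minimal number of lines.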

First, pip install -U seleniumbase, then run the following with python:

from seleniumbase import Driver
from seleniumbase import page_actions

driver = Driver(uc=True)
driver.get("https://www.kartanarusheniy.org/")
page_actions.wait_for_element(driver, "div.main")
screenshot_name = "kartanarusheniy.png"
driver.save_screenshot(screenshot_name)
print("\nScreenshot saved to: %s" % screenshot_name)
driver.quit()
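The example above stops at a screenshot, while the question asks for the JSON files themselves. As a rough sketch of how that gap might be bridged, the helpers below take any Selenium-style driver (for instance one created with `Driver(uc=True)`), load one message URL, and write the parsed JSON to disk. It assumes that Chrome shows a raw JSON response wrapped in a `<pre>` element inside `page_source`; the names `extract_json` and `save_message` are hypothetical, and nothing here guarantees that the Cloudflare check will be passed.

```python
import html
import json
import re
from pathlib import Path

BASE_URL = "https://www.kartanarusheniy.org/api/messages/"


def extract_json(page_source: str):
    """Parse JSON out of the HTML that Chrome wraps around a raw JSON response."""
    # Assumption: the payload sits inside the first <pre> element.
    match = re.search(r"<pre[^>]*>(.*?)</pre>", page_source, re.DOTALL)
    text = match.group(1) if match else page_source
    # page_source is serialized HTML, so entities such as &amp; must be decoded.
    return json.loads(html.unescape(text))


def save_message(driver, message_id: int, out_dir: Path) -> Path:
    """Load one message through the browser and write it to <id>.json."""
    driver.get(BASE_URL + str(message_id))
    data = extract_json(driver.page_source)  # raises ValueError on non-JSON pages
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{message_id}.json"
    path.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")
    return path


# Possible usage (not run here; requires seleniumbase and a working browser):
#   from seleniumbase import Driver
#   driver = Driver(uc=True)
#   save_message(driver, 1, Path("messages"))
#   driver.quit()
```

If the page returned is still a challenge page, `json.loads` raises a `ValueError`, which gives the calling loop a natural point to pause, retry, or skip the ID.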

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/74188360
