Python requests: how to bypass the "Checking your browser, 5 seconds" page

Stack Overflow user
Asked 2021-11-20 20:57:16
Answers: 1 · Views: 356 · Followers: 0 · Votes: 0

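I just wrote some code for my trading bot project that pulls data from Investing.com with Python's requests library, but the only HTML I get back is the "Checking your browser" page. Here is what I receive.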

Data returned by requests:

<!DOCTYPE HTML>
<html lang="en-US">
<head>
  <meta charset="UTF-8" />
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
  <meta name="robots" content="noindex, nofollow" />
  <meta name="viewport" content="width=device-width,initial-scale=1" />
  <title>Just a moment...</title>
  <style type="text/css">
    html, body {width: 100%; height: 100%; margin: 0; padding: 0;}
    body {background-color: #ffffff; color: #000000; font-family:-apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", Roboto, Oxygen, Ubuntu, "Helvetica Neue",Arial, sans-serif; font-size: 16px; line-height: 1.7em;-webkit-font-smoothing: antialiased;}
    h1 { text-align: center; font-weight:700; margin: 16px 0; font-size: 32px; color:#000000; line-height: 1.25;}
    p {font-size: 20px; font-weight: 400; margin: 8px 0;}
    p, .attribution, {text-align: center;}
    #spinner {margin: 0 auto 30px auto; display: block;}
    .attribution {margin-top: 32px;}
    @keyframes fader     { 0% {opacity: 0.2;} 50% {opacity: 1.0;} 100% {opacity: 0.2;} }
    @-webkit-keyframes fader { 0% {opacity: 0.2;} 50% {opacity: 1.0;} 100% {opacity: 0.2;} }
    #cf-bubbles > .bubbles { animation: fader 1.6s infinite;}
    #cf-bubbles > .bubbles:nth-child(2) { animation-delay: .2s;}
    #cf-bubbles > .bubbles:nth-child(3) { animation-delay: .4s;}
    .bubbles { background-color: #f58220; width:20px; height: 20px; margin:2px; border-radius:100%; display:inline-block; }
    a { color: #2c7cb0; text-decoration: none; -moz-transition: color 0.15s ease; -o-transition: color 0.15s ease; -webkit-transition: color 0.15s ease; transition: color 0.15s ease; }
    a:hover{color: #f4a15d}
    .attribution{font-size: 16px; line-height: 1.5;}
    .ray_id{display: block; margin-top: 8px;}
    #cf-wrapper #challenge-form { padding-top:25px; padding-bottom:25px; }
    #cf-hcaptcha-container { text-align:center;}
    #cf-hcaptcha-container iframe { display: inline-block;}
  </style>

      <meta http-equiv="refresh" content="35">
  <script type="text/javascript">
    //<![CDATA[
    (function(){
      
      window._cf_chl_opt={
        cvId: "2",
        cType: "non-interactive",
        cNounce: "76414",
        cRay: "6b1482833fd66244",
        cHash: "eb90d788b40007d",
        cPMDTk: "",
        cFPWv: "b",
        cTTimeMs: "1000",
        cRq: {
          ru: "aHR0cHM6Ly90ci5pbnZlc3RpbmcuY29tL2NyeXB0by9iaXRjb2luL2J0Yy10cnk/Y2lkPTEwMzEzODI=",
          ra: "cHl0aG9uLXJlcXVlc3RzLzIuMjUuMQ==",
          rm: "R0VU",
          d: "a0DZUGkM6M3EHwuCQBn8MaFzRdaBiZLAjRjen+qhUv69cfEs3ya8yAfsstBQoxD6cdxDhSWaNNTDnFl2el9RQae+NbDXGVtUjODtAqO+A7SanMpxmFczY5Q2BP/q32fXt0ZsjUn5nBvXv6SmUdg9RpcYKILhzBZllUKTMR5wWQPxfzfAFz882qO/Gi6LDE/foncRE0iKLJaAs4/uTKcY3XFzcfZ+1w3o0Nliz5UGBqavIn0UPRAzwSqN5ZXlOvSI/ljL1qj3jl4Syy+h/nyVcp+a5vZSaqvGHYyp59PCLSY7jVPpXk7qqzc1708r3vWSRz+UN/0cqAbwO2oblQ3SOJV54dY/2AvuKBbaRRMAogxBQAgtjyCnhTLSLj3D4bfTVVrjRlF+vRgFFaFppX4b6oebqnzTCTDUVsFxrxp8yCFtab6F5sD1s2nPQpg9/Vk6jfJnbrLhfsibYjoJ74i1wS/leAEOr+BSk94E+gFchy93u62fO4GiFkYAXTerYAwIbXlpA4LrIS3EWkEj4GttD3JNUO8oC9dRsONC5zKUJ9U+LQFO/495QdMzc2o5fV5y",
          t: "MTYzNzQ0MDk5OC45MjcwMDA=",
          m: "P4HQU/6WBisbc+R6aRw0sYizTvbUME/3XuegInf566I=",
          i1: "InCC6JdcVDWY2ZvTZ8VSJw==",
          i2: "txEI7ubOoDTAtluPaIDvFQ==",
          zh: "JJQg2KI/+bPgJbLHlLjmrs/mnno8aAGH5k3tm8QDk4c=",
          uh: "5GU+jYv2xJ+bCaE/ARmi/DORbiS/v56CW7E0TH4XWQk=",
          hh: "YmlEyuLTY297Bf4fWYKqj1eFG+SXP0t8yKaFmf8oRu0=",
        }
      }
      window._cf_chl_enter = function(){window._cf_chl_opt.p=1};
      
    })();
    //]]>
  </script>
  

</head>
<body>
  <table width="100%" height="100%" cellpadding="20">
    <tr>
      <td align="center" valign="middle">
          <div class="cf-browser-verification cf-im-under-attack">
  <noscript>
    <h1 data-translate="turn_on_js" style="color:#bd2426;">Please turn JavaScript on and reload the page.</h1>
  </noscript>
  <div id="cf-content" style="display:none">
    
    <div id="cf-bubbles">
      <div class="bubbles"></div>
      <div class="bubbles"></div>
      <div class="bubbles"></div>
    </div>
    <h1><span data-translate="checking_browser">Checking your browser before accessing</span> tr.investing.com.</h1>
    
    <div id="no-cookie-warning" class="cookie-warning" data-translate="turn_on_cookies" style="display:none">
      <p data-translate="turn_on_cookies" style="color:#bd2426;">Please enable Cookies and reload the page.</p>
    </div>
    <p data-translate="process_is_automatic">This process is automatic. Your browser will redirect to your requested content shortly.</p>
    <p data-translate="allow_5_secs" id="cf-spinner-allow-5-secs" >Please allow up to 5 seconds&hellip;</p>
    <p data-translate="redirecting" id="cf-spinner-redirecting" style="display:none">Redirecting&hellip;</p>
  </div>
   
  <form class="challenge-form" id="challenge-form" action="/crypto/bitcoin/btc-try?cid=1031382&amp;__cf_chl_jschl_tk__=CDZzIbYLGXykSLkVwh2MqkPNxG_RPjcd8xFfkZ0fgP4-1637440998-0-gaNycGzNBz0" method="POST" enctype="application/x-www-form-urlencoded">
    <input type="hidden" name="md" value="Wqq9NpYqtJShazwbZwz5SAqIRQsLKWUULt83_J1PWQ4-1637440998-0-AeHZYymAro52c5y0Zj4jafFO06_M0S7dAuXzwMd6L3UHN6CIx3Plh58eBAf5MM22l9hkKXL4dCz8A9lid8-gAXrlx3RNEJk_eAgJaK0U2OXiDQcOmE7Zh9IPDxxQCaTrjaXYC9cufjr4B2vv40GVEkMil2mgBTha97BD82aaoxD1FTLrJLCsQ4YIc9ZdZmouwloPPMYy2EFm7GbozVorA9wToFgFO5z1-11MsD0_uS4_1DnbpJI9YWWzFHJAk6V7gr7hZE_L-ngm-kdkgy2pSsvIXSt-fF-PR0mAWPsuyD79rdv-FI38snagG-lRkVPR8iCsEEctPcdW_GU6vAJ5cYCMGMLMUVVYkthENVScCRCNEp4aWy0t0rEyAf9KSBbg-9J7wwjQlmaQII06wQwQmZJCvjL5_pvMoclTiSLNTFhU8CQCVy-ay51BaDs_FNJmvAYyOQPL2aph0sm2d11guSDs7RJs4CapH9rzIEbLwVw2S6NHrFX1wSrBHHNFdQBpuE_J6tQuHbNUR0jdK10CzMbpBSdKfhKZgEqOvoumaYZR8nNdE-orX8gyOdgXcqnHbQ" />
    <input type="hidden" name="r" value="7sRtptnsxEX3cgZaq5X0udJSCGZWOfcV3iy.EtraIdc-1637440998-0-AW+LAjAuS5CAITErZ4x/g8StGu4rnsSnXxe6FXBPM1BuPi2pd9inxq2DALXdugSKSRV2lpw7KyFxpCzOE9KBX0d8x6lFXGBntA04Ak6mHv+dJRZI7MqNxob93cMAi+gZtp8S2/WilEGpA5m7wy7ok+1uMj4mtEWf3FRVwuWKSq6/0FHQlcvVIicSXo7lmf6iaUbIOGGedbeTc2jpUnSg4C8E3Rpirj6pgyUQxeH7xTbKDr8S8xm9iSwJZ5MPgubSdQBXT6IB8s+1RoZy4530vValRYTtYNIgAL+9WZ7zDBZjLOXsdIq7D1Y34sKGulITYX1w8FhhrmrltOCs3SbqIk0RyGceTtpGKCITMbsyEfS5lgnK+E/viefSrt0NxypHr9vGpvtWy8ikeoyJQqgeq5QMnb4lqK8FLVnppy4DG6zpgNMx01eKCD+L+epos/zaxRCcSL65q9Op9bmPT08ucKBnCOzrQPLcMJ0s/Aizp75wnmXQxlBvCZhad88XAZVmyi8NCMhOSj1iX+Sd/EUANhw+meEVim6Apjuwbo9gAz1Hg9AxEr10iC0+jT2StIUb17kEjB4vkmF9LHh8YqD5WV+E56igEQEX3Gr5o6yiLuLteC/hYs0gVzN90X4Iz8X3kcfVvede9ClyHb9R2qzrrVRuR5Vzwoxz0oRfdeXWdGTb5tH65Z60Jh+hxsFWLxssCdTWt+O7/RYuskepGkDoO3Dob55tBl49qUNrUoNgcRWp+ELwggKa19kTKQRO8FRrB+F0yW6hcCHySsrUyKS/FHp6klzAQ4+50OHGjohtVsLYpEUOfFbIzXNX+nPrgDvuOFvivTIoHwTpmLO+8PLQYHciwo86ySSeTukEkayIqJuqVH4kvd987/9CGBjfyiobfCQZJXyuK1H6X/zJxajYAQoO7/lJwi39XO27nYdoNu28p1FQ7MgJCXOo9TV6k+OOGjrdNXsaSFUrFgK9g/0D/oOkw+bgT1g+k8quSuW53xvWoiOPMGbr1T1Ww+A+ey4mxhSV1QaROYnBStr4j38ivJ6/3QaxWHqBB3YUbyM0wTzfKBmzAhqsypNXx/0QTo2Ezvice037fJZOQ2EQv2mObIEaEL2NeTF5Lv9v6fzA+XUym6alGCxJ385TH4/AkptmJDmuttobnUGRKPbLPrb4m71H+xKTMoydJCGPmWXqL/W3HBuKdRiYhLy+twusUE74Lm4y26MYRR8WyBNNzQCIv+KB0ahdUbJIneoBMIb4RvNKJUzJXQklI+3DeasPhKJQ/Eqlrk/cIHPOnLpWYjv5XtvVGGz/aoMALs22Ap9TS5ewgn99lHR5smv8dEzPIkqAUnP68+b3AjSuS+OE64BbXVPuPwHpgeyBlnVMDENbI48j+AuAMaIIljHlZ8u8jTmnE5cUBHhDZAM7qSn3c25WgulYHJSphQ8hgywWaW0Q4s3MyVupuFR1djgijZYmg76lbEUkCsTDrpLOnYMlXJdIzdfUdsRxqIdNziAHZvVqMIb89VJpr62HB7N463VdXPYPz8tiiStQ1o3fbaZs48tZr62ucBE/t+ypGwCeXL3AOz7HYcHecNsDTSLFpIETh5iebpt9Ny7aQT3+kPeI0u1CnVJ2KO1sY0Bx7oMwopdwLVGnvohwtVlUx7YZLprHs9v7tWMMWfgcLn0N4Lq0Df0EjaMB7/m6ceObvORNv2AVWgbvZxuJTv0ZwPnPV25nb3l1QkNjp2Cx8pghFfh+KdsGEwwP1AD/nFpdVRdTV2D39fQribW0VkRNhoiq+rTObjijAQ=="/>
    <input type="hidden" value="083fe53780735202364bbe5cba47b14e" id="jschl-vc" name="jschl_vc"/>
    <!-- <input type="hidden" value="" id="jschl-vc" name="jschl_vc"/> -->
    <input type="hidden" name="pass" value="1637440999.927-7nb1cbKw+i"/>
    <input type="hidden" id="jschl-answer" name="jschl_answer"/>
  </form>
     
    <script type="text/javascript">
      //<![CDATA[
      (function(){
          var a = document.getElementById('cf-content');
          a.style.display = 'block';
          var isIE = /(MSIE|Trident\/|Edge\/)/i.test(window.navigator.userAgent);
          var trkjs = isIE ? new Image() : document.createElement('img');
          trkjs.setAttribute("src", "/cdn-cgi/images/trace/jschal/js/transparent.gif?ray=6b1482833fd66244");
          trkjs.id = "trk_jschal_js";
          trkjs.setAttribute("alt", "");
          document.body.appendChild(trkjs);
          
          var cpo=document.createElement('script');
          cpo.type='text/javascript';
          cpo.src="/cdn-cgi/challenge-platform/h/b/orchestrate/jsch/v1?ray=6b1482833fd66244";
          document.getElementsByTagName('head')[0].appendChild(cpo);
        }());
      //]]>
    </script>
  

  
  <div id="trk_jschal_nojs" style="background-image:url('/cdn-cgi/images/trace/jschal/nojs/transparent.gif?ray=6b1482833fd66244')"> </div>
</div>

          
          <div class="attribution">
            DDoS protection by <a rel="noopener noreferrer" href="https://www.cloudflare.com/5xx-error-landing/" target="_blank">Cloudflare</a>
            <br />
            <span class="ray_id">Ray ID: <code>6b1482833fd66244</code></span>
          </div>
      </td>
     
    </tr>
  </table>
</body>
</html>

Here is the actual code:

import requests
from bs4 import BeautifulSoup as soup

Url = "https://tr.investing.com/crypto/bitcoin/btc-try?cid=1031382&__cf_chl_jschl_tk__=mwDMnxBYJ1cW7K9fW7d4wVxyw7g4eQEw67kFqN7wZhk-1637432241-0-gaNycGzNC30"
R = requests.get(Url)
print(R.text)

How can I get the HTML of the page I am actually trying to fetch?


1 Answer

Stack Overflow user

Answered 2021-11-22 12:55:59

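You were caught by Cloudflare's anti-bot page. You should always try to make your request look like one a human's browser would send, so I suggest at least including a User-Agent in the headers.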
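The challenge page in the question hints at why this matters. Its `cRq` object carries several base64 strings, and decoding a few of them is a quick check. This is only a minimal sketch: the meaning of each key is inferred from what it decodes to, not from any Cloudflare documentation.

```python
import base64

# Values copied verbatim from the `cRq` object in the challenge page
# posted in the question. What each key stands for is an assumption
# based on the decoded text.
crq = {
    "ru": "aHR0cHM6Ly90ci5pbnZlc3RpbmcuY29tL2NyeXB0by9iaXRjb2luL2J0Yy10cnk/Y2lkPTEwMzEzODI=",
    "ra": "cHl0aG9uLXJlcXVlc3RzLzIuMjUuMQ==",
    "rm": "R0VU",
}


def decode_field(value: str) -> str:
    """Decode one base64 field to text."""
    return base64.b64decode(value).decode("utf-8")


decoded = {key: decode_field(value) for key, value in crq.items()}

for key, text in decoded.items():
    print(f"{key}: {text}")
```

Here `ra` decodes to `python-requests/2.25.1` and `rm` to `GET`, so the request announced itself as a script rather than a browser.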

I just modified your code to send a User-Agent header, and it works fine for me. Take a look:

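#!/usr/bin/env python3
import requests
from bs4 import BeautifulSoup as soup

url = "https://tr.investing.com/crypto/bitcoin/btc-try?cid=1031382&__cf_chl_jschl_tk__=mwDMnxBYJ1cW7K9fW7d4wVxyw7g4eQEw67kFqN7wZhk-1637432241-0-gaNycGzNC30"

# Send a browser-like User-Agent instead of the default python-requests one.
headers = {
    'User-Agent': 'Mozilla/5.0',
}

# The headers must be passed explicitly, otherwise they are never sent.
response = requests.get(url, headers=headers)
print(response.text)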
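Since the challenge page is an ordinary HTML body, it can help to check what came back before parsing it. The following is a rough sketch, not an official detection method: the marker strings and the Ray ID pattern are taken from the HTML posted in the question, and the helper names are made up for illustration.

```python
import re
from typing import Optional

# Strings that appear in the challenge page from the question. Treating
# them as reliable markers is an assumption; Cloudflare may change them.
CHALLENGE_MARKERS = (
    "<title>Just a moment...</title>",
    "cf-browser-verification",
    "_cf_chl_opt",
)


def looks_like_cloudflare_challenge(html: str) -> bool:
    """Return True if the body contains any known challenge marker."""
    return any(marker in html for marker in CHALLENGE_MARKERS)


def extract_ray_id(html: str) -> Optional[str]:
    """Pull the Ray ID out of the attribution footer, if present."""
    match = re.search(r"Ray ID: <code>([0-9a-f]+)</code>", html)
    return match.group(1) if match else None


# Usage with requests (needs network access, so it is left as a comment):
#   response = requests.get(url, headers=headers)
#   if looks_like_cloudflare_challenge(response.text):
#       print("still blocked, Ray ID:", extract_ray_id(response.text))
```

Failing early on a challenge page avoids handing the wrong HTML to BeautifulSoup and getting confusing empty results.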

If that is not enough, there is also the cloudscraper module, which is built specifically to get past this 5-second page.

https://pythonrepo.com/repo/VeNoMouS-cloudscraper-python-web-crawling

I tested it here and it worked. Take a look:

#!/usr/bin/env python3
import cloudscraper

scraper = cloudscraper.create_scraper()  # returns a CloudScraper instance
print(scraper.get("https://tr.investing.com/crypto/bitcoin/btc-try?cid=1031382&__cf_chl_jschl_tk__=mwDMnxBYJ1cW7K9fW7d4wVxyw7g4eQEw67kFqN7wZhk-1637432241-0-gaNycGzNC30").text)
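The two approaches can be chained: try the plain request first and fall back to cloudscraper only if the challenge page comes back. The sketch below is one possible arrangement under stated assumptions. `first_real_page` and `is_challenge` are hypothetical helpers, and the marker strings come from the HTML in the question. The fetchers are passed in as callables so the logic can be exercised without network access.

```python
from typing import Callable, Iterable

# Markers copied from the challenge page in the question (an assumption
# that they stay stable).
CHALLENGE_MARKERS = ("cf-browser-verification", "_cf_chl_opt")


def is_challenge(html: str) -> bool:
    """Return True if the body looks like the Cloudflare challenge page."""
    return any(marker in html for marker in CHALLENGE_MARKERS)


def first_real_page(url: str, fetchers: Iterable[Callable[[str], str]]) -> str:
    """Try each fetcher in order; return the first non-challenge body."""
    for fetch in fetchers:
        html = fetch(url)
        if not is_challenge(html):
            return html
    raise RuntimeError("every fetcher returned the Cloudflare challenge page")


# Real-world wiring (needs network access, so it is left as a comment):
#   import requests, cloudscraper
#   fetchers = [
#       lambda u: requests.get(u, headers={"User-Agent": "Mozilla/5.0"}).text,
#       lambda u: cloudscraper.create_scraper().get(u).text,
#   ]
#   html = first_real_page(url, fetchers)
```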
Votes: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/70049808
