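I just wrote some code for my trading bot project that scrapes Investing.com with Python's requests library, but the only HTML I can get back is a "Checking your browser" page. Here is what requests returned: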
<!DOCTYPE HTML>
<html lang="en-US">
<head>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<title>Just a moment...</title>
<style type="text/css">
html, body {width: 100%; height: 100%; margin: 0; padding: 0;}
body {background-color: #ffffff; color: #000000; font-family:-apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", Roboto, Oxygen, Ubuntu, "Helvetica Neue",Arial, sans-serif; font-size: 16px; line-height: 1.7em;-webkit-font-smoothing: antialiased;}
h1 { text-align: center; font-weight:700; margin: 16px 0; font-size: 32px; color:#000000; line-height: 1.25;}
p {font-size: 20px; font-weight: 400; margin: 8px 0;}
p, .attribution, {text-align: center;}
#spinner {margin: 0 auto 30px auto; display: block;}
.attribution {margin-top: 32px;}
@keyframes fader { 0% {opacity: 0.2;} 50% {opacity: 1.0;} 100% {opacity: 0.2;} }
@-webkit-keyframes fader { 0% {opacity: 0.2;} 50% {opacity: 1.0;} 100% {opacity: 0.2;} }
#cf-bubbles > .bubbles { animation: fader 1.6s infinite;}
#cf-bubbles > .bubbles:nth-child(2) { animation-delay: .2s;}
#cf-bubbles > .bubbles:nth-child(3) { animation-delay: .4s;}
.bubbles { background-color: #f58220; width:20px; height: 20px; margin:2px; border-radius:100%; display:inline-block; }
a { color: #2c7cb0; text-decoration: none; -moz-transition: color 0.15s ease; -o-transition: color 0.15s ease; -webkit-transition: color 0.15s ease; transition: color 0.15s ease; }
a:hover{color: #f4a15d}
.attribution{font-size: 16px; line-height: 1.5;}
.ray_id{display: block; margin-top: 8px;}
#cf-wrapper #challenge-form { padding-top:25px; padding-bottom:25px; }
#cf-hcaptcha-container { text-align:center;}
#cf-hcaptcha-container iframe { display: inline-block;}
</style>
<meta http-equiv="refresh" content="35">
<script type="text/javascript">
//<![CDATA[
(function(){
window._cf_chl_opt={
cvId: "2",
cType: "non-interactive",
cNounce: "76414",
cRay: "6b1482833fd66244",
cHash: "eb90d788b40007d",
cPMDTk: "",
cFPWv: "b",
cTTimeMs: "1000",
cRq: {
ru: "aHR0cHM6Ly90ci5pbnZlc3RpbmcuY29tL2NyeXB0by9iaXRjb2luL2J0Yy10cnk/Y2lkPTEwMzEzODI=",
ra: "cHl0aG9uLXJlcXVlc3RzLzIuMjUuMQ==",
rm: "R0VU",
d: "a0DZUGkM6M3EHwuCQBn8MaFzRdaBiZLAjRjen+qhUv69cfEs3ya8yAfsstBQoxD6cdxDhSWaNNTDnFl2el9RQae+NbDXGVtUjODtAqO+A7SanMpxmFczY5Q2BP/q32fXt0ZsjUn5nBvXv6SmUdg9RpcYKILhzBZllUKTMR5wWQPxfzfAFz882qO/Gi6LDE/foncRE0iKLJaAs4/uTKcY3XFzcfZ+1w3o0Nliz5UGBqavIn0UPRAzwSqN5ZXlOvSI/ljL1qj3jl4Syy+h/nyVcp+a5vZSaqvGHYyp59PCLSY7jVPpXk7qqzc1708r3vWSRz+UN/0cqAbwO2oblQ3SOJV54dY/2AvuKBbaRRMAogxBQAgtjyCnhTLSLj3D4bfTVVrjRlF+vRgFFaFppX4b6oebqnzTCTDUVsFxrxp8yCFtab6F5sD1s2nPQpg9/Vk6jfJnbrLhfsibYjoJ74i1wS/leAEOr+BSk94E+gFchy93u62fO4GiFkYAXTerYAwIbXlpA4LrIS3EWkEj4GttD3JNUO8oC9dRsONC5zKUJ9U+LQFO/495QdMzc2o5fV5y",
t: "MTYzNzQ0MDk5OC45MjcwMDA=",
m: "P4HQU/6WBisbc+R6aRw0sYizTvbUME/3XuegInf566I=",
i1: "InCC6JdcVDWY2ZvTZ8VSJw==",
i2: "txEI7ubOoDTAtluPaIDvFQ==",
zh: "JJQg2KI/+bPgJbLHlLjmrs/mnno8aAGH5k3tm8QDk4c=",
uh: "5GU+jYv2xJ+bCaE/ARmi/DORbiS/v56CW7E0TH4XWQk=",
hh: "YmlEyuLTY297Bf4fWYKqj1eFG+SXP0t8yKaFmf8oRu0=",
}
}
window._cf_chl_enter = function(){window._cf_chl_opt.p=1};
})();
//]]>
</script>
</head>
<body>
<table width="100%" height="100%" cellpadding="20">
<tr>
<td align="center" valign="middle">
<div class="cf-browser-verification cf-im-under-attack">
<noscript>
<h1 data-translate="turn_on_js" style="color:#bd2426;">Please turn JavaScript on and reload the page.</h1>
</noscript>
<div id="cf-content" style="display:none">
<div id="cf-bubbles">
<div class="bubbles"></div>
<div class="bubbles"></div>
<div class="bubbles"></div>
</div>
<h1><span data-translate="checking_browser">Checking your browser before accessing</span> tr.investing.com.</h1>
<div id="no-cookie-warning" class="cookie-warning" data-translate="turn_on_cookies" style="display:none">
<p data-translate="turn_on_cookies" style="color:#bd2426;">Please enable Cookies and reload the page.</p>
</div>
<p data-translate="process_is_automatic">This process is automatic. Your browser will redirect to your requested content shortly.</p>
<p data-translate="allow_5_secs" id="cf-spinner-allow-5-secs" >Please allow up to 5 seconds…</p>
<p data-translate="redirecting" id="cf-spinner-redirecting" style="display:none">Redirecting…</p>
</div>
<form class="challenge-form" id="challenge-form" action="/crypto/bitcoin/btc-try?cid=1031382&__cf_chl_jschl_tk__=CDZzIbYLGXykSLkVwh2MqkPNxG_RPjcd8xFfkZ0fgP4-1637440998-0-gaNycGzNBz0" method="POST" enctype="application/x-www-form-urlencoded">
<input type="hidden" name="md" value="Wqq9NpYqtJShazwbZwz5SAqIRQsLKWUULt83_J1PWQ4-1637440998-0-AeHZYymAro52c5y0Zj4jafFO06_M0S7dAuXzwMd6L3UHN6CIx3Plh58eBAf5MM22l9hkKXL4dCz8A9lid8-gAXrlx3RNEJk_eAgJaK0U2OXiDQcOmE7Zh9IPDxxQCaTrjaXYC9cufjr4B2vv40GVEkMil2mgBTha97BD82aaoxD1FTLrJLCsQ4YIc9ZdZmouwloPPMYy2EFm7GbozVorA9wToFgFO5z1-11MsD0_uS4_1DnbpJI9YWWzFHJAk6V7gr7hZE_L-ngm-kdkgy2pSsvIXSt-fF-PR0mAWPsuyD79rdv-FI38snagG-lRkVPR8iCsEEctPcdW_GU6vAJ5cYCMGMLMUVVYkthENVScCRCNEp4aWy0t0rEyAf9KSBbg-9J7wwjQlmaQII06wQwQmZJCvjL5_pvMoclTiSLNTFhU8CQCVy-ay51BaDs_FNJmvAYyOQPL2aph0sm2d11guSDs7RJs4CapH9rzIEbLwVw2S6NHrFX1wSrBHHNFdQBpuE_J6tQuHbNUR0jdK10CzMbpBSdKfhKZgEqOvoumaYZR8nNdE-orX8gyOdgXcqnHbQ" />
<input type="hidden" name="r" value="7sRtptnsxEX3cgZaq5X0udJSCGZWOfcV3iy.EtraIdc-1637440998-0-AW+LAjAuS5CAITErZ4x/g8StGu4rnsSnXxe6FXBPM1BuPi2pd9inxq2DALXdugSKSRV2lpw7KyFxpCzOE9KBX0d8x6lFXGBntA04Ak6mHv+dJRZI7MqNxob93cMAi+gZtp8S2/WilEGpA5m7wy7ok+1uMj4mtEWf3FRVwuWKSq6/0FHQlcvVIicSXo7lmf6iaUbIOGGedbeTc2jpUnSg4C8E3Rpirj6pgyUQxeH7xTbKDr8S8xm9iSwJZ5MPgubSdQBXT6IB8s+1RoZy4530vValRYTtYNIgAL+9WZ7zDBZjLOXsdIq7D1Y34sKGulITYX1w8FhhrmrltOCs3SbqIk0RyGceTtpGKCITMbsyEfS5lgnK+E/viefSrt0NxypHr9vGpvtWy8ikeoyJQqgeq5QMnb4lqK8FLVnppy4DG6zpgNMx01eKCD+L+epos/zaxRCcSL65q9Op9bmPT08ucKBnCOzrQPLcMJ0s/Aizp75wnmXQxlBvCZhad88XAZVmyi8NCMhOSj1iX+Sd/EUANhw+meEVim6Apjuwbo9gAz1Hg9AxEr10iC0+jT2StIUb17kEjB4vkmF9LHh8YqD5WV+E56igEQEX3Gr5o6yiLuLteC/hYs0gVzN90X4Iz8X3kcfVvede9ClyHb9R2qzrrVRuR5Vzwoxz0oRfdeXWdGTb5tH65Z60Jh+hxsFWLxssCdTWt+O7/RYuskepGkDoO3Dob55tBl49qUNrUoNgcRWp+ELwggKa19kTKQRO8FRrB+F0yW6hcCHySsrUyKS/FHp6klzAQ4+50OHGjohtVsLYpEUOfFbIzXNX+nPrgDvuOFvivTIoHwTpmLO+8PLQYHciwo86ySSeTukEkayIqJuqVH4kvd987/9CGBjfyiobfCQZJXyuK1H6X/zJxajYAQoO7/lJwi39XO27nYdoNu28p1FQ7MgJCXOo9TV6k+OOGjrdNXsaSFUrFgK9g/0D/oOkw+bgT1g+k8quSuW53xvWoiOPMGbr1T1Ww+A+ey4mxhSV1QaROYnBStr4j38ivJ6/3QaxWHqBB3YUbyM0wTzfKBmzAhqsypNXx/0QTo2Ezvice037fJZOQ2EQv2mObIEaEL2NeTF5Lv9v6fzA+XUym6alGCxJ385TH4/AkptmJDmuttobnUGRKPbLPrb4m71H+xKTMoydJCGPmWXqL/W3HBuKdRiYhLy+twusUE74Lm4y26MYRR8WyBNNzQCIv+KB0ahdUbJIneoBMIb4RvNKJUzJXQklI+3DeasPhKJQ/Eqlrk/cIHPOnLpWYjv5XtvVGGz/aoMALs22Ap9TS5ewgn99lHR5smv8dEzPIkqAUnP68+b3AjSuS+OE64BbXVPuPwHpgeyBlnVMDENbI48j+AuAMaIIljHlZ8u8jTmnE5cUBHhDZAM7qSn3c25WgulYHJSphQ8hgywWaW0Q4s3MyVupuFR1djgijZYmg76lbEUkCsTDrpLOnYMlXJdIzdfUdsRxqIdNziAHZvVqMIb89VJpr62HB7N463VdXPYPz8tiiStQ1o3fbaZs48tZr62ucBE/t+ypGwCeXL3AOz7HYcHecNsDTSLFpIETh5iebpt9Ny7aQT3+kPeI0u1CnVJ2KO1sY0Bx7oMwopdwLVGnvohwtVlUx7YZLprHs9v7tWMMWfgcLn0N4Lq0Df0EjaMB7/m6ceObvORNv2AVWgbvZxuJTv0ZwPnPV25nb3l1QkNjp2Cx8pghFfh+KdsGEwwP1AD/nFpdVRdTV2D39fQribW0VkRNhoiq+rTObjijAQ=="/>
<input type="hidden" value="083fe53780735202364bbe5cba47b14e" id="jschl-vc" name="jschl_vc"/>
<!-- <input type="hidden" value="" id="jschl-vc" name="jschl_vc"/> -->
<input type="hidden" name="pass" value="1637440999.927-7nb1cbKw+i"/>
<input type="hidden" id="jschl-answer" name="jschl_answer"/>
</form>
<script type="text/javascript">
//<![CDATA[
(function(){
var a = document.getElementById('cf-content');
a.style.display = 'block';
var isIE = /(MSIE|Trident\/|Edge\/)/i.test(window.navigator.userAgent);
var trkjs = isIE ? new Image() : document.createElement('img');
trkjs.setAttribute("src", "/cdn-cgi/images/trace/jschal/js/transparent.gif?ray=6b1482833fd66244");
trkjs.id = "trk_jschal_js";
trkjs.setAttribute("alt", "");
document.body.appendChild(trkjs);
var cpo=document.createElement('script');
cpo.type='text/javascript';
cpo.src="/cdn-cgi/challenge-platform/h/b/orchestrate/jsch/v1?ray=6b1482833fd66244";
document.getElementsByTagName('head')[0].appendChild(cpo);
}());
//]]>
</script>
<div id="trk_jschal_nojs" style="background-image:url('/cdn-cgi/images/trace/jschal/nojs/transparent.gif?ray=6b1482833fd66244')"> </div>
</div>
<div class="attribution">
DDoS protection by <a rel="noopener noreferrer" href="https://www.cloudflare.com/5xx-error-landing/" target="_blank">Cloudflare</a>
<br />
<span class="ray_id">Ray ID: <code>6b1482833fd66244</code></span>
</div>
</td>
</tr>
</table>
</body>
</html>
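This interstitial has distinctive markers, so a scraper can detect it instead of silently parsing the wrong page. The following is a minimal sketch. It assumes the markers visible in the HTML above stay stable: the `Just a moment...` title, the `cf-browser-verification` class, the `_cf_chl_opt` script variable and the `/cdn-cgi/challenge-platform/` script path. Cloudflare can change this markup at any time, and `looks_like_cf_challenge` is a hypothetical helper name:

```python
import re

# Markers copied from the Cloudflare interstitial shown above. They are
# observations of one page, not a documented Cloudflare contract.
_CF_MARKERS = (
    "_cf_chl_opt",                   # inline challenge configuration script
    "cf-browser-verification",       # CSS class on the "Checking your browser" box
    "/cdn-cgi/challenge-platform/",  # path of the challenge orchestration script
)
_TITLE_RE = re.compile(r"<title>\s*Just a moment\.\.\.\s*</title>", re.IGNORECASE)


def looks_like_cf_challenge(html: str) -> bool:
    """Return True if `html` appears to be a Cloudflare browser-check page."""
    if _TITLE_RE.search(html):
        return True
    return any(marker in html for marker in _CF_MARKERS)


if __name__ == "__main__":
    sample = "<html><head><title>Just a moment...</title></head></html>"
    print(looks_like_cf_challenge(sample))                    # True
    print(looks_like_cf_challenge("<title>BTC/TRY</title>"))  # False
```

Calling it on `R.text` before handing the page to BeautifulSoup makes a blocked request fail loudly rather than yield empty parse results.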
Here is the actual code:
import requests
from bs4 import BeautifulSoup as soup
Url = "https://tr.investing.com/crypto/bitcoin/btc-try?cid=1031382&__cf_chl_jschl_tk__=mwDMnxBYJ1cW7K9fW7d4wVxyw7g4eQEw67kFqN7wZhk-1637432241-0-gaNycGzNC30"
R = requests.get(Url)
print(R.text)
How can I get the HTML of the page I actually want?
Posted on 2021-11-22 12:55:59
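You were caught by Cloudflare's anti-bot page. You should always try to make your request look like one sent by a human using a browser, so I recommend at least including a User-Agent in the headers.
I modified your code to include a User-Agent header, and it works fine for me. Take a look: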
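The challenge page in the question points to the same cause. Its `cRq` block appears to echo the incoming request back in base64. Reading the fields as request URL (`ru`), agent (`ra`) and method (`rm`) is an inference from the decoded values, not documented Cloudflare behaviour. Decoding them shows that Cloudflare saw the default `python-requests` User-Agent:

```python
import base64

# Values copied verbatim from window._cf_chl_opt.cRq in the question's HTML.
c_rq = {
    "ru": "aHR0cHM6Ly90ci5pbnZlc3RpbmcuY29tL2NyeXB0by9iaXRjb2luL2J0Yy10cnk/Y2lkPTEwMzEzODI=",
    "ra": "cHl0aG9uLXJlcXVlc3RzLzIuMjUuMQ==",
    "rm": "R0VU",
}

# Standard base64 decode of each field, then bytes -> str.
decoded = {key: base64.b64decode(value).decode("utf-8") for key, value in c_rq.items()}
for key, value in decoded.items():
    print(f"{key}: {value}")
# ru: https://tr.investing.com/crypto/bitcoin/btc-try?cid=1031382
# ra: python-requests/2.25.1
# rm: GET
```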
#!/usr/bin/env python3
import requests
from bs4 import BeautifulSoup as soup

url = "https://tr.investing.com/crypto/bitcoin/btc-try?cid=1031382&__cf_chl_jschl_tk__=mwDMnxBYJ1cW7K9fW7d4wVxyw7g4eQEw67kFqN7wZhk-1637432241-0-gaNycGzNC30"
headers = {
    'User-Agent': 'Mozilla/5.0',
}
# Pass the headers explicitly; otherwise requests sends its default
# "python-requests/x.y.z" User-Agent.
response = requests.get(url, headers=headers)
print(response.text)
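The same idea can be extended with a `requests.Session`, which sends the headers on every request and keeps any cookies the site sets between calls. This is a sketch under several assumptions. The header values are illustrative, not values Investing.com is known to require. The `Accept-Language` choice simply matches the Turkish subdomain. Dropping the `__cf_chl_jschl_tk__` query parameter assumes it is a leftover one-time challenge token. `make_session` is a hypothetical helper. Because this page is a JavaScript challenge (`cType: "non-interactive"`), headers alone may still not get through:

```python
import requests

# Product URL without the __cf_chl_jschl_tk__ parameter (assumed to be a
# stale challenge token copied from the browser's address bar).
URL = "https://tr.investing.com/crypto/bitcoin/btc-try?cid=1031382"

# Illustrative desktop-browser headers; the exact values are assumptions.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/96.0.4664.45 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "tr-TR,tr;q=0.9,en-US;q=0.8,en;q=0.7",
}


def make_session() -> requests.Session:
    """Return a Session that sends BROWSER_HEADERS on every request and
    stores cookies set by the server for reuse on later requests."""
    session = requests.Session()
    session.headers.update(BROWSER_HEADERS)
    return session
```

Usage: `html = make_session().get(URL, timeout=10).text`. The `timeout` keeps a trading bot from hanging indefinitely on a slow response.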
If that is not enough, there is also the cloudscraper module, built specifically to get past this 5-second page:
https://pythonrepo.com/repo/VeNoMouS-cloudscraper-python-web-crawling
I tested it here and it worked. Take a look:
#!/usr/bin/env python3
import cloudscraper

# create_scraper() returns a CloudScraper instance, a requests.Session
# subclass that tries to solve Cloudflare's JavaScript challenge for you.
scraper = cloudscraper.create_scraper()
print(scraper.get("https://tr.investing.com/crypto/bitcoin/btc-try?cid=1031382&__cf_chl_jschl_tk__=mwDMnxBYJ1cW7K9fW7d4wVxyw7g4eQEw67kFqN7wZhk-1637432241-0-gaNycGzNC30").text)
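Because the scraper is a `requests.Session` subclass, a bot that polls several pages can create it once and reuse it. The Cloudflare cookies obtained on the first request should then be sent with later ones instead of re-solving the challenge each time. The following sketch shows that pattern. `fetch_pages` and its `session_factory` parameter are hypothetical names; the factory hook only exists so the function can be exercised without network access or cloudscraper installed:

```python
from typing import Any, Callable, Dict, Iterable, Optional


def fetch_pages(
    urls: Iterable[str],
    session_factory: Optional[Callable[[], Any]] = None,
) -> Dict[str, str]:
    """Fetch every URL with one shared session and return {url: html}."""
    if session_factory is None:
        import cloudscraper  # imported lazily so callers with a factory don't need it

        session_factory = cloudscraper.create_scraper

    session = session_factory()  # created once, so cookies persist across URLs
    pages = {}
    for url in urls:
        response = session.get(url, timeout=10)
        response.raise_for_status()  # surface HTTP errors instead of storing them
        pages[url] = response.text
    return pages
```

Usage: `fetch_pages(["https://tr.investing.com/crypto/bitcoin/btc-try?cid=1031382"])` returns a dict mapping that URL to its HTML.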
https://stackoverflow.com/questions/70049808