我正在抓取与Cloudflare安全的网站,有时会得到一个错误,因为重定向到页面与ReCapcha,页面甚至无法加载,因为一些javascript错误。代码在#getPage方法上失败了,我不知道为什么。
下面的代码在普通页面上运行良好,但在确认页面上失败:
final WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setJavaScriptEnabled(true);
final HtmlPage page = webClient.getPage("https://mydummy.site");
webClient.waitForBackgroundJavaScript(10000);
int waitForBackgroundJavaScript = webClient.waitForBackgroundJavaScript(200);
int loopCount = 0;
while (waitForBackgroundJavaScript > 0 && loopCount < 2) {
++loopCount;
waitForBackgroundJavaScript = webClient.waitForBackgroundJavaScript(200);
if (waitForBackgroundJavaScript == 0) {
break;
}
}
日志:
java.lang.RuntimeException: com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot find function start in object [object MessagePort]. (https://www.gstatic.com/recaptcha/api2/v1536705955372/recaptcha__en.js#249) (https://www.gstatic.com/recaptcha/api2/v1536705955372/recaptcha__en.js#253)
at com.gargoylesoftware.htmlunit.html.HtmlPage.initialize(HtmlPage.java:305)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:539)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:399)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:316)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:467)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:449)
at Main.htmlUnit(Main.java:156)
at Main.main(Main.java:43)
Caused by: com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot find function start in object [object MessagePort]. (https://www.gstatic.com/recaptcha/api2/v1536705955372/recaptcha__en.js#249) (https://www.gstatic.com/recaptcha/api2/v1536705955372/recaptcha__en.js#253)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:892)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:616)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:532)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:772)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:748)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:104)
at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:992)
at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:371)
at com.gargoylesoftware.htmlunit.html.HtmlScript$2.execute(HtmlScript.java:246)
at com.gargoylesoftware.htmlunit.html.HtmlPage.initialize(HtmlPage.java:298)
发布于 2018-10-16 05:48:20
我们也一直在努力解决这个问题。我们的测试套件运行得很完美,直到2018年底这个问题破坏了我们所有的登录。我相信Google故意这样做是为了打破自动尝试打破验证码,因为解决其中的一部分似乎只会导致另一个问题。加载页面和提交页面都会导致问题,即使告诉HtmlUnitDriver
to ignore all JavaScript errors也是如此。
在这一点上,我尝试了几个选项。如果您使用Google specified test site key,那么错误就会消失。因此,如果您对如何生成站点密钥具有完全的服务器端控制,那么您就可以了。记住要确保测试站点密钥在验证错误和所有类似的用例中再次出现,否则您将得到该错误。
(对我们来说不幸的是,我们的登录页面是纯JSP的,所以除非我们想在任何地方更改URL,否则实现这个页面是很麻烦的。(仍然在讨论该做什么,目前我们确实有一个可行的解决方案,虽然丑陋,但它涉及页面上的一些条件逻辑,并在测试代码中的其他点捕获JavaScript异常。)
https://stackoverflow.com/questions/52332792
复制相似问题