I want to scrape a web page using a POST request, but I get this error: java.io.IOException: Server returned HTTP response code: 523
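import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public static String readContentFromPost(String urlStr, String content) {
    HttpURLConnection con = null;
    StringBuilder sb = new StringBuilder();
    try {
        URL url = new URL(urlStr);
        con = (HttpURLConnection) url.openConnection();
        con.setDoOutput(true);
        con.setDoInput(true);
        con.setRequestMethod("POST");
        con.setUseCaches(false);
        con.setInstanceFollowRedirects(true);
        con.setRequestProperty("Content-Type", "text/html;charset=utf-8");
        con.connect();
        // Encode the body as UTF-8 to match the declared charset;
        // DataOutputStream.writeBytes would drop the high byte of each char.
        try (OutputStream out = con.getOutputStream()) {
            out.write(content.getBytes(StandardCharsets.UTF_8));
        }
        // getInputStream() throws IOException for error statuses such as 523.
        // Decoding the response as UTF-8 is an assumption about the server.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                con.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (con != null) {
            con.disconnect();
        }
    }
    return sb.toString();
}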
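Posted on 2015-04-09 18:19:47
To scrape a web page whose content is rendered by JavaScript, you can use Selenium to drive a real browser and read the data from it. Selenium: http://www.seleniumhq.org
First create a Maven project, then add this dependency: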
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>2.45.0</version>
</dependency>
Then download ChromeDriver: http://chromedriver.storage.googleapis.com/index.html?path=2.14/
and put it in the /usr/local/bin directory.
Finally, you can scrape the page:
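import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public static void testSelenium(String url) {
    // System.getProperties().setProperty("webdriver.chrome.driver","/Users/freezhan/IDE/tools/chromedriver");
    WebDriver webDriver = new ChromeDriver();
    try {
        webDriver.get(url);
        //WebElement webElement = webDriver.findElement(By.xpath("/html"));
        System.out.println(webDriver.getPageSource());
    } finally {
        // quit() closes every window and ends the session; close() only closes the current window.
        webDriver.quit();
    }
}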
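If ChromeDriver is not on the PATH, the commented-out line in the snippet above points at the alternative: setting the webdriver.chrome.driver system property before creating the ChromeDriver. The following is only a minimal sketch of that idea. The class ChromeDriverPath and the method useDriverAt are hypothetical names introduced here for illustration, and the default path is just the /usr/local/bin location mentioned above with an assumed file name of chromedriver.

```java
import java.io.File;

// Hypothetical helper, not part of Selenium: it only sets the system property
// that ChromeDriver reads to locate the chromedriver executable.
class ChromeDriverPath {
    static final String DRIVER_PROPERTY = "webdriver.chrome.driver";

    // Returns true and sets the property only if the path is an executable file.
    static boolean useDriverAt(String path) {
        File driver = new File(path);
        if (!driver.isFile() || !driver.canExecute()) {
            return false;
        }
        System.setProperty(DRIVER_PROPERTY, driver.getAbsolutePath());
        return true;
    }

    public static void main(String[] args) {
        // Assumed default location; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "/usr/local/bin/chromedriver";
        if (useDriverAt(path)) {
            System.out.println("Using driver at " + System.getProperty(DRIVER_PROPERTY));
        } else {
            System.out.println("No executable driver found at " + path);
        }
    }
}
```

Calling useDriverAt before new ChromeDriver() should have the same effect as uncommenting the setProperty line, with the extra check that the file actually exists and is executable.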
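Posted on 2015-04-08 16:31:59
Error 523 has no standard meaning: http://www.iana.org/assignments/http-status-codes/http-status-codes.xhtml
So it is a proprietary error of the server you are trying to scrape. Try contacting the site administrator to find out what it means.
In general, 523 does not mean "Origin is unreachable"; it has that meaning only on Cloudflare: https://support.cloudflare.com/hc/en-us/articles/200171946-Error-523-Origin-is-unreachable
Try your code against a well-known server such as Google or Wikipedia to check whether it works there.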
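To see what the server actually sent, it can help to read the status code and the error body instead of letting getInputStream() throw. The sketch below is one possible way to do that with HttpURLConnection. The names StatusProbe, describe and fetchStatusAndBody are hypothetical, it uses a plain GET for simplicity rather than the POST from the question, and it assumes the response body is UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper for inspecting non-2xx responses.
class StatusProbe {

    // Describes a status code by its class; 523 is singled out as non-standard.
    static String describe(int code) {
        if (code == 523) {
            return "523: non-standard; Cloudflare uses it for 'Origin is unreachable'";
        }
        switch (code / 100) {
            case 2: return code + ": success";
            case 3: return code + ": redirection";
            case 4: return code + ": client error";
            case 5: return code + ": server error";
            default: return code + ": informational or unknown";
        }
    }

    // Returns the status description followed by whatever body the server sent.
    static String fetchStatusAndBody(String urlStr) throws IOException {
        HttpURLConnection con =
                (HttpURLConnection) URI.create(urlStr).toURL().openConnection();
        try {
            int code = con.getResponseCode();
            // For 4xx/5xx the body is on the error stream, which may be null.
            InputStream in = code >= 400 ? con.getErrorStream() : con.getInputStream();
            StringBuilder sb = new StringBuilder(describe(code)).append('\n');
            if (in != null) {
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(in, StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        sb.append(line).append('\n');
                    }
                }
            }
            return sb.toString();
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // No URL given: just show how 523 is described, without any network call.
            System.out.println(describe(523));
            return;
        }
        System.out.println(fetchStatusAndBody(args[0]));
    }
}
```

Running it first against a well-known site and then against the target URL gives a quick comparison: if only the target returns 523, the problem is most likely on the server side rather than in the client code.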
https://stackoverflow.com/questions/29509457