首页
学习
活动
专区
工具
TVP
发布
社区首页 >问答首页 >URLConnection不允许我访问有关Http错误的数据(404,500等)

URLConnection不允许我访问有关Http错误的数据(404,500等)
EN

Stack Overflow用户
提问于 2012-02-03 21:58:42
回答 1查看 32.1K关注 0票数 23

我正在做一个爬虫,并需要从流中获取数据,无论它是200或不是。CURL正在做这件事,就像任何标准的浏览器一样。

下面的代码不会实际获取请求的内容,即使有,也会抛出异常,并显示http错误状态代码。不管怎样,我都想要输出,有没有办法?我更喜欢使用这个库,因为它实际上会做持久连接,这对于我正在做的爬行类型来说是完美的。

代码语言:javascript
复制
package test;

import java.net.*;
import java.io.*;

public class Test {

    public static void main(String[] args) {

         try {

            URL url = new URL("http://github.com/XXXXXXXXXXXXXX");
            URLConnection connection = url.openConnection();

            DataInputStream inStream = new DataInputStream(connection.getInputStream());
            String inputLine;

            while ((inputLine = inStream.readLine()) != null) {
                System.out.println(inputLine);
            }
            inStream.close();
        } catch (MalformedURLException me) {
            System.err.println("MalformedURLException: " + me);
        } catch (IOException ioe) {
            System.err.println("IOException: " + ioe);
        }
    }
}

成功了,谢谢:这是我想出来的--只是粗略的概念证明:

代码语言:javascript
复制
import java.net.*;
import java.io.*;

public class Test {

    public static void main(String[] args) {
//InputStream error = ((HttpURLConnection) connection).getErrorStream();

        URL url = null;
        URLConnection connection = null;
        String inputLine = "";

        try {

            url = new URL("http://verelo.com/asdfrwdfgdg");
            connection = url.openConnection();

            DataInputStream inStream = new DataInputStream(connection.getInputStream());

            while ((inputLine = inStream.readLine()) != null) {
                System.out.println(inputLine);
            }
            inStream.close();
        } catch (MalformedURLException me) {
            System.err.println("MalformedURLException: " + me);
        } catch (IOException ioe) {
            System.err.println("IOException: " + ioe);

            InputStream error = ((HttpURLConnection) connection).getErrorStream();

            try {
                int data = error.read();
                while (data != -1) {
                    //do something with data...
                    //System.out.println(data);
                    inputLine = inputLine + (char)data;
                    data = error.read();
                    //inputLine = inputLine + (char)data;
                }
                error.close();
            } catch (Exception ex) {
                try {
                    if (error != null) {
                        error.close();
                    }
                } catch (Exception e) {

                }
            }
        }

        System.out.println(inputLine);
    }
}
EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2012-02-03 22:15:21

简单:

代码语言:javascript
复制
URLConnection connection = url.openConnection();
InputStream is = connection.getInputStream();
if (connection instanceof HttpURLConnection) {
   HttpURLConnection httpConn = (HttpURLConnection) connection;
   int statusCode = httpConn.getResponseCode();
   if (statusCode != 200 /* or statusCode >= 200 && statusCode < 300 */) {
     is = httpConn.getErrorStream();
   }
}

您可以参考Javadoc进行解释。我处理这个问题的最好方法是:

代码语言:javascript
复制
URLConnection connection = url.openConnection();
InputStream is = null;
try {
    is = connection.getInputStream();
} catch (IOException ioe) {
    if (connection instanceof HttpURLConnection) {
        HttpURLConnection httpConn = (HttpURLConnection) connection;
        int statusCode = httpConn.getResponseCode();
        if (statusCode != 200) {
            is = httpConn.getErrorStream();
        }
    }
}
票数 49
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/9129766

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档