前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >java爬虫知识盲区整理

java爬虫知识盲区整理

作者头像
大忽悠爱学习
发布2021-12-17 20:46:32
3920
发布2021-12-17 20:46:32
举报
文章被收录于专栏:c++与qt学习

java爬虫知识盲区整理

HttpClient重定向处理

【HttpClient4.5中文教程】八.终止请求和重定向处理

首先说说HttpClient和浏览器的区别

我们从浏览器发起一笔请求,浏览器则会帮你处理重定向、缓存等事情。这也就是为什么用浏览器表单post提交后,不管服务端如何重定向,都能正常接收到服务端返回的数据。

但是用HttpClient呢,你会发现,请求后,会返回302,因为POST方式提交HttpClient是不会帮你处理重定向的。这时候怎么办呢?

方法一:(自己手动处理)

代码语言:javascript
复制
HttpClient httpClient = HttpClients.createDefault();

        HttpPost httpPost= new HttpPost(http://ip:port/xxx);

        CloseableHttpResponse response = httpclient.execute(httpPost);

        int statusCode = response.getStatusLine().getStatusCode();
        System.out.println("statusCode=="+statusCode); //返回码

        Header header=response.getFirstHeader("Location");

        //重定向地址
        String location =  header.getValue();
        System.out.println(location);

        //然后再对新的location发起请求即可

        HttpGet httpGet = new HttpGet(location);
        CloseableHttpResponse response2 = httpclient.execute(httpGet);
        System.out.println("返回报文"+EntityUtils.toString(response2.getEntity(), "UT-F-8"));

方法二:(已有工具类)

代码语言:javascript
复制
HttpClientBuilder builder = HttpClients.custom()
            .disableAutomaticRetries() //关闭自动处理重定向
            .setRedirectStrategy(new LaxRedirectStrategy());//利用LaxRedirectStrategy处理POST重定向问题

       CloseableHttpClient client = builder.build();

        HttpPost httpPost= new HttpPost(http://ip:port/xxx);

        CloseableHttpResponse response = client.execute(httpPost);

        int statusCode = response.getStatusLine().getStatusCode();
        System.out.println("statusCode=="+statusCode); //返回码

         System.out.println("返回报文"+EntityUtils.toString(response.getEntity(), "UT-F-8"));

HttpClient获取Cookie的两种方式

一、旧版本的HttpClient获取Cookies

p.s. 该方式官方已不推荐使用

使用DefaultHttpClient类实例化httpClient对象:

代码语言:javascript
复制
public static String dooPost_deprecated(String url, Map<String, String> map, String charset) {
        DefaultHttpClient httpClient = null;
        HttpPost httpPost = null;
        String result = null;
        try {
            httpClient = new DefaultHttpClient();
            httpPost = new HttpPost(url);
            // 设置参数
            List<NameValuePair> list = new ArrayList<NameValuePair>();
            Iterator<Entry<String, String>> iterator = map.entrySet().iterator();
            while (iterator.hasNext()) {
                Entry<String, String> elem = (Entry<String, String>) iterator.next();
                list.add(new BasicNameValuePair(elem.getKey(), elem.getValue()));
            }
            if (list.size() > 0) {
                UrlEncodedFormEntity entity = new UrlEncodedFormEntity(list, charset);
                httpPost.setEntity(entity);
            }
            HttpResponse response = httpClient.execute(httpPost);
            System.out.println(response.getStatusLine().getStatusCode());
            String JSESSIONID = null;
            String cookie_user = null;
            //获得Cookies
            CookieStore cookieStore = httpClient.getCookieStore();
            List<Cookie> cookies = cookieStore.getCookies();
            for (int i = 0; i < cookies.size(); i++) {
                //遍历Cookies
                System.out.println(cookies.get(i));
                System.out.println("cookiename=="+cookies.get(i).getName());
                System.out.println("cookieValue=="+cookies.get(i).getValue());
                System.out.println("Domain=="+cookies.get(i).getDomain());
                System.out.println("Path=="+cookies.get(i).getPath());
                System.out.println("Version=="+cookies.get(i).getVersion());

                if (cookies.get(i).getName().equals("JSESSIONID")) {
                    JSESSIONID = cookies.get(i).getValue();
                }
                if (cookies.get(i).getName().equals("cookie_user")) {
                    cookie_user = cookies.get(i).getValue();
                }
            }
            if (cookie_user != null) {
                result = JSESSIONID;
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        return result;
    }

二、新版本的HttpClient获取Cookies

使用CloseableHttpClient类实例化httpClient对象:

代码语言:javascript
复制
public static String doPost(Map<String, String> map, String charset) {
        CloseableHttpClient httpClient = null;
        HttpPost httpPost = null;
        String result = null;
        try {
            CookieStore cookieStore = new BasicCookieStore();
            httpClient = HttpClients.custom().setDefaultCookieStore(cookieStore).build();
            httpPost = new HttpPost("http://localhost:8080/testtoolmanagement/LoginServlet");
            List<NameValuePair> list = new ArrayList<NameValuePair>();
            Iterator<Map.Entry<String, String>> iterator = map.entrySet().iterator();
            while (iterator.hasNext()) {
                Entry<String, String> elem = (Entry<String, String>) iterator.next();
                list.add(new BasicNameValuePair(elem.getKey(), elem.getValue()));
            }
            if (list.size() > 0) {
                UrlEncodedFormEntity entity = new UrlEncodedFormEntity(list, charset);
                httpPost.setEntity(entity);
            }
            httpClient.execute(httpPost);
            String JSESSIONID = null;
            String cookie_user = null;
            List<Cookie> cookies = cookieStore.getCookies();
            for (int i = 0; i < cookies.size(); i++) {
                if (cookies.get(i).getName().equals("JSESSIONID")) {
                    JSESSIONID = cookies.get(i).getValue();
                }
                if (cookies.get(i).getName().equals("cookie_user")) {
                    cookie_user = cookies.get(i).getValue();
                }
            }
            if (cookie_user != null) {
                result = JSESSIONID;
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        return result;
    }
本文参与 腾讯云自媒体同步曝光计划,分享自作者个人站点/博客。
原始发表:2021/12/16 ,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • java爬虫知识盲区整理
  • HttpClient重定向处理
  • HttpClient获取Cookie的两种方式
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档