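Wuhan Epidemic Series (Utility Classes) | Using Java to Scrape Real-Time Nationwide Novel Coronavirus Pneumonia Data from DXY (丁香医生), Tencent News, Sina, and Others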

1. HttpPojo.java

Holds the header values used to simulate a browser request when scraping.

import java.io.Serializable;

/**
 * Created by yjl on 2019/5/30.
 */
public class HttpPojo implements Serializable{
    private static final long serialVersionUID = -2019661705306735496L;

    private String httpIp;
    private String httpHost;
    private String httpAccept;
    private String httpConnection;
    private String httpUserAgent;
    private String httpReferer;
    private String httpOrigin;
    private String httpCookie;
    private String httpContentType;

    public String getHttpCookie() {
        return httpCookie;
    }

    public void setHttpCookie(String httpCookie) {
        this.httpCookie = httpCookie;
    }

    public String getHttpIp() {
        return httpIp;
    }

    public void setHttpIp(String httpIp) {
        this.httpIp = httpIp;
    }

    public String getHttpHost() {
        return httpHost;
    }

    public void setHttpHost(String httpHost) {
        this.httpHost = httpHost;
    }

    public String getHttpAccept() {
        return httpAccept;
    }

    public void setHttpAccept(String httpAccept) {
        this.httpAccept = httpAccept;
    }

    public String getHttpConnection() {
        return httpConnection;
    }

    public void setHttpConnection(String httpConnection) {
        this.httpConnection = httpConnection;
    }

    public String getHttpUserAgent() {
        return httpUserAgent;
    }

    public void setHttpUserAgent(String httpUserAgent) {
        this.httpUserAgent = httpUserAgent;
    }

    public String getHttpReferer() {
        return httpReferer;
    }

    public void setHttpReferer(String httpReferer) {
        this.httpReferer = httpReferer;
    }

    public String getHttpOrigin() {
        return httpOrigin;
    }

    public void setHttpOrigin(String httpOrigin) {
        this.httpOrigin = httpOrigin;
    }

    public String getHttpContentType() {
        return httpContentType;
    }

    public void setHttpContentType(String httpContentType) {
        this.httpContentType = httpContentType;
    }
}

Usage:

public static String getStatisticsService() {
    String url = "https://ncov.dxy.cn/ncovh5/view/pneumonia";
    // Build the simulated browser request headers
    HttpPojo httpPojo = new HttpPojo();
    httpPojo.setHttpHost("ncov.dxy.cn");
    httpPojo.setHttpAccept("*/*");
    httpPojo.setHttpConnection("keep-alive");
    httpPojo.setHttpUserAgent("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36");
    httpPojo.setHttpReferer("https://ncov.dxy.cn");
    httpPojo.setHttpOrigin("https://ncov.dxy.cn");

    // Placeholder return value: sending the request is not shown in this snippet
    return null;
}

2. The httpSendGet() method

An HTTP GET helper used to request data from third-party websites.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Map;

private static String httpSendGet(String url, Map<?, ?> paramObj, HttpPojo httpPojo) {
    StringBuilder result = new StringBuilder();
    // Append the query string only when there are parameters to send
    String query = parseParam(paramObj);
    String urlName = query.isEmpty() ? url : url + "?" + query;

    try {
        URL realURL = new URL(urlName);
        URLConnection conn = realURL.openConnection();
        // Spoof the client IP via forwarding-style headers (servers are free to ignore these)
        String ip = randIP();
        System.out.println("Spoofed IP for this request: " + ip);
        conn.setRequestProperty("X-Forwarded-For", ip);
        conn.setRequestProperty("HTTP_X_FORWARDED_FOR", ip);
        conn.setRequestProperty("HTTP_CLIENT_IP", ip);
        conn.setRequestProperty("REMOTE_ADDR", ip);
        // Note: HttpURLConnection treats Host and Origin as restricted headers and may
        // drop them unless the JVM runs with -Dsun.net.http.allowRestrictedHeaders=true
        conn.setRequestProperty("Host", httpPojo.getHttpHost());
        conn.setRequestProperty("accept", httpPojo.getHttpAccept());
        conn.setRequestProperty("connection", httpPojo.getHttpConnection());
        conn.setRequestProperty("user-agent", httpPojo.getHttpUserAgent());
        conn.setRequestProperty("Referer", httpPojo.getHttpReferer()); // spoofed referring page
        conn.setRequestProperty("Origin", httpPojo.getHttpOrigin());   // spoofed origin
        conn.connect();

        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Each line is prefixed with "\n", so the result starts with a newline
                result.append("\n").append(line);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return result.toString();
}
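The header-copying step in httpSendGet() can also be pulled into a small helper so that values that were never set are skipped instead of being passed along as null. The following is only a sketch under that assumption: the class name RequestHeaders and its methods build and apply are hypothetical and are not part of the original utilities.

```java
import java.net.URLConnection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original utilities.
final class RequestHeaders {

    // Collects the header values that are actually set, in insertion order.
    static Map<String, String> build(String accept, String userAgent, String referer) {
        Map<String, String> headers = new LinkedHashMap<>();
        putIfPresent(headers, "Accept", accept);
        putIfPresent(headers, "User-Agent", userAgent);
        putIfPresent(headers, "Referer", referer);
        return headers;
    }

    // Copies every collected header onto the connection before it is opened.
    static void apply(URLConnection conn, Map<String, String> headers) {
        headers.forEach(conn::setRequestProperty);
    }

    // Skips null or empty values so that no blank header is sent.
    private static void putIfPresent(Map<String, String> headers, String name, String value) {
        if (value != null && !value.isEmpty()) {
            headers.put(name, value);
        }
    }
}
```

With this shape, a caller would run something like RequestHeaders.apply(conn, RequestHeaders.build(pojo.getHttpAccept(), pojo.getHttpUserAgent(), pojo.getHttpReferer())) before conn.connect().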

3. The parseParam() method

Serializes a map into a query string by joining each key=value pair with &.

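public static String parseParam(Map<?, ?> paramObj) {
    StringBuilder param = new StringBuilder();
    if (paramObj != null) {
        for (Map.Entry<?, ?> entry : paramObj.entrySet()) {
            if (param.length() > 0) {
                // The separator goes between pairs only, so there is no trailing "&"
                param.append('&');
            }
            // Keys and values are appended as-is; they are not URL-encoded here
            param.append(entry.getKey()).append('=').append(entry.getValue());
        }
    }
    return param.toString();
}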
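Because parseParam() concatenates keys and values verbatim, parameters containing spaces, &, or non-ASCII text would produce a malformed query string. A minimal sketch of an encoding variant is shown below, assuming UTF-8 is the encoding the target site expects. The class name QueryStrings and the method toQuery are hypothetical names chosen for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper, not part of the original utilities.
final class QueryStrings {

    // Joins the entries as key=value pairs separated by "&", URL-encoding both sides.
    static String toQuery(Map<String, ?> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, ?> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey()))
              .append('=')
              .append(encode(String.valueOf(e.getValue())));
        }
        return sb.toString();
    }

    // URLEncoder applies form encoding: a space becomes "+", and other
    // reserved or non-ASCII characters become %XX escapes of their UTF-8 bytes.
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(ex);
        }
    }
}
```

For example, a map holding q → "a b&c" serializes to q=a+b%26c, so the embedded & can no longer be mistaken for a pair separator.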

4. The randIP() method

Generates a random IPv4 address to use as the spoofed client IP.

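import java.util.concurrent.ThreadLocalRandom;

public static String randIP() {
    // Seeding a new Random from the clock on every call gives identical
    // addresses for calls made within the same millisecond
    ThreadLocalRandom random = ThreadLocalRandom.current();
    // nextInt(255) + 1 yields an octet in the range 1..255
    return (random.nextInt(255) + 1) + "." + (random.nextInt(255) + 1)
            + "." + (random.nextInt(255) + 1) + "."
            + (random.nextInt(255) + 1);
}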
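One detail worth illustrating is how the random source is seeded: two java.util.Random instances created with the same seed produce the same sequence, so seeding from the clock can repeat an address when calls land in the same millisecond. The sketch below contrasts a seeded generator with ThreadLocalRandom. The class RandomIps and its methods are hypothetical names used only for this illustration.

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not part of the original utilities.
final class RandomIps {

    // Deterministic: the same seed always yields the same address.
    static String fromSeed(long seed) {
        return format(new Random(seed));
    }

    // Uses the per-thread generator, which is not reseeded on every call.
    static String next() {
        return format(ThreadLocalRandom.current());
    }

    // Builds a dotted quad whose four octets are each in the range 1..255.
    private static String format(Random r) {
        return octet(r) + "." + octet(r) + "." + octet(r) + "." + octet(r);
    }

    private static int octet(Random r) {
        return r.nextInt(255) + 1;
    }
}
```

Calling RandomIps.fromSeed with the same value twice returns the same string, which is the repetition that a clock-based seed risks when requests are sent in quick succession.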

5. The getRegContent() method

Extracts a specific capture group from text using a regular expression.

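import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String getRegContent(String reg, String content, int index) {
    Pattern pattern = Pattern.compile(reg); // compile the regular expression
    Matcher matcher = pattern.matcher(content);
    String group = "";
    // Each match overwrites the previous one, so the last match wins;
    // an empty string is returned when nothing matches
    while (matcher.find()) {
        group = matcher.group(index);
    }
    return group;
}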
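To make the regex step concrete, here is a small sketch that pulls an embedded JSON object out of a script tag. The HTML snippet, the variable name window.demo, and the class RegexExtract are all made up for illustration and do not reflect the actual markup of any of the sites mentioned. Unlike getRegContent(), this variant returns the first match rather than the last.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original utilities.
final class RegexExtract {

    // Returns the requested capture group of the first match, or "" if nothing matches.
    static String firstGroup(String regex, String content, int group) {
        Matcher matcher = Pattern.compile(regex).matcher(content);
        return matcher.find() ? matcher.group(group) : "";
    }

    public static void main(String[] args) {
        // Made-up page fragment standing in for a scraped response body
        String html = "<script id=\"demo\">window.demo = {\"confirmed\":100}</script>";
        // Lazily capture everything from "{" up to the "}" that precedes </script>
        String json = firstGroup("window\\.demo = (\\{.*?\\})</script>", html, 1);
        System.out.println(json);
    }
}
```

The captured string could then be handed to a JSON parser. The lazy quantifier keeps the match from running past the first closing script tag when the page contains several.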
