专栏首页CS-Data Structure-Huffman哈夫曼编码和数据压缩Java
原创

哈夫曼编码和数据压缩Java

哈夫曼的实现和代码:

Huffman Node 的类定义:

class HFMTreeNode {
    int freq; //权值
    String s; //节点数据

    HFMTreeNode parent;
    HFMTreeNode left;
    HFMTreeNode  right;

    public  HFMTreeNode(int freq, String s){  //构造函数
        this.freq = freq;
        this.s = s;
    }
}

Huffman Tree 的类定义:

map: Map, 储存每个string及其对应的频率

encode_map: Map, 储存每个string及其编码

pq: 优先队列,每次插入按照HFMTreeNode的频率,从小到大排列

public class HFMTree {
    private  String[] data;
    private HFMTreeNode root;

    //Use dictionary to store: string-frequency 
    private Map<String, Integer> map = new HashMap<>();
    
    //Use dictionary to store: string-encoded string
    private Map<String, String> encode_map = new HashMap<>();
    
    //Use priority queque to store: HFMTreeNode 
    private PriorityQueue<HFMTreeNode> pq = new PriorityQueue<>(
            (HFMTreeNode a, HFMTreeNode b) -> {
                if( a.freq < b.freq)
                    return -1;
                else
                    return 1;
            }
    );
    
    //构造函数
    public HFMTree(String[] data){
        this.data = data;
    }
    
    //类方法:
    public void count_freq();                                                       //第一步
    public void create_node_in_pq();                                                //第二步
    public HFMTreeNode build_Tree();                                                //第三步
    public void setEncode_map(HFMTreeNode root, String code);                       //第四步
    public String encoded(String[] data);                                           //第五步
    public String decoded(String encoded_string);                                   //第六步
    public String search_encoded_map(String value);                  //第六步的hepler function 
    
    //Spilt string into string array
    static public String[] String_to_StringArr(String string);
    public void tree_print(HFMTreeNode root); //traversal the tree 
    public void map_print(); //print the string-freq information
    public void encode_map_print(); //print the string-encoded string information  

}

具体实现:

第一步:计算每个string的权值(出现的频率)

第二步:构造优先队列

第三步:构造Huffman Tree

第四步:遍历Huffman Tree, 计算并保存每个string对应的编码

第五步:加密

第六步:解密

第一步:计算每个string的权值(出现的频率), 存储到map结构里面

//Step1: Count each string freq and put them in the map set
    public void count_freq(){
        for(int i = 0; i<data.length; i++){
            if(map.containsKey(data[i]) ){
                int freq = map.get(data[i]) + 1;
                map.put(data[i], freq);
            }else
                map.put(data[i], 1);
        }
    }

第二步:创建新的HFMTreeNode, 在优先队列里面添加每一个Huffman Node

//Step2 : Create each  tree node from map into pq
    public void create_node_in_pq(){
        for( String s: map.keySet()){
            HFMTreeNode hfmTreeNode = new HFMTreeNode(map.get(s), s);
            pq.add(hfmTreeNode);
        }
    }

第三步:每次去优先队列里面的前两个,组成一个新的Huffman Node, 添加到优先队列里面,再重复第三步,直到优先队列只剩下一个Huffman Node

//Step3: return the root node of Huffman tree
    public HFMTreeNode build_Tree(){
        if(pq.size() == 0)
            return null;
        if(pq.size() == 1)
            return pq.peek();

        while(pq.size() >= 2){
            HFMTreeNode first = pq.poll();
            HFMTreeNode second = pq.poll();

            int new_freq = first.freq + second.freq;
            HFMTreeNode new_node = new HFMTreeNode(new_freq, null);
            new_node.left = first;
            new_node.right = second;

            first.parent = new_node;
            second.parent = new_node;

            pq.offer(new_node);
        }

        //最后只剩一个,就是 root;
        this.root = pq.peek();
        return pq.peek();

    }

第四步:遍历Huffman Tree, 找到每个叶节点,存入每个string对应的编码

//Step4: encode the string and store them into map
    public void setEncode_map(HFMTreeNode root, String code){
        if(root ==null)
            return;

        if(root.left == null && root.right ==null){
            encode_map.put(root.s, code);
        }

        setEncode_map(root.left, code + "0");
        setEncode_map(root.right, code+"1");

    }
    

第五步:对输入的string(data)进行加密,并输出保存到新的文件

//Step5: encode the string 
    public String encoded(String[] data){
        String out = "";
        for(int i = 0;i<data.length;i++){
            String temp = data[i];
            out += encode_map.get(temp);
        }
        return out;
    }

第六步:对加密的文件进行解密,并输出保存到新的文件

//Step6: decode the encoded string
    public String decoded(String encoded_string){
        String de_code = "";
        String temp = "";
        
        for(int i  = 0;i<encoded_string.length();i++){
            temp += encoded_string.charAt(i);
            //只要在encode_map 里面有这个value,我们就添加对应的key
            if(encode_map.containsValue(temp)){
                de_code+= search_encoded_map(temp);
                temp = "";
            }
        }
        return de_code;
    }

    //Helper function:
    //Traversal encoded_map, return the key by searching the value
    //通过value来找key
    public String search_encoded_map(String value){
        for(String key: encode_map.keySet()){
            if(encode_map.get(key).equals(value))
                return key;
        }
        return null;
    }

各种信息的打印:

//BFS print the tree information
    public void tree_print(HFMTreeNode root){
        System.out.println("Tree Print: ");
        Queue<HFMTreeNode> queue= new LinkedList<>();
        if(root == null){
            return;
        }

        queue.add(root);
        while( ! queue.isEmpty()) {
            int size = queue.size();
            for(int i= 0; i<size;i++){

                HFMTreeNode temp = queue.poll();
                if(temp.left != null ) queue.add(temp.left);
                if(temp.right != null) queue.add(temp.right);

                System.out.print(temp.s + " "+ temp.freq+" ");
            }
            System.out.println();
        }
    }
    
    
    //print the map information
    public void map_print(){
        System.out.println("Map Print: ");
        for(String k : map.keySet())
            System.out.println( k + " " + map.get(k));
    }

    //print the encode map information
    public void encode_map_print(){
        System.out.println("Encode_map Print: ");
        for(String k : encode_map.keySet())
            System.out.println( k + " " + encode_map.get(k));
    }


    //print the pq information
    public void pq_print(){
        System.out.println("Pq Print: ");
        HFMTreeNode cur = pq.poll();
        while(! pq.isEmpty()){
            System.out.println(cur.s + " "+ cur.freq);
            cur = pq.poll();
        }
    }
    
    
    

把String 分成一个一个的,组成String[]

//Spilt string into string array
static public String[] String_to_StringArr(String string){
    char[] chars = string.toCharArray();
    String[] s = new String[chars.length];

    for(int i = 0; i< chars.length;i++){
        s[i] = String.valueOf(chars[i]);
    }

    return s;
}

后面会有ReadFile类来帮助我们读取和存储文件,我在这儿就不一一介绍啦~

完整代码

1. HFMTree类:

package 哈夫曼;

import java.util.*;

public class HFMTree {
    private  String[] data;
    private HFMTreeNode root;

    //using dictionary to store the freq of each string
    private Map<String, Integer> map = new HashMap<>();

    private Map<String, String> encode_map = new HashMap<>();

    //
    private PriorityQueue<HFMTreeNode> pq = new PriorityQueue<>(
            (HFMTreeNode a, HFMTreeNode b) -> {
                if( a.freq < b.freq)
                    return -1;
                else
                    return 1;
            }
    );


    public HFMTree(String[] data){
        this.data = data;
    }

    //-----------------------核心代码-----------------------
    //Step1: Count each string freq and put them in the map set
    public void count_freq(){
        for(int i = 0; i<data.length; i++){
            if(map.containsKey(data[i]) ){
                int freq = map.get(data[i]) + 1;
                map.put(data[i], freq);
            }else
                map.put(data[i], 1);
        }
    }


    //Step2 : Create each  tree node from map into pq
    public void create_node_in_pq(){
        for( String s: map.keySet()){
            HFMTreeNode hfmTreeNode = new HFMTreeNode(map.get(s), s);
            pq.add(hfmTreeNode);
        }
    }


    //Step3: return the root node of Huffman tree
    public HFMTreeNode build_Tree(){
        if(pq.size() == 0)
            return null;


        if(pq.size() == 1)
            return pq.peek();

        while(pq.size() >= 2){
            HFMTreeNode first = pq.poll();
            HFMTreeNode second = pq.poll();

            int new_freq = first.freq + second.freq;
            HFMTreeNode new_node = new HFMTreeNode(new_freq, null);
            new_node.left = first;
            new_node.right = second;

            first.parent = new_node;
            second.parent = new_node;

            pq.offer(new_node);
        }

        //最后只剩一个,就是 root;
        this.root = pq.peek();
        return pq.peek();

    }

    //Step4: encode the string and store them into map
    public void setEncode_map(HFMTreeNode root, String code){
        if(root ==null)
            return;

        if(root.left == null && root.right ==null){
            encode_map.put(root.s, code);
        }

        setEncode_map(root.left, code + "0");
        setEncode_map(root.right, code+"1");

    }

    //Step5: encode the string
    public String encoded(String[] data){
        String out = "";
        for(int i = 0;i<data.length;i++){
            String temp = data[i];
            out += encode_map.get(temp);
        }
        return out;
    }

    //Step6: decode the encoded string
    public String decoded(String encoded_string){
        String de_code = "";
        String temp = "";
        for(int i  = 0;i<encoded_string.length();i++){
            temp += encoded_string.charAt(i);
            //只要在encode_map 里面有这个value,我们就添加对应的key
            if(encode_map.containsValue(temp)){
                de_code+= search_encoded_map(temp);
                temp = "";
            }
        }
        return de_code;
    }
    
    //Helper function:
    //Traversal encoded_map, return the key by searching the value
    //通过value来找key
    public String search_encoded_map(String value){
        for(String key: encode_map.keySet()){
            if(encode_map.get(key).equals(value))
                return key;
        }
        return null;
    }


    //-----------------------核心代码-----------------------




    //BFS print the tree information
    public void tree_print(HFMTreeNode root){
        System.out.println("Tree Print: ");
        Queue<HFMTreeNode> queue= new LinkedList<>();
        if(root == null){
            return;
        }

        queue.add(root);
        while( ! queue.isEmpty()) {
            int size = queue.size();
            for(int i= 0; i<size;i++){

                HFMTreeNode temp = queue.poll();
                if(temp.left != null ) queue.add(temp.left);
                if(temp.right != null) queue.add(temp.right);

                System.out.print(temp.s + " "+ temp.freq+" ");
            }
            System.out.println();
        }
    }


    //print the map information
    public void map_print(){
        System.out.println("Map Print: ");
        for(String k : map.keySet())
            System.out.println( k + " " + map.get(k));
    }

    //print the encode map information
    public void encode_map_print(){
        System.out.println("Encode_map Print: " );
        for(String k : encode_map.keySet())
            System.out.println( k + " " + encode_map.get(k));
    }


    //print the pq information
    public void pq_print(){
        System.out.println("Pq Print: ");
        HFMTreeNode cur = pq.poll();
        while(! pq.isEmpty()){
            System.out.println(cur.s + " "+ cur.freq);
            cur = pq.poll();
        }
    }


    //Spilt string into string array
    static public String[] String_to_StringArr(String string){
        char[] chars = string.toCharArray();
        String[] s = new String[chars.length];

        for(int i = 0; i< chars.length;i++){
            s[i] = String.valueOf(chars[i]);
        }

        return s;
    }

    public static void main(String[] args) {
        ReadFile readFile = new ReadFile();

        //从in.txt 读取String,保存到read_string
        String read_string = readFile.readFileByArray("in.txt");
        //对数据进行预处理
        String[] data = String_to_StringArr(read_string);

        //创建对象
        HFMTree hfmTree = new HFMTree(data);
        //计算权值
        hfmTree.count_freq();
        //构造优先队列
        hfmTree.create_node_in_pq();
        //创建树
        HFMTreeNode root = hfmTree.build_Tree();

        //编码
//        hfmTree.tree_print(root);
        hfmTree.setEncode_map(root, "");
        //打印加密map
        hfmTree.encode_map_print();

        //加密
        String encoded_string = hfmTree.encoded(data);
        //把编译码输入到文件里面
        readFile.save(encoded_string, "encode_out.txt");


        //解码
        String decoded_string = hfmTree.decoded(encoded_string);
        //把解码输入到文件里面
        readFile.save(decoded_string, "decode_out.txt");

    }

}


class HFMTreeNode {
    int freq;
    String s;

    HFMTreeNode parent;
    HFMTreeNode left;
    HFMTreeNode  right;

    public  HFMTreeNode(int freq, String s){
        this.freq = freq;
        this.s = s;
    }
}

2. ReadFile类

package 哈夫曼;


//文件数据读写的基本模式java/c++/go
//    I/O模型

//1.读写一下文件,熟练,写,出错,然后改正,是个固定的模式
//2. 读取一个文本文件(统计每个字符的出现频率),输出每个字符的哈夫曼编码
//    读取一个文件,统计每个字节的出现频率,输出不同频率字节的哈夫曼编码


import java.io.FileInputStream;
import java.io.FileOutputStream;

public class ReadFile {


    /**
     *
     * @param fileName 要读取的文件名
     * @return  文件字符串(只能是文本文件)
     */

    public  String  readFileByArray(String fileName){
        try{
            //用输入流对象
            java.io.FileInputStream  fins=new FileInputStream(fileName);
            //得知道文件有多长,多少个字节
            int len=fins.available();
            //无论什么文件,在盘上,都是字节序列
            byte[] data=new byte[len];
            fins.read(data);//将文件中的字节,读进缓冲区
            //将字节数组,转码为文本
            String txt=new String(data);
//            System.out.println("读到的文件是 "+txt);
            return txt;
        }catch(Exception ef){
            ef.printStackTrace();//打印出错的信息
            System.out.println("读取文件出错");
        }
        return null;
    }


    //保存文件
    public boolean  save(String content,String fileName){
        try{
            //输出流
            FileOutputStream fous=new FileOutputStream(fileName);
            //将字符串转为字节数组
            byte[] data=content.getBytes();
            //写入
            fous.write(data);
            fous.close();//水管子用完了,要关
            return true;
        }catch(Exception ef){
            ef.printStackTrace();//输出错误
        }
        return false;

    }

}

原创声明,本文系作者授权云+社区发表,未经许可,不得转载。

如有侵权,请联系 yunjia_community@tencent.com 删除。

我来说两句

0 条评论
登录 后参与评论

相关文章

  • Dubbo源码学习--环境搭建及基础准备(ServiceLoader、ExtensionLoader)

    环境搭建 Github上下载Dubbo最新发布版本,楼主下载版本为2.5.7。 cd到源码解压目录,maven编译,命令为: mvn clean install...

    YGingko
  • 打包生成 zip 文件

    用户5927264
  • Same Tree

  • 2018-05-20

    JavaEdge
  • 《深入理解C# 3.x的新特性》博文系列汇总

    较之C# 2.0, C# 3.x引入了一系列新的特性,为我们编程带来很大的便利,通过有效地利用这些新特性,我们可以编写出更加简洁、优雅的程序。不过这些新特性仅仅...

    蒋金楠
  • springboot集成schedule(深度理解)

    名山丶深处
  • springboot集成schedule(深度理解)

    名山丶深处
  • Spring Boot缓存配置不同到期时间

    这种方式可以实现不同缓存的不同到期时间,但是后面再新增缓存数据的话,都需要再在CacheManager中配置

    十毛
  • java中为什么需要接口

    最近看到论坛里有个帖子在讨论接口和抽象类的区别和作用,这其实也是很多面试官喜欢问的问题,这里我就说说我的总结,顺便说说内部类的作用,当是给刚入门,或者想学习ja...

    bear_fish
  • 每日算法系列【LeetCode 714】买卖股票的最佳时机含手续费

    给定一个整数数组 prices,其中第 i 个元素代表了第 i 天的股票价格 ;非负整数 fee 代表了交易股票的手续费用。

    godweiyang

扫码关注云+社区

领取腾讯云代金券