专栏首页不温卜火MapReduce快速入门系列(10) | 二次排序和辅助排序案例(GroupingComparator分组)

MapReduce快速入门系列(10) | 二次排序和辅助排序案例(GroupingComparator分组)

Hello,大家好!博主上篇讲解了合并,这篇要讲的是辅助排序。如何讲解这个章节呢?首先先对什么是合并进行解释,然后通过案例进行证明。

一. GroupingComparator分组的简介

什么是GroupingComparator分组(辅助排序)?   对Reduce阶段的数据根据某一个或几个字段进行分组。

分组排序的步骤:

  • 1. 自定义类继承WritableComparator
  • 2. 重写compare()方法
@Override
public int compare(WritableComparable a, WritableComparable b) {
		// 比较的业务逻辑
		return result;
}
  • 3. 创建一个构造将比较对象的类传给父类
protected OrderGroupingComparator() {
		super(OrderBean.class, true);
}

二. 根据案例分析

2.1 需求

  • 1. 有如下订单数据

订单id

商品id

成交金额

0000001

Pdt_01

222.8

Pdt_02

33.8

0000001

Pdt_03

522.8

Pdt_04

122.4

Pdt_05

722.4

0000001

Pdt_06

232.8

Pdt_02

33.8

现在需要求出每一个订单中最贵的商品。

  • 2. 输入的数据
  • 3. 期望输出数据
1	222.8
2	722.4
3	232.8

2.2 需求分析

(1)利用“订单id和成交金额”作为key,可以将Map阶段读取到的所有订单数据按照id升序排序,如果id相同再按照金额降序排序,发送到Reduce。 (2)在Reduce端利用groupingComparator将订单id相同的kv聚合成组,然后取第一个即是该订单中最贵商品,如下图所示所示。

2.3 代码实现

1. 定义订单信息OrderBean类

package com.buwenbuhuo.groupingcomparator;

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
/**
 * @author 卜温不火
 * @create 2020-04-24 23:43
 * com.buwenbuhuo.groupingcomparator - the name of the target package where the new class or interface will be created.
 * mapreduce0422 - the name of the current project.
 */
public class OrderBean implements WritableComparable<OrderBean> {

    private String orderId;
    private String productId;
    private double price;

    @Override
    public String toString() {
        return orderId + "\t" + productId + "\t" + price;
    }

    public String getOrderId() {
        return orderId;
    }

    public void setOrderId(String orderId) {
        this.orderId = orderId;
    }

    public String getProductId() {
        return productId;
    }

    public void setProductId(String productId) {
        this.productId = productId;
    }

    public double getPrice() {
        return price;
    }

    public void setPrice(double price) {
        this.price = price;
    }

    @Override
    public int compareTo(OrderBean o) {

        int compare = this.orderId.compareTo(o.orderId);

        if (compare == 0) {
            return Double.compare(o.price, this.price);
        } else {
            return compare;
        }
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(orderId);
        out.writeUTF(productId);
        out.writeDouble(price);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.orderId = in.readUTF();
        this.productId = in.readUTF();
        this.price = in.readDouble();
    }
}

2. 编写OrderSortMapper类

package com.buwenbuhuo.groupingcomparator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;
/**
 * @author 卜温不火
 * @create 2020-04-24 23:43
 * com.buwenbuhuo.groupingcomparator - the name of the target package where the new class or interface will be created.
 * mapreduce0422 - the name of the current project.
 */
public class OrderMapper extends Mapper<LongWritable, Text, OrderBean, NullWritable> {


    private OrderBean orderBean = new OrderBean();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");
        orderBean.setOrderId(fields[0]);
        orderBean.setProductId(fields[1]);
        orderBean.setPrice(Double.parseDouble(fields[2]));
        context.write(orderBean, NullWritable.get());
    }
}

3. 编写OrderSortGroupingComparator类

package com.buwenbuhuo.groupingcomparator;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
/**
 * @author 卜温不火
 * @create 2020-04-24 23:43
 * com.buwenbuhuo.groupingcomparator - the name of the target package where the new class or interface will be created.
 * mapreduce0422 - the name of the current project.
 */
public class OrderComparator extends WritableComparator {

    protected OrderComparator() {
        super(OrderBean.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        OrderBean oa = (OrderBean) a;
        OrderBean ob = (OrderBean) b;

        return oa.getOrderId().compareTo(ob.getOrderId());
    }
}

4. 编写OrderSortReducer类

package com.buwenbuhuo.groupingcomparator;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.Iterator;
/**
 * @author 卜温不火
 * @create 2020-04-24 23:43
 * com.buwenbuhuo.groupingcomparator - the name of the target package where the new class or interface will be created.
 * mapreduce0422 - the name of the current project.
 */
public class OrderReducer extends Reducer<OrderBean, NullWritable, OrderBean, NullWritable> {

    @Override
    protected void reduce(OrderBean key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
        Iterator<NullWritable> iterator = values.iterator();
        for (int i = 0; i < 2; i++) {
            if (iterator.hasNext()) {
                context.write(key, iterator.next());
            }
        }
    }
}

5. 编写OrderSortDriver类

package com.buwenbuhuo.groupingcomparator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
/**
 * @author 卜温不火
 * @create 2020-04-24 23:43
 * com.buwenbuhuo.groupingcomparator - the name of the target package where the new class or interface will be created.
 * mapreduce0422 - the name of the current project.
 */
public class OrderDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Job job = Job.getInstance(new Configuration());

        job.setJarByClass(OrderDriver.class);

        job.setMapperClass(OrderMapper.class);
        job.setReducerClass(OrderReducer.class);

        job.setMapOutputKeyClass(OrderBean.class);
        job.setMapOutputValueClass(NullWritable.class);

        job.setGroupingComparatorClass(OrderComparator.class);

        job.setOutputKeyClass(OrderBean.class);
        job.setOutputValueClass(NullWritable.class);

        FileInputFormat.setInputPaths(job, new Path("d:\\input"));
        FileOutputFormat.setOutputPath(job, new Path("d:\\output"));

        boolean b = job.waitForCompletion(true);
        System.exit(b ? 0 : 1);
    }
}

2.4 运行与结果实现

  • 1. 运行
  • 2. 结果与对比

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

我来说两句

0 条评论
登录 后参与评论

相关文章

  • MapReduce快速入门系列(9) | Shuffle之Combiner合并

    每一个 map 都可能会产生大量的本地输出,Combiner 的作用就是对map 端的输出先做一次合并,以减少在 map 和 reduce 节点之间的数据传输量...

    不温卜火
  • Flume快速入门系列(9) | 如何自定义Sink

    Sink不断地轮询Channel中的事件且批量地移除它们,并将这些事件批量写入到存储或索引系统、或者被发送到另一个Flume Agent。   Sink是完...

    不温卜火
  • MapReduce快速入门系列(6) | Shuffle之Partition分区

      Partition分区:按照一定的分区规则,将key value的list进行分区。分区的创建分为默认的和自定义两种。

    不温卜火
  • 小程序云函数访问第三方服务器错误解决

    1.报以下错误大概率是因为got版本问题 我是直接npm install的,got版本是10.x

    薛定喵君
  • 4.《python自省指南》学习

      前面几篇博客我都是通过python自省来分析代码并试图得出结论。当然,仅仅通过自省能解决的问题有限,但遇到问题还是不自主的去用这个功能,觉得对于认识代码的含...

    py3study
  • Spring Boot支持文件上传

    十毛
  • SoapUI中是如何断言的呢(一)

    将请求发送到Web服务器后,就会收到响应。我们需要验证响应是否包含我们期望的数据。为了验证响应,我们需要使用断言。

    用户7466307
  • NEJM:Waving Hello to Noninvasive Deep-Brain Stimulation

    近日多伦多大学Andres M. Lozano等人在新英格兰医学杂志发文,介绍了无创深部脑刺激技术。通过两个频率差异较小的电场信号刺激,激活深部大脑细胞,同时避...

    用户1279583
  • 正则表达式语法实例详解

    斑马
  • 在浏览器扩展程序中进行: 跨域 XMLHttpRequest 请求

    跨域 XMLHttpRequest 请求 https://crxdoc-zh.appspot.com/extensions/xhr

    一个会写诗的程序员

扫码关注云+社区

领取腾讯云代金券