前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >MapReduce快速入门系列(10) | 二次排序和辅助排序案例(GroupingComparator分组)

MapReduce快速入门系列(10) | 二次排序和辅助排序案例(GroupingComparator分组)

作者头像
不温卜火
发布2020-10-28 15:27:34
5450
发布2020-10-28 15:27:34
举报
文章被收录于专栏:不温卜火不温卜火

Hello,大家好!博主上篇讲解了合并,这篇要讲的是辅助排序。如何讲解这个章节呢?首先先对什么是合并进行解释,然后通过案例进行证明。

一. GroupingComparator分组的简介

什么是GroupingComparator分组(辅助排序)?   对Reduce阶段的数据根据某一个或几个字段进行分组。

分组排序的步骤:

  • 1. 自定义类继承WritableComparator
  • 2. 重写compare()方法
代码语言:javascript
复制
@Override
public int compare(WritableComparable a, WritableComparable b) {
		// 比较的业务逻辑
		return result;
}
  • 3. 创建一个构造将比较对象的类传给父类
代码语言:javascript
复制
protected OrderGroupingComparator() {
		super(OrderBean.class, true);
}

二. 根据案例分析

2.1 需求

  • 1. 有如下订单数据

订单id

商品id

成交金额

0000001

Pdt_01

222.8

Pdt_02

33.8

0000001

Pdt_03

522.8

Pdt_04

122.4

Pdt_05

722.4

0000001

Pdt_06

232.8

Pdt_02

33.8

现在需要求出每一个订单中最贵的商品。

  • 2. 输入的数据
1
1
  • 3. 期望输出数据
代码语言:javascript
复制
1	222.8
2	722.4
3	232.8

2.2 需求分析

(1)利用“订单id和成交金额”作为key,可以将Map阶段读取到的所有订单数据按照id升序排序,如果id相同再按照金额降序排序,发送到Reduce。 (2)在Reduce端利用groupingComparator将订单id相同的kv聚合成组,然后取第一个即是该订单中最贵商品,如下图所示所示。

2
2

2.3 代码实现

1. 定义订单信息OrderBean类

代码语言:javascript
复制
package com.buwenbuhuo.groupingcomparator;

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
/**
 * @author 卜温不火
 * @create 2020-04-24 23:43
 * com.buwenbuhuo.groupingcomparator - the name of the target package where the new class or interface will be created.
 * mapreduce0422 - the name of the current project.
 */
public class OrderBean implements WritableComparable<OrderBean> {

    private String orderId;
    private String productId;
    private double price;

    @Override
    public String toString() {
        return orderId + "\t" + productId + "\t" + price;
    }

    public String getOrderId() {
        return orderId;
    }

    public void setOrderId(String orderId) {
        this.orderId = orderId;
    }

    public String getProductId() {
        return productId;
    }

    public void setProductId(String productId) {
        this.productId = productId;
    }

    public double getPrice() {
        return price;
    }

    public void setPrice(double price) {
        this.price = price;
    }

    @Override
    public int compareTo(OrderBean o) {

        int compare = this.orderId.compareTo(o.orderId);

        if (compare == 0) {
            return Double.compare(o.price, this.price);
        } else {
            return compare;
        }
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(orderId);
        out.writeUTF(productId);
        out.writeDouble(price);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.orderId = in.readUTF();
        this.productId = in.readUTF();
        this.price = in.readDouble();
    }
}

2. 编写OrderSortMapper类

代码语言:javascript
复制
package com.buwenbuhuo.groupingcomparator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;
/**
 * @author 卜温不火
 * @create 2020-04-24 23:43
 * com.buwenbuhuo.groupingcomparator - the name of the target package where the new class or interface will be created.
 * mapreduce0422 - the name of the current project.
 */
public class OrderMapper extends Mapper<LongWritable, Text, OrderBean, NullWritable> {


    private OrderBean orderBean = new OrderBean();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");
        orderBean.setOrderId(fields[0]);
        orderBean.setProductId(fields[1]);
        orderBean.setPrice(Double.parseDouble(fields[2]));
        context.write(orderBean, NullWritable.get());
    }
}

3. 编写OrderSortGroupingComparator类

代码语言:javascript
复制
package com.buwenbuhuo.groupingcomparator;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
/**
 * @author 卜温不火
 * @create 2020-04-24 23:43
 * com.buwenbuhuo.groupingcomparator - the name of the target package where the new class or interface will be created.
 * mapreduce0422 - the name of the current project.
 */
public class OrderComparator extends WritableComparator {

    protected OrderComparator() {
        super(OrderBean.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        OrderBean oa = (OrderBean) a;
        OrderBean ob = (OrderBean) b;

        return oa.getOrderId().compareTo(ob.getOrderId());
    }
}

4. 编写OrderSortReducer类

代码语言:javascript
复制
package com.buwenbuhuo.groupingcomparator;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.Iterator;
/**
 * @author 卜温不火
 * @create 2020-04-24 23:43
 * com.buwenbuhuo.groupingcomparator - the name of the target package where the new class or interface will be created.
 * mapreduce0422 - the name of the current project.
 */
public class OrderReducer extends Reducer<OrderBean, NullWritable, OrderBean, NullWritable> {

    @Override
    protected void reduce(OrderBean key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
        Iterator<NullWritable> iterator = values.iterator();
        for (int i = 0; i < 2; i++) {
            if (iterator.hasNext()) {
                context.write(key, iterator.next());
            }
        }
    }
}

5. 编写OrderSortDriver类

代码语言:javascript
复制
package com.buwenbuhuo.groupingcomparator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
/**
 * @author 卜温不火
 * @create 2020-04-24 23:43
 * com.buwenbuhuo.groupingcomparator - the name of the target package where the new class or interface will be created.
 * mapreduce0422 - the name of the current project.
 */
public class OrderDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Job job = Job.getInstance(new Configuration());

        job.setJarByClass(OrderDriver.class);

        job.setMapperClass(OrderMapper.class);
        job.setReducerClass(OrderReducer.class);

        job.setMapOutputKeyClass(OrderBean.class);
        job.setMapOutputValueClass(NullWritable.class);

        job.setGroupingComparatorClass(OrderComparator.class);

        job.setOutputKeyClass(OrderBean.class);
        job.setOutputValueClass(NullWritable.class);

        FileInputFormat.setInputPaths(job, new Path("d:\\input"));
        FileOutputFormat.setOutputPath(job, new Path("d:\\output"));

        boolean b = job.waitForCompletion(true);
        System.exit(b ? 0 : 1);
    }
}

2.4 运行与结果实现

  • 1. 运行
3
3
  • 2. 结果与对比
4
4
本文参与 腾讯云自媒体分享计划,分享自作者个人站点/博客。
原始发表:2020-04-28 ,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 一. GroupingComparator分组的简介
  • 二. 根据案例分析
    • 2.1 需求
      • 2.2 需求分析
        • 2.3 代码实现
          • 1. 定义订单信息OrderBean类
          • 2. 编写OrderSortMapper类
          • 3. 编写OrderSortGroupingComparator类
          • 4. 编写OrderSortReducer类
          • 5. 编写OrderSortDriver类
        • 2.4 运行与结果实现
        领券
        问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档