Hadoop Data Types
- BooleanWritable: a boolean value
- ByteWritable: a single-byte value
- DoubleWritable: a double-precision floating-point value
- FloatWritable: a single-precision floating-point value
- IntWritable: an int value
- LongWritable: a long value
- Text: text stored in UTF-8 format
- NullWritable: used when the key or value of a <key, value> pair is empty
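For orientation, here is a minimal sketch of how these built-in types are typically used; the class name BuiltinWritablesDemo and the specific values are only illustrative.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;

public class BuiltinWritablesDemo {
    public static void main(String[] args) {
        // Each built-in Writable wraps a Java value and exposes get()/set() accessors.
        IntWritable count = new IntWritable(42);
        Text word = new Text("hadoop");              // stored as UTF-8
        LongWritable big = new LongWritable(count.get() * 1000L);
        NullWritable nothing = NullWritable.get();   // singleton, carries no payload

        System.out.println(word + " -> " + count + ", " + big + ", " + nothing);
        count.set(7);                                // Writables are mutable and reusable
    }
}
```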
1. For a data type that will only ever appear as a "value", implementing the Writable interface is enough.
2. For a data type that may appear as a "key", you need to implement the WritableComparable interface.
## 1. Implementing the Writable interface
```java
/* DataInput and DataOutput are classes from java.io */
public interface Writable {
    void readFields(DataInput in) throws IOException;
    void write(DataOutput out) throws IOException;
}
```
Here is a small example:
```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class Point3D implements Writable {
    public float x, y, z;

    public Point3D(float fx, float fy, float fz) {
        this.x = fx;
        this.y = fy;
        this.z = fz;
    }

    public Point3D() {
        this(0.0f, 0.0f, 0.0f);
    }

    public void readFields(DataInput in) throws IOException {
        x = in.readFloat();
        y = in.readFloat();
        z = in.readFloat();
    }

    public void write(DataOutput out) throws IOException {
        out.writeFloat(x);
        out.writeFloat(y);
        out.writeFloat(z);
    }

    public String toString() {
        return Float.toString(x) + ", " + Float.toString(y) + ", " + Float.toString(z);
    }
}
```
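To check that the class above serializes correctly, you can run a small round-trip test with plain java.io streams, driving write() and readFields() exactly as the Hadoop framework would. The helper class Point3DRoundTrip below is a hypothetical sketch, not part of the original example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class Point3DRoundTrip {
    public static void main(String[] args) throws IOException {
        Point3D p = new Point3D(1.0f, 2.0f, 3.0f);

        // Serialize with write(), as the framework would when shuffling data.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        p.write(new DataOutputStream(bytes));

        // Deserialize into a fresh instance with readFields().
        Point3D copy = new Point3D();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println(copy);   // prints "1.0, 2.0, 3.0"
    }
}
```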
## 2. Implementing the WritableComparable interface
```java
// In Hadoop, WritableComparable<T> is declared as extending Writable and Comparable<T>;
// its effective contract is:
public interface WritableComparable<T> {
    public void readFields(DataInput in) throws IOException;
    public void write(DataOutput out) throws IOException;
    public int compareTo(T other);
}
```
First, here is a simple example; explanation and extensions follow.
```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class Point3D implements WritableComparable<Point3D> {
    public float x, y, z;

    public Point3D(float fx, float fy, float fz) {
        this.x = fx;
        this.y = fy;
        this.z = fz;
    }

    public Point3D() {
        this(0.0f, 0.0f, 0.0f);
    }

    public void readFields(DataInput in) throws IOException {
        x = in.readFloat();
        y = in.readFloat();
        z = in.readFloat();
    }

    public void write(DataOutput out) throws IOException {
        out.writeFloat(x);
        out.writeFloat(y);
        out.writeFloat(z);
    }

    public String toString() {
        return Float.toString(x) + ", " + Float.toString(y) + ", " + Float.toString(z);
    }

    public float distanceFromOrigin() {
        return (float) Math.sqrt(x * x + y * y + z * z);
    }

    // Controls the sort order of map output keys; the default is ascending.
    // Negate the return value to get descending order.
    public int compareTo(Point3D other) {
        return Float.compare(distanceFromOrigin(), other.distanceFromOrigin());
    }

    public boolean equals(Object o) {
        if (!(o instanceof Point3D)) {
            return false;
        }
        Point3D other = (Point3D) o;
        return this.x == other.x && this.y == other.y && this.z == other.z;
    }

    /* Implementing hashCode() is important:
     * Hadoop's Partitioners rely on it (more on that later). */
    public int hashCode() {
        return Float.floatToIntBits(x)
             ^ Float.floatToIntBits(y)
             ^ Float.floatToIntBits(z);
    }
}
```
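Since compareTo() defines the ordering Hadoop applies to map output keys, a quick way to see it in action locally is to sort an array of points. The class Point3DSortDemo below is an illustrative sketch, not part of the original post.

```java
import java.util.Arrays;

public class Point3DSortDemo {
    public static void main(String[] args) {
        Point3D[] points = {
            new Point3D(3f, 0f, 0f),   // distance 3.0
            new Point3D(1f, 1f, 1f),   // distance ~1.73
            new Point3D(0f, 2f, 0f)    // distance 2.0
        };

        // Arrays.sort() uses compareTo(), i.e. distance from the origin, ascending,
        // which is the same ordering applied to map output keys.
        Arrays.sort(points);
        for (Point3D p : points) {
            System.out.println(p + "  (distance " + p.distanceFromOrigin() + ")");
        }
    }
}
```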
After defining a custom Hadoop data type, you have to tell Hadoop explicitly to use it. That is the job of JobConf:
```java
void setOutputKeyClass(Class<?> theClass)
void setOutputValueClass(Class<?> theClass)
```
By default, these methods set the output types for both the map and the reduce stages. When the map output types differ from the final output types, there are also dedicated setMapOutputKeyClass() / setMapOutputValueClass() methods; the reduce (final) output is always governed by setOutputKeyClass() / setOutputValueClass(). A sketch follows below.
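Here is a minimal sketch of wiring the custom type into a job with the old-style mapred API; the class Point3DJobSetup and the choice of IntWritable as the value type are assumptions made for illustration.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.JobConf;

public class Point3DJobSetup {
    public static JobConf configure() {
        // Passing the driver class lets Hadoop locate the job jar.
        JobConf conf = new JobConf(Point3DJobSetup.class);

        // Types of the final (reduce) output key/value pairs.
        conf.setOutputKeyClass(Point3D.class);
        conf.setOutputValueClass(IntWritable.class);

        // Only needed when the map output types differ from the reduce output types.
        conf.setMapOutputKeyClass(Point3D.class);
        conf.setMapOutputValueClass(IntWritable.class);

        return conf;
    }
}
```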