
Each reducer uses its own OutputFormat to write records, which is why you get one set of odd and even files per reducer. This is by design, so that each reducer can write in parallel.

If you want just a single odd file and a single even file, you'll need to set <code>mapred.reduce.tasks</code> to 1. Performance will suffer, though, because all the mappers will be feeding into a single reducer.
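To make the file count concrete, here is a minimal plain-Java sketch (no Hadoop dependency) that simulates how keys are spread over reducers and which odd/even files each reducer would create. It assumes the `(hash & Integer.MAX_VALUE) % numReducers` formula used by Hadoop's default `HashPartitioner`, uses `String` keys as a stand-in for `Text`, and assumes output names of the form `even-r-00000`. The class and method names are made up for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Simulation only: counts the distinct odd/even files a job would leave behind.
class ReducerFileCount {

    // Assumed partitioning formula (same shape as Hadoop's default HashPartitioner).
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Each reducer writes its own "odd" and "even" file, so the file name
    // depends on both the parity of the count and the reducer that saw the key.
    static Set<String> outputFiles(String[] words, int[] counts, int numReducers) {
        Set<String> files = new TreeSet<>();
        for (int i = 0; i < words.length; i++) {
            String name = (counts[i] % 2 == 0) ? "even" : "odd";
            int reducer = partition(words[i], numReducers);
            files.add(String.format("%s-r-%05d", name, reducer)); // assumed naming scheme
        }
        return files;
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "c", "d", "e", "f"};
        int[] counts = {1, 2, 3, 4, 5, 6};
        System.out.println(outputFiles(words, counts, 3)); // 6 files: odd and even for each of 3 reducers
        System.out.println(outputFiles(words, counts, 1)); // 2 files: odd-r-00000 and even-r-00000
    }
}
```

With one reducer, every key lands in partition 0, so only one odd file and one even file can exist.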

Another option is to change the process that reads these files so that it accepts multiple input files, or to write a separate process that merges these files together.
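A separate merge step could look roughly like the sketch below. It is plain Java working on a local directory (for example, after copying the job output out of HDFS), not Hadoop API code. The `even-r-00000` style file names, the `merge` and `demo` helpers, and the `even.txt` target are assumptions for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Concatenates all per-reducer files that share a prefix into one file.
class MergePartFiles {

    static void merge(Path dir, String prefix, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        // Collect files such as even-r-00000, even-r-00001, ...
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, prefix + "-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        Collections.sort(parts); // keep reducer order stable
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path p : parts) {
                Files.copy(p, out); // append this part to the merged file
            }
        }
    }

    // Small self-contained demo on a temporary directory.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("job-output");
            Files.write(dir.resolve("even-r-00000"), "apple\t2\n".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("even-r-00001"), "cherry\t4\n".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("odd-r-00000"), "banana\t3\n".getBytes(StandardCharsets.UTF_8));
            Path merged = dir.resolve("even.txt");
            merge(dir, "even", merged);
            return new String(Files.readAllBytes(merged), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo()); // the two "even" parts, in reducer order
    }
}
```

This keeps the job itself parallel and pays the cost of producing a single file only once, after the reducers have finished.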


Multiple output files are generated, based on the number of reducers.

You can use hadoop dfs -getmerge to merge the outputs.


I wrote a class for doing this. Just use it in your job:

<pre><code>job.setOutputFormatClass(m_customOutputFormatClass);
</code></pre>

This is my class:

<pre><code>import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/**
 * TextOutputFormat extension which enables writing the mapper/reducer's output in multiple files.&lt;br&gt;
 * &lt;p&gt;
 * &lt;b&gt;WARNING&lt;/b&gt;: The number of different folders shouldn't be large for one mapper since we keep an
 * {@link RecordWriter} instance per folder name.
 * &lt;/p&gt;
 * &lt;p&gt;
 * In this class the folder name is defined by the written entry's key.&lt;br&gt;
 * To change this behavior simply extend this class and override the
 * {@link HdMultipleFileOutputFormat#getFolderNameExtractor()} method and create your own
 * {@link FolderNameExtractor} implementation.
 * &lt;/p&gt;
 * 
 * 
 * @author ykesten
 * 
 * @param &lt;K&gt; - Keys type
 * @param &lt;V&gt; - Values type
 */
public class HdMultipleFileOutputFormat&lt;K, V&gt; extends TextOutputFormat&lt;K, V&gt; {

    private String folderName;

    private class MultipleFilesRecordWriter extends RecordWriter&lt;K, V&gt; {

        private Map&lt;String, RecordWriter&lt;K, V&gt;&gt; fileNameToWriter;
        private FolderNameExtractor&lt;K, V&gt; fileNameExtractor;
        private TaskAttemptContext job;

        public MultipleFilesRecordWriter(FolderNameExtractor&lt;K, V&gt; fileNameExtractor, TaskAttemptContext job) {
            fileNameToWriter = new HashMap&lt;String, RecordWriter&lt;K, V&gt;&gt;();
            this.fileNameExtractor = fileNameExtractor;
            this.job = job;
        }

        @Override
        public void write(K key, V value) throws IOException, InterruptedException {
            String fileName = fileNameExtractor.extractFolderName(key, value);
            RecordWriter&lt;K, V&gt; writer = fileNameToWriter.get(fileName);
            if (writer == null) {
                writer = createNewWriter(fileName, fileNameToWriter, job);
                if (writer == null) {
                    throw new IOException("Unable to create writer for path: " + fileName);
                }
            }
            writer.write(key, value);
        }

        @Override
        public void close(TaskAttemptContext context) throws IOException, InterruptedException {
            for (Entry&lt;String, RecordWriter&lt;K, V&gt;&gt; entry : fileNameToWriter.entrySet()) {
                entry.getValue().close(context);
            }
        }

    }

    private synchronized RecordWriter&lt;K, V&gt; createNewWriter(String folderName,
            Map&lt;String, RecordWriter&lt;K, V&gt;&gt; fileNameToWriter, TaskAttemptContext job) {
        try {
            this.folderName = folderName;
            RecordWriter&lt;K, V&gt; writer = super.getRecordWriter(job);
            this.folderName = null;
            fileNameToWriter.put(folderName, writer);
            return writer;
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    }

    @Override
    public Path getDefaultWorkFile(TaskAttemptContext context, String extension) throws IOException {
        Path path = super.getDefaultWorkFile(context, extension);
        if (folderName != null) {
            String newPath = path.getParent().toString() + "/" + folderName + "/" + path.getName();
            path = new Path(newPath);
        }
        return path;
    }

    @Override
    public RecordWriter&lt;K, V&gt; getRecordWriter(TaskAttemptContext job) throws IOException, InterruptedException {
        return new MultipleFilesRecordWriter(getFolderNameExtractor(), job);
    }

    public FolderNameExtractor&lt;K, V&gt; getFolderNameExtractor() {
        return new KeyFolderNameExtractor&lt;K, V&gt;();
    }

    public interface FolderNameExtractor&lt;K, V&gt; {
        public String extractFolderName(K key, V value);
    }

    private static class KeyFolderNameExtractor&lt;K, V&gt; implements FolderNameExtractor&lt;K, V&gt; {
        public String extractFolderName(K key, V value) {
            return key.toString();
        }
    }
}
</code></pre>
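The core idea of the class above, one lazily created writer per folder name, can be sketched without Hadoop as follows. This is only an illustration of the pattern and of the warning in the Javadoc. `PerFolderSinks`, its methods, and the in-memory `StringBuilder` sinks (standing in for real `RecordWriter` instances) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// One sink per folder name, created on first use and kept until the end.
class PerFolderSinks {

    private final Map<String, StringBuilder> sinks = new HashMap<>();
    private final BiFunction<String, Integer, String> folderNameExtractor;

    PerFolderSinks(BiFunction<String, Integer, String> folderNameExtractor) {
        this.folderNameExtractor = folderNameExtractor;
    }

    void write(String key, int value) {
        String folder = folderNameExtractor.apply(key, value);
        // Create the sink lazily; later records for the same folder reuse it.
        sinks.computeIfAbsent(folder, f -> new StringBuilder())
             .append(key).append('\t').append(value).append('\n');
    }

    int sinkCount() {
        return sinks.size();
    }

    String contents(String folder) {
        StringBuilder sb = sinks.get(folder);
        return sb == null ? "" : sb.toString();
    }

    public static void main(String[] args) {
        String[] words = {"apple", "banana", "cherry", "date"};
        int[] counts = {1, 2, 3, 4};

        // Folder taken from the key (the default behavior described above).
        PerFolderSinks byKey = new PerFolderSinks((k, v) -> k);
        // Folder taken from the parity of the value (an odd/even split).
        PerFolderSinks byParity = new PerFolderSinks((k, v) -> v % 2 == 0 ? "even" : "odd");

        for (int i = 0; i < words.length; i++) {
            byKey.write(words[i], counts[i]);
            byParity.write(words[i], counts[i]);
        }
        System.out.println(byKey.sinkCount());    // 4: one sink per distinct word
        System.out.println(byParity.sinkCount()); // 2: "odd" and "even"
    }
}
```

For a word count, keying folders by the word itself opens one writer per distinct word, which is the situation the warning describes; an extractor that returns only "odd" or "even" keeps the number of open writers at two per task.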

I'm a newbie in Hadoop. I'm trying out the WordCount program.

Now, to try out multiple output files, I use <code>MultipleOutputFormat</code>. This link helped me in doing it: <a href="http://hadoop.apache.org/common/docs/r0.19.0/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html" rel="noreferrer">http://hadoop.apache.org/common/docs/r0.19.0/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html</a>

In my driver class I had:

<pre><code>MultipleOutputs.addNamedOutput(conf, "even",
        org.apache.hadoop.mapred.TextOutputFormat.class, Text.class,
        IntWritable.class);

MultipleOutputs.addNamedOutput(conf, "odd",
        org.apache.hadoop.mapred.TextOutputFormat.class, Text.class,
        IntWritable.class);
</code></pre>

and my reduce class became this:

<pre><code>public static class Reduce extends MapReduceBase implements
        Reducer&lt;Text, IntWritable, Text, IntWritable&gt; {
    MultipleOutputs mos = null;

    public void configure(JobConf job) {
        mos = new MultipleOutputs(job);
    }

    public void reduce(Text key, Iterator&lt;IntWritable&gt; values,
            OutputCollector&lt;Text, IntWritable&gt; output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        if (sum % 2 == 0) {
            mos.getCollector("even", reporter).collect(key, new IntWritable(sum));
        } else {
            mos.getCollector("odd", reporter).collect(key, new IntWritable(sum));
        }
        //output.collect(key, new IntWritable(sum));
    }

    @Override
    public void close() throws IOException {
        // TODO Auto-generated method stub
        mos.close();
    }
}
</code></pre>

Things worked, but I get a LOT of files (one odd and one even for every map-reduce).

Question is: how can I have just 2 output files (odd &amp; even), so that every odd output of every map-reduce gets written into that odd file, and the same for even?

MultipleOutputFormat in hadoop
