首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >os.environ['mapreduce_map_input_file']不工作

os.environ['mapreduce_map_input_file']不工作
EN

Stack Overflow用户
提问于 2015-03-06 23:03:51
回答 1查看 4.1K关注 0票数 5

我在Python中创建了一个简单的map reduce,只为了测试os.environ['mapreduce_map_input_file']调用,如下所示:

map.py

代码语言:javascript
运行
复制
#!/usr/bin/python
import sys

# input comes from STDIN (stream data that goes to the program)
for line in sys.stdin:

    l = line.strip().split()

    for word in l:

        # output goes to STDOUT (stream data that the program writes)
        print "%s\t%d" %( word, 1 )

reduce.py

代码语言:javascript
运行
复制
#!/usr/bin/python
import sys
import os

current_word = None
current_sum = 0

# input comes from STDIN (stream data that goes to the program)
for line in sys.stdin:

    word, count = line.strip().split("\t", 1)

    try:
        count = int(count)
        except ValueError:
        continue

    if word == current_word:
        current_sum += count
    else:
            if current_word:
            # output goes to STDOUT (stream data that the program writes)
            print "%s\t%d" %( current_word, current_sum )
            print (os.environ['mapreduce_map_input_file'])
        current_word = word
        current_sum = count

错误消息:

代码语言:javascript
运行
复制
Traceback (most recent call last):
  File "/Users/brunomacedo/Desktop/Big-Data/./reduce.py", line 25, in <module>
    print (os.environ['mapreduce_map_input_file'])
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'mapreduce_map_input_file'
15/03/06 17:50:26 INFO streaming.PipeMapRed: Records R/W=16127/1
15/03/06 17:50:26 INFO streaming.PipeMapRed: MRErrorThread done
15/03/06 17:50:26 WARN streaming.PipeMapRed: java.io.IOException: Stream closed
15/03/06 17:50:26 INFO streaming.PipeMapRed: PipeMapRed failed!
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:128)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
15/03/06 17:50:26 WARN streaming.PipeMapRed: java.io.IOException: Stream closed
15/03/06 17:50:26 INFO streaming.PipeMapRed: PipeMapRed failed!
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
15/03/06 17:50:26 INFO mapred.LocalJobRunner: reduce task executor complete.
15/03/06 17:50:26 WARN mapred.LocalJobRunner: job_local1265836882_0001
java.lang.Exception: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
15/03/06 17:50:27 INFO mapreduce.Job: Job job_local1265836882_0001 failed with state FAILED due to: NA
15/03/06 17:50:27 INFO mapreduce.Job: Counters: 35
    File System Counters
        FILE: Number of bytes read=181735210
        FILE: Number of bytes written=292351104
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=0
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=0
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Map-Reduce Framework
        Map input records=100
        Map output records=334328
        Map output bytes=2758691
        Map output materialized bytes=3427947
        Input split bytes=14100
        Combine input records=0
        Combine output records=0
        Reduce input groups=0
        Reduce shuffle bytes=3427947
        Reduce input records=0
        Reduce output records=0
        Spilled Records=334328
        Shuffled Maps =100
        Failed Shuffles=0
        Merged Map outputs=100
        GC time elapsed (ms)=1224
        Total committed heap usage (bytes)=49956257792
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=2090035
    File Output Format Counters
        Bytes Written=0
15/03/06 17:50:27 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!

命令我用来运行它:

代码语言:javascript
运行
复制
hadoop jar /usr/local/Cellar/hadoop/2.6.0/libexec/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar \
-D mapreduce.job.reduces=1 \
-file map.py \
-mapper map.py \
-file reduce.py \
-reducer reduce.py \
-input file:///Users/brunomacedo/Desktop/Big-Data/articles \
-output file:///Users/brunomacedo/Desktop/Big-Data/output

如果我把print (os.environ['mapreduce_map_input_file'])线取出来,它就会工作得很好。这一行的目的是打印单词计数的输入文件的名称。无论如何,我这样做只是为了测试这个命令,因为我需要在一个更复杂的项目中使用它。

有人能帮我一下这个电话有什么问题吗?非常感谢!

编辑:

我正在使用Hadoop2.6.0

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2015-03-06 23:55:16

当我在os.environ['mapreduce_map_input_file']文件中调用reduceos.environ['map_input_file']时,它没有工作。但是它可以在map文件中工作。这是有意义的,因为reducer无法知道您的mapper输出来自哪个输入文件(除非您直接从mapper发送信息)。

我无法直接运行os.environ['mapreduce_map_input_file']os.environ['map_input_file']。但是,在地图中使用这一点似乎是一贯有效的:

代码语言:javascript
运行
复制
try:
    input_file = os.environ['mapreduce_map_input_file']
except KeyError:
    input_file = os.environ['map_input_file']
票数 7
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/28909145

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档