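mrjob lets you write Hadoop MapReduce jobs in Python. An mrjob program can be tested locally or deployed to run on a Hadoop cluster.
(1) First, install the mrjob library in your Python virtual environment:
pip install mrjob
When it finishes, run pip list to check that the installation succeeded.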
(2) Write the Python file:
from mrjob.job import MRJob


class MRJobCount(MRJob):
    # Called once per input line; the key is unused for raw text input.
    def mapper(self, key, line):
        yield "chars_number", len(line)
        yield "words_number", len(line.split())
        yield "lines_number", 1

    # Called once per key with all values emitted for that key.
    def reducer(self, key, values):
        yield key, sum(values)


if __name__ == '__main__':
    MRJobCount.run()
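To make the data flow easier to follow, here is a minimal plain-Python sketch of what the job above computes. It does not use mrjob at all: the function run_locally and the sample lines are made up for illustration, and the grouping step only imitates the shuffle that the framework performs between the mapper and the reducer.

```python
from collections import defaultdict


def mapper(line):
    # Same three pairs that the MRJobCount mapper yields for one line.
    yield "chars_number", len(line)
    yield "words_number", len(line.split())
    yield "lines_number", 1


def reducer(key, values):
    # Same aggregation as the MRJobCount reducer.
    yield key, sum(values)


def run_locally(lines):
    # Imitated shuffle step: collect every mapped value under its key.
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    # Reduce step: one reducer call per key, in sorted key order.
    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            results[out_key] = out_value
    return results


sample_lines = ["hello world", "mrjob runs on hadoop"]
print(run_locally(sample_lines))
# -> {'chars_number': 31, 'lines_number': 2, 'words_number': 6}
```

Each line contributes one value per key, so the reducer only ever has to sum a list of integers.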
(3) Write the test file:
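The test file is just a text file. As a rough sketch, the snippet below writes one and computes the totals to expect, so the job output can be cross-checked later. The file name input.txt and its contents are hypothetical, and the character total assumes that the mapper receives each line without its trailing newline.

```python
from pathlib import Path

# Hypothetical file name and contents, chosen only for illustration.
sample_text = "the quick brown fox\njumps over\nthe lazy dog\n"
input_path = Path("input.txt")
input_path.write_text(sample_text, encoding="utf-8")

# Totals to expect from the job, computed directly from the text.
# Assumption: line lengths are measured without the newline character.
lines = sample_text.splitlines()
expected = {
    "chars_number": sum(len(line) for line in lines),
    "words_number": sum(len(line.split()) for line in lines),
    "lines_number": len(lines),
}
print(expected)
# -> {'chars_number': 41, 'words_number': 9, 'lines_number': 3}
```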
(4) Run the command and check the result
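An mrjob script is normally launched by running it with Python and passing the input file; the -r option selects the runner (for example inline or local for testing, hadoop for a cluster). The sketch below wraps that call with subprocess. The names mr_count.py and input.txt are assumptions, as are the helper functions build_mrjob_command and run_job; substitute your own file names.

```python
import subprocess
import sys


def build_mrjob_command(script, input_path, runner="inline"):
    """Return the argv list that launches an mrjob script on one input file."""
    return [sys.executable, script, "-r", runner, input_path]


def run_job(script, input_path, runner="inline"):
    """Run the job and return what it printed to standard output."""
    result = subprocess.run(
        build_mrjob_command(script, input_path, runner),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the job fails
    )
    return result.stdout


# Hypothetical file names; running on Hadoop would use runner="hadoop".
command = build_mrjob_command("mr_count.py", "input.txt")
print(" ".join(command[1:]))
# -> mr_count.py -r inline input.txt
```

Calling run_job("mr_count.py", "input.txt") should then return one line per key, with the key and its total separated by a tab, assuming mrjob's default output format is left unchanged.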
The counts are produced successfully.