
Big Data: Hadoop Streaming Programming in Practice with C++, PHP, and Python

Source: 企鹅号 (Penguin Account) - 扣丁学堂

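The Streaming framework lets programs written in any language act as the mapper and reducer of a Hadoop MapReduce job. Hadoop runs each one as an ordinary executable, feeds it records on standard input, and reads its results from standard output, by default one tab-separated key/value pair per line. This makes it easy to port existing programs to the Hadoop platform and greatly extends Hadoop's reach. Below we implement Hadoop WordCount in C++, PHP, and Python in turn.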
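To make this line-oriented contract concrete, here is a minimal Python sketch of how a Streaming output line is split into a key and a value under the default settings (tab separator, key ends at the first tab). The function name split_streaming_record is hypothetical; where Hadoop treats a line without a tab as a key with a null value, the sketch returns an empty string instead.

```python
def split_streaming_record(line, separator="\t"):
    """Split one line of Streaming text into (key, value).

    Sketch of the default Streaming convention: the key is everything before
    the first separator and the value is the rest of the line. A line with no
    separator becomes the whole key with an empty value (Hadoop uses null).
    """
    key, _, value = line.rstrip("\n").partition(separator)
    return key, value


if __name__ == "__main__":
    for sample in ["hadoop\t1\n", "hello\tworld\tagain\n", "lonely\n"]:
        print(repr(split_streaming_record(sample)))
```

Only the first tab matters: any further tabs stay inside the value.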

Hands-on 1: WordCount in C++

Code:

1) The WordCount Mapper in C++, saved as mapper.cpp. Full code:

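#include <iostream>
#include <string>

using namespace std;

// Mapper: read whitespace-separated words from standard input and
// write one "word<TAB>1" line per word to standard output.
int main() {
    string key;
    string value = "1";
    while (cin >> key) {
        cout << key << "\t" << value << endl;
    }
    return 0;
}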

2) The WordCount Reducer in C++, saved as reducer.cpp. Full code:

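#include <iostream>
#include <map>
#include <string>

using namespace std;

// Reducer: read "word<TAB>count" lines from standard input, sum the counts
// per word, and print "word<TAB>total" for every word.
int main() {
    string key;
    int value;
    map<string, int> word2count;  // std::map keeps the words sorted
    map<string, int>::iterator it;

    // operator>> skips whitespace, so the tab between word and count is consumed
    while (cin >> key >> value) {
        word2count[key] += value;  // operator[] starts a new word at 0
    }

    for (it = word2count.begin(); it != word2count.end(); ++it) {
        cout << it->first << "\t" << it->second << endl;
    }
    return 0;
}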
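The reducer above keeps every distinct word in a std::map until its input ends. Because Streaming hands the reducer its input already sorted by key, a reducer can instead emit each total as soon as the key changes, holding only one group in memory. The following is a minimal Python sketch of that alternative design using itertools.groupby; it is a swapped-in technique rather than the article's method, and the function names are hypothetical.

```python
from itertools import groupby


def parse_records(lines):
    """Yield (word, count) pairs from 'word<TAB>count' lines, skipping bad ones."""
    for line in lines:
        word, _, count = line.rstrip("\n").partition("\t")
        try:
            n = int(count)
        except ValueError:
            continue  # no tab, or a count that is not an integer
        yield word, n


def reduce_sorted(lines):
    """Sum counts per word, assuming the input is already sorted by word.

    groupby() only merges adjacent equal keys, so this relies on the sort
    that Hadoop (or `sort` in a local test) performs before the reducer runs.
    Only the current word's group is held in memory at any time.
    """
    for word, group in groupby(parse_records(lines), key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    sample = ["hadoop\t1\n", "hello\t1\n", "hello\t1\n", "streaming\t1\n"]
    for word, total in reduce_sorted(sample):
        print("%s\t%d" % (word, total))
```

To use it as a real Streaming reducer, iterate over reduce_sorted(sys.stdin) instead of the sample list and print each pair as word<TAB>total.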

Steps to test the C++ WordCount

1) Install the C++ compiler

If g++ is not installed on your Linux machine, install it from the package repository:

yum -y install gcc-c++

2) Compile the C++ sources into executables

Compile the C++ programs into executables with the following commands before running them:

g++ -o mapper mapper.cpp
g++ -o reducer reducer.cpp

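3) Local test

Before running the C++ WordCount on the cluster, first run it locally on Linux and debug it until the output is correct, so you can be confident it will also run correctly on the cluster. The test command is:

cat djt.txt | ./mapper | sort | ./reducer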
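This pipe mimics the job on a single machine: the mapper's output is sorted so that identical words end up adjacent, much as Hadoop's shuffle groups records by key before the reducer sees them. Here is a minimal in-process Python sketch of the same three stages, assuming whitespace tokenization as in mapper.cpp and a dict-based reducer like the ones in this article; all function names are hypothetical.

```python
def mapper(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token (like mapper.cpp)."""
    for line in lines:
        for word in line.split():
            yield word + "\t1"


def reducer(records):
    """Sum the counts per word with a dict and return the totals sorted by word."""
    counts = {}
    for record in records:
        word, count = record.split("\t", 1)
        counts[word] = counts.get(word, 0) + int(count)
    return ["%s\t%d" % (word, counts[word]) for word in sorted(counts)]


def local_pipeline(lines):
    """In-process stand-in for: cat djt.txt | ./mapper | sort | ./reducer"""
    shuffled = sorted(mapper(lines))  # plays the role of the `sort` step
    return reducer(shuffled)


if __name__ == "__main__":
    sample = ["hello hadoop\n", "hello streaming\n"]
    for record in local_pipeline(sample):
        print(record)
```

A dict-based reducer does not actually need sorted input; the sort matters for reducers that compare each key with the previous one.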

4) Run on the cluster

Switch to the Hadoop installation directory and submit the C++ WordCount job to count the words:

hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
    -mapper "./mapper" \
    -reducer "./reducer" \
    -file mapper \
    -file reducer \
    -input /dajiangtai/djt.txt \
    -output /dajiangtai/out
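Since the three jobs in this article differ only in the mapper, the reducer, and the shipped files, the submission command can also be assembled programmatically. A minimal sketch with a hypothetical helper build_streaming_cmd; it uses only the flags shown above. Newer Hadoop releases print a warning that -file is deprecated in favor of the generic -files option.

```python
import shlex


def build_streaming_cmd(jar, mapper, reducer, files, input_path, output_path):
    """Assemble the argument list of a `hadoop jar` Streaming submission.

    Every path in `files` is passed with -file so Hadoop ships it to the
    task nodes, matching the commands in this article.
    """
    cmd = ["hadoop", "jar", jar, "-mapper", mapper, "-reducer", reducer]
    for path in files:
        cmd += ["-file", path]
    cmd += ["-input", input_path, "-output", output_path]
    return cmd


if __name__ == "__main__":
    jar = "/usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar"
    cmd = build_streaming_cmd(jar, "./mapper", "./reducer",
                              ["mapper", "reducer"],
                              "/dajiangtai/djt.txt", "/dajiangtai/out")
    # Print a shell-safe rendering of the command for inspection
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually submit the job, pass the list to subprocess.run(cmd, check=True); each argument is handed over intact, so "python Mapper.py" needs no extra shell quoting.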

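If the job finishes successfully and /dajiangtai/out contains part-* files listing each word with its count, the C++ WordCount works. Note that Hadoop refuses to start a job whose output directory already exists; because all three examples write to /dajiangtai/out, delete it (hadoop fs -rm -r /dajiangtai/out) before rerunning a job or submitting the next one.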
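To check the result more rigorously than by eye, one option is to fetch the job output (for example with hadoop fs -cat /dajiangtai/out/part-*) and compare it with a direct count of the input file. A minimal Python sketch, assuming whitespace tokenization as in the C++ and Python mappers; the PHP mapper splits on non-word characters, so its output would need a matching reference. Function names are hypothetical.

```python
from collections import Counter


def parse_wordcount_output(lines):
    """Turn 'word<TAB>count' lines (e.g. the contents of part-*) into a dict."""
    result = {}
    for line in lines:
        word, _, count = line.rstrip("\n").partition("\t")
        if word and count:
            result[word] = result.get(word, 0) + int(count)
    return result


def matches_reference(output_lines, input_lines):
    """True if the job output equals a direct whitespace-based word count of the input."""
    expected = Counter(word for line in input_lines for word in line.split())
    return parse_wordcount_output(output_lines) == dict(expected)
```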

Hands-on 2: WordCount in PHP

Code:

1) The WordCount Mapper in PHP, saved as wc_mapper.php. Full code:

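#!/usr/bin/php
<?php
// Mapper: split each input line into words and emit "word<TAB>1" per word.
error_reporting(E_ALL ^ E_NOTICE);

while (($line = fgets(STDIN)) !== false) {
    $line = trim($line);
    // \W matches any non-word character; PREG_SPLIT_NO_EMPTY drops empty tokens
    $words = preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        echo $word, chr(9), "1", PHP_EOL;
    }
}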
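Note that this mapper tokenizes differently from the C++ and Python versions: it splits on non-word characters, so punctuation is stripped, whereas cin >> key and str.split() split only on whitespace. The same input file can therefore yield different counts. A small Python sketch contrasting the two rules; the function names are hypothetical, and Python's \W is Unicode-aware, so it only approximates PHP's byte-oriented /\W/ (which also breaks multi-byte UTF-8 characters apart).

```python
import re


def tokens_whitespace(line):
    """Split on runs of whitespace, like `cin >> key` in C++ or str.split() in Python."""
    return line.split()


def tokens_nonword(line):
    """Split on runs of non-word characters and drop empty tokens."""
    # Roughly equivalent to PHP: preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY)
    return [token for token in re.split(r"\W+", line) if token]


if __name__ == "__main__":
    line = "Hadoop, Hadoop-Streaming!"
    print(tokens_whitespace(line))
    print(tokens_nonword(line))
```

For "Hadoop, Hadoop-Streaming!" the whitespace rule keeps "Hadoop," and "Hadoop-Streaming!" as two distinct words, while the non-word rule counts "Hadoop" twice and "Streaming" once.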

2) The WordCount Reducer in PHP, saved as wc_reducer.php. Full code:

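#!/usr/bin/php
<?php
// Reducer: read "word<TAB>count" lines, sum the counts per word, print the totals.
error_reporting(E_ALL ^ E_NOTICE);

$word2count = array();

while (($line = fgets(STDIN)) !== false) {
    $line = trim($line);
    $parts = explode(chr(9), $line);
    if (count($parts) < 2) {
        continue;  // skip malformed lines that contain no tab
    }
    list($word, $count) = $parts;
    if (!isset($word2count[$word])) {
        $word2count[$word] = 0;  // avoid undefined-key warnings on first sight
    }
    $word2count[$word] += intval($count);
}

foreach ($word2count as $word => $count) {
    echo $word, chr(9), $count, PHP_EOL;
}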

Steps to test the PHP WordCount

1) Install PHP

If PHP is not installed on your Linux machine, install it from the package repository:

yum -y install php

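2) Local test

Before running the PHP WordCount on the cluster, first run it locally on Linux and debug it until the output is correct, so you can be confident it will also run correctly on the cluster. The test command is:

cat djt.txt | php wc_mapper.php | sort | php wc_reducer.php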

3) Run on the cluster

Switch to the Hadoop installation directory and submit the PHP WordCount job to count the words:

hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
    -mapper "php wc_mapper.php" \
    -reducer "php wc_reducer.php" \
    -file wc_mapper.php \
    -file wc_reducer.php \
    -input /dajiangtai/djt.txt \
    -output /dajiangtai/out

If the job finishes successfully and /dajiangtai/out contains the expected word counts, the PHP WordCount works.

Hands-on 3: WordCount in Python

Code:

1) The WordCount Mapper in Python, saved as Mapper.py. Full code:

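#!/usr/bin/env python
# Mapper: emit "word<TAB>1" for every whitespace-separated word on stdin.
import sys

for line in sys.stdin:
    # str.split() with no argument splits on runs of whitespace and drops empty strings
    for word in line.strip().split():
        print('%s\t%s' % (word, 1))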

2) The WordCount Reducer in Python, saved as Reducer.py. Full code:

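#!/usr/bin/env python
# Reducer: read "word<TAB>count" lines, sum the counts per word, and print
# the totals sorted by word.
from operator import itemgetter
import sys

word2count = {}

for line in sys.stdin:
    line = line.strip()
    try:
        word, count = line.split('\t', 1)
        count = int(count)
    except ValueError:
        continue  # skip malformed lines (no tab, or a non-integer count)
    word2count[word] = word2count.get(word, 0) + count

sorted_word2count = sorted(word2count.items(), key=itemgetter(0))

for word, count in sorted_word2count:
    print('%s\t%s' % (word, count))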

Steps to test the Python WordCount

1) Install Python

If Python is not installed on your Linux machine, install it from the package repository:

yum -y install python27

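2) Local test

Before running the Python WordCount on the cluster, first run it locally on Linux and debug it until the output is correct, so you can be confident it will also run correctly on the cluster. The test command is:

cat djt.txt | python Mapper.py | sort | python Reducer.py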

3) Run on the cluster

Switch to the Hadoop installation directory and submit the Python WordCount job to count the words:

hadoop jar /usr/java/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar \
    -mapper "python Mapper.py" \
    -reducer "python Reducer.py" \
    -file Mapper.py \
    -file Reducer.py \
    -input /dajiangtai/djt.txt \
    -output /dajiangtai/out

If the job finishes successfully and /dajiangtai/out contains the expected word counts, the Python WordCount works.

Published: 2018-03-05 15:45:59
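Original link: http://kuaibao.qq.com/s/20180305A0SYK300?refer=cp_1026
The Tencent Cloud Developer Community (腾讯云开发者社区) is one of the distribution channels for Tencent Content Open Platform (企鹅号) accounts and reposts this content under the Tencent Content Open Platform Service Agreement.
For infringement concerns, contact cloudcommunity@tencent.com to have it removed.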
