
Detailed Steps to Set Up Hadoop on Linux


Prerequisites: a virtual machine with Linux installed, working network connectivity between the host and the VM, and the JDK installed on the Linux guest.

1. On Linux, run vi /etc/profile and add HADOOP_HOME:

export JAVA_HOME=/home/hadoop/export/jdk
export HADOOP_HOME=/home/hadoop/export/hadoop
export PATH=.:$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
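
The changes only take effect once the profile is reloaded. A quick way to apply and sanity-check them (assuming the paths above match your actual install locations):

source /etc/profile
echo $HADOOP_HOME    # should print /home/hadoop/export/hadoop
java -version        # confirms the JDK is on the PATH
hadoop version       # confirms the Hadoop binaries are on the PATH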

2. In the hadoop/conf directory, edit line 9 of hadoop-env.sh so that JAVA_HOME is set explicitly:

export JAVA_HOME=/home/hadoop/export/jdk

3. In the hadoop/conf directory, edit core-site.xml:

<configuration>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/.../tmp</value>
        </property>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://127.0.0.1:9000</value>
        </property>
</configuration>
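
Hadoop does not always create hadoop.tmp.dir for you, so it is worth creating it up front with the right owner. A minimal sketch, assuming you chose /home/hadoop/tmp as the value above (substitute your own path):

mkdir -p /home/hadoop/tmp
chown -R hadoop:hadoop /home/hadoop/tmp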

4. In the hadoop/conf directory, edit hdfs-site.xml:

<configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
</configuration>

5. In the hadoop/conf directory, edit mapred-site.xml:

<configuration>
        <property>
                <name>mapred.job.tracker</name>
                <value>127.0.0.1:9001</value>
        </property>
</configuration>
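
Before formatting, it can save a round trip to confirm that the three edited XML files are well-formed. A quick sketch, assuming xmllint (from libxml2) is installed and you are in the Hadoop home directory:

xmllint --noout conf/core-site.xml conf/hdfs-site.xml conf/mapred-site.xml
# no output means all three files parse cleanly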

That completes the configuration. Change to hadoop/bin and run hadoop namenode -format. Output like the following indicates success:

Warning: $HADOOP_HOME is deprecated.

14/07/15 16:06:27 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.7.0_55
************************************************************/

14/07/15 16:07:09 INFO util.GSet: Computing capacity for map BlocksMap
14/07/15 16:07:09 INFO util.GSet: VM type       = 32-bit
14/07/15 16:07:09 INFO util.GSet: 2.0% max memory = 1013645312
14/07/15 16:07:09 INFO util.GSet: capacity      = 2^22 = 4194304 entries
14/07/15 16:07:09 INFO util.GSet: recommended=4194304, actual=4194304
14/07/15 16:07:10 INFO namenode.FSNamesystem: fsOwner=hadoop
14/07/15 16:07:10 INFO namenode.FSNamesystem: supergroup=supergroup
14/07/15 16:07:10 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/07/15 16:07:10 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
14/07/15 16:07:10 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/07/15 16:07:10 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
14/07/15 16:07:10 INFO namenode.NameNode: Caching file names occuring more than 10 times 
14/07/15 16:07:10 INFO common.Storage: Image file /home/hadoop/tmp/dfs/name/current/fsimage of size 118 bytes saved in 0 seconds.
14/07/15 16:07:10 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/hadoop/tmp/dfs/name/current/edits
14/07/15 16:07:10 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/hadoop/tmp/dfs/name/current/edits
14/07/15 16:07:10 INFO common.Storage: Storage directory /home/hadoop/tmp/dfs/name has been successfully formatted.
14/07/15 16:07:10 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/

Some people will hit a failure at this step. Be sure to check the output under hadoop/logs; the exceptions recorded there are very detailed. If the first format attempt fails, remember to delete everything under the tmp directory before retrying, because stale data left there can be incompatible with the newly formatted filesystem. Then run start-all.sh:

Warning: $HADOOP_HOME is deprecated.


starting namenode, logging to /home/hadoop/export/hadoop/libexec/../logs/hadoop-hadoop-namenode-ubuntu.out
localhost: starting datanode, logging to /home/hadoop/export/hadoop/libexec/../logs/hadoop-hadoop-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /home/hadoop/export/hadoop/libexec/../logs/hadoop-hadoop-secondarynamenode-ubuntu.out
starting jobtracker, logging to /home/hadoop/export/hadoop/libexec/../logs/hadoop-hadoop-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /home/hadoop/export/hadoop/libexec/../logs/hadoop-hadoop-tasktracker-ubuntu.out

During startup you may be prompted for a password; you can avoid this by setting up passwordless SSH login (covered in another post on my blog; a sketch also follows below). Then run jps. The output looks like the following (one DataNode is missing, because I deliberately introduced an error here):

10666 NameNode
11547 Jps
11445 TaskTracker
11130 SecondaryNameNode
11218 JobTracker
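A minimal passwordless-SSH sketch for the local machine (assuming OpenSSH is installed and you are running as the hadoop user):

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa        # key pair with an empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost                                   # should log in without a password prompt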

Check the logs:

2014-07-15 16:13:43,032 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-07-15 16:13:43,094 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2014-07-15 16:13:43,098 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2014-07-15 16:13:43,118 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2014-07-15 16:13:43,999 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2014-07-15 16:13:44,044 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2014-07-15 16:13:45,484 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /home/hadoop/tmp/dfs/data: namenode namespaceID = 224603228; datanode namespaceID = 566757162
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
	at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:414)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:321)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1712)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1651)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1812)

The key line is the Incompatible namespaceIDs error: re-formatting the NameNode generated a new namespaceID, but the DataNode still holds data stamped with the old one. Simply delete the files under tmp and the problem is solved.
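
A hedged recovery sketch, assuming hadoop.tmp.dir is /home/hadoop/tmp as in the logs above (note this wipes all HDFS data, which is acceptable on a fresh test setup):

stop-all.sh
rm -rf /home/hadoop/tmp/*
hadoop namenode -format
start-all.sh
jps    # the DataNode should now be listed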

Now you can run an example. The concrete steps are as follows:

hadoop@ubuntu:~/export/hadoop$ ls
bin          hadoop-ant-1.2.1.jar          ivy          README.txt
build.xml    hadoop-client-1.2.1.jar       ivy.xml      sbin
c++          hadoop-core-1.2.1.jar         lib          share
CHANGES.txt  hadoop-examples-1.2.1.jar     libexec      src
conf         hadoop-minicluster-1.2.1.jar  LICENSE.txt  webapps
contrib      hadoop-test-1.2.1.jar         logs
docs         hadoop-tools-1.2.1.jar        NOTICE.txt

Upload a file to HDFS:

 hadoop@ubuntu:~/export/hadoop$ hadoop fs -put README.txt  /
Warning: $HADOOP_HOME is deprecated.

The absence of errors above means the upload succeeded. Now run the wordcount example program against README.txt:

hadoop@ubuntu:~/export/hadoop$ hadoop jar hadoop-examples-1.2.1.jar wordcount /README.txt /wordcountoutput
Warning: $HADOOP_HOME is deprecated.

14/07/15 15:23:01 INFO input.FileInputFormat: Total input paths to process : 1
14/07/15 15:23:01 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/07/15 15:23:01 WARN snappy.LoadSnappy: Snappy native library not loaded
14/07/15 15:23:02 INFO mapred.JobClient: Running job: job_201407141636_0001
14/07/15 15:23:03 INFO mapred.JobClient:  map 0% reduce 0%
14/07/15 15:23:15 INFO mapred.JobClient:  map 100% reduce 0%
14/07/15 15:23:30 INFO mapred.JobClient:  map 100% reduce 100%
14/07/15 15:23:32 INFO mapred.JobClient: Job complete: job_201407141636_0001
14/07/15 15:23:32 INFO mapred.JobClient: Counters: 29
14/07/15 15:23:32 INFO mapred.JobClient:   Job Counters 
14/07/15 15:23:32 INFO mapred.JobClient:     Launched reduce tasks=1
14/07/15 15:23:32 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=12563
14/07/15 15:23:32 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/07/15 15:23:32 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/07/15 15:23:32 INFO mapred.JobClient:     Launched map tasks=1
14/07/15 15:23:32 INFO mapred.JobClient:     Data-local map tasks=1
14/07/15 15:23:32 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14550
14/07/15 15:23:32 INFO mapred.JobClient:   File Output Format Counters 
14/07/15 15:23:32 INFO mapred.JobClient:     Bytes Written=1306
14/07/15 15:23:32 INFO mapred.JobClient:   FileSystemCounters
14/07/15 15:23:32 INFO mapred.JobClient:     FILE_BYTES_READ=1836
14/07/15 15:23:32 INFO mapred.JobClient:     HDFS_BYTES_READ=1463
14/07/15 15:23:32 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=120839
14/07/15 15:23:32 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1306
14/07/15 15:23:32 INFO mapred.JobClient:   File Input Format Counters 
14/07/15 15:23:32 INFO mapred.JobClient:     Bytes Read=1366
14/07/15 15:23:32 INFO mapred.JobClient:   Map-Reduce Framework
14/07/15 15:23:32 INFO mapred.JobClient:     Map output materialized bytes=1836
14/07/15 15:23:32 INFO mapred.JobClient:     Map input records=31
14/07/15 15:23:32 INFO mapred.JobClient:     Reduce shuffle bytes=1836
14/07/15 15:23:32 INFO mapred.JobClient:     Spilled Records=262
14/07/15 15:23:32 INFO mapred.JobClient:     Map output bytes=2055
14/07/15 15:23:32 INFO mapred.JobClient:     Total committed heap usage (bytes)=212611072
14/07/15 15:23:32 INFO mapred.JobClient:     CPU time spent (ms)=2430
14/07/15 15:23:32 INFO mapred.JobClient:     Combine input records=179
14/07/15 15:23:32 INFO mapred.JobClient:     SPLIT_RAW_BYTES=97
14/07/15 15:23:32 INFO mapred.JobClient:     Reduce input records=131
14/07/15 15:23:32 INFO mapred.JobClient:     Reduce input groups=131
14/07/15 15:23:32 INFO mapred.JobClient:     Combine output records=131
14/07/15 15:23:32 INFO mapred.JobClient:     Physical memory (bytes) snapshot=177545216
14/07/15 15:23:32 INFO mapred.JobClient:     Reduce output records=131
14/07/15 15:23:32 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=695681024
14/07/15 15:23:32 INFO mapred.JobClient:     Map output records=179
hadoop@ubuntu:~/export/hadoop$ hadoop fs -ls /
Warning: $HADOOP_HOME is deprecated.

Found 3 items
-rw-r--r--   1 hadoop supergroup       1366 2014-07-15 15:21 /README.txt
drwxr-xr-x   - hadoop supergroup          0 2014-07-14 16:36 /home
drwxr-xr-x   - hadoop supergroup          0 2014-07-15 15:23 /wordcountoutput
hadoop@ubuntu:~/export/hadoop$ hadoop fs -get  /wordcountoutput  /home/hadoop/
Warning: $HADOOP_HOME is deprecated.
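
Alternatively, you can view the result directly in HDFS without downloading it. A quick sketch, assuming the reducer output file carries the default name part-r-00000 (list /wordcountoutput first if the name differs on your version):

hadoop fs -cat /wordcountoutput/part-r-00000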
Open the downloaded file and you will see output like the following:

(see	1
5D002.C.1,	1
740.13)	1
<http://www.wassenaar.org/>	1
Administration	1
Apache	1
BEFORE	1
BIS	1
Bureau	1
Commerce,	1
Commodity	1
Control	1
Core	1
Department	1
ENC	1
Exception	1
Export	2
For	1
Foundation	1
Government	1
Hadoop	1
Hadoop,	1
Industry	1
Jetty	1
License	1
Number	1
Regulations,	1
SSL	1
Section	1
Security	1
See	1
Software	2
Technology	1
The	4
This	1
U.S.	1
Unrestricted	1
about	1
algorithms.	1
and	6
and/or	1
another	1
any	1
as	1
asymmetric	1
at:	2
both	1
by	1
check	1
classified	1
code	1
code.	1
concerning	1
country	1
country's	1
country,	1
cryptographic	3
currently	1
details	1
distribution	2
eligible	1
encryption	3
exception	1
export	1
following	1
for	3
form	1
from	1
functions	1
has	1
have	1