
Setting Up a Simple Hadoop Cluster on Tencent Cloud Servers

Environment

This setup uses three machines: master (10.0.0.2), slave1 (10.0.0.3), and slave2 (10.0.0.4).

I. Initialize the Hadoop environment

1. Create the hadoop account (do this on all three machines)

useradd -d /data/hadoop -u 600 -g root hadoop

# set a password for the hadoop user

passwd hadoop

2. Change the hostname

  • Set the master's hostname to master; change the slaves to slave1 and slave2 respectively.

vi /etc/hostname

master

**Note: on slave1 write slave1 here, and on slave2 write slave2**

  • Edit the hosts configuration, then copy the same file to the other machines

vi /etc/hosts

10.0.0.2 master
10.0.0.3 slave1
10.0.0.4 slave2
127.0.0.1  localhost  localhost.localdomain

  • Edit the network configuration

vi /etc/sysconfig/network

# Created by anaconda
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=master

  • Reboot master, slave1, and slave2 so the configuration takes effect

3. Set up passwordless SSH login

  • Generate a key pair

Run ssh-keygen -t rsa and just press Enter at every prompt.

  • Copy the public key to slave1 and slave2 (see the check below)

Run ssh-copy-id -i ~/.ssh/id_rsa.pub slave1
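
Repeat the same command for slave2, then verify that both logins work without a password; a quick check:

ssh-copy-id -i ~/.ssh/id_rsa.pub slave2
# each command should print the slave's hostname without prompting for a password
ssh slave1 hostname
ssh slave2 hostname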

II. Install Java

1. Download the JDK from the Oracle website.

2. Extract the archive into the installation directory

tar -xzvf jdk-8u91-linux-x64.tar.gz

3. Set the Java environment variables

vi ~/.bash_profile

The .bash_profile file should look like this:

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs
PATH=$PATH:$HOME/.local/bin:$HOME/bin:/data/hadoop/hadoop-2.6.4/share/
export PATH

JAVA_HOME=/data/hadoop/jdk1.8.0_91
CLASSPATH=.:$JAVA_HOME/lib
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH

**Note:**

  1. CLASSPATH must begin with .: (the current directory); otherwise Java fails with a "Could not find or load main class" error.
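
Reload the profile and make sure the JDK is picked up; a quick sanity check:

source ~/.bash_profile
# should report java version "1.8.0_91"
java -version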

III. Install Hadoop

1. Download Hadoop. We use hadoop-2.6.4.tar.gz, which is a pre-built binary release, so it only needs to be extracted.
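
If you fetch it from the Apache archive, something like the following should work (the mirror URL is an assumption; verify it before use):

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz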

2. Extract the archive into the installation directory

tar -xzvf hadoop-2.6.4.tar.gz

3. Set the environment variables

vi ~/.bashrc

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=

# User specific aliases and functions
export HADOOP_PREFIX=$HOME/hadoop-2.6.4
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop

export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin

Run source ~/.bashrc to make the configuration take effect.
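
To confirm the Hadoop binaries are now on PATH, the following should print the release information:

hadoop version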

4. Edit the Hadoop configuration files

  • Enter the configuration directory

cd /data/hadoop/hadoop-2.6.4/etc/hadoop

  • Edit hadoop-env.sh and point JAVA_HOME at the JDK path (the relevant part of the file is shown below)

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.
export JAVA_HOME=/data/hadoop/jdk1.8.0_91

  • Register the slaves by listing them in the slaves file

#localhost
slave1
slave2

  • Edit core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp/hadoop-master</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
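
hadoop.tmp.dir must exist and be writable by the hadoop user, so it is safest to create it up front (the path matches the value above):

mkdir -p /data/hadoop/tmp/hadoop-master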

  • Edit hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/tmp/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/tmp/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///data/hadoop/tmp/hdfs/namesecondary</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
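
Likewise, pre-creating the local storage directories referenced above avoids permission surprises (datanode is used on the slaves; namenode and namesecondary on master):

mkdir -p /data/hadoop/tmp/hdfs/datanode
mkdir -p /data/hadoop/tmp/hdfs/namenode
mkdir -p /data/hadoop/tmp/hdfs/namesecondary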

**Note: dfs.replication is the number of replicas kept for each HDFS block, not the number of nodes. This cluster has two DataNodes (one per slave), so it is set to 2; a value larger than the DataNode count cannot be satisfied.**

  • Edit mapred-site.xml (if the file does not exist yet, copy mapred-site.xml.template to mapred-site.xml first)

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>/user</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
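
The JobHistory server configured above is not started by start-all.sh; in Hadoop 2.x it is launched separately on master (a sketch, assuming the sbin directory is on PATH as set in .bashrc):

mr-jobhistory-daemon.sh start historyserver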

  • Edit yarn-site.xml

<?xml version="1.0"?>

<configuration>

<!-- Site specific YARN configuration properties -->

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>

</configuration>

5. Pack up everything under the hadoop user's home directory and copy it to slave1 and slave2 (an example follows below).
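
One way to do this from master, assuming the passwordless SSH from step I.3 works and /data on the slaves is writable by the hadoop user (adjust ownership or run as root otherwise):

# run on master as the hadoop user
scp -r /data/hadoop slave1:/data/
scp -r /data/hadoop slave2:/data/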

IV. Start Hadoop and verify

1. Start Hadoop

Before the very first start, the NameNode has to be formatted on master; after that, start-all.sh is enough.
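
A sketch of the first-time startup, followed by a quick process check with jps:

# run once on master only -- formatting wipes any existing HDFS metadata
hdfs namenode -format
start-all.sh
# jps on master should show NameNode, SecondaryNameNode, and ResourceManager;
# jps on each slave should show DataNode and NodeManager
jps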

2. Check that the cluster is healthy

hdfs dfsadmin -report


[hadoop@master hadoop]$ hdfs dfsadmin -report
Configured Capacity: 20867301376 (19.43 GB)
Present Capacity: 16041099264 (14.94 GB)
DFS Remaining: 15645147136 (14.57 GB)
DFS Used: 395952128 (377.61 MB)
DFS Used%: 2.47%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 10.0.0.3:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 10433650688 (9.72 GB)
DFS Used: 197976064 (188.80 MB)
Non DFS Used: 2413105152 (2.25 GB)
DFS Remaining: 7822569472 (7.29 GB)
DFS Used%: 1.90%
DFS Remaining%: 74.97%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Jun 23 11:03:48 CST 2016

Name: 10.0.0.4:50010 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 10433650688 (9.72 GB)
DFS Used: 197976064 (188.80 MB)
Non DFS Used: 2413096960 (2.25 GB)
DFS Remaining: 7822577664 (7.29 GB)
DFS Used%: 1.90%
DFS Remaining%: 74.97%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Jun 23 11:03:49 CST 2016

**Note: the cluster has two DataNodes, so seeing both of them listed here means everything is working.**
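
You can also check the cluster from a browser; Hadoop 2.6's default web UIs should be reachable at the addresses below (assuming the security group rules allow access to these ports):

http://master:50070   (HDFS NameNode web UI)
http://master:8088    (YARN ResourceManager web UI)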
