Big Data Learning Path 01: Getting Hadoop Running on macOS

Author: 汪志宾
2019-05-15

This guide walks through installing a single-node Hadoop on macOS.

Preparation

Installation directory

WZB-MacBook:50_bigdata wangzhibin$ pwd
/Users/wangzhibin/00_dev_suite/50_bigdata

JDK

Installing a JDK on the Mac is not covered here; see "MAC下安装多版本JDK和切换几种方式" (installing multiple JDK versions on macOS and switching between them).

WZB-MacBook:50_bigdata wangzhibin$ java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
WZB-MacBook:50_bigdata wangzhibin$ echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home
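
If several JDKs are installed, macOS ships a /usr/libexec/java_home helper that resolves the home directory for a given version. A minimal sketch for pointing JAVA_HOME at the JDK 1.7 shown above (adjust the version filter to your own setup):

# Resolve the installed JDK 1.7 home and export it for the current shell.
export JAVA_HOME=$(/usr/libexec/java_home -v 1.7)
echo $JAVA_HOME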

Download Hadoop

brew install wget
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/core/hadoop-2.8.4/hadoop-2.8.4.tar.gz
WZB-MacBook:50_bigdata wangzhibin$ tar -zxvf hadoop-2.8.4.tar.gz

Install and configure Hadoop

Set the JDK path

WZB-MacBook:hadoop-2.8.4 wangzhibin$ vi etc/hadoop/hadoop-env.sh
Change export JAVA_HOME=${JAVA_HOME} to:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home

Verify Hadoop

WZB-MacBook:hadoop-2.8.4 wangzhibin$ bin/hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.

Run in standalone mode

  $ mkdir input
  $ cp etc/hadoop/*.xml input
  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar grep input output 'dfs[a-z.]+'
  $ cat output/*
  1	dfsadmin

Configure core-site.xml

WZB-MacBook:hadoop-2.8.4 wangzhibin$ mkdir -p hdfs/tmp
WZB-MacBook:hadoop-2.8.4 wangzhibin$ vi etc/hadoop/core-site.xml

Add the following configuration:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/hdfs/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Configure hdfs-site.xml

WZB-MacBook:hadoop-2.8.4 wangzhibin$ vi etc/hadoop/hdfs-site.xml

Add the following configuration:

<configuration>
        <property>
             <name>dfs.replication</name>
             <value>1</value>
        </property>
        <property>
             <name>dfs.namenode.name.dir</name>
             <value>/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/hdfs/tmp/dfs/name</value>
        </property>
        <property>
             <name>dfs.datanode.data.dir</name>
             <value>/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/hdfs/tmp/dfs/data</value>
        </property>
</configuration>

Start and stop Hadoop

Configure .bash_profile

# set hadoop
export HADOOP_HOME=/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
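
A quick sanity check that the new PATH entry works, assuming HADOOP_HOME points at the extracted hadoop-2.8.4 directory (or a symlink to it, as in the prompts used in this article):

# Reload the profile in the current shell and confirm the hadoop binary resolves.
source ~/.bash_profile
which hadoop
hadoop version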

Format HDFS before the first start

WZB-MacBook:hadoop-2.8.4 wangzhibin$ ./bin/hdfs namenode -format
...
19/05/15 22:30:47 INFO common.Storage: Storage directory /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/hdfs/tmp/dfs/name has been successfully formatted.
...

Start HDFS

./sbin/start-dfs.sh

Stop HDFS

./sbin/stop-dfs.sh

Check the HDFS status

  1. HDFS status: http://localhost:50070/dfshealth.html#tab-overview
  2. Secondary NameNode status: http://localhost:50090/status.html
  3. Local copy of the official docs: API documentation
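
The same status can also be checked from the command line; a small sketch assuming the default ports above (the JMX endpoint is exposed by the Hadoop 2.x web UI):

# NameNode information via the HDFS web UI's JMX endpoint.
curl 'http://localhost:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo'
# Cluster summary from the dfsadmin tool.
hdfs dfsadmin -report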

Verify HDFS

A quick check with basic hadoop fs commands:

$ hadoop fs -mkdir /test
WZB-MacBook:hadoop wangzhibin$ hadoop fs -ls /
Found 1 items
drwxr-xr-x   - wangzhibin supergroup          0 2019-05-16 11:26 /test
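
A slightly fuller round trip, putting a local file into HDFS, reading it back, and cleaning up (hello.txt is just an illustrative name):

echo "hello hdfs" > hello.txt
hadoop fs -put hello.txt /test/
hadoop fs -cat /test/hello.txt
hadoop fs -rm /test/hello.txt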

Pitfalls hit during startup

1. ssh: connect to host localhost port 22: Connection refused

The following error may appear at this point. It happens because the local ssh service (Remote Login) is not enabled and passwordless ssh login has not been configured.

WZB-MacBook:hadoop-2.8.4 wangzhibin$ ./sbin/start-dfs.sh
19/05/15 22:38:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: ssh: connect to host localhost port 22: Connection refused
localhost: ssh: connect to host localhost port 22: Connection refused
Starting secondary namenodes [0.0.0.0]
0.0.0.0: ssh: connect to host 0.0.0.0 port 22: Connection refused
19/05/15 22:38:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Fix it as follows:

1) Enable Remote Login: System Preferences -> Sharing -> turn on Remote Login. (A command-line alternative is sketched after the ssh setup below.)

2) Set up passwordless ssh login:

    $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    $ chmod 0600 ~/.ssh/authorized_keys
    $ ssh localhost
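
If you prefer the terminal to System Preferences, Remote Login can also be toggled with the built-in macOS systemsetup tool (requires sudo); a sketch:

# Enable the macOS ssh server (equivalent to Sharing -> Remote Login).
sudo systemsetup -setremotelogin on
# Confirm it is enabled.
sudo systemsetup -getremotelogin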

2. Unable to load native-hadoop library for your platform
WZB-MacBook:hadoop-2.8.4 wangzhibin$ ./sbin/start-dfs.sh
19/05/15 22:50:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/logs/hadoop-wangzhibin-namenode-WZB-MacBook.local.out
localhost: starting datanode, logging to /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/logs/hadoop-wangzhibin-datanode-WZB-MacBook.local.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/logs/hadoop-wangzhibin-secondarynamenode-WZB-MacBook.local.out
19/05/15 22:50:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

References:

  1. Official Native Libraries Guide
  2. Mac OSX 下 Hadoop 使用本地库提高效率 (using Hadoop's native libraries on Mac OS X)
  3. Hadoop native libraries: Installation on Mac Osx

Solution: rebuild Hadoop from source, then replace $HADOOP_HOME/lib/native with the compiled hadoop-dist/target/hadoop-2.8.4/lib/native.

  • Install the build prerequisites
$ brew install gcc autoconf automake libtool cmake snappy gzip bzip2 zlib
  • Install protobuf (see the version check after the build steps below).
   wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
   tar zxvf protobuf-2.5.0.tar.gz
   cd protobuf-2.5.0
   ./configure
   make
   make install
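
Hadoop 2.8.x builds against protoc 2.5.0 specifically, so it is worth confirming the version before compiling Hadoop:

protoc --version
# expected output: libprotoc 2.5.0
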
  • Rebuild Hadoop
wget http://apache.fayea.com/hadoop/common/hadoop-2.8.4/hadoop-2.8.4-src.tar.gz
tar zxvf hadoop-2.8.4-src.tar.gz
cd hadoop-2.8.4-src
mvn package -Pdist,native -DskipTests -Dtar -e
cp -r /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4-src/hadoop-dist/target/hadoop-2.8.4/lib/native .
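
After replacing $HADOOP_HOME/lib/native with the freshly built libraries, checknative (listed in the usage output earlier) reports whether they are now being loaded:

hadoop checknative -a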

3. An Ant BuildException has occured: exec returned: 1
WZB-MacBook:hadoop-2.8.4-src wangzhibin$ mvn package -Pdist,native -DskipTests -Dtar -e
...
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project hadoop-pipes: An Ant BuildException has occured: exec returned: 1
[ERROR] around Ant part ...<exec failonerror="true" dir="/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4-src/hadoop-tools/hadoop-pipes/target/native" executable="cmake">... @ 5:152 in /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4-src/hadoop-tools/hadoop-pipes/target/antrun/build-main.xml
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project hadoop-pipes: An Ant BuildException has occured: exec returned: 1
around Ant part ...<exec failonerror="true" dir="/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4-src/hadoop-tools/hadoop-pipes/target/native" executable="cmake">... @ 5:152 in /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4-src/hadoop-tools/hadoop-pipes/target/antrun/build-main.xml

Reference: "mac下编译Hadoop 2.8.1报错An Ant BuildException has occured: exec returned: 1,排错过程" (a troubleshooting write-up for the same build error when compiling Hadoop 2.8.1 on the Mac).

Solution: set the OPENSSL_ROOT_DIR and OPENSSL_INCLUDE_DIR environment variables by editing ~/.bash_profile:

# openssl
export OPENSSL_ROOT_DIR=/usr/local/Cellar/openssl/1.0.2r
export OPENSSL_INCLUDE_DIR=$OPENSSL_ROOT_DIR/include
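
The exact Cellar path varies with the installed openssl version, so it can be safer to derive it instead of hard-coding 1.0.2r; a sketch using Homebrew's prefix query:

# openssl (version-independent variant)
export OPENSSL_ROOT_DIR=$(brew --prefix openssl)
export OPENSSL_INCLUDE_DIR=$OPENSSL_ROOT_DIR/include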

Configure and start YARN

Configure mapred-site.xml

cd $HADOOP_HOME/etc/hadoop/
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml

<configuration>
    <!-- Tell the MapReduce framework to run on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Configure yarn-site.xml

vim yarn-site.xml
<configuration>
    <!-- Reducers fetch map output via the mapreduce_shuffle service -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Start and stop YARN

cd $HADOOP_HOME
./sbin/start-yarn.sh
./sbin/stop-yarn.sh

Check it in a browser: http://localhost:8088
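
The ResourceManager also exposes a REST API, so the same check works without a browser (assuming the default 8088 port):

curl http://localhost:8088/ws/v1/cluster/info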


Check the processes with jps

WZB-MacBook:hadoop wangzhibin$ jps
534 NutstoreGUI
49135 DataNode
49834 ResourceManager
49234 SecondaryNameNode
49973 Jps
67596
49912 NodeManager
49057 NameNode

At this point, the single-node Hadoop setup is complete!
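
As an end-to-end sanity check, the bundled example jar can run a small wordcount job through YARN; the input and output paths below are only illustrative:

hadoop fs -mkdir -p /test/input
hadoop fs -put $HADOOP_HOME/etc/hadoop/core-site.xml /test/input
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar wordcount /test/input /test/output
hadoop fs -cat /test/output/part-r-00000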

Commands and verification

NameNode (HDFS): http://localhost:50070
ResourceManager (YARN): http://localhost:8088/
NodeManager (node-specific info): http://localhost:8042/

Commands:
$ jps
$ yarn    // resource-management details beyond what the web UI shows
$ mapred  // detailed information about jobs

References

  1. Hadoop: Setting up a Single Node Cluster.
  2. centos7 hadoop 单机模式安装配置 (single-node Hadoop install on CentOS 7)
  3. Hadoop in OSX El-Capitan
  4. Installing Hadoop on Mac OS X 10.9.4
  5. macOS上搭建伪分布式Hadoop环境 (setting up pseudo-distributed Hadoop on macOS)

Original content statement: This article was published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission. For infringement concerns, contact cloudcommunity@tencent.com.
