Big Data Learning Path 01: Getting Hadoop Running on macOS

2019-05-15

This guide installs a single-node Hadoop on macOS.

Preparation

Installation directory

WZB-MacBook:50_bigdata wangzhibin$ pwd
/Users/wangzhibin/00_dev_suite/50_bigdata

JDK

Installing the JDK on a Mac is not covered here; see: Installing multiple JDK versions on macOS and ways to switch between them

WZB-MacBook:50_bigdata wangzhibin$ java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
WZB-MacBook:50_bigdata wangzhibin$ echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home

Download Hadoop

brew install wget
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/core/hadoop-2.8.4/hadoop-2.8.4.tar.gz
WZB-MacBook:50_bigdata wangzhibin$ tar -zxvf hadoop-2.8.4.tar.gz

Install and Configure Hadoop

Update the JDK setting

WZB-MacBook:hadoop-2.8.4 wangzhibin$ vi etc/hadoop/hadoop-env.sh

Change export JAVA_HOME=${JAVA_HOME} to:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home

Verify Hadoop

WZB-MacBook:hadoop-2.8.4 wangzhibin$ bin/hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.

Run in standalone mode

  $ mkdir input
  $ cp etc/hadoop/*.xml input
  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar grep input output 'dfs[a-z.]+'
  $ cat output/*
  1	dfsadmin
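The MapReduce grep job above just extracts every match of the regex and counts distinct terms. As a rough local analogue (a sketch only; no Hadoop involved, and count_dfs_terms is a made-up helper name), the same counting can be done in plain shell:

```shell
# Hypothetical helper: emulate the grep example locally by extracting every
# 'dfs[a-z.]+' match from the given files and counting distinct terms.
count_dfs_terms() {
  grep -ohE 'dfs[a-z.]+' "$@" 2>/dev/null | sort | uniq -c | sort -rn
}

# Example (from $HADOOP_HOME, after the cp above):
#   count_dfs_terms input/*.xml
```

Counts may differ from the job's output depending on what the copied config files contain.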

Configure core-site.xml

WZB-MacBook:hadoop-2.8.4 wangzhibin$ mkdir -p hdfs/tmp
WZB-MacBook:hadoop-2.8.4 wangzhibin$ vi etc/hadoop/core-site.xml

Add the following configuration:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/hdfs/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Configure hdfs-site.xml

WZB-MacBook:hadoop-2.8.4 wangzhibin$ vi etc/hadoop/hdfs-site.xml

Add the following configuration:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/hdfs/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/hdfs/tmp/dfs/data</value>
    </property>
</configuration>

Start and Stop Hadoop

Configure .bash_profile

# set hadoop
export HADOOP_HOME=/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
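After a source ~/.bash_profile, a quick sanity check along these lines can confirm the two exports took effect (a sketch; check_hadoop_home is a hypothetical helper, not part of Hadoop):

```shell
# Hypothetical check: HADOOP_HOME must contain an executable bin/hadoop,
# and that bin directory must appear on PATH.
check_hadoop_home() {
  [ -x "$1/bin/hadoop" ] || { echo "no executable at $1/bin/hadoop"; return 1; }
  case ":$PATH:" in
    *":$1/bin:"*) echo "OK: $1/bin is on PATH" ;;
    *) echo "warning: $1/bin not on PATH"; return 1 ;;
  esac
}

# Usage:
#   source ~/.bash_profile && check_hadoop_home "$HADOOP_HOME"
```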

The first time you start HDFS, the NameNode must be formatted

WZB-MacBook:hadoop-2.8.4 wangzhibin$ ./bin/hdfs namenode -format
...
19/05/15 22:30:47 INFO common.Storage: Storage directory /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/hdfs/tmp/dfs/name has been successfully formatted.
...
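Reformatting an existing NameNode wipes its filesystem metadata, so before re-running the command it is worth checking whether the name directory already holds data. A minimal guard sketch (needs_format is a made-up helper name):

```shell
# Hypothetical guard: a NameNode dir needs formatting only when it is
# missing or empty; a non-empty dir means it was already formatted.
needs_format() {
  [ ! -d "$1" ] || [ -z "$(ls -A "$1" 2>/dev/null)" ]
}

# Usage (from $HADOOP_HOME):
#   needs_format hdfs/tmp/dfs/name && ./bin/hdfs namenode -format
```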

Start HDFS

./sbin/start-dfs.sh

Stop HDFS

./sbin/stop-dfs.sh

Check HDFS status

  1. HDFS status: http://localhost:50070/dfshealth.html#tab-overview
  2. Secondary NameNode status: http://localhost:50090/status.html
  3. Local copy of the official docs: API documentation

Verify HDFS

A quick sanity check with hadoop commands:

$ hadoop fs -mkdir /test
WZB-MacBook:hadoop wangzhibin$ hadoop fs -ls /
Found 1 items
drwxr-xr-x   - wangzhibin supergroup          0 2019-05-16 11:26 /test

Pitfalls hit during startup

Issue 1: ssh: connect to host localhost port 22: Connection refused

At this point you may hit the following error, because passwordless SSH login has not been configured.

WZB-MacBook:hadoop-2.8.4 wangzhibin$ ./sbin/start-dfs.sh
19/05/15 22:38:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: ssh: connect to host localhost port 22: Connection refused
localhost: ssh: connect to host localhost port 22: Connection refused
Starting secondary namenodes [0.0.0.0]
0.0.0.0: ssh: connect to host 0.0.0.0 port 22: Connection refused
19/05/15 22:38:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Fix it as follows:

1) In System Preferences, open Sharing and enable Remote Login

2) Set up passwordless SSH login

    $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    $ chmod 0600 ~/.ssh/authorized_keys
    $ ssh localhost
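sshd silently ignores key files with unsafe permissions, so it can help to double-check the chmod above. A sketch (perm_is_600 is a hypothetical helper; note the stat flags differ between GNU/Linux and macOS):

```shell
# Hypothetical check: verify a key file's mode is exactly 600.
# GNU stat (Linux) uses -c '%a'; BSD stat (macOS) uses -f '%Lp'.
perm_is_600() {
  mode=$(stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1" 2>/dev/null)
  [ "$mode" = "600" ]
}

# Usage:
#   perm_is_600 ~/.ssh/authorized_keys || chmod 0600 ~/.ssh/authorized_keys
```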

Issue 2: Unable to load native-hadoop library for your platform

WZB-MacBook:hadoop-2.8.4 wangzhibin$ ./sbin/start-dfs.sh
19/05/15 22:50:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/logs/hadoop-wangzhibin-namenode-WZB-MacBook.local.out
localhost: starting datanode, logging to /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/logs/hadoop-wangzhibin-datanode-WZB-MacBook.local.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4/logs/hadoop-wangzhibin-secondarynamenode-WZB-MacBook.local.out
19/05/15 22:50:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

References:

  1. Official Native Libraries Guide
  2. Using Hadoop native libraries on Mac OS X for better performance
  3. Hadoop native libraries: Installation on Mac Osx

Solution: rebuild Hadoop, then replace $HADOOP_HOME/lib/native with the freshly built hadoop-dist/target/hadoop-2.8.4/lib/native.

  • Install the base components
$ brew install gcc autoconf automake libtool cmake snappy gzip bzip2 zlib
  • Install protobuf (Hadoop 2.x builds require protoc 2.5.0).
   wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
   tar zxvf protobuf-2.5.0.tar.gz
   cd protobuf-2.5.0
   ./configure
   make
   make install
  • Rebuild Hadoop
wget http://apache.fayea.com/hadoop/common/hadoop-2.8.4/hadoop-2.8.4-src.tar.gz
tar zxvf hadoop-2.8.4-src.tar.gz
cd hadoop-2.8.4-src
mvn package -Pdist,native -DskipTests -Dtar -e
cp -r /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4-src/hadoop-dist/target/hadoop-2.8.4/lib/native .

Issue 3: An Ant BuildException has occured: exec returned: 1

WZB-MacBook:hadoop-2.8.4-src wangzhibin$ mvn package -Pdist,native -DskipTests -Dtar -e
...
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project hadoop-pipes: An Ant BuildException has occured: exec returned: 1
[ERROR] around Ant part ...<exec failonerror="true" dir="/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4-src/hadoop-tools/hadoop-pipes/target/native" executable="cmake">... @ 5:152 in /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4-src/hadoop-tools/hadoop-pipes/target/antrun/build-main.xml
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project hadoop-pipes: An Ant BuildException has occured: exec returned: 1
around Ant part ...<exec failonerror="true" dir="/Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4-src/hadoop-tools/hadoop-pipes/target/native" executable="cmake">... @ 5:152 in /Users/wangzhibin/00_dev_suite/50_bigdata/hadoop-2.8.4-src/hadoop-tools/hadoop-pipes/target/antrun/build-main.xml

Reference: troubleshooting notes on building Hadoop 2.8.1 on a Mac failing with "An Ant BuildException has occured: exec returned: 1"

Solution: set the OPENSSL_ROOT_DIR and OPENSSL_INCLUDE_DIR environment variables in ~/.bash_profile:

# openssl
export OPENSSL_ROOT_DIR=/usr/local/Cellar/openssl/1.0.2r
export OPENSSL_INCLUDE_DIR=$OPENSSL_ROOT_DIR/include

Configure and Start YARN

Configure mapred-site.xml

cd $HADOOP_HOME/etc/hadoop/
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml

<configuration>
    <!-- Tell the MapReduce framework to use YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Configure yarn-site.xml

vim yarn-site.xml
<configuration>
    <!-- Reducers fetch map output via mapreduce_shuffle -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Start and stop YARN

cd $HADOOP_HOME
./sbin/start-yarn.sh
./sbin/stop-yarn.sh

View in a browser: http://localhost:8088
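The web UI takes a few seconds to come up after start-yarn.sh. A polling sketch (wait_for_url is a hypothetical helper; it assumes curl is installed):

```shell
# Hypothetical helper: poll a URL until it responds or the attempts run out.
wait_for_url() {
  url=$1
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -s silent, -f fail on HTTP errors, discard the body
    curl -sf -o /dev/null "$url" && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage:
#   wait_for_url http://localhost:8088 && echo "ResourceManager UI is up"
```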


Check the processes with jps

WZB-MacBook:hadoop wangzhibin$ jps
534 NutstoreGUI
49135 DataNode
49834 ResourceManager
49234 SecondaryNameNode
49973 Jps
67596
49912 NodeManager
49057 NameNode
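With both HDFS and YARN running, five Hadoop daemons should appear in the jps output. A sketch that flags any that are missing (missing_daemons is a made-up helper name):

```shell
# Hypothetical helper: print every expected Hadoop daemon that does not
# appear (as a whole word) in the given jps output.
missing_daemons() {
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    printf '%s\n' "$1" | grep -qw "$d" || echo "$d"
  done
}

# Usage:
#   missing_daemons "$(jps)"   # prints nothing when everything is up
```

grep -w matters here: without it, "NameNode" would falsely match inside "SecondaryNameNode".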

At this point, the Hadoop single-node setup is complete!

Commands and Verification

NameNode web UI: http://localhost:50070
ResourceManager: http://localhost:8088/
Node-specific info: http://localhost:8042/
Commands
$ jps
$ yarn    # more resource-management detail than the web interface
$ mapred  # detailed information about jobs

References

  1. Hadoop: Setting up a Single Node Cluster.
  2. Installing and configuring single-node Hadoop on CentOS 7
  3. Hadoop in OSX El-Capitan
  4. Installing Hadoop on Mac OS X 10.9.4
  5. Setting up a pseudo-distributed Hadoop environment on macOS

Original content: the author authorized publication on Tencent Cloud+ Community; reproduction without permission is prohibited.
