Hadoop Installation and Configuration: A Beginner's Guide

Disclaimer:
      Author: 龚细军 (Gong Xijun)
      Date: 17-08-01
      Type: notes
      Please credit the source and include the link below when reposting.
      Link: http://www.cnblogs.com/gongxijun/p/5726024.html

Everything in these notes is based on actual hands-on work. The Hadoop version used is hadoop-2.7.2 and the operating system is Kylin Linux.

Assumptions: the JDK is already installed, and Hadoop has already been downloaded and extracted.

1. After downloading and extracting Hadoop

Go into the installation directory and you will see the following folders and files:

/hadoop-2.7.2$ ls
bin  include  lib      LICENSE.txt  NOTICE.txt  README.txt  share
etc  input    libexec  logs         output      sbin        wc-in

A quick overview of each:

bin directory: Hadoop's command-line executables, e.g. hadoop, hdfs, yarn, mapred, etc. This directory is important.

We can use them like this (in Hadoop 2.x, bin/hdfs dfs is the preferred replacement for the deprecated bin/hadoop dfs form):

/hadoop-2.7.2$ bin/hadoop dfs -cat output/* |more

include directory: header files for C/C++ development

lib directory: native libraries for C/C++ development

etc directory: configuration files (earlier releases used a conf directory instead). Inside it you will see the following (a note on hadoop-env.sh follows the listing):

/hadoop-2.7.2/etc/hadoop$ ls | grep .xml
capacity-scheduler.xml
core-site.xml
hadoop-policy.xml
hdfs-site.xml
hdfs-site.xml~
httpfs-site.xml
kms-acls.xml
kms-site.xml
mapred-queues.xml.template
mapred-site.xml.template
ssl-client.xml.example
ssl-server.xml.example
yarn-site.xml
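
One file the grep above does not show is etc/hadoop/hadoop-env.sh. If JAVA_HOME is not exported in the environment of the user who starts Hadoop, set it explicitly in that file before starting any daemon. A minimal sketch, with an assumed OpenJDK path (substitute the location of your own JDK):

# etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # assumed path; point this at your installed JDK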

How to configure pseudo-distributed mode

1. Configure core-site.xml (fs.default.name still works in 2.7.2, but it is the deprecated alias of fs.defaultFS):

    <configuration>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>

2. Configure hdfs-site.xml (dfs.name.dir and dfs.data.dir are the older spellings of dfs.namenode.name.dir and dfs.datanode.data.dir; both are still accepted):

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.name.dir</name>
            <value>/home/gongxijun/HDFS/fileinput</value>
        </property>
        <property>
            <name>dfs.data.dir</name>
            <value>/home/gongxijun/HDFS/fileoutput</value>
        </property>
        <property>
            <name>dfs.permissions</name>
            <value>false</value>
            <description>
            If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories.
            </description>
        </property>
    </configuration>

3. Configure mapred-site.xml. You need to copy mapred-site.xml.template to mapred-site.xml, then configure it as follows (note: mapred.job.tracker is the Hadoop 1.x JobTracker address; when MapReduce runs on YARN in Hadoop 2.x, the relevant property is mapreduce.framework.name):

    <configuration>
        <property>
            <name>mapred.job.tracker</name>
            <value>localhost:9001</value>
        </property>
    </configuration>
    

After that, start Hadoop by running start-all.sh from the sbin directory. On the very first start the NameNode must also be formatted; a short sketch of the whole sequence follows.
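
A minimal sketch of the first-start sequence, assuming the configuration above. The start scripts ssh into localhost to launch each daemon, so passwordless SSH is set up first; skip the key steps if `ssh localhost` already works without a password:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys

/hadoop-2.7.2$ bin/hdfs namenode -format      # one-time, before the very first start
/hadoop-2.7.2$ sbin/start-all.sh              # or: sbin/start-dfs.sh && sbin/start-yarn.sh
/hadoop-2.7.2$ jps                            # expect NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager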

Maven pom.xml configuration for the client program. Two notes: ${hadoop.version} is a property you must define yourself in the POM's <properties> section (e.g. 2.7.2), and the org.apache.hadoop.mapreduce classes used by the program below come from the hadoop-mapreduce-client-core artifact, so a dependency on it (same version) is needed alongside hadoop-common:

   <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
            <scope>compile</scope>
            <exclusions>
                <exclusion>
                    <artifactId>zookeeper</artifactId>
                    <groupId>org.apache.zookeeper</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>slf4j-log4j12</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>jsp-api</artifactId>
                    <groupId>javax.servlet.jsp</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>jasper-runtime</artifactId>
                    <groupId>tomcat</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>jasper-compiler</artifactId>
                    <groupId>tomcat</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>jersey-server</artifactId>
                    <groupId>com.sun.jersey</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>asm</artifactId>
                    <groupId>asm</groupId>
                </exclusion>
            </exclusions>
        </dependency>

The program is as follows:

package com.qunar.mapReduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import java.io.IOException;
import java.util.Scanner;
import java.util.StringTokenizer;

/**
 * *********************************************************
 * <p/>
 * Author:     XiJun.Gong
 * Date:       2016-07-29 14:59
 * Version:    default 1.0.0
 * Class description: word-count MapReduce example
 * <p/>
 * *********************************************************
 */
public class MapReduceDemo {

    /** Mapper: emits (word, 1) for every whitespace-separated token of the input. */
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    /** Reducer: sums the counts emitted for each word. */
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {

        // Read an input path and an output path from stdin, then run a word-count job on them.
        Scanner reader = new Scanner(System.in);
        while (reader.hasNext()) {
            // A Job instance can only be submitted once, so build a fresh one for each pair of paths.
            Configuration configuration = new Configuration();
            Job job = new Job(configuration, "wordCount");
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            FileInputFormat.addInputPath(job, new Path(reader.next()));
            FileOutputFormat.setOutputPath(job, new Path(reader.next()));
            job.waitForCompletion(true);
        }
    }
}
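
When run (the author launches it from the IDE, as the debugger line at the top of the log below shows), the program blocks on stdin until it receives two whitespace-separated paths: the input file first, then an output directory that must not already exist. Because the Job is built from a bare new Configuration() with no cluster configuration on the classpath, it executes with Hadoop's local job runner against the local filesystem, which is why the counters further down report FILE rather than HDFS byte counts.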

Running the program produces the following startup log:

Connected to the target VM, address: '127.0.0.1:51980', transport: 'socket'
12:41:05.404 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of successful kerberos logins and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
12:41:05.441 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of failed kerberos logins and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
12:41:05.442 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[GetGroups], always=false, type=DEFAULT, sampleName=Ops)
12:41:05.444 [main] DEBUG o.a.h.m.impl.MetricsSystemImpl - UgiMetrics, User and group related metrics
12:41:05.871 [main] DEBUG o.a.h.s.a.util.KerberosName - Kerberos krb5 configuration not found, setting default realm to empty
12:41:05.883 [main] DEBUG org.apache.hadoop.security.Groups -  Creating new Groups object
12:41:05.895 [main] DEBUG o.a.hadoop.util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...
12:41:05.896 [main] DEBUG o.a.hadoop.util.NativeCodeLoader - Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
12:41:05.896 [main] DEBUG o.a.hadoop.util.NativeCodeLoader - java.library.path=/home/gongxijun/Qunar/idea-IU-139.1117.1/bin::/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
12:41:05.897 [main] WARN  o.a.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12:41:05.900 [main] DEBUG o.a.hadoop.util.PerformanceAdvisory - Falling back to shell based
12:41:05.905 [main] DEBUG o.a.h.s.JniBasedUnixGroupsMappingWithFallback - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
12:41:05.957 [main] DEBUG org.apache.hadoop.util.Shell - setsid exited with exit code 0
12:41:05.957 [main] DEBUG org.apache.hadoop.security.Groups - Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
12:41:05.961 [main] DEBUG o.a.h.security.UserGroupInformation - hadoop login
12:41:05.962 [main] DEBUG o.a.h.security.UserGroupInformation - hadoop login commit
12:41:05.968 [main] DEBUG o.a.h.security.UserGroupInformation - using local user:UnixPrincipal: gongxijun
12:41:05.969 [main] DEBUG o.a.h.security.UserGroupInformation - Using user: "UnixPrincipal: gongxijun" with name gongxijun
12:41:05.969 [main] DEBUG o.a.h.security.UserGroupInformation - User entry: "gongxijun"
12:41:05.970 [main] DEBUG o.a.h.security.UserGroupInformation - UGI loginUser:gongxijun (auth:SIMPLE)
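
The NativeCodeLoader warning above is harmless: when the native hadoop library cannot be found on java.library.path, Hadoop falls back to its built-in pure-Java implementations, as the message itself notes.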

Then type the input on stdin (the input file path followed by the output directory path):

/home/gongxijun/web进阶.txt /home/gongxijun/a.txt

The job finishes and prints its counters:

12:44:36.992 [main] INFO  org.apache.hadoop.mapreduce.Job - Counters: 33
    File System Counters
        FILE: Number of bytes read=6316
        FILE: Number of bytes written=518809
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Map input records=84
        Map output records=85
        Map output bytes=1476
        Map output materialized bytes=1652
        Input split bytes=99
        Combine input records=0
        Combine output records=0
        Reduce input groups=82
        Reduce shuffle bytes=1652
        Reduce input records=85
        Reduce output records=82
        Spilled Records=170
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=9
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=459276288
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=1335
    File Output Format Counters 
        Bytes Written=1311

The word counts end up in the a.txt output directory (a.txt is a directory here, created by the job):

(kafuka卡夫卡)    1
(缺陷:    1
(需要重点学习)    1
---去查看QMQ--message---->broker    1
/    1
1.    2
1.判断线程安全的两个机准:    1
2.    3
3.    1
Apache    1
Cache    1
Client:    1
ConCurrentHashMap    1
Dubbo    1
Executor    1
Futrue/CountDownLatch    1
Guava    1
HTTP:    1
HashMap    1
Hession    1
HttpComponents    1
Java    1
Json    1
Key-Value    1
Kryo(重点)    1
LRU    1
Protobuf    1
QMQ/AMQ/rabbitimq    1
ReadWriterLock    1
ReentrantLock    1
async-http-client    1
c3p0    1
client实现    1
dbpc    1
redis    1
seriialization    1
servlet    1
snchronized    1
spymemcached    1
tomcat-jdbc    1
xmemcached    1
一致性Hash    1
一:    1
三:    1
乐观锁:    1
二:    1
互斥    1
共享数据    1
分布式锁?    1
分布式:    1
前端轮询,后端异步:    1
单例的    1
参数回调    1
可复用资源,创建代价大    1
可扩展性,服务降级,负载均衡,灰度    1
可重入锁    1
可靠性    1
回顾    1
场景:    1
对象池:    1
将对象的状态信息转换为可以存储或传输形式的过程.    1
尽量不要使用本地缓存    1
并发修改    1
序列化:    1
建议:    1
异步调用    1
异步:    1
形成环)    1
性能    1
方式:    1
本地缓存太大,可以使用对象池    1
概念:    1
池化技术    1
消息队列:    1
类型:    1
线程池    1
缓存--本地    1
读写锁:    1
连接池:    1
(分段锁)    1
(推荐使用)    1
,    1
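
A side note on these results: StringTokenizer splits only on whitespace, so punctuation stays attached to the preceding characters, which is why tokens such as "1.", "一:" and a bare "," show up as separate "words" in the output.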
