Local (standalone) mode
Pseudo-distributed mode
Fully-distributed (cluster) mode
Download the matching package from the Hadoop website. Here the src package of hadoop-2.7.1 is used, since it will be compiled from source (on a 32-bit system you could just download the precompiled release instead). Upload it to the Linux machine, extract it, and read BUILDING.txt in the source directory, which lists the requirements quoted below. (SecureCRT with lrzsz is used here to upload the already-downloaded file; install the tool with yum -y install lrzsz.)
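A minimal sketch of the upload-and-extract step (the file name assumes the 2.7.1 source tarball):

yum -y install lrzsz                  # rz/sz, used by SecureCRT for file transfer
rz                                    # pick hadoop-2.7.1-src.tar.gz on the local machine
tar -zxvf hadoop-2.7.1-src.tar.gz     # extract the source tree
cat hadoop-2.7.1-src/BUILDING.txt     # the build instructions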
Requirements:
* Unix System
* JDK 1.7+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac
* Zlib devel (if compiling native code)
* openssl devel ( if compiling native hadoop-pipes and to get the best HDFS encryption performance )
* Jansson C JSON parsing library ( if compiling libwebhdfs )
* Linux FUSE (Filesystem in Userspace) version 2.6 or above ( if compiling fuse_dfs )
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
So, download and install the corresponding dependencies (JDK, Maven, ProtocolBuffer 2.5.0, and so on).
The general steps: add the environment variables for them, then refresh the profile so they take effect, as sketched below.
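A minimal sketch of the environment variables, assuming the JDK and Maven were unpacked under /usr/local (adjust the paths to your actual install locations):

export JAVA_HOME=/usr/local/jdk1.7.0_79          # assumed JDK path
export MAVEN_HOME=/usr/local/apache-maven-3.3.3  # assumed Maven path
export PATH=$PATH:$JAVA_HOME/bin:$MAVEN_HOME/bin
source /etc/profile                              # refresh, so the variables take effect (if added to /etc/profile)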
Note that cmake must also be installed (required when compiling the native code). Finally, build with Maven; the relevant build options:
Build options:
 * Use -Pnative to compile/bundle native code                                           # build the native libraries
 * Use -Pdocs to generate & bundle the documentation in the distribution (using -Pdist) # build the docs
 * Use -Psrc to create a project source TAR.GZ                                          # package the source
 * Use -Dtar to create a TAR with the distribution (using -Pdist)                       # produce the distribution tarball
Here (as in basically every reference) the following is used:
mvn package -Pdist,native,docs -DskipTests -Dtar
Depending on the network, the build should eventually succeed (the first build downloads a large number of Maven dependencies).
After the build succeeds, the compiled distribution sits under hadoop-dist/target/hadoop-2.7.1 in the source tree. (Honestly, if only this one artifact is needed, I do not understand why Maven has to build all the modules; but essentially every reference I have seen builds everything.)
For transferring files, any FTP tool works: FileZilla, notepad++ (with its FTP plugin), FlashFXP, or xFTP.
In the compiled Hadoop folder (built by yourself or taken prebuilt from someone else, either works), configure etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
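The official pseudo-distributed guide additionally sets the replication factor to 1 in etc/hadoop/hdfs-site.xml, since there is only a single DataNode to hold the replicas:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>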
This part was done in an earlier write-up, so it is skipped here.
The commands that need to be run are sketched below.
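Presumably the standard startup from the official single-node guide; a minimal sketch, run from the Hadoop home directory:

bin/hdfs namenode -format   # format the filesystem (first run only)
sbin/start-dfs.sh           # start the NameNode, DataNode and SecondaryNameNode daemons
jps                         # check that the daemons are actually running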
If anything fails, the corresponding log files are under the logs folder; read them to find the cause.
That is, YARN takes over what the JobTracker and TaskTracker did in Hadoop 1.x (split out into its own layer, decoupled from MapReduce).
The corresponding configuration, for a single-node YARN setup. Configuration parameters, first in etc/hadoop/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
And in etc/hadoop/yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Start the ResourceManager (cluster-wide resource allocation) and the NodeManager (resources of an individual node), as sketched below.
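Again per the official single-node guide, a minimal sketch:

sbin/start-yarn.sh   # starts the ResourceManager and a NodeManager
jps                  # both should now appear in the process list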
Go to a working directory, create a file with some sample data, and put the file into HDFS (Hadoop reads its input from HDFS); see the sketch below.
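A minimal sketch; the file name and the /input path are made up for illustration:

echo "hello hadoop hello world" > wc_input.txt   # arbitrary sample data
bin/hdfs dfs -mkdir -p /input                    # create an HDFS directory (path is an assumption)
bin/hdfs dfs -put wc_input.txt /input            # upload the local file into HDFS
bin/hdfs dfs -ls /input                          # confirm it arrived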
Running the examples jar without arguments prints the list of available example programs:
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
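To actually run one of them, wordcount for instance, a sketch (the jar path follows the 2.7.1 distribution layout; /input and /output are the assumed paths from above, and /output must not exist yet):

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input /output
bin/hdfs dfs -cat /output/part-r-00000   # print the word counts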
The purpose here is to review, and to record the process.
By default there are two web UIs: the NameNode at http://localhost:50070 and the YARN ResourceManager at http://localhost:8088.
The version I set up before was fairly old, and some steps differ from the newer way of doing things; this is just a quick record of the problems encountered. (Thinking back, I hit far more problems then than now; once you have walked through enough pits, you naturally watch out for the ones you have seen before.) This is also only a simple environment; what matters with Hadoop is practice and the algorithms. I will dig in further when there is time.