Hadoop Summary
Overview
CDH
Installing Hadoop 2.6.4 (non-ZooKeeper cluster)
Installing Hadoop 2.6.4 (ZooKeeper cluster)
The MapReduce workflow in detail
The Hadoop HDFS system in detail
Operating HDFS from Java
Hadoop MapReduce examples
Other Hadoop notes
Hadoop optimization notes
TestDFSIO, mrbench, and nnbench (run here on HDP 2.6.0.3-8) are three widely used Hadoop benchmark tools.
For the detailed test procedure, see: http://blog.csdn.net/xfg0218/article/details/78592512
# cd /usr/hdp/2.6.0.3-8/hadoop-mapreduce
# hadoop jar hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar
(run with no test name, the jar prints the list of available benchmark programs)
The write test: its main purpose is to measure HDFS write speed.
# hadoop jar hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar TestDFSIO
17/11/21 14:46:38 INFO fs.TestDFSIO: TestDFSIO.1.8
Missing arguments.
Usage: TestDFSIO [genericOptions] -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -truncate | -clean [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]
# hadoop jar hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar TestDFSIO -write -nrFiles 10 -size 10MB
# hadoop fs -ls -h /benchmarks/TestDFSIO/io_data
Found 10 items
-rw-r--r-- 3 admin hdfs 10 M 2017-11-21 14:53 /benchmarks/TestDFSIO/io_data/test_io_0
-rw-r--r-- 3 admin hdfs 10 M 2017-11-21 14:53 /benchmarks/TestDFSIO/io_data/test_io_1
# cat TestDFSIO_results.log
----- TestDFSIO ----- : write
Date & time: Tue Nov 21 14:53:44 CST 2017
Number of files: 10
Total MBytes processed: 100.0
Throughput mb/sec: 19.485580670303975
Average IO rate mb/sec: 24.091276168823242
IO rate std deviation: 9.242316274402379
Test exec time sec: 63.103
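Two throughput views can be read off a log like the one above: TestDFSIO's own "Throughput mb/sec" (aggregated over the map tasks) and a simple wall-clock rate, total MB divided by "Test exec time". A small sketch that extracts both with awk; splitting fields on ": " is an assumption based on the exact layout shown above:

```shell
# Write a sample log in the format shown above, then pull two figures
# out of it. Field splitting on ": " assumes this exact layout.
cat > /tmp/testdfsio_sample.log <<'EOF'
----- TestDFSIO ----- : write
Date & time: Tue Nov 21 14:53:44 CST 2017
Number of files: 10
Total MBytes processed: 100.0
Throughput mb/sec: 19.485580670303975
Average IO rate mb/sec: 24.091276168823242
IO rate std deviation: 9.242316274402379
Test exec time sec: 63.103
EOF

# Reported per-task throughput (MB/s):
awk -F': ' '/^Throughput/ {print $2}' /tmp/testdfsio_sample.log

# Wall-clock rate = total MB / exec time (a different, usually lower figure):
awk -F': ' '/Total MBytes/ {mb=$2} /exec time/ {t=$2} END {printf "%.2f\n", mb/t}' \
    /tmp/testdfsio_sample.log
```

The two numbers differ because the reported throughput sums time spent inside map tasks, while the wall-clock rate also includes job setup and scheduling overhead.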
The read test: its main purpose is to measure HDFS read speed.
TestDFSIO usage is as follows:
Usage: TestDFSIO [genericOptions] -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -clean [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]
# hadoop jar hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar TestDFSIO -read -nrFiles 10 -size 10
# cat TestDFSIO_results.log
----- TestDFSIO ----- : write
Date & time: Tue Nov 21 14:53:44 CST 2017
Number of files: 10
Total MBytes processed: 100.0
Throughput mb/sec: 19.485580670303975
Average IO rate mb/sec: 24.091276168823242
IO rate std deviation: 9.242316274402379
Test exec time sec: 63.103
----- TestDFSIO ----- : read
Date & time: Tue Nov 21 15:04:33 CST 2017
Number of files: 10
Total MBytes processed: 100.0
Throughput mb/sec: 617.283950617284
Average IO rate mb/sec: 688.1331176757812
IO rate std deviation: 182.42935237458195
Test exec time sec: 36.148
When a test round is finished, the -clean option removes the data generated under /benchmarks/TestDFSIO:
# hadoop jar hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar TestDFSIO -clean
17/11/21 15:15:35 INFO fs.TestDFSIO: TestDFSIO.1.8
17/11/21 15:15:35 INFO fs.TestDFSIO: nrFiles = 1
17/11/21 15:15:35 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
17/11/21 15:15:35 INFO fs.TestDFSIO: bufferSize = 1000000
17/11/21 15:15:35 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
17/11/21 15:15:35 INFO fs.TestDFSIO: Cleaning up test files
# hadoop fs -ls /benchmarks/
nnbench is used to load-test the NameNode: it generates a large number of HDFS-related requests, putting heavy pressure on the NameNode.
The test can create, read, rename, and delete files on HDFS.
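As a sketch of the four operations just mentioned, the loop below assembles one nnbench command per operation and only echoes it instead of running it (remove the echo to execute; the operation names create_write, open_read, rename, and delete are assumed from nnbench's usage text and should be checked against your version):

```shell
# Print one nnbench invocation per operation rather than executing it,
# so the commands can be reviewed first. Operation names are assumed
# from nnbench's usage text.
JAR=hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar
for op in create_write open_read rename delete; do
  echo hadoop jar "$JAR" nnbench -operation "$op" -maps 10 -reduces 5 -numberOfFiles 1000
done
```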
# hadoop jar hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar nnbench
The following example uses 10 mappers and 5 reducers to create 1000 files:
# hadoop jar hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar nnbench -operation create_write -maps 10 -reduces 5 -numberOfFiles 1000 -replicationFactorPerFile 3 -readFileAfterOpen true
# cat NNBench_results.log
-------------- NNBench -------------- :
Version: NameNode Benchmark 0.4
Date & time: 2017-11-21 15:21:35,703
Test Operation: create_write
Start time: 2017-11-21 15:21:08,692
Maps to run: 10
Reduces to run: 5
Block Size (bytes): 1
Bytes to write: 0
Bytes per checksum: 1
Number of files: 1000
Replication factor: 3
Successful file operations: 0
# maps that missed the barrier: 5
# exceptions: 5000
TPS: Create/Write/Close: 0
Avg exec time (ms): Create/Write/Close: Infinity
Avg Lat (ms): Create/Write: NaN
Avg Lat (ms): Close: NaN
RAW DATA: AL Total #1: 0
RAW DATA: AL Total #2: 0
RAW DATA: TPS Total (ms): 21176
RAW DATA: Longest Map Time (ms): 4535.0
RAW DATA: Late maps: 5
RAW DATA: # of exceptions: 5000
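Note that this particular run failed: 0 successful operations, 5000 exceptions, and the resulting NaN/Infinity latencies, so its numbers say nothing about NameNode performance. A small sketch that flags such a run by parsing the log (a three-line sample stands in for the full file; splitting on ": " is assumed from the layout above):

```shell
# Flag an NNBench run whose log reports exceptions. The sample lines
# mirror the NNBench_results.log excerpt above.
cat > /tmp/nnbench_sample.log <<'EOF'
Successful file operations: 0
# maps that missed the barrier: 5
# exceptions: 5000
EOF
awk -F': ' '
  /^Successful file operations/ {ok = $2}
  /^# exceptions/               {ex = $2}
  END {print (ex > 0 ? "FAILED: " ex " exceptions" : "OK: " ok " successful operations")}
' /tmp/nnbench_sample.log
```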
mrbench repeatedly runs a small job; it is used to check whether small jobs run repeatably and efficiently on the cluster.
# hadoop jar hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar mrbench --help
MRBenchmark.0.0.2
Usage: mrbench [-baseDir <base DFS path for output/input, default is /benchmarks/MRBench>] [-jar <local path to job jar file containing Mapper and Reducer implementations, default is current jar file>] [-numRuns <number of times to run the job, default is 1>] [-maps <number of maps for each run, default is 2>] [-reduces <number of reduces for each run, default is 1>] [-inputLines <number of input lines to generate, default is 1>] [-inputType <type of input to generate, one of ascending (default), descending, random>] [-verbose]
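To compare runs at several map counts, the options in the usage text above can be scripted into a sweep; the sketch below only echoes the commands so they can be reviewed first (drop the echo to execute):

```shell
# Assemble mrbench invocations sweeping the number of maps; echo them
# instead of running them. Flag names follow the usage text above.
JAR=hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar
for maps in 2 4 8; do
  echo hadoop jar "$JAR" mrbench -numRuns 2 -maps "$maps" -reduces 1
done
```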
# hadoop jar hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar mrbench -numRuns 2
MRBenchmark.0.0.2
DataLines  Maps  Reduces  AvgTime (milliseconds)
        1     2        1                   39012