《快学BigData》(Quick Learning BigData) -- Hadoop Summary (I) (42)

Hadoop Summary

Overview

CDH

Installing Hadoop 2.6.4 (non-ZooKeeper cluster edition)

Installing Hadoop 2.6.4 (ZooKeeper cluster edition)

The MapReduce flow in detail

The Hadoop HDFS system in detail

Working with HDFS from Java

Hadoop MapReduce examples

Other Hadoop notes

Hadoop optimization notes

TestDFSIO, mrbench, and nnbench are three widely used Hadoop benchmarks. The runs below were done on HDP 2.6.0.3-8.

For the detailed test procedure, see: http://blog.csdn.net/xfg0218/article/details/78592512

1-1) The Hadoop test programs

A) Change to the directory

# cd /usr/hdp/2.6.0.3-8/hadoop-mapreduce

B) View the options

# hadoop jar hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar

*****

1-2) TestDFSIO write performance test

The main goal is to measure how fast Hadoop writes data.

A) View the options

# hadoop jar hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar TestDFSIO

17/11/21 14:46:38 INFO fs.TestDFSIO: TestDFSIO.1.8

Missing arguments.

Usage: TestDFSIO [genericOptions] -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -truncate | -clean [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]

B) Run an example

# hadoop jar hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar TestDFSIO -write -nrFiles 10 -size 10MB

*********
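When the same test has to be repeated with different file counts or sizes, it can help to build the command line programmatically. The following is a minimal sketch, not part of the original write-up: the helper name `build_testdfsio_cmd` and the 100MB size in the loop are made up for illustration, and only flags that appear in the usage text above are used.

```python
import shlex

def build_testdfsio_cmd(jar, mode, nr_files, size):
    """Return the argv list for one TestDFSIO run (hypothetical helper)."""
    if mode not in ("write", "read", "append", "truncate", "clean"):
        raise ValueError(f"unsupported mode: {mode}")
    cmd = ["hadoop", "jar", jar, "TestDFSIO", f"-{mode}"]
    if mode != "clean":
        # -nrFiles and -size come straight from the usage text above.
        cmd += ["-nrFiles", str(nr_files), "-size", size]
    return cmd

jar = "hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar"
# The sizes here are illustrative; 10MB matches the run shown above.
for size in ("10MB", "100MB"):
    print(shlex.join(build_testdfsio_cmd(jar, "write", 10, size)))
```

The returned list can be passed to `subprocess.run` on a machine where the `hadoop` command is available.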

C) View the data

# hadoop fs -ls -h /benchmarks/TestDFSIO/io_data

Found 10 items

-rw-r--r-- 3 admin hdfs 10 M 2017-11-21 14:53 /benchmarks/TestDFSIO/io_data/test_io_0

-rw-r--r-- 3 admin hdfs 10 M 2017-11-21 14:53 /benchmarks/TestDFSIO/io_data/test_io_1

***********
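The second column of the listing is the replication factor (3 here), so the test occupies more raw disk space than the 10 x 10 MB it writes. A small sketch of that arithmetic follows; the function name is hypothetical, and metadata and checksum overhead are ignored.

```python
def dfsio_footprint_mb(nr_files, size_mb, replication):
    """Return (logical MB, raw MB across all replicas) for a TestDFSIO data set."""
    logical = nr_files * size_mb
    raw = logical * replication  # every block is stored `replication` times
    return logical, raw

logical_mb, raw_mb = dfsio_footprint_mb(nr_files=10, size_mb=10, replication=3)
print(f"logical: {logical_mb} MB, raw on disk: {raw_mb} MB")
```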

D) View the results

# cat TestDFSIO_results.log

----- TestDFSIO ----- : write

Date & time: Tue Nov 21 14:53:44 CST 2017

Number of files: 10

Total MBytes processed: 100.0

Throughput mb/sec: 19.485580670303975

Average IO rate mb/sec: 24.091276168823242

IO rate std deviation: 9.242316274402379

Test exec time sec: 63.103
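The two time-related figures in this log answer different questions. As far as I understand TestDFSIO, `Throughput mb/sec` divides the total data volume by the I/O time summed over all map tasks, whereas `Test exec time sec` is the wall-clock time of the whole job, MapReduce start-up included. The sketch below only redoes that arithmetic on the numbers above; the function name is made up for illustration.

```python
def dfsio_rates(total_mb, throughput_mb_s, exec_time_s):
    """Derive two views of a TestDFSIO run from its reported figures."""
    # Assumption: throughput = total MB / I/O time summed over all map tasks.
    summed_io_s = total_mb / throughput_mb_s
    # End-to-end rate over wall-clock time, job start-up included.
    wall_clock_mb_s = total_mb / exec_time_s
    return summed_io_s, wall_clock_mb_s

summed_io_s, wall_clock_mb_s = dfsio_rates(100.0, 19.485580670303975, 63.103)
print(f"summed task I/O time: {summed_io_s:.2f} s")
print(f"end-to-end rate:      {wall_clock_mb_s:.2f} MB/s")
```

For this run the summed I/O time is only about 5 seconds of a 63-second job, so with such a small data set most of the elapsed time is job overhead rather than HDFS writes.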

1-3) TestDFSIO read performance test

The main goal is to measure how fast Hadoop reads files.

A) Run the command

TestDFSIO usage:

Usage: TestDFSIO [genericOptions] -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -clean [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]

# hadoop jar hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar TestDFSIO -read -nrFiles 10 -size 10

***************

B) View the results

# cat TestDFSIO_results.log

----- TestDFSIO ----- : write

Date & time: Tue Nov 21 14:53:44 CST 2017

Number of files: 10

Total MBytes processed: 100.0

Throughput mb/sec: 19.485580670303975

Average IO rate mb/sec: 24.091276168823242

IO rate std deviation: 9.242316274402379

Test exec time sec: 63.103

----- TestDFSIO ----- : read

Date & time: Tue Nov 21 15:04:33 CST 2017

Number of files: 10

Total MBytes processed: 100.0

Throughput mb/sec: 617.283950617284

Average IO rate mb/sec: 688.1331176757812

IO rate std deviation: 182.42935237458195

Test exec time sec: 36.148
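Because each run appends a new section to `TestDFSIO_results.log`, the file now holds both the write and the read results. A small parser makes them easier to compare. This is a sketch that assumes the log layout shown above; the function name and the shortened sample text are illustrative only.

```python
import re

# Shortened copy of the log above, used only to exercise the parser.
SAMPLE_LOG = """\
----- TestDFSIO ----- : write
Date & time: Tue Nov 21 14:53:44 CST 2017
Total MBytes processed: 100.0
Throughput mb/sec: 19.485580670303975
Test exec time sec: 63.103
----- TestDFSIO ----- : read
Date & time: Tue Nov 21 15:04:33 CST 2017
Total MBytes processed: 100.0
Throughput mb/sec: 617.283950617284
Test exec time sec: 36.148
"""

HEADER = re.compile(r"-+\s*TestDFSIO\s*-+\s*:\s*(\w+)")

def parse_dfsio_log(text):
    """Return one dict per run found in a TestDFSIO results log."""
    runs = []
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            # A header line starts a new run section.
            runs.append({"operation": header.group(1)})
        elif runs and ":" in line:
            key, value = (part.strip() for part in line.split(":", 1))
            try:
                runs[-1][key] = float(value)
            except ValueError:
                runs[-1][key] = value  # non-numeric, e.g. the date string
    return runs

runs = parse_dfsio_log(SAMPLE_LOG)
for run in runs:
    print(run["operation"], run["Throughput mb/sec"], run["Test exec time sec"])
```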

1-4) Clean up the test data

# hadoop jar hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar TestDFSIO -clean

17/11/21 15:15:35 INFO fs.TestDFSIO: TestDFSIO.1.8

17/11/21 15:15:35 INFO fs.TestDFSIO: nrFiles = 1

17/11/21 15:15:35 INFO fs.TestDFSIO: nrBytes (MB) = 1.0

17/11/21 15:15:35 INFO fs.TestDFSIO: bufferSize = 1000000

17/11/21 15:15:35 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO

17/11/21 15:15:35 INFO fs.TestDFSIO: Cleaning up test files

1-5) Check the Hadoop file system

# hadoop fs -ls /benchmarks/

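1-6) nnbench test [NameNode benchmark (nnbench)]

nnbench load-tests the NameNode: it generates a large number of HDFS-related requests to put the NameNode under heavy pressure.

The test can create, read, rename, and delete files on HDFS.

A) View the nnbench options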

# hadoop jar hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar nnbench

*********

B) Run the command

The following example uses 10 mappers and 5 reducers to create 1000 files.

# hadoop jar hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar nnbench -operation create_write -maps 10 -reduces 5 -numberOfFiles 1000 -replicationFactorPerFile 3 -readFileAfterOpen true

***************

C) View the results

# cat NNBench_results.log

-------------- NNBench -------------- :

Version: NameNode Benchmark 0.4

Date & time: 2017-11-21 15:21:35,703

Test Operation: create_write

Start time: 2017-11-21 15:21:08,692

Maps to run: 10

Reduces to run: 5

Block Size (bytes): 1

Bytes to write: 0

Bytes per checksum: 1

Number of files: 1000

Replication factor: 3

Successful file operations: 0

# maps that missed the barrier: 5

# exceptions: 5000

TPS: Create/Write/Close: 0

Avg exec time (ms): Create/Write/Close: Infinity

Avg Lat (ms): Create/Write: NaN

Avg Lat (ms): Close: NaN

RAW DATA: AL Total #1: 0

RAW DATA: AL Total #2: 0

RAW DATA: TPS Total (ms): 21176

RAW DATA: Longest Map Time (ms): 4535.0

RAW DATA: Late maps: 5

RAW DATA: # of exceptions: 5000
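Before reading the TPS and latency figures, check whether the run did any useful work. The log above reports 0 successful file operations and 5000 exceptions, which is consistent with the TPS of 0 and the `Infinity` and `NaN` averages, so this particular run does not yield a usable NameNode measurement. The sketch below pulls out those counters; the function name, the `usable` rule, and the shortened sample are assumptions made for illustration.

```python
import re

# Shortened copy of the log above, used only to exercise the check.
SAMPLE_NNBENCH = """\
Successful file operations: 0
# maps that missed the barrier: 5
# exceptions: 5000
RAW DATA: # of exceptions: 5000
"""

FIELDS = {
    "successful_ops": r"^\s*Successful file operations:\s*(\d+)",
    "late_maps": r"^\s*# maps that missed the barrier:\s*(\d+)",
    "exceptions": r"^\s*# exceptions:\s*(\d+)",
}

def nnbench_summary(text):
    """Extract the health counters from an NNBench results log."""
    summary = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, text, re.MULTILINE)
        summary[name] = int(match.group(1)) if match else None
    # Hypothetical rule: some operations succeeded and none raised exceptions.
    summary["usable"] = bool(summary["successful_ops"]) and summary["exceptions"] == 0
    return summary

summary = nnbench_summary(SAMPLE_NNBENCH)
print(summary)
```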

1-7) mrbench test [MapReduce benchmark (mrbench)]

mrbench runs a small job repeatedly to check whether small jobs on the cluster run repeatably and efficiently.

A) View the help

# hadoop jar hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar mrbench --help

MRBenchmark.0.0.2

Usage: mrbench [-baseDir <base DFS path for output/input, default is /benchmarks/MRBench>] [-jar <local path to job jar file containing Mapper and Reducer implementations, default is current jar file>] [-numRuns <number of times to run the job, default is 1>] [-maps <number of maps for each run, default is 2>] [-reduces <number of reduces for each run, default is 1>] [-inputLines <number of input lines to generate, default is 1>] [-inputType <type of input to generate, one of ascending (default), descending, random>] [-verbose]

B) The following example runs a small job twice

# hadoop jar hadoop-mapreduce-client-jobclient-2.7.3.2.6.0.3-8.jar mrbench -numRuns 2

MRBenchmark.0.0.2

*************

DataLines Maps Reduces AvgTime (milliseconds)

1 2 1 39012
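The summary table reports the average time per run in milliseconds. A short sketch for reading it and converting to seconds follows; it assumes the two-line header and row layout shown above, and the function name is hypothetical.

```python
SAMPLE_MRBENCH = """\
DataLines Maps Reduces AvgTime (milliseconds)
1 2 1 39012
"""

def parse_mrbench_table(text):
    """Return the mrbench summary row as a dict, with the time in seconds too."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    for header, row in zip(lines, lines[1:]):
        if header[:3] == ["DataLines", "Maps", "Reduces"]:
            data_lines, maps, reduces, avg_ms = (int(x) for x in row[:4])
            return {
                "data_lines": data_lines,
                "maps": maps,
                "reduces": reduces,
                "avg_time_ms": avg_ms,
                "avg_time_s": avg_ms / 1000,
            }
    raise ValueError("mrbench summary table not found")

result = parse_mrbench_table(SAMPLE_MRBENCH)
print(f"{result['avg_time_s']:.3f} s per run on average")
```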

This article is shared from the WeChat official account 小徐的技术之路 (xiaoxuBigdata). Author: 小徐

Originally published: 2018-03-28
