Viewing the block information of a file in Hadoop HDFS

Original article: http://blog.csdn.net/chengyuqiang/article/details/78163091

To view the block information of a file in Hadoop HDFS, such as the number of blocks, the size of each block, and where each block is stored, use the hdfs fsck command.

1. Sample HDFS files

hdfs dfs -ls /user/root/input

[root@node1 data]# hdfs dfs -ls /user/root/input
Found 7 items
-rw-r--r--   3 root supergroup     281498 2017-09-20 10:11 /user/root/input/Hamlet.txt
-rw-r--r--   3 root supergroup    9789248 2017-09-22 10:26 /user/root/input/age.txt
-rw-r--r--   3 root supergroup         71 2017-08-27 09:18 /user/root/input/books.txt
-rw-r--r--   3 root supergroup  264075431 2017-10-05 09:37 /user/root/input/cite75_99.txt
drwxr-xr-x   - root supergroup          0 2017-08-13 09:33 /user/root/input/emp.bak
drwxr-xr-x   - root supergroup          0 2017-09-24 04:08 /user/root/input/ml-1m
-rw-r--r--   3 root supergroup  871353053 2017-10-05 09:40 /user/root/input/ncdc.txt
[root@node1 data]#
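Before running fsck, a rough cross-check is possible from the listing alone: a file occupies its size divided by the block size, rounded up. The sketch below assumes every file uses a 128 MB block size (134217728 bytes, the Hadoop 2.x default for dfs.blocksize and the len= value that appears in section 4); estimate_blocks is a hypothetical helper name, and it assumes paths without spaces. A file's actual block size can be checked with hdfs dfs -stat %o <path>.

```shell
#!/usr/bin/env bash
# Assumption: every file uses a 128 MB block size (134217728 bytes).
BLOCK_SIZE=134217728

# estimate_blocks (hypothetical helper): reads `hdfs dfs -ls` output on stdin
# and prints "<estimated blocks> <path>" for each regular file. Directory lines
# (starting with "d") and the "Found N items" header are skipped.
# Assumes paths contain no spaces (the path is taken from field 8).
estimate_blocks() {
  awk -v bs="$BLOCK_SIZE" '
    /^-/ {
      # ceil(size / block size); an empty file occupies 0 blocks
      print int(($5 + bs - 1) / bs), $8
    }'
}

# Usage: hdfs dfs -ls /user/root/input | estimate_blocks
```

On the listing above this gives 2 blocks for cite75_99.txt and 7 for ncdc.txt, which agrees with the fsck results that follow.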

2. hdfs fsck usage

Running hdfs fsck without a path prints its usage:

[root@node1 data]# hdfs fsck 
Usage: hdfs fsck <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]] [-includeSnapshots] [-storagepolicies] [-blockId <blk_Id>]
    <path>  start checking from this path
    -move   move corrupted files to /lost+found
    -delete delete corrupted files
    -files  print out files being checked
    -openforwrite   print out files opened for write
    -includeSnapshots   include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it
    -list-corruptfileblocks print out list of missing blocks and files they belong to
    -blocks print out block report
    -locations  print out locations for every block
    -racks  print out network topology for data-node locations
    -storagepolicies    print out storage policy summary for the blocks
    -blockId    print out which file this blockId belongs to, locations (nodes, racks) of this block, and other diagnostics info (under replicated, corrupted or not, etc)

Please Note:
    1. By default fsck ignores files opened for write, use -openforwrite to report such files. They are usually  tagged CORRUPT or HEALTHY depending on their block allocation status
    2. Option -includeSnapshots should not be used for comparing stats, should be used only for HEALTH check, as this may contain duplicates if the same file present in both original fs tree and inside snapshots.

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

[root@node1 data]# 
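The usage grammar nests the reporting options as [-files [-blocks [-locations | -racks]]], so each extra level of detail adds one option on top of the previous ones. The sketch below wraps that nesting in a hypothetical helper, fsck_detail, which is not part of Hadoop; the level names summary, files, blocks and locations are illustrative assumptions.

```shell
#!/usr/bin/env bash
# fsck_detail (hypothetical helper, not part of Hadoop): runs `hdfs fsck` at one
# of four detail levels. The options are nested as in the usage grammar
# [-files [-blocks [-locations | -racks]]], so each level adds one option.
#   fsck_detail <path> [summary|files|blocks|locations]
fsck_detail() {
  local path=$1 level=${2:-summary}
  local opts=()
  case $level in
    summary)   ;;                                  # summary only, no extra options
    files)     opts=(-files) ;;
    blocks)    opts=(-files -blocks) ;;
    locations) opts=(-files -blocks -locations) ;;
    *) echo "unknown level: $level" >&2; return 2 ;;
  esac
  hdfs fsck "$path" "${opts[@]}"
}

# Usage: fsck_detail input/ncdc.txt locations
```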

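3. Viewing basic block information

Without extra options, fsck prints only a summary for the path: total size, number of blocks and average block size, plus replication and health status. A relative path such as input/cite75_99.txt is resolved against the user's HDFS home directory, /user/root in this case, as the "FSCK started" line shows.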

hdfs fsck input/cite75_99.txt

[root@node1 data]# hdfs fsck input/cite75_99.txt
Connecting to namenode via http://node1:50070/fsck?ugi=root&path=%2Fuser%2Froot%2Finput%2Fcite75_99.txt
FSCK started by root (auth:SIMPLE) from /192.168.80.131 for path /user/root/input/cite75_99.txt at Thu Oct 05 09:41:58 EDT 2017
.Status: HEALTHY
 Total size:    264075431 B
 Total dirs:    0
 Total files:   1
 Total symlinks:        0
 Total blocks (validated):  2 (avg. block size 132037715 B)
 Minimally replicated blocks:   2 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    3
 Average block replication: 3.0
 Corrupt blocks:        0
 Missing replicas:      0 (0.0 %)
 Number of data-nodes:      3
 Number of racks:       1
FSCK ended at Thu Oct 05 09:41:58 EDT 2017 in 3 milliseconds


The filesystem under path '/user/root/input/cite75_99.txt' is HEALTHY
[root@node1 data]# hdfs fsck input/ncdc.txt
Connecting to namenode via http://node1:50070/fsck?ugi=root&path=%2Fuser%2Froot%2Finput%2Fncdc.txt
FSCK started by root (auth:SIMPLE) from /192.168.80.131 for path /user/root/input/ncdc.txt at Thu Oct 05 09:42:22 EDT 2017
.Status: HEALTHY
 Total size:    871353053 B
 Total dirs:    0
 Total files:   1
 Total symlinks:        0
 Total blocks (validated):  7 (avg. block size 124479007 B)
 Minimally replicated blocks:   7 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    3
 Average block replication: 3.0
 Corrupt blocks:        0
 Missing replicas:      0 (0.0 %)
 Number of data-nodes:      3
 Number of racks:       1
FSCK ended at Thu Oct 05 09:42:22 EDT 2017 in 2 milliseconds


The filesystem under path '/user/root/input/ncdc.txt' is HEALTHY
[root@node1 data]# 
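The summary figures follow from simple arithmetic on the file size. The sketch below reproduces them under the assumption of a 128 MB block size (134217728 bytes); block_count, last_block_len and avg_block_size are hypothetical helper names. Bash arithmetic is integer-only, so the average is truncated, which matches the "avg. block size" values printed above.

```shell
#!/usr/bin/env bash
# Reproduces the numbers in the fsck summaries above, assuming a 128 MB block
# size (134217728 bytes). Bash arithmetic is integer-only, so avg_block_size
# truncates, which matches the "avg. block size" values fsck printed here.
BLOCK_SIZE=134217728

# Number of blocks: ceil(size / block size)
block_count() { echo $(( ($1 + BLOCK_SIZE - 1) / BLOCK_SIZE )); }

# Length of the final (possibly partial) block; 0 for an empty file
last_block_len() {
  local n
  n=$(block_count "$1")
  if (( n == 0 )); then echo 0; return; fi
  echo $(( $1 - (n - 1) * BLOCK_SIZE ))
}

# Average block size as reported by fsck: total size / number of blocks
avg_block_size() {
  local n
  n=$(block_count "$1")
  if (( n == 0 )); then echo 0; return; fi
  echo $(( $1 / n ))
}

for size in 264075431 871353053; do   # cite75_99.txt, ncdc.txt
  echo "size=$size blocks=$(block_count "$size")" \
       "last=$(last_block_len "$size") avg=$(avg_block_size "$size")"
done
```

For ncdc.txt this gives 7 blocks with a final block of 66046685 bytes, the same values that the per-block listing in section 4 shows.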

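4. The -files, -blocks and -locations options

-files prints each file being checked, -blocks adds a report line for every block of the file, and -locations adds the DataNodes that hold each replica of the block.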

hdfs fsck input/ncdc.txt -files -blocks
hdfs fsck input/ncdc.txt -files -blocks -locations

[root@node1 data]# hdfs fsck input/ncdc.txt -files -blocks
Connecting to namenode via http://node1:50070/fsck?ugi=root&files=1&blocks=1&path=%2Fuser%2Froot%2Finput%2Fncdc.txt
FSCK started by root (auth:SIMPLE) from /192.168.80.131 for path /user/root/input/ncdc.txt at Thu Oct 05 09:47:14 EDT 2017
/user/root/input/ncdc.txt 871353053 bytes, 7 block(s):  OK
0. BP-766589174-192.168.80.131-1500731607717:blk_1073742821_2026 len=134217728 repl=3
1. BP-766589174-192.168.80.131-1500731607717:blk_1073742822_2027 len=134217728 repl=3
2. BP-766589174-192.168.80.131-1500731607717:blk_1073742823_2028 len=134217728 repl=3
3. BP-766589174-192.168.80.131-1500731607717:blk_1073742824_2029 len=134217728 repl=3
4. BP-766589174-192.168.80.131-1500731607717:blk_1073742825_2030 len=134217728 repl=3
5. BP-766589174-192.168.80.131-1500731607717:blk_1073742826_2031 len=134217728 repl=3
6. BP-766589174-192.168.80.131-1500731607717:blk_1073742827_2032 len=66046685 repl=3

Status: HEALTHY
 Total size:    871353053 B
 Total dirs:    0
 Total files:   1
 Total symlinks:        0
 Total blocks (validated):  7 (avg. block size 124479007 B)
 Minimally replicated blocks:   7 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    3
 Average block replication: 3.0
 Corrupt blocks:        0
 Missing replicas:      0 (0.0 %)
 Number of data-nodes:      3
 Number of racks:       1
FSCK ended at Thu Oct 05 09:47:14 EDT 2017 in 2 milliseconds


The filesystem under path '/user/root/input/ncdc.txt' is HEALTHY
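Each line of the -blocks listing has the form <index>. <block pool ID>:blk_<block ID>_<generation stamp> len=<bytes> repl=<replicas>. As a consistency check, the len= values should add up to the size in the header line: for ncdc.txt, six full 134217728-byte blocks plus a final 66046685-byte block give 871353053 bytes. The sketch below, built around a hypothetical helper summarize_blocks, prints each block ID with the generation stamp removed and sums the lengths. Dropping the generation stamp assumes that hdfs fsck -blockId expects the blk_<block ID> form.

```shell
#!/usr/bin/env bash
# summarize_blocks (hypothetical helper): reads `hdfs fsck <path> -files -blocks`
# output on stdin and prints "<block ID> <length>" for each block, where the
# block ID has its generation-stamp suffix removed, then a "total <bytes>" line.
summarize_blocks() {
  awk '
    /^[0-9]+\. .*:blk_[0-9]+_[0-9]+ len=/ {
      split($2, id, ":")            # id[2] = blk_<id>_<genstamp>
      sub(/_[0-9]+$/, "", id[2])    # drop the generation stamp
      sub(/^len=/, "", $3)          # keep only the byte count
      print id[2], $3
      total += $3
    }
    END { printf "total %d\n", total }'
}

# Usage: hdfs fsck input/ncdc.txt -files -blocks | summarize_blocks
```

A block ID from this list can then be passed back to fsck, for example hdfs fsck -blockId blk_1073742821, to see which file it belongs to and where its replicas are.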
[root@node1 data]# hdfs fsck input/ncdc.txt -files -blocks -locations
Connecting to namenode via http://node1:50070/fsck?ugi=root&files=1&blocks=1&locations=1&path=%2Fuser%2Froot%2Finput%2Fncdc.txt
FSCK started by root (auth:SIMPLE) from /192.168.80.131 for path /user/root/input/ncdc.txt at Thu Oct 05 09:47:45 EDT 2017
/user/root/input/ncdc.txt 871353053 bytes, 7 block(s):  OK
0. BP-766589174-192.168.80.131-1500731607717:blk_1073742821_2026 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.80.131:50010,DS-602e79bf-d01e-4b6b-8712-f6293e394ab1,DISK], DatanodeInfoWithStorage[192.168.80.133:50010,DS-0056ec91-47b7-4c48-8f6e-89ca33be49c6,DISK], DatanodeInfoWithStorage[192.168.80.132:50010,DS-d3917eb8-31b4-49d6-b5eb-1316f7c0f310,DISK]]
1. BP-766589174-192.168.80.131-1500731607717:blk_1073742822_2027 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.80.131:50010,DS-602e79bf-d01e-4b6b-8712-f6293e394ab1,DISK], DatanodeInfoWithStorage[192.168.80.132:50010,DS-d3917eb8-31b4-49d6-b5eb-1316f7c0f310,DISK], DatanodeInfoWithStorage[192.168.80.133:50010,DS-0056ec91-47b7-4c48-8f6e-89ca33be49c6,DISK]]
2. BP-766589174-192.168.80.131-1500731607717:blk_1073742823_2028 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.80.132:50010,DS-d3917eb8-31b4-49d6-b5eb-1316f7c0f310,DISK], DatanodeInfoWithStorage[192.168.80.133:50010,DS-0056ec91-47b7-4c48-8f6e-89ca33be49c6,DISK], DatanodeInfoWithStorage[192.168.80.131:50010,DS-602e79bf-d01e-4b6b-8712-f6293e394ab1,DISK]]
3. BP-766589174-192.168.80.131-1500731607717:blk_1073742824_2029 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.80.131:50010,DS-602e79bf-d01e-4b6b-8712-f6293e394ab1,DISK], DatanodeInfoWithStorage[192.168.80.132:50010,DS-d3917eb8-31b4-49d6-b5eb-1316f7c0f310,DISK], DatanodeInfoWithStorage[192.168.80.133:50010,DS-0056ec91-47b7-4c48-8f6e-89ca33be49c6,DISK]]
4. BP-766589174-192.168.80.131-1500731607717:blk_1073742825_2030 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.80.131:50010,DS-602e79bf-d01e-4b6b-8712-f6293e394ab1,DISK], DatanodeInfoWithStorage[192.168.80.132:50010,DS-d3917eb8-31b4-49d6-b5eb-1316f7c0f310,DISK], DatanodeInfoWithStorage[192.168.80.133:50010,DS-0056ec91-47b7-4c48-8f6e-89ca33be49c6,DISK]]
5. BP-766589174-192.168.80.131-1500731607717:blk_1073742826_2031 len=134217728 repl=3 [DatanodeInfoWithStorage[192.168.80.131:50010,DS-602e79bf-d01e-4b6b-8712-f6293e394ab1,DISK], DatanodeInfoWithStorage[192.168.80.132:50010,DS-d3917eb8-31b4-49d6-b5eb-1316f7c0f310,DISK], DatanodeInfoWithStorage[192.168.80.133:50010,DS-0056ec91-47b7-4c48-8f6e-89ca33be49c6,DISK]]
6. BP-766589174-192.168.80.131-1500731607717:blk_1073742827_2032 len=66046685 repl=3 [DatanodeInfoWithStorage[192.168.80.131:50010,DS-602e79bf-d01e-4b6b-8712-f6293e394ab1,DISK], DatanodeInfoWithStorage[192.168.80.133:50010,DS-0056ec91-47b7-4c48-8f6e-89ca33be49c6,DISK], DatanodeInfoWithStorage[192.168.80.132:50010,DS-d3917eb8-31b4-49d6-b5eb-1316f7c0f310,DISK]]

Status: HEALTHY
 Total size:    871353053 B
 Total dirs:    0
 Total files:   1
 Total symlinks:        0
 Total blocks (validated):  7 (avg. block size 124479007 B)
 Minimally replicated blocks:   7 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    3
 Average block replication: 3.0
 Corrupt blocks:        0
 Missing replicas:      0 (0.0 %)
 Number of data-nodes:      3
 Number of racks:       1
FSCK ended at Thu Oct 05 09:47:45 EDT 2017 in 2 milliseconds


The filesystem under path '/user/root/input/ncdc.txt' is HEALTHY
[root@node1 data]#
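With -locations, each block is followed by the DataNodes holding its replicas, given as IP:port, storage ID and storage type. To see how replicas are spread across the cluster, the sketch below counts how many replicas each DataNode holds. replicas_per_datanode is a hypothetical helper name, and grep -o is assumed to be available, as it is in GNU and BSD grep. For ncdc.txt it reports 7 replicas on each of the three DataNodes, as expected with 7 blocks, a replication factor of 3 and 3 DataNodes.

```shell
#!/usr/bin/env bash
# replicas_per_datanode (hypothetical helper): reads
# `hdfs fsck <path> -files -blocks -locations` output on stdin and prints
# "<datanode ip:port> <replica count>" for every DataNode, sorted by address.
replicas_per_datanode() {
  grep -o 'DatanodeInfoWithStorage\[[^],]*' |   # e.g. DatanodeInfoWithStorage[192.168.80.131:50010
    sed 's/^DatanodeInfoWithStorage\[//' |      # keep only ip:port
    sort | uniq -c |                            # count occurrences per DataNode
    awk '{ print $2, $1 }'                      # reorder to "ip:port count"
}

# Usage: hdfs fsck input/ncdc.txt -files -blocks -locations | replicas_per_datanode
```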
