The Hadoop version is 2.8.3.
Today I ran into an odd problem: as shown in List-1, HDFS reported two missing blocks.
List-1
There are 2 missing blocks. The following files may be corrupted:
blk_1073857294 /tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f098ecb5a/hive-exec-2.1.1.jar
blk_1073857295 /tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f098ecb5a/hive-hcatalog-core-3.0.0.jar
Please check the logs or run fsck in order to identify the missing blocks. See the Hadoop FAQ for common causes and potential solutions.
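Because these files are under /tmp and are not real business data, we simply deleted them, as shown in List-2. Afterwards the HDFS web page no longer showed the problem. Note that the command was run without a path argument, so fsck scanned the whole namespace from / (path=%2F in the output), and -delete applies to every corrupt file it finds.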
List-2
[xx@xxx hadoop]# hadoop fsck -delete
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Connecting to namenode via http://xxxx:50070/fsck?ugi=root&delete=1&path=%2F
FSCK started by root (auth:SIMPLE) from /10.42.5.26 for path / at Wed Mar 25 12:35:39 CST 2020
..............................................................................
/tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f098ecb5a/hive-exec-2.1.1.jar: CORRUPT blockpool BP-604784226-10.42.1.102-1577681916881 block blk_1073857294
/tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f098ecb5a/hive-exec-2.1.1.jar: MISSING 1 blocks of total size 32441258 B..
/tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f098ecb5a/hive-hcatalog-core-3.0.0.jar: CORRUPT blockpool BP-604784226-10.42.1.102-1577681916881 block blk_1073857295
/tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f098ecb5a/hive-hcatalog-core-3.0.0.jar: MISSING 1 blocks of total size 269009 B......................
...
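Since `-delete` removes every corrupt file under the scanned path, it may be worth reviewing the affected files first and limiting the scan to /tmp. The following is a minimal sketch, not the procedure used above: `corrupt_paths` is a hypothetical helper, and it assumes fsck prints one `<path>: CORRUPT ...` or `<path>: MISSING ...` line per problem, as in List-2, and that paths do not contain `: `.

```shell
#!/usr/bin/env bash
# corrupt_paths (hypothetical helper): read fsck output on stdin and print
# each file path that appears on a CORRUPT or MISSING line, once per file.
# Only lines that start with "/" are considered, so summary lines such as
# "Status: CORRUPT" are ignored.
corrupt_paths() {
    awk -F': ' '/^\/.*: (CORRUPT|MISSING) / && !seen[$1]++ { print $1 }'
}

# Intended use on a cluster (left as comments so nothing is deleted by accident):
#   hdfs fsck /tmp | corrupt_paths      # review which files are affected
#   hdfs fsck /tmp -delete              # then delete corrupt files under /tmp only
```

Using `hdfs fsck` rather than `hadoop fsck` also avoids the deprecation warning shown in List-2.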
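Root cause analysis:
HDFS stores file contents as blocks, here blk_1073857294 and blk_1073857295. The block data had been removed while the NameNode metadata for the files remained, so HDFS reported the blocks as missing. Since I no longer needed this data, deleting the metadata of the affected files as well was enough to clear the error.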
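One way to check this diagnosis before deleting anything would be to ask fsck which blocks the NameNode maps to a file and where their replicas are expected to be. This is a sketch under assumptions: `inspect_blocks` and the `HDFS_BIN` variable are hypothetical names introduced here, while `-files`, `-blocks` and `-locations` are standard fsck options.

```shell
#!/usr/bin/env bash
# inspect_blocks (hypothetical helper): print the block IDs and replica
# locations that the NameNode has recorded for one file.
# HDFS_BIN is an assumed override for the hdfs executable, defaulting to "hdfs".
inspect_blocks() {
    "${HDFS_BIN:-hdfs}" fsck "$1" -files -blocks -locations
}

# Intended use on a cluster:
#   inspect_blocks /tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f098ecb5a/hive-exec-2.1.1.jar
```

If the file's block is reported as missing here, the metadata exists without any live replica behind it, which matches the situation described above.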