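The NameNode stores the file tree and the file -> block mapping (each file is an inode). It does not persist the block -> location mapping; that is rebuilt from the block reports the DataNodes send.
A file is split into blocks (block1, block2, ...; 128 MB each). The blocks are stored on different DataNodes, and each block is kept as several replicas.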
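To make the file -> block split concrete, here is a minimal sketch. The 128 MB block size comes from the notes above; the replica count of 3 and the round-robin `place_replicas` helper are assumptions for illustration only (real HDFS placement is rack-aware and decided by the NameNode).

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB block size, as in the notes above
REPLICATION = 3                 # assumed replica count for this sketch


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the length in bytes of each block of a file of file_size bytes."""
    if file_size <= 0:
        return []
    full, rest = divmod(file_size, block_size)
    sizes = [block_size] * full
    if rest:
        sizes.append(rest)  # the last block only holds the remaining bytes
    return sizes


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Hypothetical round-robin placement, NOT the real HDFS policy."""
    n = min(replication, len(datanodes))
    return [
        [datanodes[(b + r) % len(datanodes)] for r in range(n)]
        for b in range(num_blocks)
    ]


sizes = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
print([s // (1024 * 1024) for s in sizes])     # block sizes in MB
print(place_replicas(len(sizes), ["node1", "node2", "node3", "node4"]))
```

A 300 MB file ends up as two full blocks plus one smaller block, and each block is listed with the nodes that would hold its replicas.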
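(1) secondaryNamenode: At first I understood the SN the way most people do. I assumed it was a real-time hot standby for the NN (NameNode) that provides HA, and I even got this wrong in a written test, which was embarrassing. After reading up on it later, I found that this is not the case: the SN's main job is to merge the edits log into the fsimage, which takes load off the NN. The misunderstanding is not entirely unreasonable, though, because current versions of Hadoop do support a real-time standby for HA. A later chapter covers that.
(2) fsImage and edits: I would rather not dwell on concepts here, but what needs saying needs saying. In short: the fsimage holds the serialized inode information for every directory and file in the Hadoop file system. For a file this includes the modification time, access time, block size, and the blocks that make up the file. For a directory it mainly includes the modification time and access permissions. The edits file records client operations on files, such as uploading a new file, and it is periodically merged into the fsimage.
Write operation: the client first asks the NameNode. For example, it says it wants to write 128 MB, and the NameNode assigns it 3 nodes. The data is then written through a pipeline in 64 KB packets, with each packet acknowledged as the write proceeds.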
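The pipelined write described above can be sketched with a toy in-memory model. The `DataNode` class, `build_pipeline`, and `write_block` are hypothetical names made up for this illustration, not Hadoop APIs. The sketch also waits for each acknowledgement before sending the next packet, which is a simplification; it only shows the idea that every packet passes through all three nodes and is confirmed on the way back.

```python
PACKET_SIZE = 64 * 1024  # 64 KB packets, as described above


class DataNode:
    """Toy in-memory DataNode that stores packets and forwards them downstream."""

    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream
        self.stored = bytearray()

    def receive(self, packet):
        self.stored.extend(packet)
        # Forward to the next node; its return value is the ack travelling back.
        if self.downstream is not None:
            return self.downstream.receive(packet)
        return True  # the last node in the pipeline starts the ack


def build_pipeline(names):
    """Chain the nodes so that names[0] is the one the client writes to."""
    node = None
    for name in reversed(names):
        node = DataNode(name, downstream=node)
    return node


def write_block(data, first_node, packet_size=PACKET_SIZE):
    """Send data packet by packet and return the number of acknowledged packets."""
    acked = 0
    for offset in range(0, len(data), packet_size):
        if not first_node.receive(data[offset:offset + packet_size]):
            raise IOError("packet at offset %d was not acknowledged" % offset)
        acked += 1
    return acked


head = build_pipeline(["node1", "node2", "node3"])  # nodes assigned by the NameNode
payload = bytes(200 * 1024)                          # 200 KB of zero bytes
print(write_block(payload, head))                    # number of packets sent
```

After the call, every node in the chain holds the same bytes, which is the property the replication pipeline is meant to guarantee.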
Trash directory: /user/<username>/.Trash. The retention time is configured in core-site.xml through fs.trash.interval, in minutes: 1440 = 24*60, i.e. one day.
<property>
<name>fs.trash.interval</name>
<value>1440</value>
</property>
<property>
<name>fs.trash.checkpoint.interval</name>
<value>1440</value>
</property>
Lifecycle of a deleted file: .Trash/Current --> .Trash/1799998 (checkpoint) --> deleted
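As a small worked example of the 24*60 arithmetic, the sketch below builds a property element like the ones above and computes a lower bound for when a trashed file can be purged. The helper names `trash_property` and `earliest_purge` are made up for this illustration. The actual purge happens later than this bound, at whatever moment the trash checkpoint run next notices the expired checkpoint.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta


def trash_property(name, minutes):
    """Build one <property> element in the style of core-site.xml."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(minutes)
    return ET.tostring(prop, encoding="unicode")


def earliest_purge(deleted_at, interval_minutes):
    """Lower bound for the purge time; the real purge waits for a checkpoint run."""
    return deleted_at + timedelta(minutes=interval_minutes)


one_day = 24 * 60  # 1440 minutes
print(trash_property("fs.trash.interval", one_day))
print(earliest_purge(datetime(2018, 11, 7, 6, 0), one_day))
```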
Delete without going through the trash: hadoop fs -rm -skipTrash <path>
Cluster status (output of hdfs dfsadmin -report):
Configured Capacity: 57585672192 (53.63 GB)
Present Capacity: 26695249920 (24.86 GB)
DFS Remaining: 26663706624 (24.83 GB)
DFS Used: 31543296 (30.08 MB)
DFS Used%: 0.12%
Under replicated blocks: 2
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (3):
Name: 172.18.0.83:50010 (node2.hadoopnet)
Hostname: node2
Decommission Status : Normal
Configured Capacity: 19195224064 (17.88 GB)
DFS Used: 10506240 (10.02 MB)
Non DFS Used: 10296815616 (9.59 GB)
DFS Remaining: 8887902208 (8.28 GB)
DFS Used%: 0.05%
DFS Remaining%: 46.30%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Nov 07 06:01:34 UTC 2018
Name: 172.18.0.82:50010 (node1.hadoopnet)
Hostname: node1
Decommission Status : Normal
Configured Capacity: 19195224064 (17.88 GB)
DFS Used: 10514432 (10.03 MB)
Non DFS Used: 10296807424 (9.59 GB)
DFS Remaining: 8887902208 (8.28 GB)
DFS Used%: 0.05%
DFS Remaining%: 46.30%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Nov 07 06:01:33 UTC 2018
Name: 172.18.0.81:50010 (master)
Hostname: master
Decommission Status : Normal
Configured Capacity: 19195224064 (17.88 GB)
DFS Used: 10522624 (10.04 MB)
Non DFS Used: 10296799232 (9.59 GB)
DFS Remaining: 8887902208 (8.28 GB)
DFS Used%: 0.05%
DFS Remaining%: 46.30%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Nov 07 06:01:34 UTC 2018
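The figures in this report can be cross-checked with a few lines of arithmetic. The sketch below copies the per-node numbers from the output above and recomputes the cluster totals and percentages. One observation from these particular numbers (not a statement about Hadoop internals): the cluster-wide DFS Used% of 0.12% matches DFS Used divided by Present Capacity, while the per-node 0.05% matches DFS Used divided by Configured Capacity.

```python
GIB = 1024 ** 3

# (configured capacity, DFS used, DFS remaining) in bytes, copied from the report
nodes = {
    "node2": (19195224064, 10506240, 8887902208),
    "node1": (19195224064, 10514432, 8887902208),
    "master": (19195224064, 10522624, 8887902208),
}


def pct(part, whole):
    """Percentage rounded to two decimals, like the report prints it."""
    return round(100.0 * part / whole, 2)


total_capacity = sum(cap for cap, _, _ in nodes.values())
total_used = sum(used for _, used, _ in nodes.values())
total_remaining = sum(rem for _, _, rem in nodes.values())
present_capacity = total_used + total_remaining

print("Configured Capacity:", total_capacity, "(%.2f GB)" % (total_capacity / GIB))
print("Present Capacity:", present_capacity)
print("DFS Remaining:", total_remaining)
print("DFS Used:", total_used)
print("DFS Used% (cluster):", pct(total_used, present_capacity))
for name, (cap, used, rem) in nodes.items():
    print(name, "used%:", pct(used, cap), "remaining%:", pct(rem, cap))
```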
root@master:/opt/hadoop-2.7.3# bin/hdfs fsck /jianshu/README.txt -files -blocks -locations
Connecting to namenode via http://master:50070/fsck?ugi=root&files=1&blocks=1&locations=1&path=%2Fjianshu%2FREADME.txt
FSCK started by root (auth:SIMPLE) from /172.18.0.81 for path /jianshu/README.txt at Tue Nov 06 14:56:22 UTC 2018
/jianshu/README.txt 1366 bytes, 1 block(s): OK
0. BP-2032280425-172.18.0.81-1541342618850:blk_1073741937_1113 len=1366 repl=3 [DatanodeInfoWithStorage[172.18.0.81:50010,DS-c75460d7-8bf9-486b-bec6-09da001483f1,DISK], DatanodeInfoWithStorage[172.18.0.82:50010,DS-8e79637d-5f3e-4af3-91f2-7bff58dad142,DISK], DatanodeInfoWithStorage[172.18.0.83:50010,DS-93524e7f-0fc4-4b87-ba8b-6c9a294b47ed,DISK]]
Status: HEALTHY
Total size: 1366 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 1366 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Tue Nov 06 14:56:22 UTC 2018 in 0 milliseconds
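The block line in the fsck output is the interesting part: it is where the block -> location mapping becomes visible. A minimal sketch for pulling it apart is shown below. The regular expressions are written against the single line pasted above and are an assumption about its layout, not a documented format, so they may need adjusting for other Hadoop versions.

```python
import re

# Block line copied from the fsck output above.
block_line = (
    "0. BP-2032280425-172.18.0.81-1541342618850:blk_1073741937_1113 len=1366 repl=3 "
    "[DatanodeInfoWithStorage[172.18.0.81:50010,DS-c75460d7-8bf9-486b-bec6-09da001483f1,DISK], "
    "DatanodeInfoWithStorage[172.18.0.82:50010,DS-8e79637d-5f3e-4af3-91f2-7bff58dad142,DISK], "
    "DatanodeInfoWithStorage[172.18.0.83:50010,DS-93524e7f-0fc4-4b87-ba8b-6c9a294b47ed,DISK]]"
)

BLOCK_RE = re.compile(
    r"(?P<pool>BP-[\w.-]+):(?P<block>blk_\d+)_(?P<genstamp>\d+) "
    r"len=(?P<len>\d+) repl=(?P<repl>\d+)"
)
LOCATION_RE = re.compile(
    r"DatanodeInfoWithStorage\[([\d.]+):(\d+),(DS-[0-9a-f-]+),(\w+)\]"
)


def parse_block_line(line):
    """Return block id, length, replication and replica locations of one fsck line."""
    m = BLOCK_RE.search(line)
    if m is None:
        raise ValueError("not an fsck block line")
    locations = [
        {"host": host, "port": int(port), "storage_id": sid, "storage_type": stype}
        for host, port, sid, stype in LOCATION_RE.findall(line)
    ]
    return {
        "pool": m.group("pool"),
        "block": m.group("block"),
        "length": int(m.group("len")),
        "replication": int(m.group("repl")),
        "locations": locations,
    }


info = parse_block_line(block_line)
print(info["block"], info["length"], info["replication"])
print([loc["host"] for loc in info["locations"]])
```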
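Adding a node: add the node's address to the slaves file on the NameNode (not needed for a test setup), then run on the NameNode:
hdfs dfsadmin -refreshNodes
Removing a node dynamically: add the node's name to etc/hadoop/dfs.exclude, then edit hdfs-site.xml (Hadoop expects the full pathname of the exclude file in this value):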
<property>
<name>dfs.hosts.exclude</name>
<value>etc/hadoop/dfs.exclude</value>
</property>
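Then run hdfs dfsadmin -refreshNodes. In the Web UI the node's status shows as Decommission in progress. When it changes to Decommissioned, the process is finished and the host can be taken out of service.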
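The decommission steps can also be scripted. The sketch below is only an outline under stated assumptions: `add_to_exclude` and `decommission_status` are hypothetical helpers, the exclude file is written to a temporary directory rather than a real Hadoop config directory, and the refresh command is only built as a list, not executed. The status is read from text in the layout of the hdfs dfsadmin -report output, where each node has a "Hostname" line followed by a "Decommission Status" line.

```python
import tempfile
from pathlib import Path

REFRESH_CMD = ["hdfs", "dfsadmin", "-refreshNodes"]  # run this after editing the file


def add_to_exclude(exclude_file, hostname):
    """Append hostname to the exclude file; return False if it is already listed."""
    path = Path(exclude_file)
    hosts = path.read_text().split() if path.exists() else []
    if hostname in hosts:
        return False
    hosts.append(hostname)
    path.write_text("\n".join(hosts) + "\n")
    return True


def decommission_status(report_text, hostname):
    """Find the 'Decommission Status' of hostname in dfsadmin -report style text."""
    current = None
    for line in report_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Hostname":
            current = value
        elif key == "Decommission Status" and current == hostname:
            return value
    return None


with tempfile.TemporaryDirectory() as tmp:
    exclude = Path(tmp) / "dfs.exclude"  # stand-in for etc/hadoop/dfs.exclude
    print(add_to_exclude(exclude, "node2"))
    print(exclude.read_text().split())

sample = "Hostname: node2\nDecommission Status : Decommission in progress\n"
print(decommission_status(sample, "node2"))
```

In a real run, one would execute REFRESH_CMD after updating the file and poll the report until the status reads Decommissioned before taking the host down.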
Disk utilization across the HDFS cluster is unbalanced.