Observing the ext4 File System from the On-Disk Storage Perspective

Author: 用户4700054
Published: 2023-02-26 14:41:25
Part of the column: 存储内核技术交流

Basic Concept

block

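  • The file system divides the disk into block groups, and each block group is in turn divided into blocks. A block is a contiguous region of disk space and is the basic unit in which space is allocated to files. With the default file system parameters, each block is 4 KiB. A file created with touch and never written to has no blocks allocated; once data is written, space is allocated to the file in whole blocks.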
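# Create an empty file
$ touch /mnt/ext4/a.txt
# Blocks is the number of 512-byte units allocated to the file (not a sector size), and IO Block is the preferred I/O size (4 KiB); the empty file has 0 units allocated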
$ stat /mnt/ext4/a.txt 
  File: /mnt/ext4/a.txt
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: 840h/2112d      Inode: 12          Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-10-03 01:54:44.733048293 +0000
Modify: 2022-10-03 01:54:44.733048293 +0000
Change: 2022-10-03 01:54:44.733048293 +0000
 Birth: 2022-10-03 01:54:44.733048293 +0000

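# Write a few bytes to the file
$ echo -n "hello" > /mnt/ext4/a.txt
# ext4 has now allocated 8 units of 512 bytes each: 8 × 512 = 4096 bytes, exactly one 4 KiB file system block (the same value as IO Block here)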
$ stat /mnt/ext4/a.txt 
  File: /mnt/ext4/a.txt
  Size: 5               Blocks: 8          IO Block: 4096   regular file
Device: 840h/2112d      Inode: 12          Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-10-03 01:54:44.733048293 +0000
Modify: 2022-10-03 01:55:13.578830808 +0000
Change: 2022-10-03 01:55:13.578830808 +0000
 Birth: 2022-10-03 01:54:44.733048293 +0000

# Inspect the file's block mapping: a.txt has a single extent that is one block long, because a few bytes fit in one data block
$ filefrag -v /mnt/ext4/a.txt 
Filesystem type is: ef53
File size of /mnt/ext4/a.txt is 5 (1 block of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       0:    1015808..   1015808:      1:             last,eof
/mnt/ext4/a.txt: 1 extent found
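
The relationship between the Blocks column of stat and file system blocks can be sketched in Python. This is only a minimal illustration, assuming the 512-byte unit that stat uses and the 4096-byte block size shown in the output above; the helper name `fs_blocks_allocated` is hypothetical.

```python
STAT_UNIT = 512  # stat reports "Blocks" in 512-byte units


def fs_blocks_allocated(stat_blocks, fs_block_size=4096):
    """Convert stat's 512-byte unit count into whole file system blocks."""
    allocated_bytes = stat_blocks * STAT_UNIT
    # Ceiling division, in case the count does not fall on a block boundary.
    return -(-allocated_bytes // fs_block_size)


if __name__ == "__main__":
    print(fs_blocks_allocated(0))  # the empty file from the first stat call
    print(fs_blocks_allocated(8))  # the file after writing "hello"
```

On a live system the input would typically come from `os.stat(path).st_blocks`.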

block group

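  • A block group is a set of contiguous blocks. By default a block group contains 8 × block_size blocks, i.e. 32768 blocks when the block size is 4 KiB, because a single bitmap block (one bit per block) tracks all the blocks of the group.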
$ dumpe2fs /dev/sde |egrep 'Blocks per group|Block size'
dumpe2fs 1.46.5 (30-Dec-2021)
Block size:               4096
# Blocks per group: 32768 blocks of 4 KiB each, so one block group spans 32768 × 4 KiB = 128 MiB
Blocks per group:         32768
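
The block group arithmetic can be checked with a short Python sketch. It assumes the default mke2fs rule of 8 × block_size blocks per group and a first data block of 0 (the case for 4 KiB blocks); the total of 1048576 blocks is the figure printed by mkfs.ext4 in the superblock section, and the function names are hypothetical.

```python
def block_group_geometry(block_size, total_blocks):
    """Return (blocks_per_group, bytes_per_group, group_count)."""
    # One bitmap block has block_size * 8 bits, one bit per block in the group.
    blocks_per_group = 8 * block_size
    bytes_per_group = blocks_per_group * block_size
    group_count = -(-total_blocks // blocks_per_group)  # ceiling division
    return blocks_per_group, bytes_per_group, group_count


def group_of_block(physical_block, blocks_per_group):
    """Block group that contains a physical block (first data block = 0)."""
    return physical_block // blocks_per_group


if __name__ == "__main__":
    per_group, group_bytes, groups = block_group_geometry(4096, 1048576)
    print(per_group, group_bytes // (1024 * 1024), groups)
    # Physical block of a.txt as reported by filefrag
    print(group_of_block(1015808, per_group))
```

Under these assumptions the single data block of a.txt (physical block 1015808) falls at the start of the last block group of this device.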

inode

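An inode is the unique identifier of a file within the file system and records how the file's logical blocks map to physical blocks on disk. It stores the access, modify, change, and creation (birth) timestamps, the permissions, and, most importantly, which blocks belong to the file. Note that in ext4, when a file is deleted its inode can be reclaimed and reused by a new file; in the session below, b.ext receives inode 12 after a.txt is removed.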

$ stat /mnt/ext4/a.txt 
  File: /mnt/ext4/a.txt
  Size: 5               Blocks: 8          IO Block: 4096   regular file
Device: 840h/2112d      Inode: 12          Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-10-03 01:54:44.733048293 +0000
Modify: 2022-10-03 01:55:13.578830808 +0000
Change: 2022-10-03 01:55:13.578830808 +0000
 Birth: 2022-10-03 01:54:44.733048293 +0000
$ rm -rf /mnt/ext4/a.txt 
$ touch /mnt/ext4/b.ext
$ stat /mnt/ext4/b.ext 
  File: /mnt/ext4/b.ext
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: 840h/2112d      Inode: 12          Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-10-03 02:23:49.608093750 +0000
Modify: 2022-10-03 02:23:49.608093750 +0000
Change: 2022-10-03 02:23:49.608093750 +0000
 Birth: 2022-10-03 02:23:49.608093750 +0000
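
The inode fields printed by stat are also reachable from a program. The following Python sketch reads them through `os.stat` on Unix-like systems; it is an illustration only, the helper name `describe_inode` is hypothetical, and the temporary file stands in for a.txt.

```python
import os
import stat
import tempfile


def describe_inode(path):
    """Collect the inode fields that the stat command prints for a file."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,                 # inode number
        "mode": stat.filemode(st.st_mode),  # e.g. -rw-r--r--
        "links": st.st_nlink,               # hard link count
        "size": st.st_size,                 # size in bytes
        "blocks_512": st.st_blocks,         # allocated 512-byte units
        "mtime": st.st_mtime,               # last modification time
    }


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"hello")
        tmp.flush()
        print(describe_inode(tmp.name))
```

Whether a newly created file gets the inode number of a file that was just deleted depends on the allocator state, so the sketch only reports the number and does not assert reuse.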

superblock

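  • The superblock stores the metadata of the whole ext4 file system. If the superblock is damaged the file system cannot be accessed, so several backup copies are kept. ext4 uses the sparse_super feature, which places backups only in selected block groups rather than in every group.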
$ mkfs.ext4 /dev/sde
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done                            
Creating filesystem with 1048576 4k blocks and 262144 inodes
Filesystem UUID: a670d8f3-a153-43d3-a8c7-e4d67ecb2c63
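# Superblock backups: with sparse_super, copies go into block group 1 and into groups whose number is a power of 3, 5, or 7; at 32768 blocks per group these are groups 1, 3, 5, 7, 9, 25, and 27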
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done 
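  • Dump the superblock locations that sparse_super produced on the ext4 file system. INODE_UNINIT means the group's inode bitmap and inode table have not been initialized; BLOCK_UNINIT means the block bitmap has not been initialized.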
$ dumpe2fs /dev/sdf |grep superblock -B1
dumpe2fs 1.46.5 (30-Dec-2021)
Group 0: (Blocks 0-32767) csum 0xceed [ITABLE_ZEROED]
  Primary superblock at 0, Group descriptors at 1-1
--
Group 1: (Blocks 32768-65535) csum 0x35e5 [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Backup superblock at 32768, Group descriptors at 32769-32769
--
Group 3: (Blocks 98304-131071) csum 0x7ae0 [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Backup superblock at 98304, Group descriptors at 98305-98305
--
Group 5: (Blocks 163840-196607) csum 0xa0ab [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Backup superblock at 163840, Group descriptors at 163841-163841
--
Group 7: (Blocks 229376-262143) csum 0x921b [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Backup superblock at 229376, Group descriptors at 229377-229377
--
Group 9: (Blocks 294912-327679) csum 0x6988 [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Backup superblock at 294912, Group descriptors at 294913-294913
--
Group 25: (Blocks 819200-851967) csum 0x220e [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Backup superblock at 819200, Group descriptors at 819201-819201
--
Group 27: (Blocks 884736-917503) csum 0x68a9 [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Backup superblock at 884736, Group descriptors at 884737-884737
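
The backup locations listed by mkfs.ext4 and dumpe2fs can be reproduced with a small Python sketch of the sparse_super rule (group 1 plus groups numbered by powers of 3, 5, and 7). It assumes 32768 blocks per group and a first data block of 0, as on this device; the function names are hypothetical.

```python
def sparse_super_backup_groups(group_count):
    """Block groups that hold a backup superblock under sparse_super."""
    groups = set()
    if group_count > 1:
        groups.add(1)
    for base in (3, 5, 7):
        power = base
        while power < group_count:
            groups.add(power)
            power *= base
    return sorted(groups)


def backup_superblock_blocks(group_count, blocks_per_group=32768):
    """First block of every group that carries a backup superblock."""
    return [g * blocks_per_group for g in sparse_super_backup_groups(group_count)]


if __name__ == "__main__":
    # 1048576 blocks / 32768 blocks per group = 32 groups
    print(sparse_super_backup_groups(32))
    print(backup_superblock_blocks(32))
```

Group 0 is left out of the result because it holds the primary superblock rather than a backup.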
file on disk

extent

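  • To keep reads and writes of large files efficient, ext4 maps file blocks with an extent tree, a B-tree-like structure whose entries each describe a contiguous run of blocks. This reduces the amount of metadata needed for large files and speeds up block lookup and update.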
# ==================== ext4 file system ====================
# A 180M file named file1 has been created
[root@ubuntu /mnt/ext4]$ ls -lih
total 180M
12 -rw-r--r-- 1 root root    0 Oct  3 02:23 b.ext
13 -rw-r--r-- 1 root root 180M Oct  3 03:02 file1
11 drwx------ 2 root root  16K Oct  3 01:47 lost+found

# Inspect the extents allocated to file1 on ext4: the 180M file uses 2 extents
[root@ubuntu /mnt/ext4]$ debugfs -R 'stat <13>' /dev/sde
debugfs 1.46.5 (30-Dec-2021)
Inode: 13   Type: regular    Mode:  0644   Flags: 0x80000
Generation: 3968155522    Version: 0x00000000:00000001
User:     0   Group:     0   Project:     0   Size: 188477052
File ACL: 0
Links: 1   Blockcount: 368120
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x633a50bc:8166c4c0 -- Mon Oct  3 03:02:20 2022
 atime: 0x633a50bc:056bd75c -- Mon Oct  3 03:02:20 2022
 mtime: 0x633a50bc:8166c4c0 -- Mon Oct  3 03:02:20 2022
crtime: 0x633a50bc:056bd75c -- Mon Oct  3 03:02:20 2022
Size of extra inode fields: 32
Inode checksum: 0xad1ad685
EXTENTS:
(0-32767):1015808-1048575, (32768-46014):983040-996286


# ==================== ext3 file system ====================
[root@ubuntu /mnt/ext3]$ ls -lih
total 180M
12 -rw-r--r-- 1 root root 180M Oct  3 03:02 file1
11 drwx------ 2 root root  16K Oct  3 03:01 lost+found
[root@ubuntu /mnt/ext3]$ debugfs -R 'stat <12>' /dev/sdf
debugfs 1.46.5 (30-Dec-2021)
Inode: 12   Type: regular    Mode:  0644   Flags: 0x0
Generation: 2196076409    Version: 0x00000000:00000001
User:     0   Group:     0   Project:     0   Size: 188477052
File ACL: 0
Links: 1   Blockcount: 368488
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x633a50bf:7d99e208 -- Mon Oct  3 03:02:23 2022
 atime: 0x633a50be:ec398a54 -- Mon Oct  3 03:02:22 2022
 mtime: 0x633a50bf:7d99e208 -- Mon Oct  3 03:02:23 2022
crtime: 0x633a50be:ec398a54 -- Mon Oct  3 03:02:22 2022
Size of extra inode fields: 32
BLOCKS:
(0-11):17408-17419, (IND):17178, (12-15):17420-17423, (16-63):17200-17247, (64-127):17280-17343, (128-255):17920-18047, (256-1023):18176-18943, (1024-1035):19456-19467, (DIND):17179, (IND):17180, (1036-2059):19468-20491, (IND):17181, (2060-3083):20492-21515, (IND):17182, (3084-4107):21516-22539, (IND):17183, (4108-5131):22540-23563, (IND):17344, (5132-6155):23564-24587, (IND):17345, (6156-7179):24588-25611, (IND):17346, (7180-8203):25612-26635, (IND):17347, (8204-9227):26636-27659, (IND):17348, (9228-10251):27660-28683, (IND):17349, (10252-11275):28684-29707, (IND):17350, (11276-12299):29708-30731, (IND):17351, (12300-13323):30732-31755, (IND):17352, (13324-14335):31756-32767, (14336-14347):34816-34827, (IND):17353, (14348-15371):34828-35851, (IND):17354, (15372-16395):35852-36875, (IND):17355, (16396-17419):36876-37899, (IND):17356, (17420-18443):37900-38923, (IND):17357, (18444-19467):38924-39947, (IND):17358, (19468-20491):39948-40971, (IND):17359, (20492-21515):40972-41995, (IND):17360, (21516-22539):41996-43019, (IND):17361, (22540-23563):43020-44043, (IND):17362, (23564-24587):44044-45067, (IND):17363, (24588-25611):45068-46091, (IND):17364, (25612-26635):46092-47115, (IND):17365, (26636-27659):47116-48139, (IND):17366, (27660-28683):48140-49163, (IND):17367, (28684-29707):49164-50187, (IND):17368, (29708-30731):50188-51211, (IND):17369, (30732-31755):51212-52235, (IND):17370, (31756-32779):52236-53259, (IND):17371, (32780-33803):53260-54283, (IND):17372, (33804-34827):54284-55307, (IND):17373, (34828-35851):55308-56331, (IND):17374, (35852-36875):56332-57355, (IND):17375, (36876-37899):57356-58379, (IND):17376, (37900-38923):58380-59403, (IND):17377, (38924-39947):59404-60427, (IND):17378, (39948-40971):60428-61451, (IND):17379, (40972-41995):61452-62475, (IND):17380, (41996-43019):62476-63499, (IND):17381, (43020-44043):63500-64523, (IND):17382, (44044-45055):64524-65535, (45056-45067):67584-67595, (IND):17383, (45068-46014):67596-68542
TOTAL: 46061
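
The difference between TOTAL: 46061 and the 46015 data blocks of the file comes from the (IND) and (DIND) mapping blocks in the ext3 listing. The Python sketch below estimates that overhead for the classic ext2/ext3 block map, assuming 12 direct pointers in the inode and 4-byte block pointers (1024 per 4 KiB indirect block); the function names are hypothetical.

```python
def ceil_div(a, b):
    return -(-a // b)


def data_blocks_for_size(size_bytes, block_size=4096):
    """Number of data blocks needed to hold size_bytes."""
    return ceil_div(size_bytes, block_size)


def ext3_mapping_blocks(data_blocks, block_size=4096):
    """Count indirect blocks needed to map data_blocks file blocks."""
    ptrs = block_size // 4                # block pointers per indirect block
    remaining = max(data_blocks - 12, 0)  # the first 12 blocks are direct
    total = 0
    for level in (1, 2, 3):               # single, double, triple indirect
        covered = min(remaining, ptrs ** level)
        if covered == 0:
            break
        # One mapping block per started subtree at every depth of this level.
        total += sum(ceil_div(covered, ptrs ** k) for k in range(1, level + 1))
        remaining -= covered
    if remaining:
        raise ValueError("file too large for a triple-indirect block map")
    return total


if __name__ == "__main__":
    blocks = data_blocks_for_size(188477052)  # Size reported by debugfs
    print(blocks, ext3_mapping_blocks(blocks))
```

For the 188477052-byte file this gives 46015 data blocks and 46 mapping blocks, which agrees with the TOTAL of 46061 reported by debugfs on ext3.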
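  • On ext4, file1 needs only 2 extents, so the 180M file carries far less mapping metadata. This is very different from ext3, where the same file needs 46 indirect blocks on top of its data blocks (TOTAL: 46061 versus 46015 data blocks).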
[root@ubuntu /mnt/ext4]$ ls -l -ihl
total 180M
12 -rw-r--r-- 1 root root    0 Oct  3 02:23 b.ext
13 -rw-r--r-- 1 root root 180M Oct  3 03:02 file1
11 drwx------ 2 root root  16K Oct  3 01:47 lost+found
[root@ubuntu /mnt/ext4]$ debugfs -R 'extents file1' /dev/sde
debugfs 1.46.5 (30-Dec-2021)
Level Entries       Logical          Physical Length Flags
 0/ 0   1/  2     0 - 32767 1015808 - 1048575  32768 
 0/ 0   2/  2 32768 - 46014  983040 -  996286  13247 
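
The EXTENTS line printed by debugfs stat can be turned into (logical start, physical start, length) tuples with a few lines of Python. This is a sketch that only handles the range form shown above; the names `parse_extents` and `MAX_INIT_EXTENT_LEN` are hypothetical, and the 32768-block limit reflects the 16-bit length field of an initialized ext4 extent.

```python
import re

EXTENT_RE = re.compile(r"\((\d+)-(\d+)\):(\d+)-(\d+)")
MAX_INIT_EXTENT_LEN = 32768  # blocks covered by one initialized extent, at most


def parse_extents(text):
    """Parse '(l0-l1):p0-p1, ...' into (logical, physical, length) tuples."""
    extents = []
    for match in EXTENT_RE.findall(text):
        lstart, lend, pstart, pend = map(int, match)
        length = lend - lstart + 1
        if pend - pstart + 1 != length:
            raise ValueError("logical and physical ranges differ in length")
        extents.append((lstart, pstart, length))
    return extents


if __name__ == "__main__":
    line = "(0-32767):1015808-1048575, (32768-46014):983040-996286"
    for logical, physical, length in parse_extents(line):
        print(logical, physical, length, length <= MAX_INIT_EXTENT_LEN)
```

The first extent is exactly 32768 blocks (128 MiB) long, so a 180M file cannot fit in fewer than two extents at this block size.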

flex_bg

  • A flexible block group (flex_bg) is a set of contiguous block groups. The first block group of each flex_bg stores the block bitmaps, inode bitmaps, and inode tables for all the groups in that flex_bg.
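
How block groups are packed into flex_bgs can be sketched in Python. The flex_bg size of 16 groups used here is an assumption (it is the usual mke2fs default and is not stated in the output above), and the function names are hypothetical.

```python
def flex_group_layout(group_count, groups_per_flex=16):
    """Map each flex_bg index to the block groups it contains."""
    layout = {}
    for group in range(group_count):
        layout.setdefault(group // groups_per_flex, []).append(group)
    return layout


def metadata_holder(group, groups_per_flex=16):
    """First group of the flex_bg, where bitmaps and inode tables are packed."""
    return (group // groups_per_flex) * groups_per_flex


if __name__ == "__main__":
    layout = flex_group_layout(32)  # the 32 block groups of this device
    for flex, groups in layout.items():
        print(flex, groups[0], groups[-1])
    print(metadata_holder(27))
```

Under that assumption the 32 block groups of this device form two flex_bgs, with metadata packed into groups 0 and 16.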
Originally published on 2022-10-03 on the WeChat official account 存储内核技术交流.
