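When you open a ZIP file in an archive tool, you can see information about the files inside it. For example, 7z can open an xlsm Excel workbook (which is itself a ZIP archive) and list its contents.
The listing mainly shows each file's name, its compressed size, and its size before compression.
At this point 7z has not actually decompressed the ZIP file. It has only read this information, and reading it is exactly the process of parsing the ZIP file structure.
The earlier introduction to ZIP compression mostly covered how the archiver compresses the original file and stores the compressed data. Before storing the compressed data, the archiver also writes some information about the file in front of it. The overall layout is: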
File 1 LocalFileHeader |
---|
File 1 compressed data |
File 1 data descriptor |
……………… |
File N LocalFileHeader |
File N compressed data |
File N data descriptor |
File 1 CentralDirectoryHeader ……………… File N CentralDirectoryHeader |
EndOfCentralDirectory |
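This is the general layout; a real file may differ in detail. For instance, the data descriptor is only present when bit 3 of the general-purpose bit flag is set.
Parsing these structures mostly comes down to seeking to the right position in the ZIP file, reading the required number of bytes, and checking that the Signature field holds the expected value.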
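The "seek, read, check the signature" step can also be tried on its own. The following is a minimal sketch that uses VBA's built-in binary file I/O (Open ... For Binary and Get) instead of the cf file object used in this article. The names ReadLongAt and StartsWithLocalHeader are hypothetical helpers for illustration and are not part of CPKZip.

```vba
Option Explicit

' Reads a 32-bit value stored at a 0-based file offset.
' Hypothetical helper: uses built-in file I/O, not the article's cf object.
Public Function ReadLongAt(ByVal path As String, ByVal offset As Long) As Long
    Dim f As Integer
    Dim v As Long

    f = FreeFile
    Open path For Binary Access Read As #f
    Get #f, offset + 1, v          ' Get uses 1-based byte positions
    Close #f

    ReadLongAt = v
End Function

' True when the file starts with a LocalFileHeader signature (0x04034b50).
Public Function StartsWithLocalHeader(ByVal path As String) As Boolean
    StartsWithLocalHeader = (ReadLongAt(path, 0) = &H4034B50)
End Function
```

Get fills the Long from four consecutive bytes in the machine's native little-endian order, which matches the byte order ZIP uses, so the value can be compared with the signature constant directly.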
Create a class module named CPKZip and implement the parsing function Parse:
01
EndOfCentralDirectory
Structure definition:
Private Type EndOfCentralDirectory
    Signature As Long                       'End of central directory signature, 0x06054b50
    NumberOfThisDisk As Integer             'Number of this disk
    DiskDirectoryStarts As Integer          'Number of the disk on which the first Central Directory record starts
    NumberOfCDRecordsOnThisDisk As Integer  'Number of Central Directory records on this disk
    TotalNumberOfCDRecords As Integer       'Total number of Central Directory records in the ZIP file
    SizeOfCD As Long                        'Total size in bytes of all Central Directory records
    OffsetOfCD As Long                      'Offset of the first Central Directory record within the ZIP file
    CommentLength As Integer                'Comment length
    ' Comment() As Byte                     'Comment content
End Type
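Parsing a ZIP file usually starts with the EndOfCentralDirectory record, because it sits at the very end of the file. Its exact position is not fixed, since it depends on the length of the Comment, but scanning backward from the end for the 0x06054b50 signature normally finds it quickly: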
'Parse EndOfCentralDirectory
Private Function parseEOCD() As String
    'Search for the EndOfCentralDirectory signature, starting Len(tEOCD) = 22
    'bytes before the end of the file (its position when there is no comment).
    'Assumes cf.SeekFile n, SeekPos.EndF positions n bytes before the end.
    tEOCD.Signature = 0
    posEOCD = Len(tEOCD)
    Do Until tEOCD.Signature = &H6054B50
        'The comment is at most 65535 bytes long, so give up beyond that
        If posEOCD > Len(tEOCD) + 65535 Then
            parseEOCD = ErrFormat
            Exit Function
        End If
        cf.SeekFile posEOCD, SeekPos.EndF
        tEOCD.Signature = cf.ReadLong()
        'Step one byte further back toward the start of the file
        posEOCD = posEOCD + 1
    Loop
    'Read the rest of the EndOfCentralDirectory record
    tEOCD.NumberOfThisDisk = cf.ReadInteger()
    tEOCD.DiskDirectoryStarts = cf.ReadInteger()
    tEOCD.NumberOfCDRecordsOnThisDisk = cf.ReadInteger()
    tEOCD.TotalNumberOfCDRecords = cf.ReadInteger()
    tEOCD.SizeOfCD = cf.ReadLong()
    tEOCD.OffsetOfCD = cf.ReadLong()
    tEOCD.CommentLength = cf.ReadInteger()
    'Size the arrays to hold one entry per file
    ReDim LFHs(tEOCD.TotalNumberOfCDRecords - 1) As LocalFileHeader
    ReDim CDHs(tEOCD.TotalNumberOfCDRecords - 1) As CentralDirectoryHeader
    ReDim FileArr(tEOCD.TotalNumberOfCDRecords - 1) As String
End Function
What this structure mainly gives us is TotalNumberOfCDRecords (the total number of Central Directory records in the ZIP file) and OffsetOfCD (the offset of the first Central Directory record within the ZIP file).
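For comparison, the backward search can be sketched without the cf object. The version below reads at most the last 22 + 65535 bytes of the file into a Byte array in one call and scans from the last position where a record could start toward the front. FindEOCD is a hypothetical name. The sketch assumes the file is smaller than 2 GB (so LOF fits in a Long) and does not handle ZIP64.

```vba
Option Explicit

' Returns the 0-based offset of the EndOfCentralDirectory record,
' or -1 when no signature is found in the searchable tail of the file.
Public Function FindEOCD(ByVal path As String) As Long
    Const EOCD_MIN As Long = 22          ' fixed part of the record
    Const MAX_COMMENT As Long = 65535    ' CommentLength is a 16-bit field
    Dim f As Integer, size As Long, tailLen As Long
    Dim buf() As Byte, i As Long

    FindEOCD = -1
    f = FreeFile
    Open path For Binary Access Read As #f
    size = LOF(f)
    If size >= EOCD_MIN Then
        tailLen = EOCD_MIN + MAX_COMMENT
        If tailLen > size Then tailLen = size
        ReDim buf(0 To tailLen - 1)
        Get #f, size - tailLen + 1, buf      ' read the whole tail at once

        ' Signature 0x06054b50 is stored as the bytes 50 4B 05 06.
        For i = tailLen - EOCD_MIN To 0 Step -1
            If buf(i) = &H50 And buf(i + 1) = &H4B _
               And buf(i + 2) = &H5 And buf(i + 3) = &H6 Then
                FindEOCD = size - tailLen + i
                Exit For
            End If
        Next i
    End If
    Close #f
End Function
```

Scanning an in-memory buffer avoids one seek and one read per candidate position, which matters most when the archive carries a long comment.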
02
CentralDirectoryHeader
Structure definition:
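Private Type CentralDirectoryHeader
    Signature As Long               'Central directory header signature, bytes 50 4B 01 02 (0x02014b50)
    VersionMadeBy As Integer        'Version of the tool that created the entry
    VersionNeeded As Integer        'Minimum version needed to extract
    GeneralBitFlag As Integer       'General-purpose bit flag
    CompressionMethod As Integer    'Compression method
    LastModifyTime As Integer       'Last modification time (MS-DOS format)
    LastModifyDate As Integer       'Last modification date (MS-DOS format)
    CRC32 As Long                   'CRC-32 checksum
    CompressedSize As Long          'Size after compression
    UnZipSize As Long               'Size before compression
    FileNameLength As Integer       'File name length (n)
    ExtraFieldLength As Integer     'Extra field length (m)
    FileCommentLength As Integer    'File comment length (k)
    StartDiskNumber As Integer      'Number of the disk on which the file starts
    InteralFileAttrib As Integer    'Internal file attributes
    ExternalFileAttrib As Long      'External file attributes
    LocalFileHeaderOffset As Long   'Offset of the corresponding Local File Header within the file
    FileName As String              'File name
    ExtraField As String            'Extra field
    Comment As String               'File comment
End Type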
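With TotalNumberOfCDRecords (the total number of Central Directory records in the ZIP file) and OffsetOfCD (the offset of the first Central Directory record within the ZIP file) from EndOfCentralDirectory, the CentralDirectoryHeader of every file can be read correctly: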
Private Function parseCDH() As String
    Dim i As Long
    Dim b() As Byte
    'Jump to the first CentralDirectoryHeader
    cf.SeekFile tEOCD.OffsetOfCD, SeekPos.OriginF
    For i = 0 To tEOCD.TotalNumberOfCDRecords - 1
        CDHs(i).Signature = cf.ReadLong()
        If CDHs(i).Signature <> &H2014B50 Then
            parseCDH = "parseCDH ERR " & ErrFormat
            Exit Function
        End If
        CDHs(i).VersionMadeBy = cf.ReadInteger()
        CDHs(i).VersionNeeded = cf.ReadInteger()
        CDHs(i).GeneralBitFlag = cf.ReadInteger()
        CDHs(i).CompressionMethod = cf.ReadInteger()
        CDHs(i).LastModifyTime = cf.ReadInteger()
        CDHs(i).LastModifyDate = cf.ReadInteger()
        CDHs(i).CRC32 = cf.ReadLong()
        CDHs(i).CompressedSize = cf.ReadLong()
        CDHs(i).UnZipSize = cf.ReadLong()
        CDHs(i).FileNameLength = cf.ReadInteger()
        CDHs(i).ExtraFieldLength = cf.ReadInteger()
        CDHs(i).FileCommentLength = cf.ReadInteger()
        CDHs(i).StartDiskNumber = cf.ReadInteger()
        CDHs(i).InteralFileAttrib = cf.ReadInteger()
        CDHs(i).ExternalFileAttrib = cf.ReadLong()
        CDHs(i).LocalFileHeaderOffset = cf.ReadLong()
        'The variable-length fields follow the fixed 46-byte part.
        'StrConv decodes the bytes with the system ANSI code page.
        ReDim b(CDHs(i).FileNameLength - 1) As Byte
        cf.Read b
        CDHs(i).FileName = VBA.StrConv(b, vbUnicode)
        If CDHs(i).ExtraFieldLength Then
            ReDim b(CDHs(i).ExtraFieldLength - 1) As Byte
            cf.Read b
            CDHs(i).ExtraField = VBA.StrConv(b, vbUnicode)
        End If
        If CDHs(i).FileCommentLength Then
            ReDim b(CDHs(i).FileCommentLength - 1) As Byte
            cf.Read b
            CDHs(i).Comment = VBA.StrConv(b, vbUnicode)
        End If
    Next
End Function
Every CentralDirectoryHeader records the offset in the file at which the corresponding LocalFileHeader starts.
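The LastModifyTime and LastModifyDate fields read above are packed MS-DOS values: the date holds year minus 1980 (7 bits), month (4 bits) and day (5 bits), and the time holds hours (5 bits), minutes (6 bits) and seconds divided by two (5 bits). As a rough sketch, they could be turned into a VBA Date like this. DosDateTimeToDate is a hypothetical helper name and is not part of the original class.

```vba
Option Explicit

' Converts the 16-bit MS-DOS date and time fields to a VBA Date.
Public Function DosDateTimeToDate(ByVal dosDate As Integer, _
                                  ByVal dosTime As Integer) As Date
    Dim d As Long, t As Long

    ' Integer is signed in VBA; widen to Long and mask to get the raw 16 bits.
    d = CLng(dosDate) And &HFFFF&
    t = CLng(dosTime) And &HFFFF&

    ' Date: bits 9-15 year-1980, bits 5-8 month, bits 0-4 day.
    ' Time: bits 11-15 hours, bits 5-10 minutes, bits 0-4 seconds / 2.
    DosDateTimeToDate = DateSerial(1980 + (d \ 512), (d \ 32) And 15, d And 31) _
                      + TimeSerial(t \ 2048, (t \ 32) And 63, (t And 31) * 2)
End Function
```

For example, DosDateTimeToDate(&H526F, &H53CA) yields 15 March 2021, 10:30:20. The same conversion applies to FileModiTime and FileModiDate in the LocalFileHeader.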
03
LocalFileHeader
Structure definition:
Private Type LocalFileHeader
    Signature As Long               'Local file header signature, 0x04034b50
    VersionExtract As Integer       'Minimum version needed to extract
    GeneralBit As Integer           'General-purpose bit flag
    CompressionMethod As Integer    'Compression method
    FileModiTime As Integer         'Last modification time
    FileModiDate As Integer         'Last modification date
    CRC_32 As Long                  'CRC-32 checksum
    CompressedSize As Long          'Size after compression
    UnZipSize As Long               'Size before compression
    FileNameLength As Integer       'File name length (n)
    ExtraFieldLength As Integer     'Extra field length (m)
    FileName As String              'File name
    ExtraField As String            'Extra field
End Type
Each LocalFileHeader is parsed by seeking to the starting offset recorded in the corresponding CentralDirectoryHeader:
Private Function parseLFH() As String
    Dim i As Long
    Dim ret As String
    For i = 0 To tEOCD.TotalNumberOfCDRecords - 1
        cf.SeekFile CDHs(i).LocalFileHeaderOffset, SeekPos.OriginF
        ret = readLFH(LFHs(i))
        If VBA.Len(ret) Then
            parseLFH = ret
            Exit Function
        End If
        'Record the index for this file name in the hash
        dicFileName.Add LFHs(i).FileName, i
        FileArr(i) = LFHs(i).FileName
    Next
End Function
Private Function readLFH(lfh As LocalFileHeader) As String
    Dim b() As Byte
    lfh.Signature = cf.ReadLong()
    If lfh.Signature <> &H4034B50 Then
        readLFH = "parseLFH ERR " & ErrFormat
        Exit Function
    End If
    lfh.VersionExtract = cf.ReadInteger()
    lfh.GeneralBit = cf.ReadInteger()
    lfh.CompressionMethod = cf.ReadInteger()
    lfh.FileModiTime = cf.ReadInteger()
    lfh.FileModiDate = cf.ReadInteger()
    lfh.CRC_32 = cf.ReadLong()
    lfh.CompressedSize = cf.ReadLong()
    lfh.UnZipSize = cf.ReadLong()
    lfh.FileNameLength = cf.ReadInteger()
    lfh.ExtraFieldLength = cf.ReadInteger()
    ReDim b(lfh.FileNameLength - 1) As Byte
    cf.Read b
    lfh.FileName = VBA.StrConv(b, vbUnicode)
    If lfh.ExtraFieldLength Then
        ReDim b(lfh.ExtraFieldLength - 1) As Byte
        cf.Read b
        'Keep the raw bytes; the extra field is binary data, not text
        lfh.ExtraField = b ' VBA.StrConv(b, vbUnicode)
    End If
End Function
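Once a LocalFileHeader has been read, the position of the file's compressed data follows by arithmetic: the fixed part of the header is 30 bytes, followed by the file name (n bytes) and the extra field (m bytes). A small sketch of that calculation is shown below. U16 and CompressedDataOffset are hypothetical helper names, and the sketch ignores ZIP64.

```vba
Option Explicit

' Widens a signed VBA Integer to the unsigned 16-bit value stored in the file.
Private Function U16(ByVal v As Integer) As Long
    U16 = CLng(v) And &HFFFF&
End Function

' Returns the 0-based offset of the first byte of compressed data.
Public Function CompressedDataOffset(ByVal lfhOffset As Long, _
                                     ByVal fileNameLength As Integer, _
                                     ByVal extraFieldLength As Integer) As Long
    ' 4+2+2+2+2+2+4+4+4+2+2 bytes of fixed fields
    Const LFH_FIXED_SIZE As Long = 30

    CompressedDataOffset = lfhOffset + LFH_FIXED_SIZE _
                         + U16(fileNameLength) + U16(extraFieldLength)
End Function
```

For a header at offset 0 with a 19-byte file name and no extra field, CompressedDataOffset(0, 19, 0) returns 49. The lengths should come from the LocalFileHeader itself rather than from the CentralDirectoryHeader, because the extra field stored in the two places can differ in length.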
04
The Parse function
Finally, the Parse function simply calls the parsing functions for the structures above:
'Parse a ZIP file and collect information about the files it contains
'FileName  Full path of the ZIP file
'Return    Error message (empty string on success)
Function Parse(FileName As String) As String
    If VBA.Dir(FileName) = "" Then
        Parse = "File does not exist."
        Exit Function
    End If
    fn = FileName
    Set cf = NewCFile()
    cf.OpenFile fn, O_RDONLY
    Dim Signature As Long
    'Read the first 4 bytes to check whether this is a ZIP file
    Signature = cf.ReadLong()
    If Signature <> &H4034B50 Then
        Parse = ErrFormat
        Exit Function
    End If
    Dim ret As String
    'Parse EndOfCentralDirectory
    'This mainly yields the file count TotalNumberOfCDRecords and OffsetOfCD
    '(the offset of the first Central Directory record within the ZIP file)
    ret = parseEOCD()
    If VBA.Len(ret) Then
        Parse = ret
        Exit Function
    End If
    'Initialize the hash that maps file names to indexes
    Set dicFileName = NewCHash(VBA.CLng(tEOCD.TotalNumberOfCDRecords))
    'Parse the CentralDirectoryHeader records, starting at OffsetOfCD
    ret = parseCDH()
    If VBA.Len(ret) Then
        Parse = ret
        Exit Function
    End If
    'Using LocalFileHeaderOffset from each CentralDirectoryHeader,
    'read the LocalFileHeader at that position
    ret = parseLFH()
    If VBA.Len(ret) Then
        Parse = ret
        Exit Function
    End If
End Function