
[MYSQL] Parsing the MySQL undo file (2)

Original
By 大大刺猬
Published 2024-08-08 17:28:01
From the column: 大大刺猬

Introduction

The previous chapter covered the basic undo file structure. We saw that an undo file is laid out just like an ibd file, except that the index pages are replaced by three page types: 21 (FIL_PAGE_TYPE_RSEG_ARRAY), 6 (FIL_PAGE_TYPE_SYS), and 2 (FIL_PAGE_UNDO_LOG). This chapter focuses on these three page types.

undo_001

These three page types correspond to the rollback segment, the undo log segment, and the undo log respectively. The naming is a bit confusing, and undo_001 itself is an undo tablespace, which makes it even more confusing.

An undo log segment is a collection of undo logs. Undo log segments exist within rollback segments. An undo log segment might contain undo logs from multiple transactions. An undo log segment can only be used by one transaction at a time but can be reused after it is released at transaction commit or rollback.

An undo tablespace contains undo logs. Undo logs exist within undo log segments, which are contained within rollback segments.

Each undo file has 128 rollback segments (tracked by the FIL_PAGE_TYPE_RSEG_ARRAY page). Each rollback segment has 1024 undo segments (tracked by a FIL_PAGE_TYPE_SYS page). Each undo segment consists of one or more undo logs (stored in FIL_PAGE_UNDO_LOG pages). Since an undo segment can be owned by only one transaction at a time, the maximum number of concurrent transactions per undo tablespace is 1 * 128 * 1024 = 131072.
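As a quick sanity check, that limit follows directly from the constants quoted later in this article; a minimal Python sketch, assuming the default 16 KB page size:

# Derive the per-tablespace transaction limit from the page constants.
UNIV_PAGE_SIZE = 16384                    # default InnoDB page size (assumed)
RSEG_PER_UNDO_TABLESPACE = 128            # slots in the RSEG_ARRAY page
TRX_RSEG_N_SLOTS = UNIV_PAGE_SIZE // 16   # undo slots per rollback segment = 1024

print(1 * RSEG_PER_UNDO_TABLESPACE * TRX_RSEG_N_SLOTS)   # 131072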

FIL_PAGE_TYPE_RSEG_ARRAY

Let's first look at the page that stores the rollback segment directory.

/** The RSEG ARRAY base version is a number derived from the string
'RSEG' [0x 52 53 45 47] for extra validation. Each new version
increments the base version by 1. */
constexpr uint32_t RSEG_ARRAY_VERSION = 0x52534547 + 1;

/** The RSEG ARRAY version offset in the header. */
constexpr uint32_t RSEG_ARRAY_VERSION_OFFSET = 0;

/** The current number of rollback segments being tracked in this array */
constexpr uint32_t RSEG_ARRAY_SIZE_OFFSET = 4;

/** This is the pointer to the file segment inode that tracks this
rseg array page. */
constexpr uint32_t RSEG_ARRAY_FSEG_HEADER_OFFSET = 8;

/* Slot size of the array of rollback segment header page numbers */
constexpr uint32_t RSEG_ARRAY_SLOT_SIZE = 4;

Each undo tablespace and the global temporary tablespace individually support a maximum of 128 rollback segments. The innodb_rollback_segments variable defines the number of rollback segments.

In other words, the system variable innodb_rollback_segments defines how many rollback segments are used.

A diagram of the page layout makes this easier to follow.

What we mainly care about are the RSEG_ARRAY_SLOT entries, which hold the page IDs of the rollback segment header pages. The following code quickly extracts those page IDs:

import struct
filename = '/data/mysql_3314/mysqldata/undo_001'
with open(filename,'rb') as f:
	status = f.seek(3*16384,0)
	data = f.read(16384)
	slot_list = struct.unpack('>128L',data[56:56+4*128])
	for x in range(128):
		print(slot_list[x], end='\n' if (x+1)%16==0 else '\t' )
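For completeness, the header fields defined above (RSEG_ARRAY_VERSION, the size, and the fseg header) sit right before the slot array and can be read the same way; a minimal sketch, assuming the same file path as above:

import struct

filename = '/data/mysql_3314/mysqldata/undo_001'   # same assumed path as above
with open(filename, 'rb') as f:
    f.seek(3 * 16384, 0)          # page 3 is the FIL_PAGE_TYPE_RSEG_ARRAY page
    data = f.read(16384)

# the page body starts after the 38-byte fil header
version, size = struct.unpack('>2L', data[38:46])               # RSEG_ARRAY_VERSION / SIZE
space_id, page_no, offset = struct.unpack('>LLH', data[46:56])  # RSEG_ARRAY_FSEG_HEADER
print(hex(version), size, (space_id, page_no, offset))          # version should read 0x52534548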

Combining this with the previous chapter, we know that these page IDs all point to FIL_PAGE_TYPE_SYS pages (rollback segment headers).

In this output, page_id = 88 is inode information.

FIL_PAGE_TYPE_SYS

This page type is a rollback segment header; it records information about the undo segments.

/* Transaction rollback segment header */
/*-------------------------------------------------------------*/
/** Maximum allowed size for rollback segment in pages */
constexpr uint32_t TRX_RSEG_MAX_SIZE = 0;
/** Number of file pages occupied by the logs in the history list */
constexpr uint32_t TRX_RSEG_HISTORY_SIZE = 4;
/* The update undo logs for committed transactions */
constexpr uint32_t TRX_RSEG_HISTORY = 8;
/* Header for the file segment where this page is placed */
constexpr uint32_t TRX_RSEG_FSEG_HEADER = 8 + FLST_BASE_NODE_SIZE;

The number of slots is defined as follows:

/** Number of undo log slots in a rollback segment file copy */
#define TRX_RSEG_N_SLOTS (UNIV_PAGE_SIZE / 16)

/** Maximum number of transactions supported by a single rollback segment */
#define TRX_RSEG_MAX_N_TRXS (TRX_RSEG_N_SLOTS / 2)

We can parse a given FIL_PAGE_TYPE_SYS page with the following Python code:

import struct
filename = '/data/mysql_3314/mysqldata/undo_001'
with open(filename,'rb') as f:
	status = f.seek(4*16384,0) # parse page 4 only
	data = f.read(16384)
	slot_list = struct.unpack('>1024L',data[72:72+4*1024])
	for x in range(1024):
		print(slot_list[x], end='\n' if (x+1)%16==0 else '\t' )

We can see that most undo slots are empty, which is expected since my local instance has almost no traffic, and this is only one of the rollback segments.
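An unused slot is stored as 0xFFFFFFFF (FIL_NULL, if I recall the InnoDB constant correctly), so filtering out the empty undo segments is a one-liner; a minimal sketch that reuses slot_list from the script above:

FIL_NULL = 0xFFFFFFFF    # value of an unused undo slot

used_pages = [p for p in slot_list if p != FIL_NULL]
print(len(used_pages), 'undo segments in use:', used_pages)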

FIL_PAGE_UNDO_LOG

An undo log page stores the actual rollback data. The first page (the one recorded in the rollback segment slot) is special: it also stores some extra information, the undo segment header.

Quite a lot: not counting the fil header (labelled TRX_UNDO_PAGE_HDR in the figure), there are three kinds of headers on this page.

Let's take a brief look at each of them.

undo log page header

Every undo log page has this header; it records basic information such as the undo page type.

/** The offset of the undo log page header on pages of the undo log */
constexpr uint32_t TRX_UNDO_PAGE_HDR = FSEG_PAGE_DATA;
/*-------------------------------------------------------------*/
/** Transaction undo log page header offsets */
/** @{ */
/** TRX_UNDO_INSERT or TRX_UNDO_UPDATE */
constexpr uint32_t TRX_UNDO_PAGE_TYPE = 0;
/** Byte offset where the undo log records for the LATEST transaction start on
 this page (remember that in an update undo log, the first page can contain
 several undo logs) */
constexpr uint32_t TRX_UNDO_PAGE_START = 2;
/** On each page of the undo log this field contains the byte offset of the
 first free byte on the page */
constexpr uint32_t TRX_UNDO_PAGE_FREE = 4;
/** The file list node in the chain of undo log pages */
constexpr uint32_t TRX_UNDO_PAGE_NODE = 6;
/*-------------------------------------------------------------*/
/** Size of the transaction undo log page header, in bytes */
constexpr uint32_t TRX_UNDO_PAGE_HDR_SIZE = 6 + FLST_NODE_SIZE;

Here, page start is the byte offset where the undo records of the latest transaction begin on this page (this ties into undo log reuse). page free is the byte offset of the first free byte on the page, which can usually be treated as the end of the undo records.

The FLST_NODE structure mentioned above consists of PREV_PAGE_ID (4), PREV_OFFSET (2), NEXT_PAGE_ID (4), NEXT_OFFSET (2); it came up earlier when discussing ibd2sql. FLST_BASE_NODE is the same pair of pointers (first and last node) preceded by a 4-byte list length.
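Putting those offsets together, the undo page header of any FIL_PAGE_UNDO_LOG page can be decoded as follows; a minimal sketch, assuming the same undo file path as before and a hypothetical page number to inspect:

import struct

filename = '/data/mysql_3314/mysqldata/undo_001'   # assumed path, as above
page_no = 5                                        # hypothetical undo log page

with open(filename, 'rb') as f:
    f.seek(page_no * 16384, 0)
    page = f.read(16384)

# undo page header starts at TRX_UNDO_PAGE_HDR = FSEG_PAGE_DATA = 38
page_type, page_start, page_free = struct.unpack('>3H', page[38:44])
# TRX_UNDO_PAGE_NODE is an FLST node: PREV_PAGE(4) PREV_OFFSET(2) NEXT_PAGE(4) NEXT_OFFSET(2)
prev_page, prev_offset, next_page, next_offset = struct.unpack('>LHLH', page[44:56])

print('type:', page_type, 'start:', page_start, 'free:', page_free)
print('prev:', (prev_page, prev_offset), 'next:', (next_page, next_offset))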

undo log segment header

This header exists only on the first page of the undo segment.

/** The offset of the undo log segment header on the first page of the undo
log segment */
constexpr uint32_t TRX_UNDO_SEG_HDR = TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_HDR_SIZE;
/** TRX_UNDO_ACTIVE, ... */
constexpr uint32_t TRX_UNDO_STATE = 0;
/** Offset of the last undo log header on the segment header page, 0 if none */
constexpr uint32_t TRX_UNDO_LAST_LOG = 2;
/** Header for the file segment which the undo log segment occupies */
constexpr uint32_t TRX_UNDO_FSEG_HEADER = 4;
/** Base node for the list of pages in the undo log segment; defined only on the
 undo log segment's first page */
constexpr uint32_t TRX_UNDO_PAGE_LIST = 4 + FSEG_HEADER_SIZE;

These fields carry transaction-related state, for example whether the segment is TRX_UNDO_ACTIVE and where the last undo log header on the page sits.
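On that first page the segment header sits right after the undo page header, i.e. at offset 38 + 18 = 56; a minimal sketch that decodes it, reusing the page variable from the previous snippet:

import struct

# 38-byte fil header + 18-byte undo page header = 56
state, last_log = struct.unpack('>HH', page[56:60])                      # TRX_UNDO_STATE, TRX_UNDO_LAST_LOG
fseg_space, fseg_page, fseg_offset = struct.unpack('>LLH', page[60:70])  # TRX_UNDO_FSEG_HEADER
page_list = struct.unpack('>LLHLH', page[70:86])                         # TRX_UNDO_PAGE_LIST (FLST base node)

STATES = {1: 'TRX_UNDO_ACTIVE', 2: 'TRX_UNDO_CACHED', 3: 'TRX_UNDO_TO_FREE',
          4: 'TRX_UNDO_TO_PURGE', 5: 'TRX_UNDO_PREPARED_80028',
          6: 'TRX_UNDO_PREPARED', 7: 'TRX_UNDO_PREPARED_IN_TC'}
print(STATES.get(state, state), 'last undo log header at offset', last_log)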

undo log header

This is the actual undo log header, which again holds transaction information. I won't go through it in detail here (see storage/innobase/include/trx0undo.h).

In this article we parse the undo file by structure rather than by transaction.

undo log

After all of that, we still haven't reached the actual undo log records. -_-

Undo log records fall into two broad categories: insert (TRX_UNDO_INSERT_REC) and update.

TRX_UNDO_INSERT_REC

Insert records are simple: they only need to record the primary key (or the row_id if the table has no primary key). The structure is as follows:

(strikes a pose.jpg)

But computing the individual values is rather involved...

In the undo_log_type_flag byte, bits 0-3 encode the record type, bits 4-5 are flags, bit 6 indicates LOB modification (TRX_UNDO_MODIFY_BLOB), and bit 7 indicates extra external storage (TRX_UNDO_UPD_EXTERN).
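In code this is just bit masking; a minimal sketch that splits a hypothetical type/flag byte the same way the reader script in the appendix does:

type_flag = 0x1C                       # hypothetical value of the first record byte

rec_type   = type_flag & 0x0F          # bits 0-3: 11=TRX_UNDO_INSERT_REC, 12=TRX_UNDO_UPD_EXIST_REC, ...
rec_flag   = (type_flag >> 4) & 0x03   # bits 4-5: flags
has_blob   = bool(type_flag & 0x40)    # bit 6: TRX_UNDO_MODIFY_BLOB
has_extern = bool(type_flag & 0x80)    # bit 7: TRX_UNDO_UPD_EXTERN

print(rec_type, rec_flag, has_blob, has_extern)   # 12 1 False False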

The storage of undo no and table id is even more involved.

static inline uint32_t mach_read_next_compressed(const byte **b) {
  ulint val = mach_read_from_1(*b);

  if (val < 0x80) {
    /* 0nnnnnnn (7 bits) */
    ++*b;
  } else if (val < 0xC0) {
    /* 10nnnnnn nnnnnnnn (14 bits) */
    val = mach_read_from_2(*b) & 0x3FFF;
    ut_ad(val > 0x7F);
    *b += 2;
  } else if (val < 0xE0) {
    /* 110nnnnn nnnnnnnn nnnnnnnn (21 bits) */
    val = mach_read_from_3(*b) & 0x1FFFFF;
    ut_ad(val > 0x3FFF);
    *b += 3;
  } else if (val < 0xF0) {
    /* 1110nnnn nnnnnnnn nnnnnnnn nnnnnnnn (28 bits) */
    val = mach_read_from_4(*b) & 0xFFFFFFF;
    ut_ad(val > 0x1FFFFF);
    *b += 4;
  } else if (val < 0xF8) {
    /* 11110000 nnnnnnnn nnnnnnnn nnnnnnnn nnnnnnnn (32 bits) */
    ut_ad(val == 0xF0);
    val = mach_read_from_4(*b + 1);
    /* this can treat not-extended format also. */
    ut_ad(val > 0xFFFFFFF);
    *b += 5;
  } else if (val < 0xFC) {
    /* 111110nn nnnnnnnn (10 bits) (extended) */
    val = (mach_read_from_2(*b) & 0x3FF) | 0xFFFFFC00;
    *b += 2;
  } else if (val < 0xFE) {
    /* 1111110n nnnnnnnn nnnnnnnn (17 bits) (extended) */
    val = (mach_read_from_3(*b) & 0x1FFFF) | 0xFFFE0000;
    ut_ad(val < 0xFFFFFC00);
    *b += 3;
  } else {
    /* 11111110 nnnnnnnn nnnnnnnn nnnnnnnn (24 bits) (extended) */
    ut_ad(val == 0xFE);
    val = mach_read_from_3(*b + 1) | 0xFF000000;
    ut_ad(val < 0xFFFE0000);
    *b += 4;
  }

  return (static_cast<uint32_t>(val));
}

static inline uint64_t mach_read_next_much_compressed(const byte **b) {
  uint64_t val = mach_read_from_1(*b);

  if (val < 0x80) {
    /* 0nnnnnnn (7 bits) */
    ++*b;
  } else if (val < 0xC0) {
    /* 10nnnnnn nnnnnnnn (14 bits) */
    val = mach_read_from_2(*b) & 0x3FFF;
    ut_ad(val > 0x7F);
    *b += 2;
  } else if (val < 0xE0) {
    /* 110nnnnn nnnnnnnn nnnnnnnn (21 bits) */
    val = mach_read_from_3(*b) & 0x1FFFFF;
    ut_ad(val > 0x3FFF);
    *b += 3;
  } else if (val < 0xF0) {
    /* 1110nnnn nnnnnnnn nnnnnnnn nnnnnnnn (28 bits) */
    val = mach_read_from_4(*b) & 0xFFFFFFF;
    ut_ad(val > 0x1FFFFF);
    *b += 4;
  } else if (val < 0xF8) {
    /* 11110000 nnnnnnnn nnnnnnnn nnnnnnnn nnnnnnnn (32 bits) */
    ut_ad(val == 0xF0);
    val = mach_read_from_4(*b + 1);
    /* this can treat not-extended format also. */
    ut_ad(val > 0xFFFFFFF);
    *b += 5;
  } else if (val < 0xFC) {
    /* 111110nn nnnnnnnn (10 bits) (extended) */
    val = (mach_read_from_2(*b) & 0x3FF) | 0xFFFFFC00;
    *b += 2;
  } else if (val < 0xFE) {
    /* 1111110n nnnnnnnn nnnnnnnn (17 bits) (extended) */
    val = (mach_read_from_3(*b) & 0x1FFFF) | 0xFFFE0000;
    ut_ad(val < 0xFFFFFC00);
    *b += 3;
  } else if (val == 0xFE) {
    /* 11111110 nnnnnnnn nnnnnnnn nnnnnnnn (24 bits) (extended) */
    ut_ad(val == 0xFE);
    val = mach_read_from_3(*b + 1) | 0xFF000000;
    ut_ad(val < 0xFFFE0000);
    *b += 4;
  } else {
    /* 11111111 followed by up to 80 bits */
    ut_ad(val == 0xFF);
    ++*b;
    val = mach_read_next_compressed(b);
    ut_ad(val > 0);
    val <<= 32;
    val |= mach_read_next_compressed(b);
  }

  return (val);
}

No matter; we don't need to understand why it is stored this way (mostly to save space), we can simply re-implement it in Python.

def mach_read_next_compressed(tdata):
	val = struct.unpack('>B',tdata.readn(1))[0]
	if val < 0x80:
		val = struct.unpack('>B',tdata.read(1))[0]
	elif val < 0xC0:
		val = struct.unpack('>H',tdata.read(2))[0] & 0x3FFF
	elif val < 0xE0:
		val2,val1 = struct.unpack('>HB',tdata.read(3))
		val = ((val2<<8)+val1) & 0x1FFFFF
	elif val < 0xF0:
		val = struct.unpack('>L',tdata.read(4))[0] & 0xFFFFFFF
	elif val < 0xF8:
		tdata.read(1)
		val = struct.unpack('>L',tdata.read(4))[0]
	elif val < 0xFC:
		val = (struct.unpack('>H',tdata.read(2))[0] & 0x3FF) | 0xFFFFFC00
		
	elif val < 0xFE:
		val2,val1 = struct.unpack('>HB',tdata.read(3))
		valt = (val2<<8)+val1
		val = (valt & 0x1FFFF) | 0xFFFE0000;
	else:
		val = (struct.unpack('>L',tdata.read(4))[0] & 0x00FFFFFF) | 0xFF000000 # keep the low 3 bytes, force the high byte to 0xFF
	return val

# the fancier 'much compressed' encoding
def mach_read_next_much_compressed(tdata):
	val = struct.unpack('>B',tdata.readn(1))[0]
	if val < 0x80:
		#pass
		val = struct.unpack('>B',tdata.read(1))[0]
	elif val < 0xC0:
		val = struct.unpack('>H',tdata.read(2))[0] & 0x3FFF
	elif val < 0xE0:
		val2,val1 = struct.unpack('>HB',tdata.read(3))
		val = ((val2<<8)+val1) & 0x1FFFFF
	elif val < 0xF0:
		val = struct.unpack('>L',tdata.read(4))[0] & 0xFFFFFFF
	elif val < 0xF8:
		tdata.read(1)
		val = struct.unpack('>L',tdata.read(4))[0]
	elif val < 0xFC:
		val = (struct.unpack('>H',tdata.read(2))[0] & 0x3FF) | 0xFFFFFC00
		
	elif val < 0xFE:
		val2,val1 = struct.unpack('>HB',tdata.read(3))
		valt = (val2<<8)+val1
		val = (valt & 0x1FFFF) | 0xFFFE0000;
	elif val == 0xFE:
		val = (struct.unpack('>L',tdata.read(4))[0] & 0x00FFFFFF) | 0xFF000000 # keep the low 3 bytes, force the high byte to 0xFF
	else:
		tdata.read(1)
		val = mach_read_next_much_compressed(tdata)
		val <<= 32
		val |= mach_read_next_compressed(tdata)
	return val

Perhaps that is the charm of Python.
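A quick way to convince yourself the port matches the C behaviour is to feed it a few hand-built byte strings; a minimal sketch, using a small buffer wrapper with the same read/readn interface the functions above expect (the same class appears in the appendix):

class bdata_buffer(object):
    """read(n) consumes n bytes, readn(n) only peeks."""
    def __init__(self, bdata):
        self.bdata = bdata
        self.offset = 0
    def read(self, n):
        data = self.bdata[self.offset:self.offset+n]
        self.offset += n
        return data
    def readn(self, n):
        return self.bdata[self.offset:self.offset+n]

# 0x05      -> 0nnnnnnn          (7 bits)  -> 5
# 0x84 0xD2 -> 10nnnnnn nnnnnnnn (14 bits) -> 0x04D2 = 1234
print(mach_read_next_compressed(bdata_buffer(b'\x05')))       # 5
print(mach_read_next_compressed(bdata_buffer(b'\x84\xd2')))   # 1234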

TRX_UNDO_UPD_XXX_REC

Update records carry a bit more transaction information than insert records.

We won't dig into them here; they are too complex. If you're interested, see the MySQL source: storage/innobase/trx/trx0rec.cc

Verification

Let's verify all of this with code.

We explicitly start a transaction and leave it uncommitted.

Then we use ibd2sql's debug feature to get the rollback information (the roll pointer) of the corresponding data row.

Then we use this rollptr to locate and parse the record in the actual undo file.
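For reference, the roll pointer packs the location of the undo record: the low 16 bits are the byte offset within the page, the next 32 bits the page number, then 7 bits for the rollback segment id, and the top bit marks an insert record (this is the same decoding the appendix script uses). A minimal sketch with a made-up rollptr value:

rollptr = 0x8200000005011A                 # hypothetical value reported by ibd2sql

offset    = rollptr & 0xFFFF               # byte offset inside the undo page
page_no   = (rollptr >> 16) & 0xFFFFFFFF   # undo page number
rseg_id   = (rollptr >> 48) & 0x7F         # rollback segment id
is_insert = bool(rollptr >> 55)            # top bit: insert undo record

print(f"rseg={rseg_id} page={page_no} offset={offset} insert={is_insert}")
# rseg=2 page=5 offset=282 insert=True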

Then we parse the complete undo file and filter for the page in question.

The parse shows an update record in a segment that is TRX_UNDO_ACTIVE, with table_id 19151. Let's verify that in the database.

The query confirms it is indeed the table we updated. Now let's look at the data.

The parsed output is as follows. (Without the table metadata we can't decode the concrete column values, so this tool probably won't be developed further; it isn't that useful.)

\x80\x00\x00\x02  (after parsing enough InnoDB data, you can tell at a glance this is 2)
\x02bb bb
\x02bb bb
\x02bb bb
\x05newdd newdd
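For reference, the signed integer key is stored with its sign bit flipped so values sort correctly as unsigned bytes, and each varchar value is prefixed with a one-byte length; a minimal sketch decoding the values above:

import struct

raw_key = b'\x80\x00\x00\x02'
key = struct.unpack('>I', raw_key)[0] - 0x80000000   # flip the sign bit back -> 2
print(key)

field = b'\x02bb'               # 1-byte length + data
print(field[1:1 + field[0]])    # b'bb'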

We open another session to verify the data:

It matches. Getting here brings a small sense of accomplishment, though not much.

Summary

MySQL manages undo on its own, and even if an undo tablespace gets corrupted it can simply be re-initialized, so undo files usually get little attention; I don't plan to parse undo any further. Be aware, though, that a long-running uncommitted transaction can make the undo file grow extremely large (I once saw one reach several hundred GB). Below is the complete undo file structure diagram.

References: https://github.com/mysql/mysql-server

https://dev.mysql.com/doc/refman/8.0

Appendix: source code

GitHub: https://github.com/ddcw/ddcw/blob/master/python/undo_reader

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# write by ddcw @https://github.com/ddcw
# parses undo logs

import argparse
import struct
import os,sys

class bdata_buffer(object):
	def __init__(self,bdata):
		self.bdata = bdata
		self.offset = 0
	def read(self,n):
		data = self.bdata[self.offset:self.offset+n]
		self.offset += n
		return data
	def readn(self,n):
		return self.bdata[self.offset:self.offset+n]

def mach_read_next_compressed(tdata):
	val = struct.unpack('>B',tdata.readn(1))[0]
	if val < 0x80:
		val = struct.unpack('>B',tdata.read(1))[0]
	elif val < 0xC0:
		val = struct.unpack('>H',tdata.read(2))[0] & 0x3FFF
	elif val < 0xE0:
		val2,val1 = struct.unpack('>HB',tdata.read(3))
		val = ((val2<<8)+val1) & 0x1FFFFF
	elif val < 0xF0:
		val = struct.unpack('>L',tdata.read(4))[0] & 0xFFFFFFF
	elif val < 0xF8:
		tdata.read(1)
		val = struct.unpack('>L',tdata.read(4))[0]
	elif val < 0xFC:
		val = (struct.unpack('>H',tdata.read(2))[0] & 0x3FF) | 0xFFFFFC00
		
	elif val < 0xFE:
		val2,val1 = struct.unpack('>HB',tdata.read(3))
		valt = (val2<<8)+val1
		val = (valt & 0x1FFFF) | 0xFFFE0000;
	else:
		val = (struct.unpack('>L',tdata.read(4))[0] & 0x00FFFFFF) | 0xFF000000 # keep the low 3 bytes, force the high byte to 0xFF
	return val

# the fancier 'much compressed' encoding
def mach_read_next_much_compressed(tdata):
	val = struct.unpack('>B',tdata.readn(1))[0]
	if val < 0x80:
		#pass
		val = struct.unpack('>B',tdata.read(1))[0]
	elif val < 0xC0:
		val = struct.unpack('>H',tdata.read(2))[0] & 0x3FFF
	elif val < 0xE0:
		val2,val1 = struct.unpack('>HB',tdata.read(3))
		val = ((val2<<8)+val1) & 0x1FFFFF
	elif val < 0xF0:
		val = struct.unpack('>L',tdata.read(4))[0] & 0xFFFFFFF
	elif val < 0xF8:
		tdata.read(1)
		val = struct.unpack('>L',tdata.read(4))[0]
	elif val < 0xFC:
		val = (struct.unpack('>H',tdata.read(2))[0] & 0x3FF) | 0xFFFFFC00
		
	elif val < 0xFE:
		val2,val1 = struct.unpack('>HB',tdata.read(3))
		valt = (val2<<8)+val1
		val = (valt & 0x1FFFF) | 0xFFFE0000;
	elif val == 0xFE:
		val = (struct.unpack('>L',tdata.read(4))[0] & 0x00FFFFFF) | 0xFF000000 # keep the low 3 bytes, force the high byte to 0xFF
	else:
		tdata.read(1)
		val = mach_read_next_much_compressed(tdata)
		val <<= 32
		val |= mach_read_next_compressed(tdata)
	return val

UNDO_LOG_SEGMENT_STAT = [
None,
'TRX_UNDO_ACTIVE',
'TRX_UNDO_CACHED',
'TRX_UNDO_TO_FREE',
'TRX_UNDO_TO_PURGE',
'TRX_UNDO_PREPARED_80028',
'TRX_UNDO_PREPARED',
'TRX_UNDO_PREPARED_IN_TC'
]
UNDO_LOG_SEGEMT_TYPE = [
None,
'TRX_UNDO_INSERT',
'TRX_UNDO_UPDATE' # update+delete
]
UNDO_LOG_TYPE = {11:'TRX_UNDO_INSERT_REC',12:'TRX_UNDO_UPD_EXIST_REC',13:'TRX_UNDO_UPD_DEL_REC',14:'TRX_UNDO_DEL_MARK_REC',16:'TRX_UNDO_CMPL_INFO_MULT',64:'TRX_UNDO_MODIFY_BLOB',128:'TRX_UNDO_UPD_EXTERN'}
UNDO_LOG_FLAG = {1:'TRX_UNDO_INSERT_OP',2:'TRX_UNDO_MODIFY_OP'}

def undo_log_parse(bdata,f):
	print(struct.unpack('>B',bdata[2:3]))

def inode_page(bdata):
	return # skip printing here; the output is too verbose
	data = bdata_buffer(bdata)
	FIL_PAGE_SPACE_OR_CHKSUM, FIL_PAGE_OFFSET, FIL_PAGE_PREV, FIL_PAGE_NEXT, FIL_PAGE_LSN, FIL_PAGE_TYPE, FIL_PAGE_FILE_FLUSH_LSN, FIL_PAGE_SPACE_ID = struct.unpack('>4LQHQL',data.read(38))
	print(f"[INODE]PAGE TYPE:{FIL_PAGE_TYPE}")
	INODE_PRE_PAGE,INODE_PRE_OFFSET,INODE_NEXT_PAGE,INODE_NEXT_OFFSET  = struct.unpack('>LHLH',data.read(12))
	print(f"[INODE] PRE:{INODE_PRE_PAGE}:{INODE_PRE_OFFSET}  NEXT:{INODE_NEXT_PAGE}:{INODE_NEXT_OFFSET}")
	FLST_BASE_NODE = "LLHLH" # LENGTH,FIRST_PAGE,FIRST_OFFSET,LAST_PAGE,LAST_OFFSET
	for _ in range(85):
		FSEG_ID = struct.unpack('>Q',data.read(8))
		FSEG_NOT_FULL_N_USED = struct.unpack('>L',data.read(4))
		FSEG_FREE     = struct.unpack(f'>{FLST_BASE_NODE}',data.read(16))
		FSEG_NOT_FULL = struct.unpack(f'>{FLST_BASE_NODE}',data.read(16))
		FSEG_FULL     = struct.unpack(f'>{FLST_BASE_NODE}',data.read(16))
		FSEG_MAGIC    = struct.unpack('>L',data.read(4))
		FSEG_FRAG_ARR = struct.unpack('>32L',data.read(32*4))
		print(f"[INODE] FSEG_ID:{FSEG_ID}")
		print(f"[INODE] FSEG_NOT_FULL_N_USED:{FSEG_NOT_FULL_N_USED}")
		print(f"[INODE] FSEG_FREE:{FSEG_FREE}")
		print(f"[INODE] FSEG_NOT_FULL:{FSEG_NOT_FULL}")
		print(f"[INODE] FSEG_FULL:{FSEG_FULL}")
		print(f"[INODE] FSEG_MAGIC:{FSEG_MAGIC} (? 97937874)")
		print(f"[INODE] FSEG_FRAG_ARR:{FSEG_FRAG_ARR}")

def rseg_array_page(bdata):
	#data = bdata_buffer(bdata)
	array_version,array_size = struct.unpack('>2L',bdata[38:46])
	array_fseg_header = struct.unpack('>LLH',bdata[46:56]) # space_id,page_id,page_offset
	array_slot = struct.unpack('>128L',bdata[56:56+128*4])
	#print(array_slot)
	print("[RSEG_ARRAY] array_fseg_header:",array_fseg_header)
	print("[RSEG_ARRAY] array_slot:",array_slot)
	return array_slot

def sys_page(bdata):
	# TRX_RSEG_N_SLOTS = (UNIV_PAGE_SIZE / 16) = 1024
	# TRX_RSEG_SLOT_SIZE = 4
	rseg_max_size,rseg_history_size = struct.unpack('>2L',bdata[38:46])
	rseg_history = struct.unpack('>LLHLH',bdata[46:62])
	space_id,page_id,offset = struct.unpack('>2LH',bdata[62:72])
	undo_slot = struct.unpack('>1024L',bdata[72:72+4*1024])
	print(f"[SYS] rseg_max_size:{rseg_max_size}")
	print(f"[SYS] rseg_history_size:{rseg_history_size}")
	print(f"[SYS] rseg_history:{rseg_history}")
	print(f"[SYS] space_id:page_id {space_id}:{page_id}")
	print(f"[SYS] offset: {offset}")
	print(f"[SYS] undo_slot:{undo_slot}")

def _argparse():
	parser = argparse.ArgumentParser(add_help=True, description='script to parse MySQL undo files https://github.com/ddcw')
	parser.add_argument('--rollptr', '-r', dest="ROLLPTR", default=-1, type=int,  help='roll pointer to parse')
	parser.add_argument(dest='FILENAME', help='undo filename', nargs='?')
	return parser.parse_args()

if __name__ == '__main__':
	parser = _argparse()
	filename = parser.FILENAME
	if filename is None or not os.path.exists(filename): # only the single-undo-file case is handled
		sys.stderr.write(f"\nno file {filename}\n\n")
		sys.exit(1)
	
	if parser.ROLLPTR > -1:
		rolll_ptr = parser.ROLLPTR
		offset = rolll_ptr & 0xFFFF
		page_no = (rolll_ptr>>16) & 0xFFFFFFFF
		rseg_id = (rolll_ptr>>48) & 0x7F
		is_insert = True if rolll_ptr>>55 == 1 else False
		if rseg_id != int(filename.split('_')[-1]):
			sys.stderr.write(f"\nrollptr({rolll_ptr}) is not in this undo file\n\n")
			sys.exit(1)
		with open(filename,'rb') as f:
			f.seek(page_no*16384,0)
			data = f.read(16384)
			end_offset = struct.unpack('>H',data[offset:offset+2])[0]
			print(f"PAGENO:{page_no}  OFFSET:{offset} --> {end_offset}  rseg_id:{rseg_id}  is_insert:{is_insert}")
			print("DATA:",data[offset+2:end_offset-2])
		sys.exit(0)
	# parse the whole undo file
	f = open(filename,'rb')
	# FIL_PAGE_TYPE_FSP_HDR: probably not worth parsing
	# FIL_PAGE_IBUF_BITMAP: never parsed it...
	# FIL_PAGE_INODE: not needed here -_-
	f.seek(16384*3,0) # 21:FIL_PAGE_TYPE_RSEG_ARRAY
	bdata = f.read(16384)
	array_version,array_size = struct.unpack('>2L',bdata[38:46])
	array_fseg_header = struct.unpack('>LLH',bdata[46:56]) # space_id,page_id,page_offset
	array_slot = struct.unpack('>128L',bdata[56:56+128*4])
	print(f"[ROLLBACK SEGMENT] VERSION     : {hex(array_version)}")
	print(f"[ROLLBACK SEGMENT] SIZE        : {array_size}")
	print(f"[ROLLBACK SEGMENT] SPACE_ID    : {array_fseg_header[0]}")
	print(f"[ROLLBACK SEGMENT] PAGE_ID     : {array_fseg_header[1]}")
	print(f"[ROLLBACK SEGMENT] PAGE_OFFSET : {array_fseg_header[2]}")
	#print(f"[ROLLBACK SEGMENT] SLOT PAGE   : {array_slot}")

	for slot in array_slot:
		f.seek(slot*16384,0)
		bdata = f.read(16384) # 6:FIL_PAGE_TYPE_SYS
		rseg_max_size,rseg_history_size = struct.unpack('>2L',bdata[38:46])
		rseg_history = struct.unpack('>LLHLH',bdata[46:62])
		space_id,page_id,offset = struct.unpack('>2LH',bdata[62:72])
		undo_slot = struct.unpack('>1024L',bdata[72:72+4*1024])
		undo_slot_var = [ ]
		for x in undo_slot:
			if x != 4294967295:
				undo_slot_var.append(x)
		print(f"\t[UNDO SEGMENT] CURRENT PAGE:{slot}")
		print(f"\t[UNDO SEGMENT] MAX_SIZE:{rseg_max_size}")
		print(f"\t[UNDO SEGMENT] HISTORY_SIZE:{rseg_history_size}")
		print(f"\t[UNDO SEGMENT] HISTORY:{rseg_history}")
		print(f"\t[UNDO SEGMENT] SPACE_ID:{space_id}")
		print(f"\t[UNDO SEGMENT] PAGE_ID:{page_id} (inode)")
		print(f"\t[UNDO SEGMENT] PAGE_OFFSET:{offset}")
		#print(f"[UNDO SEGMENT] SLOT PAGE:{undo_slot_var} (empty slots (4294967295) removed)")
		for undo_page in undo_slot_var:
			f.seek(undo_page*16384,0)
			bdata = f.read(16384) # 2:FIL_PAGE_UNDO_LOG
			# fil header
			FIL_PAGE_SPACE_OR_CHKSUM, FIL_PAGE_OFFSET, FIL_PAGE_PREV, FIL_PAGE_NEXT, FIL_PAGE_LSN, FIL_PAGE_TYPE, FIL_PAGE_FILE_FLUSH_LSN, FIL_PAGE_SPACE_ID = struct.unpack('>4LQHQL',bdata[:38])
			# undo page header
			undo_page_type,undo_page_start,undo_page_free = struct.unpack('>3H',bdata[38:44])
			undo_page_node = struct.unpack('>LHLH',bdata[44:56])
			# undo segment header: only the first page has it; owned by one transaction at a time, but may record info from multiple transactions
			trx_undo_state,trx_undo_last_log = struct.unpack('>HH',bdata[56:60])
			trx_undo_fseg_header = struct.unpack('>LLH',bdata[60:70])
			trx_undo_page_list = struct.unpack('>LLHLH',bdata[70:86])
			print(f"\t\t[UNDO LOG] CURRENT PAGE: {undo_page}")
			print(f"\t\t[UNDO LOG] PAGE_TYPE   : {undo_page_type} ({UNDO_LOG_SEGEMT_TYPE[undo_page_type]})")
			print(f"\t\t[UNDO LOG] PAGE_START  : {undo_page_start}") # start offset
			print(f"\t\t[UNDO LOG] PAGE_FREE   : {undo_page_free}")  # end offset; reaching it means the records end here
			print(f"\t\t[UNDO LOG] PAGE_NODE   : {undo_page_node}")
			print(f"\t\t[UNDO LOG] TRX_UNDO_STATE       : {trx_undo_state} ({UNDO_LOG_SEGMENT_STAT[trx_undo_state]})")
			print(f"\t\t[UNDO LOG] TRX_UNDO_LAST_LOG    : {trx_undo_last_log}")
			print(f"\t\t[UNDO LOG] TRX_UNDO_FSEG_HEADER : {trx_undo_fseg_header}")
			print(f"\t\t[UNDO LOG] TRX_UNDO_PAGE_LIST   : {trx_undo_page_list}")

			# undo log header (86-->276)
			undo_log_header = bdata[86:276]


			if undo_page_start == undo_page_free:
				continue
			# parse only part of the records
			not_end = True 
			undo_log_start_offset = undo_page_start
			while not_end:
				undo_log_end_offset = struct.unpack('>H',bdata[undo_log_start_offset:undo_log_start_offset+2])[0]
				not_end = False if undo_log_end_offset == undo_page_free else True
				tdata = bdata_buffer(bdata[undo_log_start_offset:undo_log_end_offset][2:-2])
				#tdata.offset = 2
				undo_log_type_flag = struct.unpack('>B',tdata.read(1))[0]
				undo_log_type = undo_log_type_flag&0x0F # bit 0-3
				undo_log_flag = (undo_log_type_flag>>4)& 0x03 # bit 4-5
				if undo_log_type_flag & 64: #TRX_UNDO_MODIFY_BLOB
					undo_rec_flags = struct.unpack('>B',tdata.read(1))[0]
				undo_no = mach_read_next_much_compressed(tdata)
				table_id = mach_read_next_much_compressed(tdata)
				print(f'\t\t\t[UNDO DATA]',bdata[undo_log_start_offset:undo_log_end_offset])
				print(f'\t\t\t[UNDO DATA] TYPE: {undo_log_type} ({UNDO_LOG_TYPE[undo_log_type]})')
				print(f'\t\t\t[UNDO DATA] FLAG: {undo_log_flag}')# ({UNDO_LOG_FLAG[undo_log_flag]})')
				print(f'\t\t\t[UNDO DATA] UNDO NO  : {undo_no}')
				print(f'\t\t\t[UNDO DATA] TABLE ID : {table_id}')
				print(f'\t\t\t[UNDO DATA] REST_DATA: {tdata.bdata[tdata.offset:]}')
				#undo_log_parse(tdata,f) # pass the file handle along as well
				#trx_undo_trx_id,trx_undo_trx_no,trx_undo_del_marks,trx_undo_log_start,trx_undo_flags,trx_undo_dict_trans,trx_undo_table_id,trx_undo_next_log,trx_undo_prev_log = struct.unpack('>QQHHBBQHH',tdata[0:34])
				#trx_undo_history_node = struct.unpack('>LHLH',tdata[34:46])
				if undo_page_node[2] == FIL_PAGE_SPACE_ID:
					undo_log_start_offset = undo_page_node[2]
				else:
					not_end = False # crossing into another file is not supported


	f.close()

Original statement: This article was published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

For infringement concerns, please contact cloudcommunity@tencent.com for removal.
