前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >[MYSQL] MY-012095 表空间ID不对

[MYSQL] MY-012095 表空间ID不对

原创
作者头像
大大刺猬
发布2024-09-13 16:01:00
2470
发布2024-09-13 16:01:00
举报
文章被收录于专栏:大大刺猬

导读

有时候需要使用mysql的数据文件做恢复, 或者其它某些异常导致mysqld启动的时候报错如下:

代码语言:txt
复制
2024-09-13T02:20:33.489262Z 0 [Warning] [MY-013244] [Server] --collation-server: 'utf8_general_ci' is a collation of the deprecated character set UTF8MB3. Please consider using UTF8MB4 with an appropriate collation instead.
2024-09-13T02:20:33.495902Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2024-09-13T02:20:33.832733Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2024-09-13T02:20:33.919585Z 0 [ERROR] [MY-012095] [InnoDB] [FATAL] Tablespace id is 2 in the data dictionary but in file ./db1/t20240912.ibd it is 49591!
2024-09-13T02:20:33.919622Z 0 [ERROR] [MY-013183] [InnoDB] Assertion failure: fil0fil.cc:2582:ib::fatal triggered thread 140154526787328
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
02:20:33 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x100000
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x2e) [0x1f9ffde]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(print_fatal_signal(int)+0x2eb) [0x103628b]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(my_server_abort()+0x5e) [0x103638e]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(my_abort()+0xa) [0x1f9a3aa]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x30c) [0x225c4bc]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(ib::fatal::~fatal()+0x98) [0x225ee18]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(Fil_shard::get_file_size(fil_node_t*, bool)+0x3fe) [0x239df2e]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(Fil_shard::open_file(fil_node_t*)+0x294) [0x23ad124]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(Fil_shard::prepare_file_for_io(fil_node_t*)+0x33) [0x23adfd3]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(Fil_shard::do_io(IORequest const&, bool, page_id_t const&, page_size_t const&, unsigned long, unsigned long, void*, void*)+0x1a9) [0x23aeb69]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(fil_io(IORequest const&, bool, page_id_t const&, page_size_t const&, unsigned long, unsigned long, void*, void*)+0x57) [0x23af5f7]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(buf_read_page_low(dberr_t*, bool, unsigned long, unsigned long, page_id_t const&, page_size_t const&, bool)+0x13e) [0x22ed5be]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(buf_read_page_background(page_id_t const&, page_size_t const&, bool)+0x26) [0x22ed966]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld() [0x22d1503]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(buf_dump_thread()+0xe5) [0x22d1c55]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld(std::thread::_State_impl<std::thread::_Invoker<std::tuple<Detached_thread, void (*)()> > >::_M_run()+0xbc) [0x212061c]
/soft/mysql_3306/mysqlbase/mysql/bin/mysqld() [0x279e1ef]

分析

根据描述来看是 db1/t20240912.ibd 数据字典里面记录的Tablespace id是2, 但实际文件记录的是49591. 然后就挂了

既然是表空间id不一致, 那解决办法就至少有3种了.

1: 修改数据字典

2: 修改ibd文件里面的表空间id

3: 使用alter table xxx import tablespace的方式导入表

基于我们之前修改 lower_case_table_names 的经验来看, 修改ibd文件里面的信息更实际一点-_-

虽然之前讲过ibd的结构, 但这里再来回顾一下

也就是每页的第34-38字节 和 第1页的38-42 字节记录了表空间ID. 我们只需要修改这里就行.(记得同步校验字段)

验证

方法1(推荐)

我们先使用import tablespace的方式导入, 该方式需要先discard表空间, 所以记得先备份表空间文件, 不然discard了就没得了. 当然现在数据库都起不来了, 我们直接mv走就是了

代码语言:shell
复制
mv /data/mysql_3306/mysqldata/db1/t20240912.ibd /tmp
systemctl start mysqld_3306

应该会在日志里面发现如下Warning, 但我们并不在意它

代码语言:txt
复制
2024-09-13T06:23:30.463261Z 1 [Warning] [MY-012351] [InnoDB] Tablespace 2, name 'db1/t20240912', file './db1/t20240912.ibd' is missing!

然后我们再拷贝回去并导入数据库

代码语言:sql
复制
SHELL> mv /tmp/t20240912.ibd /data/mysql_3306/mysqldata/db1
SQL> alter table db1.t20240912 import tablespace;

方法2

方法1看起来没毛病, 我们先使用脚本看下t20240912.ibd的表空间id是多少(2还是49591呢?) 脚本放在文末

好家伙, 居然给我修改为了2... 汽油磁力!

我们也来使用脚本来修改ibd文件里面的表空间id为2吧. (先回退快照,或者重新模拟下故障)

代码语言:shell
复制
systemctl stop mysqld_3306
rm -rf /data/mysql_3306/mysqldata/db1/t20240912.ibd
python mysql_replace_tablespaceid.py t20240912.ibd 2 /data/mysql_3306/mysqldata/db1/t20240912.ibd
chown mysql:mysql /data/mysql_3306/mysqldata/db1/t20240912.ibd
systemctl start mysqld_3306

看起来没报错, 我们登录数据库验证下呢

说:该ibd文件的事务比我们系统的事务要高.... (使用旧的ibd文件在新环境恢复的事情被发现了). 官方竟然给我们来了这么一手. 汽油磁力!*2

error日志里面也是类似的信息(但更具体了):

代码语言:txt
复制
2024-09-13T06:41:20.444460Z 8 [ERROR] [MY-011971] [InnoDB] Tablespace 'db1/t20240912' Page [page id: space=2, page number=4] log sequence number 113002135714 is in the future! Current system log sequence number 18335077.
2024-09-13T06:41:20.444507Z 8 [ERROR] [MY-011972] [InnoDB] Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html for information about forcing recovery.
2024-09-13T06:41:20.444523Z 8 [ERROR] [MY-011971] [InnoDB] Tablespace 'db1/t20240912' Page [page id: space=2, page number=1] log sequence number 113002135714 is in the future! Current system log sequence number 18335077.
2024-09-13T06:41:20.444533Z 8 [ERROR] [MY-011972] [InnoDB] Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html for information about forcing recovery.

理论上通过拷贝redo,ibdata等文件是可以解决这个问题的. 但实际环境哪来那么多东西呢.(当然也可以使用ibd2sql之类的工具直接将ibd文件转为SQL语句). 也可以直接修改ibd文件里面的trx_id, 但这就太复杂了. 有没得简单点的方法呢???

按照官方的设置 innodb_force_recovery应该可以(毕竟可以不看redo了都).

或者就是修改系统的LSN(18335077)为大于等于我们当前值(113002135714). 毕竟ibdata也是page 比如:

那我们小试一波.

代码语言:python
代码运行次数:0
复制
import struct
filename = '/data/mysql_3306/mysqldata/ibdata1'
filename2 = '/tmp/ibdata1'
PAGE_SIZE = 16384
CURRENT_LSN = 18335077
LSN = 113002135714 + 100000
def create_crc32c_table():
	poly = 0x82f63b78
	table = []
	for i in range(256):
		crc = i
		for _ in range(8):
			if crc & 1:
				crc = (crc >> 1) ^ poly
			else:
				crc >>= 1
		table.append(crc)
	return table

crc32_slice_table = create_crc32c_table()
def calculate_crc32c(data):
	crc = 0xFFFFFFFF
	for byte in bytearray(data): # for PY2
		crc = crc32_slice_table[(crc ^ byte) & 0xFF] ^ (crc >> 8)
	return crc ^ 0xFFFFFFFF

def replace_crc32(data):
	c1 = calculate_crc32c(data[4:26])
	c2 = calculate_crc32c(data[38:PAGE_SIZE-8])
	cb = struct.pack('>L',(c1^c2)&(2**32-1))
	data = cb + data[4:PAGE_SIZE-8] + cb + data[PAGE_SIZE-4:]
	return data

def find_xx_positions(s,x):
	positions = []
	xl = len(x)
	start = 0
	while True:
		pos = s.find(x, start)
		if pos == -1:
			break
		positions.append(pos)
		start = pos + xl
	return positions

LSN_OLD = struct.pack('>Q',CURRENT_LSN)
LSN_NEW = struct.pack('>Q',LSN)
f2 = open(filename2,'wb')
with open(filename, 'rb') as f:
	while True:
		data = f.read(PAGE_SIZE)
		if data == b'':
			break
		for x in find_xx_positions(data,LSN_OLD):
			data = data[:x] + LSN_NEW + data[x+8:]
			data = replace_crc32(data)
		f2.write(data)

f2.flush()
f2.close()

可惜, 不行. 虽然报错看起来像是坏块, 但检查了不是坏块, 这回应该是内部的数据存在校验了. 或者是ibdata1根本没记录LSN, lsn只记录在redo里面

LSN修改

既然改不了系统的, 那就还是来修改我们自己的ibd文件的LSN咯... 根据上面的图, 我们知道1个page里面有多个lsn

那我们直接将lsn修改为0吧. 再改改代码

修改之后查询数据,并查看日志信息如下: LSN问题确实没得了, 但trx问题还是没处理啊....

虽然我们也可以修改trx_id, 但这涉及到行数据的解析了, 比较麻烦. 还是老老实实的使用import tablespace吧.

总结

数据恢复的时候还是先使用官方的命令,(不但帮忙修改了tablespace id, 甚至还修改了lsn&trx?). 如果尝试手动修改tablespace信息的话, 还需要考虑LSNTRX ID之类的问题.老麻烦了

附源码

这里附的是修改ibd文件的tablespace id和lsn的源码, github上那个没得lsn的修改的.(也就两三行代码的差距)

代码语言:python
代码运行次数:0
复制
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# write by ddcw
# innodb 表空间中的 space_id的替换
# 表空间id位于34-38 大端字节序

PAGE_SIZE = 16384
import sys,struct,os
args = len(sys.argv)
if args == 2:
	with open(sys.argv[1],'rb') as f:
		data = f.read(PAGE_SIZE) 
		space_id = struct.unpack('>L',data[34:38])[0]
		msg = "TABLESPACE ID: " + str(space_id) + '\n'
		sys.stdout.write(msg)
		sys.exit()
elif args == 4:
	filename = sys.argv[1]
	space_id = int(sys.argv[2])
	filename2 = sys.argv[3]
	if not os.path.exists(filename):
		msg = filename + " IS NOT EXISTS.\n"
		sys.stdout.write(msg)
		sys.exit(1)
	elif int(os.stat(filename).st_size % PAGE_SIZE) != 0:
		msg = filename + " Maybe not mysql's ibd file\n"
		sys.stdout.write(msg)
		sys.exit(2)
	if os.path.exists(filename2):
		msg = filename2 + " IS EXISTS. Please rename it\n"
		sys.stdout.write(msg)
		sys.exit(3)
else:
	msg = "\nExample: \npython " + sys.argv[0] + " test.ibd\npython " + sys.argv[0] + " test.ibd 123456 new_test.ibd\n\n"
	sys.stdout.write(msg)
	sys.exit(4)

def create_crc32c_table():
	poly = 0x82f63b78
	table = []
	for i in range(256):
		crc = i
		for _ in range(8):
			if crc & 1:
				crc = (crc >> 1) ^ poly
			else:
				crc >>= 1
		table.append(crc)
	return table

crc32_slice_table = create_crc32c_table()
def calculate_crc32c(data):
	crc = 0xFFFFFFFF
	for byte in bytearray(data): # for PY2
		crc = crc32_slice_table[(crc ^ byte) & 0xFF] ^ (crc >> 8)
	return crc ^ 0xFFFFFFFF

def replace_lsn(data):
	return data[:16] + struct.pack('>Q',0) + data[24:PAGE_SIZE-4:] + struct.pack('>L',0)

def replace_crc32(data):
	data = replace_lsn(data)
	c1 = calculate_crc32c(data[4:26])
	c2 = calculate_crc32c(data[38:PAGE_SIZE-8])
	cb = struct.pack('>L',(c1^c2)&(2**32-1))
	data = cb + data[4:PAGE_SIZE-8] + cb + data[PAGE_SIZE-4:]
	return data



f2 = open(filename2,'wb')
SPACE_ID = struct.pack('>L',space_id)
with open(filename, 'rb') as f:
	# FSP 38-42 (SPACE_HEADER:4 is SPACE ID) 
	data = f.read(PAGE_SIZE)
	data = data[:34] + SPACE_ID + SPACE_ID + data[42:]
	data = replace_crc32(data)
	f2.write(data)
	while True:
		data = f.read(PAGE_SIZE)
		if data == b'':
			break
		if data[34:38] != b'\x00\x00\x00\x00':
			data = replace_crc32(data[:34] + SPACE_ID + data[38:])
		f2.write(data)

f2.flush()
f2.close()
msg = 'Write to filename: ' + filename2 + '\n'
sys.stdout.write(msg)

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 导读
  • 分析
  • 验证
    • 方法1(推荐)
      • 方法2
      • LSN修改
      • 总结
      • 附源码
      相关产品与服务
      云数据库 MySQL
      腾讯云数据库 MySQL(TencentDB for MySQL)为用户提供安全可靠,性能卓越、易于维护的企业级云数据库服务。其具备6大企业级特性,包括企业级定制内核、企业级高可用、企业级高可靠、企业级安全、企业级扩展以及企业级智能运维。通过使用腾讯云数据库 MySQL,可实现分钟级别的数据库部署、弹性扩展以及全自动化的运维管理,不仅经济实惠,而且稳定可靠,易于运维。
      领券
      问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档