[MYSQL] 自定义mysql慢日志分析工具

原创

大大刺猬

发布于 2024-05-28 15:15:18

1180

发布于 2024-05-28 15:15:18

导读

MYSQL性能分析最快的就是慢日志分析, 慢日志可以记录超过long_query_time的SQL, 也可以记录未使用索引的SQL(log_queries_not_using_indexes), 一般未使用索引的SQL会比较慢, 但也不是绝对的, 所以这个参数默认是OFF的.

通常我们可以使用pt-query-digest或者mysqldumpslow之类的工具来分析慢日志. 但存在局限性, 比如我想查看返回行数超过200的SQL, 就没得相关参数来实现了, 虽然可以使用awk之类的来实现, 但比较麻烦, 而且效率存在一定问题.

慢日志查看

慢日志里面记录的信息还是很全的, 比如执行时间, 执行账号,锁时间,返回行数之类的.

对象	描述	介绍
Query_time	The statement execution time in seconds.	执行时间,单位:秒
Lock_time	The time to acquire locks in seconds.	获取锁的时间
Rows_sent	The number of rows sent to the client.	返回客户端的行数
Rows_examined	The number of rows examined by the server layer (not counting any processing internal to storage engines).	服务端产生的行数

如果启用了log_slow_extra参数的话, 还会记录更多的信息

column1	column2
Thread_id	The statement thread identifier.
Errno	The statement error number, or 0 if no error occurred.
Killed	If the statement was terminated, the error number indicating why, or 0 if the statement terminated normally.
Bytes_received	The Bytes_received value for the statement.
Bytes_sent	The Bytes_sent value for the statement.
Read_first	The Handler_read_first value for the statement.
Read_last	The Handler_read_last value for the statement.
Read_key	The Handler_read_key value for the statement.
Read_next	The Handler_read_next value for the statement.
Read_prev	The Handler_read_prev value for the statement.
Read_rnd	The Handler_read_rnd value for the statement.
Read_rnd_next	The Handler_read_rnd_next value for the statement.
Sort_merge_passes	The Sort_merge_passes value for the statement.
Sort_range_count	The Sort_range value for the statement.
Sort_rows	The Sort_rows value for the statement.
Sort_scan_count	The Sort_scan value for the statement.
Created_tmp_disk_tables	The Created_tmp_disk_tables value for the statement.
Created_tmp_tables	The Created_tmp_tables value for the statement.
Start	The statement execution start time.
End	The statement execution end time.

比如存在如下的一条慢日志:

# Time: 2024-05-28T06:17:48.704559Z
# User@Host: root[root] @ localhost [127.0.0.1]  Id:  3224
# Query_time: 0.000057  Lock_time: 0.000000 Rows_sent: 1  Rows_examined: 1 Thread_id: 3224 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 67 Read_first: 0 Read_last: 0 Read_key: 0 Read_next: 0 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-05-28T06:17:48.704502Z End: 2024-05-28T06:17:48.704559Z
SET timestamp=1716877068;
select USER();

# Time: 2024-05-28T06:17:48.704559Z 表示记录这条慢日志的时间, 也可以当作是SQL执行结束时间. 这个时间是Z结尾的, 表明是UTC时间, 未加上时区, 所以我们计算的时候得人为的加上 8 小数. 如果带有时间的, 则为# Time: 2024-05-28T11:04:32.649161+08:00 这种样子.

# User@Host: rootroot @ localhost 127.0.0.1 Id: 3224 表明是 root@127.0.0.1 账号执行的SQL, 连接id是3224 (show processlist看到的id)

# Query_time: 0.000057 Lock_time: 0.000000 Rows_sent: 1 Rows_examined: 1 Thread_id: 3224 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 67 Read_first: 0 Read_last: 0 Read_key: 0 Read_next: 0 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 0 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 0 Sort_scan_count: 0 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2024-05-28T06:17:48.704502Z End: 2024-05-28T06:17:48.704559Z

我这里启用了log_slow_extra, 所以能看到Thread_id及其之后的信息, 我们可以看到这里的End的时间和上面的# Time的时间是一样的, 代表SQL结束时间(不一定是正常结束,ctrl+c之类的也会记录下来.)

SET timestamp=1716877068;

select USER(); 就是我们执行的SQL语句, 当然也可能是某些客户端工具自动执行的.

也有一些特殊的SQL, 比如退出连接的时候, 记录的是 # administrator command: Quit; (QUIT:0x01 表示断开连接)

自定义脚本分析慢日志

既然慢日志的格式我们知道了, 那我们只需要取两个 # Time: 之间的信息即可得到一条慢日志, 然后记录下来, 按照我们的需求做分析, 我们本次的需求比较简单, 就文件读取,字符串匹配, 只是简单的打印平均值, 最大值, 最小值即可. 当然还有执行次数. 还是使用python3来实现, 比较简单, 我就直接演示了.

python3 slowlog_analyze.py /data/mysql_3314/mysqllog/dblogs/slow3314.log

当然我们可以加一些过滤规则, 比如执行次数大于1000, Rows_examined > 200 之类的SQL再展示. 我这里就不去演示了.

我们的SQL没有做脱敏或者其它处理, 所以速度很快,嘎嘎快

工具	时间(秒)
slowlog_analyze.py	1.3
pt-query-digest	4.3
mysqldumpslow	31.8

可以看到速度遥遥领先其它工具. 当然如果加上过滤规则,SQL处理等之后, 速度会慢一点, 但不会很明显, 最主要是能符合我们的要求.

附脚本

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# write by ddcw @https://github.com/ddcw
# 分析mysql慢日志的, 

import sys,os
import datetime
import re
from decimal import Decimal

if len(sys.argv) <= 1:
	msg = "USAGE: python3 " + str(sys.argv[0]) + " /PATH/slow3314.log\n"
	sys.stdout.write(msg)
	sys.exit(1)

filename = sys.argv[1]

SLOWLOG_DICT = {} #取HASH,没必要, 直接把SQL作为key
ADDHOURS = datetime.timedelta(hours=8)
PATTERN_UH = re.compile(r"\[([^\[\]]+)\]")
PATTERN_KV = re.compile(r"(\w+):\s+([0-9.]+)")
with open(filename,'r') as f:
	log_time = None #日志记录时间, 基本上可以算是sql执行结束时间
	user_host = None #执行慢SQL的账号
	Query_time = None
	Lock_time = None
	Rows_sent = None
	Rows_examined = None # 服务端所产生的行数. 是可能大于表的总行数的. select * from db1.t1 limit 10 则为总行数+10
	extra_info = None #Thread_id Errno Bytes_received ...
	sql = ""
	NOT_BREAK_FLAG = True
	while NOT_BREAK_FLAG:
		data = f.readline()
		if data == '':
			NOT_BREAK_FLAG = False
		# Time: 
		if data[:8] == "# Time: " or not NOT_BREAK_FLAG:
			if log_time is not None: #记录上一条SQL
				if sql in SLOWLOG_DICT and SLOWLOG_DICT[sql]['user_host'] == user_host:
					SLOWLOG_DICT[sql]['log_time'].append(      log_time)
					SLOWLOG_DICT[sql]['Query_time'].append(    Query_time)
					SLOWLOG_DICT[sql]['Lock_time'].append(     Lock_time)
					SLOWLOG_DICT[sql]['Rows_sent'].append(     Rows_sent)
					SLOWLOG_DICT[sql]['Rows_examined'].append( Rows_examined)
					SLOWLOG_DICT[sql]['extra_info'].append(    extra_info)
				else:
					SLOWLOG_DICT[sql] = {}
					SLOWLOG_DICT[sql]['user_host']     = user_host
					SLOWLOG_DICT[sql]['log_time']      = [log_time]
					SLOWLOG_DICT[sql]['Query_time']    = [Query_time]
					SLOWLOG_DICT[sql]['Lock_time']     = [Lock_time]
					SLOWLOG_DICT[sql]['Rows_sent']     = [Rows_sent]
					SLOWLOG_DICT[sql]['Rows_examined'] = [Rows_examined]
					SLOWLOG_DICT[sql]['extra_info']    = [extra_info]
			elif data != "":
				log_time = data.split()[-1]
				if log_time[-1] == 'Z':
					log_time = datetime.datetime.strptime(log_time, "%Y-%m-%dT%H:%M:%S.%fZ") + ADDHOURS
				else:
					log_time = datetime.datetime.strptime(log_time, "%Y-%m-%dT%H:%M:%S.%f+08:00")
				#log_time = log_time..timestamp() # 转为float
			sql = ""
				
		# User@Host: 
		elif data[:13] == "# User@Host: ":
			user_host = "@".join(PATTERN_UH.findall(data))

		# Query_time: 
		elif data[:14] == "# Query_time: ":
			kv = {key: Decimal(value) for key, value in PATTERN_KV.findall(data)}
			Query_time = kv['Query_time']
			Lock_time = kv['Lock_time']
			Rows_sent = kv['Rows_sent']
			Rows_examined = kv['Rows_examined']
			# extra_info TODO

		elif data[:14] == "SET timestamp=":
			continue
		# Time mysqld刚启动时产生的header.
		elif data[:4] == "Time":
			log_time = None
		else: # 剩下的就是SQL了
			sql += data


# 输出慢日志信息了.
for sql in SLOWLOG_DICT:
	# 过滤规则
	if len(SLOWLOG_DICT[sql]['Query_time']) < 0:
		continue
	sys.stdout.write("\n")
	sys.stdout.write(f"{''.ljust(20)}{'TOTAL'.ljust(20)}{'MIN'.ljust(20)}{'MAX'.ljust(20)}{'AVG'.ljust(20)}\n")
	for k in ['Query_time','Lock_time','Rows_sent','Rows_examined']:
		_total = sum(SLOWLOG_DICT[sql][k])
		_min = min(SLOWLOG_DICT[sql][k])
		_max = max(SLOWLOG_DICT[sql][k])
		_avg = round(_total / len(SLOWLOG_DICT[sql][k]),6)
		sys.stdout.write(f"{k.ljust(20)}{str(_total).ljust(20)}{str(_min).ljust(20)}{str(_max).ljust(20)}{str(_avg).ljust(20)}\n")
	sys.stdout.write(f"EXECUTE_TIME:{len(SLOWLOG_DICT[sql]['Query_time'])}\n{sql}\n")

原创声明：本文系作者授权腾讯云开发者社区发表，未经许可，不得转载。

如有侵权，请联系 cloudcommunity@tencent.com 删除。

python