一、做性能压测,实现准实时从mysql同步数据到oracle,mysql采用分库分表,oracle是传统rac架构。源端使用全量初始化+增量解析binlog方式来保证实现数据同步,之前主要采用oracle goldengate同步数据,但是goldengate不支持mysql xa事务,从而测试其他同步工具解决mysql到oracle数据同步问题(具体细节后面讲述)。对同步工具进行性能压测如期进行,但是收到数据库连接数告警信息,不符合常理,经过检查发现,数据库等待事件中enq:HW–contention占大头.将近60%.
二、enq:HW–contention官方解释
如果你遇到等待事件,可以通过查询官方能够得到大概意思.等待事件从分类来看属于enq,这个是内存结构锁且是serial。
Enqueues are shared memory structures (locks) that serialize access to database resources. They can be associated with a session or transaction.
The HW enqueue is used to manage the allocation of space beyond the high water mark of a segment. The high water mark of a segment is the boundary between used and unused space in that segment. If contention is occurring for “enq: HW – contention” it is possible that automatic extension is occuring to allow the extra data to be stored since the High Water Mark has been reached. Frequent allocation of extents, reclaiming chunks, and sometimes poor I/O performance may be causing contention for the LOB segments high water mark.
据官方文档描述,Oracle设计 HW – contention队列意义在于对于数据库资源来讲,对于保护内存结构都需要“锁”的概念来控制并发,当申请超过高水位空间时,为防止多个进程同时修改HWM而提供的锁称为HW锁。想要移动HWM的进程必须获得HW锁,但是获取HW锁还是排他锁,因为是加排他锁来保证结构不被破坏.enq: HW – contention这个等待,简言之为’HW等待’,每当请求扩展segment高水位线high water mark之外的空间时需要获取申请锁。
三:enq:HW–contention问题分析
同步工具大概实现方式:
根据之前等待事件解释,我们是积压数据后,源端多个表对应到目标端一个表,类似A001写入A到A031写入A,理论上并发写入时对应进程空间不够时需要高水位之上的空间时才需要申请HW排他锁,否则不影响并发写入,除非每个进程必须强制在高水位之上写入数据,这样大并发下主动加锁会变成窜行操作,出现相应阻塞操作。
经过分析性能测试工具每次写入的空间都在高水位(使用append hint强制),造成每次需要排他锁造成阻塞其他delete、update、insert在此表上。
1、对于并发写入进程或者表还存在其他业务时不适合这种方式
2、对于凌晨定时跑批的系统采用append方式提升大批量写入速度.
HW出现场景:
1、大并发插入造成HW
2、使用APPEND方式
3、表存在LOB字段
具体处理思路:
1、通过ash来分析这个等待事件发生次数,单个节点1小时内等待次数高达6000次以上.实际环境中更严重。每次等待时间是2s.
select instance_num,p1,p2,p3,count(*) from dba_hist_active_sess_history t
where event='enq: HW - contention' and t.sample_time>=timestamp'2019-07-12 15:00:00' and t.sample_time<=timestamp'2019-07-12 16:00:00'
group by instance_num,p1,p2,p3;
p1可以通过转换出来请求锁的模式都是6,6是排他锁.其中4857是HW的ASCILL name.
其中p1,p2,p3解释:
The lock name and requested mode are encoded into P1 as "name<<16 | mode". This is best seen in P1RAW (or you can convert P1 to hexadecimal).
For this wait:
In most cases mode will be either mode 4 or mode 6:
eg: P1RAW of 48570006 = eXclusive mode HW wait, P1RAW of 48570004 = Shared mode HW wait.
2、定位具体对象--这不直观,不如查询具体SQL
用P3定位到具体争用定位对象
SELECT DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(425098194) RELFILE#, DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(425098194) BLOCK# FROM dual;
FILE# BLOCK#
101 1473490
根据FILE#与BLOCK#定位具体OBJECT
select OWNER,SEGMENT_NAME from dba_extents where FILE_ID =101 and 1473490 between block_id and block_id + block_id -1;
OWNER SEGMENT_NAME ;
PIPELINE T_XXX_TASK_SHIPMENT
3、获取具体SQL以及原因分析
不显示完整SQL,可以看出来都是使用APPEND方式,append采用分配高水位之上的空间,根据前面P1知道等待
都是需要获取排他锁6级别.大并发情况造成系统性能问题,主要由于采用append方式造成.数据库em显示等待情况如下:
总结:
通过分析是同步工具采用append方式且多进程操作相同表导致严重HW等待。每次等待时间是2s.总等待时间是107755s,一个小时总的等待次数是:
107755/2=50000次.同时将总的连接数使用超过85%(每个节点2000个session*4=8000*85%=6800).解决思路去掉append方式。
四:采用MERGE原因以及好处
MERGE官方说明:
Purpose
Use the MERGE statement to select rows from one or more sources for update or insertion into a table or view. You can specify conditions to determine whether to update or insert into the target table or view.
This statement is a convenient way to combine multiple operations. It lets you avoid multiple INSERT, UPDATE, and DELETE DML statements.
MERGE is a deterministic statement. You cannot update the same row of the target table multiple times in the same MERGE statement.
Note:
Oracle Database does not implement fine-grained access control during MERGE statements. If you are using the fine-grained access control feature on the target table or tables, then use equivalent INSERT and UPDATE statements instead of MERGE to avoid error messages and to ensure correct access control.
注意:从上面来看,相同行不能多次更新,所以通过merge方式必须先去重,同时非细粒化访问数据,如果需要细粒化访问表,则不能使用merge方式。
同步工具数据库日志解析成消息写入KAFKA中,然后通过从KAFKA写入ORACLE数据库临时表中(针对更新多次的行,取最后一行方式来避免merge报错),通过设置写入batch size以及时间阈值来作为终点,然后通过临时表与原表进行merge来实现更新与插入操作.通过批次实现高效性能。对于delete操作走正常操作。
MERGE优势说明:
merge方式屏蔽update以及insert的区别,实现one pass处理数据.
merge可以结合parallel加速处理.
由于批量匹配减少扫描原表次数从而提升性能
五:HW相关BUG
备注:每次性能问题不涉及BUG.
Known Issues / Bugs:
You can restrict the list below to issues likely to affect one of the following versions by clicking the relevant button:
NB | Prob | Bug | Fixed | Description |
---|---|---|---|---|
III | 7319120 | HW contention for BasicFile LOBs - special workaround | ||
III | 20687474 | 12.2.0.1 | Excessive "enq: HW - contention" / "latch: cache buffers chains" inserting into very large non-partitioned table | |
II | 18221857 | 11.2.0.4.4.SAPEXADBBP, 11.2.0.4.BP15, 12.1.0.2, 12.2.0.1 | "enq: HW - contention" on large segments with lots of extents and high insert rate | |
III | 13357632 | 11.2.0.4, 12.1.0.1 | High ST contention from parallel direct load & MERGE | |
III | 9801919 | 11.2.0.1.BP08, 11.2.0.2.10, 11.2.0.2.BP20, 11.2.0.3, 12.1.0.1 | "enq: HW - contention" against segments that add an extent frequently during high concurrency | |
D | III | 6413373 | 10.2.0.5, 11.1.0.7, 11.2.0.1 | Lob HW enqueue contention in RAC environments with ASSM space management |
D | IIII | 6376915 | 10.2.0.4, 11.1.0.7, 11.2.0.1 | HW enqueue contention for ASSM LOB segments |
II | 4867884 | 10.2.0.3, 11.1.0.6 | Lob HW lock contention with space reclaimation |