前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >WL#7047: InnoDB: Optimize buffer pool list scans and related batch processing

WL#7047: InnoDB: Optimize buffer pool list scans and related batch processing

作者头像
王文安@DBA
修改2020-09-14 15:11:44
3680
修改2020-09-14 15:11:44
举报
文章被收录于专栏:Sth interesting, as a DBA

Hazard Pointers:

================

There was one hazard pointer (hp) based scan in buffer pool. With this work

we'll have four hp based scans.

We have used the concept of hazard pointer in kind of reverse order.

Academically a hazard pointer is a pointer that the thread working on it will

declare as such and as long as that thread is not done no other thread is

allowed to do anything with it.

In our case, we declare the pointer as a hazard pointer and then if any other

thread attempts to work on it, it is allowed to do so but it has to adjust the

hazard pointer to the next valid value. We use hazard pointer solely for reverse

traversal of lists within a buffer pool instance.

Example:

  • Traversal of LRU list during LRU flushing batch.
  • If we are working on, say, bpage we set bpage->lru->prev as hazard pointer
  • We release buf_pool mutex and start processing bpage
  • Meanwhile if any other thread attempts to remove prev from the LRU list, it will adjust the hazard pointer by setting it to prev->lru->prev under buf_pool::mutex protection.
  • When we are done processing bpage we grab buf_pool::mutex and read the hazard pointer and start processing it.

For each buffer pool instance we'll have:

  • flush_hp: hp for flush list batches
  • lru_hp: hp for lru batches
  • lru_scan_itr: special purpose iterator we'll use to scan LRU list to find a victim for eviction. Threads doing scan will start scan not from tail of LRU but from where the previous scan ended.
  • single_scan_itr: special purpose iterator we'll use to scan LRU list to find a victim for single page flushing/eviction. Threads doing scan will start scan not from tail of LRU but from where the previous scan ended.

The last two types of iterators will be capped by LRU->old i.e.: if we reach the

end of 'old' pages within LRU list we'll reset to the tail of LRU.

Page Eviction from LRU to free list:

====================================

We evict pages from LRU to free list when:

  • We do LRU batch
  • We do single page flushing to find a victim

What we have currently is quite involved. During LRU batch if we hit a page that

is good for eviction (it is clean) we evict it i.e., put it on free list. If,

however, it is dirty we flush it but don't put it on free list. This is because

during flushing we release the buf_pool::mutex and we don't want to start a

rescan for eviction when we are done flushing of that page. We simply leave it

for eviction in the next iteration.

In case of single page flushing we do:

  • scan to find a victim to flush
  • flush the page
  • scan to find that free page (scan needed because we released buf_pool::mutex)
  • evict the page

In new scheme of things we'll move the eviction to the IO completion routine

(typically run in IO helper thread). This means we can do both flushing and

eviction in a single pass during LRU and single page flushing.

LRU batches in single pass:

===========================

Currently we divide LRU batches into smaller chunks of 100 pages. Why?

We do this because we want to go through quick iterations of flushing and

eviction as described above. This ensures that thread waiting for a clean page

don't have to wait very long in case of a very large LRU batch i.e., high value

of innodb_lru_scan_depth.

However, in new scheme of things the eviction is done in the same pass as

flushing. That means we can have arbitrarily large LRU batch and the clean

blocks will become available to the waiting threads as and when they are flushed

(because they will be evicted in the IO completion routine).

Therefore, the logic to divide LRU batch into smaller chunks is not needed

anymore.

No wait to end LRU batch:

=========================

The threads looking for a victim don't need to wait for the LRU batch to end. If

they can find a page in free list that is fine, otherwise they can retry as the

pages are becoming available as and when they are flushed and if that doesn't

help they should try single page flushing.

Other Changes:

==============

One other thing that showed up on perf when working on high end system is that

when doing a search for replaceable page we call buf_LRU_free_page(). This will

compute the fold value and grab page_hash::lock before checking whether or not

page is replaceable. This is called very frequently and we should only try this

function if buf_flush_ready_for_replace() returns true.

Changes based on Performance meeting discussions:

=================================================

  • When scanning a clean page from common LRU list, only scan minimal pages, hard coded to 100 for now. Why minimal? Because with special iterator this much is enough to tell us the density of dirty pages. If it is higher then 99% i.e.: 100 dirty pages without a clean page, in the tail of LRU then we are better off doing a single page flush.
  • Above means that now innodb_lru_scan_depth only controls the LRU batches i.e.: how many pages the page_cleaner thread will try to keep in free list.
  • Add extra counters to distinguish between pages flushed and pages evicted during and LRU batch.

本文系转载,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文系转载前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
相关产品与服务
云数据库 SQL Server
腾讯云数据库 SQL Server (TencentDB for SQL Server)是业界最常用的商用数据库之一,对基于 Windows 架构的应用程序具有完美的支持。TencentDB for SQL Server 拥有微软正版授权,可持续为用户提供最新的功能,避免未授权使用软件的风险。具有即开即用、稳定可靠、安全运行、弹性扩缩等特点。
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档