WL#7047: InnoDB: Optimize buffer pool list scans and related batch processing

王文安@DBA

修改于 2020-09-14 15:11:44

3720

修改于 2020-09-14 15:11:44

文章被收录于专栏：Sth interesting, as a DBA

Hazard Pointers:

================

There was one hazard pointer (hp) based scan in buffer pool. With this work

we'll have four hp based scans.

We have used the concept of hazard pointer in kind of reverse order.

Academically a hazard pointer is a pointer that the thread working on it will

declare as such and as long as that thread is not done no other thread is

allowed to do anything with it.

In our case, we declare the pointer as a hazard pointer and then if any other

thread attempts to work on it, it is allowed to do so but it has to adjust the

hazard pointer to the next valid value. We use hazard pointer solely for reverse

traversal of lists within a buffer pool instance.

Example:

Traversal of LRU list during LRU flushing batch.
If we are working on, say, bpage we set bpage->lru->prev as hazard pointer
We release buf_pool mutex and start processing bpage
Meanwhile if any other thread attempts to remove prev from the LRU list, it will adjust the hazard pointer by setting it to prev->lru->prev under buf_pool::mutex protection.
When we are done processing bpage we grab buf_pool::mutex and read the hazard pointer and start processing it.

For each buffer pool instance we'll have:

flush_hp: hp for flush list batches
lru_hp: hp for lru batches
lru_scan_itr: special purpose iterator we'll use to scan LRU list to find a victim for eviction. Threads doing scan will start scan not from tail of LRU but from where the previous scan ended.
single_scan_itr: special purpose iterator we'll use to scan LRU list to find a victim for single page flushing/eviction. Threads doing scan will start scan not from tail of LRU but from where the previous scan ended.

The last two types of iterators will be capped by LRU->old i.e.: if we reach the

end of 'old' pages within LRU list we'll reset to the tail of LRU.

Page Eviction from LRU to free list:

====================================

We evict pages from LRU to free list when:

We do LRU batch
We do single page flushing to find a victim

What we have currently is quite involved. During LRU batch if we hit a page that

is good for eviction (it is clean) we evict it i.e., put it on free list. If,

however, it is dirty we flush it but don't put it on free list. This is because

during flushing we release the buf_pool::mutex and we don't want to start a

rescan for eviction when we are done flushing of that page. We simply leave it

for eviction in the next iteration.

In case of single page flushing we do:

scan to find a victim to flush
flush the page
scan to find that free page (scan needed because we released buf_pool::mutex)
evict the page

In new scheme of things we'll move the eviction to the IO completion routine

(typically run in IO helper thread). This means we can do both flushing and

eviction in a single pass during LRU and single page flushing.

LRU batches in single pass:

===========================

Currently we divide LRU batches into smaller chunks of 100 pages. Why?

We do this because we want to go through quick iterations of flushing and

eviction as described above. This ensures that thread waiting for a clean page

don't have to wait very long in case of a very large LRU batch i.e., high value

of innodb_lru_scan_depth.

However, in new scheme of things the eviction is done in the same pass as

flushing. That means we can have arbitrarily large LRU batch and the clean

blocks will become available to the waiting threads as and when they are flushed

(because they will be evicted in the IO completion routine).

Therefore, the logic to divide LRU batch into smaller chunks is not needed

anymore.

No wait to end LRU batch:

=========================

The threads looking for a victim don't need to wait for the LRU batch to end. If

they can find a page in free list that is fine, otherwise they can retry as the

pages are becoming available as and when they are flushed and if that doesn't

help they should try single page flushing.

Other Changes:

==============

One other thing that showed up on perf when working on high end system is that

when doing a search for replaceable page we call buf_LRU_free_page(). This will

compute the fold value and grab page_hash::lock before checking whether or not

page is replaceable. This is called very frequently and we should only try this

function if buf_flush_ready_for_replace() returns true.

Changes based on Performance meeting discussions:

=================================================

When scanning a clean page from common LRU list, only scan minimal pages, hard coded to 100 for now. Why minimal? Because with special iterator this much is enough to tell us the density of dirty pages. If it is higher then 99% i.e.: 100 dirty pages without a clean page, in the tail of LRU then we are better off doing a single page flush.
Above means that now innodb_lru_scan_depth only controls the LRU batches i.e.: how many pages the page_cleaner thread will try to keep in free list.
Add extra counters to distinguish between pages flushed and pages evicted during and LRU batch.

本文系转载，前往查看

如有侵权，请联系 cloudcommunity@tencent.com 删除。

云数据库 SQL Server

本文系转载，前往查看

如有侵权，请联系 cloudcommunity@tencent.com 删除。

云数据库 SQL Server

登录后参与评论

0 条评论

热度

WL#7047: InnoDB: Optimize buffer pool list scans and related batch processing

WL#7047: InnoDB: Optimize buffer pool list scans and related batch processing

社区

活动

资源

关于

腾讯云开发者

热门产品

热门推荐

更多推荐