PostgreSQL Source Code (12): Revisiting BufferDesc

mingjie

Published 2022-07-14 13:37:59

/*
 *	BufferDesc -- shared descriptor/state data for a single shared buffer.
 *
 * Note: Buffer header lock (BM_LOCKED flag) must be held to examine or change
 * the tag, state or wait_backend_pid fields.  In general, buffer header lock
 * is a spinlock which is combined with flags, refcount and usagecount into
 * single atomic variable.  This layout allow us to do some operations in a
 * single atomic operation, without actually acquiring and releasing spinlock;
 * for instance, increase or decrease refcount.  buf_id field never changes
 * after initialization, so does not need locking.  freeNext is protected by
 * the buffer_strategy_lock not buffer header lock.  The LWLock can take care
 * of itself.  The buffer header lock is *not* used to control access to the
 * data in the buffer!
 *

This post focuses on the buffer header lock, i.e. BM_LOCKED.

The buffer header lock has no dedicated field in the descriptor struct; it is stored as bit 22 (counting from 0) of state.

#define BM_LOCKED				(1U << 22)	/* buffer header is locked */
#define BM_DIRTY				(1U << 23)	/* data needs writing */
#define BM_VALID				(1U << 24)	/* data is valid */
#define BM_TAG_VALID			(1U << 25)	/* tag is assigned */
#define BM_IO_IN_PROGRESS		(1U << 26)	/* read or write in progress */
#define BM_IO_ERROR				(1U << 27)	/* previous I/O failed */
#define BM_JUST_DIRTIED			(1U << 28)	/* dirtied since write started */
#define BM_PIN_COUNT_WAITER		(1U << 29)	/* have waiter for sole pin */
#define BM_CHECKPOINT_NEEDED	(1U << 30)	/* must write for checkpoint */
#define BM_PERMANENT			(1U << 31)	/* permanent buffer (not unlogged,
											 * or init fork) */

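The header lock protects three fields: tag, state, and wait_backend_pid.
The atomic variable state packs the flags, refcount, and usagecount into one word. Because of that, some updates, such as incrementing or decrementing refcount, can be done in a single atomic operation without actually acquiring and releasing the spinlock.
buf_id never changes after initialization, so it needs no locking.
freeNext is protected by buffer_strategy_lock, not by the buffer header lock.
Note that the buffer header lock does not control access to the data in the buffer.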
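
To make the packing concrete, here is a minimal sketch of how a 32-bit state word could be split into refcount, usagecount, and flags. The flag bits are the ones quoted above; the 18-bit refcount and 4-bit usagecount widths, the mask names, and the state_* helpers are assumptions for illustration, so check buf_internals.h in your PostgreSQL version for the real definitions.

```c
#include <stdint.h>

/* Flag bits quoted from the source above. */
#define BM_LOCKED            (1U << 22)
#define BM_DIRTY             (1U << 23)

/* Assumed layout: 18 bits of refcount, 4 bits of usagecount, 10 flag bits. */
#define BUF_REFCOUNT_MASK    ((1U << 18) - 1)
#define BUF_USAGECOUNT_SHIFT 18
#define BUF_USAGECOUNT_MASK  0x003C0000U
#define BUF_FLAG_MASK        0xFFC00000U

/* Hypothetical helpers that pull each field out of a state value. */
static inline uint32_t state_refcount(uint32_t state)
{
    return state & BUF_REFCOUNT_MASK;
}

static inline uint32_t state_usagecount(uint32_t state)
{
    return (state & BUF_USAGECOUNT_MASK) >> BUF_USAGECOUNT_SHIFT;
}

static inline uint32_t state_flags(uint32_t state)
{
    return state & BUF_FLAG_MASK;
}
```

Under this assumed layout, a buffer that is dirty, has usagecount 3, and is pinned twice has the state value BM_DIRTY | (3U << 18) | 2U, and all three fields can be read from one load of the word.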

 * It's assumed that nobody changes the state field while buffer header lock
 * is held.  Thus buffer header lock holder can do complex updates of the
 * state variable in single write, simultaneously with lock release (cleaning
 * BM_LOCKED flag).  On the other hand, updating of state without holding
 * buffer header lock is restricted to CAS, which insure that BM_LOCKED flag
 * is not set.  Atomic increment/decrement, OR/AND etc. are not allowed.
 *

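While BM_LOCKED is held, nobody else may change the state field. The lock holder can therefore apply a complex update to state in a single write that also releases the lock (by clearing the BM_LOCKED flag).
Updating state without holding BM_LOCKED is restricted to CAS (compare-and-swap), which guarantees that BM_LOCKED is not set at the moment of the update. Atomic increment/decrement, OR/AND, and similar operations are not allowed.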
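
The two update paths can be sketched with C11 atomics. This is not PostgreSQL's code: lock_hdr, unlock_hdr, and pin_lockfree are hypothetical names, the spin loop has no backoff, and placing refcount in the low bits is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define BM_LOCKED        (1U << 22)
#define BM_DIRTY         (1U << 23)
#define BUF_REFCOUNT_ONE 1U     /* assumed: refcount occupies the low bits */

/* Take the header lock: spin until we are the one who set BM_LOCKED.
 * The OR is harmless while the lock is held, since the bit is already set. */
static uint32_t lock_hdr(_Atomic uint32_t *state)
{
    for (;;)
    {
        uint32_t old = atomic_fetch_or(state, BM_LOCKED);

        if (!(old & BM_LOCKED))
            return old | BM_LOCKED;     /* value of state while we hold it */
    }
}

/* Release: one store publishes every change and clears BM_LOCKED. */
static void unlock_hdr(_Atomic uint32_t *state, uint32_t new_state)
{
    assert(atomic_load(state) & BM_LOCKED);
    atomic_store(state, new_state & ~BM_LOCKED);
}

/* Pin without the header lock. The expected value never has BM_LOCKED
 * set, so the CAS cannot succeed while somebody holds the lock. */
static void pin_lockfree(_Atomic uint32_t *state)
{
    uint32_t old = atomic_load(state);

    for (;;)
    {
        if (old & BM_LOCKED)
        {
            old = atomic_load(state);   /* wait for the holder to release */
            continue;
        }
        if (atomic_compare_exchange_weak(state, &old, old + BUF_REFCOUNT_ONE))
            break;
        /* on failure, old now holds the current value; retry */
    }
}
```

A lock holder would call lock_hdr, compute the new value locally (for example old | BM_DIRTY), and pass it to unlock_hdr. A plain atomic increment in pin_lockfree would be wrong here, because it could change state while a lock holder is about to overwrite it, losing the increment.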

 * An exception is that if we have the buffer pinned, its tag can't change
 * underneath us, so we can examine the tag without locking the buffer header.
 * Also, in places we do one-time reads of the flags without bothering to
 * lock the buffer header; this is generally for situations where we don't
 * expect the flag bit being tested to be changing.
 *

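One exception: if we have the buffer pinned, its tag cannot change underneath us, so we can examine the tag without locking the buffer header. In some places the flags are also read once without taking the header lock, generally where the flag bit being tested is not expected to be changing.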

 * We can't physically remove items from a disk page if another backend has
 * the buffer pinned.  Hence, a backend may need to wait for all other pins
 * to go away.  This is signaled by storing its own PID into
 * wait_backend_pid and setting flag bit BM_PIN_COUNT_WAITER.  At present,
 * there can be only one such waiter per buffer.
 *

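If another backend has the buffer pinned, we cannot physically remove items from the disk page, so a backend may need to wait for all other pins to go away.
It signals this by storing its own PID in wait_backend_pid and setting the flag bit BM_PIN_COUNT_WAITER. At present, there can be only one such waiter per buffer.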
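
A single-threaded sketch of that handshake follows. MiniDesc, register_pin_waiter, and unpin_and_find_waiter are hypothetical names, the refcount width is an assumption, and the code assumes the caller already holds the buffer header lock, so state is a plain integer here. Because the waiter keeps its own pin, "all other pins are gone" corresponds to a refcount of 1.

```c
#include <stdbool.h>
#include <stdint.h>

#define BM_PIN_COUNT_WAITER (1U << 29)
#define BUF_REFCOUNT_MASK   ((1U << 18) - 1)    /* assumed refcount width */

typedef struct
{
    uint32_t state;             /* header lock assumed held by the caller */
    int      wait_backend_pid;
} MiniDesc;                     /* hypothetical stand-in for BufferDesc */

/* Register as the sole waiter; fails if another waiter is already present. */
static bool register_pin_waiter(MiniDesc *buf, int my_pid)
{
    if (buf->state & BM_PIN_COUNT_WAITER)
        return false;
    buf->wait_backend_pid = my_pid;
    buf->state |= BM_PIN_COUNT_WAITER;
    return true;
}

/* Drop one pin. Returns the PID to wake when only the waiter's own pin
 * remains, otherwise 0. */
static int unpin_and_find_waiter(MiniDesc *buf)
{
    buf->state -= 1;
    if ((buf->state & BM_PIN_COUNT_WAITER) &&
        (buf->state & BUF_REFCOUNT_MASK) == 1)
    {
        buf->state &= ~BM_PIN_COUNT_WAITER;
        return buf->wait_backend_pid;
    }
    return 0;
}
```

With three pins on a buffer (the waiter's plus two others), the first unpin returns 0 and the second returns the waiter's PID, at which point the real system would wake that backend.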

 * We use this same struct for local buffer headers, but the locks are not
 * used and not all of the flag bits are useful either. To avoid unnecessary
 * overhead, manipulations of the state field should be done without actual
 * atomic operations (i.e. only pg_atomic_read_u32() and
 * pg_atomic_unlocked_write_u32()).
 *
 * Be careful to avoid increasing the size of the struct when adding or
 * reordering members.  Keeping it below 64 bytes (the most common CPU
 * cache line size) is fairly important for performance.
 */

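Local buffer headers use the same struct, but the locks are not used and not all of the flag bits are useful.
To avoid unnecessary overhead, the state field of a local buffer should be manipulated without real atomic operations, i.e. using only pg_atomic_read_u32() and pg_atomic_unlocked_write_u32().
When adding or reordering members, take care not to increase the size of the struct. Keeping it below 64 bytes (the most common CPU cache line size) is fairly important for performance.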
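
For a local buffer, a state update is just a read, a modification, and a write, with no CAS or locked instruction. The sketch below uses C11 relaxed loads and stores as stand-ins for pg_atomic_read_u32() and pg_atomic_unlocked_write_u32(); read_u32, unlocked_write_u32, and mark_local_dirty are hypothetical names, not PostgreSQL functions.

```c
#include <stdatomic.h>
#include <stdint.h>

#define BM_DIRTY (1U << 23)

/* Stand-in for pg_atomic_read_u32(): a plain, unordered read. */
static inline uint32_t read_u32(_Atomic uint32_t *p)
{
    return atomic_load_explicit(p, memory_order_relaxed);
}

/* Stand-in for pg_atomic_unlocked_write_u32(): a plain, unordered write. */
static inline void unlocked_write_u32(_Atomic uint32_t *p, uint32_t v)
{
    atomic_store_explicit(p, v, memory_order_relaxed);
}

/* Only the owning backend touches a local buffer, so read-modify-write
 * needs no lock and no CAS loop. */
static void mark_local_dirty(_Atomic uint32_t *state)
{
    uint32_t s = read_u32(state);

    s |= BM_DIRTY;
    unlocked_write_u32(state, s);
}
```

This is safe only because no other process can see a local buffer; the same sequence on a shared buffer could lose a concurrent update.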

typedef struct BufferDesc
{
	BufferTag	tag;			/* ID of page contained in buffer */
	int			buf_id;			/* buffer's index number (from 0) */

	/* state of the tag, containing flags, refcount and usagecount */
	pg_atomic_uint32 state;

	int			wait_backend_pid;	/* backend PID of pin-count waiter */
	int			freeNext;		/* link in freelist chain */

	LWLock		content_lock;	/* to lock access to buffer contents */
} BufferDesc;
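
The 64-byte budget can be enforced at compile time. The sketch below mirrors the field order above with stand-in types: FakeBufferTag, FakeLWLock, and FakeBufferDesc are hypothetical, and their member layouts are assumptions rather than the real BufferTag and LWLock definitions, so the computed size is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for BufferTag: three OIDs, a fork number, a block number. */
typedef struct
{
    uint32_t spc_oid, db_oid, rel_oid;
    int32_t  fork_num;
    uint32_t block_num;
} FakeBufferTag;

/* Assumed stand-in for LWLock: tranche id, atomic state, wait-list head/tail. */
typedef struct
{
    uint16_t tranche;
    uint32_t state;
    int32_t  wait_head, wait_tail;
} FakeLWLock;

typedef struct
{
    FakeBufferTag tag;
    int           buf_id;
    uint32_t      state;            /* pg_atomic_uint32 in the real struct */
    int           wait_backend_pid;
    int           freeNext;
    FakeLWLock    content_lock;
} FakeBufferDesc;

/* Fails the build if a new or reordered member pushes the struct past
 * one 64-byte cache line. */
static_assert(sizeof(FakeBufferDesc) <= 64,
              "FakeBufferDesc must fit in one 64-byte cache line");
```

A check of this kind turns the "be careful" note in the comment into something the compiler verifies whenever a member is added or reordered.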

Originally published: 2021-11-04
