
PostgreSQL Source Code (12): Revisiting BufferDesc

mingjie
Published 2022-07-14 13:37:59
/*
 *	BufferDesc -- shared descriptor/state data for a single shared buffer.
 *
 * Note: Buffer header lock (BM_LOCKED flag) must be held to examine or change
 * the tag, state or wait_backend_pid fields.  In general, buffer header lock
 * is a spinlock which is combined with flags, refcount and usagecount into
 * single atomic variable.  This layout allow us to do some operations in a
 * single atomic operation, without actually acquiring and releasing spinlock;
 * for instance, increase or decrease refcount.  buf_id field never changes
 * after initialization, so does not need locking.  freeNext is protected by
 * the buffer_strategy_lock not buffer header lock.  The LWLock can take care
 * of itself.  The buffer header lock is *not* used to control access to the
 * data in the buffer!
 *

This post focuses on the buffer header lock, that is, BM_LOCKED.

The buffer header lock has no variable of its own in the descriptor struct; it is stored in bit 22 of state:

#define BM_LOCKED               (1U << 22)  /* buffer header is locked */
#define BM_DIRTY                (1U << 23)  /* data needs writing */
#define BM_VALID                (1U << 24)  /* data is valid */
#define BM_TAG_VALID            (1U << 25)  /* tag is assigned */
#define BM_IO_IN_PROGRESS       (1U << 26)  /* read or write in progress */
#define BM_IO_ERROR             (1U << 27)  /* previous I/O failed */
#define BM_JUST_DIRTIED         (1U << 28)  /* dirtied since write started */
#define BM_PIN_COUNT_WAITER     (1U << 29)  /* have waiter for sole pin */
#define BM_CHECKPOINT_NEEDED    (1U << 30)  /* must write for checkpoint */
#define BM_PERMANENT            (1U << 31)  /* permanent buffer (not unlogged,
                                             * or init fork) */

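  • The header lock protects three fields: tag, state, and wait_backend_pid.
  • The atomic variable state packs the flags, refcount, and usagecount together. A single atomic operation can therefore update several of these values at once, so some operations, such as incrementing or decrementing refcount, do not need to actually acquire and release the spinlock.
  • buf_id never changes after initialization, so it needs no locking.
  • freeNext is protected by buffer_strategy_lock, not by the buffer header lock.
  • Note that the buffer header lock is not used to control access to the data in the buffer.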
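
To make the packing concrete, here is a minimal sketch of how the three parts of state could be split apart and put back together. It assumes the low 18 bits hold refcount and the next 4 bits hold usagecount, which is consistent with the flags starting at bit 22 but is not spelled out in the text above; the helper names (state_refcount, make_state, and so on) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 18 bits refcount | 4 bits usagecount | 10 flag bits. */
#define BUF_REFCOUNT_MASK    ((1U << 18) - 1)
#define BUF_USAGECOUNT_SHIFT 18
#define BUF_USAGECOUNT_MASK  0x003C0000U
#define BUF_FLAG_MASK        0xFFC00000U

#define BM_LOCKED            (1U << 22)
#define BM_DIRTY             (1U << 23)

/* Hypothetical helpers: extract each field from a plain copy of state. */
static uint32_t state_refcount(uint32_t state)
{
    return state & BUF_REFCOUNT_MASK;
}

static uint32_t state_usagecount(uint32_t state)
{
    return (state & BUF_USAGECOUNT_MASK) >> BUF_USAGECOUNT_SHIFT;
}

static uint32_t state_flags(uint32_t state)
{
    return state & BUF_FLAG_MASK;
}

/* Build a state word from its three parts, checking each fits its field. */
static uint32_t make_state(uint32_t flags, uint32_t usagecount, uint32_t refcount)
{
    assert((flags & ~BUF_FLAG_MASK) == 0);
    assert(usagecount <= (BUF_USAGECOUNT_MASK >> BUF_USAGECOUNT_SHIFT));
    assert(refcount <= BUF_REFCOUNT_MASK);
    return flags | (usagecount << BUF_USAGECOUNT_SHIFT) | refcount;
}
```

Because all three fields live in one 32-bit word, a single compare-and-swap on that word can, for example, bump refcount and usagecount together.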
 * It's assumed that nobody changes the state field while buffer header lock
 * is held.  Thus buffer header lock holder can do complex updates of the
 * state variable in single write, simultaneously with lock release (cleaning
 * BM_LOCKED flag).  On the other hand, updating of state without holding
 * buffer header lock is restricted to CAS, which insure that BM_LOCKED flag
 * is not set.  Atomic increment/decrement, OR/AND etc. are not allowed.
 *
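  • While BM_LOCKED is held, nobody else may change the state field. The holder can therefore make a complex update to state in a single write that also releases the lock (clears the BM_LOCKED flag).
  • Updating state without holding BM_LOCKED is restricted to CAS (compare-and-swap), which ensures that BM_LOCKED is not set. Atomic increment/decrement, OR/AND, and similar operations are not allowed.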
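
The two rules can be sketched with C11 atomics. This is an illustration under assumptions, not PostgreSQL's implementation: lock_hdr, unlock_hdr, and pin_without_lock are hypothetical names, the spin loops have no back-off, and refcount is assumed to occupy the low bits of state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define BM_LOCKED        (1U << 22)
#define BM_DIRTY         (1U << 23)
#define BUF_REFCOUNT_ONE 1U   /* assumption: refcount lives in the low bits */

/*
 * Acquire the header lock. Setting BM_LOCKED is the lock acquisition itself,
 * so an atomic OR is fine here: if the bit was already set, nothing changes
 * and we simply retry. Returns the state as seen by the new lock holder.
 */
static uint32_t lock_hdr(_Atomic uint32_t *state)
{
    for (;;)
    {
        uint32_t old = atomic_fetch_or(state, BM_LOCKED);

        if (!(old & BM_LOCKED))
            return old | BM_LOCKED;
        /* somebody else holds the lock; a real implementation would back off */
    }
}

/*
 * Lock holder only: publish an arbitrarily complex new state and release the
 * lock in one store. Safe because nobody else may write state meanwhile.
 */
static void unlock_hdr(_Atomic uint32_t *state, uint32_t new_state)
{
    assert(atomic_load(state) & BM_LOCKED);
    atomic_store_explicit(state, new_state & ~BM_LOCKED, memory_order_release);
}

/*
 * Non-holder: add one pin using CAS only. The loop never attempts the swap
 * while BM_LOCKED is set, so it cannot overwrite the lock holder's update.
 */
static uint32_t pin_without_lock(_Atomic uint32_t *state)
{
    uint32_t old = atomic_load(state);

    for (;;)
    {
        if (old & BM_LOCKED)
        {
            old = atomic_load(state);   /* wait for the holder to release */
            continue;
        }

        uint32_t desired = old + BUF_REFCOUNT_ONE;

        /* on failure, old is refreshed with the current value and we retry */
        if (atomic_compare_exchange_weak(state, &old, desired))
            return desired;
    }
}
```

A plain atomic increment in pin_without_lock would be unsafe: it could land between the holder's read and its releasing store, and the store would then silently discard the increment.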
 * An exception is that if we have the buffer pinned, its tag can't change
 * underneath us, so we can examine the tag without locking the buffer header.
 * Also, in places we do one-time reads of the flags without bothering to
 * lock the buffer header; this is generally for situations where we don't
 * expect the flag bit being tested to be changing.
 *
  • One exception: if we have the buffer pinned, its tag cannot change underneath us, so we can examine the tag without locking the buffer header.
 * We can't physically remove items from a disk page if another backend has
 * the buffer pinned.  Hence, a backend may need to wait for all other pins
 * to go away.  This is signaled by storing its own PID into
 * wait_backend_pid and setting flag bit BM_PIN_COUNT_WAITER.  At present,
 * there can be only one such waiter per buffer.
 *
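  • If another backend has the buffer pinned, we cannot physically remove items from the disk page. A backend may therefore need to wait for all other pins to go away.
  • It signals this by storing its own PID in wait_backend_pid and setting the BM_PIN_COUNT_WAITER flag bit. At present there can be only one such waiter per buffer.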
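
The waiter protocol can be simulated in a few lines. The sketch below is single-threaded and leaves out the header lock that would guard these fields in practice. DemoBufHdr, register_pin_waiter, and unpin_and_find_waiter are hypothetical names, and it assumes the waiter keeps its own pin, so "all other pins gone" means refcount has dropped to 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BUF_REFCOUNT_MASK   ((1U << 18) - 1)   /* assumption: low 18 bits */
#define BM_PIN_COUNT_WAITER (1U << 29)

/* Cut-down stand-in for the two BufferDesc fields involved. */
typedef struct DemoBufHdr
{
    uint32_t state;             /* flags + refcount, non-atomic in this demo */
    int      wait_backend_pid;  /* PID of the single pin-count waiter */
} DemoBufHdr;

/* Register as the waiter; fails if the buffer already has one. */
static bool register_pin_waiter(DemoBufHdr *buf, int my_pid)
{
    if (buf->state & BM_PIN_COUNT_WAITER)
        return false;           /* only one waiter per buffer */
    buf->wait_backend_pid = my_pid;
    buf->state |= BM_PIN_COUNT_WAITER;
    return true;
}

/*
 * Drop one pin. Returns the PID to wake if the waiter now holds the sole
 * remaining pin, otherwise 0.
 */
static int unpin_and_find_waiter(DemoBufHdr *buf)
{
    assert((buf->state & BUF_REFCOUNT_MASK) > 0);
    buf->state -= 1;

    if ((buf->state & BM_PIN_COUNT_WAITER) &&
        (buf->state & BUF_REFCOUNT_MASK) == 1)
    {
        buf->state &= ~BM_PIN_COUNT_WAITER;   /* waiter is being released */
        return buf->wait_backend_pid;
    }
    return 0;
}
```

With three pins and one registered waiter, the first unpin returns 0 and the second returns the waiter's PID, at which point the caller would signal that backend.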
 * We use this same struct for local buffer headers, but the locks are not
 * used and not all of the flag bits are useful either. To avoid unnecessary
 * overhead, manipulations of the state field should be done without actual
 * atomic operations (i.e. only pg_atomic_read_u32() and
 * pg_atomic_unlocked_write_u32()).
 *
 * Be careful to avoid increasing the size of the struct when adding or
 * reordering members.  Keeping it below 64 bytes (the most common CPU
 * cache line size) is fairly important for performance.
 */
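  • Local buffer headers use the same struct, but the locks are not used and not all of the flag bits are useful.
  • To avoid unnecessary overhead, the state field should be manipulated without real atomic operations (that is, using only pg_atomic_read_u32() and pg_atomic_unlocked_write_u32()).
  • Take care not to increase the size of the struct when adding or reordering members. Keeping it below 64 bytes (the most common CPU cache line size) is fairly important for performance.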
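
For local buffers, the read-then-write pattern might look like the sketch below. It uses C11 relaxed loads and stores as the closest portable analogue of pg_atomic_read_u32() and pg_atomic_unlocked_write_u32(); that mapping, the helper names local_pin and local_mark_dirty, and the refcount position are assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define BUF_REFCOUNT_ONE  1U
#define BUF_REFCOUNT_MASK ((1U << 18) - 1)   /* assumption: low 18 bits */
#define BM_DIRTY          (1U << 23)

/*
 * Pin a local buffer: plain read, modify, plain write. No CAS loop and no
 * header lock are needed because only the owning backend touches it.
 */
static uint32_t local_pin(_Atomic uint32_t *state)
{
    uint32_t s = atomic_load_explicit(state, memory_order_relaxed);

    assert((s & BUF_REFCOUNT_MASK) < BUF_REFCOUNT_MASK);   /* no overflow */
    s += BUF_REFCOUNT_ONE;
    atomic_store_explicit(state, s, memory_order_relaxed);
    return s;
}

/* Set a flag the same way: read the word, OR the bit in, write it back. */
static uint32_t local_mark_dirty(_Atomic uint32_t *state)
{
    uint32_t s = atomic_load_explicit(state, memory_order_relaxed);

    s |= BM_DIRTY;
    atomic_store_explicit(state, s, memory_order_relaxed);
    return s;
}
```

The same read-modify-write on a shared buffer would be a race, which is why shared buffers need either the header lock or a CAS loop.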
typedef struct BufferDesc
{
	BufferTag	tag;			/* ID of page contained in buffer */
	int			buf_id;			/* buffer's index number (from 0) */

	/* state of the tag, containing flags, refcount and usagecount */
	pg_atomic_uint32 state;

	int			wait_backend_pid;	/* backend PID of pin-count waiter */
	int			freeNext;		/* link in freelist chain */

	LWLock		content_lock;	/* to lock access to buffer contents */
} BufferDesc;
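
One way to enforce the 64-byte budget is a compile-time check. The sketch below mirrors the struct with stand-in types, since the real BufferTag and LWLock definitions are not shown here; DemoBufferTag, DemoLWLock, and their field lists are assumptions chosen only to give plausible sizes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for BufferTag (assumption: five 4-byte identifiers). */
typedef struct DemoBufferTag
{
    uint32_t spc_node;
    uint32_t db_node;
    uint32_t rel_node;
    int32_t  fork_num;
    uint32_t block_num;
} DemoBufferTag;

/* Stand-in for LWLock (assumption: tranche id, atomic state, wait list). */
typedef struct DemoLWLock
{
    uint16_t         tranche;
    _Atomic uint32_t state;
    int32_t          waiters_head;
    int32_t          waiters_tail;
} DemoLWLock;

/* Same member order as BufferDesc above, built from the stand-ins. */
typedef struct DemoBufferDesc
{
    DemoBufferTag    tag;
    int              buf_id;
    _Atomic uint32_t state;
    int              wait_backend_pid;
    int              freeNext;
    DemoLWLock       content_lock;
} DemoBufferDesc;

/* Fails the build if a new or reordered member pushes the struct past 64. */
static_assert(sizeof(DemoBufferDesc) <= 64,
              "DemoBufferDesc must fit in one 64-byte cache line");
```

A check like this turns the comment's warning into a build failure, so a change that grows the struct is caught before it costs performance.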
Originally published on the author's personal site/blog on 2021-11-04.
