PostgreSQL Source Code (11): The Hot Standby KnownAssignedTransactions Sub-module

mingjie

Published 2022-07-14 13:37:40


Data structures

/*
 * Initialize the shared PGPROC array during postmaster startup.
 */
void
CreateSharedProcArray(void)
{
	bool		found;

	/* Create or attach to the ProcArray shared structure */
	procArray = (ProcArrayStruct *)
		ShmemInitStruct("Proc Array",
						add_size(offsetof(ProcArrayStruct, pgprocnos),
								 mul_size(sizeof(int),
										  PROCARRAY_MAXPROCS)),
						&found);

	if (!found)
	{
		/*
		 * We're the first - initialize.
		 */
		procArray->numProcs = 0;
		procArray->maxProcs = PROCARRAY_MAXPROCS;
		procArray->maxKnownAssignedXids = TOTAL_MAX_CACHED_SUBXIDS;
		procArray->numKnownAssignedXids = 0;
		procArray->tailKnownAssignedXids = 0;
		procArray->headKnownAssignedXids = 0;
		SpinLockInit(&procArray->known_assigned_xids_lck);
		procArray->lastOverflowedXid = InvalidTransactionId;
		procArray->replication_slot_xmin = InvalidTransactionId;
		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
	}

	allProcs = ProcGlobal->allProcs;
	allPgXact = ProcGlobal->allPgXact;

	/* Create or attach to the KnownAssignedXids arrays too, if needed */
	if (EnableHotStandby)
	{
		KnownAssignedXids = (TransactionId *)
			ShmemInitStruct("KnownAssignedXids",
							mul_size(sizeof(TransactionId),
									 TOTAL_MAX_CACHED_SUBXIDS),
							&found);
		KnownAssignedXidsValid = (bool *)
			ShmemInitStruct("KnownAssignedXidsValid",
							mul_size(sizeof(bool), TOTAL_MAX_CACHED_SUBXIDS),
							&found);
	}

	/* Register and initialize fields of ProcLWLockTranche */
	LWLockRegisterTranche(LWTRANCHE_PROC, "proc");
}

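Each process slot is given room for 64 subtransaction XIDs (PGPROC_MAX_CACHED_SUBXIDS) in addition to its top-level XID, so KnownAssignedXids and KnownAssignedXidsValid are each sized for TOTAL_MAX_CACHED_SUBXIDS entries.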
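The sizing can be checked with a little arithmetic. The following is a minimal sketch, assuming the capacity formula quoted in the source comment, maxProcs * (1 + PGPROC_MAX_CACHED_SUBXIDS); the names `SKETCH_MAX_CACHED_SUBXIDS` and `sketch_known_assigned_capacity` are made up for illustration and do not exist in PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors PGPROC_MAX_CACHED_SUBXIDS (64); the name here is hypothetical. */
#define SKETCH_MAX_CACHED_SUBXIDS 64

/*
 * Hypothetical helper: number of XID slots needed for max_procs process
 * slots, counting one top-level XID plus the cached subXIDs per process.
 */
size_t
sketch_known_assigned_capacity(size_t max_procs)
{
	assert(max_procs > 0);
	return max_procs * (1 + SKETCH_MAX_CACHED_SUBXIDS);
}
```

With 100 process slots this gives 6500 entries in each of the two arrays.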

Update process on the standby

/* ----------------------------------------------
 *		KnownAssignedTransactions sub-module
 * ----------------------------------------------
 */

/*
 * In Hot Standby mode, we maintain a list of transactions that are (or were)
 * running in the master at the current point in WAL.  These XIDs must be
 * treated as running by standby transactions, even though they are not in
 * the standby server's PGXACT array.
 *

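In hot standby mode, the standby keeps a list of the transactions that are (or were) running on the primary as of the current point in WAL. Standby transactions must treat these XIDs as running even though they do not appear in the standby's own PGXACT array.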

 * We record all XIDs that we know have been assigned.  That includes all the
 * XIDs seen in WAL records, plus all unobserved XIDs that we can deduce have
 * been assigned.  We can deduce the existence of unobserved XIDs because we
 * know XIDs are assigned in sequence, with no gaps.  The KnownAssignedXids
 * list expands as new XIDs are observed or inferred, and contracts when
 * transaction completion records arrive.
 *

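The list holds every XID known to have been assigned: those seen in WAL records, plus those that were never observed but can be inferred because XIDs are assigned sequentially with no gaps. The KnownAssignedXids list grows as new XIDs are observed or inferred, and shrinks as transaction completion records arrive.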

 * During hot standby we do not fret too much about the distinction between
 * top-level XIDs and subtransaction XIDs. We store both together in the
 * KnownAssignedXids list.  In backends, this is copied into snapshots in
 * GetSnapshotData(), taking advantage of the fact that XidInMVCCSnapshot()
 * doesn't care about the distinction either.  Subtransaction XIDs are
 * effectively treated as top-level XIDs and in the typical case pg_subtrans
 * links are *not* maintained (which does not affect visibility).
 *

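During hot standby, top-level XIDs and subtransaction XIDs are stored together in KnownAssignedXids without much concern for the difference. In backends, GetSnapshotData() copies the list into snapshots, relying on the fact that XidInMVCCSnapshot() does not care about the distinction either. Subtransaction XIDs are effectively treated as top-level XIDs, and in the typical case pg_subtrans links are not maintained, which does not affect visibility.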

 * We have room in KnownAssignedXids and in snapshots to hold maxProcs *
 * (1 + PGPROC_MAX_CACHED_SUBXIDS) XIDs, so every master transaction must
 * report its subtransaction XIDs in a WAL XLOG_XACT_ASSIGNMENT record at
 * least every PGPROC_MAX_CACHED_SUBXIDS.  When we receive one of these
 * records, we mark the subXIDs as children of the top XID in pg_subtrans,
 * and then remove them from KnownAssignedXids.  This prevents overflow of
 * KnownAssignedXids and snapshots, at the cost that status checks for these
 * subXIDs will take a slower path through TransactionIdIsInProgress().
 * This means that KnownAssignedXids is not necessarily complete for subXIDs,
 * though it should be complete for top-level XIDs; this is the same situation
 * that holds with respect to the PGPROC entries in normal running.
 *

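KnownAssignedXids and snapshots have room for maxProcs * (1 + PGPROC_MAX_CACHED_SUBXIDS) XIDs, so every transaction on the primary must report its subtransaction XIDs in an XLOG_XACT_ASSIGNMENT WAL record at least once every PGPROC_MAX_CACHED_SUBXIDS subtransactions. When the standby receives such a record, it marks the subXIDs as children of the top-level XID in pg_subtrans and then removes them from KnownAssignedXids. This prevents KnownAssignedXids and snapshots from overflowing, at the cost that status checks for those subXIDs take a slower path through TransactionIdIsInProgress(). As a result, KnownAssignedXids is not necessarily complete for subXIDs, although it should be complete for top-level XIDs; the same holds for the PGPROC entries during normal running.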

 * When we throw away subXIDs from KnownAssignedXids, we need to keep track of
 * that, similarly to tracking overflow of a PGPROC's subxids array.  We do
 * that by remembering the lastOverflowedXID, ie the last thrown-away subXID.
 * As long as that is within the range of interesting XIDs, we have to assume
 * that subXIDs are missing from snapshots.  (Note that subXID overflow occurs
 * on primary when 65th subXID arrives, whereas on standby it occurs when 64th
 * subXID arrives - that is not an error.)

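When subXIDs are thrown away from KnownAssignedXids, that fact has to be tracked, much like overflow of a PGPROC's subxids array. This is done by remembering lastOverflowedXid, the last subXID that was thrown away. As long as that XID is within the range of interesting XIDs, snapshots must be assumed to be missing subXIDs. (Note that subXID overflow occurs on the primary when the 65th subXID arrives, but on the standby when the 64th arrives; this difference is not an error.)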

 * Should a backend on primary somehow disappear before it can write an abort
 * record, then we just leave those XIDs in KnownAssignedXids. They actually
 * aborted but we think they were running; the distinction is irrelevant
 * because either way any changes done by the transaction are not visible to
 * backends in the standby.  We prune KnownAssignedXids when
 * XLOG_RUNNING_XACTS arrives, to forestall possible overflow of the
 * array due to such dead XIDs.
 */

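If a backend on the primary somehow disappears before it can write an abort record, its XIDs are simply left in KnownAssignedXids. They have actually aborted, but the standby believes they are still running; the distinction does not matter, because either way the transaction's changes are invisible to backends on the standby. KnownAssignedXids is pruned when an XLOG_RUNNING_XACTS record arrives, to keep such dead XIDs from overflowing the array.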
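The lastOverflowedXid bookkeeping described in the comment above can be illustrated with a small model. This is only a sketch under stated assumptions: the type and function names (`SketchXid`, `sketch_note_dropped_subxid`, and so on) are hypothetical, the state is a single struct rather than shared memory, and locking is ignored. It shows the idea that a snapshot whose xmin is not newer than the remembered XID must be treated as possibly missing subXIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;				/* stand-in for TransactionId */
#define SKETCH_INVALID_XID ((SketchXid) 0)

/* Wraparound-aware "a is older than b", valid for normal XIDs only. */
bool
sketch_xid_precedes(SketchXid a, SketchXid b)
{
	return (int32_t) (a - b) < 0;
}

/* Hypothetical state: the newest subXID dropped from the tracked list. */
typedef struct
{
	SketchXid	last_overflowed_xid;
} SketchOverflowState;

/* Remember a dropped subXID if it is newer than the one already stored. */
void
sketch_note_dropped_subxid(SketchOverflowState *st, SketchXid subxid)
{
	if (st->last_overflowed_xid == SKETCH_INVALID_XID ||
		sketch_xid_precedes(st->last_overflowed_xid, subxid))
		st->last_overflowed_xid = subxid;
}

/*
 * A snapshot with xmin <= last_overflowed_xid may be missing subXIDs,
 * so its consumer would have to fall back to the slower lookup path.
 */
bool
sketch_snapshot_may_miss_subxids(const SketchOverflowState *st, SketchXid xmin)
{
	if (st->last_overflowed_xid == SKETCH_INVALID_XID)
		return false;
	return !sketch_xid_precedes(st->last_overflowed_xid, xmin);
}
```

Once every XID of interest is newer than the remembered value, the check returns false again and snapshots can be trusted to be complete.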

/*
 * RecordKnownAssignedTransactionIds
 *		Record the given XID in KnownAssignedXids, as well as any preceding
 *		unobserved XIDs.

That is, record the given XID in KnownAssignedXids, together with any earlier XIDs that have not yet been observed.

 * RecordKnownAssignedTransactionIds() should be run for *every* WAL record
 * associated with a transaction. Must be called for each record after we
 * have executed StartupCLOG() et al, since we must ExtendCLOG() etc..
 *
 * Called during recovery in analogy with and in place of GetNewTransactionId()
 */

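RecordKnownAssignedTransactionIds() should run for every WAL record associated with a transaction. It must be called for each record only after StartupCLOG() and related routines have run, since, per the comment, it must ExtendCLOG() etc. During recovery it is called in analogy with, and in place of, GetNewTransactionId().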

void
RecordKnownAssignedTransactionIds(TransactionId xid)
{
	Assert(standbyState >= STANDBY_INITIALIZED);
	Assert(TransactionIdIsValid(xid));
	Assert(TransactionIdIsValid(latestObservedXid));

	elog(trace_recovery(DEBUG4), "record known xact %u latestObservedXid %u",
		 xid, latestObservedXid);

	/*
	 * When a newly observed xid arrives, it is frequently the case that it is
	 * *not* the next xid in sequence. When this occurs, we must treat the
	 * intervening xids as running also.
	 */

A newly observed XID is frequently not the next XID in sequence. When that happens, the XIDs in between must be treated as running as well.

	if (TransactionIdFollows(xid, latestObservedXid))
	{
		TransactionId next_expected_xid;

		/*
		 * Extend subtrans like we do in GetNewTransactionId() during normal
		 * operation using individual extend steps. Note that we do not need
		 * to extend clog since its extensions are WAL logged.
		 *
		 * This part has to be done regardless of standbyState since we
		 * immediately start assigning subtransactions to their toplevel
		 * transactions.
		 */
		next_expected_xid = latestObservedXid;
		
...

		/*
		 * Add (latestObservedXid, xid] onto the KnownAssignedXids array.
		 */
		next_expected_xid = latestObservedXid;
		TransactionIdAdvance(next_expected_xid);
		KnownAssignedXidsAdd(next_expected_xid, xid, false);

		/*
		 * Now we can advance latestObservedXid
		 */
		latestObservedXid = xid;

		/* ShmemVariableCache->nextXid must be beyond any observed xid */
		next_expected_xid = latestObservedXid;
		TransactionIdAdvance(next_expected_xid);
		LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
		if (TransactionIdFollows(next_expected_xid, ShmemVariableCache->nextXid))
			ShmemVariableCache->nextXid = next_expected_xid;
		LWLockRelease(XidGenLock);
	}
}
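The gap-filling step in the function above can be modelled in isolation. The following is a simplified sketch, not the PostgreSQL implementation: `GapTracker`, `gap_record_xid`, and the other names are hypothetical, the array is a small fixed buffer with no tail pointer or validity flags, and there is no locking or subtrans extension. It only shows how every XID in (latestObservedXid, xid] is appended, using wraparound-aware comparison and advancement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t GapXid;					/* stand-in for TransactionId */
#define GAP_FIRST_NORMAL_XID ((GapXid) 3)	/* XIDs 0..2 are reserved */
#define GAP_CAPACITY 16						/* arbitrary size for the sketch */

typedef struct
{
	GapXid		xids[GAP_CAPACITY];	/* tracked XIDs, in ascending order */
	int			head;				/* index of the next free slot */
	GapXid		latest_observed;	/* newest XID recorded so far */
} GapTracker;

/* Wraparound-aware "a is newer than b", valid for normal XIDs only. */
bool
gap_xid_follows(GapXid a, GapXid b)
{
	return (int32_t) (a - b) > 0;
}

/* Step to the next XID, skipping the reserved values after wraparound. */
GapXid
gap_xid_advance(GapXid xid)
{
	xid++;
	if (xid < GAP_FIRST_NORMAL_XID)
		xid = GAP_FIRST_NORMAL_XID;
	return xid;
}

/*
 * Record xid plus every unobserved XID in (latest_observed, xid].
 * Returns the number of XIDs appended; 0 if xid was already covered.
 */
int
gap_record_xid(GapTracker *t, GapXid xid)
{
	int			added = 0;
	GapXid		next;

	if (!gap_xid_follows(xid, t->latest_observed))
		return 0;

	next = gap_xid_advance(t->latest_observed);
	for (;;)
	{
		assert(t->head < GAP_CAPACITY);	/* the sketch does not compress */
		t->xids[t->head++] = next;
		added++;
		if (next == xid)
			break;
		next = gap_xid_advance(next);
	}
	t->latest_observed = xid;
	return added;
}
```

For instance, with latest_observed at 97, recording XID 100 appends 98, 99, and 100, and a later call with XID 99 appends nothing because it is already covered.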

For example, when new transaction 100 arrives, KnownAssignedXidsAdd handles it as follows:

head = 4 tail = 3