我们有一个sql服务器数据库,用作数据池和数据仓库。数据库中的每个表都有一些标准化的定义,因为我们现在有大约600个表,所以维护需要一定程度的自动化。
加载每个表的一般过程是首先将表的副本加载到changeLog模式中的哈希表中(有时只有更改的记录(如果我们能够确定更改的记录是什么),然后将changeLog表与目标表进行比较。目标表用于报告,因此这种changeLog方法允许我们持久化目标表,并且只应用最小的更新/插入操作。
每个目标表都有一个唯一的键/业务密钥,可以通过配置表来标识,并且每个表中的名称都是相同的标准化审计列。审计栏告诉我们
更改后的记录标识符以前是CHECKSUM(),但我们发现CHECKSUM()的碰撞率太高,无法信赖。我刚刚在每个表中添加了一个HASHBYTES()列并填充了它。
我将HASHBYTES()列创建为VARBINARY(MAX)。现在,每次加载表时,我们都可以通过将changeLog表中计算的新HASHBYTES()值与目标表中持久化的HASHBYTES()值进行比较,判断是否需要更新记录。
我立即注意到,从INT ()切换到VARBINARY(MAX) HASHBYTES()导致更新检查过程明显减慢。我在每个校验和列上没有索引,但在我刚才添加的HASHBYTES列上没有索引。每个表的唯一键也有聚集索引。
希望这是一个足够的背景,使这个问题有意义。我需要尽快加快这一进程。
编辑:我正在添加一个大型的SQL脚本作为示例,它具有表的changeLog版本和目标版本的示例表定义以及为更新目标版本而运行的查询。
--OBJECT DEFINITIONS
CREATE TABLE [changeLog].[DimSalesOffice](
[Sales Office Code] [varchar](3) NOT NULL,
[CCN_Key] [uniqueidentifier] NOT NULL,
[Sales Office Name (Short)] [nvarchar](14) NOT NULL,
[Sales Office Name (Long)] [nvarchar](50) NOT NULL,
[Sales Office City] [nvarchar](50) NOT NULL,
[Sales Office StateProvince] [nvarchar](50) NOT NULL,
[Sales Office Postal Code] [nvarchar](20) NOT NULL,
[Sales Office Country Code] [varchar](3) NOT NULL,
[Sales Office Address] [nvarchar](65) NOT NULL,
[Sales Office Address (Line 2)] [nvarchar](65) NOT NULL,
[Sales Office Name (Short - Native Language)] [nvarchar](14) NOT NULL,
[Sales Office Name (Long - Native Language)] [nvarchar](50) NOT NULL,
[Sales Office Address (Native Language)] [nvarchar](65) NOT NULL,
[Sales Office Address (Line 2 - Native Language)] [nvarchar](65) NOT NULL,
[Sales Office City (Native Language)] [nvarchar](100) NOT NULL,
[Sales Office StateProvince (Native Language)] [nvarchar](100) NOT NULL,
[Sales Office Region] [nvarchar](100) NULL,
[Native Language Code] [varchar](2) NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[DimSalesOffice](
[SalesOffice_Key] [uniqueidentifier] NOT NULL,
[Sales Office Code] [varchar](3) NOT NULL,
[CCN_Key] [uniqueidentifier] NOT NULL,
[Sales Office Name (Short)] [nvarchar](14) NOT NULL,
[Sales Office Name (Long)] [nvarchar](50) NOT NULL,
[Sales Office City] [nvarchar](50) NOT NULL,
[Sales Office StateProvince] [nvarchar](50) NOT NULL,
[Sales Office Postal Code] [nvarchar](20) NOT NULL,
[Sales Office Country Code] [varchar](3) NOT NULL,
[Sales Office Address] [nvarchar](65) NOT NULL,
[Sales Office Address (Line 2)] [nvarchar](65) NOT NULL,
[Sales Office Name (Short - Native Language)] [nvarchar](14) NOT NULL,
[Sales Office Name (Long - Native Language)] [nvarchar](50) NOT NULL,
[Sales Office Address (Native Language)] [nvarchar](65) NOT NULL,
[Sales Office Address (Line 2 - Native Language)] [nvarchar](65) NOT NULL,
[Sales Office City (Native Language)] [nvarchar](100) NOT NULL,
[Sales Office StateProvince (Native Language)] [nvarchar](100) NOT NULL,
[Sales Office Region] [nvarchar](100) NULL,
[Native Language Code] [varchar](2) NOT NULL,
[DW_CreatedOn] [datetime2](7) NULL,
[DW_ModifiedOn] [datetime2](7) NULL,
[DW_IsDeleted?] [bit] NULL,
[DW_Checksum] [int] NULL,
[Source_ModifiedOn] [datetime2](7) NULL,
[DW_Hashbytes] [varbinary](max) NULL,
PRIMARY KEY NONCLUSTERED
(
[SalesOffice_Key] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[DimSalesOffice] ADD DEFAULT (newsequentialid()) FOR [SalesOffice_Key]
GO
CREATE UNIQUE CLUSTERED INDEX [IX_UK_DimSalesOffice] ON [dbo].[DimSalesOffice]
(
[Sales Office Code] ASC
)
GO
--MERGE QUERY
DECLARE @InsertRecordCount INT, @UpdateRecordCount INT;
/*****UPDATE*****/
UPDATE [dbo].[DimSalesOffice] SET
[CCN_Key] = [Source].[CCN_Key],
[Sales Office Name (Short)] = [Source].[Sales Office Name (Short)],
[Sales Office Name (Long)] = [Source].[Sales Office Name (Long)],
[Sales Office City] = [Source].[Sales Office City],
[Sales Office StateProvince] = [Source].[Sales Office StateProvince],
[Sales Office Postal Code] = [Source].[Sales Office Postal Code],
[Sales Office Country Code] = [Source].[Sales Office Country Code],
[Sales Office Address] = [Source].[Sales Office Address],
[Sales Office Address (Line 2)] = [Source].[Sales Office Address (Line 2)],
[Sales Office Name (Short - Native Language)] = [Source].[Sales Office Name (Short - Native Language)],
[Sales Office Name (Long - Native Language)] = [Source].[Sales Office Name (Long - Native Language)],
[Sales Office Address (Native Language)] = [Source].[Sales Office Address (Native Language)],
[Sales Office Address (Line 2 - Native Language)] = [Source].[Sales Office Address (Line 2 - Native Language)],
[Sales Office City (Native Language)] = [Source].[Sales Office City (Native Language)],
[Sales Office StateProvince (Native Language)] = [Source].[Sales Office StateProvince (Native Language)],
[Sales Office Region] = [Source].[Sales Office Region],
[Native Language Code] = [Source].[Native Language Code],
[DW_Checksum] =
CHECKSUM(
[Source].[CCN_Key],
[Source].[Sales Office Name (Short)],
[Source].[Sales Office Name (Long)],
[Source].[Sales Office City],
[Source].[Sales Office StateProvince],
[Source].[Sales Office Postal Code],
[Source].[Sales Office Country Code],
[Source].[Sales Office Address],
[Source].[Sales Office Address (Line 2)],
[Source].[Sales Office Name (Short - Native Language)],
[Source].[Sales Office Name (Long - Native Language)],
[Source].[Sales Office Address (Native Language)],
[Source].[Sales Office Address (Line 2 - Native Language)],
[Source].[Sales Office City (Native Language)],
[Source].[Sales Office StateProvince (Native Language)],
[Source].[Sales Office Region],
[Source].[Native Language Code],
0
),
[DW_Hashbytes] =
HASHBYTES(
'SHA2_256',
ISNULL(CAST([Source].[CCN_Key] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Name (Short)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Name (Long)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office City] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office StateProvince] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Postal Code] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Country Code] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Address] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Address (Line 2)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Name (Short - Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Name (Long - Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Address (Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Address (Line 2 - Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office City (Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office StateProvince (Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Region] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Native Language Code] AS NVARCHAR(MAX)), '') + '|'
+ '0'
),
[Source_ModifiedOn] = NULL,
[DW_ModifiedOn] = GETUTCDATE(),
[DW_IsDeleted?] = 0
FROM [changeLog].[DimSalesOffice] [Source]
JOIN [dbo].[DimSalesOffice]
ON [Source].[Sales Office Code] = [DimSalesOffice].[Sales Office Code]
AND ISNULL([DimSalesOffice].[DW_Hashbytes], HASHBYTES('SHA2_256', '')) <> HASHBYTES(
'SHA2_256',
ISNULL(CAST([Source].[CCN_Key] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Name (Short)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Name (Long)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office City] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office StateProvince] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Postal Code] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Country Code] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Address] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Address (Line 2)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Name (Short - Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Name (Long - Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Address (Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Address (Line 2 - Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office City (Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office StateProvince (Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Region] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Native Language Code] AS NVARCHAR(MAX)), '') + '|'
+ '0'
)
SET @UpdateRecordCount = @@ROWCOUNT;
/*****Soft Deletes*****/
UPDATE [dbo].[DimSalesOffice] SET
[DW_Checksum] = 0,
[DW_Hashbytes] =
HASHBYTES(
'SHA2_256',
ISNULL(CAST([DimSalesOffice].[CCN_Key] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([DimSalesOffice].[Sales Office Name (Short)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([DimSalesOffice].[Sales Office Name (Long)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([DimSalesOffice].[Sales Office City] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([DimSalesOffice].[Sales Office StateProvince] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([DimSalesOffice].[Sales Office Postal Code] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([DimSalesOffice].[Sales Office Country Code] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([DimSalesOffice].[Sales Office Address] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([DimSalesOffice].[Sales Office Address (Line 2)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([DimSalesOffice].[Sales Office Name (Short - Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([DimSalesOffice].[Sales Office Name (Long - Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([DimSalesOffice].[Sales Office Address (Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([DimSalesOffice].[Sales Office Address (Line 2 - Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([DimSalesOffice].[Sales Office City (Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([DimSalesOffice].[Sales Office StateProvince (Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([DimSalesOffice].[Sales Office Region] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([DimSalesOffice].[Native Language Code] AS NVARCHAR(MAX)), '') + '|'
+ '1'
),
[Source_ModifiedOn] = NULL,
[DW_ModifiedOn] = GETUTCDATE(),
[DW_IsDeleted?] = 1
FROM [dbo].[DimSalesOffice]
WHERE NOT EXISTS
(
SELECT 1
FROM [changeLog].[DimSalesOffice] [Source]
WHERE [Source].[Sales Office Code] = [DimSalesOffice].[Sales Office Code]
)
SET @UpdateRecordCount = @UpdateRecordCount + @@ROWCOUNT;
/*****INSERT*****/
INSERT INTO [dbo].[DimSalesOffice]
(
[Sales Office Code],
[CCN_Key],
[Sales Office Name (Short)],
[Sales Office Name (Long)],
[Sales Office City],
[Sales Office StateProvince],
[Sales Office Postal Code],
[Sales Office Country Code],
[Sales Office Address],
[Sales Office Address (Line 2)],
[Sales Office Name (Short - Native Language)],
[Sales Office Name (Long - Native Language)],
[Sales Office Address (Native Language)],
[Sales Office Address (Line 2 - Native Language)],
[Sales Office City (Native Language)],
[Sales Office StateProvince (Native Language)],
[Sales Office Region],
[Native Language Code],
[DW_Checksum],
[DW_Hashbytes],
[Source_ModifiedOn],
[DW_ModifiedOn],
[DW_IsDeleted?],
[DW_CreatedOn]
)
SELECT
[Source].[Sales Office Code],
[Source].[CCN_Key],
[Source].[Sales Office Name (Short)],
[Source].[Sales Office Name (Long)],
[Source].[Sales Office City],
[Source].[Sales Office StateProvince],
[Source].[Sales Office Postal Code],
[Source].[Sales Office Country Code],
[Source].[Sales Office Address],
[Source].[Sales Office Address (Line 2)],
[Source].[Sales Office Name (Short - Native Language)],
[Source].[Sales Office Name (Long - Native Language)],
[Source].[Sales Office Address (Native Language)],
[Source].[Sales Office Address (Line 2 - Native Language)],
[Source].[Sales Office City (Native Language)],
[Source].[Sales Office StateProvince (Native Language)],
[Source].[Sales Office Region],
[Source].[Native Language Code],
[DW_Checksum] =
CHECKSUM(
[Source].[CCN_Key],
[Source].[Sales Office Name (Short)],
[Source].[Sales Office Name (Long)],
[Source].[Sales Office City],
[Source].[Sales Office StateProvince],
[Source].[Sales Office Postal Code],
[Source].[Sales Office Country Code],
[Source].[Sales Office Address],
[Source].[Sales Office Address (Line 2)],
[Source].[Sales Office Name (Short - Native Language)],
[Source].[Sales Office Name (Long - Native Language)],
[Source].[Sales Office Address (Native Language)],
[Source].[Sales Office Address (Line 2 - Native Language)],
[Source].[Sales Office City (Native Language)],
[Source].[Sales Office StateProvince (Native Language)],
[Source].[Sales Office Region],
[Source].[Native Language Code],
0
),
[DW_Hashbytes] =
HASHBYTES(
'SHA2_256',
ISNULL(CAST([Source].[CCN_Key] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Name (Short)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Name (Long)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office City] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office StateProvince] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Postal Code] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Country Code] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Address] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Address (Line 2)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Name (Short - Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Name (Long - Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Address (Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Address (Line 2 - Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office City (Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office StateProvince (Native Language)] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Sales Office Region] AS NVARCHAR(MAX)), '') + '|'
+ ISNULL(CAST([Source].[Native Language Code] AS NVARCHAR(MAX)), '') + '|'
+ '0'
),
[Source_ModifiedOn] = NULL,
[DW_ModifiedOn] = GETUTCDATE(),
[DW_IsDeleted?] = 0,
[DW_CreatedOn] = GETUTCDATE()
FROM [changeLog].[DimSalesOffice] [Source]
WHERE NOT EXISTS
(
SELECT 1
FROM [dbo].[DimSalesOffice]
WHERE [Source].[Sales Office Code] = [DimSalesOffice].[Sales Office Code]
)
SET @InsertRecordCount = @@ROWCOUNT;
SELECT [Update Record Count] = @UpdateRecordCount, [Insert Record Count] = @InsertRecordCount;
发布于 2022-06-16 12:08:07
相反的回答,因为它有更多的语言意义:
正如我在注释中提到的,SHA2_256 means
它将输出哈希为256位,也就是32字节(8位1字节)的长度,这意味着您所需要的最大VARBINARY
大小是VARBINARY(32)
。在HASBYTES()
的docs中提到了这一点:
输出符合算法标准: MD2、MD4和MD5为128位(16字节);SHA和SHA1为160位(20字节);SHA2_256为256位(32字节),SHA2_512为512位(64字节)。
因此,是的,您可以安全地将VARBINARY
的大小降到VARBINARY(32)
。
是的,在缩小VARBINARY
字段的大小之后,可以将其添加到表中的索引中。我推荐一个非聚集索引,该索引以主键字段为首,然后在定义中将HASHBYTES()
计算字段放在第二位。
之所以使其非聚集,是因为您通常希望避免索引中的热列,特别是与每个非聚集索引一起存储的聚集索引。频繁更新的热列会导致大量写入索引,而在聚集索引的情况下,这些写入也必须发生在每个非聚集索引上(因为聚集索引存储在其中)。所有列上的行散列肯定会频繁更改。
使用主键字段进行引导是有意义的,因为您首先希望通过该字段连接以匹配相同的行,然后通过HASHBYTES()
计算字段来检查它们是否不同。
如何改进哈希字节计算?我在考虑建立一个计算列的索引。
是的,您可以使用计算列(甚至不需要持久化,但可以是)并对其进行索引,或者您也可以创建索引视图,这两者我都做过了。我首先为计算列拍摄,因为它比索引视图稍微灵活一些,将表本身的行中的实际行哈希耦合起来,并且是要管理的少一个对象。
发布于 2022-06-16 01:00:02
HASHBYTES的输出长度取决于所使用的算法。SHA2_256产生256个位或32个字节。这在文件里。将列声明为binary(32)
是可以的。我目前工作的系统就是这样做的。
https://dba.stackexchange.com/questions/313401
复制相似问题