在第一篇中我介绍了如何访问元数据,元数据为什么在数据库里面,以及如何使用元数据。介绍了如何查出各种数据库对象的在数据库里面的名字。第二篇,我选择了触发器的主题,因为它是一个能提供很好例子的数据库对象,并且在这个对象中能够提出问题和解决问题。
本篇我将会介绍元数据中的索引,不仅仅是因为它们本身很重要,更重要的是它们是很好的元数据类型,比如列或者分布统计,这些不是元数据中的对象。
索引对于任何关系数据库表都是必不可少的。然而,就像吐司上的黄油一样,过度使用它们可能会在数据库中产生问题。有时,可以对表进行过度索引或缺失索引,或者构建重复索引。有时问题是选择一个坏的填充因子,错误地设置ignore_dup_key选项,创建一个永远不会被使用(但必须被维护)的索引,丢失外键上的索引,或者将GUID作为主键的一部分。简而言之,任何频繁使用的数据库系统中的索引都需要定期维护和验证,而目录视图是完成这些工作的最直接的方式之一。
让我们通过下面的简单语句来看一下都有哪些索引在你的数据库上,代码如下:
SELECT convert(CHAR(50),object_schema_name(t.object_ID)+'.'
+object_name(t.object_ID)) AS 'The Table', i.name AS index_name
FROM sys.indexes AS i
INNER JOIN sys.tables t
ON t.object_id=i.object_id
WHERE is_hypothetical = 0 AND i.index_id <> 0;
结果如下:
为什么要去引用sys.tables?这是因为它是确保只获得用户表的最简单方法。我们选择index_id 的values大于0,因为如果不为表创建集群索引,在sys中仍然有一个条目。索引,但它指向的是堆,不代表索引。每个表在sys中都有一行。索引值为0或1的索引。如果该表有一个聚集索引,则有一行数据且index_id值为1;如果该表是一个堆(这只是表示该表没有聚集索引的另一种方式),则会有一行的index_id值为0。此外,无论该表是否有聚集索引,每个非聚集索引都有一行,其index_id值大于1。我们过滤了的索引,这些索引是由数据库引擎优化顾问(DTA)创建的,目的仅仅是测试一个可能的索引是否有效。以防它们积累起来,最好把它们去掉。
如果你过一个多个指定的表,下面的这个查询是更为合理的,需要在上面的例子中增加对象的指定:
AND t.object_id = OBJECT_ID('Production.BillOfMaterials');
前面的表并不特别有用,因为无法一眼看到每个表有多少索引,以及它们是什么。下面这个语句可以实现:
SELECT convert(CHAR(20),object_schema_name(t.object_ID)+'.'
+object_name(t.object_ID)) AS 'The_Table',
sum(CASE WHEN i.object_ID IS NULL THEN 0 ELSE 1 END) AS The_Count,
coalesce(stuff((
SELECT ', '+i2.name
FROM sys.indexes i2
WHERE t.object_ID = i2.object_ID
ORDER BY i2.name
FOR XML PATH(''), TYPE).value(N'(./text())[1]',N'varchar(8000)'),1,2,''),'') AS Index_List
FROM sys.tables AS t
LEFT OUTER JOIN sys.indexes i
ON t.object_id=i.object_id
AND is_hypothetical = 0 AND i.index_id > 0
GROUP BY t.Object_ID;
我在老的测试数据库上执行这个测试,对象名称比较短。
The_Table The_Count Index_List
-------------------- ----------- --------------------------------------------------
dbo.publishers 1 UPKCL_pubind
dbo.titles 2 titleind, UPKCL_titleidind
dbo.titleauthor 3 auidind, titleidind, UPKCL_taind
dbo.stores 1 UPK_storeid
dbo.sales 2 titleidind, UPKCL_sales
dbo.roysched 1 titleidind
dbo.discounts 0
dbo.jobs 1 PK__jobs__6E32B6A51A14E395
dbo.pub_info 1 UPKCL_pubinfo
dbo.employee 2 employee_ind, PK_emp_id
dbo.authors 2 aunmind, UPKCL_auidind
(11 row(s) affected)
关于索引,您可以找到很多有趣的东西。例如,这里有一种快速查找表的方法,无需使用聚集索引(堆)
-- 展示所有没有聚集索引的表名称
SELECT object_schema_name(sys.tables.object_id)+'.'
+object_name(sys.tables.object_id) AS 'Heaps'
FROM sys.indexes /* see whether the table is a heap */
INNER JOIN sys.tables ON sys.tables.object_ID=sys.indexes.object_ID
WHERE sys.indexes.type = 0;
通过连接sys.partitions视图,我们可以计算出索引中大约有多少行。我修改了一些代码,关联了sys.extended_properties,这样可以把备注的信息带出来。
--列出每个索引/堆的行数
SELECT
OBJECT_SCHEMA_NAME(t.object_id)+'.'+OBJECT_NAME(t.object_id) as 'Table',
coalesce(i.NAME,'(IAM for heap)') as 'Index',
Coalesce(
(SELECT SUM(s.rows) FROM sys.partitions s WHERE s.object_id = i.object_id
AND s.index_id = i.index_ID
), 0) 'Rows',coalesce(ep.Value,'') as comments
FROM sys.tables t
INNER JOIN sys.indexes i ON i.object_id = t.object_id
LEFT OUTER JOIN sys.Extended_Properties ep
ON i.Object_Id = ep.Major_Id AND i.Index_Id = Minor_Id AND Class = 7;
然后,你可以修改这个代码,让其只是展示每个在索引表中的表有多少行。
SELECT
OBJECT_SCHEMA_NAME(t.object_id)+'.'+OBJECT_NAME(t.object_id) AS 'Table',
sum(rows) AS row_count
FROM sys.partitions p INNER JOIN sys.tables t
ON p.object_ID=t.object_ID
WHERE index_id < 2 GROUP BY t.object_ID,Index_ID;
如果您对某些表具有大量索引感到怀疑,那么可以使用下面查询,该查询告诉您具有超过4个索引和索引计数超过列计数一半的表。它是一种任意选择具有大量索引的表的方法。
--超过4个索引的表
--索引个数超过列数一半
SELECT object_schema_name(TheIndexes.Object_ID) + '.'+ object_name(TheIndexes.Object_ID) AS TableName,
Columns, Indexes
FROM
(SELECT count(*) AS indexes, t.object_ID
FROM sys.indexes i
INNER JOIN sys.tables t
ON i.object_ID=t.object_ID
GROUP BY t.object_ID) TheIndexes
INNER JOIN
(SELECT count(*) AS columns, t.object_ID
FROM sys.columns c
INNER JOIN sys.tables t
ON c.object_ID=t.object_ID
GROUP BY t.object_ID)TheColumns
ON TheIndexes.object_ID=TheColumns.object_ID
WHERE indexes>columns/2 AND indexes>4;
总是有必要找出自上次启动服务器以来没有使用的索引,特别是如果服务器一直在做各种各样的工作时。
--Indexes updated but not read.
SELECT
object_schema_name(i.Object_ID) + '.'+ object_name(i.Object_ID) as Thetable,
i.name AS 'Index'
FROM sys.indexes i
left outer join sys.dm_db_index_usage_stats s
ON s.object_id = i.object_id
AND s.index_id = i.index_id
AND s.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
AND i.index_id > 0 --Exclude heaps.
AND i.is_primary_key = 0 --and Exclude primary keys.
AND i.is_unique = 0 --and Exclude unique constraints.
AND coalesce(s.user_lookups + s.user_scans + s.user_seeks,0) = 0 --No user reads.
AND coalesce(s.user_updates,0) > 0; --Index is being updated.
注意:我已经在代码里使用了动态管理视图sys.dm_db_index_usage_stats,这里起到了手机使用信息的作用,之后我们会更详尽的使用换这个对象来说明其作用。
如果打算知道索引占了多少空间,有许多‘胖’索引,就是包含了很多列,有可能索引中有的列不会出现在任何查询中,这就是浪费了空间。
SELECT
object_schema_name(i.Object_ID) + '.'+ object_name(i.Object_ID) AS Thetable,
coalesce(i.name,'heap IAM') AS 'Index',
convert(DECIMAL(9,2),(sum(a.total_pages) * 8.00) / 1024.00) AS 'Index_MB'
FROM sys.indexes i
INNER JOIN sys.partitions p
ON i.object_id = p.object_id
AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a
ON p.partition_id = a.container_id
WHERE objectproperty(i.object_id, 'IsUserTable') = 1
GROUP BY i.object_id, i.index_id, i.name;
让我们看看每个表的总索引空间,以及表中的行数。
SELECT
object_schema_name(i.Object_ID) + '.'+ object_name(i.Object_ID) AS Thetable,
convert(DECIMAL(9,2),(sum(a.total_pages) * 8.00) / 1024.00) AS 'Index_MB',
max(row_count) AS 'Rows',
count(*) AS Index_count
FROM sys.indexes i
INNER JOIN
(SELECT object_ID,Index_ID, sum(rows) AS Row_count
FROM sys.partitions GROUP BY object_ID,Index_ID)f
ON f.object_ID=i.object_ID AND f.index_ID=i.index_ID
INNER JOIN sys.partitions p
ON i.object_id = p.object_id
AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a
ON p.partition_id = a.container_id
WHERE objectproperty(i.object_id, 'IsUserTable') = 1
GROUP BY i.object_id;
发现关于索引的某些属性,通常最好使用属性函数作为快捷方式。
-- 查询没有主键的表
SELECT object_schema_name(object_id)+'.'+object_name(object_id) as No_Primary_key
FROM sys.tables/* see whether the table has a primary key */
WHERE objectproperty(OBJECT_ID,'TableHasPrimaryKey') = 0;
-- 查询没有索引的表
SELECT object_schema_name(object_id)+'.'+object_name(object_id) as No_Indexes
FROM sys.tables /* see whether the table has any index */
WHERE objectproperty(OBJECT_ID,'TableHasIndex') = 0;
-- )查询没有候选键的表
SELECT object_schema_name(object_id)+'.'+object_name(object_id) as No_Candidate_Key
FROM sys.tables/* if no unique constraint then it isn't relational */
WHERE objectproperty(OBJECT_ID,'TableHasUniqueCnst') = 0
AND objectproperty(OBJECT_ID,'TableHasPrimaryKey') = 0;
--查询带有禁用索引的表
SELECT distinct
object_schema_name(object_id)+'.'+object_name(object_id) as Has_Disabled_indexes
FROM sys.indexes /* don't leave these lying around */
WHERE is_disabled=1;
你可能注意到了一些奇怪的事情。尽管表的一些属性(如主键)本身就是对象,但列、统计或索引并非对象。让我们弄清楚这一点,因为它不是完全直观的体现在sys.objects,您可以找到关于所有公共数据库组件的基本标准信息,如表、视图、同义词、外键、检查约束、键约束、默认约束、服务队列、触发器和过程。我列出的所有这些组件都有其他属性,这些属性必须通过继承相关基本属性的视图可见,但也包括与对象相关的数据列。最好使用这些特殊的视图,因为它们有您需要的所有信息,系统只过滤您感兴趣的对象类型,比如表。各种对象(如约束和触发器)在sys.objects中都有parent_ID,非零的对象表,显示它们是子对象。
下面的查询向您展示了一种查看这些子对象并将其与父母关联的简单方法。
--查询索引父对象(表名)和索引名称
SELECT parent.name AS Parents_name,
child.name AS Childs_Name,
replace(lower(parent.type_desc),'_',' ') AS Parents_type,
replace(lower(child.type_desc),'_',' ') AS Childs_type
FROM sys.objects child
INNER JOIN sys.objects parent
ON parent.object_ID=child.parent_object_id
WHERE child.parent_object_id<>0
ORDER BY parents_name;
.
你会发现索引不是对象。在第一个查询中,返回的object_ID是定义索引的表的ID。
这里的问题是关系是复杂的。约束可以包含几个列,也可以由索引强制。索引可以包含几个列,但是顺序很重要。统计数据还可以包含几个列,也可以与索引相关联。这意sys.indexes, sys.stats and sys.columns不从sys.objects继承。参数和类型也是如此。
最简单的查询方式如下:
SELECT object_schema_name(t.object_ID)+'.'+t.name AS The_Table, --the name of the table
i.name AS The_Index, -- its index
index_column_id,
col_name(Ic.Object_Id, Ic.Column_Id) AS The_Column --the column
FROM sys.tables t
INNER JOIN sys.indexes i
ON t.object_ID=i.object_ID
INNER JOIN sys.Index_columns ic
ON i.Object_ID=ic.Object_ID
AND i.index_ID=ic.index_ID
ORDER BY t.name,i.index_id, index_column_id;
当然也可以指定特定表,例如:
WHERE i.object_id = OBJECT_ID('Production.BillOfMaterials');
也可以汇总上面语句,每个索引汇总成一行,展示所有索引,具体代码如下:
SELECT object_schema_name(t.object_ID)+'.'+t.name AS The_Table, --the name of the table
coalesce(stuff (--get a list of indexes
(SELECT ', '+i.name
+' ( '
+stuff (--get a list of columns
(SELECT ', ' + col_name(Ic.Object_Id, Ic.Column_Id)
FROM sys.Index_columns ic
WHERE ic.Object_ID=i.Object_ID
AND ic.index_ID=i.index_ID
ORDER BY index_column_ID ASC
FOR XML PATH(''), TYPE).value('.', 'varchar(max)'),1,2,'') +' )'
FROM sys.indexes i
WHERE i.object_ID=t.object_ID
FOR XML PATH(''), TYPE).value('.', 'varchar(max)'),1,2,''),'') AS Indexes
FROM sys.tables t;
效果如下:
XML索引被视为索引的扩展。我发现查看其细节的最好方法是为它们构建一个CREATE语句。
SELECT 'CREATE' + case when secondary_type is null then ' PRIMARY' else '' end
+ ' XML INDEX '+coalesce(xi.name,'')+ '
ON ' --what table and column is this XML index on?
+ object_schema_name(ic.Object_ID)+'.'+object_name(ic.Object_ID)
+' ('+col_name(Ic.Object_Id, Ic.Column_Id)+' )
'+ coalesce('USING XML INDEX [' + Using.Name + '] FOR ' + Secondary_Type_DeSc
COLLATE database_default,'')
+'
'+ replace('WITH ( ' +
stuff(
CASE WHEN xi.Is_Padded <> 0 THEN ', PAD_INDEX = ON ' ELSE '' END
+ CASE
WHEN xi.Fill_Factor NOT IN (0, 100)
THEN ', FILLFACTOR =' + convert(VARCHAR(3), xi.Fill_Factor) + ''
ELSE '' END
+ CASE WHEN xi.Ignore_dUp_Key <> 0 THEN ', IGNORE_DUP_KEY = ON' ELSE '' END
+ CASE WHEN xi.Allow_Row_Locks = 0 THEN ', ALLOW_ROW_LOCKS = OFF' ELSE '' END
+ CASE WHEN xi.Allow_Page_Locks = 0 THEN ', ALLOW_PAGE_LOCKS = OFF' ELSE ' ' END
, 1, 1, '')
+ ')', 'WITH ( )', '') --create the list of xml index options
+ coalesce('/* '+convert(varchar(8000),Value)+ '*/','')--and any comment
AS BuildScript
FROM sys.xml_Indexes xi
inner join sys.index_columns ic
ON ic.Index_Id = xi.Index_Id
AND ic.Object_Id = xi.Object_Id
LEFT OUTER JOIN sys.Indexes [USING]
ON [USING].Index_Id = xi.UsIng_xml_Index_Id
AND [USING].Object_Id = xi.Object_Id
LEFT OUTER JOIN sys.Extended_Properties ep
ON ic.Object_Id = ep.Major_Id AND ic.Index_Id = Minor_Id AND Class = 7
WHERE object_schema_name(ic.Object_ID) <>'sys' AND ic.index_id>0;
上面的查询结果将显示所有基本的XML索引细节作为构建脚本。
还有两种比较特殊的索引,一是空间索引,其信息在sys.spatial_index_tessellations 和 sys.spatial_indexes表中。另一个是全文索引,其信息在fulltext_index_fragments, fulltext_index_catalog_usages, fulltext_index_columns 和fulltext_indexes表中保存。
现在,让我们讨论一下分布统计数据或“stats”。每个索引都有一个附加的统计对象,以便查询优化器能够提供一个合适的查询计划。为此,它需要估计数据的“基数”,以确定为任何索引值返回多少行,并使用这些“stats”对象告诉它数据是如何分布的。
可以查询统计信息对象是如何与表进行关联的,语句如下:
SELECT object_schema_name(t.Object_ID) + '.'+ t.name AS The_table,
stats.name AS Stats_Name, sys.columns.name AS Column_Name
FROM sys.stats
INNER JOIN sys.stats_columns
ON stats.object_id = stats_columns.object_id
AND stats.stats_id = stats_columns.stats_id
INNER JOIN sys.columns
ON stats_columns.object_id = columns.object_id
AND stats_columns.column_id = columns.column_id
INNER JOIN sys.tables t
ON stats.object_id = t.object_id;
当它们与索引相关联时,统计数据继承索引的名称,并使用与索引相同的列。
通过比较与每个统计信息相关联的列号列表,您可以快速查看同一列或一组列是否有多个统计信息。
SELECT object_schema_name(Object_ID)+'.'+object_name(Object_ID) as tableName,
count(*) as Similar, ColumnList as TheColumn,
max(name)+', '+min(name) as duplicates
FROM
(SELECT Object_ID, name,
stuff (--get a list of columns
(SELECT ', ' + col_name(sc.Object_Id, sc.Column_Id)
FROM sys.stats_columns sc
WHERE sc.Object_ID=s.Object_ID
AND sc.stats_ID=s.stats_ID
ORDER BY stats_column_ID ASC
FOR XML PATH(''), TYPE).value('.', 'varchar(max)'),1,2,'') AS ColumnList
FROM sys.stats s)f
GROUP BY Object_ID,ColumnList
HAVING count(*) >1;
结果如下:
展示了包含重复的统计对象,在本例中是sales.customer表在AccountNumber列上有两个类似的统计对象。
在数据库中有很多有价值的信息都在索引上。一旦表的数量变大,很容易让表出现一些问题,比如无意中没有聚集索引或主键,或者有重复的索引或不必要的统计信息等。我们通过掌握如何查询这些索引的动态视图后能够快速查询定位使用表的信息,方便我们预防和解决这类问题,这些基础方法已经在DBA和数据库开发的工作中变得越来越重要了,