首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >比较列存储索引与行索引

比较列存储索引与行索引

作者头像
用户1217611
发布2018-01-30 16:27:08
1.5K0
发布2018-01-30 16:27:08
举报
文章被收录于专栏:文渊之博文渊之博

原因:

    之前已经写过一篇关于列存储索引的简介https://cloud.tencent.com/developer/article/1032222,很粗糙但是基本阐明了列存储索引的好处。为了更好的理解列存储索引,接下来我们一起通过列存储索引与传统的行存储索引地对比2014中的列存储索引带来了哪些改善。由于已经很多介绍列存储,因此这里我仅就性能的改进进行重点说明。

测试场景

    我创建了5个测试,尽量保证测试环境避免来自外界的重负载进而影响到结果。测试结果基于两个独立的表,分别是:

  • FactTransaction_ColumnStore - 这个表仅有一个聚集列存储索引,由于列存储索引的限制,该表不再有其他索引。
  • FactTransaction_RowStore - 该表将包含一个聚集索引和一个非聚集列存储索引和一个非聚集行存储索引。

    首先我用脚本文件创建表和索引,然后用30m行数据填充到三个表中。由于所有的测试我都制定了最大并行度的hint ,因此可以指定内核数量来查询。

测试1-填充表

   为了更好地测试,一个表由列存储索引构成,而另一个表仅有行存储索引构成。填充数据来自于另一个表'FactTransaction'。

IO 和时间统计

Table 'FactTransaction_ColumnStore'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'FactTransaction'. Scan count 1, logical reads 73462, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

  (30000000 row(s) affected)

SQL Server Execution Times:  CPU time = 98204 ms,  elapsed time = 109927 ms.

Table ' FactTransaction_RowStore '. Scan count 0, logical reads 98566047, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'FactTransaction'. Scan count 1, logical reads 73462, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 (30000000 row(s) affected)

SQL Server Execution Times:  CPU time = 111375 ms,  elapsed time = 129609 ms.
观察测试

表名

填充时间

逻辑读

FacTransaction_ColumnStore

1.49 mins

0

FacTransaction_RowStore

2.09 mins

98566047

测试2-比较搜索

   注意这里在行存储索引上我指定表的hint,迫使表通过索引查找。

-- Comparing Seek.... 
SET Statistics IO,TIME ON

Select CustomerFK
From [dbo].FactTransaction_RowStore WITH(FORCESEEK)
Where transactionSK = 4000000
OPTION (MAXDOP 1)

Select CustomerFK
From [dbo].FactTransaction_ColumnStore  
Where transactionSK = 4000000
OPTION (MAXDOP 1)

SET Statistics IO,TIME OFF

IO 和时间统计

Table 'FactTransaction_RowStore'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:    CPU time = 0 ms,  elapsed time = 0 ms.

Table 'FactTransaction_ColumnStore'. Scan count 1, logical reads 714, physical reads 0, read-ahead reads 2510, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:    CPU time = 0 ms,  elapsed time = 83 ms.

执行计划

观察测试2

正如上图所示,行存储索引表的索引查找远比列存储索引表查询快的多。这主要归因于2014的sqlserver不支持聚集列存储索引的索引查找。执行计划对比图中一个是索引扫描导致更多的逻辑读,因此导致了性能的下降。

表名

索引类型

逻辑读

运行时间

FacTransaction_ColumnStore

Column

714

83 ms

FacTransaction_RowStore

Row

3

0 ms

Test 3 - Comparing SCAN

   注意这次我指定的hint都是索引扫描,当然列存储索引上优化器默认为索引扫描。

-- Comparing Scan.... 
SET Statistics IO,TIME ON

Select CustomerFK
From [dbo].FactTransaction_RowStore WITH(FORCESCAN)
Where transactionSK = 4000000
OPTION (MAXDOP 1)

Select CustomerFK
From [dbo].FactTransaction_ColumnStore WITH(FORCESCAN)
Where transactionSK = 4000000
OPTION (MAXDOP 1)

SET Statistics IO,TIME OFF

IO 和时间统计

Table 'FactTransaction_RowStore'. Scan count 1, logical reads 12704, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
  CPU time = 32 ms,  elapsed time = 22 ms.

Table 'FactTransaction_ColumnStore'. Scan count 1, logical reads 714, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
  CPU time = 0 ms,  elapsed time = 2 ms. 

执行计划

观察测试3

   正如之前提到的,索引扫描列存储要比行存储快,俩个逻辑读和运行时间表明列存储索引在大表扫描上是更优的方式,因此更适合于数据仓库的表。

表名

索引类型

逻辑读

运行时间

FacTransaction_ColumnStore

Column

714

2 ms

FacTransaction_RowStore

Row

12704

22 ms

测试4-聚合查询

    测试行存储表使用基于聚集索引。

SET Statistics IO,TIME ON

Select CustomerFK,BrandFK, Count(*)
From [dbo].[FactTransaction_RowStore] WITH(INDEX=RowStore_FactTransaction)
Group by CustomerFK,BrandFK
OPTION (MAXDOP 4)

   测试行存储表,使用CustomerFK 和BrandFK的索引。(覆盖索引)

Select CustomerFK,BrandFK, Count(*)
From [dbo].[FactTransaction_RowStore] WITH(INDEX=RowStore_CustomerFK_BrandFK)
Group by CustomerFK,BrandFK
OPTION (MAXDOP 4)

    测试行存储索引使用CustomerFK 和BrandFK的列存储索引(覆盖索引)

Select CustomerFK,BrandFK, Count(*) From [dbo].[FactTransaction_RowStore] WITH(INDEX=ColumnStore_CustomerFK_BrandFK) Group by CustomerFK,BrandFK OPTION (MAXDOP 4)

Test on the columnstore table using the Clustered Index.

Select CustomerFK,BrandFK, Count(*)
From [dbo].[FactTransaction_ColumnStore]
Group by CustomerFK,BrandFK
OPTION (MAXDOP 4)

SET Statistics IO,TIME OFF
IO 和时间统计

    使用基于聚集索引查询行存储的表。

Table 'FactTransaction_RowStore'. Scan count 5, logical reads 45977, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:  CPU time = 9516 ms,  elapsed time = 2645 ms.

   使用行存储的非聚集索引测试行存储表。(覆盖索引)

Table 'FactTransaction_RowStore'. Scan count 5, logical reads 71204, physical reads 0, read-ahead reads 2160, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:  CPU time = 5343 ms,  elapsed time = 1833 ms.

   使用非聚集列存储索引测试行存储表。(覆盖索引)

Table 'FactTransaction_RowStore'. Scan count 4, logical reads 785, physical reads 7, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:  CPU time = 141 ms,  elapsed time = 63 ms.

    使用聚集索引测试列存储表。

Table 'FactTransaction_ColumnStore'. Scan count 4, logical reads 723, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:  CPU time = 203 ms,  elapsed time = 118 ms.
执行计划
观察测试4

   这里才是列存储索引开始“闪耀”的地方。两个列存储索引的表查询要比传统的航索引在逻辑读和运行时间上性能好得多。

表名

索引使用

索引类型

逻辑读

运行时间

FacTransaction_ColumnStore

ClusteredColumnStore

Column

717

118

FacTransaction_RowStore

RowStore_FactTransaction

Row

45957

2645

FacTransaction_RowStore

RowStore_CustomerFK_BrandFK

Row

71220

1833

FacTransaction_RowStore

ColumnStore_CustomerFK_BrandFK

Column

782

63

测试5-比较更新(数据子集)

   这个测试中,我将更新少于100m行数据,占总数据的30分之一。

SET Statistics IO,TIME ON

Update [dbo].[FactTransaction_ColumnStore]
Set    TransactionAmount = 100
Where  CustomerFK = 112
OPTION (MAXDOP 1)

Update [dbo].[FactTransaction_RowStore]
Set    TransactionAmount = 100
Where  CustomerFK = 112

OPTION (MAXDOP 1)

SET Statistics IO,TIME OFF
IO 和时间统计
Table 'FactTransaction_ColumnStore'. Scan count 2, logical reads 2020, physical reads 0, read-ahead reads 2598, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(913712 row(s) affected)

SQL Server Execution Times:  CPU time = 27688 ms,  elapsed time = 37638 ms.

Table 'FactTransaction_RowStore'. Scan count 1, logical reads 2800296, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(913712 row(s) affected)

SQL Server Execution Times:  CPU time = 6812 ms,  elapsed time = 6819 ms.
执行计划
观察测试5

  在这种情况下 ,列存储索引的表要比行存储的更新慢的多。

表名

索引类型

逻辑读

运行时间

FacTransaction_ColumnStore

Column

2020

37638 ms

FacTransaction_RowStore

Row

2800296

6819 ms

    注意对于行存储表逻辑读还是要比行存储的要多很多。这是归因于列存储索引的压缩比率更高,因此占用更少的内存。

总结

    列存储索引(包含聚集和非聚集)提供了大量的优势。但是在数据仓库上使用还是要做好准备工作。一种合适地使用情况是非聚集索引不能被更新且禁用对底层表的更新。如果是巨大且没有分区的表,可能存在一个问题,整个表的索引每次都会被重建,因此如果表是巨大的则禁止使用列存储索引。因此必须要有好的分区策略来支持这种索引。

   有几个应用列存储索引的地方:事实表的聚合、Fast Track Data Warehouse Servers、恰当环境SSAS的Cube…

本文参与 腾讯云自媒体分享计划,分享自作者个人站点/博客。
原始发表:2016-07-18 ,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 原因:
  • 总结
相关产品与服务
云数据库 SQL Server
腾讯云数据库 SQL Server (TencentDB for SQL Server)是业界最常用的商用数据库之一,对基于 Windows 架构的应用程序具有完美的支持。TencentDB for SQL Server 拥有微软正版授权,可持续为用户提供最新的功能,避免未授权使用软件的风险。具有即开即用、稳定可靠、安全运行、弹性扩缩等特点。
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档