前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >C-Store: A Column-oriented DBMS

C-Store: A Column-oriented DBMS

作者头像
公众号guangcity
发布2022-12-02 20:41:05
5300
发布2022-12-02 20:41:05
举报
文章被收录于专栏:光城(guangcity)光城(guangcity)

今天看了一篇CStore论文,笔记记录一下,后续内容待补充~

目录

1.C-Store: A Column-oriented DBMS

1.1 Proposal

1.2 Why C-Store?

1.3 Architecture

1.4 Innovative

1.5 Projections

1.6 Segments and Storage Keys

1.7 Join Indices

1.8 RS and WS

1.8.1 encoding schemes

1.8.2 WS

1.9 Storage Management

C-Store: A Column-oriented DBMS

1.1 Proposal

  • Compare Row-store with Column-store DBMS

Row-store DBMS

Column-store DBMS

Attributes of a row are placed contiguously in storage

Values for each attribute (column) are stored contiguously

Write-optimized system

Read-optimized system

More suitable for OLTP-style applications

Data warehouses, OLAP

1.2 Why C-Store?

  • Can only bring required attribute values into memory
  • Data can be packed densely together rather than aligned by byte or word boundaries
  • Easy to encode together column values of the same data type
  • Typical queries involve aggregates over important attributes
  • ...

1.3 Architecture

Writeable Store (WS) which is architected to support high performance inserts and updates.

Read-optimized Store (RS), which is capable of supporting very large amounts of information.

This architecture avoid locking with write based workload.

Components:

  • WS:support high performance inserts and updates.
  • RS:support high performance reads.
  • Tuple Mover:moves updated records from WS to RS.

Key Points:

  • Queries (in SQL) must access data in both storage systems.
  • Updates are implemented as an insert and a delete.
    • Inserts are sent to WS, while deletes must be marked in RS for later purging by the tuple mover.
  • merge out process
  • use a variant of the LSM-tree concept to merge ordered WS data objects with large RS blocks.

art

1.4 Innovative

  • hybrid architecture
  • projections
  • Heavily compressed columns using one of several coding schemes
  • A column-oriented optimizer and executor
  • High availability and improved performance through K-safety using a sufficient number of overlapping projections(基于 projections 的 HA 和 K-safety)
  • The use of snapshot isolation to avoid 2PC and locking for queries.

1.5 Projections

Projections Definition:Groups of columns which are stored in physically contiguous manner

  • Anchoredon a logical table T
  • Can also contain columnsfrom other tables that are related to T through foreign keys, etc.

emp

For example:

代码语言:javascript
复制
EMP1 (name, age)
EMP2 (dept, age, DEPT.floor)
EMP3 (name, salary)
DEPT1(dname, floor

1.6 Segments and Storage Keys

  • Segments
    • Every projection is divided into segments
    • Value-based partitioning on the sort key
  • Storage Keys
    • In each segment, different column values belongingtothesamerecord are identified by SK

1.7 Join Indices

Used to reconstruct a table from its projections.For example:Procection2 -> Projcection1

join_index

1.8 RS and WS

1.8.1 encoding schemes
  1. Self-order, few distinct values(or with most of the same values)
代码语言:javascript
复制
1,1,1,1,3,3,3,3,3,3,3,5,5,5
(v, f, n) -> v:the column where first appears, f: the position in the column, n: the number of times v appears
(1, 1, 4), (3, 5, 7),(5, 12, 3)
  1. Foreign-order, few distinct values(or with most of the same values)
代码语言:javascript
复制
0,0,1,1,2,1,0,2,1
(v, b) -> v: value, b: bitset for this value 
(0, 110000100), (1, 001101001), and (2,000010010)
  1. Self-order, many distinct values
代码语言:javascript
复制
1,4,7,7,8,12
(1,3,3,0,1,4)

The idea for this scheme is to represent every value in the column as a delta from the previous value in the column

  1. Foreign-order, few distinct values

leave it unencoded

1.8.2 WS
  • The difference being columns are not encoded since it is assumed that WS is trivial in size as compared to RS.
  • Sid and SK identify the same tuple in RS and WS.
  • Every column represented as (v, sk) and a B-tree is built for sk.
  • Sort key s represented as (s, sk)

1.9 Storage Management

storage allocator:It will dynamically adjust the partition of the Segment, or apply for the Segment.The corresponding WS and the Join Indexes related to the RS also need to be adjusted accordingly.

TODO:事务、查询器、执行器、恢复等,下一篇Vertica...

本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2022-10-17,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 光城 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • C-Store: A Column-oriented DBMS
    • 1.1 Proposal
      • 1.2 Why C-Store?
        • 1.3 Architecture
          • 1.4 Innovative
            • 1.5 Projections
          • 1.6 Segments and Storage Keys
            • 1.7 Join Indices
              • 1.8 RS and WS
                • 1.9 Storage Management
                领券
                问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档