Exploring Clustered Columnstore Indexes and some Relevant Concepts

Today, I am going to explain you Clustered Columnstore Index and some related concepts. As a Database enthusiastic, you should know it. Let's start.

Columnstore Index

A columnstore index is a technology for storing, retrieving, and managing data using a columnar format known as a columnstore. This is standard for storing and querying large data warehouse fact tables. It offers up to 10 times the query performance compared to traditional row-oriented storage. Additionally, it provides up to 10 times the data compression over uncompressed data sizes.

Let's check some more topics associated with columnstore index.

Rowstore

Rowstore organizes data logically in a table format with rows and columns, and stores it physically in a row-wise data structure. Traditionally, SQL Server stores data in rowstore format with underlying data storage as heap, clustered index, or a memory-optimized table.

Let's you have above table. As per rowstore storage format, SQL Server will store data as

Columnstore

Columnstore also organizes data logically in a table format with rows and columns, however physically it stores in a column-wise data structure. If you organize above table in columnstore format then it would be look like

Rowgroup

If a table has millions of data, it slices the table (max 1,048,576 rows per slicing) and stores the data as columnstore format. A table might have multiple rowgroup. Thus, high performance and compression is achieved.

Column Segment

A column segment is a column of data within a rowgroup. Each rowgroup contains one column segment for every column in the table. These column segments are compressed together and stored on physical media, with metadata included for each segment to allow for fast elimination of segments without reading them.

Clustered Columnstore Index

A clustered columnstore index serves as the physical storage for the entire table. Unlike row-level indexes, columnstore indexes do not require the column names in the indexes of the corresponding table. Clustered columnstore indexes operate on the entire table at the data page level. Generally, clustering arranges data in a specific order, so with row-level indexes, the rows of the indexed column(s) are sorted accordingly. However, with a columnstore index, clustering is performed on each column sequentially.

Delta Rowgroup

You can think it as a temporary storage of columnstore indexes. It enhances columnstore compression and performance by storing rows until their number reaches a threshold of 1,048,576, after which they are moved into the columnstore.

When a delta rowgroup reaches this maximum row count, it transitions from an OPEN to a CLOSED state. A background process called the tuple-mover checks for closed rowgroups. It compresses the closed delta rowgroup and stores in the columnstore as a COMPRESSED rowgroup. Once a delta rowgroup is compressed, the existing delta rowgroup transitions into a TOMBSTONE state. The tuple-mover will later remove it when there are no references to it.

Deltastore

A columnstore index can have multiple delta rowgroup. Collectively, the delta rowgroups are called the deltastore.

Nonclustered Columnstore Index

A nonclustered index serves as a secondary index created on a rowstore table which includes a copy of some or all of the rows and columns from the underlying table.

Conclusion

Hopefully, you are clear about Columnstore Indexes and associated concepts.

Happy TSQLing!

Add comment