Deep Drive into Columnstore Index

In today's data-driven world, performance is everything—especially when working with large volumes of data. Traditional row-based indexes can struggle to keep up with modern analytics workloads. That’s where Columnstore Indexes come in. Designed for high-performance querying on massive datasets, Columnstore Indexes reshape the way SQL Server handles storage and retrieval, particularly for OLAP-style reporting and data warehousing.

In my last Columnstore Index article, I discussed about some concepts of Columnstore Index. In this write-up, we will go deeper.

Figure-1: Columnstore Index

Preparing the Ground

Through out this article, I used 47 GB StackOverflow Database specialized copy as of 2018/06 and followed below:

  1. Created two tables using same schema of Users table only modifying AboutMe column as NVARCHAR(4000)
    • Users_NonColumnStoreIndex - it has no Columnstore index.
    • Users_ColumnStoreIndex - it has the clustered Columnstore index.
  2. Imported data from Users table to Users_NonColumnStoreIndex and Users_ColumnStoreIndex truncating AboutMe field.
  3. Both table has the same number of rows i.e. 89,15,506.

Examining Columnstore

It is Compressed

Columnstore index compresses data more than its counter part rowstore index. Let's run sp_BlitzIndex on both tables.

EXEC sp_blitzIndex @tablename='Users_NonColumnStoreIndex';

EXEC sp_blitzIndex @tablename='Users_ColumnStoreIndex';

Figure-2: sp_BlitzIndex output of Users_NonColumnStoreIndex table

Figure-3: sp_BlitzIndex output of Users_ColumnStoreIndex table

You will see that for the same amount of data, User_ColumnstoreIndex table occupies half of the Users_NonColumnStoreIndex table.

Data is Stored in Column-wise

In columnstore, data is logically organized as a table with rows and columns, and physically stored in a column-wise data format. Run below query on User_ColumnstoreIndex table.

sp_blitzIndex @tablename='Users_ColumnStoreIndex', @ShowColumnstoreOnly=1

Figure-4: Users_ColumnStoreIndex table detail view

Figure-4 shows how User_ColumnstoreIndex table is organized. There are 14 row groups and green marked column shows total rows per row group. If a row-group is full, it is compressed and marked as COMPRESSED (yellow marked column). If any data is DELETED from a compressed row group, it is marked as deleted (red marked column).

If a data is INSERTED, it goes to the Delta Rowgroup (black marked row) which is in OPEN state (yellow marked column). For UPDATED operation, first it is marked as DELETED then it is INSERTED into Delta Rowgroup. Once Delta Rowgroup is full, it is compressed and marked as COMPRESSED.

When to Pick Columnstore Index

You should pick Columnstore Index when your table meets following criteria:

  • You have large volume of data
  • Your queries are mostly perform analytics that scan large ranges of values
  • Your data is mostly stable or your queries are updating and deleting <10% of the rows
  • You have fact and dimension tables for a data warehouse

Choosing the Right Columnstore Index

  • Clustered columnstore index - use it for traditional data warehouse workload or IoT workload
  • Ordered columnstore index - use it when a clustered columnstore index is queried via a single ordered predicate column or column set.
  • Nonclustered indexes in a clustered columnstore index - use it to:
    • Enforce primary key and foreign key constraints on a clustered columnstore index
    • Speed up queries that search for specific values or small ranges of values
    • Speed up updates and deletes of specific rows
  • Nonclustered columnstore index on a disk-based heap or B-tree index - use it when
    • OLTP workload has some analytics queries so you can drop B-tree indexes and add one nonclustered columnstore index
    • OLTP workloads perform ETL operations to move data to a separate data warehouse. You can eliminate ETL and a separate data warehouse by creating a nonclustered columnstore index on some of the OLTP tables.

Combining Columnstore and Rowstore Indexes

  • You can create an updatable nonclustered columnstore index on a rowstore table. 
  • You can run analytics on the columnstore index and transactions on the rowstore index at the same time. However, it stores a copy of the data which requires additional storage.
  • Both rowstore and columnstore indexes are working on the same data as the columnstore is updated when data changes in the rowstore table.
  • You can perform efficient table seeks on the underlying columnstore index by adding one or more nonclustered rowstore indexes on columnstore indexs.
  • You can enforce a primary key constraint by using a UNIQUE constraint on the rowstore table. Since a nonunique value fails to insert into the rowstore table, the Database Engine can't insert the value into the columnstore.
  • The nonclustered columnstore index supports filtered condition too. To minimize the performance impact of adding a columnstore index on an OLTP table, you can use filtered condition to create a nonclustered columnstore index on only the cold data of your operational workload.

Final Words

Columnstore indexes are a powerful tool in SQL Server, especially when dealing with large volumes of data and analytical workloads. By organizing data by columns instead of rows, they significantly reduce storage and boost query performance. Whether you are building a data warehouse or optimizing reports, knowing when and how to use Columnstore indexes can make a big difference. Start experimenting, and let your queries fly!

 

Going Further

If SQL Server is your thing and you enjoy learning real-world tips, tricks, and performance hacks—you are going to love my training sessions too! 

Need results fast? I am also available for 1-on-1 consultancy to help you troubleshoot and fix your database performance issues.

Let’s make your SQL Server to take your business Challenge!

For any queries, mail to mamehedi.hasan[at]gmail.com.

Add comment