Understanding Index Fragmentation

That day, I was purchasing some items from my neighbouring grocery shop. I noticed, the shop owner was maintaining his debtors list in his ledger book. I found his ledger book started with a table of his customer list along with page numbers. Pages had  transactions of individual customer. In case of a page became full, it had reference of another pages with bunch of transactions. There are some blank pages and also blank spaces in some pages. Suddenly, I realized how SQL Server manage the INDEX functionality.

Data Page

Physically, SQL Server stores data in 8KB data pages. 8 contiguous pages form an extent. A data page has pointers to its next and previous pages.

 

INDEX

The main purpose of SQL Server index is to locate the data quickly based on the column level value. It plays a vital role for database performance. SQL Server organizes INDEX as a set of pages (index nodes) based on B+ tree structure.

  • Root node (Root Page) - top node of the B+ tree
  • Leaf nodes (leaf Pages) - the bottom level of nodes of the B+ tree
  • Intermediate nodes - any index node between the root and the leaf nodes

 

Diagram: B+ Tree Structure of a Clustered Index

  • Leaf nodes of B+ Tree Structure of Clustered Index - contains actual data row
  • Leaf nodes of B+ Tree Structure of Non-Clustered Index - contains the values from the indexed columns and row locators that point to the actual data rows

INDEX Fragmentation

Index fragmentation is a natural phenomenon. Whenever there is any INSERT, UPDATE or DELETE operation, SQL Server automatically modifies indexes. For example while inserting a new row, SQL Server determines the data page where it must be inserted. Below scenarios might be happened:

  1. The desired page might not exist, in that case, database engine creates page, add it and an entry is added to the parent page above it.
  2. If the page exists:
    • if it has space, it reads the data page and inserts the row into it
    • if it has insufficient space, it allocates a new page and moves close to 50% of the rows from the previous page to the newly allocated page and performs the insert on either the old or new page based on the index key value.

Over the time, these operations can cause the data in the index to become fragmented in the database.

There are two types of fragmentation. 

  1. Internal Fragmentation and
  2. External Fragmentation or Logical Fragmentation

Internal Fragmentation 

SQL Server Internal Fragmentation occurs when pages have too much free space. For example, let's assume we have thousands rows stored in 80 pages that are 100% full. After some CRUD operations, it becomes 100 pages that are 80% full. This increases the number of logical reads (100 pages instead of 80) during the query execution. As the index utilizes more data pages to store data.

External Fragmentation or Logical Fragmentation

External Fragmentation is caused when the logical order of the pages do not match with their physical order. That means when a page split occurs, the new page may be placed at some distant memory location not adjacent to older page. As a result, the query requires large number of small I/O operations to read the same amount of data rather than small number of large I/O operations. Which degrades query performance.

Page Density and Fill Factor

  • Page Density - A data page can store variable number of rows. If a page is fully occupied by rows, page density is 100%. If a page is empty, page density is 0%.
  • Fill Factor - How much of a page will be filled by rows. For example if fill factor is 100 (or 0, which is equivalent in this context), page will be filled up 100%. Any other value will cause lower page density resulting query performance degradation.

Detecting Fragmentation

The System Dynamic management view sys.dm_db_index_physical_stats is used to detect size and fragmentation information of data and indexes. Some notable columns are: 

SELECT OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
       OBJECT_NAME(ips.object_id) AS object_name,
       i.name AS index_name,
       i.type_desc AS index_type,
       ips.avg_fragmentation_in_percent, ips.avg_page_space_used_in_percent,
       ips.page_count,ips.fragment_count, ips.alloc_unit_type_desc
FROM sys.dm_db_index_physical_stats(DB_ID(), default, default, default, 'SAMPLED') AS ips
INNER JOIN sys.indexes AS i
ON ips.object_id = i.object_id AND ips.index_id = i.index_id
ORDER BY page_count DESC;

Diagram: INDEX Fragmentation

  • avg_page_space_used_in_percent - returns the average percentage of the data storage space used on the page. It is used to check the internal index fragmentation. 
  • avg_fragmentation_in_percent - shows information about external index fragmentation. For tables with clustered indexes, it provides the percentage of out-of-order pages when the next physical page allocated in the index is different from the page referenced by the next-page pointer of the current page. For heap tables, it returns the percentage of out-of-order extents, when extents are not residing continuously in data files. 
  • fragment_count - indicates how many continuous data fragments belongs to the index. Every fragment constitutes the group of extents adjacent to each other. Adjacent data increases the chances that SQL Server will use sequential I/O and Read-Ahead while accessing the data. 

Best Practices of Controlling Page-splits and Fragmentation

There are couple of best practices to minimize the page splits and fragmentation:

  1. Appropriate Clustering Key - use IDENTITY for clustering key. As this is unique, static, narrow, and ever-increasing. Which prevents random data INSERT. Though it has a side effects. There might be multiple concurrent inserts on the last page in memory, causing high PAGELATCH_xx waits for the threads. Fortunately, SQL Server 2019 introduced OPTIMIZE_FOR_SEQUENTIAL_KEY index option to tackle this.
  2. Appropriate fill factor - it determines how much of a page will be filled by rows. For example if fill factor is 100 (or 0, which is equivalent in this context), page will be filled up 100%. Start with fill factor 100. Monitor the fragmentation levels, and gradually adjust the value down until you find an optimal value.
  3. Avoid VARCHAR and NVARCHAR - when possible avoid variable-length data types. As data in variable-length data types may change over time and potentially cause page splits.
  4. Avoid Row Versioning - SQL Server stores the old versions of records in the version store within tempdb and attaches a 14-byte version tag to the rows in the database to link them to the version store. This 14-byte increase in each row may cause rows to be re-positioned, potentially leading to page splits.
  5. Avoid shrinking database - Shrinking moves pages from the end of the data file to the front and releases the free space at the end. SQL Server relocates these pages to the front without maintaining physical contiguity, which can introduce fragmentation. Therefore, we should avoid shrinking unless absolutely necessary.

Index Maintenance

Though you follow all the above best practices, still you will see INDEX fragmentation over the time. For this, you need to use either INDEX Rebuild or Reorganize.

INDEX Rebuild

In INDEX Rebuilding, index is dropped and re-created. A rebuild operation can be done offline or online based on database engine version and index type. An offline rebuilding task is faster than online. However, it locks the object-level and prevents query to access data.

INDEX Reorganize

Reorganize operation is some sort of house keeping operation of index which puts index in order. This is less resource hungry and online.

You can perform the INDEX rebuild or reorganize task from SSMS very easily.

 

INDEX Rebuild or Reorganize?

Are you confused about picking right operation between Rebuild and reorganize? Microsoft suggests below guideline:

  • Fragmentation < 10% - no action is required. As it has very less impact on performance.
  • Fragmentation is between 10% to 30% - choose reorganize.
  • Fragmentation > 30% - choose rebuild.

Conclusion

INDEX is a vital factor for query performance. It's effectiveness degrades when there is fragmentation. Develop a INDEX management policy to avoid the sudden stop of your workload.                                                                                                             

Add comment