Understanding SQL Server Execution Plan Part-1

When a query performs poorly and you fails to understand the reason behind it. You need to check the execution plan. However, reading an execution plan is like reading a new language. You need to understand the operators, their properties, data flows, meaning of icons, order of operators etc. In this article series, I am going to discuss this step by step.

In this part, you will learn some basic concepts.

Execution plan

An execution plan is a blueprint that the execution engine uses to execute a query. It’s a step-by-step strategy that outlines how the execution engine will process the query, including how and when the indexes and tables will be accessed, the order of operations, and how data will be joined and filtered, foreign keys accessed and more.

SQL Server can show execution plans in 3 different formats:

  1. XML plan
  2. Text plan and
  3. Graphical plan. We will only discuss this in this article.

Query Processing Steps

Once a Query is submitted to SQL Server, it passes through a series of steps. For detail SQL Server Architecture, check my article.

  1. Any syntactic, semantic error is checked and query is bind with physical tables, views, procedures or functions. Also query tree is generated
  2. Query Optimizer is responsible for finding out the cheapest or cost-effective execution plan.
  3. Finally, Query executor executes the query as per the execution plan by calling the Access Methods. Once data is retrieved from Storage Engine, the result is forwarded to the end user.

A Few Concepts on Execution Plan

Plan Cache

Based on the query, SQL Server may take from millisecond to minutes to generate a execution plan. To avoid overhead, SQL Server stores plans for future usage.

Plan aging

A plan is produced based on different statistics. Over the period, these statistics updated. So, every plan comes up with a age and after the expiry period, it becomes obsolete.

Plan recompilation

Some events like changing INDEX, table structure, statistics put an existing execution plan in recompilation state. A new plan is generated the next time the query is called.

Trivial Plan

If a submitted query is simple and has only one possible plan. In this situation, the optimizer disregards finding the optimum plan. This plan is called TRIVIAL plan.

Estimated Plan

If a query is submitted only for inspection not for execution, then the Optimizer checks whether it exists in plan cache, otherwise do the full optimization process and returns the plan. Estimated plan is very useful for testing large, complex queries that could take a long time to run. It is not saved in plan cached.

Actual Plan

If a query is submitted for execution, then the Optimizer checks whether it exists in plan cache, otherwise do the full optimization process and returns the plan. It is saved in plan cached for future usages.

Difference Between Estimated and Actual Plan

Conceptually, there is no difference between Estimated and Actual plan. At any given time, there is only one valid execution plan for a particular query. Based on server activity and available resources, the actual plan might be differ than the estimated plan in case of parallel execution. Remember that Optimizer produces plan based on available statistics which might changes over the period. So for a particular query, a plan generated at present time might not match with future plan.

Data flow arrows

An arrow shows the direction of data flow between the operators. The thickness of an arrow reflects the amount of data passed. A slim arrow means less rows where as a thicker arrow meaning more rows.

Table Cardinality

Simply, cardinality means the number of rows in a table or indexed view.

Statistics and their Importance

Statistics are stored as BLOB objects which contain statistical information about the distribution of values in one or more columns of a table or indexed view. Query optimizer updates necessary statistics. If statistics don't exist then they will be created immediately and consumed by optimizer. Optimizer depends on statistics to estimate the cardinality or number of rows of a query result. In other words, it is the knowledge of how many rows are in a table, and how many of those rows satisfy the various search and join conditions, and so on. Based on this, optimizer creates a high-quality query plan. For example, optimizer may opt for the index seek operator instead of the more resource-intensive index scan operator, if doing so improves query performance.

Cost

Under the every operator, there is cost parameter which shows how much cost associated with this operator. This is relative to other operators.

Diagram: Operator's cost


Operators

Operators depicts the way SQL Server executes a query. Each operator implements a specific algorithm designed to perform a specialized task. Query optimizer uses a series of operators to construct the query plan. There are approximately 85 operators which can be divided as:

  • Logical operators - describe what operation needs to be performed conceptually.
  • Physical operators - implement the operation described by logical operators.

Physical operator does three actions:

  1. Initialization - calling the Init() method of the physical operator will instantiate itself, and set up any required data structures.
  2. Data collection - GetNext() method of the physical operator will fetch the next row of data
  3. Clean up - when the Close() method performs object clean-up and shuts the operator down.

Now, let's check some common operators categorized with their basic purpose.

Common Physical Operators
  Category Operators
   Data Read
 

 Clustered Index Scan

   Non-Clustered Index Scan
   Clustered Index Seek
   Non-Clustered Index Seek
 Constant Scan
 Lookup
 Data Combining  Nested Loops
 Merge Join
 Hash Match
 Data Manipulation

 Sort
 Stream Aggregate
 Compute Scalar
 Filter
 Top

 

Properties of Common Operators

Let's check some common properties of operators. The properties will vary based on operators.

Diagram: Tooltips of properties of Clustered Index Seek

  1. Physical Operation - implements the operation described by logical operators e.g. Non-Clustered Index Seek, Clustered Index Scan etc.
  2. Logical Operation - describes what operation needs to be performed conceptually.
  3. Actual Execution Mode - depicts the actual plan need to be executed.
  4. Estimated Execution Mode - portrays the estimated plan need to be executed.
  5. Storage - describes how data is logically stored in file system like Row Store or Column Store.
  6. Actual Number of Rows Read - actual number of rows returns by the operator.
  7. Actual Number of Rows for All Executions - actual number of rows returns by the operator for all iterations.
  8. Actual Number of Batches - returns no of batches for a batch query.
  9. Estimated Operational Cost - displays estimated cost of the operator with respect to the execution plan assigned by the optimizer.
  10. Estimated I/O Cost - optimizer puts the input and output costs of the result set.
  11. Estimated Subtree Cost - tells subtree generation cost from the query tree. This is generated by the optimizer.
  12. Estimated CPU Cost - optimizer estimates the cost to execute the CPU operations.
  13. Estimated Number of Execution - in a single batch, the number of executions that can be handled by the optimizer for a estimated plan.
  14. Number of Executions - in a single batch, the number of executions that can be handled by the optimizer for a actual plan.
  15. Estimated Number of Rows for All Execution - optimizer's calculations about rows that are going to be returned by the all execution.
  16. Estimated Number of Rows to be Read - optimizer puts an estimated value about number of rows that are going to be read.
  17. Estimated Number of Rows Per Execution - optimizer's expectations about rows that are going to be returned.
  18. Estimated Row Size - size of the returned rows.
  19. Actual Rebinds - counts the number of times the Init() method was called on physical operator
  20. Actual Rewinds - result dataset is not changed, so it can be reused.
  21. Ordered - determines whether returned data set value will be sorted or not.
  22. Node ID - numbering format from right to left and then the usual top to bottom. So, the bottom right will have node ID is 1.
  23. Object - the object accessed, such as the index being accessed by a scan or a seek operation.
  24. Output List - defines the returned columns list.
  25. Predicate - values derived from WHERE clause of the SQL statement.

Conclusion

So far, you learnt about the execution plan, query processing steps and some relevant concepts. We also discussed about some frequently used operators and their properties. In next part of this series, you will learn more about the operators.

Check next part!

Add comment