The Waiting Game: Understanding SQL Server Wait Statistics Part-1

When SQL Server slows down, it is not always easy to figure out why. That is where Wait Statistics come into play — they act like a performance stethoscope, helping DBAs listen to what SQL Server is waiting on. Every query, every process, leaves behind clues in the form of waits. By understanding these waits, you can diagnose bottlenecks, uncover hidden issues, and fine-tune your server for peak performance.

In this series, I will tell you some relevant concepts, show you how to read them, and help you decide what actions to take — so you are not just guessing anymore.

Figure-1: Wait Statistics 

Wait Statistics

When you submit a SELECT query to SQL Server, the ideal scenario is that the CPU quickly processes and returns the result. However in practical, many requests compete for limited server resources. This can lead to delays — some requests have to wait before they can move forward. For example, if one query is modifying a table row then another select query might have to wait until it is free. SQL Server keeps track of these waits. It notes the duration of the wait,
and the cause of the wait (wait type). These records are called wait statistics.

You can find out the cause of the waits and tune your SQL Server by examining two Dynamic Management Views:

  1. sys.dm_os_wait_stats (or sys.dm_db_wait_stats on Azure SQL Database) – cumulative duration of all wait types
  2. sys.dm_os_waiting_tasks – wait statistics of currently running queries

Some Relevant Concepts

Session, Session_id

When you try to connect with SQL Server, it will authenticate and create a session, identified by a unique session_id. In this session, you can execute queries.

SQLOS

The SQLOS is an application layer beneath the SQL Server database engine layer. It manages thread scheduling for CPU usage, thread synchronizations, memory management, deadlock detection and management etc. For details check my Understanding SQL Server Architecture article.

Worker Threads

SQLOS uses host OS's threads to perform any task. Some threads can be used for SQL Server's tasks dedicatedly. For an user requested jobs, it maintains a thread pool. When SQL Server gets an user request, the SQLOS scheduler creates a parent task (main job). This parent task is handled by a worker thread from thread pool. If the job is big or complex, SQL Server might decide to parallelize it. In that case, the parent task stays in place, and SQL Server creates multiple smaller  sub-tasks, and each one gets its own thread to run in parallel.

In general, SQLOS assigns a scheduler for every CPU core. For example, an 8-core processor has 8 schedulers. If it has hyper-threading then 8 physical CPU and 8 logical CPU so total 16 schedulers. Only one session’s worker thread can be running on the CPU at any given time. A thread has three states:

  1. RUNNING – executing in the CPU.
  2. SUSPENDED – if a thread requires a resource which is unavailable (e.g. a page that is not in memory) then it is shifted to an unordered waiter list, with SUSPENDED status, until the required resource is available.
  3. RUNNABLE – if a thread is ready to run (all required resources are in places) and waiting for CPU (because the scheduler is executing another request), then it will be placed in a FIFO queue (runnable queue) with RUNNABLE status.

SQLOS allocates 4 ms processing time to a thread before it must yield the CPU to another session’s thread. If a thread can not complete its task within this period, it will move to either runnable or waiter list queue.

Service Time, CPU Time, Parallel Processing

Ideally, when a user sends a request to SQL Server, most of the time should be spent with the CPU working on it. That is  CPU time or service time.

If the request is complex—like it needs a lot of data or must do many calculations—it will take more CPU time.

To speed things up, SQL Server can split the work into smaller pieces and let multiple CPUs work on it at the same time. This is called parallel processing.

Resource Wait, Wait Time, Wait Type, Signal Waits, Total Wait Time, Response time

When SQL Server is processing a request and it needs a resource (like data from disk or a lock on a table), the thread pauses and goes to a waiting list. This is called a resource wait. Every time a request’s worker thread has to wait for a resource, the SQLOS keeps track of:

  • How long the thread had to wait — wait time
  • What it was waiting for — wait type
  • If all required resources are in places to run the thread again, still it might have to wait for a turn on the CPU. This waiting time is called a signal wait — it means the thread is just waiting for the CPU.

The total wait time = resource wait time + signal wait time.

The Response time = service time + total wait time = service time + resource wait time + signal wait time

Wait Stat Related DMVs

There are two important DMVs which expose wait statistics of SQLOS:

sys.dm_os_waiting_tasks

This DMV gives info about all the threads that are currently waiting for resources — like a lock, disk I/O, or memory. You can use this view to troubleshoot performance issues by identifying which resources are causing delays. For example, if one long-running query is locking a table, and many other queries are waiting on that lock, you will see that lock wait information here.

sys.dm_os_wait_stats

This DMV shows a summary of all the wait times that have happened on the server since it started (or since someone manually reset it). For example, every time a thread has to wait because SQL Server is reading data from disk, that time is recorded under the total cumulative time of a wait type PAGEIOLATCH_SH. The term cumulative total wait time means:

  • Cumulative – SQL Server keeps adding up the wait times for each type of wait including all threads and all sessions on the server. These numbers keep growing until the SQL Server restarts, or someone manually resets the wait stats.
  • Total – It is the sum of resource wait and signal wait.

So, Resource waits = Total waits – Signal waits. All the time are in milliseconds.

The sys.dm_os_wait_stats view helps us find out which resources SQL Server is waiting on the most, and whether it's facing CPU pressure. However, it shows historical data that keeps adding up over time, sometimes for weeks or months. So, if your server is having a problem right now, it might be hard to spot because the wait stats are mixed in with a lot of old data. So the best practices are:

  • Track wait statistics regularly, not just when something breaks.
  • Create a baseline — a record of what 'normal' wait patterns look like on your system.
  • Then, when things go wrong, you can compare the current waits with your baseline to quickly spot unusual or problematic waits.

Final Words

In this part, you have learnt some basic stuffs of SQL Server Wait Statistics. We also discussed about two important DMVs which provides information about the waits. In next part of this series, you will see how to troubleshoot performance bottlenecks using these DMVs.

 

Going Further

If SQL Server is your thing and you enjoy learning real-world tips, tricks, and performance hacks—you are going to love my training sessions too! 

Need results fast? I am also available for 1-on-1 consultancy to help you troubleshoot and fix your database performance issues.

Let’s make your SQL Server to take your business Challenge!

For any queries, mail to mamehedi.hasan[at]gmail.com.

Add comment