How To Monitor Deadlocks in SQL Server

How To Monitor Deadlocks in SQL Server

  • Comments 8
  • Likes

Written by Mohit Gupta, Microsoft Premier Field Engineer.


Every now and then we have deadlocks taking place in SQL Server.  Most go unnoticed, either because the frontend application handles it gracefully.  Or no error is not reported out to the user.

What is a Deadlock?

Deadlocks happen when conflicting locks are taken by two or more processes.  In rare scenarios, I have heard of deadlocks caused by SQL Statements executing in parallel however I cannot reproduce it. Neither side is willing to step aside for the other processes.  Think of going shopping for furniture and two people (Person A and Person B) go in at same time.  Unknown to them they both want the same goods (Good A and Good B).  One starts searching from right another person starts searching from left side of the store.  Person A finds Good A (Exclusive Lock) and starts looking for Good B (Shared Look).  But while Person A was searching for Good A, Person B found Good B (Exclusive Lock) and starts looking for Good A (Shared Lock).  When they find the other good they realize it is already reserved and cannot have it, neither Person A nor Person B is willing to let the other person have both the Goods. Therefore we are stuck in a deadlock.

In SQL Server to prevent this stalemate (i.e. a deadlock) from filling up the system, we have a Deadlock Monitor thread that is running in the the background to “help” resolve deadlocks.

The Deadlock Monitor Thread

If we look at sys.dm_os_waiting_tasks we find a system task that is always waiting: REQUEST_FOR_DEADLOCK_SEARCH.  This thread wakes up every five seconds to see if we have any deadlocks.  If it finds any deadlocks, it terminates one of the sessions. That’s right, it KILLS one of the two sessions to let the other session have all the resources it needs.  Now of course furniture store management can’t do that (at least if shouldn’t if it wants to stay out of jail and keep its customers), but in SQL Server we don’t have to worry, right!?

Well, SQL Server does have to worry a bit, even if it’s not going to jail.  However it has to make sure to kill the session that is the easiest to rollback.  Because if SQL Server kills a transactions, any work it has done must be rolled back to bring the database to consistent state.  It decided this looking at the LOG USED value.

SQL Rollback Logic (LOG USED)

We can see in the example above it kills Session ID 75 over 192 because Session ID 75 only used 648 bytes Log versus the Session 192 which used 944 bytes.

So one wonders what happens when they are the same?  It’s simple, SQL Server will kill whoever came in second.

It wakes up every 5 seconds to check for deadlocks. If it finds any, it follows above process to decide how to resolve it.  However, first time it wakes up, it wakes up the second time right away, to make sure it is not a nested deadlock.  If there is one it kills that and goes back to sleep.  Next time it wakes up it wakes up 4.90 seconds (estimate on wake, I think its actually 10ms).  It will keep going down to as small as 100ms; that is, it will wake up 10 times per second to deal with Deadlock Issues.

How do we monitor deadlocks?

Process

Comments

Windows Performance Monitor
Object: SQLServer:Locks
Counter: Number of Deadlocks/sec
Instance: _Total
This provides all the deadlocks that have happened on your server since the last restart.  We can look at this counter using the following SQL Statement:
SELECT cntr_value AS NumOfDeadLocks
  FROM sys.dm_os_performance_counters
 WHERE object_name = 'SQLServer:Locks'
   AND counter_name = 'Number of Deadlocks/sec'
   AND instance_name = '_Total'
Trace Flags1204 and 1222 Trace flag 1204 has existed since at least SQL Server 2000.  And Trace Flag 1222 was introduced in SQL Server 2005.Both output Deadlock Information to the SQL Server ERRORLOG.
SQL Server Profiler/Server Side TraceEvent Class: LocksEvent Name: Deadlock Graph Gives an XML Graph like the example above.  Very easy to read and figure out what is going on.
Extended Events The new way to do monitoring in SQL Server.  Extended Events are eventually going to replace the SQL Server Profiler all together (i.e. SQL Server Profiler is on the Deprecated Feature list).  It produces the same XML graph as SQL Server Profiler, and is lighter in performance impact
System Health This is a “new” default trace, but it’s unlike the default trace that had limited the amount of information being tracked and could not be modified.  We can modify a system health definition, which is built on a Extended Events.  But unlike the default trace, the System Health tracks the Deadlocks that have occurred recently.  So we can pull this information from the System Health for analysis instead of implementing our own Extended Events monitoring.

In this article we’ll look at two new methods: Extend Events Trace and System Health.

Deadlock Monitoring Using System Health

System Health is like Default trace that exists in the background.  But unlike default trace it does not save any information to disk. Below is the definition for System Health from SQL Server 2012 RTM.  System Health has been around since 2008, however was not visible in the GUI.

CREATE EVENT SESSION [system_health] ON SERVER 
ADD EVENT sqlclr.clr_allocation_failure(
ACTION(package0.callstack,sqlserver.session_id)),
ADD EVENT sqlclr.clr_virtual_alloc_failure(
ACTION(package0.callstack,sqlserver.session_id)),
ADD EVENT sqlos.memory_broker_ring_buffer_recorded,
ADD EVENT sqlos.memory_node_oom_ring_buffer_recorded(
ACTION(package0.callstack,sqlserver.session_id,sqlserver.sql_text,sqlserver.tsql_stack)),
ADD EVENT sqlos.scheduler_monitor_deadlock_ring_buffer_recorded,
ADD EVENT sqlos.scheduler_monitor_non_yielding_iocp_ring_buffer_recorded,
ADD EVENT sqlos.scheduler_monitor_non_yielding_ring_buffer_recorded,
ADD EVENT sqlos.scheduler_monitor_non_yielding_rm_ring_buffer_recorded,
ADD EVENT sqlos.scheduler_monitor_stalled_dispatcher_ring_buffer_recorded,
ADD EVENT sqlos.scheduler_monitor_system_health_ring_buffer_recorded,
ADD EVENT sqlos.wait_info(
ACTION(package0.callstack,sqlserver.session_id,sqlserver.sql_text)
WHERE ([duration]>(15000) AND ([wait_type]>(31) AND ([wait_type]>(47) AND [wait_type]<(54) OR [wait_type]<(38) OR [wait_type]>(63) AND [wait_type]<(70) OR [wait_type]>(96) AND [wait_type]<(100) OR [wait_type]=(107) OR [wait_type]=(113) OR [wait_type]>(174) AND [wait_type]<(179) OR [wait_type]=(186) OR [wait_type]=(207) OR [wait_type]=(269) OR [wait_type]=(283) OR [wait_type]=(284)) OR [duration]>(30000) AND [wait_type]<(22)))),
ADD EVENT sqlos.wait_info_external(
ACTION(package0.callstack,sqlserver.session_id,sqlserver.sql_text)
WHERE ([duration]>(5000) AND ([wait_type]>(365) AND [wait_type]<(372) OR [wait_type]>(372) AND [wait_type]<(377) OR [wait_type]>(377) AND [wait_type]<(383) OR [wait_type]>(420) AND [wait_type]<(424) OR [wait_type]>(426) AND [wait_type]<(432) OR [wait_type]>(432) AND [wait_type]<(435) OR [duration]>(45000) AND ([wait_type]>(382) AND [wait_type]<(386) OR [wait_type]>(423) AND [wait_type]<(427) OR [wait_type]>(434) AND [wait_type]<(437) OR [wait_type]>(442) AND [wait_type]<(451) OR [wait_type]>(451) AND [wait_type]<(473) OR [wait_type]>(484) AND [wait_type]<(499) OR [wait_type]=(365) OR [wait_type]=(372) OR [wait_type]=(377) OR [wait_type]=(387) OR [wait_type]=(432) OR [wait_type]=(502))))),
ADD EVENT sqlserver.connectivity_ring_buffer_recorded(SET collect_call_stack=(1)),
ADD EVENT sqlserver.error_reported(
ACTION(package0.callstack,sqlserver.database_id,sqlserver.session_id,sqlserver.sql_text,sqlserver.tsql_stack)
WHERE ([severity]>=(20) OR ([error_number]=(17803) OR [error_number]=(701) OR [error_number]=(802) OR [error_number]=(8645) OR [error_number]=(8651) OR [error_number]=(8657) OR [error_number]=(8902)))),
ADD EVENT sqlserver.security_error_ring_buffer_recorded(SET collect_call_stack=(1)),
ADD EVENT sqlserver.sp_server_diagnostics_component_result(SET collect_data=(1)
WHERE ([sqlserver].[is_system]=(1) AND [component]<>(4))),
ADD EVENT sqlserver.xml_deadlock_report
ADD TARGET package0.event_file(SET filename=N'system_health.xel',max_file_size=(5),max_rollover_files=(4)),
ADD TARGET package0.ring_buffer(SET max_events_limit=(5000),max_memory=(4096))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=120 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=ON)
GO

It is a bit hard to read this, however all it is doing is defining the Events, the Columns, and filters on those events.  After which it defines the target (that is, where should it save the information).  It saves it to memory and a file system_health.xel located in you LOG directory.

The event we are concerned with is SQLOS.SCHEDULER_MONITOR_DEADLOCK_RING_BUFFER_RECORDED.  We can dive into the SYSTEM_HEALTH using the XML Methods to retrieve the DEADLOCKs.  The following SQL statements are “supposed” to get this for you:

WITH SystemHealth
AS (
SELECT CAST(target_data as xml) AS TargetData
FROM sys.dm_xe_session_targets st
JOIN sys.dm_xe_sessions s
ON s.address = st.event_session_address
WHERE name = 'system_health'
AND st.target_name = 'ring_buffer')

SELECT XEventData.XEvent.value('(data/value)[1]','VARCHAR(MAX)') AS DeadLockGraph
FROM SystemHealth
CROSS APPLY TargetData.nodes('//RingBufferTarget/event') AS XEventData (XEvent)
WHERE XEventData.XEvent.value('@name','varchar(4000)') = 'xml_deadlock_report'


However while writing this article I found this doesn’t give me the correct information in SQL Server 2012 RTM.  Therefore I have created an Bug Report on Microsoft Connect, because it should give us the XML data, however I don’t get anything useful.  I re-wrote the above statement to get the XML Graph: 

WITH SystemHealth
AS (
SELECT CAST(target_data as xml) AS TargetData
FROM sys.dm_xe_session_targets st
JOIN sys.dm_xe_sessions s
ON s.address = st.event_session_address
WHERE name = 'system_health'
AND st.target_name = 'ring_buffer')

SELECT XEventData.XEvent.query('(data/value/deadlock)[1]') AS DeadLockGraph
FROM SystemHealth
CROSS APPLY TargetData.nodes('//RingBufferTarget/event') AS XEventData (XEvent)
WHERE XEventData.XEvent.value('@name','varchar(4000)') = 'xml_deadlock_report'

So instead of using the XML Value Method, I used the XML Query Method to get the Deadlock Graph.  However, the first one should work as well since it works fine in SQL Server 2008/SQL Server 2008 R2.

This query gives me XML Graph for each deadlock detected, that we are used to from our Deadlock Graph report in SQL Server Profiler.

Deadlock Graph

We can click on any of the XML links to get the complete XML and then save it as Filename.XDL file to get it in Deadlock Graph in SSMS.

System Health is great for this, however it doesn’t save deadlocks indefinitely.  So next let’s look at implementing Deadlock Monitoring using Extended Events, much like SQL Server Server Side Trace.

Deadlock Monitoring Using the Extended Events Trace

I will be implementing this in SQL Server 2012 RTM because we have a GUI interface for it.  However, after you create the Extended Events Trace (Session) we can take the SQL code and run it against 2008 or 2008 R2 to create a similar trace.

Extended Events

 

  1. Navigate to SQL Server
  2. Management.
  3. Extended Events
  4. Sessions.
  5. Right Click on Sessions, Go to New Session Wizard.New Session Wizard
  6. Enter in Session Name “Deadlock_Monitor” click Next.Choosing a template
  7. Select Do not use a template and click Next (Templates are like SQL Server Profiler Templates, where you have defaults to start with, but none of the templates fit our needs).Select Events to Capture
  8. This is where we define what events we want to capture. Type “deadlock” in Event Library.  We get following image.Deadlock Events to Capture
  9. Click on “xml_deadlock_report” and click “>” to add it.  Click Next to continue.xml_deadlock_report Global Fields
  10. Now we need to define what columns we want to capture, for now just click Next.Selecting Columns
  11. Define any filters needed. For now we’ll ignore this setting.  Click Next to continue.Defining Filters
  12. Select “Save data to file”, enter file name, size, etc.  Click Next to continue.Save the data to a file
  13. Review all the configurations and click Finish to setup and start the session.New Event Creation Completed
  14. Now you can start the capture and watch it live.

In this article we looked at different ways to deadlock monitoring and a new way using Extend Events.  If you have questions or suggestions please feel free to message me.  Below is the T-SQL Code for same Deadlock.

CREATE EVENT SESSION [Deadlock_Monitor] ON SERVER 
ADD EVENT sqlserver.xml_deadlock_report
ADD TARGET package0.event_file(SET filename=N'C:\Temp\Deadlock_Monitor.xel')
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=ON)
GO
Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment
  • Mohit Gupta ,great article ,really useful for a absolute beginner, thanks a lot!!!11

  • Hi,

    I rally understood about dealocks monitoring..Thanq soo much

  • Hi

  • Clearly laid out article - very useful

  • hi, System Health method give me some results but xml links does not appears in the select list.
    the select list is empty . Why?
    (sql2008 r2)
    thank's in advance

  • Good article Mohit.

  • Great article. I found the deadlock. Although replication seems running when I open "Replication Monitor" it comes up with a RED Cross on the "Agents" tab and when I try to run it ,it fails at 4% and says "Prefetch objects failed for Database MYDb" - question what to do after I find the deadlock ? How to resolve this ? Or help SQL resolve this manually ?

  • Just getting started with this....using 2008R2. I tried entering the code in the box at the end of your blog but got:
    The target name, "package0.event_file", is invalid, or the object could not be found
    I'm not sure I'm getting a lot of deadlocks but enough that I want to start monitoring them. Other code samples I've found only show deadlock events several days old. I could run a profile but I'd rather use extended events.