WSS/MOSS - Common Issue - SQL Deadlocks during STSADM import operations
Over the last couple of weeks I have seen several cases where STSADM import operations failed with seemingly random exceptions. In other words: when the same import was run multiple times into an empty site collection, it failed at a different point each time. Checking the ULS logs showed errors like the following:
10/20/2008 12:47:26.59 STSADM.EXE (0x78BC) 0x4FF4 Windows SharePoint Services Database 6f8g Unexpected Unexpected query execution failure, error code 1205. Additional error information from SQL Server is included below. "Transaction (Process ID 123) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction." Query text (if available): "..."
Such behavior usually indicates that asynchronous actions are interacting with the import operation and causing a deadlock in SQL Server.
Another interesting tidbit is that this affects only STSADM import, not content deployment.
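To check whether a failed import hit this problem, it can help to scan the ULS logs for error code 1205. The following is a minimal Python sketch, not an official tool. It assumes log lines shaped like the sample above (timestamp first, then the process name, with the SQL Server message on the same line). The names `find_deadlocks` and `DeadlockHit` are made up for illustration.

```python
import re
import sys
from typing import Iterable, Iterator, NamedTuple, Optional


class DeadlockHit(NamedTuple):
    """One ULS log entry that reports SQL error 1205 (hypothetical helper type)."""
    timestamp: str
    process: str
    sql_process_id: Optional[int]


# Assumed layout: "MM/DD/YYYY HH:MM:SS.ff <process name> ..." at the start of the line.
_HEAD = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{2})\*?\s+(\S+)")
# SQL Server reports deadlock victims with error code 1205.
_DEADLOCK = re.compile(r"error code 1205\b")
# The SQL session that was chosen as the victim, if the message includes it.
_VICTIM = re.compile(r"Process ID (\d+)\) was deadlocked")


def find_deadlocks(lines: Iterable[str]) -> Iterator[DeadlockHit]:
    """Yield one DeadlockHit per log line that mentions error code 1205."""
    for line in lines:
        if not _DEADLOCK.search(line):
            continue
        head = _HEAD.match(line)
        victim = _VICTIM.search(line)
        yield DeadlockHit(
            timestamp=head.group(1) if head else "",
            process=head.group(2) if head else "",
            sql_process_id=int(victim.group(1)) if victim else None,
        )


if __name__ == "__main__":
    # Usage: python find_deadlocks.py <path to a ULS log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for hit in find_deadlocks(log):
            print(hit.timestamp, hit.process, hit.sql_process_id)
```

If the hits appear at different timestamps on every run of the same import, that matches the random failure pattern described above.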
Isolating these issues is not easy: SQL Server terminates one of the deadlocked queries as soon as it detects the deadlock, and the client processes (in our case STSADM.EXE and potentially a second process) continue to run until they finally fail because the SQL query did not succeed.
In a test environment it is possible to isolate the issue by attaching a debugger to SQL Server and setting a breakpoint right before the deadlock victim is killed. That causes the deadlock to persist and makes it possible to take memory dumps of STSADM and the other processes involved.
In the cases I have worked on, the problem was always caused by a custom event receiver firing while the items were being imported. That also explains why only STSADM -o import is affected but not Content Deployment: with STSADM -o import the After event handlers fire, while Content Deployment suppresses After events through the import setting SPImportSettings.SuppressAfterEvents.
Unfortunately, STSADM does not provide an option to suppress the After events. So there are two possible ways to resolve the issue:
- Disable the event receivers in the features on the target machine when performing the import
- Create a custom import application that uses the content deployment and migration API and sets the SuppressAfterEvents property to true, as discussed in Part 3 of my Deep Dive into the content deployment and migration API series.