SP2010 - Timer service crash

Symptoms

The SharePoint 2010 Timer service crashes within seconds of starting, so it starts and stops continuously on some of the servers in the farm. Event Viewer shows the following events:

Log Name: System
Source: Service Control Manager
Date: [Date and Time]
Event ID: 7031
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: <machine name>
Description:
The SharePoint 2010 Timer service terminated unexpectedly. It has done this 957 time(s). The following corrective action will be taken in 30000 milliseconds: Restart the service.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Service Control Manager" Guid="{555908d1-a6d7-4695-8e1e-26931d2012f4}" EventSourceName="Service Control Manager" />
<EventID Qualifiers="49152">7031</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x8080000000000000</Keywords>
<TimeCreated SystemTime="2013-07-09T01:40:34.089553100Z" />
<EventRecordID>291829</EventRecordID>
<Correlation />
<Execution ProcessID="504" ThreadID="2312" />
<Channel>System</Channel>
<Computer>machine name</Computer>
<Security />
</System>
<EventData>
<Data Name="param1">SharePoint 2010 Timer</Data>
<Data Name="param2">957</Data>
<Data Name="param3">30000</Data>
<Data Name="param4">1</Data>
<Data Name="param5">Restart the service</Data>
</EventData>
</Event>

Log Name: System
Source: Service Control Manager
Date: [Date and Time]
Event ID: 7036
Task Category: None
Level: Information
Keywords: Classic
User: N/A
Computer: <machine name>
Description:
The SharePoint 2010 Timer service entered the running state.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Service Control Manager" Guid="{555908d1-a6d7-4695-8e1e-26931d2012f4}" EventSourceName="Service Control Manager" />
<EventID Qualifiers="16384">7036</EventID>
<Version>0</Version>
<Level>4</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x8080000000000000</Keywords>
<TimeCreated SystemTime="2013-07-09T01:40:31.265935000Z" />
<EventRecordID>291828</EventRecordID>
<Correlation />
<Execution ProcessID="504" ThreadID="2312" />
<Channel>System</Channel>
<Computer>machine name</Computer>
<Security />
</System>
<EventData>
<Data Name="param1">SharePoint 2010 Timer</Data>
<Data Name="param2">running</Data>
<Binary>53005000540069006D0065007200560034002F0034000000</Binary>
</EventData>
</Event>

Cause

The OWSTIMER.EXE.config file on the affected server was malformed: its XML declaration contained a stray question mark.

Resolution

We found the following exception in the ULS logs:

An exception occured while trying to acquire the local farm: System.TypeInitializationException: The type initializer for 'System.Data.SqlClient.SqlConnection' threw an exception. ---> System.TypeInitializationException: The type initializer for 'System.Data.SqlClient.SqlConnectionFactory' threw an exception. ---> System.TypeInitializationException: The type initializer for 'System.Data.SqlClient.SqlPerformanceCounters' threw an exception. ---> System.Configuration.ConfigurationErrorsException: Configuration system failed to initialize ---> System.Configuration.ConfigurationErrorsException: '?' is an unexpected token. The expected token is ''>''. Line 1, position 20. (C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\BIN\OWSTIMER.EXE.Config line 1) ---> System.Xml.Xml..

Based on the above error, we went to the server and navigated to the file C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\BIN\OWSTIMER.EXE.Config.
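
When several servers are affected, the file path and the line/position can be pulled out of the ULS message rather than read by eye. The following is a minimal Python sketch, assuming the message has the same shape as the one quoted above; the function name and the regular expression are illustrative and are not part of any SharePoint tooling.

```python
import re

# Pattern assumed from the ULS message above: "Line N, position M. (<path> line N)".
CONFIG_ERROR = re.compile(
    r"Line (?P<line>\d+), position (?P<pos>\d+)\. \((?P<path>.+?) line \d+\)"
)

def locate_config_error(message):
    """Return (path, line, position) from a configuration error message, or None."""
    match = CONFIG_ERROR.search(message)
    if match is None:
        return None
    return match.group("path"), int(match.group("line")), int(match.group("pos"))

# Fragment of the exception text from the ULS log.
sample = (
    "'?' is an unexpected token. The expected token is ''>''. Line 1, position 20. "
    r"(C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\BIN"
    r"\OWSTIMER.EXE.Config line 1)"
)
print(locate_config_error(sample))
```

The returned line and position point at the exact character the configuration parser rejected.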

The first line of the file read:

<?xml version="1.0"? encoding="utf-8" ?>

We compared this with the same file in our in-house environment, where the correct entry is:

<?xml version="1.0" encoding="utf-8" ?>

We removed the extra question mark (?) before "encoding" and saved the file. The stray character sits at position 20 of line 1, which matches the location reported in the exception.
We then restarted the timer service.
We waited about 5-10 minutes and observed no further crashes.
Event Viewer also stopped logging event IDs 7031 and 7036.
Once the timer service was stable, the mail waiting in the mail drop folder was flushed as well.
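
A quick well-formedness check before restarting the service would catch this kind of damage. The sketch below uses Python's standard xml.etree.ElementTree parser rather than the .NET configuration system, so it only confirms that the content parses as XML. The helper name and the `<configuration />` placeholder bodies are assumptions for illustration, not the real contents of OWSTIMER.EXE.config.

```python
import xml.etree.ElementTree as ET

def xml_problem(data):
    """Return None if data (bytes) is well-formed XML, otherwise the parser's error text."""
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return str(exc)
    return None

# The declaration found on the failing server, followed by a placeholder root element.
broken = b'<?xml version="1.0"? encoding="utf-8" ?>\n<configuration />'
# The corrected declaration with the same placeholder root element.
fixed = b'<?xml version="1.0" encoding="utf-8" ?>\n<configuration />'

print("broken:", xml_problem(broken))
print("fixed:", xml_problem(fixed))
```

To check a real file, read it as bytes and pass the contents to `xml_problem`; a result of `None` means the file parsed cleanly.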