Imagine this Monday morning scenario: you had a great weekend with your friends at the beach. Those bones are aching just a bit after all the football you played. Still, you drag yourself out of the bed, go to the office, hoping, just hoping that nothing major happens today.
BUT… when you arrive at the office, you find all hell is breaking loose. The phones are ringing all over the place, and your manager is shouting at you and your colleagues to do something. And exactly what is wrong? Well, your mission critical database, all 2TB of it, is suddenly offline and unreadable. You look at the error logs and find the dreaded words – database corruption. You look for the backups and then figure out that you were never taking transaction log backups for this database. And later during a root-cause analysis it comes to light that the first signs of corruption actually showed up 6 months ago! And you never knew about it because there were no proactive alerting or checks configured for this database instance.
None of us ever want to face this ‘perfect storm’ scenario. Microsoft Services and the Premier Field Engineering (PFE) team, through our series of Premier Proactive Assessment Programs, will help you to proactively identify, remediate and mitigate risks proactively.
The SQL Server Risk Assessment Program (SQLRAP) is currently available for SQL Server versions 2000-2008 R2. It includes an onsite PFE visit (typically for 4-5 days) and a deep knowledge transfer element.
In October 2012, we also have released SQL RAP as a Service (here) which targets SQL Server versions 2005-2012. It is a remote offering with a PFE delivering the analysis through an online meeting.
Both options are currently available in most markets worldwide, and you can select the one which most suits your business needs.
In the last 4 years, Microsoft Premier Field Engineers have delivered over 6,500 individual RAPs for SQL Server, identifying an average of 140 issues in each engagement. That’s a lot of stuff! And it keeps getting better – the SQL RAP as a Service offering increases the scope and reach, including the latest SQL Server 2012 features such as AlwaysOn Availability Groups. There are too many issues to list in this single blog post, so here is a ‘tag cloud’ of the coverage in these offerings.
Now, wondering what are the most common issues which we find? Come on, let’s take a look!
The most common critical issue we find is that Windows Updates have not been applied regularly. And not only do we identify the current issue with health (specific updates missing) but we go further and through a series of operational interview questions, identify that there is a risk because there is no comprehensive patching policy.
We find very commonly that while installing SQL Server, an ‘administrator’ or equivalent service account has been selected. This leads to an unwanted elevation of privilege security risk. We identify all instances of SQL Server which are running under such privileged accounts.
This one comes up in many instances, and in many cases, in combination with the above (admin account) issue. Many administrators do not realize that xp_cmdshell is not required to be turned on all the time. They can either enable and disable it on demand, or use the proxy account feature to mitigate the risk.
Many times, we find that the DBCC CHECKDB command has never been executed, or is not executed on a regular basis. This can very well fail to alert us to impending catastrophic failure of the database due to corruption issues. We identify databases for which CHECKDB has either never run or has not run at least once in the last week. It is recommended that the CHECKDB be scheduled on a monthly basis and on-demand for significant events such as unexpected shutdown / failover of the server.
Another very common issue is that SQL instances are not updated with the current service pack. Not only is this important from a supportability point of view, but it is also important to receive the latest security, performance and stability fixes. We identify all instances which have older service packs installed. The onsite PFE in the case of the SQL RAP offering will also advise you through a knowledge transfer exercise on the importance of testing this update prior to application.
In the SQLRAP, we also perform a limited static code analysis of your T-SQL stored procedures, views and functions. During this analysis, we will identify objects which are using a NOLOCK table hint and using that data to perform a modification (INSERT or UPDATE typically.) As we know, the NOLOCK hint can result in inconsistent (a.k.a. ‘dirty’) data reads and in some cases can actually cause errors as well. Note that this static code analysis check is currently not available in the remote SQL RAP as a Service.
This issue is typically the outcome of using the default 10% auto-growth setting for data / log files. SQL RAP detects all such files and provides us the remediation advice to set the growth increment to a fixed value, such as 300-500MB. Specifically for transaction log files, we also perform a related check to determine the number of virtual log files and flag if they are greater than a threshold.
This is one of many security checks which are performed in the SQL RAP. Specifically, this check uses the PWDCOMPARE function to determine if the password is the same as the login name. In addition the check also identifies passwords which are blank. We also provide you scripts so that you can perform this check on a regular basis.
The SQL RAP has a strong operational element to it. During the SQL RAP the DBA staff has to answer a questionnaire which aims to uncover issues such as those outlined above. Specifically this issue is uncovered after a conversation between the PFE and the DBA staff, and in many cases the business owners of the system. The SQL RAP also identifies related issues around the SLA (or lack thereof) of database backup / restore.
The SQL RAP inherits a lot of checks from the Cluster RAP. One of the most common issues we find therein is that the MSDTC resource has actually been created in the cluster resource group. This can cause performance and stability issues.
By now, you should have a pretty good idea of what the Proactive Premier Assessment Services can do for you. Contact your Microsoft Premier Technical Account Manager to book your SQL RAP or SQL RAP as a Service today!
Posted by Frank Battiston, MSPFE Editor
You need to get an EA to do a SQLRAP to be told this...
Only you were just told this, so whats your point?
Running a SQL RaaS on a clustered SQL Server 2005 SP4 and all the collection steps fail with "There were no Nodes found to iterate against. ViewName: SQLView, etc. I ran the Scoping tool and everything for prerequistes passed validation. It even identified the system as a cluster during the iniital configuration. What possible permissions issue can this be attributed, I am senior system administrator for my company and I know I have all the necessary permissions on servers.