The SQL Management Pack has been updated. This is the biggest update to the SQL MP, probably since we added support for SQL 2008.
Version 6.1.314.36 brings many changes to the MP – so you should give careful consideration to this one. It is not simply a handful of bug fixes, for sure.
Download location and Guide: http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=8c0f970e-c653-4c15-9e51-6a6cadfca363
The core updates, per the guide:
- Removed DMO installation requirement.
Let’s talk about this one. In previous SQL MP’s – SQL-DMO was required for the MP to successfully discover and monitor the SQL instances. SQL-DMO became “deprecated” in SQL 2005…. however, previous versions of the SQL MP (even for SQL 2008) required SQL-DMO to get installed as an add-on. Now – that is no longer the case. We no longer have a dependency on DMO, and that does not need to be installed on your SQL 2005, SQL 2008, nor SQL 2008R2 instances.
- Added support for SQL Server 2008 R2 and removed support for SQL Server 2000.
So – we are adding support for SQL 2008R2. When we say removing support for SQL 2000 – we should clarify that. You can still use and monitor SQL 2000, just like you always did. We just aren't shipping those specific MP’s with the SQL MP anymore, as they will no longer be updated. (Mainstream support for SQL 2000 technically ended in April of 2008, so hopefully you don’t have much SQL 2000 systems running around). You can continue to use the existing MP’s you already have for SQL 2000…. and we have a separate download you can request from Microsoft if you never had any version of them.
- Made security improvements.
- Documented settings for low-privilege environments.
This part appears super-duper complicated at first…. hopefully I can clear up some of this configuration, later on in this article.
- Introduced new rules and monitors, updated existing rules and monitors, and improved Knowledge Base information.
- Populated empty alert descriptions.
- Improved database consistency check monitoring and introduced detailed configuration monitoring for the ‘SQL Database’ object.
- Reconfigured space monitoring takes into account autogrow settings and all levels of storage hierarchy in SQL Server (DB File, DB Log File, DB File Group and Database).
Umm, this is HUGE. Previous versions of the MP did not monitor free space at all when a database was set to auto-grow enabled. This was a much requested option and is a big step for customers who use autogrow to keep an outage from occurring, but still need to know when DB’s/Files are filling up.
- Fixed the following issues:
- File Group discovery fails when Databases are excluded.
- SQL DB Engine version is incorrect
- “Last Run Status” monitor changes states inappropriately.
- SQL Agent Job Discovery fails when some job properties are NULL.
- Default interval for the rule “Logins per second” is not compliant with MPBA.
- Monitor “SQL Server Windows Service” turns to red state when service is restarting.
- “Script : Failed to login” alert is triggered if service is unavailable.
- DB files, DB file groups and DB Log files discoveries are not triggered consistently.
- Frequently changed properties of DB affect performance of the monitored system.
- Discovery intervals for “Discover Databases for a Database Engine”, “Discover Replication Components”, and “Agent job discovery” are not compliant with MPBPA.
- Full Text Search Service Start/Stop tasks do not work on SQL 2008 Cluster.
Whoa! I totally missed the part highlighted above on first pass. This is a big step! The SQL MP used to be one of the biggest contributors to CONFIG CHURN in customer management groups. This was because the database and log size were captured as a property of the DB class, which changed on each run of the discovery. Now these values are hard coded to output “0” and this is why. This is a VERY good thing and it means the SQL MP is no longer a contributor to config churn at all!
- Disabled the following monitors and rules to cut down noise:
- SQL Server Full Text Search Service
- Blocking Sessions
- Long Running Jobs
- Auto Close Configuration
- Auto Create Statistics Configuration
- Auto Shrink Configuration
- Auto Update Statistics Configuration
- DB Chaining Configuration
- DB Total Space
- DB Space Percentage Change
- SQL Server Service Broker or Database Mirroring Transport stopped
- SQL Server Service Broker transmitter shut down due to an exception or a lack of memory
- SQL Server Service Broker or Database Mirroring is running in FIPS compliance mode
- The SQL Server Service Broker or Database Mirroring transport is disabled or not configured
- An SQL Server Service Broker procedure output results
- The Service Broker or Database Mirroring Transport has started
- Process Worker appears to be non-yielding on Scheduler
- IO Completion Listener Worker appears to be non-yielding on Node
- An SQL job failed to complete successfully
- IS Service has attempted to stop a running package
Now – this part is interesting. It appears these monitors and rules were the biggest noisemakers out of the box. I suppose that not all customers needed these, they require some special tuning for your environment – so these are present, but now disabled “out of the box”. If you need/use these rules and monitors – you will need to enable them, as they are now considered “optional”.
- Deprecated the following monitors and rules:
- DB Space Free (MB)
- DB Log File Space Free (%)
- DB Log File Space Free (MB)
- Disk Free Space
- Collect Database Size (MB)
- Collect Transaction Log Free Space (MB)
- Collect Transaction Log Free Space (%)
- Collect Transaction Log Size (MB)
- An exception occurred while encrypting a message in the target queue
- Could not find column in syscolumns for object in database
- DBCC executed found and repaired errors
- Cannot retrieve row from page by RID because the slotid is not valid
- Could not retrieve row from page by RID because the requested RID has a higher number than the last RID on the page
- The nonclustered index indicated by the index ID is in error
- Table error: Page is allocated to different object found in page header
These deprecated monitors and rules should not be re-enabled – these have been disabled and deprecated as they are replaced with new functionality. The reason we have to leave behind “DEPRECATED” rules and monitors, is to maintain MP upgrade capability. The guide actually missed a couple rules. Here is the list below:
That is a LOT of changes! Let’s give it a shot.
So – I import the MP’s. No issue.
Immediately – I start to get new alerts that I have never seen before.
We are off to the races!
So – when SQL server is installed, the DEFAULT security gives BUILTIN\Administrators and NT_AUTHORITY\SYSTEM (Local System) the “Systems Administrator” role in SQL.
Now – I like to run my SQL servers like my customers do. Generally, they will add a global group to SQL security, giving it SA rights. This global group will contain user administrative accounts for accessing SQL, these are generally the DBA team. Then they delete BUILTIN\Administrator from having ANY rights to SQL, and will limit NT_AUTHORITY\SYSTEM (Local System) to only have very specific SQL rights to the instance. See: http://support.microsoft.com/kb/932881
This secures their SQL instances, so that nobody except the SQL team has rights to the SQL instance, SQL configuration, or SQL databases.
Now, this poses a problem for monitoring. Monitoring workflows require elevated rights to be able to discover and fully monitor all aspects of SQL instance, databases, agents, jobs, etc. In most typical environments – you have deployed your agents to use Local System as the default agent action account. This is good. However – this account now has very limited rights and cannot monitor SQL fully. This is REALLY evident from all the “Run As Account does not exist on the target system” alerts…. with REALLY high repeat counts.
I am not a fan of all the “Alert Generation was temporarily suspended” alerts either… this means because of this lack of security situation, we are causing a SQL workflow to flood OpsMgr. This is caused when you have a LOT of databases on the SQL instance.
You will also see your Application logs on the SQL servers spammed with the following:
Log Name: Application
Date: 8/12/2010 6:27:24 PM
Event ID: 18456
Task Category: Logon
Keywords: Classic,Audit Failure
Login failed for user 'NT AUTHORITY\SYSTEM'. Reason: Failed to open the explicitly specified database. [CLIENT: 10.10.10.7]
All this basically means is – Local System (or whatever you are using as your default agent action account) does not have enough rights to SQL to perform all the monitoring tasks. Now – the quick fix – is to grant Local System “SA” Role in the SQL instance.
If that isn’t allowed via policy, then OpsMgr provides “RunAs” capability – to run specific workflows as another user account. This is documented in the guide.
This is where it is critical to READ THE ENTIRE GUIDE.
There is a section which would have avoided all this noise… from the guide:
To avoid monitoring noise
- If you are upgrading from a previous version, export and save your current management pack with any customizations so that you can roll back the installation if needed.
- Import the library file.
- Define Run-As accounts.
- Import the discovery file.
- Make sure that the required objects are discovered. In case of security alerts, adjust Run As accounts. If the list of discovered objects is not as expected, enable or disable discovery for management groups.
- Import the monitoring file.
- Customize the management pack.
Now – that would all work just hunky-dory… if you were just getting started with the SQL MP. However – for most customers…. you will have already been running the SQL MP… and would have already had to deal with these issues, as the security/access thing is not something that’s new… it always existed because SQL is one of the few applications that has its own security access model outside the operating system.
So let this be a valuable lesson – READ THE GUIDE, FIRST!
We start with Before you import. This tells us which MP files we need, and any other settings we need to consider (like enabling agent proxy on all clustered agents)
Next, we move on to the Import. We already did that above. I strongly recommend NOT using the catalog “download from web” function here – you should download the MSI, extract it, and import from there. Customers often get into trouble using the web download – sometimes they only upgrade existing MP’s and miss new MP’s added to the MSI.
Next – Create a new management pack for customizations. These will be our override MP’s. I always recommend at a minimum – one override MP per technology. This would mean we will need two override MP’s: one for SQL 2005 and one for SQL 2008. Alternatively you could just do one for SQL overall. I like to keep mine very specific. Makes it easier to jettison the SQL 2005 MP’s down the road when I no longer support SQL 2005. In my case – I create the following two unsealed MP’s for overrides:
- Overrides – SQL 2005
- Overrides – SQL 2008
Next section in the guide talks about optionally disabling the SQL full-text service monitors for SQL 2005 and 2008. However – this is in error – because in this new version of the MP – those have been disabled by default – out of the box. So no need to mess with those unless you want them enabled.
Next up – the section on alerting for services that are set to Automatic. By default – service unit monitors only monitor the service if it is set to Automatic startup type. HOWEVER – on clustered DB engines – this is set to Manual. This means we will NOT alert on those services, ever. Now – before you freak out – that’s ok… because the *Cluster* MP will alert if any resource group in a cluster is offline or partially online… which would catch this condition. However – it is possible you don’t send cluster alerts to your SQL team. In that case – you can modify these SQL service monitors to monitor services in any startup type. See the guide for details – this is a simple override. The following unit monitors need an override for this, for both SQL 2005 and SQL 2008:
- SQL Server Windows Service (for both classes SQL 200x DB Engine and SQL 200x DB)
- SQL Server Reporting Services Windows Service
- SQL Server Analysis Services Windows Service
- SQL Server Integration Services Windows Service
- SQL Server Full Text Search Service Monitor
- SQL Server Agent Windows Service
***Note – for SQL 200x DB – the ability to set this startup type override is not present. Not an issue – as you will get alerted from the Engine class.
So that is 12 overrides, in total, if you did them all. I will be overriding ONLY the setting for SQL Server Windows Service monitor on SQL 200x DB Engine, and SQL Server Agent Windows Service monitor on SQL Server 200x Agent class, as those are the only two clustered services I monitor.
Next up in the guide is the Optional Configuration section. This is where you could enable optional discoveries for object types that are disabled out of the box. There is a chart in the guide which helps you know what's enabled out of the box, and what isn't. For the most part, EVERYTHING is enabled out of the box except two things: Replication Components, and Agent Jobs. This is GOOD… as we don’t provide a lot of monitoring for Replication components (so why discover them?) and we monitor SQL agent jobs already by monitoring the SQL Agent itself. You MIGHT consider enabling the SQL agent job discovery – if you want very granular monitoring on each SQL Agent job – and have the ability to have unique overrides and configuration settings on a per-job basis.
Now is a good time to review the “Key Monitoring Scenarios” which describes some of the advanced monitoring, and manual configuration you might need to do. This version of the MP offers a LOT of monitoring that did not exist in the previous MP for DB’s with autogrow enabled, and disk space available should lots of databases exist on the same volumes.
You need to consider how you want to monitor long running SQL agent jobs – per agent, or per job… and understand the impact of doing so.
You need to ensure you are monitoring for job failures, and that your SQL agent jobs are configured to write an event to the event log in order for this rule to work.
Think about blocking spids – and look at the default time of 60 seconds, and make sure that is correct for your environment.
Consider all you SQL instances – and if you have some you don’t want to discover, either by server/computer, or by name. See THIS LINK for ideas there.
Be aware of the issue that many SQL workflows may share a script – and that script will cook down as long as you don’t make different changes (overrides) on workflows that share a common datasource (script). We list thos workflows to educate you on that, and the performance impact you will see if you do it wrong, in this part of the guide.
This section also covers the rules we disabled by default – to reduce noise. You can review these and consider overrides for any you require. Tip – start with defaults. Don’t go crazy from the get-go.
Ok – at this point following the guide - we have covered:
- Before You import
- Import the Management pack
- Create a management pack for customizations
- Optional configuration
- Reviewed the discovery chart
- Reviewed key monitoring scenarios.
Now – on to what is probably the most challenging part – Security.
This part of the guide is incredibly complicated, leaves out some of the key areas on how to set up “More Secure” and “Less Secure” Options when using Run As accounts. This is a common issue I see in MP guides – they just link to the generic documentation on setting up and configuring Run-As accounts and profiles. That is really incomplete – because each MP should really tell you what classes or objects you need to associated these with in the profile. You are left to guess here, unfortunately.
I will do my best to describe what I see are the most common (and palatable) options for setting up security with the SQL MP.
Scenario #1. You use Local System as the default agent action account.
You accept the default SQL permissions, where local system has the “SA” role to the SQL instance. In this case – the default agent action account has full rights to the operating system and to SQL. No other configuration or use of Run-As accounts is necessary.
Scenario #2. You use a domain user account as the default action account.
This account is a member of the local administrators group on the server OS. This account has been delegated “SA” rule in SQL explcitly, or via group membership (such as the default “Builtin\Administrators” SQL login, that has SA rights by default) In this case – the default agent action account has full rights to the operating system and to SQL. No other configuration or use of Run-As accounts is necessary.
Scenario #3. You use Local System as the default action account.
However, the SQL team has restricted the NT_AUTHOPRITY\SYSTEM (Local System) SQL login, and removed the “SA” right. In this case, the Local System account has full rights to monitor the server OS, however, does not have enough rights to discover and monitor the SQL application. In this case – we would use a Run-As account(s) to manage access for the SQL workflows only, to execute under this Run-As account. This account(s) can be created and fully managed by the SQL team.
Scenario #4. You use a domain user account as the default action account. This account is a member of the Local Administrators group.
However, the SQL team has restricted or deleted the BUILTIN\Administrators SQL login, thereby removing the “SA” right. In this case, the default agent action domain user account has full rights to monitor the server OS, however, does not have enough rights to discover and monitor the SQL application. In this case – we would use a Run-As account(s) to manage access for the SQL workflows only, to execute under this Run-As account. This account(s) can be created and fully managed by the SQL team.
So – I will cover scenarios #3 and #4 above – setting up your run-as accounts.
The SQL MP guide does a pretty good job of depicting *one way* to handle this. The steps they outline are quite complicated, but done in a manner that is consistent with SQL security best practices. They use specially created Domain group memberships, and 3 distinct Run-As accounts/Profiles. They also specify that their example – does NOT SUPPORT CLUSTERED SQL SERVER. Now – the majority of my customers run almost all their SQL instances in a cluster – so we really cannot use the methods listed in the guide, until they can support the majority of SQL instances in large enterprises, which are clustered.
At a high level – the simplest configuration using Run-As is below:
- Create a Run-As Account in OpsMgr, using a Windows Account, associating this with a domain user account.
- In the OS – ensure this Run-As account is a member of the Local Administrators group.
- In SQL – create a SQL login, associate it with this same Run-As account. Grant this account the SA (System Administrator) role to the instance.
- Associate this Run-As account to each of the 3 profiles
- SQL Server Default Action Account
- SQL Server Discovery Account
- SQL Server Monitoring Account
The above scenario – will run all OpsMgr monitoring under the default agent action account security context. However, all SQL Server MP workflows that support Run-As, will execute under this special account.
This is the most supportable scenario that will require the least impact, if you have to use Run-As. However, in some SQL environments, you are not allowed to grant the “SA” right to any accounts other than actual DBA’s. In these cases – you can use the instructions in the SQL MP guide to configure this account to only have very explicit access to SQL, as defined in the MP.
For a deeper dive into configuring Run-As for the SQL MP – see the following article: