Time for a Friday night post. Microsoft PFE raise the absence of system performance baselines pretty much in every Risk Assessment Program (RAP) or Health Check. Customers do not generally create performance baselines.
So, what are the typical uses for a performance baseline? Troubleshooting performance-related issues and facilitating capacity management comes to mind. “Hey, doesn’t Opsmgr have that information by default?” I hear you say. Opsmgr does indeed have some of it, but the performance information, for SQL Server at least, is not complete and only available for 7 days in the OperationsManager database and 10 days in the OperationsManagerDW database (with a default data retention configuration). We need to track performance baselines for systems over a much longer period.
Okay, so what is the plan (and make that step-by-step please)?
- Create a super-thorough WinWord template for a SQL Server (+ underlying Windows OS) Performance Baseline, so we know what performance counters, etc will be required – see attached docx.
- List all current Opsmgr performance counter collections. The following PowerShell script can be used to do this:
Get-PerformanceCounter | Export-Csv $env:HOMEDRIVE\All_Hosts_All_PerfCounters.csv -NoTypeInformation
- Compare the template created in 1 above with the list created in 2 above. Identify performance collections not done by Opsmgr at present. Note: v6.1.314.35 (July 2010) of the SQL Management Pack and v6.0.6667.0 (September 2009) of the Base OS Management Pack are installed in my lab (yep, my OS MP upgrade release schedule for my personal lab is shocking, in terms of current priorities, it comes right after watching Outrageous Fortune, the V TV Series, analysing the skies for noticeable changes to the ozone layer and about a thousand other things). In my lab, this is the list of performance counters that are not collected:
Object Name
Counter Name
Memory
Pages Input/Sec
Memory
Free System Page Table Entries
Paging file
%Usage Peak
Process (sqlservr)
%Processor Time
Process (msmdsrv)
%Processor Time
SQLServer:Access Methods
Forwarded Records/sec
SQLServer:Access Methods
Full Scans / sec
SQLServer:Access Methods
Index Searches/sec
SQLServer:Access Methods
Page Splits/sec
SQLServer:Access Methods
Workfiles Created/sec
SQLServer:Access Methods
Worktables From Cache Ratio
SQLServer:Access Methods
Table Lock Escalations/sec
SQLServer:Transactions
Longest Running Transaction Time
SQLServer:Memory Manager
Granted Workspace Memory (KB)
SQLServer:Memory Manager
Maximum Workspace Memory (KB)
SQLServer:Memory Manager
Memory Grants Outstanding
SQLServer:Memory Manager
Memory Grants Pending
SQLServer:Memory Manager
Total Server Memory (KB)
SQLServer:Memory Manager
Target Server Memory (KB)
SQLServer:Databases
Data File(s) Size (KB)
SQLServer:Databases
Log Bytes Flushed/sec
SQLServer:Databases
Log File(s) Size (KB)
SQLServer:Databases
Log File(s) Used Size (KB)
SQLServer:Databases
Log Flush Wait Time
SQLServer:Databases
Log Flush Waits/sec
SQLServer:Databases
Log Flushes/sec
SQLServer:Databases
Log Growths
SQLServer:Databases
Log Shrinks
SQLServer:Databases
Log Truncations
SQLServer:Databases
Percent Log Used
SQLServer:Buffer Manager
Free List Stalls/sec
SQLServer:Buffer Manager
Lazy Writes/Sec
SQLServer:Buffer Manager
Checkpoint Pages/sec
SQLServer:Buffer Manager
Page Life Expectancy
SQLServer:Buffer Manager
Page Lookups/sec
SQLServer:Buffer Manager
Page Reads/sec
SQLServer:Buffer Manager
Page Writes/sec
SQLServer:Buffer Manager
Readahead/sec
SQLServer:Buffer Manager
Database Pages
SQLServer:Buffer Manager
Procedure Cache Pages
SQL Server:Buffer Manager
Target Pages
SQL Server:Buffer Manager
Free Pages
SQL Server:Buffer Manager
Stolen Pages/sec
SQL Server:SQL Statistics
Batch requests/sec
SQL Server:SQL Statistics
SQL Attention Rate/sec
SQL Server:Cursor Manager by Type
Active Cursors
SQL Server:SQL Errors
Errors/sec
SQL Server:Deprecated Features
Usage
SQL Server:General Statistics
Logouts/sec
SQL Server:Latches
Latch Waits/sec
SQL Server:Latches
Avg Latch Wait Time (ms)
SQL Server:Latches
Total Latch Wait Time (ms)
SQL Server:Locks
Lock Wait Time (ms)
SQL Server:Locks
Avg Wait Time (ms)
- Create custom windows performance collection rules in Opsmgr for the missing performance counters (i.e. those above). Note: There are obvious best practices that should be followed to make sure that you don’t blow-up, destroy or cripple your Opsmgr or SQL Server systems (or at least create a semblance of following good practice)…e.g. deploy in your test environment first (yep, soooo many of us have them for Opsmgr, right?), choose a sensible performance collection sample frequency (tip, every 1 sec is not it), calculate the additional storage requirements in the OperationsManager and OperationsManagerDW databases, enable the new rules for a limited number of target instances to start with, etc.
- Okay, now use PowerShell to export raw data for all of the performance counters in the attached file. The script below will export ‘SQL Re-Compilations/sec’ counter data for all ‘Microsoft.SQLServer.2008.DBEngine’ discovered instances.
Get-MonitoringClass -Name "Microsoft.SQLServer.2008.DBEngine" | Get-MonitoringObject | % `{$aaa=$_.Id;$bbb=$_.PathName;Get-PerformanceCounter | ? {$_.MonitoringObjectId -eq $aaa -and $_.ObjectName -eq "SQLSERVER:SQL Statistics" -and $_.CounterName -eq "SQL Re-Compilations/sec"} | Get-PerformanceCounterValue -StartTime ((get-date).adddays(-7)) -EndTime (get-date) | Select TimeSampled,SampleValue | Export-Csv -NoTypeInformation $env:HOMEDRIVE\$bbb.csv}; - Manually edit the script above for all desired counters…
…I know, I know…it is real easy to automate that <sigh>…The PowerShell script below will get the raw performance data only for relevant Objects and Counters specified in a CSV file:
Note: The CSV format is simply: Object,Counter. Use at least two different CSV's for the OS and SQL Server and replace "Microsoft.SQLServer.2008.DBEngine" with "Microsoft.Windows.OperatingSystem" (or a class more relevant to your scenario like "Microsoft.Windows.LogicalDisk") and the pointer to the CSV in the above script for the OS performance data collection.Import-Csv c:\JVTools\SQL.csv | % `
{$Counter=$_.Counter;$Object=$_.Object;Get-MonitoringClass -Name "Microsoft.SQLServer.2008.DBEngine" | Get-MonitoringObject | % `{$aaa=$_.Id;$bbb=$_.PathName;$Object;$Counter;$count++;$CounterStr=$Counter-replace("/"," per ");$ObjectStr=$Object-replace(":","-");Get-PerformanceCounter | ? {$_.MonitoringObjectId -eq $aaa -and $_.ObjectName -eq $Object -and $_.CounterName -eq $Counter} | Get-PerformanceCounterValue -StartTime ((get-date).adddays(-7)) -EndTime (get-date) | Select TimeSampled,SampleValue | Export-Csv -NoTypeInformation $env:HOMEDRIVE\$Count$ObjectStr$CounterStr$bbb.csv}}; - Finally, manually crunch the data in Excel and complete the attached WinWord template…<sigh> or use the Measure-Object (not Measure-Command) in PowerShell to create the 7 day averages where required. Now at least manually fill out the WinWord template and create the baseline graphs-over-time…<sigh> or use PowerShell to do that as well
. A post for another day perhaps? - Repeat this baseline creation every 3 or so months (depending on how dynamic and/or important your system is), before and after major changes like service packs, etc. Use it for troubleshooting, load balancing, fighting off business and application owners that want to overload your system and to help in getting that next salary increase when your boss asks you for a list of proactive things you have done lately.
I know that this data can also be retrieved from the OperationsManagerDW database in SQL (the OperationsManager database is not a good idea from an Opsmgr performance perspective...as the SQL guy, you may have the pesky Opsmgr Admin on your case with his own pesky Opsmgr performance baseline indicating that system stats degraded since you started fiddling) and I also know that a custom Opsmgr report to replace the WinWord template is very possible. That is a post for another time, this time the focus was on using PowerShell. I am also aware that this view is largely server-centric and that a more holistic end-to-end service performance baseline would be a useful thing...hmm Opsmgr does include features for creating Distributed Applications...wonder if I can target that with the MonitoringObject?
Have fun!
Johan