Summary: Guest blogger, Tom Stringer, illustrates how to monitor an AlwaysOn Availability Group by using Windows PowerShell to ensure high availability functionality in an environment.

Microsoft Scripting Guy, Ed Wilson, is here. Today we have another blog in our SQL Server Week series from Microsoft PFE, Thomas Stringer.

Thomas Stringer is a SQL Server premier field engineer at Microsoft. He specializes in scripting (Windows PowerShell), high availability, disaster recovery, performance tuning, internals, querying, and database administration. He also has a software development background in languages and platforms from C all the way through .NET (including C# for WinForm and ASP.NET WebForms/MVC applications).

Photo of Thomas Stringer

Email: sqlsalt@outlook.com
Blog: http://sqlsalt.com
Twitter: @SQLife
LinkedIn: Thomas Stringer

Here’s Part 2…

In Part 1 of this series, Set Up an AlwaysOn Availability Group with PowerShell, I talked about and illustrated how to set up an availability group. But as tenders of data (and the administrators of high availability), we know far too well that these solutions can sometimes get hiccups or not do exactly what we think they are doing. And that’s what this blog post is going to be about: Monitoring our availability groups to check if they are in a state that we expect and desire.

The SMO namespace lends a large number of classes, and this includes the necessary objects and data access to actively monitor our high availability implementation.

Database synchronization

One of the key points of monitoring availability groups is to make sure that the data on secondary replicas is at the correct synchronization state with the primary replica (for instance, with synchronous commit, the secondary replicas should be in a synchronized state). We can monitor this through the DatabaseReplicaState class.

As always, when we expect to have the SMO namespace available to us, we’ll need to load it:

[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo") |

  Out-Null

First, we will look at a relatively manual way to traverse the availability group structure for this desired information (later in this post, I will explain how to automate the availability group monitoring). What I mean by this is that we’ll explicitly create a Server object for each replica. In this example, the primary replica with a server name of “ALWAYSON1” is $SqlServer and the secondary replica with a server name of “ALWAYSON2” is $SqlServerSec:

$SqlServer = New-Object Microsoft.SqlServer.Management.Smo.Server("ALWAYSON1")

$SqlServerSec = New-Object Microsoft.SqlServer.Management.Smo.Server("ALWAYSON2")

Now that we have these objects initialized, we can start diving into the monitoring. To get the synchronization state of the availability group databases, we can run the following code:

$SqlServer.AvailabilityGroups["AgTest100"].DatabaseReplicaStates |

  Select-Object AvailabilityReplicaServerName,

    AvailabilityDatabaseName,

    SynchronizationState

My availability group is named “AgTest100”, and by using the AvailabilityGroups property of the Server object, I am able to specify which availability group I would like to retrieve by specifying the availability group’s name. We then reference the DatabaseReplicaStates property of the AvailabilityGroup object to get the monitoring information we want. The value of this can be any of the following:

  • Synchronized
  • Synchronizing
  • NotSynchronizing
  • Reverting
  • Initializing

For synchronous commit mode, during normal operation, we should be seeing Synchronized. And for asynchronous commit, we should have Synchronizing. The output from my environment looks like the following screenshot:

Image of command output

This tells me that my availability group database, AdventureWorks2012, on my replicas is in the Synchronized state. If I had more than one database in my availability group, its state would also reflect here.

Replica AvailabilityMode

In the previous section, I talked about how to determine the synchronization status of the availability group databases. What I alluded to is that the SynchronizationState will largely be dependent on the availability mode that the replica is set with (either synchronous commit or asynchronous commit). To find that information, we can run the following statement:

$SqlServer.AvailabilityGroups["AgTest100"].AvailabilityReplicas |

  Select-Object Name,

    Role,

    AvailabilityMode

In my test environment, I get the following result set:

Image of command output

This simple statement gives us all of the replica names, their current role (Primary or Secondary), and most importantly, the AvailabilityMode. This is just more information to get the current picture for what we expect, especially pertaining to SynchronizationState.

Log send and redo

Performance and impact is almost always on our mind as DBAs. We absolutely care about synchronization and operational state, but we also need to be mindful about the impact that our workload and the data transmission is having on our availability group. After all, if you have a synchronous commit replica and that replica is unable to redo transactions quickly enough, you could potentially be impacting performance on the primary replica and having poor user experience. Or if you have an asynchronous commit replica that is designed to handle your reporting requirement, you need to ensure that the data is current enough to comply with the SLA.

Luckily, SQL Server gives us these metrics to determine those few key items. Let’s get Windows PowerShell to retrieve this information:

$SqlServer.AvailabilityGroups["AgTest100"].DatabaseReplicaStates |

  Where-Object {$_.IsLocal -eq $false} |

  Select-Object AvailabilityReplicaServerName,

    AvailabilityDatabaseName,

    LogSendRate,

    LogSendQueueSize,

    RedoRate,

    RedoQueueSize

In my test environment, I get the following output:

Image of command output

We see four important results here. They are going to be rates in KB/s or queue sizes in KB.

  • LogSendRate gives us how quickly the log is able to be sent to secondary databases.
  • LogSendQueueSize shows us the amount of log that is waiting to be sent to the secondary.
  • RedoRate is just that—the rate at which the secondary database is redoing the transactions.
  • RedoQueueSize is the amount of log on the secondary database that is waiting to be redone.

What’s worth noting here that is with consistent and sustained queue sizes, you could potentially be seeing issues with the network (not able to transfer log records quickly enough) or the secondary replica’s inability to redo log records quickly enough. Another consideration is the type of workload that is on the primary replica’s database.

Operational and connection states

There are other factors that we need to take into consideration at the replica level. Mainly, these are the operational and connection states. We can use the following Windows PowerShell to gather this information (this is run on each replica in the availability group):

$SqlServer.AvailabilityGroups["AgTest100"].AvailabilityReplicas |

  Where-Object {$_.Name -eq $SqlServer.Name} |

  Select-Object Name,

    Role,

    OperationalState,

    ConnectionState

$SqlServerSec.AvailabilityGroups["AgTest100"].AvailabilityReplicas |

  Where-Object {$_.Name -eq $SqlServerSec.Name} |

  Select-Object Name,

    Role,

    OperationalState,

    ConnectionState

In my environment, I get the following output:

Image of command output

The OperationalState can be any of the following possibilities:

  • Online
  • Offline
  • Pending
  • PendingFailover
  • Failed
  • FailedNoQuorum
  • Unknown

For normal operations, we expect this to be Online. Other states can reflect failover (potentially a normal operation depending on the surrounding situation) or other issues that will require manual intervention to resolve (for instance, Failed indicates that the replica can no longer talk to the cluster).

The ConnectionState can be one of the following:

  • Connected
  • Disconnected
  • Unknown

During normal operations, we should be seeing a Connected state.

Miscellaneous tips and tricks

We are monitoring our availability group with Windows PowerShell, so we have already taken the first step to ensure that this is a streamlined process. If you noticed earlier, a lot of these queries are run against the primary replica. In my case, currently the primary replica is a server named “ALWAYSON1”. But if we have a listener created, we can also utilize our diagnostic scripts against the listener. Most of the data collection commands are run solely against the primary replica, and we know that the availability group listener (if it exists) points to the primary, so we can take advantage of this connection abstraction like any other client application does.

Another point to consider is that this blog post illustrates how to monitor one availability group. There’s probably a good chance that you have more than one availability group in your environment. My suggestion would be to store a list of servers (in a text file or in a table in a centrally managed database server), and simply loop through them. You can retrieve all of this diagnostic data in a ForEach loop by accessing the AvailabilityGroups property of each server.

~Tom

Thanks for showing us how to monitor AlwaysOn availability groups with Windows PowerShell, Tom!

I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson, Microsoft Scripting Guy