This post provides an introduction to monitoring the health and activity of your WCF- and WF-based applications with Windows Server AppFabric. More specifically, it outlines the AppFabric tooling features that are built into IIS Manager, along with some basic strategies for using these features to monitor your applications.
Introduction
At the center of AppFabric’s monitoring tools is the Dashboard, which provides a centralized gateway to view the health of WCF and WF services deployed locally or to a server farm. It exposes real-time data for durable WF services and historic data for both WCF and WF services. The Dashboard is designed to provide a holistic summary of all positive and negative metrics on your services in a hierarchical form, starting from a high level and allowing you to drill down incrementally to an atomic level via one of our queryable enumeration pages. Consistent with other IIS Manager features, the Dashboard can be viewed from the server, site or application scopes via the tree view in the Connections pane on the left-hand side of the IIS Manager UI.
Before I explain further, it is important to note that the Dashboard sources its data from one or more persistence and monitoring databases. For the metrics of a particular service to be surfaced on the Dashboard, the service needs to be configured to utilize persistence (storage of persistence data in one or more Persistence databases) and/or event collection (storage of events in one or more Monitoring databases).
Dashboard Structure and Navigation Flow
The Dashboard is divided into three primary sections: Persisted WF Instances, WCF Call History, and WF Instance History. Each section provides a summary of a particular data pivot and drilling down within each section will lead you to the section’s own respective enumeration page. The first section (Persisted WF Instances) presents ‘live’ data while the subsequent sections provide historic metrics that are constrained to a particular time period. The time period can be modified via the ‘Time Period’ drop-down on the Dashboard menu with both predefined and custom options available.
Each section within the Dashboard can be collapsed or expanded. The collapsed view shows only the section’s summary bar, providing users with aggregate counts of all positive and negative metrics associated with the subject area (e.g. WCF Call History). Expanding the section will display a series of metrics that break down the aggregate counts shown on the section’s summary bar into key contributing factors/sources. For example, expanding the WF Instance History section will display a breakdown of activations and failures by the top 5 services as well as a count of the number of instance failures that have been recovered versus unrecovered. All metrics on the Dashboard are clickable, allowing you to drill down into the counts to see details on each enumerated item via each section’s respective query page.
Monitoring the health of WCF Services
In this release, AppFabric supports persistence only for WF services. As such, the health of WCF services is monitored through AppFabric’s event collection capabilities. With event collection enabled, the Dashboard provides visibility into WCF calls and service exceptions via the WCF Call History section.
The summary bar of the WCF Call History section within the Dashboard is aimed at providing an aggregate count of all successfully completed calls and WCF service exceptions over a given period of time. Expanding the section provides some key breakdowns that allow you to:
1. Identify services in high demand: The first column lists the top 5 services (when applicable) with the highest number of completed calls over a given period.
2. Identify top exception-causing services: The center column lists the top 5 services (when applicable) that have encountered the highest number of WCF service exceptions over a given period.
3. Gain a breakdown of key causes of WCF service exceptions: The third column provides a numeric breakdown of the key causes of service exceptions: faulted calls and failed calls. It is important to note that service exceptions can also be caused by issues other than failed or faulted calls, such as service activation errors.
All metrics within the WCF Call History section can be clicked on, allowing you to drill down into the aggregate count to view an enumerated list via the Tracked Events enumeration page. Depending on the metric you selected, the Tracked Events enumeration page will display the corresponding items by running a prepopulated query.
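To make the three columns concrete, here is a minimal Python sketch of how such a summary could be derived from a list of tracked events. The event shape (service name plus event kind), the kind names and the service names are all assumptions made for illustration; they do not reflect the actual schema of the Monitoring database or how the Dashboard itself is implemented.

```python
from collections import Counter

# Hypothetical tracked events for one time period: (service name, event kind).
EVENTS = [
    ("OrderService.svc", "completed"),
    ("OrderService.svc", "completed"),
    ("OrderService.svc", "faulted"),
    ("BillingService.svc", "completed"),
    ("BillingService.svc", "failed"),
    ("BillingService.svc", "failed"),
    ("ShippingService.svc", "activation_error"),
]

# Event kinds counted as service exceptions in this sketch (assumed names).
EXCEPTION_KINDS = {"faulted", "failed", "activation_error"}


def summarize_calls(events, top_n=5):
    """Build the three column breakdowns described above."""
    completed = Counter(svc for svc, kind in events if kind == "completed")
    exceptions = Counter(svc for svc, kind in events if kind in EXCEPTION_KINDS)
    # Only faulted and failed calls appear in the cause breakdown.
    causes = Counter(kind for _, kind in events if kind in ("faulted", "failed"))
    return {
        "top_completed": completed.most_common(top_n),
        "top_exceptions": exceptions.most_common(top_n),
        "exception_causes": dict(causes),
        "total_exceptions": sum(exceptions.values()),
    }


summary = summarize_calls(EVENTS)
```

In this sample, the total exception count is larger than the sum of faulted and failed calls, because the activation error counts as a service exception without being either one.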
Monitoring the health of WF Services
The Dashboard provides varying levels of monitoring capabilities for WF services. All WF services regardless of durability can be configured to utilize AppFabric’s event collection capabilities, allowing data at varying verbosity to be collected for monitoring and troubleshooting purposes. This data is surfaced on the Dashboard via the WCF Call History and WF Instance History sections. Durable WF services can also utilize AppFabric’s persistence infrastructure which will allow the Dashboard to also provide live visibility into the health of persisted workflow instances. This feature is provided by the Dashboard’s Persisted WF Instances section.
Using historic data for Health Monitoring
Any WF-based service that is configured to use AppFabric’s event collection capabilities at the ‘Health Monitoring’ level or above will be visible in all historic metrics on the Dashboard. Since WF-based services also use WCF for communication, the WCF Call History section will also expose monitoring data on these services. For the purpose of this sub-topic, I will focus on the WF Instance History section, as the WCF Call History section has already been discussed earlier.
The purpose of the WF Instance History section is to provide a historic overview of all workflow instance activations, failures and completions over a given period. These three key metrics are presented in the summary bar of the section. Expanding the section provides some key breakdowns that allow you to:
1. Identify WF services in high demand: The first column lists the top 5 services (when applicable) with the highest number of instance activations over a given period.
2. Identify WF services with most instance failures: The center column lists the top 5 services (when applicable) that have experienced the greatest number of instance failures over a given period.
3. Understand recovered versus unrecovered instances: The purpose of the third column is to put in context the aggregate failure count in terms of what items are potentially still actionable.
All metrics within the WF Instance History section can be clicked on, allowing you to drill down and view an enumerated list via the Tracked WF Instances page. In addition to the instance information available on the page, you can also navigate to and view all tracked events for a given instance, assuming that event collection is enabled for the parent service.
Using Persistence data for Health Monitoring
For durable WF services that are configured to utilize AppFabric’s persistence capabilities, the Dashboard provides live visibility into running and suspended persisted instances via the Persisted WF Instances section. Sourced by one or more persistence databases, the section offers an overview of what is happening with your durable workflows.
The summary bar of the Persisted WF Instances section contains a numeric breakdown of all running (Active or Idle) and suspended instances currently associated with your environment. When further context is required, expanding the section provides some key breakdowns that allow you to:
1. Identify durable WF services with highest current demand: The first column lists the top 5 services (when applicable) that currently have the highest number of active or idle instances.
2. Identify services with most suspended instances: The center column lists the top 5 services (when applicable) that currently have the highest number of suspended instances.
Again, like other sections, all metrics within the Persisted WF Instances section can be clicked on, allowing you to drill down and view an enumerated list via the Persisted WF Instances page. The enumeration page not only provides details on each persisted WF instance that satisfies the query conditions, but also supports instance control operations (e.g. resuming a suspended instance). Similar to the Tracked WF Instances page, you can also navigate to and view all tracked events for a given persisted instance, assuming that event collection is enabled for the parent service.
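The relationship between the summary bar and an instance control operation can be sketched in a few lines of Python. This is only a conceptual model: the `PersistedInstance` type, the instance and service names, and the choice to treat a resumed instance as Active are assumptions made for illustration, not the Persistence database schema or the behavior of the AppFabric tooling.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PersistedInstance:
    instance_id: str
    service: str
    status: str  # one of "Active", "Idle", "Suspended"


def summary_bar(instances):
    """Count running (Active or Idle) versus suspended instances."""
    statuses = Counter(i.status for i in instances)
    return {
        "running": statuses["Active"] + statuses["Idle"],
        "suspended": statuses["Suspended"],
    }


def resume(instance):
    """Model the resume control operation; only suspended instances qualify."""
    if instance.status != "Suspended":
        raise ValueError(f"{instance.instance_id} is not suspended")
    # Assumption: a resumed instance is treated as Active in this model.
    instance.status = "Active"


instances = [
    PersistedInstance("a1", "OrderWorkflow.xamlx", "Active"),
    PersistedInstance("a2", "OrderWorkflow.xamlx", "Idle"),
    PersistedInstance("b1", "ClaimWorkflow.xamlx", "Suspended"),
]

before = summary_bar(instances)
resume(instances[2])
after = summary_bar(instances)
```

Because this section reflects live data, the counts change as soon as the instance state changes; there is no time period involved as there is in the historic sections.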
Summary and Additional Resources
AppFabric’s monitoring tooling is predominantly delivered via four features within IIS Manager: Dashboard, Persisted WF Instances enumeration, Tracked WF Instances enumeration and Tracked Events enumeration. Starting from the Dashboard, AppFabric’s feature set aims to surface the health of WCF and WF services and provide incremental drill-downs via queryable enumeration pages to assist in investigation and problem-diagnosis activities.
Next week’s post will focus in more detail on using AppFabric tools to troubleshoot applications. Also, for more information on AppFabric monitoring and troubleshooting tools in general, view the endpoint.tv episode with demonstration here.
Goal:
Configuration of WCF 3.5 services can be challenging and .NET 4 introduces additional configuration settings for WF services and new WCF features. Today, there are several options available to edit configuration: svcConfigEditor.exe, Visual Studio, and Notepad.
All of these tools provide a basic but complete experience that enables you to edit the entire set of configuration knobs available in System.ServiceModel. While many of these configuration knobs may be relevant during development, very few are relevant for application administrators.
So, in Windows Server AppFabric, we have looked at the configuration settings that application owners are most likely to tweak, and set out to provide a rich tooling experience for these specific settings.
The list includes:
AppFabric Tools:
Much of the AppFabric management experience resides inside of IIS Manager in order to deploy and manage WAS-hosted services. ASP.NET applications are already configured via IIS Manager, and hence, adding the AppFabric configuration experience helps provide a seamless Web service configuration environment in IIS Manager. We also provide a rich scripting experience for all UI configuration via PowerShell cmdlets. So, most of the settings supported in AppFabric management UI can also be changed using PowerShell, giving you the ability to script post-deployment configuration.
Most of the application/service configuration experience we introduced in IIS Manager can be found in a single configuration dialog available at all IIS scopes:
Figure 1: Configuration dialog at the application scope
Tabs used in the configuration UI provide configuration options for various sections, which gives you a single dialog/UI to edit all configuration from. All of these tabs are context-sensitive, i.e., depending on the scope, they vary slightly. For example: you can only enable Auto-start (Availability tab) at the application or the service scope.
Defaults:
What does it really mean to configure at all of these scopes?
The WCF and WF service project templates in Visual Studio 2010 make use of a new simplification of the configuration system in .NET 4: The <service> tag can be omitted in the configuration, in which case, services use certain defaults, including a default service behavior. That default service behavior is by convention the service behavior that has no name (the name attribute is omitted or set to an empty string), aka a “nameless behavior”.
It turns out that many of the settings we have chosen to tool are in fact service behaviors. So when we offer the ability to edit a setting in a service behavior at the service scope (e.g. service throttling), we edit whichever behavior the service is using, named or nameless. But at the virtual directory, application, site, and server scopes, when editing that same setting, we will in fact edit a nameless service behavior at that scope.
Now, because service behaviors in .NET 4 merge across configuration scopes like regular collections, what this really means is that you can manage an inheritance chain of “default” settings that will apply to your services.
For example, you could define a default SQL WF Persistence store at the server scope. That setting will apply to all WF services on your server that use the default service behavior. And this setting could be modified for specific sites on your server, or even disabled for certain applications or services.
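The inheritance chain can be modeled in a few lines of Python. This is a conceptual sketch only, not how the .NET configuration system is implemented: the scope names, the setting keys, the store names and the use of `None` to represent a removed setting are all assumptions made for illustration.

```python
# Scopes from broadest to narrowest; each may hold settings for its
# nameless (default) service behavior.
SCOPES = ["server", "site", "application", "virtual_directory", "service"]


def effective_defaults(config_by_scope):
    """Merge nameless-behavior settings from the server scope downward.

    A narrower scope overrides a broader one. A value of None models a
    setting that has been removed (disabled) at that scope.
    """
    merged = {}
    for scope in SCOPES:
        for key, value in config_by_scope.get(scope, {}).items():
            if value is None:
                merged.pop(key, None)
            else:
                merged[key] = value
    return merged


# Hypothetical configuration: a server-wide persistence store, overridden
# for one site and disabled for one application within that site.
server_and_site = {
    "server": {"persistence_store": "ServerStore", "max_concurrent_calls": 16},
    "site": {"persistence_store": "SiteStore"},
}
with_application = dict(server_and_site, application={"persistence_store": None})

site_level = effective_defaults(server_and_site)
app_level = effective_defaults(with_application)
```

Here a service under the site inherits the site's store and the server's throttling value, while a service in the application keeps the throttling value but has no persistence store at all.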
Details worth noting
This default story in AppFabric configuration has one consequence, though. If your service is still using the “old configuration” standard of a <service> tag and a named (i.e., non-default) service behavior, the entire hierarchy of configuration that you set up with the dialog will not apply to the service. We offer a way to opt into this new world, with the “Use Defaults” button in the configuration dialog’s General tab at the service scope.
Additionally, the monitoring settings defined at the application, site, or server scope will actually apply to all of your services. This is because the monitoring configuration in general is not service behavior based. Note that out of all the monitoring settings, only WF tracking is defined in a service behavior.
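The difference between behavior-based defaults and monitoring settings can be sketched as a small resolution function in Python. The function name, the setting keys and the sample values are hypothetical and only model the rules described above; they are not part of AppFabric or WCF.

```python
def resolve_service_settings(uses_named_behavior, named_settings,
                             inherited_behavior_defaults, inherited_monitoring):
    """Return the settings that take effect for one service in this model."""
    # Monitoring settings are not behavior-based, so they always apply.
    effective = dict(inherited_monitoring)
    if uses_named_behavior:
        # A named behavior bypasses the nameless-behavior hierarchy.
        effective.update(named_settings)
    else:
        effective.update(inherited_behavior_defaults)
    return effective


# Hypothetical inherited values. WF tracking sits with the behavior-based
# defaults because it is the one monitoring setting defined in a behavior.
behavior_defaults = {"persistence_store": "ServerStore", "wf_tracking": "enabled"}
monitoring = {"monitoring_level": "HealthMonitoring"}

legacy = resolve_service_settings(True, {"max_concurrent_calls": 8},
                                  behavior_defaults, monitoring)
modern = resolve_service_settings(False, {}, behavior_defaults, monitoring)
```

In this model the service with a named behavior still picks up the monitoring level, but not the persistence store or WF tracking, until it opts into the defaults.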
Additional information to keep in mind during configuration:
In conclusion, Windows Server AppFabric leverages the new configuration features introduced in .NET 4 to provide a rich tooling experience in IIS Manager and via PowerShell for the configuration settings that application owners are most likely to change.