Azure Operational Insights Onboarding Troubleshooting Steps


[NOTE: System Center Advisor is now part of the new Microsoft Azure Operational Insights.]

 

This article collects procedures and known troubleshooting hints for both Operations Manager attach mode and Direct Agent. Some sections apply to both; others apply to only one type of reporting infrastructure (SCOM or DA).

If none of these steps work for you, send us an email at scdata@microsoft.com and we will be more than happy to help get your issue resolved.

Or post in the feedback forum http://aka.ms/opinsightsfeedback

We also have an MSDN forum (forum=opinsights), but use it for general community questions: we can’t see your email address there and can’t respond to you directly on that channel.

Follow us on Twitter @OpInsights and feel free to engage, but be aware that many questions can’t be answered in 140 characters!

Now you know how to reach us, so let’s walk through the known errors and procedures. Depending on the topology you use (SCOM, Direct Agent, or just Azure storage), some sections will not apply.

 

SCOM REGISTRATION – ERROR 3000
We have had a few customers run into the “Error 3000: Unable to register to the Advisor Service” while trying to connect their OpsMgr 2012 Management group to System Center Advisor.

Error 3000: Unable to register to Advisor Service.

There are two reasons why you may run into this:

  1. The server clock is out of sync with the current time by more than 5 minutes. You can resolve this by correcting the clock on your server: open a command prompt as an Administrator, type w32tm /tz to check the time zone, and w32tm /resync to synchronize.
    Note that even if your clock reports that it is synchronized (e.g. with your company’s time server), it might still be out of sync with our machines in Azure. Since the time window is only 5 minutes, this is often the issue. Verify that you are synchronizing with a reliable time server ON THE INTERNET. You can further troubleshoot this type of issue by enabling VERbose tracing on the Management Server/Console machine.

      See http://support.microsoft.com/kb/942864 to learn about OpsMgr tracing. In a nutshell, you need to run:

      StartTracing.cmd VER
      - reproduce the issue –
      StopTracing.cmd
      FormatTracing.cmd

      In the formatted trace files you should find an exception saying the token was rejected because it was not yet valid or had expired, or similar phrasing.

  2. Your internal proxy server or firewall is blocking communication to the Advisor service endpoints. The rest of this article provides detailed instructions for this second case. Read on.
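For the first reason, the 5-minute window can be checked without touching the server clock by comparing local UTC time with the Date header returned by any reliable HTTPS server. The following is a minimal Python sketch, not part of OpsMgr or Advisor; the function names and the sample values are hypothetical, and it assumes the remote Date header is a trustworthy reference.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Allowed difference quoted in this article for 'Error 3000'.
MAX_SKEW = timedelta(minutes=5)


def clock_skew(http_date_header, local_now):
    """Return local time minus server time, given an HTTP Date header value."""
    server_time = parsedate_to_datetime(http_date_header)
    return local_now - server_time


def within_window(skew, limit=MAX_SKEW):
    """True when the absolute skew does not exceed the allowed window."""
    return abs(skew) <= limit


# Example with fixed, made-up values (no network access needed).
header = "Tue, 02 Dec 2014 10:00:00 GMT"
local = datetime(2014, 12, 2, 10, 7, 30, tzinfo=timezone.utc)
skew = clock_skew(header, local)
print("skew:", skew, "ok:", within_window(skew))
```

In practice the header value could be read with urllib.request.urlopen(url).headers["Date"] and local_now taken from datetime.now(timezone.utc); a skew outside the window points at the clock rather than at the proxy.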

IF YOU HAVE A PROXY - REGISTRATION / CONFIGURATION STEPS

Depending on your proxy configuration, you might not be able to register at all, or – even when you do manage to register – some communication from SCOM to the service will later fail and scenarios might not light up in the portal. We describe the type of communications and endpoints you need to allow your management servers, console and direct agents to talk to in order for OpInsights to work for you.

Step 1: Request exception for the service endpoints

The following domains and URLs need to be accessible through the firewall/proxy for the management server to access the Azure Operational Insights Web Services

Management Server

URL                                           Ports
service.systemcenteradvisor.com               Port 443
scadvisor.accesscontrol.windows.net           Port 443
scadvisorservice.accesscontrol.windows.net    Port 443
*.blob.core.windows.net/*                     Port 443
data.systemcenteradvisor.com                  Port 443
ods.systemcenteradvisor.com                   Port 443
*.ods.opinsights.azure.com                    Port 443
*.systemcenteradvisor.com                     Port 443
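When you review proxy or firewall logs, it can help to check mechanically whether a blocked host is already covered by the Management Server destinations listed above. The sketch below is one possible way to do that in Python; the helper names are hypothetical, the "/*" path suffix is dropped because only host names are compared, and the wildcard is treated as a shell-style pattern (so "*" also matches multi-label prefixes), which is an assumption about how your proxy interprets it.

```python
from fnmatch import fnmatchcase

# Host patterns copied from the Management Server list above.
MANAGEMENT_SERVER_HOSTS = [
    "service.systemcenteradvisor.com",
    "scadvisor.accesscontrol.windows.net",
    "scadvisorservice.accesscontrol.windows.net",
    "*.blob.core.windows.net",
    "data.systemcenteradvisor.com",
    "ods.systemcenteradvisor.com",
    "*.ods.opinsights.azure.com",
    "*.systemcenteradvisor.com",
]


def is_covered(hostname, patterns=MANAGEMENT_SERVER_HOSTS):
    """True if the host name matches at least one allowed pattern."""
    host = hostname.strip().lower().rstrip(".")
    return any(fnmatchcase(host, pattern) for pattern in patterns)


def uncovered(hostnames, patterns=MANAGEMENT_SERVER_HOSTS):
    """Return the host names that no pattern covers, in input order."""
    return [h for h in hostnames if not is_covered(h, patterns)]


# Hypothetical hosts as they might appear in a proxy log.
seen_in_log = [
    "scadvisorcontent.blob.core.windows.net",
    "ods.systemcenteradvisor.com",
    "dc.services.visualstudio.com",
]
print(uncovered(seen_in_log))
```

Any host the function reports as uncovered is a candidate for an additional exception request, or a sign that the traffic belongs to a different component (for example the console rather than the management server).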

Large Volume scenarios / intelligence packs and OpsMgr agents

Note that with some upcoming intelligence packs (e.g. ‘Security and Audit’), given the large volume of data sent for those scenarios (Windows Security Logs), the agents will report data directly to the cloud (that is, without queuing through the management server), even if they report to OpsMgr and receive their configuration from the OpsMgr Management Group. This communication needs the following destination:

URL                           Ports
*.ods.opinsights.azure.com    Port 443

Note that the proxy setting specified in Step 2 below will be automatically propagated to OpsMgr agents.

Operations Manager Console

The following domains and URLs need to be accessible through the firewall to view the Advisor Web portal and to use the OpsMgr Console to perform ‘registration’ to Azure Operational Insights.

Resource                     Ports
*.systemcenteradvisor.com    Ports 80 and 443
*.live.com                   Ports 80 and 443
*.microsoft.com              Ports 80 and 443
*.microsoftonline.com        Ports 80 and 443
login.windows.net            Ports 80 and 443

Also ensure the Internet Explorer proxy is set correctly on the computer you are trying to log in from. An especially valuable test is to connect to an SSL-enabled website, e.g. https://www.bing.com/. If the HTTPS connection doesn’t work from a browser, it probably won’t work in the Operations Manager Console or in the server modules that talk to the web services in the cloud either.

 

Directly-connected Agents

Direct Agent does not use your credentials to connect to the workspace: you have to enter the workspace id and key. Those are used for registration only; after the agent is registered, a certificate is used. Direct Agent only needs to connect to the following destinations:

URL                            Ports
*.blob.core.windows.net/*      Port 443
*.oms.opinsights.azure.com     Port 443
*.ods.opinsights.azure.com     Port 443
ods.systemcenteradvisor.com    Port 443

Once you have completed registering your OpsMgr Environment to the Advisor Service you need to follow Steps 2, 3 and 5 to allow your Management servers to send data to the Advisor Web Service (step 4 is only required if you have an old patch level… but you are running the latest update rollup, right?).

Step 2: Configure the proxy server in the OpsMgr Console

  • Open the OpsMgr Console

  • Go to the “Administration” view

  • Select “Advisor Connection” under the "System Center Advisor" Node

  • Click “Configure Proxy Server”

  • Check the checkbox to use a proxy server to access the Advisor Web Service
  • Specify the proxy address in the http://proxyserver:port format
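A mistyped proxy address is easy to miss in the dialog, so it can be worth validating the string before pasting it in. This is a small Python sketch of such a check; the function name is hypothetical, and it assumes that only the http://proxyserver:port form described above is acceptable (whether the console accepts other forms is not covered here).

```python
from urllib.parse import urlsplit


def parse_proxy_address(address):
    """Return (host, port) for an address in http://proxyserver:port form.

    Raises ValueError when the prefix, host name or port is missing or invalid.
    """
    parts = urlsplit(address.strip())
    if parts.scheme != "http":
        raise ValueError("expected an address starting with http://")
    if not parts.hostname:
        raise ValueError("missing proxy host name")
    port = parts.port  # accessing .port raises ValueError for a non-numeric port
    if port is None:
        raise ValueError("missing port number")
    return parts.hostname, port


# Example with a made-up proxy name.
print(parse_proxy_address("http://proxyserver:8080"))
```

The same (host, port) pair can then be reused for the other places in this article where the proxy has to be set, which helps keep them consistent.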

  


Step 3: Specify credentials for OpsMgr if the Proxy Server requires Authentication

If your proxy server requires authentication, you can specify credentials in the form of an OpsMgr RunAs account and associate it with the ‘System Center Advisor Run As Profile Proxy’

  • In the OpsMgr Console, go to the “Administration” view

  • Select “Profiles” under the "RunAs Configuration" Node

  • Double click and open “System Center Advisor Run As Profile Proxy”


  • Click ‘Add’ to add a 'RunAs Account'. You can either create one or use an existing account. This account needs to have sufficient permissions to pass through the proxy
  • Set the Account to be targeted at the ‘Operations Manager Management Servers’ Group
  • Complete the wizard and save the changes

  

Note: not all code paths currently support authentication. It is still possible that you will need to set some of those exclusions mentioned in Step 1 to allow anonymous traffic to some of those destinations. We will keep this document up-to-date as this requirement evolves.


Step 4: Configure the proxy server on each UNPatched OpsMgr Management Server for WinHTTP

NOTE: this step is NO LONGER required IF you UPDATED your Management Servers to Update Rollup 3 for System Center 2012 R2, or Update Rollup 7 for System Center 2012 SP1 (or newer ones). In fact, we recommend you don’t do this step and just upgrade to the latest Rollup if you can!

  • Open Command Prompt as an Administrator on the Management Server

  • Type netsh winhttp set proxy myproxy:80

  • Restart the ‘System Center Management’ Service (HealthService)
  • Repeat these steps on each management server in your management group

Step 5: Configure the proxy server on each OpsMgr Management Server for Managed code

There is another setting in Operations Manager, which is intended for general error reporting, but we have noticed that - when set - due to the same modules being used in multiple workflows, this proxy setting also ends up affecting Advisor connector's functionality.
The recommendation is therefore to also set it (to the same proxy you set in the other places) for each and every MS if you use a proxy.

  • In the OpsMgr Console, go to the “Administration” view

  • Select “Device Management” and then the "Management Servers" Node

  • Right-click and choose “Properties” for each MS (one at a time) and set the proxy in the “Proxy Settings” tab.

Proxy settings per MS

If none of the above steps resolve your issue please let us know and we will help you!

VERIFYING IF THINGS ARE WORKING POST COMPLETING THE CONFIGURATION WIZARD

Procedure 1: Validate if the right Management Packs get downloaded to your OpsMgr Environment

Note: Depending on which Intelligence Packs you have enabled from the OpInsights Portal, you will see more or fewer of these MPs. Search for the keyword ‘Advisor’ or ‘Intelligence’ in their name.

Advisor Management Packs in SCOM

You can additionally check for these MPs using OpsMgr PowerShell and typing these commands

get-scommanagementpack | where {$_.DisplayName -match 'Advisor'} | select Name,DisplayName,Version,KeyToken

get-scommanagementpack | where {$_.DisplayName -match 'Advisor'} | select Name,DisplayName,Version,KeyToken | Out-GridView

Note: if you are troubleshooting the Capacity Intelligence Pack, check HOW MANY management packs with a name containing ‘capacity’ you have. Two management packs with the same display name (but different internal IDs) come in the same MP bundle; if one of the two does not get imported (often due to a missing VMM dependency), the other does not get imported either, and the operation does not retry.

You should see the following three MPs related to ‘capacity’

  • Microsoft System Center Advisor Capacity Intelligence Pack
  • Microsoft System Center Advisor Capacity Intelligence Pack
  • Microsoft System Center Advisor Capacity Storage Data

If you see only one or two of them but not all three, remove them and wait 5 to 10 minutes for OpsMgr to download and import them again. Check the event logs for errors during this period.
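Because two of the three ‘capacity’ MPs share a display name, counting them by eye is error-prone. The sketch below shows one way to do the count in Python from a list of display names (for example, copied from the output of the PowerShell commands above); the function name and the sample input are hypothetical.

```python
from collections import Counter

# Expected display names and how many times each should appear,
# taken from the list of three 'capacity' MPs above.
EXPECTED_CAPACITY_MPS = Counter({
    "Microsoft System Center Advisor Capacity Intelligence Pack": 2,
    "Microsoft System Center Advisor Capacity Storage Data": 1,
})


def missing_capacity_mps(display_names):
    """Return {display name: number of copies still missing}."""
    found = Counter(
        name.strip() for name in display_names if "capacity" in name.lower()
    )
    # Counter subtraction keeps only positive counts, i.e. what is missing.
    return dict(EXPECTED_CAPACITY_MPS - found)


# Made-up example: only one of the two identically named MPs was imported.
imported = [
    "Microsoft System Center Advisor",
    "Microsoft System Center Advisor Capacity Intelligence Pack",
    "Microsoft System Center Advisor Capacity Storage Data",
]
print(missing_capacity_mps(imported))
```

An empty result means all three MPs are present; anything else suggests the partial-import situation described in the note.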

 

Procedure 2: Validate if the right Intelligence Packs get downloaded to your Direct Agent

In Direct Agent you should see the Intelligence Packs collection policy being cached under C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\Management Packs

Intelligence Packs on Direct Agent
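If you prefer to check the cache folder from a script rather than from Explorer, something along these lines would list what is in it. This is only a sketch: the function name is hypothetical, the optional keyword filter is an assumption about how the cached files are named, and the default path is the one quoted above.

```python
from pathlib import Path

# Default cache location quoted in this article for Direct Agent.
DEFAULT_CACHE = Path(
    r"C:\Program Files\Microsoft Monitoring Agent\Agent"
    r"\Health Service State\Management Packs"
)


def list_cached_packs(cache_dir=DEFAULT_CACHE, keyword=None):
    """Return the sorted file names found in the cache folder.

    Returns an empty list if the folder does not exist. When 'keyword' is
    given, only names containing it (case-insensitive) are returned.
    """
    cache = Path(cache_dir)
    if not cache.is_dir():
        return []
    names = [entry.name for entry in cache.iterdir() if entry.is_file()]
    if keyword:
        names = [n for n in names if keyword.lower() in n.lower()]
    return sorted(names)


if __name__ == "__main__":
    for name in list_cached_packs():
        print(name)
```

An empty list on a machine where the agent is installed suggests that the collection policy has not been downloaded yet, in which case the event log checks in Procedure 4 are the next place to look.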

 

Procedure 3: Validate if data is being sent up to the Advisor service (or at least attempted)

  • Open ‘Performance Monitor’
  • Select ‘Health Service Management Groups’

  • Add all the counters that start with ‘HTTP’

  • If things are configured right you should see activity for these counters, as events and other data items (based on the intelligence packs onboarded in the portal, and the configured log collection policy) are uploaded. Those counters don’t necessarily have to be continuously ‘busy’ - if you see little to no activity it might be that you are not onboarded on many intelligence packs or have a very lightweight collection policy.

 

Procedure 4: Check for Errors on the Management Server or Direct Agent Event Logs

As a final step, if all of the above fails, look for errors in Event Viewer –> Application and Services –> Operations Manager event log and filter by Event Sources: Advisor, Health Service Modules, HealthService and Service Connector (this last one applies to Direct Agent only). You can copy these events and post them in the ‘Feedback’ forum so that we in the product team can help you further. Most of these events can also be found on Direct Agent, and the troubleshooting steps are similar. The only part that really differs between SCOM and Direct Agent is the registration process:

  • in SCOM you have a nice wizard with browser integration that lets you pick your workspace as a user/admin then SCOM takes care of exchanging certificates and uses those for MP download and data transfer/upload to OpInsights
  • in Direct Agent you just copy/paste the workspace id and key and those are used to authenticate / prove that it’s really you registering those agents and you own that workspace, and then certs are exchanged under the hood by the service similarly to SCOM and used the same way

Hence, many of these events apply to both types of reporting infrastructure.

Open Event Viewer –> ‘Application and Services’ –> ‘Operations Manager’ and filter by Event Sources: Advisor, Health Service Modules, HealthService and Service Connector (this last one applies to Direct Agent only).

  

A few of the ‘bad’ events you might see when things aren’t working are described in the following table:

EventID | Source | Meaning | Resolution
2138 | Health Service Modules | Proxy requires authentication | Follow step 3 and/or step 1 above.
2137 | Health Service Modules | Cannot read the authentication certificate | Re-running the Advisor registration wizard will fix certificates/RunAs accounts.
2132 | Health Service Modules | Not Authorized | Could be an issue with the certificate and/or registration to the service; try re-running the Advisor registration wizard, which will fix certificates and RunAs accounts. Additionally, verify that the proxy has been set to allow exclusions as in step 1 above, and/or verify authentication as in step 3 (and that the user indeed has access through the proxy).
2129 | Health Service Modules | Failed connection / Failed SSL negotiation | There could be some strange TCP settings on this server. Check this other blog post from the community for such a case: http://jacobbenson.com/?p=511
2127 | Health Service Modules | Failure sending data received error code | If it only happens once in a while, this could be just a glitch. Keep an eye on it to understand how often it happens. If it happens very often (every 10 minutes or so throughout the day), then it is an issue: check your network configuration and the proxy settings described above, and re-run the registration wizard. If it only happens sporadically (e.g. a couple of times per day), everything should be fine, as data will be queued and retransmitted.
    Some of the HTTP error codes have special meanings, e.g.:
    - The FIRST time that an MMA direct agent or management server tries to send data to our service, it will get a 500 error with an inner 404 error code. 404 means not found; this indicates that the storage area we’ll use for this new workspace of yours isn’t quite ready yet because it is still being provisioned. On the next retry it will be ready and the flow will start working (under normal conditions).
    - A 403 might indicate a permission/credential issue, and so forth. There is more information on the 403 below, in the Direct Agent specific section of this post.
2128 | Health Service Modules | DNS name resolution failed | Your server can’t resolve the internet address it is supposed to send data to. This might be caused by the DNS resolver settings on your machine, incorrect proxy settings, or a (temporary) issue with DNS at your provider. Like the previous event, depending on whether it happens constantly or ‘once in a while’, it could be an issue or not.
2130 | Health Service Modules | Time out | Like the previous event, depending on whether it happens constantly or ‘once in a while’, it could be an issue or not.
4511 | HealthService | Cannot load module "System.PublishDataToEndPoint" – file not found | Event text: Initialization of a module of type "System.PublishDataToEndPoint" (CLSID "{D407D659-65E4-4476-BF40-924E56841465}") failed with error code The system cannot find the file specified.
    This error indicates you have old DLLs on your machine that don’t contain the required modules. The fix is to update your Management Servers to the latest Update Rollup.
4502 | HealthService | Module crashed | If you see this for workflows with names such as CollectInstanceSpace or CollectTypeSpace, it might mean the server is having issues sending some data. Depending on how often it happens (constantly or ‘once in a while’), it could be an issue or not. If it happens more than once every hour, it is definitely an issue. If this operation only fails once or twice per day, it will be fine and able to recover. Depending on how the module actually fails (the description will have more details), this could be an on-premises issue (e.g. collecting to the DB) or an issue sending to the cloud. Verify your network and proxy settings and, worst case, try restarting the HealthService.
4501 | HealthService | Module "System.PublishDataToEndPoint" crashed | Event text: A module of type "System.PublishDataToEndPoint" reported an error 87L which was running as part of rule "Microsoft.SystemCenter.CollectAlertChangeDataToCloud" running for instance "Operations Manager Management Group" with id:"{6B1D1BE8-EBB4-B425-08DC-2385C5930B04}" in management group "SCOMTEST".
    You should NOT see this with this exact workflow, module and error anymore. It used to be a bug, *now fixed*, tracked here http://feedback.azure.com/forums/267889-azure-operational-insights/suggestions/6714689-alert-management-intelligence-pack-not-sending-ale
4002 | Service Connector | The service returned HTTP status code 403 in response to a query. Please check with the service administrator for the health of the service. The query will be retried later. | You can get a 403 during the agent’s initial registration phase; you’ll see a URL like
    https://<YourWorkspaceID>.oms.opinsights.azure.com/AgentService.svc/AgentTopologyRequest
    Error code 403 means ‘forbidden’. This is typically caused by a wrongly-copied WorkspaceId or key, or by a clock that is not synced (just like for ‘error 3000’ in SCOM at the beginning of this article).
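Several rows in the table above hinge on how often an event occurs rather than on whether it occurs at all. The sketch below condenses the table into a Python lookup and adds a simple frequency check. The names are hypothetical, the short summaries are paraphrased from the table, and the thresholds are assumptions derived from its wording: roughly hourly or more counts as persistent, a couple of times per day as sporadic, and anything in between as something to keep watching.

```python
from datetime import datetime, timedelta

# Event id -> (source, short meaning, first thing to try), condensed from the table.
EVENTS = {
    2138: ("Health Service Modules", "Proxy requires authentication", "Follow step 3 and/or step 1"),
    2137: ("Health Service Modules", "Cannot read the authentication certificate", "Re-run the Advisor registration wizard"),
    2132: ("Health Service Modules", "Not authorized", "Re-run the registration wizard and verify proxy exclusions and credentials"),
    2129: ("Health Service Modules", "Failed connection or failed SSL negotiation", "Check the TCP settings on the server"),
    2127: ("Health Service Modules", "Failure sending data", "Check how often it happens"),
    2128: ("Health Service Modules", "DNS name resolution failed", "Check DNS resolver and proxy settings"),
    2130: ("Health Service Modules", "Time out", "Check how often it happens"),
    4511: ("HealthService", "Cannot load module System.PublishDataToEndPoint", "Update to the latest Update Rollup"),
    4502: ("HealthService", "Module crashed", "Check how often it happens, then network and proxy settings"),
    4501: ("HealthService", "Module System.PublishDataToEndPoint crashed", "Known bug, now fixed"),
    4002: ("Service Connector", "HTTP 403 from the service", "Check workspace id, key and clock"),
}


def describe(event_id):
    """Return a one-line summary for an event id from the table."""
    entry = EVENTS.get(event_id)
    if entry is None:
        return f"Event {event_id}: not covered by the table"
    source, meaning, action = entry
    return f"Event {event_id} ({source}): {meaning}. {action}."


def classify_frequency(timestamps, now, persistent_min=24, sporadic_max=2):
    """Classify occurrences in the 24 hours before 'now'.

    ASSUMPTION: persistent_min and sporadic_max are illustrative thresholds.
    """
    cutoff = now - timedelta(hours=24)
    recent = [t for t in timestamps if cutoff <= t <= now]
    if len(recent) >= persistent_min:
        return "persistent"
    if len(recent) <= sporadic_max:
        return "sporadic"
    return "watch"


# Made-up example: event 2127 logged every 10 minutes during the last day.
now = datetime(2014, 12, 2, 12, 0)
every_ten_minutes = [now - timedelta(minutes=10 * i) for i in range(144)]
print(describe(2127), classify_frequency(every_ten_minutes, now))
```

The timestamps would come from the filtered Operations Manager event log; how you export them is up to you and is not shown here.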

Procedure 5: Look for your agents to send their data and have it indexed in the Portal

Check in the OpInsights Portal: from the Overview page, navigate to the small ‘Servers and Usage’ tile on the right-hand side. It shows whether management groups (and their agents) and direct agents are reporting data into search. The number of agents on the tile is derived from data; if machines don’t report for 2 weeks, they drop off the radar:

Servers and Usage

The drill downs take you to search and show the last indexed data’s timestamp for each machine. From there you can explore what data it is. Depending on the amount of data collection configured and which intelligence packs, data upload schedule and speed can vary.
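If you note down the last indexed timestamp per machine from the drill-down, the 2-week rule mentioned above is easy to apply mechanically. Here is a small Python sketch; the function name, machine names and dates are made up, and the input is assumed to be a mapping you build yourself from the search results.

```python
from datetime import datetime, timedelta


def stale_machines(last_seen, now, max_age=timedelta(weeks=2)):
    """Return the names of machines whose last data is older than max_age."""
    return sorted(name for name, ts in last_seen.items() if now - ts > max_age)


# Made-up example: last indexed timestamps per machine.
now = datetime(2014, 12, 20)
last_seen = {
    "web01": datetime(2014, 12, 19),
    "sql01": datetime(2014, 12, 1),
    "dc01": datetime(2014, 12, 6),
}
print(stale_machines(last_seen, now))
```

Machines returned by this check are the ones you would expect to be missing from the tile count, so they are good candidates for the event log checks in Procedure 4.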

This page also features metering information (this does not use the search index but the billing system, it’s refreshed every couple of hours) about the amounts of data sent to the service broken down by Intelligence Pack.

In addition to the above, the Advisor engineering team is committed to resolving all your onboarding issues so please contact us if you run into any issues. We are here to help.

OTHER KNOWN ISSUES AND WORKAROUNDS (SCOM)

'Search' button in the 'Add a Computer/Group' dialogue is missing

We have had a couple of customers report that the Search button in the Computer Search dialog is invisible. We are investigating why this happens. A temporary workaround is to click in the ‘Filter by(optional)’ edit box and press TAB to get to the invisible search button, and then activate it with <Spacebar> or <Enter>.

 

IIS LOG COLLECTION

There is another post here with specific information on how to best configure IIS logging for use with OpInsights and some other known issues http://blogs.technet.com/b/momteam/archive/2014/09/19/iis-log-format-requirements-in-system-center-advisor.aspx 

Some of the information there also applies to Direct Agent, but it was mostly written for SCOM. There is more information about IIS with Direct Agent further below in this post.

 

SQL ASSESSMENT

SQL Assessment requires .NET 4 to run on each agent. It supports the Standard, Developer and Enterprise editions of SQL Server, in all currently supported versions.

 

MALWARE ASSESSMENT

Windows 7 and Windows Server 2008 R2 have the issues described/tracked here http://feedback.azure.com/forums/267889-azure-operational-insights/suggestions/6519211-windows-server-2008-r2-sp1-servers-are-shown-as-n

See what Anti-Malware products are enabled by following this thread http://feedback.azure.com/forums/267889-azure-operational-insights/suggestions/6519202-support-other-antivirus-products-in-malware-assess

 

DIRECT AGENT SPECIFIC INFORMATION

MOST of the errors in the table above in ‘Procedure 4’ about ‘Management Servers’ also apply to Direct Agent. In Direct Agent, each agent is responsible for talking to OpInsights on its own, while in Operations Manager it is the Management Server that sends data on behalf of the agents reporting to it, acting as a gateway.

On Direct Agent the most common issue we have seen so far is error code 403, which means ‘forbidden’. This is typically caused by a wrongly-copied workspace id or key.
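Copy/paste mistakes in the workspace id or key are usually stray whitespace or a truncated value, which a quick check can catch before you install the agent. The following Python sketch is illustrative only: the function names are hypothetical, and treating the workspace id as a GUID is an assumption that this article does not state. The URL pattern is the registration URL quoted in the event table.

```python
import uuid


def credential_problems(workspace_id, key):
    """Return a list of likely copy/paste problems (empty if none found)."""
    problems = []
    for label, value in (("workspace id", workspace_id), ("key", key)):
        if value != value.strip():
            problems.append(f"{label} has leading or trailing whitespace")
        if any(ch.isspace() for ch in value.strip()):
            problems.append(f"{label} contains embedded whitespace")
    try:
        # ASSUMPTION: workspace ids are GUIDs.
        uuid.UUID(workspace_id.strip())
    except ValueError:
        problems.append("workspace id does not look like a GUID")
    return problems


def registration_url(workspace_id):
    """Build the registration URL that appears in 403 events for this workspace."""
    return (
        f"https://{workspace_id.strip()}.oms.opinsights.azure.com"
        "/AgentService.svc/AgentTopologyRequest"
    )


# Made-up values: the id was pasted with a trailing newline.
pasted_id = "00000000-0000-0000-0000-000000000000\n"
print(credential_problems(pasted_id, "not-a-real-key=="))
print(registration_url(pasted_id))
```

If the check comes back clean and the agent still gets a 403, the clock is the next thing to verify, as described for ‘error 3000’ at the beginning of this article.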

 

Other things that we are currently tracking for Direct Agent:

Capacity Management Intelligence Pack does NOT work with Direct Agent, only with Operations Manager. In fact, it even needs Operations Manager to be integrated with Virtual Machine Manager. We are tracking ideas to generalize it, starting here http://feedback.azure.com/forums/267889-azure-operational-insights/suggestions/6662146-open-up-the-capacity-management-pack-for-other-sys

Alert Management Intelligence Pack does NOT work with Direct Agent; it depends on and requires Operations Manager, whose alerts it synchronizes to the cloud.

Malware Assessment works, other than for the same symptom noted above for 2008R2/Win7.

Update Assessment, Change Tracking and the Log Management Intelligence Packs for collecting Windows Events and IIS Logs already work for both SCOM and Direct Agent.

 

If you need documentation on how to install the agent (including in a scripted/unattended way), check the documentation here http://msdn.microsoft.com/en-us/library/azure/dn884659.aspx

If you need it, Direct Agent supports going through a proxy. There is a PowerShell script in the official documentation here http://msdn.microsoft.com/library/azure/dn884643.aspx that you can use to configure which proxy and credentials to use on the agent (it’s an application-specific setting; no process other than MMA’s needs to know how to reach the internet).

You can also install the agent in a different way (e.g. sysprep it in an OS image uploaded to Azure) and then configure it with the COM API; an example PowerShell script is documented here http://msdn.microsoft.com/en-us/library/azure/dn873959.aspx

If your VM is in Azure, you can one-click install/enable the agent from the Azure portal http://azure.microsoft.com/en-us/updates/easily-enable-operational-insights-for-azure-virtual-machines/

We currently only have a 64-bit version of the agent; 32-bit support is tracked here http://feedback.azure.com/forums/267889-azure-operational-insights/suggestions/6744349-support-for-windows-2003-and-2008-servers-32-bit

 

WINDOWS AZURE DIAGNOSTICS INFORMATION

Log management through the Azure Portal integration can also ingest Windows events from Windows Azure Diagnostics (WAD) Storage. This works for Cloud Services roles and IaaS VMs configured to write to WAD.

Collecting IIS Logs from WAD works for Cloud Services and for IaaS VMs, but not currently for Azure Web Sites – this is tracked here http://feedback.azure.com/forums/267889-azure-operational-insights/suggestions/6519351-collect-iis-logs-from-windows-azure-diagnostics-st

Check out and vote on the other ideas about what to collect in this category on our forum http://feedback.azure.com/forums/267889-azure-operational-insights/category/88086-log-management-and-log-collection-policy

Here is a good paper on how to configure your Azure roles and VMs to write to Windows Azure Diagnostics storage in the first place http://download.microsoft.com/download/B/6/C/B6C0A98B-D34A-417C-826E-3EA28CDFC9DD/AzureSecurityandAuditLogManagement_11132014.pdf

 


Satya, Daniele and other folks on the OpInsights team maintain and update this post regularly with new information and learning; check it regularly!


Leave a Comment
  • when I login to https://preview.systemcenteradvisor.com (from my workstation - no firewall/proxy issue here) with my organizational account, I get a message "The Microsoft account you used is not associated with an advisor account. Do you want to create a new Advisor account ?". I click "OK" then enter an account name (like firstname.lastname). When I click "Create", I always get an error "Account Creation Wizard did not complete... Error 0x00000000, unknown server error..."
    Same result when I try directly from SCOM console for initial registration.
    Any idea ?

  • Hi Satya,

    Thanks for the proxy information, I want to add the following:

    I missed this URL for the Management Server:
    scadvisorcontent.blob.core.windows.net:443

    For the Console I am missing these URLs:
    dc.services.visualstudio.com:443
    ajax.aspnetcdn.com:443
    az416426.vo.msecnd.net:443


    Another tip, when setting the WinHTTP and you also are monitoring Linux machines, add the domain name in the bypass-list (http://technet.microsoft.com/nl-nl/library/cc731131(v=ws.10).aspx)

  • @Eric - thanks. Keep in mind we are working on unifying some of these settings and might not require NETSH WINHTTP (or system-wide proxy) in the future. Please check on our feedback site, there are more details and conversations going on about this.

  • @Sylvain - please follow the instructions at https://preview.systemcenteradvisor.com/instructions?LandingPP5 where we direct you to create an account from the Operations Manager console, not from the web portal. We are however aware of the "0x00000000 unknown server error" that was happening yesterday, but a fix has already been pushed out, so it should work again now.

  • @Daniele.. Thanks for your feedback. it works fine now.

  • Hey guys i'm getting event id 1108 from the health service

    An Account specified in the Run As Profile "Microsoft.SystemCenter.Advisor.RunAsProfile.Certificate" cannot be resolved. Specifically, the account is used in the Secure Reference Override "SecureOverrideeca4fb94_be7a_6139_fe65_7bf04b571e2d".

    I'm also not getting any system update data showing up in SCA

  • I'm actually seeing the 1108 on multiple servers as well as a LOT of 21405 events

    Command executed: "C:\Windows\system32\cscript.exe" /nologo "DiscoverHealthServiceCommunicationRelationships.js" 0 {3237253B-2A1C-38E7-8E52-588635224D35} {8F8BBA33-503D-F0AF-1A98-E5D935EF702D} server.fqdn.here "AdvisorMonitorV2"
    Working Directory: C:\Program Files\System Center Operations Manager\Agent\Health Service State\Monitoring Host Temporary Files 760\5150\

  • @Jeffrey, the DiscoverHealthServiceCommunicationRelationships issue is known and harmless - is documented in some old Advisor Release notes here http://onlinehelp.microsoft.com/en-us/advisor/ff976541.aspx

    For the other issue, problems with the RunAs account and certificate can typically be solved by just re-running the registration wizard. but if you kept connecting and disconnecting and reconnecting your management groups to different advisor accounts a few times, it is possible that your RunAs accounts are a bit in a mixed state...