Kevin Holman's System Center Blog

Posts in this blog are provided "AS IS" with no warranties, and confer no rights. Use of included script samples is subject to the terms specified in the Terms of Use.

Error alerts from the DNS MP – script failures, WMI Probe failed?



Updated 11/16/09: I think this is pretty much resolved now. Read below:

 

I have seen this at several customer sites, and even in my own lab.  You might find the following alerts (below) stemming from the DNS MP.

To start, I would recommend the resolutions in my previous post: Getting lots of Script Failed To Run alerts - WMI Probe Failed Execution - Backward Compatibility

Everything in that post helps; however, it does not resolve all of the alerts 100% of the time. After about two weeks of uptime on a Windows 2003 DC/DNS server, the problem can reoccur with WMI failures and script errors. Restarting the computer, or restarting WMI, will immediately resolve it.

This appears to be an issue with the Windows DNS WMI provider that causes this Generic Failure when trying to access and query the WMI-based DNS namespace. There appears to be a TLS (thread local storage) slot leak every time the DNS WMI provider unloads, and the provider appears to unload after 5 minutes of not being accessed. Those who patch (and therefore reboot) their computers monthly likely won't even see this issue, or will only see it for a short time until the next patch cycle.

To resolve it, I have written a monitor (example and sample MP below) that queries the DNS WMI namespace every 4 minutes, which keeps the provider from unloading. The DNS provider therefore stays loaded, never has to unload, and never leaks a TLS slot. This has also been shown to resolve some other script and latency issues caused by the DNS WMI provider having to load back up after an unload.
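The reasoning behind the 4-minute interval can be checked with a small simulation. This is a minimal sketch under stated assumptions, not a model of the real provider internals: it assumes the provider unloads whenever the gap between any two consecutive queries exceeds 5 minutes (300 seconds), and that each unload leaks one TLS slot, as described above. The 900-second "MP only" schedule is a hypothetical stand-in for a single MP workflow polling every 15 minutes; `query_schedule` and `count_unloads` are illustrative names, not part of SCOM.

```python
from typing import List, Sequence

# Assumption (from the post): the DNS WMI provider unloads after ~5 minutes idle.
IDLE_TIMEOUT_S = 300


def query_schedule(interval_s: int, duration_s: int) -> List[int]:
    """Timestamps (in seconds) of queries fired every interval_s, from 0 to duration_s."""
    return list(range(0, duration_s + 1, interval_s))


def count_unloads(query_times: Sequence[int], idle_timeout_s: int = IDLE_TIMEOUT_S) -> int:
    """Count provider unloads in a simplified model.

    The provider is assumed to unload whenever the gap between two consecutive
    queries (from any source) is longer than idle_timeout_s. In the leak
    described in the post, each unload would cost one TLS slot.
    """
    times = sorted(query_times)
    return sum(1 for earlier, later in zip(times, times[1:]) if later - earlier > idle_timeout_s)


if __name__ == "__main__":
    day = 24 * 60 * 60
    mp_only = query_schedule(900, day)     # hypothetical: one MP workflow every 15 minutes
    keep_alive = query_schedule(240, day)  # the 4-minute keep-alive monitor
    print("MP only:", count_unloads(mp_only))
    print("MP + keep-alive:", count_unloads(mp_only + keep_alive))
```

Under these assumptions the MP-only schedule produces 96 unloads per day, while merging in the 240-second keep-alive produces none, because no gap between queries ever exceeds 240 seconds.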

 

 

These are the events/alerts you may see that define the error condition:

 

WMI Probe Module Failed Execution
Log Name:  Operations Manager
Source:  Health Service Modules
Event Number:  10409
Description:
Object enumeration failed
Query: 'Select EventLogLevel from MicrosoftDNS_Server'
HRESULT: 0x80041001
Details: Generic failure
One or more workflows were affected by this.
Workflow name: Microsoft.Windows.DNSServer.2003.Monitor.ServerLoggingLevel
Instance name: dc01.opsmgr.net
Instance ID: {11056C4C-B933-98ED-3DC5-4B9AAE232B23}
Management group: PROD1

 

WMI Probe Module Failed Execution
Log Name:  Operations Manager
Source:  Health Service Modules
Event Number:  10409
Description:
Object enumeration failed
Query: 'Select Name, Shutdown, Paused from MicrosoftDNS_Zone'
HRESULT: 0x80041001
Details: Generic failure
One or more workflows were affected by this.
Workflow name: Microsoft.Windows.DNSServer.2003.Monitor.ZoneRunning
Instance name: test.opsmgr.net (dc01.opsmgr.net)
Instance ID: {E0A3BD98-04B7-0C44-B26D-F8E6175456D1}
Management group: PROD1

 

Script or Executable Failed to run
Log Name: Operations Manager
Source: Health Service Modules
Event Number: 21406
Description:
The process started at 6:26:59 AM failed to create System.Discovery.Data. Errors found in output:
C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 10\8675\DNS2003ComponentDiscovery.vbs(123, 9) SWbemServicesEx: Generic failure
Command executed: "C:\WINDOWS\system32\cscript.exe" /nologo "DNS2003ComponentDiscovery.vbs" {C984657D-0255-F11B-2C76-1542793A684D} {11056C4C-B933-98ED-3DC5-4B9AAE232B23} dc01.opsmgr.net true true true "" false 700 1
Working Directory: C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 10\8675\
One or more workflows were affected by this.
Workflow name: Microsoft.Windows.DNSServer.2003.Discovery.Components
Instance name: dc01.opsmgr.net
Instance ID: {11056C4C-B933-98ED-3DC5-4B9AAE232B23}
Management group: PROD1 

 

Script or Executable Failed to run
Log Name: Operations Manager
Source: Health Service Modules
Event Number: 21405
Description:
The process started at 3:58:21 AM failed to create System.Discovery.Data, no errors detected in the output. The process exited with 0
Command executed: "C:\WINDOWS\system32\cscript.exe" /nologo "DNS2003Discovery.vbs" {C8655A28-E27E-C6ED-B158-8569219A71A6} {89AC2E61-9144-4B94-9028-5A25F547213E} dc01.opsmgr.net false
Working Directory: C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 10\8515\
One or more workflows were affected by this.
Workflow name: Microsoft.Windows.DNSServer.2003.ServerDiscovery
Instance name: dc01.opsmgr.net
Instance ID: {89AC2E61-9144-4B94-9028-5A25F547213E}
Management group: PROD1

 

Script or Executable Failed to run

Event Type:              Error
Event Source:           Health Service Script
Event Category:        None
Event ID:                  1152
Date:                        5/19/2009
Time:                        11:18:48 AM
User:                        N/A
Computer:                DC01
Description:
DNS2003Discovery.vbs : The Query 'select * from MicrosoftDNS_Server' did not return any valid instances. 
Please check to see if this is a valid WMI Query.. Generic failure

 

 

So, at this point you have updated CScript to 5.7 (KB955360) and applied the KB933061 hotfix to stabilize WMI. However, after a period of time, these errors start happening again?

Since the issue is caused by the Windows DNS WMI provider unloading, we need to keep it loaded. Since I believe it unloads after 5 minutes of inactivity, we need to make sure we query WMI at least every 4 minutes. The simplest, cheapest, and easiest way I know to do that is to create a simple performance monitor that queries the DNS WMI namespace for a value every 4 minutes. I have a complete write-up on how to create this monitor at THIS LINK.

 

I will start by creating a new management pack: “Custom – DNS Addendum MP”.

Next, I will create a new monitor: Unit Monitor > WMI Performance Counters > Static Thresholds > Single Threshold > Simple Threshold.

[Screenshot: selecting the monitor type]

Give the monitor a name.  I used “Custom - DNS Monitor Query to keep namespace loaded”

For the monitor target – since this is a problem only on Windows Server 2003, I chose “DNS 2003 Server”.  We do not need to do this on Server 2008.

For the parent monitor, I chose Performance:

[Screenshot: selecting the Performance parent monitor]

Next, we need to fill in the namespace, query, and frequency. I input “root\MicrosoftDNS” for the namespace and “Select EventLogLevel from MicrosoftDNS_Server” for the query. Since I want it to run every 4 minutes, the frequency is 240 seconds:

[Screenshot: WMI namespace, query, and 240-second frequency]

The performance mapper section is the most confusing part; I explain it in more depth at THIS LINK. For now, just follow the graphic below:

[Screenshot: performance mapper settings]

Next, on the Threshold page: since this monitor is not really supposed to do anything other than query WMI on a schedule, we don't want it to alert. The query we are running for this example returns an integer from 0-10, so I set the threshold to 99, a number it can never return, so the monitor will never change state.
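Why a threshold of 99 keeps the monitor quiet can be shown with a short check. This is a sketch under assumptions, not SCOM's actual comparison logic: it assumes a value strictly greater than the threshold maps to an "over" state and everything else to "under", and it uses the 0-10 range stated above. If only one state is reachable, the monitor can never change state, so it can never alert. `reachable_states` is a hypothetical helper name.

```python
from typing import Iterable, Set


def reachable_states(possible_values: Iterable[float], threshold: float) -> Set[str]:
    """Which states a simple single-threshold monitor could ever enter.

    Assumption: values strictly greater than the threshold map to "over",
    everything else maps to "under".
    """
    return {"over" if value > threshold else "under" for value in possible_values}


if __name__ == "__main__":
    event_log_levels = range(0, 11)  # the 0-10 range stated in the post
    print(reachable_states(event_log_levels, 99))         # threshold outside the range
    print(sorted(reachable_states(event_log_levels, 5)))  # threshold inside the range
```

With the threshold at 99 only the "under" state is reachable; a threshold inside the range (such as 5) would let the monitor flip between states.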

Next, on the Alert Settings, do NOT generate alerts for this monitor.

Click Create.  That is it. 

For those who want to test this, I am attaching my sample management pack with only this monitor in it. To use my MP, you will need SCOM R2; otherwise, you can create your own monitor as above.

Attachment: Custom.DNS.Addendum.MP.zip
Comments
  • Hello Kevin.

    Many times the solutions you describe work perfectly. However, on many occasions I find that for the DNS MP, updating WMI and Windows Scripting Host is not sufficient. The DNS class in WMI has to be recompiled as well.

    Then all errors are gone and everything runs like clockwork again.

    So on DNS servers I start three actions:

    - Updating WMI

    - Updating Windows Scripting Host

    - Recompiling DNS class in WMI

  • Here is the only problem I have with recompiling the MOF: in my testing, recompiling the MOF is only necessary when the DNS WMI namespace is missing or corrupt. If, after bouncing the WMI service, you still cannot manually query any of the WMI objects, or cannot even connect to the WMI namespace, then I agree: recompile the MOF.

    However, in my testing I did all three: updated WMI, recompiled the MOF, and then updated CScript. After one month, the issue returned. It seems to take a long period of uptime for this random error condition to present itself. That is why I am adding the additional WMI buffer space now, which supposedly will address the issue for most people. I will let you know in a month or two.  :-)

  • Oops. So even with recompiling the MOF the issue returns...

    Good to know about adding additional WMI buffer space. It would be great if that also solves this problem in the long term.

    Thanks again for sharing such good information with the community.

  • Hi Kevin,

    I'm afraid it does not help.

    We recompiled the DNS MOF, installed KB933061, increased the buffer space, and finally rebooted the systems.

    It was a relief for a while, but the errors eventually reappeared.

    I'm afraid there is something wrong with either WMI or the DNS MOF, or both?

  • Yep... totally agree.  I am seeing the same now.

    The current theory is that something is wrong with the WMI provider for DNS; this isn't related to SCOM. The issue is that the DNS WMI provider leaks a TLS slot, and when the slots are exhausted (which takes about 2-3 weeks for me), the problem occurs. When this happens, you can bounce WMI or reboot the computer, and the problem goes away for another 2-3 weeks.

    The provider unloads after 5 minutes of inactivity. Suppose you ran a timed script that does something VERY lightweight, like running a simple WMI query against the DNS WMI namespace and nothing else, every 2 minutes. This would keep the provider from loading and unloading as caused by the SCOM MP, and the TLS slots would not leak because the DNS provider would not be unloading. I was also thinking of writing a threshold monitor against a WMI perf object that won't change state or alert; this might keep the provider loaded with even less impact.
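The timed-script idea from the comment above could look roughly like the sketch below. It is illustrative only and not the approach the post ultimately uses (the post uses a monitor instead). `keep_provider_loaded` is a hypothetical helper; the 300-second idle timeout is the assumption stated in the post; and the query function is injected so the loop itself does not depend on any WMI library. The commented wiring at the end assumes the third-party Python `wmi` package (which requires pywin32) on the DNS server, and is untested here.

```python
import time
from typing import Callable, Optional

IDLE_TIMEOUT_S = 300  # assumption from the post: the provider unloads after ~5 minutes idle


def keep_provider_loaded(
    run_query: Callable[[], object],
    interval_s: float = 120.0,
    iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run a lightweight query every interval_s seconds and return the failure count.

    iterations=None loops forever; sleep is injectable so the loop can be tested.
    """
    # The interval must be shorter than the assumed idle timeout, or the
    # provider could still unload between queries.
    if not 0 < interval_s < IDLE_TIMEOUT_S:
        raise ValueError("interval must be positive and shorter than the idle timeout")
    failures = 0
    done = 0
    while iterations is None or done < iterations:
        try:
            run_query()
        except Exception:  # a failed query (e.g. Generic failure) should not stop the loop
            failures += 1
        done += 1
        if iterations is None or done < iterations:
            sleep(interval_s)
    return failures


# Hypothetical wiring with the third-party `wmi` package (untested assumption):
# import wmi
# conn = wmi.WMI(namespace=r"root\MicrosoftDNS")
# keep_provider_loaded(lambda: conn.query("Select EventLogLevel from MicrosoftDNS_Server"))
```

Counting failures rather than stopping on the first one matches the symptom described in the post: once the provider starts returning Generic failure, a restart of WMI is what fixes it, not the script.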

    That is a theory, I have not had time to test and validate this.

  • I found an existing monitor that appears to do the same thing.  Under the class DNS 2003 server there is a configuration monitor named "DNS 2003 Event Logging Level Monitor".  I created an override to change the Interval from 900 to 240.  Hope this helps.

  • Mark - that is a really good idea, as long as that monitor queries the WMI namespace.

    The only concern I would have is the "expense" of that monitor... if it uses a lot of CPU when it runs it might have a tad more impact to the server... but overall I like it!

  • Has anyone been able to try modifying the monitor that Mark mentioned in his post (02/19/2010)? I'm stuck with the same issue here, but it only happens on our production DCs, so I can't really try this out...

    Francis

  • Hi,

    I am not sure whether this is the best place for my problem. We have a DNS memory leak on a DC (wk3/sp2/x86) that occurs about every two weeks. It started after we deployed SCOM R2 agents. I wonder whether it might be related to this topic.

    Thanks,

  • Hi, everyone!

    I have the same issue: monitoring DNS in SCOM always fails! The query 'Select Name, Shutdown, Paused from MicrosoftDNS_Zone' always returns a 0x80041001 error. I have tried to apply various updates and hotfixes, and created a custom MP for DNS according to Kevin's post; nothing helps!

    On www.activexperts.com/.../server I found that the WMI query to the DNS server in SCOM has a bug; the query should be 'Select Name, Paused, Shutdown from MicrosoftDNS_Zone'. This query returns the proper result!

    Does anyone know how to change this query in SCOM?

  • If you are getting a generic failure - the problem is the leak in WMI.  If you bounce the server - does the problem go away for a little while?

    Those two queries are identical; changing the order doesn't matter, and both are valid.

  • Well,

    You are right, Kevin. Whether the query 'Select Name, Shutdown, Paused from MicrosoftDNS_Zone' fails depends on the position of the stars...

    After rebooting the server, the problem goes away for a couple of weeks, but after that it happens again. All recommended updates to WMI and the OS are installed. Does anyone have an issue like this on Windows 2008 DNS servers?

    P.S. I will try to ask MS techsupport for this problem.

  • Alex - are you getting this on 2003 servers or 2008 or 2008R2?

    There is no TLS leak on the 2008 DNS WMI provider.... this specific issue should impact 2003 servers only.... unless you are hitting something else.

    Are your WMIPRVSE processes using a lot of private bytes (look in Task Manager)?

    Are you sure you set up a rule or used my MP to query the WMI provider every 4 minutes to keep it from unloading?  That is the fix.... for 2003 servers at least.

  • Kevin,

    All my DNS servers are Windows 2003 R2, and I get this error from all of these servers. I have 3-6 WMIPRVSE processes using 5-20 MB of memory each. I have upgraded SCOM to R2 and will try your MP on this version too; on SCOM 2007 it had no effect.

  • Alex - this is still Server 2003, so it has the leak in WMI. You MUST use something like my MP to keep the provider from unloading, or you will be affected. This is a textbook example. You can hotfix, patch, and tweak to your heart's content; you will not solve the root cause. The root cause is that when the DNS WMI provider unloads after a period of inactivity, it leaks a TLS slot. If you use my example MP, it will query the provider often enough to keep it from unloading, and you will work around the issue in the Windows WMI provider.
