<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.technet.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Kevin Holman's OpsMgr Blog : agents</title><link>http://blogs.technet.com/kevinholman/archive/tags/agents/default.aspx</link><description>Tags: agents</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>29106 event on RMS – Index was out of range.  Wait.  What?</title><link>http://blogs.technet.com/kevinholman/archive/2009/11/10/29106-event-on-rms-index-was-out-of-range-wait-what.aspx</link><pubDate>Tue, 10 Nov 2009 20:01:36 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3292922</guid><dc:creator>kevinhol</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.technet.com/kevinholman/comments/3292922.aspx</comments><wfw:commentRss>http://blogs.technet.com/kevinholman/commentrss.aspx?PostID=3292922</wfw:commentRss><wfw:comment>http://blogs.technet.com/kevinholman/rsscomments.aspx?PostID=3292922</wfw:comment><description>&lt;p&gt;Was working with a customer on this one – figured it might help others.&lt;/p&gt;  &lt;p&gt;Saw a lot of these VERY SPECIFIC 29106 events on the RMS, specifically with the text:&amp;#160; &lt;/p&gt; &lt;strong&gt;&lt;em&gt;System.ArgumentOutOfRangeException: Index was out of range. 
Must be non-negative and less than the size of the collection.&lt;/em&gt;&lt;/strong&gt;  &lt;br /&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Here is the full event:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;Event Type:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Warning       &lt;br /&gt;Event Source:&amp;#160;&amp;#160;&amp;#160; OpsMgr Config Service        &lt;br /&gt;Event Category:&amp;#160; None        &lt;br /&gt;Event ID:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 29106        &lt;br /&gt;Date:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 11/10/2009        &lt;br /&gt;Time:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 12:43:24 PM        &lt;br /&gt;User:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; N/A        &lt;br /&gt;Computer:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; AGENTNAME        &lt;br /&gt;Description:        &lt;br /&gt;The request to synchronize state for OpsMgr Health Service identified by &amp;quot;3688d65d-a16c-2be6-7e84-5faf8a9cffe0&amp;quot; failed due to the following exception &amp;quot;System.ArgumentOutOfRangeException: Index was out of range. Must be non-negative and less than the size of the collection.        
&lt;br /&gt;Parameter name: index&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;What we found was – that we could look up these health service ID’s – by pasting them in the following SQL query:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;select * from MTV_HealthService       &lt;br /&gt;where BaseManagedEntityId = '3688d65d-a16c-2be6-7e84-5faf8a9cffe0'&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;This would give us the name of the agent.&lt;/p&gt;  &lt;p&gt;In the console, under Agent Managed – we found all of these agents were in “Unmonitored” state – on the agents themselves, they were stuck.&amp;#160; They looked like they got installed, but could not get config.&amp;#160; We deleted them from agent managed, waited a few minutes, and let them show back up in Pending Management.&amp;#160; Approved them – then they were able to come back in and work properly.&amp;#160; These looked for the most part like orphaned machines, and several were computers that were renamed, or old DC’s that were demoted.&lt;/p&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=3292922" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/kevinholman/archive/tags/agents/default.aspx">agents</category></item><item><title>What is config churn?</title><link>http://blogs.technet.com/kevinholman/archive/2009/10/05/what-is-config-churn.aspx</link><pubDate>Mon, 05 Oct 2009 08:29:49 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3284780</guid><dc:creator>kevinhol</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.technet.com/kevinholman/comments/3284780.aspx</comments><wfw:commentRss>http://blogs.technet.com/kevinholman/commentrss.aspx?PostID=3284780</wfw:commentRss><wfw:comment>http://blogs.technet.com/kevinholman/rsscomments.aspx?PostID=3284780</wfw:comment><description>&lt;p&gt;There have been a couple good articles briefly covering this topic…. 
you might have read them.&amp;#160; I will reference some below.&amp;#160; Config churn is basically, when your RMS is in an almost never-ending loop of generating config.&amp;#160; This can be caused by “less than optimized” management packs, pushing agents all the time, or injecting major changes into a management group, such as overrides or custom rules and monitors, or importing updated management packs.&amp;#160; By examining this topic in depth – we will re-state some already known best practices with maintaining a healthy management group, and get some deeper knowledge as to why they are best practices in the first place.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Any time you push agents, or create rules and monitors, or overrides for widespread classes….. you can create a config update on the RMS that must be sent down to ALL agents in the management group.&amp;#160; For small management groups (under 500 agents) this is generally not a big deal and processes rather quickly.&amp;#160; For large management groups over 1000 agents, this can cause high resource utilization of the RMS and SQL Database, in terms of CPU, Memory, and Disk I/O.&amp;#160; This can impact data insertion, and console performance during these times.&amp;#160; For these reasons, we like to keep those activities down to a minimum during working hours, and schedule these major changes in an off-hours maintenance window.&lt;/p&gt;  &lt;p&gt;What about “less than optimized” management packs?&amp;#160; What does that mean?&amp;#160; Well, this means management packs that you might be using, that have poorly written discoveries.&lt;/p&gt;  &lt;p&gt;We have long known that a worst practice in Management Pack development, is to have a discovery that discovers instances of a class, that has properties for those instances that are likely to change frequently.&amp;#160; Here is a write-up from OpsManJam on the topic:&amp;#160; &lt;a 
href="http://www.opsmanjam.com/Lists/General%20Discussion/DispForm.aspx?ID=2&amp;amp;RootFolder=/Lists/General%20Discussion/WORST%20PRACTICE%20Class%20properties%20that%20get%20updated%20frequently"&gt;LINK&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Ok… wait… &lt;em&gt;Whaaaaat&lt;/em&gt;?&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Let me put that in English:&lt;/p&gt;  &lt;p&gt;Say we have a discovery for a Logical Disk.&amp;#160; This will discover any logical disk, like C:, D:, E:, Q:, etc….&amp;#160; When we write the discovery for a logical disk, we can add &lt;strong&gt;&lt;em&gt;properties&lt;/em&gt;&lt;/strong&gt; to that discovery.&amp;#160; These are attributes of the discovered instances.&amp;#160; So – in this case – let's say we decided to add “Size” of the disk as a property, and “Free Space” as a property.&amp;#160; And for the discovery frequency – we will run this discovery every hour, looking for new disks.&lt;/p&gt;  &lt;p&gt;“Size” is an excellent property for the Logical Disk class.&amp;#160; We like to know the size of the disks…. 
we can use this property to group them if needed.&amp;#160; “Size” of a logical disk is not something that we would expect to change very often.&lt;/p&gt;  &lt;p&gt;“Free Space” is a horrible property for the Logical Disk class.&amp;#160; Free space is something that will likely change, even if only by a small amount, between runs of the discovery.&amp;#160; Free space is a property that is likely to change frequently, therefore – it should NOT be used in a discovery.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;Make sense?&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Ok – so… what's the big deal?&lt;/p&gt;  &lt;p&gt;Well, the agent will run almost all discoveries that it knows about when the health service starts up (like when you bounce the service, or after a reboot).&amp;#160; It will always send this discovery data to the management server.&amp;#160; (This is another reason why &lt;a href="http://blogs.technet.com/kevinholman/archive/2009/03/26/are-your-agents-restarting-every-10-minutes-are-you-sure.aspx"&gt;agents restarting all the time&lt;/a&gt; is very bad.)&amp;#160; Then, it will run them based on the “Interval” frequency specified on the discovery.&amp;#160; Sometimes this is as frequent as once per hour, sometimes as long as once per day.&amp;#160; When the discovery runs, the agent will inspect the discovery data that it gets, and compare it to the last discovery data it sent to the management server.&amp;#160; If nothing changed – the agent drops the discovery data and does nothing.&amp;#160; IF anything changed in the values of the discovery data – it will re-submit the new data to the management server, which will submit this data to the database.&amp;#160; The RMS will detect the change, and will have to recalculate (regenerate) configuration.&amp;#160; You will see this on the RMS as a 21025 event:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;Log 
Name:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Operations Manager        &lt;br /&gt;Source:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; OpsMgr Connector         &lt;br /&gt;Date:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 9/27/2009 11:51:49 PM         &lt;br /&gt;Event ID:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 21025         &lt;br /&gt;Task Category: None         &lt;br /&gt;Level:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Information         &lt;br /&gt;Keywords:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Classic         &lt;br /&gt;User:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; N/A         &lt;br /&gt;Computer:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; OMRMS.opsmgr.net         &lt;br /&gt;Description:         &lt;br /&gt;OpsMgr has received new configuration for management group PROD1 from the Configuration Service.&amp;#160; The new state cookie is &amp;quot;D7 9B A4 BE 00 90 CF 13 35 B5 9B 5F 3B 14 FF 78 D6 13 9A 2D &amp;quot;&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;The 21025 event isn’t really “bad”… it simply means the config service did its job.&amp;#160; It re-generated its configuration file from the database data, and wrote it to:&amp;#160; \Program Files\System Center Operations Manager 2007\Health Service State\Connector Configuration Cache\&amp;lt;MGNAME&amp;gt;\OpsMgrConnector.Config.xml&amp;#160; The problem is – when this config file gets large (like in large agent count environments) and when the “Config Instance Space” is large (number of discovered objects in total).&amp;#160; Recalculating this config can have a significant impact on the disk where the file exists on the RMS, use lots of memory and CPU on the RMS for the config service, and use significant disk I/O on the SQL database.&lt;/p&gt;  &lt;p&gt;If the RMS is in a perpetual 
cycle of recalculating config, and sending these config updates to all agents…. the performance of the management group is impacted.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://nocentdocent.wordpress.com/bloggers/daniele-grandini/"&gt;Daniele Grandini&lt;/a&gt; of &lt;a href="http://nocentdocent.wordpress.com/"&gt;Quae Nocent Docent&lt;/a&gt; is pretty much the “godfather” of good research on the 21025 event.&amp;#160; Read his 3-part series on config churn here:&lt;/p&gt;  &lt;p&gt;&lt;a title="http://nocentdocent.wordpress.com/2009/07/09/troubleshooting-21025-events-wrap-up/" href="http://nocentdocent.wordpress.com/2009/07/09/troubleshooting-21025-events-wrap-up/"&gt;http://nocentdocent.wordpress.com/2009/07/09/troubleshooting-21025-events-wrap-up/&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;So – what can I do if I think I have too much config churn?&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;The biggest problem causing the most frequent config updates is &lt;strong&gt;&lt;em&gt;management packs with noisy discoveries&lt;/em&gt;&lt;/strong&gt;.&amp;#160; However, let's wrap up all the issues that can cause it, and what you can do:&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;New agents.&amp;#160; Discover/install/approve new agents in bulk and off-hours. &lt;/li&gt;    &lt;li&gt;Overrides.&amp;#160; Set overrides during off-hours, or create override MP’s in a lab, then synch to production management groups during scheduled off-hours windows. &lt;/li&gt;    &lt;li&gt;Custom rules and monitors.&amp;#160; Create these during off-hours, or create using the authoring console, test in a lab, then import to production during off-hours. 
&lt;/li&gt;    &lt;li&gt;Newly discovered instances.&amp;#160; For instance – someone adds a new disk, or SQL database, or DNS zone, to an existing agent.&amp;#160; Not much we can do about this, except to set the expectation that this is done during off-hours. &lt;/li&gt;    &lt;li&gt;Management packs with noisy discovery properties.&amp;#160; See below. &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Ok – the remainder of this article will touch on #5.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;How can I tell which discoveries are noisy?&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Daniele Grandini has put together good queries for this, at this link:&amp;#160; &lt;a title="http://nocentdocent.wordpress.com/2009/05/23/how-to-get-noisy-discovery-rules/" href="http://nocentdocent.wordpress.com/2009/05/23/how-to-get-noisy-discovery-rules/"&gt;http://nocentdocent.wordpress.com/2009/05/23/how-to-get-noisy-discovery-rules/&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;I will repost these (slightly modified) below:&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;/* Top Noisy Rules in the last 24 hours */ &lt;/p&gt;    &lt;p&gt;select ManagedEntityTypeSystemName, DiscoverySystemName, count(*) As 'Changes'      &lt;br /&gt;from       &lt;br /&gt;(select distinct       &lt;br /&gt;MP.ManagementPackSystemName,       &lt;br /&gt;MET.ManagedEntityTypeSystemName,       &lt;br /&gt;PropertySystemName,       &lt;br /&gt;D.DiscoverySystemName, D.DiscoveryDefaultName,       &lt;br /&gt;MET1.ManagedEntityTypeSystemName As 'TargetTypeSystemName', MET1.ManagedEntityTypeDefaultName 'TargetTypeDefaultName',       &lt;br /&gt;ME.Path, ME.Name,       &lt;br /&gt;C.OldValue, C.NewValue, C.ChangeDateTime       &lt;br /&gt;from dbo.vManagedEntityPropertyChange C       &lt;br /&gt;inner join dbo.vManagedEntity ME on ME.ManagedEntityRowId=C.ManagedEntityRowId       &lt;br 
/&gt;inner join dbo.vManagedEntityTypeProperty METP on METP.PropertyGuid=C.PropertyGuid       &lt;br /&gt;inner join dbo.vManagedEntityType MET on MET.ManagedEntityTypeRowId=ME.ManagedEntityTypeRowId       &lt;br /&gt;inner join dbo.vManagementPack MP on MP.ManagementPackRowId=MET.ManagementPackRowId       &lt;br /&gt;inner join dbo.vManagementPackVersion MPV on MPV.ManagementPackRowId=MP.ManagementPackRowId       &lt;br /&gt;left join dbo.vDiscoveryManagementPackVersion DMP on DMP.ManagementPackVersionRowId=MPV.ManagementPackVersionRowId       &lt;br /&gt;AND CAST(DefinitionXml.query('data(/Discovery/DiscoveryTypes/DiscoveryClass/@TypeID)') AS nvarchar(max)) like '%'+MET.ManagedEntityTypeSystemName+'%'       &lt;br /&gt;left join dbo.vManagedEntityType MET1 on MET1.ManagedEntityTypeRowId=DMP.TargetManagedEntityTypeRowId       &lt;br /&gt;left join dbo.vDiscovery D on D.DiscoveryRowId=DMP.DiscoveryRowId       &lt;br /&gt;where ChangeDateTime &amp;gt; dateadd(hh,-24,getutcdate())       &lt;br /&gt;) As #T       &lt;br /&gt;group by ManagedEntityTypeSystemName, DiscoverySystemName       &lt;br /&gt;order by count(*) DESC&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;and&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;/* Modified properties in the last 24 hours */&lt;/p&gt;    &lt;p&gt;select distinct      &lt;br /&gt;MP.ManagementPackSystemName,       &lt;br /&gt;MET.ManagedEntityTypeSystemName,       &lt;br /&gt;PropertySystemName,       &lt;br /&gt;D.DiscoverySystemName, D.DiscoveryDefaultName,       &lt;br /&gt;MET1.ManagedEntityTypeSystemName As 'TargetTypeSystemName', MET1.ManagedEntityTypeDefaultName 'TargetTypeDefaultName',       &lt;br /&gt;ME.Path, ME.Name,       &lt;br /&gt;C.OldValue, C.NewValue, C.ChangeDateTime       &lt;br /&gt;from dbo.vManagedEntityPropertyChange C       &lt;br /&gt;inner join dbo.vManagedEntity ME on ME.ManagedEntityRowId=C.ManagedEntityRowId       &lt;br /&gt;inner join dbo.vManagedEntityTypeProperty METP on METP.PropertyGuid=C.PropertyGuid      
 &lt;br /&gt;inner join dbo.vManagedEntityType MET on MET.ManagedEntityTypeRowId=ME.ManagedEntityTypeRowId       &lt;br /&gt;inner join dbo.vManagementPack MP on MP.ManagementPackRowId=MET.ManagementPackRowId       &lt;br /&gt;inner join dbo.vManagementPackVersion MPV on MPV.ManagementPackRowId=MP.ManagementPackRowId       &lt;br /&gt;left join dbo.vDiscoveryManagementPackVersion DMP on DMP.ManagementPackVersionRowId=MPV.ManagementPackVersionRowId       &lt;br /&gt;&amp;#160;&amp;#160;&amp;#160; AND CAST(DefinitionXml.query('data(/Discovery/DiscoveryTypes/DiscoveryClass/@TypeID)') AS nvarchar(max)) like '%'+MET.ManagedEntityTypeSystemName+'%'       &lt;br /&gt;left join dbo.vManagedEntityType MET1 on MET1.ManagedEntityTypeRowId=DMP.TargetManagedEntityTypeRowId       &lt;br /&gt;left join dbo.vDiscovery D on D.DiscoveryRowId=DMP.DiscoveryRowId       &lt;br /&gt;where ChangeDateTime &amp;gt; dateadd(hh,-24,getutcdate())       &lt;br /&gt;ORDER BY MP.ManagementPackSystemName, MET.ManagedEntityTypeSystemName&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;Wow – that returned a LOT of discoveries running all the time!&amp;#160; What can I do?&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Don't import too many MP’s!&amp;#160; The FIRST line of defense – is NOT to import ANY management packs into a management group that you don't absolutely need RIGHT THEN.&amp;#160; Management packs are constantly updated, and by the time you have an actual SLA in a technology area – there will likely be a newer, better MP available for it.&amp;#160; The biggest mistake many customers make is to import any available MP for a technology that they have internally.&amp;#160; They end up with a FLOOD of alerts, big fat databases, slow consoles, and lots of weird errors.&amp;#160; MP’s should be transitioned slowly, one at a time – tuning and resolving as you go. 
&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Disable the noisy discoveries.&amp;#160; Probably not a great solution, unless they discover objects that you really don't care about – but there are other objects in the MP that you DO want to monitor. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Increase the discovery interval.&amp;#160; This means… essentially – change any “bad” discoveries to run only once per day.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Add a “synch time” override to the discovery – if possible.&amp;#160; This option is not available unless the MP author of the discovery exposed it.&amp;#160; What this will do – is cause all the agents to ONLY run the discovery at a distinct and specified time every day (say…. 1AM).&amp;#160; This might cause too much discovery data to flood in at one time… but since it will all come in at the same time – it won't cause constant config churn all throughout the day.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Re-write the discovery.&amp;#160; If this is a custom MP – rewrite the discovery/MP, and remove the property that changes too often.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Make sure your hardware and software are optimized for scalability.&amp;#160; On your RMS – it is good to place your config file on fast disks, especially in large environments.&amp;#160; I have worked with very large customers who were experiencing config churn, but had zero ill effects, because their RMS disk I/O was on a 4 spindle RAID10 with 15K spindles, CPU and memory were really good, and their SQL database disk I/O for the OpsDB was excellent with plenty of breathing room.&amp;#160; I have also worked with smaller agent counts, where config churn had a serious impact…. 
mostly due to the RMS config file being placed on the same RAID spindle set as the OS and pagefile, using only 2 older 10,000 RPM disks.&amp;#160; The SQL disk I/O was also just borderline for their agent count.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Re-run the queries periodically – especially after importing/upgrading to a new management pack in your management group.&amp;#160; This “instance space change” report should be part of your testing and evaluation of a new MP when brought into your lab…. if you have a large agent count environment.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Some very common discoveries I have seen – that have properties that change very frequently – are listed below.&amp;#160; I often recommend these be overridden to run once per day (86,400 seconds).&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;table border="2" cellspacing="0" cellpadding="0" width="864"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td width="248"&gt;&lt;strong&gt;Discovery Display Name&lt;/strong&gt;&lt;/td&gt;        &lt;td width="166"&gt;&lt;strong&gt;Discovery Target Class&lt;/strong&gt;&lt;/td&gt;        &lt;td width="173"&gt;&lt;strong&gt;Discovered Type&lt;/strong&gt;&lt;/td&gt;        &lt;td width="135"&gt;&lt;strong&gt;Default interval (seconds)&lt;/strong&gt;&lt;/td&gt;        &lt;td width="138"&gt;&lt;strong&gt;Modified interval (seconds)&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="248"&gt;Discover File Groups and Files&lt;/td&gt;        &lt;td width="166"&gt;SQL 2005 DB Engine&lt;/td&gt;        &lt;td width="173"&gt;SQL 2005 DB File&lt;/td&gt;        &lt;td width="135"&gt;7260&lt;/td&gt;        &lt;td width="138"&gt;86400&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="248"&gt;Discover File Groups and Files&lt;/td&gt;        &lt;td width="166"&gt;SQL 2005 DB Engine&lt;/td&gt;        &lt;td width="173"&gt;SQL 2005 DB File Group&lt;/td&gt;   
     &lt;td width="135"&gt;7260&lt;/td&gt;        &lt;td width="138"&gt;86400&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="248"&gt;Discover SQL 2000 Databases&lt;/td&gt;        &lt;td width="166"&gt;SQL 2000 DB Engine&lt;/td&gt;        &lt;td width="173"&gt;SQL 2000 DB&lt;/td&gt;        &lt;td width="135"&gt;1800&lt;/td&gt;        &lt;td width="138"&gt;86400&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="248"&gt;Discover Databases for a Database Engine&lt;/td&gt;        &lt;td width="166"&gt;SQL 2005 DB Engine&lt;/td&gt;        &lt;td width="173"&gt;SQL 2005 DB&lt;/td&gt;        &lt;td width="135"&gt;7200&lt;/td&gt;        &lt;td width="138"&gt;86400&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="248"&gt;DNS 2003 Component Discovery&lt;/td&gt;        &lt;td width="166"&gt;DNS 2003 Server&lt;/td&gt;        &lt;td width="173"&gt;DNS 2003 Zone&lt;/td&gt;        &lt;td width="135"&gt;21600&lt;/td&gt;        &lt;td width="138"&gt;86400&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="248"&gt;DNS 2008 Component Discovery&lt;/td&gt;        &lt;td width="166"&gt;DNS 2008 Server&lt;/td&gt;        &lt;td width="173"&gt;DNS 2008 Zone&lt;/td&gt;        &lt;td width="135"&gt;21600&lt;/td&gt;        &lt;td width="138"&gt;86400&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="248"&gt;Windows Internet Information Services Base Classes Discovery Rule&lt;/td&gt;        &lt;td width="166"&gt;IIS 2003 Server Role&lt;/td&gt;        &lt;td width="173"&gt;IIS FTP Site&lt;/td&gt;        &lt;td width="135"&gt;3600&lt;/td&gt;        &lt;td width="138"&gt;86400&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="248"&gt;Windows Internet Information Services Base Classes Discovery Rule&lt;/td&gt;        &lt;td width="166"&gt;IIS 2000 Server Role&lt;/td&gt;        &lt;td width="173"&gt;IIS NNTP Virtual Server&lt;/td&gt;        &lt;td width="135"&gt;3600&lt;/td&gt;        &lt;td 
width="138"&gt;86400&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;   &lt;br /&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;The above is just a sample – you should examine the output of the queries above and see what is impacting your management group the most.&lt;/p&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=3284780" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/kevinholman/archive/tags/management+pack/default.aspx">management pack</category><category domain="http://blogs.technet.com/kevinholman/archive/tags/agents/default.aspx">agents</category><category domain="http://blogs.technet.com/kevinholman/archive/tags/Authoring/default.aspx">Authoring</category></item><item><title>Keep your management pack names SHORT in SP1!</title><link>http://blogs.technet.com/kevinholman/archive/2009/10/02/keep-your-management-pack-names-short-in-sp1.aspx</link><pubDate>Fri, 02 Oct 2009 23:31:08 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3284634</guid><dc:creator>kevinhol</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.technet.com/kevinholman/comments/3284634.aspx</comments><wfw:commentRss>http://blogs.technet.com/kevinholman/commentrss.aspx?PostID=3284634</wfw:commentRss><wfw:comment>http://blogs.technet.com/kevinholman/rsscomments.aspx?PostID=3284634</wfw:comment><description>&lt;p&gt;I have seen this twice now… so I will blog about it.&amp;#160; It seems to be rare in the wild, but it will completely cripple a management group when this occurs.&amp;#160; So beware SP1 users!&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;This article does not apply to R2.&amp;#160; This is only an issue in OpsMgr 2007 SP1.&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;When you create your custom management packs – and especially your override management packs – keep the names as simple and short as possible.&amp;#160; 
There is an issue in OpsMgr SP1 – when an agent tries to download management packs – where it will fail if the MP ID (derived from the MP Name) is too long.&amp;#160; The worst part about this problem is that there WON'T be an error logged.&amp;#160; What will happen – is that the agent will keep trying to re-download the MP in question, and it will block ALL other MP’s from being downloaded from that point forward.&lt;/p&gt;  &lt;p&gt;There is no simple way to know this condition is impacting you.&amp;#160; What will happen – is that an agent will continue to work just fine… but will not get any NEW management packs.&amp;#160; So… you might think all is well, but not so.&amp;#160; The symptoms that might lead you to notice that something is wrong:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;Performance data not collected for newer MP’s.&lt;/strong&gt;&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;Objects not being discovered for newer MP’s.&lt;/strong&gt;&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;Alerts not generating as expected from newer MP’s.&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;The root problem is an age-old Windows issue…. file paths over 255 characters are not supported well.&amp;#160; This has been resolved in R2 in how the agent copies the files over.&lt;/p&gt;  &lt;p&gt;In both cases I have seen – someone was creating an override MP for the “IBM Hardware Management Pack for IBM System x and BladeCenter x86 Blade Systems” management pack.&amp;#160; So – when they created their override MP – they named it something like:&amp;#160; “Overrides - IBM Hardware Management Pack for IBM System x and BladeCenter x86 Blade Systems”&lt;/p&gt;  &lt;p&gt;This equated to a Management Pack ID of:&amp;#160; Override.IBM.Hardware.Management.Pack.for.IBM.System.x.and.BladeCenter.x.Blade.Systems&lt;/p&gt;  &lt;p&gt;That MP ID is 86 characters!&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;What happens…. 
is when this management pack is created, and an override is placed in it…. (or custom rule)… the agents that require it will:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;Get contacted by the RMS to update their config, and then issue a config change request (&lt;em&gt;21024 in the event log&lt;/em&gt;)&lt;/strong&gt;&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;Receive new config from the RMS (&lt;em&gt;event 21025&lt;/em&gt;)&lt;/strong&gt;&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;Process new config – and realize they need a new MP, and request that MP (&lt;em&gt;event 1200&lt;/em&gt;)&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Where this process breaks…. is at the next step in the chain, where the agent should RECEIVE the MP (event 1201) and then issue a statement that the new config has become active (event 1210).&amp;#160; These never happen.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Behind the scenes…. from looking at an ETL trace log, we can see this is failing when we try to move the file from the “downloaded files” folder to the “management packs” folder:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;Error CMPFileManager::MoveManagementPackFile(MPFileManager_cpp383)&lt;font color="#0000ff"&gt;MoveFile from '\\?\C:\Program Files\System Center Operations Manager 2007\Health Service State\Downloaded Files\MGNAME\1\Override.IBM.Hardware.Management.Pack.for.IBM.System.x.and.BladeCenter.x.Blade.Systems.{26504FED-2FF4-4AC4-A63D-59BF8C09F51F}.{7136257C-1791-7BAB-7072-2FA24284C102}.xml' to 'C:\Program Files\System Center Operations Manager 2007\Health Service State\Management Packs\Override.IBM.Hardware.Management.Pack.for.IBM.System.x.and.BladeCenter.x.Blade.Systems.{26504FED-2FF4-4AC4-A63D-59BF8C09F51F}.{7136257C-1791-7BAB-7072-2FA24284C102}.xml'&lt;/font&gt; &lt;font color="#ff0000"&gt;failed with code 3(ERROR_PATH_NOT_FOUND).&lt;/font&gt;&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  
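&lt;p&gt;To make the failure in the trace concrete, here is a minimal Python sketch (not part of OpsMgr) that measures the destination path from the trace and works out how long an MP ID can get before that path exceeds the 255 character limit discussed in this article.&amp;#160; The function names are hypothetical, and it assumes the default agent install path and the same two-GUID file name suffix seen in the trace… your paths may differ.&lt;/p&gt;

```python
# Sketch only: the function names are hypothetical, not an OpsMgr API.
# Folder, MP ID, and GUID suffix are copied from the trace above.
MP_FOLDER = (r"C:\Program Files\System Center Operations Manager 2007"
             r"\Health Service State\Management Packs")
MP_ID = ("Override.IBM.Hardware.Management.Pack.for.IBM.System"
         ".x.and.BladeCenter.x.Blade.Systems")
GUID_SUFFIX = (".{26504FED-2FF4-4AC4-A63D-59BF8C09F51F}"
               ".{7136257C-1791-7BAB-7072-2FA24284C102}.xml")
PATH_LIMIT = 255  # the limit cited in this article


def mp_file_path(mp_id, folder=MP_FOLDER, suffix=GUID_SUFFIX):
    """Build the path the agent tries to move the downloaded MP file to."""
    return folder + "\\" + mp_id + suffix


def max_mp_id_length(folder=MP_FOLDER, suffix=GUID_SUFFIX, limit=PATH_LIMIT):
    """Longest MP ID that keeps the full path within the limit."""
    return limit - len(folder) - len("\\") - len(suffix)


if __name__ == "__main__":
    path = mp_file_path(MP_ID)
    print(len(MP_ID))               # 86
    print(len(path))                # 261
    print(len(path) - PATH_LIMIT)   # 6 characters over the limit
    print(max_mp_id_length())       # 80
```

&lt;p&gt;With the default install path, the folder, separator, and GUID suffix use 175 characters, which leaves 80 for the MP ID.&amp;#160; A longer agent install path shrinks that budget further.&lt;/p&gt;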
&lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;In the example above – the bad path is:&lt;/p&gt;  &lt;p&gt;C:\Program Files\System Center Operations Manager 2007\Health Service State\Management Packs\Override.IBM.Hardware.Management.Pack.for.IBM.System.x.and.BladeCenter.x.Blade.Systems.{26504FED-2FF4-4AC4-A63D-59BF8C09F51F}.{7136257C-1791-7BAB-7072-2FA24284C102}.xml&lt;/p&gt;  &lt;p&gt;Which is 261 characters.&amp;#160; The limit is 255.&amp;#160; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Therefore – I recommend you keep your Management Pack *ID* to less than 60 characters.&amp;#160;&amp;#160; You can examine your long management packs by looking in the console – generally your longest display names will be the longest ID’s:&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/KeepyourmanagementpacknamesSHORTinSP1_D9FD/image_2.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/KeepyourmanagementpacknamesSHORTinSP1_D9FD/image_thumb.png" width="311" height="193" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Even some Microsoft MP’s are dangerously close to the limit…. such as: Microsoft.SystemCenter.VirtualMachineManager.Pro.2008.VMWare.HostPerformance with 76 characters.&amp;#160; In most environments you can squeak by at 79 characters…. more or less depending on where you installed your agent path.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Here is a SQL query you can run against the OpsDB to also detect this condition…. 
and quickly check all your potentially long MPs:&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;SELECT MPName FROM managementpack       &lt;br /&gt;WHERE LEN(MPName) &amp;gt; 60&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Just change 60 to whatever character count you want.&amp;#160; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;DON'T freak out if you have some with more than 60.&amp;#160; Just be aware.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/KeepyourmanagementpacknamesSHORTinSP1_D9FD/image_4.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/KeepyourmanagementpacknamesSHORTinSP1_D9FD/image_thumb_1.png" width="96" height="101" /&gt;&lt;/a&gt;&amp;#160; &lt;/p&gt;  &lt;p&gt; DO freak out if you have some with more than 80! &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;If you had the time and were really handy – you could write a monitor that runs against the RMS, queries the OpsDB with this query, and changes the RMS to unhealthy when a name is over a threshold.&amp;#160; THAT would be cool…. and would alert you when some author makes a really long MP that has the potential to break all your agents.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Another idea I had… was to create a correlated missing event monitor…. so when you get a 1200, but do NOT get a 1201 within, say, 15 minutes….. that might be a problem.&amp;#160; Of course if you wrote this and were already impacted…. 
the bad agents would never get your new MP to tell you.&amp;#160; :-)&amp;#160; &lt;/p&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=3284634" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/kevinholman/archive/tags/management+pack/default.aspx">management pack</category><category domain="http://blogs.technet.com/kevinholman/archive/tags/agents/default.aspx">agents</category></item><item><title>Fixing troubled agents</title><link>http://blogs.technet.com/kevinholman/archive/2009/10/01/fixing-troubled-agents.aspx</link><pubDate>Fri, 02 Oct 2009 00:23:46 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3284447</guid><dc:creator>kevinhol</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.technet.com/kevinholman/comments/3284447.aspx</comments><wfw:commentRss>http://blogs.technet.com/kevinholman/commentrss.aspx?PostID=3284447</wfw:commentRss><wfw:comment>http://blogs.technet.com/kevinholman/rsscomments.aspx?PostID=3284447</wfw:comment><description>&lt;p&gt;Sometimes agents will not “talk” to the management server upon initial installation, and sometimes an agent can get unhealthy long after working fine.&amp;#160; Agent health is an ongoing part of any OpsMgr Admin’s life.&lt;/p&gt;  &lt;p&gt;This post is NOT an “end to end” manual of all the factors that influence agent health…. 
but that is something I am working on for a later time.&amp;#160; There are so many factors in an agent’s ability to communicate and work as expected.&amp;#160; A few key areas that commonly affect this are:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;DNS name resolution (Agent to MS, and MS to Agent)&lt;/li&gt;    &lt;li&gt;DNS domain membership (disjointed)&lt;/li&gt;    &lt;li&gt;DNS suffix search order&lt;/li&gt;    &lt;li&gt;Kerberos connectivity&lt;/li&gt;    &lt;li&gt;Kerberos SPNs accessible&lt;/li&gt;    &lt;li&gt;Firewalls blocking 5723&lt;/li&gt;    &lt;li&gt;Firewalls blocking access to AD for authentication&lt;/li&gt;    &lt;li&gt;Packet loss&lt;/li&gt;    &lt;li&gt;Invalid or old registry entries&lt;/li&gt;    &lt;li&gt;Missing registry entries&lt;/li&gt;    &lt;li&gt;Corrupt registry&lt;/li&gt;    &lt;li&gt;Default agent action accounts locked down/out (HSLockdown)&lt;/li&gt;    &lt;li&gt;HealthService Certificate configuration issues&lt;/li&gt;    &lt;li&gt;Hotfixes required for OS Compatibility&lt;/li&gt;    &lt;li&gt;Management Server rejecting the agent&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;How do you detect agent issues from the console?&amp;#160; The problem might be that they are not showing up in the console at all!&amp;#160; Perhaps it is a manual install that never shows up in Pending Actions?&amp;#160; Or a push deployment that stays stuck in Pending Actions and never shows up under “Agent Managed”.&amp;#160; Or even one that does show up under “Agent Managed” but never shows as being monitored… returning agent version data, etc.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;One of the BEST things you can do when faced with an agent health issue… is to look on the agent, in the OperationsManager event log.&amp;#160; This is a fairly verbose log that will almost always give you a good hint as to the trouble with the agent.&amp;#160; That is ALWAYS one of my first steps in troubleshooting.&lt;/p&gt;  
&lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Another way of examining agent health is through the built-in views in OpsMgr.&amp;#160; In the console, there is a view located at the following:&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Fixingtroubledagents_E68F/image_2.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Fixingtroubledagents_E68F/image_thumb.png" width="798" height="369" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;This view is important – because it gives us a perspective of the agent from two different points:&lt;/p&gt;  &lt;p&gt;1.&amp;#160; The perspective of the agent monitors running on the agent, measuring its own “health”.&lt;/p&gt;  &lt;p&gt;2.&amp;#160; The perspective of the “Health Service Watcher”, which is the agent being monitored from a Management Server.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;If any of these are red or yellow – that is an excellent place to start.&amp;#160; This should be an area that your level 1 support for Operations Manager checks DAILY.&amp;#160; We should never have a high number of agents that are not green here.&amp;#160; If we do – this is indicative of an unhealthy environment, or the admin team not adhering to best practices (such as keeping up with hotfixes, using maintenance mode correctly, etc.).&lt;/p&gt;  &lt;p&gt;Use Health Explorer on these views – to drill down into exactly what is causing the Agent, or Health Service Watcher state to be unhealthy.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Now…. 
the following are some general steps to take to “fix” broken agents.&amp;#160; These are not in definitive order.&amp;#160; The order of steps really comes down to what you find when looking at the logs after taking these steps.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Start the HealthService on the agent.&amp;#160; You might find the HealthService is just not running.&amp;#160; This should not be common or systemic.&amp;#160; Consider enabling the recovery for this condition to restart the HealthService on Heartbeat failure.&amp;#160; However – if this is systemic – it is indicative of something causing your HealthService to restart too frequently, or administrators stopping SCOM.&amp;#160; Look in the OpsMgr event log for verification.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Bounce the HealthService on the agent.&amp;#160; Sometimes this is all that is needed to resolve an agent issue.&amp;#160; Look in the OpsMgr event log after a HealthService restart, to make sure it is clean with no errors.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Clear the HealthService queue and config (manually).&amp;#160; This is done by stopping the HealthService.&amp;#160; Then deleting the “\Program Files\System Center Operations Manager 2007\Health Service State” folder.&amp;#160; Then start the HealthService.&amp;#160; This removes the agent config file, and the agent queue files.&amp;#160; The agent starts up with no configuration, so it will resort to the registry to determine what management server to talk to.&amp;#160; From the registry – it will find out if it is AD integrated, or a fixed management server to talk to if not.&amp;#160; This is located at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Agent Management Groups\PROD1\Parent Health Services\ location, in the \&amp;lt;#&amp;gt;\NetworkName string value.&amp;#160; The agent will contact the 
management server – request config, receive config, download the appropriate management packs, apply them, run the discoveries, send up discovery data, and repeat the cycle for a little while.&amp;#160; This is very much what happens on a new agent during initial deployment.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Clear the HealthService queue and config (from the console).&amp;#160; When looking at the above view (or any state view or discovered inventory view which targets the HealthService or Agent class) there is a task in the actions pane - “Flush Health Service State and Cache”.&amp;#160; This will perform a very similar action to that above…. as a console task.&amp;#160; This will only work on an agent that is somewhat responsive…. if it does not work you need to perform this manually as the agent is really broken from communication with the management server.&amp;#160; This task will never complete, and will not return success – because the task breaks off from itself as the queue is flushed.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;“Repair” the agent from the console.&amp;#160; This is done from the Administration pane – Agent Managed.&amp;#160; You should not run a repair on any AD-integrated agent – as this will break the AD integration and assign it to the management server that ran the repair action.&amp;#160; A “repair” technically just reinstalls the agent in a push fashion, just like an initial agent deployment.&amp;#160; It will also apply/reapply any agent related hotfixes in the management server’s \Program Files\System Center Operations Manager 2007\AgentManagement\ directories. 
&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Reinstall the agent (manually).&amp;#160; This would be for manual installs or when push/repair is not possible.&amp;#160; This section is where the combination of options gets a little tricky.&amp;#160; When you are at this point… where you have given up, I find just going all the way with a brute force reinstall is the best way.&amp;#160; This means performing the following steps:&lt;/li&gt;    &lt;ul&gt;     &lt;li&gt;Uninstall the agent via Add/Remove Programs.&lt;/li&gt;      &lt;li&gt;Run the &lt;a href="http://www.microsoft.com/downloads/details.aspx?familyid=14FF7073-C71B-4AD0-805A-A8E458D2C9E0&amp;amp;displaylang=en"&gt;Operations Manager Cleanup Tool&lt;/a&gt; CleanMom.exe or CleanMOM64.exe.&amp;#160; This is designed to make sure that the service, files, and all registry entries are removed.&lt;/li&gt;      &lt;li&gt;Ensure that the agent’s folder is removed at:&amp;#160; \Program Files\System Center Operations Manager 2007\&lt;/li&gt;      &lt;li&gt;Ensure that the following registry keys are deleted:&lt;/li&gt;      &lt;ul&gt;       &lt;li&gt;HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Operations Manager&lt;/li&gt;        &lt;li&gt;HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HealthService&lt;/li&gt;     &lt;/ul&gt;      &lt;li&gt;Reboot the agent machine (if possible)&lt;/li&gt;      &lt;li&gt;Delete the agent from Agent Managed in the OpsMgr console.&amp;#160; This will allow a new HealthService ID to be detected and is sometimes a required step to get an agent to work properly, although not always required.&lt;/li&gt;      &lt;li&gt;Now that the agent is gone cleanly from both the OpsMgr console and the agent Operating System…. 
manually reinstall the agent.&amp;#160; Keep it simple – install it using a named management server/management group, and use Local System for the agent action account (these will remove any common issues with a low priv domain account, and AD integration if used).&amp;#160; If it works correctly – you can always reinstall again using low priv or AD integration.&lt;/li&gt;      &lt;li&gt;Remember to import certificates at this point if you are using those on the individual agent.&lt;/li&gt;      &lt;li&gt;As always – look in the OperationsManager event log…. this will tell you if it connected, and is working, or if there is a connectivity issue.&lt;/li&gt;   &lt;/ul&gt; &lt;/ul&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;To summarize…. there are many things that can cause an agent issue, and many methods to troubleshoot.&amp;#160; However – at a very general level, my typical steps are:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Review OpsMgr event log on agent&lt;/li&gt;    &lt;li&gt;Bounce HealthService&lt;/li&gt;    &lt;li&gt;Bounce HealthService, clearing the \Health Service State folder&lt;/li&gt;    &lt;li&gt;Complete brute force reinstall of the agent&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;If an external issue is causing the problem (DNS, Kerberos, Firewall) then these steps likely will not help you…. 
but those should be available from the OpsMgr event log.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Also – make sure you see my other posts on agent health and troubleshooting during deployment:&lt;/p&gt;  &lt;p&gt;&lt;a title="Console based Agent Deployment Troubleshooting table" href="http://blogs.technet.com/kevinholman/archive/2009/01/27/console-based-agent-deployment-troubleshooting-table.aspx"&gt;Console based Agent Deployment Troubleshooting table&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;a title="Agent discovery and push troubleshooting in OpsMgr 2007" href="http://blogs.technet.com/kevinholman/archive/2007/12/12/agent-discovery-and-push-troubleshooting-in-opsmgr-2007.aspx"&gt;Agent discovery and push troubleshooting in OpsMgr 2007&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;a title="Getting lots of Script Failed To Run alerts- WMI Probe Failed Execution- Backward Compatibility" href="http://blogs.technet.com/kevinholman/archive/2009/06/29/getting-lots-of-script-failed-to-run-alerts-wmi-probe-failed-execution-backward-compatibility-script-error.aspx"&gt;Getting lots of Script Failed To Run alerts- WMI Probe Failed Execution- Backward Compatibility&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;a title="Agent Pending Actions can get out of synch between the Console, and the database" href="http://blogs.technet.com/kevinholman/archive/2008/09/29/agent-pending-actions-can-get-out-of-synch-between-the-console-and-the-database.aspx"&gt;Agent Pending Actions can get out of synch between the Console, and the database&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;a title="Which hotfixes should I apply-" href="http://blogs.technet.com/kevinholman/archive/2009/01/27/which-hotfixes-should-i-apply.aspx"&gt;Which hotfixes should I apply-&lt;/a&gt;&lt;/p&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=3284447" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/kevinholman/archive/tags/agents/default.aspx">agents</category><category 
domain="http://blogs.technet.com/kevinholman/archive/tags/Hotfix/default.aspx">Hotfix</category><category domain="http://blogs.technet.com/kevinholman/archive/tags/Tools/default.aspx">Tools</category></item><item><title>Getting lots of Script Failed To Run alerts? WMI Probe Failed Execution? Backward Compatibility Script Error?</title><link>http://blogs.technet.com/kevinholman/archive/2009/06/29/getting-lots-of-script-failed-to-run-alerts-wmi-probe-failed-execution-backward-compatibility-script-error.aspx</link><pubDate>Mon, 29 Jun 2009 17:24:31 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3259635</guid><dc:creator>kevinhol</dc:creator><slash:comments>6</slash:comments><comments>http://blogs.technet.com/kevinholman/comments/3259635.aspx</comments><wfw:commentRss>http://blogs.technet.com/kevinholman/commentrss.aspx?PostID=3259635</wfw:commentRss><wfw:comment>http://blogs.technet.com/kevinholman/rsscomments.aspx?PostID=3259635</wfw:comment><description>&lt;p&gt;In OpsMgr 2007, it is likely that your most common alert is not really a MP based alert from a technology management pack…. 
it could be a built-in alert that a script failed, or WMI could not be accessed.&amp;#160; This is because when WMI is broken on a machine, almost EVERYTHING fails to execute properly on that agent.&amp;#160; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;At a recent health check at a customer site, we found the top 5 alerts in his environment (by cumulative repeat count) were:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;strong&gt;WMI Probe Module Failed Execution&lt;/strong&gt; &lt;/li&gt;    &lt;li&gt;&lt;strong&gt;Service Check Data Source Module Failed Execution &lt;/strong&gt;&lt;/li&gt;    &lt;li&gt;&lt;strong&gt;Backward Compatibility Script Error &lt;/strong&gt;&lt;/li&gt;    &lt;li&gt;&lt;strong&gt;Script or Executable Failed to run &lt;/strong&gt;&lt;/li&gt;    &lt;li&gt;&lt;strong&gt;Service Check Probe Module Failed Execution&lt;/strong&gt; &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;Sometimes – these alerts are normal…. the server is busy, or someone rebooted it without putting it into maintenance mode and allowing the workflows to unload gracefully.&lt;/p&gt;  &lt;p&gt;However, if you have a high repeat count on these, it is typically indicative of something seriously broken on that agent(s).&amp;#160; Most of the time – the failure is in WMI.&amp;#160; Many customers get frustrated with these script errors, because they see them as “false alerts” because they don't know how to resolve the root cause, and we just tell you “this action broke”, we don't tell you why.&amp;#160; It is critical that you examine these alerts, however, because these alerts will indicate something seriously wrong with an agent, such as broken WMI/cscript/OS issue.&amp;#160; If you ignore them, or disable them – you will never know that monitoring is not functioning 100%.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;Generally – here is how I attack script/WMI failures&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;1.&amp;#160; If the 
repeat count is 0 or 1, I ignore these as random failures, and close the alerts from time to time.&lt;/p&gt;    &lt;p&gt;2.&amp;#160; If the repeat count is very high, then something is wrong with the agent, and needs remediation on the agent OS.&amp;#160; Investigate the OpsMgr event log on the agent for Warning/Critical events – to see if a lot of workflows are failing due to this issue.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;The FIRST thing I do – is to see if WMI is responsive.&amp;#160; I run WBEMTEST, and connect to “root\cimv2”.&amp;#160;&amp;#160; I then hit “query” and execute a “select * from win32_operatingsystem” to see if it returns results, or an error.&amp;#160; Next – I look at the namespace from the alert in SCOM…. perhaps it is “root\MicrosoftDNS”, or “root\CCM”.&amp;#160; Then – I try and run the query that is failing from the alert.&lt;/p&gt;  &lt;p&gt;If EITHER of the above connections/queries fail…. then I know what's wrong.&amp;#160; WMI has a core issue, and I punt this to my platform or application team to fix it.&amp;#160; Sometimes it needs a MOF recompile, sometimes it needs WMI service bounced or the OS bounced.&lt;/p&gt;  &lt;p&gt;If these all appear to work correctly, or, the problem is resolved after a WMI service bounce, then re-appears later – check out the following:&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;There are many things you can do to resolve/remediate these issues&lt;/em&gt;&lt;/strong&gt;.&amp;#160; Here is a list of the most common fixes:&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;1.&amp;#160; &lt;strong&gt;Apply&lt;/strong&gt; &lt;a title="http://support.microsoft.com/kb/933061" href="http://support.microsoft.com/kb/933061"&gt;http://support.microsoft.com/kb/933061&lt;/a&gt;&amp;#160; This resolves a LOT of issues on the Windows 2003 OS with WMI.&amp;#160; This should be one of your first steps.&amp;#160; 
This applies to x86 or x64 Windows Server 2003 SP1 or SP2.&lt;/p&gt;  &lt;p&gt;2.&amp;#160; &lt;strong&gt;Registry modification&lt;/strong&gt; for WMI buffer thresholds (see below)&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&amp;quot;HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\WBEM\CIMOM\Low Threshold On Events (B)&amp;quot; to 35000000 (default is 10000000)      &lt;br /&gt;&amp;quot;HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\WBEM\CIMOM\High Threshold On Events (B)&amp;quot; to 70000000 (default is 20000000)&lt;/p&gt; &lt;/blockquote&gt;  &lt;blockquote&gt;   &lt;p&gt;The registry modification to WMI buffers increases the number of objects that WMI can hold before injecting sleep delays into the WMI service.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;3.&amp;#160; &lt;strong&gt;Apply&lt;/strong&gt; &lt;a title="http://support.microsoft.com/kb/955360" href="http://support.microsoft.com/kb/955360"&gt;http://support.microsoft.com/kb/955360&lt;/a&gt;&amp;#160; This updates the Windows Script Host (cscript) to version 5.7.&amp;#160; This resolves script timeouts, and scripts consuming a LOT of CPU during execution, and problems with multiple scripts running at the same time.&amp;#160; This applies to x86 or x64 Windows Server 2003 SP1 or SP2.&amp;#160; This is a very good hotfix for DNS servers, DHCP servers, and Domain Controllers.&amp;#160; This has been seen to lessen the impact of VBScripts consuming a large amount of CPU during runtime.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Making these three modifications should resolve the majority of systemic issues out there, unless WMI is completely corrupt/unresponsive and needs repair.&amp;#160; Sometimes, rebooting a server, or bouncing WMI will temporarily resolve these issues as well, if you cannot apply the fixes immediately.&lt;/p&gt;  &lt;p&gt;If you have applied all three of these above, and are still experiencing a systemic repeat of a WMI query/script failure…. 
the next step would be to try running the query directly, accessing the namespace in WBEMtest.&amp;#160; I’d like to hear about any experiences here.&lt;/p&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=3259635" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/kevinholman/archive/tags/management+pack/default.aspx">management pack</category><category domain="http://blogs.technet.com/kevinholman/archive/tags/agents/default.aspx">agents</category></item><item><title>Health Service and MonitoringHost thresholds in R2 – how this has changed and what you should know</title><link>http://blogs.technet.com/kevinholman/archive/2009/06/22/health-service-and-monitoringhost-thresholds-in-r2-how-this-has-changed-and-what-you-should-know.aspx</link><pubDate>Tue, 23 Jun 2009 01:52:29 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3257617</guid><dc:creator>kevinhol</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.technet.com/kevinholman/comments/3257617.aspx</comments><wfw:commentRss>http://blogs.technet.com/kevinholman/commentrss.aspx?PostID=3257617</wfw:commentRss><wfw:comment>http://blogs.technet.com/kevinholman/rsscomments.aspx?PostID=3257617</wfw:comment><description>&lt;p&gt;In &lt;a href="http://blogs.technet.com/kevinholman/archive/2009/03/26/are-your-agents-restarting-every-10-minutes-are-you-sure.aspx"&gt;THIS&lt;/a&gt; post – I described the way the agent HealthService will bounce on a regular basis, and how to alert on that, and change the thresholds.&amp;#160; Please see that post for details on SP1.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Now – in R2 – much of this has changed in how it works.&amp;#160; That said – the core challenge in SP1 still exists in R2:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;&lt;font color="#0000ff"&gt;1.&amp;#160; The default agent threshold is 100MB for the HealthService and MonitoringHost process.&amp;#160; That is too low 
for many of the typical agents in production environments.&lt;/font&gt;&lt;/strong&gt;&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;&lt;font color="#0000ff"&gt;2.&amp;#160; When we bounce the agent for using more than 100MB, we do this silently, and do not alert.&amp;#160; If your agents are constantly restarting in a loop, you will never know.&lt;/font&gt;&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Let's take a look at R2:&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;First off – let's examine the HealthService class.&amp;#160; There are two Monitors located at&amp;#160; Health Service &amp;gt; Entity Health &amp;gt; Performance &amp;gt; Health Service Performance &amp;gt; Health Service State.&amp;#160; They are for Handle Count threshold, and Private Bytes Threshold.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/HealthServiceandMonitoringHostthresholds_ED5F/image_2.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/HealthServiceandMonitoringHostthresholds_ED5F/image_thumb.png" width="613" height="233" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Health Service Handle Count Threshold:&lt;/strong&gt;&amp;#160; The default threshold is 2000 handles for an agent.&amp;#160; There are built-in overrides for the Management servers – bumping this number up to 10,000 handles.&amp;#160; Also, the new Native Exchange 2007 MP bumps the threshold up to 5000 for Exchange 2007 computers.&amp;#160; It is common that you *might* have to bump up this threshold for SOME agents.&amp;#160; It is also common to bump the Management Server threshold up to 20,000 or even 50,000 if yours is constantly using more, but stable.&lt;/p&gt;  &lt;p&gt;&lt;a 
href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/HealthServiceandMonitoringHostthresholds_ED5F/image_4.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/HealthServiceandMonitoringHostthresholds_ED5F/image_thumb_1.png" width="611" height="211" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Health Service Private Bytes Threshold:&lt;/strong&gt;&amp;#160; The default is 100MB for all agents.&amp;#160; There is a threshold override for Management servers to use up to 1.6GB.&amp;#160; The new Native Exchange 2007 MP bumps the threshold to 600MB for Exchange 2007 servers.&amp;#160; This is the monitor that needs the most attention!&amp;#160; The default of 100MB is not enough for many server roles, especially if hosted on Server 2008 OS.&amp;#160; You will likely need to override this monitor for the groups of Windows Computer objects that are affected.&amp;#160; Your agents will potentially be in a perpetual restart loop until this is done.&amp;#160; Here is an example of mine – with a few overrides in place for SQL computers and DNS servers:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/HealthServiceandMonitoringHostthresholds_ED5F/image_6.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/HealthServiceandMonitoringHostthresholds_ED5F/image_thumb_2.png" width="608" height="250" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Now…. 
in R2 – the MonitoringHost workflows have changed, from rules, to Monitors.&amp;#160; These are located under the “&lt;strong&gt;Agent&lt;/strong&gt;” class.&amp;#160; You will find them under &lt;strong&gt;&lt;em&gt;Agent &amp;gt; Entity Health &amp;gt; Performance &amp;gt; Health Service Performance &amp;gt; Health Service State&lt;/em&gt;&lt;/strong&gt;.&amp;#160; They are named &lt;strong&gt;&lt;em&gt;Monitoring Host Handle Count Threshold&lt;/em&gt;&lt;/strong&gt;, and &lt;strong&gt;&lt;em&gt;Monitoring Host Private Bytes Threshold&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/HealthServiceandMonitoringHostthresholds_ED5F/image_8.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/HealthServiceandMonitoringHostthresholds_ED5F/image_thumb_3.png" width="600" height="305" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;***Note:&amp;#160; In this view - you will also see the HealthService monitors, but note – these are inherited from the Health Service class.&amp;#160; This is because there is a dependency rollup that rolls up the Health Service State of the HealthService, to the Agent.&amp;#160; I will explain why in a moment.&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Monitoring Host Handle Count Threshold:&lt;/strong&gt;&amp;#160; The default threshold is 2000 handles for an agent.&amp;#160; There are built-in overrides for the Management servers – bumping this number up to 10,000 handles.&amp;#160; Also, the new Native Exchange 2007 MP, bumps the threshold up to 5000, for Exchange 2007 computers.&amp;#160; It is common that you *might* have to 
bump up this threshold for SOME agents.&amp;#160; It is also common to bump the Management Server threshold up to 20,000 or even 50,000 if yours is constantly using more, but stable.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/HealthServiceandMonitoringHostthresholds_ED5F/image_10.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/HealthServiceandMonitoringHostthresholds_ED5F/image_thumb_4.png" width="589" height="219" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Monitoring Host Private Bytes Threshold:&lt;/strong&gt;&amp;#160; The default is 100MB for all agents.&amp;#160; There is a threshold override for Management servers to use up to 1.6GB.&amp;#160; The new Native Exchange 2007 MP bumps the threshold to 600MB for Exchange 2007 servers.&amp;#160; &lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/HealthServiceandMonitoringHostthresholds_ED5F/image_12.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/HealthServiceandMonitoringHostthresholds_ED5F/image_thumb_5.png" width="573" height="212" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Now, in R2 – all four of these monitors roll their health state up to the Aggregate Roll-up monitor under the agent class, named “Health Service State”.&lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/HealthServiceandMonitoringHostthresholds_ED5F/image_14.png"&gt;&lt;img 
style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/HealthServiceandMonitoringHostthresholds_ED5F/image_thumb_6.png" width="666" height="305" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;If we look at the properties of that aggregate monitor, we can see the recovery action to restart the HealthService is now on this monitor.&amp;#160; Therefore – if ANY of the 4 monitors below it is in a critical state – it will roll up to this monitor, which will launch a script to bounce the HealthService on the agent.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/HealthServiceandMonitoringHostthresholds_ED5F/image_16.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/HealthServiceandMonitoringHostthresholds_ED5F/image_thumb_7.png" width="617" height="598" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;This script, when it executes, will log an event 6024 in the OpsMgr event log on the agent, stating that it is restarting the HealthService.&amp;#160; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;***NOTE – the text used in the event log is not technically accurate, in that it always states &lt;font color="#0000ff"&gt;“Health Service exceeded Process\Handle Count or Private Bytes threshold.”&lt;/font&gt;&amp;#160; It could be an issue with the Monitoring Host – NOT the HealthService, and this event might mislead you in troubleshooting.&amp;#160; So just know that a 6024 event is a generic restart event – you need to look at the individual monitor state change history in 
Health Explorer to properly investigate.&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;So – to summarize the changes from SP1 to R2:&lt;/strong&gt;&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;The MonitoringHost threshold rules are now standard monitors.&lt;/li&gt;    &lt;li&gt;The Health Service monitors roll up to the Agent - Health Service State rollup monitor.&lt;/li&gt;    &lt;li&gt;The Health Service State rollup monitor has a recovery which runs a script to bounce the HealthService when it is in a critical state.&lt;/li&gt;    &lt;li&gt;We still do not alert by default when the script bounces your agent, so you need to either create a rule to look for this, or alert on the state change of the monitor.&lt;/li&gt;    &lt;li&gt;You will still likely need to adjust the threshold of the Health Service Private Bytes monitor for many of your agents.&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Let's talk about #4 above:&lt;/strong&gt;&amp;#160; You need to know when your agents are getting bounced, especially if they are caught in a loop of bouncing.&lt;/p&gt;  &lt;p&gt;You have a few choices here…. 
but I like to either:&amp;#160; &lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Create an alert rule, target “Agent”, Event ID 6024, in the OpsMgr event log.&lt;/li&gt;    &lt;li&gt;Override the “Health Service State” rollup monitor, to “Generates Alert = True”.&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;Either one of those will give you a solution, that will detect the monitor state change, which results in bouncing the agent’s Health Service.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;With regard to #5 above:&lt;/strong&gt;&amp;#160; You will likely need to adjust this default threshold for many agents.&amp;#160; From my previous blog post on this topic – I have been seeing that mostly on the following types of servers:&lt;/p&gt;  &lt;ol&gt;   &lt;ol&gt;     &lt;li&gt;&lt;strong&gt;&lt;em&gt;Large SQL database servers&lt;/em&gt;&lt;/strong&gt;&lt;/li&gt;      &lt;li&gt;&lt;strong&gt;&lt;em&gt;Server 2008 domain controllers&lt;/em&gt;&lt;/strong&gt;&lt;/li&gt;      &lt;li&gt;&lt;strong&gt;&lt;em&gt;DHCP servers with large scope counts&lt;/em&gt;&lt;/strong&gt;&lt;/li&gt;      &lt;li&gt;&lt;strong&gt;&lt;em&gt;DNS servers with large zone counts&lt;/em&gt;&lt;/strong&gt;&lt;/li&gt;      &lt;li&gt;&lt;strong&gt;&lt;em&gt;Exchange 2007 servers&lt;/em&gt;&lt;/strong&gt;&lt;/li&gt;      &lt;li&gt;&lt;strong&gt;&lt;em&gt;Large Exchange 2003 Mailbox servers&lt;/em&gt;&lt;/strong&gt;&lt;/li&gt;      &lt;li&gt;&lt;strong&gt;&lt;em&gt;IIS7 (Server 2008) Servers&lt;/em&gt;&lt;/strong&gt;&lt;/li&gt;      &lt;li&gt;&lt;strong&gt;&lt;em&gt;Proxy agents that perform special agent-less monitoring (Nworks/Vmware, etc…)&lt;/em&gt;&lt;/strong&gt;&lt;/li&gt;   &lt;/ol&gt; &lt;/ol&gt;  &lt;p&gt;Create groups for these server types, and override the default threshold for this monitor for those groups.&amp;#160; In general, I have found bumping to 250MB resolves most agent issues, but some special cases could need much more.&lt;/p&gt;&lt;img 
src="http://blogs.technet.com/aggbug.aspx?PostID=3257617" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/kevinholman/archive/tags/agents/default.aspx">agents</category></item><item><title>R2 – Improved Agent Proxy Alerts</title><link>http://blogs.technet.com/kevinholman/archive/2009/04/07/r2-improved-agent-proxy-alerts.aspx</link><pubDate>Wed, 08 Apr 2009 00:46:17 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3223699</guid><dc:creator>kevinhol</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.technet.com/kevinholman/comments/3223699.aspx</comments><wfw:commentRss>http://blogs.technet.com/kevinholman/commentrss.aspx?PostID=3223699</wfw:commentRss><wfw:comment>http://blogs.technet.com/kevinholman/rsscomments.aspx?PostID=3223699</wfw:comment><description>&lt;p&gt;Here is a nice add in R2:&amp;#160; When we give you the old “agent proxy alert”, we now tell you the name of the Agent that needs agent proxy enabled, and resolve the name of the object type that it was bringing in:&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Nice improvement.&amp;#160; I enable agent proxy for SQL1CLN2 and get on with my day.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/R2ImprovedAgentProxyAlerts_EBD3/image_4.png"&gt;&lt;img title="image" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="423" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/R2ImprovedAgentProxyAlerts_EBD3/image_thumb_1.png" width="736" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=3223699" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/kevinholman/archive/tags/agents/default.aspx">agents</category><category 
domain="http://blogs.technet.com/kevinholman/archive/tags/R2/default.aspx">R2</category></item><item><title>Are your agents restarting every 10 minutes? Are you sure?</title><link>http://blogs.technet.com/kevinholman/archive/2009/03/26/are-your-agents-restarting-every-10-minutes-are-you-sure.aspx</link><pubDate>Thu, 26 Mar 2009 21:50:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3218611</guid><dc:creator>kevinhol</dc:creator><slash:comments>12</slash:comments><comments>http://blogs.technet.com/kevinholman/comments/3218611.aspx</comments><wfw:commentRss>http://blogs.technet.com/kevinholman/commentrss.aspx?PostID=3218611</wfw:commentRss><wfw:comment>http://blogs.technet.com/kevinholman/rsscomments.aspx?PostID=3218611</wfw:comment><description>&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;**Updated 6-22-2009 – This post applies to SP1 ONLY!!!&amp;#160; This architecture has changed for R2.&amp;#160; The version of this article updated for R2 is located here:&amp;#160; &lt;a title="http://blogs.technet.com/kevinholman/archive/2009/06/22/health-service-and-monitoringhost-thresholds-in-r2-how-this-has-changed-and-what-you-should-know.aspx" href="http://blogs.technet.com/kevinholman/archive/2009/06/22/health-service-and-monitoringhost-thresholds-in-r2-how-this-has-changed-and-what-you-should-know.aspx"&gt;http://blogs.technet.com/kevinholman/archive/2009/06/22/health-service-and-monitoringhost-thresholds-in-r2-how-this-has-changed-and-what-you-should-know.aspx&lt;/a&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Here is something I 
have been seeing with more and more customers…. &lt;strong&gt;and I think everyone should take a look and consider this.&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;They have a decent percentage of agents on which the HealthService is being programmatically restarted every 10-12 minutes (sometimes less often than every 12 minutes, but still very frequently).&lt;/p&gt;  &lt;p&gt;This is being caused by one of a few workflows.&amp;#160; By default, there is a monitor that watches the HealthService resources, and runs a script to bounce the Health Service anytime that service is consuming too many resources.&amp;#160; This is good – because we don't want an issue with a SCOM agent to ever impact the available resources on a monitored agent.&lt;/p&gt;  &lt;p&gt;The bad?&amp;#160; Well, we don't (by default) alert when the script that restarts the HealthService is called.&amp;#160; This means – this can be affecting you – and you really have no way of knowing this out of the box.&amp;#160; Also – sometimes the restart fails…. and this leaves you with a handful of agents that generate a Heartbeat failure.&amp;#160; When you look at the agent – it is fine, just the HealthService isn't running…. you start it, and everything goes back to normal… or so you think.&amp;#160; If every day you have to respond to a few Heartbeat failures…. 
and you find the OS up – just the agent stopped… this might be the cause.&lt;/p&gt;  &lt;p&gt;Here are the two monitors which target “Health Service” – and can restart the service:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_2.png" mce_href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_2.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_thumb.png" width="785" height="239" mce_src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_thumb.png" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Here are the two rules, targeting “Agent”:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_10.png" mce_href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_10.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_thumb_4.png" width="680" height="100" mce_src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_thumb_4.png" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Most often – I see it is the monitor causing the restart… on the Health Service Private bytes, but it could be any of the above – and you should consider any of these if you are 
impacted by this.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Note the default overrides.&lt;/strong&gt;&amp;#160; We allow the management servers to use up to 1.6GB of HealthService privatebytes, and if you have the current Exchange 2007 MP – we have an override which allows Exchange 2007 agents to use up to 600MB, up from the default which is 100MB.&lt;/p&gt;  &lt;p&gt;The Exchange 2007 MP was updated with this override, because this issue was already detected for large Exchange servers.&lt;/p&gt;  &lt;p&gt;The problem is – that other large servers can potentially use more than 100MB.&amp;#160; &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;I have been seeing this mostly on:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;font color="#0000ff"&gt;1. Large SQL database servers&lt;/font&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;font color="#0000ff"&gt;2.&amp;#160; Server 2008 domain controllers&lt;/font&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;font color="#0000ff"&gt;3.&amp;#160; DHCP servers with large scope counts&lt;/font&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;font color="#0000ff"&gt;4.&amp;#160; Exchange 2007 servers&lt;/font&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;font color="#0000ff"&gt;5.&amp;#160; Large Exchange 2003 Mailbox servers&lt;/font&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;    &lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;font color="#0000ff"&gt;6.&amp;#160; IIS7 (Server 2008) Servers&lt;/font&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Here is what &lt;strong&gt;ALL customers&lt;/strong&gt; should implement…&amp;#160; A rule that watches for the event, when the restart script is called.&lt;/p&gt;  &lt;p&gt;Here is the event that gets created, in the OpsMgr event log on the agent, when this script tries to restart the HealthService:&lt;/p&gt;  
&lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;Event Type:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Warning        &lt;br /&gt;Event Source:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Health Service Script         &lt;br /&gt;Event Category:&amp;#160;&amp;#160; None         &lt;br /&gt;Event ID:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 6024         &lt;br /&gt;Date:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 3/26/2009         &lt;br /&gt;Time:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 9:22:33 AM         &lt;br /&gt;User:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; N/A         &lt;br /&gt;Computer:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; DC01         &lt;br /&gt;Description:         &lt;br /&gt;LaunchRestartHealthService.js : Launching Restart Health Service. Health Service exceeded Process\Handle Count or Private Bytes threshhold. 
&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Here is the event created, when it is a problem with the MonitoringHost process:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;Event Type:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Warning        &lt;br /&gt;Event Source:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Health Service Script         &lt;br /&gt;Event Category:&amp;#160;&amp;#160;&amp;#160; None         &lt;br /&gt;Event ID:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 6025         &lt;br /&gt;Date:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 4/21/2009         &lt;br /&gt;Time:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 5:23:41 AM         &lt;br /&gt;User:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; N/A         &lt;br /&gt;Computer:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; EX2CLN1         &lt;br /&gt;Description:         &lt;br /&gt;LaunchRestartHealthService.js : Launching Restart Health Service. Monitoring Host exceeded Process\Handle Count threshhold. 
&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;or&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;Event Type:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Warning        &lt;br /&gt;Event Source:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Health Service Script         &lt;br /&gt;Event Category:&amp;#160;&amp;#160;&amp;#160; None         &lt;br /&gt;Event ID:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 6026         &lt;br /&gt;Date:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 3/26/2009         &lt;br /&gt;Time:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 10:14:30 AM         &lt;br /&gt;User:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; N/A         &lt;br /&gt;Computer:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; DC01         &lt;br /&gt;Description:         &lt;br /&gt;LaunchRestartHealthService.js : Launching Restart Health Service. 
Monitoring Host exceeded Process\Private Bytes threshhold.&lt;/strong&gt; &lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Here is the event after the restart is noted as a success:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;Event Type:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Information        &lt;br /&gt;Event Source:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Health Service Script         &lt;br /&gt;Event Category:&amp;#160;&amp;#160;&amp;#160;&amp;#160; None         &lt;br /&gt;Event ID:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 6062         &lt;br /&gt;Date:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 3/26/2009         &lt;br /&gt;Time:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 10:35:30 AM         &lt;br /&gt;User:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; N/A         &lt;br /&gt;Computer:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; DC01         &lt;br /&gt;Description:         &lt;br /&gt;RestartHealthService.js : Restarting Health Service. 
Service successfully restarted.&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Here is the event after the restart failed for some reason:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;Log Name:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Operations Manager        &lt;br /&gt;Source:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Health Service Script         &lt;br /&gt;Date:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 5/18/2009 11:58:28 AM         &lt;br /&gt;Event ID:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; 6061         &lt;br /&gt;Task Category: None         &lt;br /&gt;Level:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Error         &lt;br /&gt;Keywords:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; Classic         &lt;br /&gt;User:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; N/A         &lt;br /&gt;Computer:&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; SQL3CLN1.opsmgr.net         &lt;br /&gt;Description:         &lt;br /&gt;RestartHealthService.js : Restarting Health Service. 
Failed to restart service.&lt;/strong&gt;       &lt;br /&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;So – in order to see if you are impacted by this – the simplest thing to do – is to create a custom rule – that alerts when these events happen.&lt;/p&gt;  &lt;p&gt;Create a new rule – target “Windows Server Operating System”, (or whatever is your standard).&amp;#160; Look in the OpsMgr event log, with an expression of “Event ID Equals &lt;strong&gt;6024&lt;/strong&gt;” &lt;strong&gt;&lt;em&gt;OR&lt;/em&gt;&lt;/strong&gt; “Event ID Equals &lt;strong&gt;6025&lt;/strong&gt;” &lt;strong&gt;&lt;em&gt;OR&lt;/em&gt;&lt;/strong&gt; “Event ID Equals &lt;strong&gt;6026&lt;/strong&gt;” &lt;strong&gt;&lt;em&gt;OR&lt;/em&gt;&lt;/strong&gt; “Event ID Equals &lt;strong&gt;6062&lt;/strong&gt;” &lt;strong&gt;&lt;em&gt;OR&lt;/em&gt;&lt;/strong&gt; “Event ID Equals &lt;strong&gt;6061&lt;/strong&gt;”.&amp;#160;&amp;#160;&amp;#160; You could also just as easily write individual rules – one for each event… and use a different name for each alert.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_5.png" mce_href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_5.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_thumb_1.png" width="561" height="414" mce_src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_thumb_1.png" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Once this is created – you will get an alert for any agent that is restarting, or failed to restart.&amp;#160; This will tell you if you need to bump up the default values, for specific agents, or 
a group of agents, such as SQL servers, domain controllers, etc….&amp;#160; What I have found is that bumping this number up to 250MB will generally address most agents’ issues…. but you need to monitor this in your environment and see how much memory your HealthService.exe and/or MonitoringHost.exe processes need.&lt;/p&gt;  &lt;p&gt;Next – create a view in the Monitoring Console – just for these alerts.&amp;#160; New view – Alert View – for all alerts with a given Name.&amp;#160; Use something that matches your alert name.&amp;#160; Here is mine:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_7.png" mce_href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_7.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_thumb_2.png" width="343" height="473" mce_src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_thumb_2.png" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Now – watch this view for any alerts that come in…. 
&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_17.png" mce_href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_17.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_thumb_3.png" width="762" height="259" mce_src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_thumb_3.png" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;What we see is – that I have many agents restarting from time to time… due to the default threshold not being enough.&amp;#160; The best way to monitor this… is to watch and see which agents are affected, and then bump up their threshold via overrides.&amp;#160; I would recommend bumping the number up in small increments, like 100MB at a time, and see where your “happy place” is.&lt;/p&gt;  &lt;p&gt;This article specifically addresses OpsMgr SP1.&amp;#160; I am not sure yet if this is going to need to be done in R2, so I will update this article when R2 releases.&lt;/p&gt;  &lt;p&gt;As a side note – one of the symptoms I see, is when the OpsDB StateChangeEvent table is one of the largest tables.&amp;#160; This is caused – because the constant restart of the agent, causes state to be recalculated for every monitor on the agent.&amp;#160; It sends this recalculated state on every restart of the agent, flooding the database with state data.&lt;/p&gt;  &lt;p&gt;I’d recommend putting these rules in place to alert on this event, for ANY customer…. 
knowing is power.&lt;/p&gt;  &lt;p mce_keep="true"&gt;&amp;#160;&lt;/p&gt;  &lt;p mce_keep="true"&gt;&amp;#160;&lt;/p&gt;  &lt;p mce_keep="true"&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;As far as some recommended values…. the best thing is to find your own “happy place”… but as of this writing, I start with 250MB as an initial adjustment.&amp;#160; You can create a group of Windows Computer objects that are affected, and simply add to this group any computers that seem to need more private bytes.&amp;#160; Use this custom group for your override on the HealthService and/or Monitoring Host workflows.&lt;/p&gt;  &lt;p mce_keep="true"&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Recommendation examples:&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;I don't recommend overriding the Health Service Private Bytes Threshold Monitor “for all objects of type: Health Service”.&amp;#160; I have seen this impact the Health Service on the management servers.&amp;#160; There is an override for the management servers which should win as the more specific one in a conflict – but this doesn't always work in the field.&lt;/p&gt;  &lt;p&gt;You can use groups of Windows Computer objects (Or groups of Health Service Instances if you so desire) for this override.&lt;/p&gt;  &lt;p&gt;If we wanted to override this value – and increase it to 200MB for &lt;strong&gt;&lt;em&gt;all agents&lt;/em&gt;&lt;/strong&gt;, override for a group – and use the “Agent Managed Computer Group”:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_14.png" mce_href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_14.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" 
src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_thumb_6.png" width="615" height="258" mce_src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_thumb_6.png" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;You have to be VERY careful when doing the above&lt;/em&gt;&lt;/strong&gt; – because this override will potentially conflict with other group overrides… if you want specific overrides for Exchange, DHCP, DNS, SQL, etc…&lt;/p&gt;  &lt;p&gt;A better approach is like so:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_12.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areyouragentsrestartingevery10minutesAre_90DF/image_thumb_5.png" width="675" height="262" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Just make SURE that whatever groups you use – they don't have/share any of the same computers… or you can get some conflicting values here.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;I am attaching a management pack below.&amp;#160; This MP contains a simple rule, as described above, to alert on the HealthService restart, and contains a view which is scoped only to these alerts.&amp;#160; You should keep an eye on these and take action on them.&amp;#160; I added alert suppression on this rule – it will create a single alert for each identical computer and event ID, and increment repeat counts on the worst offenders.&amp;#160; &lt;strong&gt;&lt;em&gt;This is one warning alert you should NOT ignore&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;&lt;img 
src="http://blogs.technet.com/aggbug.aspx?PostID=3218611" width="1" height="1"&gt;</description><enclosure url="http://blogs.technet.com/kevinholman/attachment/3218611.ashx" length="2395" type="application/x-zip-compressed" /><category domain="http://blogs.technet.com/kevinholman/archive/tags/agents/default.aspx">agents</category></item><item><title>The Cluster Service will automatically restart itself</title><link>http://blogs.technet.com/kevinholman/archive/2009/03/20/the-cluster-service-will-automatically-restart-itself.aspx</link><pubDate>Fri, 20 Mar 2009 17:02:47 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3215705</guid><dc:creator>kevinhol</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.technet.com/kevinholman/comments/3215705.aspx</comments><wfw:commentRss>http://blogs.technet.com/kevinholman/commentrss.aspx?PostID=3215705</wfw:commentRss><wfw:comment>http://blogs.technet.com/kevinholman/rsscomments.aspx?PostID=3215705</wfw:comment><description>&lt;p&gt;Something I ran across with a customer.&lt;/p&gt;  &lt;p&gt;There aren’t many situations where service recoveries run automatically in Microsoft MP’s, but this is one case where they do.&amp;#160; The cluster service running is critical to a healthy cluster.&amp;#160; In the current cluster MP, the service monitor for the cluster service will automatically start the cluster service on a node, if it detects it stops.&amp;#160; There is a recovery action on the monitor to do just that. 
&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/TheClusterServicewillautomaticallyrestar_7F24/image_2.png"&gt;&lt;img title="image" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="555" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/TheClusterServicewillautomaticallyrestar_7F24/image_thumb.png" width="795" border="0" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;As always – if you don't like this intended behavior – you can override just the recovery, and disable it.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Why do you need to know this?&lt;/p&gt;  &lt;p&gt;Because – some service packs for clustered applications require you to stop the cluster service in order to apply them.&amp;#160; If you stop this service on a node while doing application maintenance, SCOM will restart it, almost immediately.&amp;#160; The correct solution – is to use Maintenance Mode in SCOM, which will unload the monitors, and hence, any automatic recoveries will no longer run.&amp;#160; So… make SURE you are effectively using maintenance mode if you ever need to stop your cluster service, or disable this automatic recovery action.&lt;/p&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=3215705" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/kevinholman/archive/tags/management+pack/default.aspx">management pack</category><category domain="http://blogs.technet.com/kevinholman/archive/tags/agents/default.aspx">agents</category><category domain="http://blogs.technet.com/kevinholman/archive/tags/Cluster/default.aspx">Cluster</category></item><item><title>Applying an OpsMgr hotfix to a RMS Cluster node? 
Some things to be aware of.</title><link>http://blogs.technet.com/kevinholman/archive/2009/02/25/applying-an-opsmgr-hotfix-to-a-rms-cluster-node-some-things-to-be-aware-of.aspx</link><pubDate>Wed, 25 Feb 2009 05:20:09 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3206405</guid><dc:creator>kevinhol</dc:creator><slash:comments>6</slash:comments><comments>http://blogs.technet.com/kevinholman/comments/3206405.aspx</comments><wfw:commentRss>http://blogs.technet.com/kevinholman/commentrss.aspx?PostID=3206405</wfw:commentRss><wfw:comment>http://blogs.technet.com/kevinholman/rsscomments.aspx?PostID=3206405</wfw:comment><description>&lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;When you apply a SCOM hotfix to a RMS cluster, you need to be aware of some issues, and some workarounds.&amp;#160; This is something I have seen several times in the field… &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;On any server/agent, the Hotfix installer will stop any discovered OpsMgr services, including the SDK, Config, and HealthService.&amp;#160; This part is normal.&amp;#160; It does this in order to update the files (DLL’s) that are part of the hotfix payload, and then it will start the services again when complete.&amp;#160; This all works well, except for on RMS clusters.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;The reason for this, is that the Hotfix installer is not 100% cluster aware.&amp;#160; &lt;/p&gt;  &lt;p&gt;In a RMS cluster… the passive node will have these three services stopped, and the services will be set to Manual Startup.&amp;#160; On the active node – the OpsMgr services are also set to Manual Startup, but the services are running, because the Cluster service controls these services now.&amp;#160; This is how a clustered service works, and we should not ever stop a clustered service in Service Control Manager, we really should take the resource offline, in Cluster Admin.&amp;#160; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  
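&lt;p&gt;To make the expected state concrete, here is a minimal Python sketch (not part of any OpsMgr tooling) that compares an observed service snapshot against the rule described above: startup type Manual on both nodes, services running on the active node and stopped on the passive node. The short service names (OMSDK, OMCFG, HealthService), the function name and the snapshot layout are assumptions made for illustration; substitute whatever your environment actually reports.&lt;/p&gt;

```python
# Sketch only: service names and the snapshot layout are illustrative assumptions.
RMS_SERVICES = ("OMSDK", "OMCFG", "HealthService")


def find_deviations(node_role, snapshot):
    """Return a list of (service, problem) pairs for one RMS cluster node.

    node_role is "active" or "passive".
    snapshot maps a service name to a (startup_type, state) pair,
    for example {"OMSDK": ("Manual", "Running")}.
    """
    if node_role not in ("active", "passive"):
        raise ValueError("node_role must be 'active' or 'passive'")

    # The cluster service, not Service Control Manager, decides where these run.
    expected_state = "Running" if node_role == "active" else "Stopped"

    problems = []
    for name in RMS_SERVICES:
        startup_type, state = snapshot[name]
        if startup_type != "Manual":
            problems.append((name, "startup type is " + startup_type + ", expected Manual"))
        if state != expected_state:
            problems.append((name, "state is " + state + ", expected " + expected_state))
    return problems


# Example: a passive node that has drifted from the expected configuration.
drifted_passive_node = {
    "OMSDK": ("Automatic", "Running"),
    "OMCFG": ("Automatic", "Running"),
    "HealthService": ("Automatic", "Stopped"),
}
for service, problem in find_deviations("passive", drifted_passive_node):
    print(service + ": " + problem)
```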
&lt;p&gt;So I have two options… I can apply the hotfix to the active node… or the passive node.&amp;#160; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;If I choose the active node – the hotfix installer will try and stop all the OpsMgr services, and this will cause the Cluster service to try and restart them, or eventually fail them over to the passive node – depending on your Cluster configuration settings.&amp;#160; Therefore – it is probably best to patch the passive node first… ensure the hotfix was applied correctly, and then move the cluster group and OpsMgr RMS group over to the freshly hotfixed node… and go patch the other one (now passive).&lt;/p&gt;  &lt;p&gt;This works – but is not 100% smooth.&amp;#160; When we apply the hotfix to the passive node, the hotfix installer will try and start the services at the end of the process, even though they were not running previously.&amp;#160; We do NOT want these services trying to run on the passive node – since it does not own the cluster disk resources…. 
so the services will start, but cannot do anything but log errors.&amp;#160;&amp;#160; &lt;/p&gt;  &lt;p&gt;You will also see an error from the HealthService – not being able to start.&amp;#160; It is apparent that this service fails because it cannot access the disk resource, but the SDK and config services WILL start.&lt;/p&gt;  &lt;p&gt;What is worse – is that the hotfix installer – changes the config of the service startup types to&lt;strong&gt; Automatic&lt;/strong&gt; – which means these services will continue to try and run on the passive node across reboots.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;So – the guidance I have, for RMS clusters – is:&lt;/strong&gt;&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Patch the passive node (we will call this Node 2)&lt;/li&gt;    &lt;li&gt;Click ok on the HealthService start failure error.&lt;/li&gt;    &lt;li&gt;Ensure the hotfix applied by inspecting the DLL(s) versions as documented in the KB.&lt;/li&gt;    &lt;li&gt;Stop the running SDK and Config services on the passive node.&lt;/li&gt;    &lt;li&gt;Set any OpsMgr services that were changed to Automatic – BACK to Manual.&lt;/li&gt;    &lt;li&gt;Move the cluster resource groups over to the freshly patched Node 2.&lt;/li&gt;    &lt;li&gt;On Node 1 (now passive) apply the hotfix, and repeat steps starting at Step 2 above.&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;NOTE:&amp;#160; This is only applicable to OpsMgr specific hotfixes.&amp;#160; For OS hotfixes – you would follow your standard clustered OS hotfix routine.&lt;/p&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=3206405" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/kevinholman/archive/tags/agents/default.aspx">agents</category><category domain="http://blogs.technet.com/kevinholman/archive/tags/Hotfix/default.aspx">Hotfix</category><category 
domain="http://blogs.technet.com/kevinholman/archive/tags/Cluster/default.aspx">Cluster</category></item><item><title>Getting and keeping the SCOM agent on a Domain Controller – how do YOU do it?</title><link>http://blogs.technet.com/kevinholman/archive/2009/02/20/getting-and-keeping-the-scom-agent-on-a-domain-controller-how-do-you-do-it.aspx</link><pubDate>Fri, 20 Feb 2009 21:27:57 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3205036</guid><dc:creator>kevinhol</dc:creator><slash:comments>5</slash:comments><comments>http://blogs.technet.com/kevinholman/comments/3205036.aspx</comments><wfw:commentRss>http://blogs.technet.com/kevinholman/commentrss.aspx?PostID=3205036</wfw:commentRss><wfw:comment>http://blogs.technet.com/kevinholman/rsscomments.aspx?PostID=3205036</wfw:comment><description>&lt;p&gt;I’d like to hear some community feedback on this….&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;In OpsMgr – deploying a SCOM agent to a DC often presents companies with a bit of a challenge.&amp;#160; The reason is – in order to install software to a DC and manage it – we need rights on the DC to accomplish this.&amp;#160; These rights are needed, anytime we are going to deploy an agent, hotfix an agent, or run a repair on a broken agent to keep the agent healthy.&lt;/p&gt;  &lt;p&gt;When we push agents from the console, the default account used to perform the push is the &lt;strong&gt;&lt;em&gt;Management Server Action Account&lt;/em&gt;&lt;/strong&gt;.&amp;#160; If this account does not have Domain Admin rights – the push will fail to a DC, with an Access Denied.&amp;#160; We do allow the option to type in temporary (encrypted) credentials, which are used to deploy the agent, one time, and then are discarded.&amp;#160; See the image below:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/GettingandkeepingtheSCOMagentonaDomainCo_AF4A/clip_image002_2.jpg"&gt;&lt;img title="clip_image002" 
style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="500" alt="clip_image002" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/GettingandkeepingtheSCOMagentonaDomainCo_AF4A/clip_image002_thumb.jpg" width="649" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Here is a list of the most common options I have observed, in place at customer sites… and potential custom options that can be developed.&amp;#160; &lt;strong&gt;&lt;font color="#ff0000"&gt;I’d be interested in any community feedback on any options you are using, that I don't cover or haven't seen before.&lt;/font&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;b&gt;1. &lt;/b&gt;&lt;b&gt;Grant the Management Server Action account Domain Admin or Builtin\Administrators.&lt;/b&gt;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;a. Not recommended as a best practice, this gives rights to the MSAA that are not required for day to day activities.&lt;/p&gt;    &lt;p&gt;b. Con - SCOM Admins now control a domain admin account.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;b&gt;2. &lt;/b&gt;&lt;b&gt;Grant a SCOM Administrator a special domain account, for this purpose, that is a domain admin.&lt;/b&gt;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;a. This allows us to track the actions of that SCOM admin, when he/she uses that special privileged account.&lt;/p&gt;    &lt;p&gt;b. That SCOM admin will be able to do repairs, hotfixes, and deployments for DC’s.&lt;/p&gt;    &lt;p&gt;c.&amp;#160; Con – Domain Admin teams often won't delegate these rights as they are tightly controlled.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;b&gt;3. 
&lt;/b&gt;&lt;b&gt;The SCOM admin team delegates console based agent management to a Domain Administrator for DC agent health.&lt;/b&gt;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;a.&amp;#160; The domain admin must become a SCOM Admin, and therefore could potentially hurt the SCOM environment.&lt;/p&gt;    &lt;p&gt;b.&amp;#160; Pro – the admins in charge of the DC’s now have full responsibility to keep the agents healthy.&lt;/p&gt;    &lt;p&gt;c.&amp;#160; Con – the Domain Admins might not understand components of SCOM, and create something that impacts the monitoring environment.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;b&gt;4. &lt;/b&gt;&lt;b&gt;The SCOM admin team must partner with the Domain Admin team, and have the Domain Administrator type in his credentials any time the SCOM administrator needs to deploy/hotfix/repair an agent on a domain controller.&lt;/b&gt;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;a. This is a bit more labor intensive… because the SCOM admin must wait for a domain admin to be available to work on DC agents, but tight security boundaries are maintained.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;b&gt;5. &lt;/b&gt;&lt;b&gt;All DC based agents will be manually installed/updated/repaired.&lt;/b&gt;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;a. This is very common, when the two teams do not trust each other.&amp;#160; The Domain Admin team is now required to manually deploy agents to domain controllers, and keep them up to date, and healthy.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;b&gt;6. &lt;/b&gt;&lt;b&gt;Use a software deployment tool already in place to deploy/update/repair agents.&lt;/b&gt;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;a. 
If a software deployment tool is already in place on DC’s, like SMS/SCCM, you can create packages to deploy, hotfix, and repair agents, similar to your patching of the OS today.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;b&gt;7. &lt;/b&gt;&lt;b&gt;Customized solution:&amp;#160; Create a Run-As account that is a domain admin, one time, for use in agent deployment/repair.&lt;/b&gt;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;a. This involves the domain admin typing in credentials ONCE, into a RUN-AS account, which is stored securely and encrypted in the SCOM database.&amp;#160; &lt;/p&gt;    &lt;p&gt;b. This run-as account can be associated with a run-as profile, which is used by a custom task, which will remotely deploy the agent to the domain controller.&amp;#160; This task will execute under the security context of the privileged run-as account.&lt;/p&gt;    &lt;p&gt;c. The benefit is that the domain admin gets to control the password for this account, the SCOM admin does not need to know the account credentials.&lt;/p&gt;    &lt;p&gt;d. The downside, is that this run-as account could potentially be leveraged by some other workflow, if a SCOM admin intentionally misused it…. 
Similar to solution #2 above.&lt;/p&gt;    &lt;p&gt;e.&amp;#160; This is just an idea I had – curious if anyone has already developed a solution like this?&lt;/p&gt;&lt;/blockquote&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=3205036" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/kevinholman/archive/tags/Active+Directory+MP/default.aspx">Active Directory MP</category><category domain="http://blogs.technet.com/kevinholman/archive/tags/agents/default.aspx">agents</category></item><item><title>Console based Agent Deployment Troubleshooting table</title><link>http://blogs.technet.com/kevinholman/archive/2009/01/27/console-based-agent-deployment-troubleshooting-table.aspx</link><pubDate>Tue, 27 Jan 2009 20:35:15 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3192043</guid><dc:creator>kevinhol</dc:creator><slash:comments>5</slash:comments><comments>http://blogs.technet.com/kevinholman/comments/3192043.aspx</comments><wfw:commentRss>http://blogs.technet.com/kevinholman/commentrss.aspx?PostID=3192043</wfw:commentRss><wfw:comment>http://blogs.technet.com/kevinholman/rsscomments.aspx?PostID=3192043</wfw:comment><description>&lt;p&gt;This post is a list of common agent push deployment errors… and some possible remediation options.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Most common errors while pushing an agent:&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;table border="1" cellspacing="0" cellpadding="2" width="749"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="362"&gt;Error&lt;/td&gt;        &lt;td valign="top" width="76"&gt;Error Code(s)&lt;/td&gt;        &lt;td valign="top" width="309"&gt;Remediation Steps&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;         &lt;p&gt;The MOM Server could not execute WMI Query &amp;quot;Select * from Win32_Environment where            &lt;br 
/&gt;NAME='PROCESSOR_ARCHITECTURE'&amp;quot; on computer server.domain.com&lt;/p&gt;          &lt;p&gt;Operation: Agent Install            &lt;br /&gt;Install account: domain\account             &lt;br /&gt;Error Code: 80004005             &lt;br /&gt;Error Description: Unspecified error &lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="76"&gt;80004005&lt;/td&gt;        &lt;td valign="top" width="309"&gt;         &lt;p&gt;1.&amp;#160; Check the PATH environment variable.&amp;#160; If the PATH statement is very long, due to lots of installed third party software - this can fail.&amp;#160; Reduce the path by converting any long filename destinations to 8.3, and remove any path statements that are not necessary.&amp;#160; Or apply hotfix:&amp;#160; &lt;a title="http://support.microsoft.com/?id=969572" href="http://support.microsoft.com/?id=969572"&gt;http://support.microsoft.com/?id=969572&lt;/a&gt;&lt;/p&gt;          &lt;p&gt;2.&amp;#160; The cause could be corrupted Performance Counters on the target Agent. &lt;/p&gt;          &lt;p&gt;To rebuild all Performance counters including extensible and third party counters in Windows Server 2003, type the following commands at a command prompt. Press ENTER after each command.            &lt;br /&gt;cd \windows\system32             &lt;br /&gt;lodctr /R             &lt;br /&gt;Note /R is uppercase.             &lt;br /&gt;Windows Server 2003 rebuilds all the counters because it reads all the .ini files in the C:\Windows\inf\009 folder for the English operating system. 
&lt;/p&gt;          &lt;p&gt;How to manually rebuild Performance Counter Library values            &lt;br /&gt;&lt;a href="http://support.microsoft.com/kb/300956"&gt;http://support.microsoft.com/kb/300956&lt;/a&gt;&lt;/p&gt;          &lt;p&gt;3.&amp;#160; Manual agent install.&amp;#160; &lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;         &lt;p&gt;The MOM Server could not execute WMI Query &amp;quot;Select * from Win32_OperatingSystem&amp;quot; on            &lt;br /&gt;computer “servername.domain.com”             &lt;br /&gt;Operation: Agent Install             &lt;br /&gt;Install account: DOMAIN\account             &lt;br /&gt;Error Code: 800706BA             &lt;br /&gt;Error Description: The RPC server is unavailable.&lt;/p&gt;          &lt;p&gt;The MOM Server could not execute WMI Query &amp;quot;(null)” on            &lt;br /&gt;computer “servername.domain.com”             &lt;br /&gt;Operation: Agent Install             &lt;br /&gt;Install account: DOMAIN\account             &lt;br /&gt;Error Code: 800706BA             &lt;br /&gt;Error Description: The RPC server is unavailable.&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="76"&gt;8004100A          &lt;br /&gt;          &lt;p&gt;800706BA&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="309"&gt;1.&amp;#160; Ensure agent push account has local admin rights          &lt;br /&gt;          &lt;br /&gt;2.&amp;#160; Firewall is blocking NetBIOS access.&amp;#160; If Windows 2008 firewall is enabled, ensure “Remote Administration (RPC)” rule is enabled/allowed.&amp;#160; We need port 135 (RPC) and the DCOM port range opened for console push through a firewall.&amp;#160; &lt;br /&gt;          &lt;br /&gt;3.&amp;#160; Inspect WMI service, health, and rebuild repository if necessary           &lt;br /&gt;          &lt;br /&gt;4.&amp;#160; Firewall is blocking ICMP&amp;#160; (Live OneCare)           &lt;br /&gt;          &lt;br 
/&gt;5.&amp;#160; DNS incorrect           &lt;br /&gt;          &lt;br /&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;         &lt;p&gt;The MOM Server failed to open service control manager on computer &amp;quot;servername.domain.com&amp;quot;. Access is Denied            &lt;br /&gt;Operation: Agent Install             &lt;br /&gt;Install account: DomainName\User Account             &lt;br /&gt;Error Code: 80070005             &lt;br /&gt;Error Description: Access is denied.&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="76"&gt;80070005          &lt;br /&gt;          &lt;br /&gt;80041002&lt;/td&gt;        &lt;td valign="top" width="309"&gt;         &lt;p&gt;1.&amp;#160; Verify the SCOM agent push account is in the Local Administrators group on the target computer. &lt;/p&gt;          &lt;p&gt;2.&amp;#160; On domain controllers, you will have to work with the AD team to install the agent manually if the agent push account is not a domain admin.&lt;/p&gt;          &lt;p&gt;3.&amp;#160; Disable McAfee antivirus during the push&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;         &lt;p&gt;The MOM Server failed to open service control manager on computer &amp;quot;servername.domain.com&amp;quot;.            &lt;br /&gt;Therefore, the MOM Server cannot complete configuration of agent on the computer.             
&lt;br /&gt;Operation: Agent Install             &lt;br /&gt;Install account: DOMAIN\account             &lt;br /&gt;Error Code: 800706BA             &lt;br /&gt;Error Description: The RPC server is unavailable.&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="76"&gt;800706BA&lt;/td&gt;        &lt;td valign="top" width="309"&gt;1.&amp;#160; Firewall blocking NetBIOS ports          &lt;br /&gt;          &lt;br /&gt;2.&amp;#160; DNS resolution issue.&amp;#160; Make sure the agent can ping the MS by NetBIOS and FQDN.&amp;#160; Make sure the MS can ping the agent by NetBIOS and FQDN           &lt;br /&gt;          &lt;br /&gt;3.&amp;#160; Firewall blocking ICMP           &lt;br /&gt;          &lt;br /&gt;4.&amp;#160; RPC services stopped.           &lt;br /&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;         &lt;p&gt;The MOM Server failed to acquire lock to remote computer servername.domain.com. This means there is already an agent management operation proceeding on this computer, please retry the Push Agent operation after some time.            &lt;br /&gt;Operation: Agent Install             &lt;br /&gt;Install account: DOMAIN\account             &lt;br /&gt;Error Code: 80072971             &lt;br /&gt;Error description: Unknown error 0x80072971 &lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="76"&gt;         &lt;p&gt;80072971 &lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="309"&gt;         &lt;p&gt;This problem occurs if the LockFileTime.txt file is located in the following folder on the remote computer:            &lt;br /&gt;%windir%\422C3AB1-32E0-4411-BF66-A84FEEFCC8E2             &lt;br /&gt;When you install or remove a management agent, the Operations Manager 2007 management server copies temporary files to the remote computer. One of these files is named LockFileTime.txt. 
This lock file is intended to prevent another management server from performing a management agent installation at the same time as the current installation. If the management agent installation is unsuccessful and if the management server loses connectivity with the remote computer, the temporary files may not be removed. Therefore, the LockFileTime.txt may remain in the folder on the remote computer. When the management server next tries to perform an agent installation, the management server detects the lock file. Therefore, the management agent installation is unsuccessful. &lt;/p&gt;         &lt;a title="http://support.microsoft.com/kb/934760/en-us" href="http://support.microsoft.com/kb/934760/en-us"&gt;http://support.microsoft.com/kb/934760/en-us&lt;/a&gt;           &lt;br /&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;         &lt;p&gt;The MOM Server detected that the following services on computer &amp;quot;(null);NetLogon&amp;quot; are not running. These services are required for push agent installation. To complete this operation, either start the required services on the computer or install the MOM agent manually by using MOMAgent.msi located on the product CD.            
&lt;br /&gt;Operation: Agent Install             &lt;br /&gt;Remote Computer Name: servername.domain.com Install account: DOMAIN\account             &lt;br /&gt;Error Code: C000296E             &lt;br /&gt;Error Description: Unknown error 0xC000296E &lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="76"&gt;         &lt;p&gt;C000296E &lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="309"&gt;         &lt;p&gt;1.&amp;#160; Netlogon service is not running.&amp;#160; It must be set to auto/started &lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;         &lt;p&gt;The MOM Server detected that the following services on computer            &lt;br /&gt;&amp;quot;winmgmt;(null)&amp;quot; are not running&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="76"&gt;C000296E&lt;/td&gt;        &lt;td valign="top" width="309"&gt;1.&amp;#160; WMI services not running or WMI corrupt&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;         &lt;p&gt;The MOM Server detected that the Windows Installer service (MSIServer) is disabled on computer &amp;quot;servername.domain.com&amp;quot;. This service is required for push agent installation. To complete this operation on the computer, either set the MSIServer startup type to &amp;quot;Manual&amp;quot; or &amp;quot;Automatic&amp;quot;, or install the MOM agent manually by using MOMAgent.msi located on the product CD.            
&lt;br /&gt;Operation: Agent Install             &lt;br /&gt;Install account: DOMAIN\account             &lt;br /&gt;Error Code: C0002976             &lt;br /&gt;Error Description: Unknown error 0xC0002976&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="76"&gt;         &lt;p&gt;C0002976 &lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="309"&gt;         &lt;p&gt;1.&amp;#160; Windows Installer service is not running or set to disabled – set this to manual or auto and start it.&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;         &lt;p&gt;The Agent Management Operation Agent Install failed for remote computer servername.domain.com.            &lt;br /&gt;Install account: DOMAIN\account             &lt;br /&gt;Error Code: 80070643             &lt;br /&gt;Error Description: Fatal error during installation.             &lt;br /&gt;Microsoft Installer Error Description:             &lt;br /&gt;For more information, see Windows Installer log file &amp;quot;C:\Program Files\System Center Operations Manager 2007\AgentManagement\AgentLogs\servernameAgentInstall.LOG             &lt;br /&gt;C:\Program Files\System Center Operations Manager 2007\AgentManagement\AgentLogs\servernameMOMAgentMgmt.log&amp;quot; on the Management Server. &lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="76"&gt;         &lt;p&gt;80070643 &lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="309"&gt;1.&amp;#160; Enable the automatic Updates service…. Install the agent – then disable the auto-updates service if desired.&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;Call was canceled by the message filter&lt;/td&gt;        &lt;td valign="top" width="76"&gt;80010002&lt;/td&gt;        &lt;td valign="top" width="309"&gt;Install latest SP and retry. 
One server that failed did not have Service pack installed&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;The MOM Server could not find directory \\I.P.\C$\WINDOWS\. Agent will not be installed on computer &amp;quot;name&amp;quot;. Please verify the required share exists.&lt;/td&gt;        &lt;td valign="top" width="76"&gt;80070006&lt;/td&gt;        &lt;td valign="top" width="309"&gt;         &lt;p&gt;1.&amp;#160; Manual agent install&lt;/p&gt;          &lt;p&gt;Possible locking on registry?&lt;/p&gt;          &lt;p&gt;&lt;a href="http://www.sysadmintales.com/category/operations-manager/"&gt;http://www.sysadmintales.com/category/operations-manager/&lt;/a&gt;&lt;/p&gt;          &lt;p&gt;Try manual install.&lt;/p&gt;          &lt;p&gt;Verified share does not exist.&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;The network path was not found.&lt;/td&gt;        &lt;td valign="top" width="76"&gt;80070035&lt;/td&gt;        &lt;td valign="top" width="309"&gt;1.&amp;#160; Manual agent install          &lt;br /&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;The Agent Management Operation Agent Install failed for remote computer &amp;quot;name&amp;quot;. There is not enough space on the disk.&lt;/td&gt;        &lt;td valign="top" width="76"&gt;80070070&lt;/td&gt;        &lt;td valign="top" width="309"&gt;1.&amp;#160; Free space on install disk&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;The MOM Server failed to perform specified operation on computer &amp;quot;name&amp;quot;. The semaphore timeout period has expired.&lt;/td&gt;        &lt;td valign="top" width="76"&gt;80070079&lt;/td&gt;        &lt;td valign="top" width="309"&gt;         &lt;p&gt;NSlookup failed on server. 
Possible DNS resolution issue.&lt;/p&gt;          &lt;p&gt;Try adding dnsname to dnssuffix search list.&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;The MOM Server could not start the MOMAgentInstaller service on computer &amp;quot;name&amp;quot; in the time.&lt;/td&gt;        &lt;td valign="top" width="76"&gt;8007041D          &lt;br /&gt;          &lt;br /&gt;80070102&lt;/td&gt;        &lt;td valign="top" width="309"&gt;         &lt;p&gt;NSlookup failed on server. Possible DNS resolution issue.&lt;/p&gt;          &lt;p&gt;Verify domain is in suffix search list on management servers.&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;The Agent Management Operation Agent Install failed for remote computer &amp;quot;name&amp;quot;&lt;/td&gt;        &lt;td valign="top" width="76"&gt;80070643&lt;/td&gt;        &lt;td valign="top" width="309"&gt;1.&amp;#160; Ensure automatic updates service is started          &lt;br /&gt;2.&amp;#160; Rebuild WMI repository           &lt;br /&gt;3.&amp;#160; DNS resolution issue&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;The Agent Management Operation Agent Install failed for remote computer &amp;quot;name&amp;quot;. Another installation is already in progress.&lt;/td&gt;        &lt;td valign="top" width="76"&gt;80070652&lt;/td&gt;        &lt;td valign="top" width="309"&gt;         &lt;p&gt;Verify not in pending management. 
If yes, remove and then attempt installation again.&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;The MOM Server detected that computer &amp;quot;name&amp;quot; has an unsupported operating system or service pack version&lt;/td&gt;        &lt;td valign="top" width="76"&gt;80072977&lt;/td&gt;        &lt;td valign="top" width="309"&gt;Install latest SP and verify you are installing to Windows system.&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;Not discovered&lt;/td&gt;        &lt;td valign="top" width="76"&gt;&amp;#160;&lt;/td&gt;        &lt;td valign="top" width="309"&gt;Agent machine is not a member of domain&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;Ping fails&lt;/td&gt;        &lt;td valign="top" width="76"&gt;&amp;#160;&lt;/td&gt;        &lt;td valign="top" width="309"&gt;1.&amp;#160; Server is down          &lt;br /&gt;2.&amp;#160; Server is blocked by firewall           &lt;br /&gt;3.&amp;#160; DNS resolving to wrong IP.&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;Fail to resolve machine&lt;/td&gt;        &lt;td valign="top" width="76"&gt;&amp;#160;&lt;/td&gt;        &lt;td valign="top" width="309"&gt;1.&amp;#160; DNS issue&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;The MOM Server failed to perform specified operation on computer &amp;quot;name&amp;quot;. 
Not enough server storage…&lt;/td&gt;        &lt;td valign="top" width="76"&gt;8007046A&lt;/td&gt;        &lt;td valign="top" width="309"&gt;1.&amp;#160; This is typically a memory error caused by the remote OS that the agent is being installed on.&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;There are currently no logon servers available to service the logon request.&lt;/td&gt;        &lt;td valign="top" width="76"&gt;8007051F&lt;/td&gt;        &lt;td valign="top" width="309"&gt;1.&amp;#160; Possible DNS issue&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;This installation package cannot be installed by the Windows Installer service. You must install a Windows service pack that contains a newer version of the Windows Installer service.&lt;/td&gt;        &lt;td valign="top" width="76"&gt;8007064D&lt;/td&gt;        &lt;td valign="top" width="309"&gt;1.&amp;#160; Install Windows Installer 3.1&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;The network address is invalid&lt;/td&gt;        &lt;td valign="top" width="76"&gt;800706AB&lt;/td&gt;        &lt;td valign="top" width="309"&gt;         &lt;p&gt;Possible DNS name resolution issue.&lt;/p&gt;          &lt;p&gt;Tried nslookup on server name and did not get response.&lt;/p&gt;          &lt;p&gt;Verify domain is in suffix search list on management servers.&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;The MOM Server failed to perform specified operation on computer servername.domain.com&lt;/td&gt;        &lt;td valign="top" width="76"&gt;80070040&lt;/td&gt;        &lt;td valign="top" width="309"&gt;1.&amp;#160; Ensure agent push account has local admin rights&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;The MOM Server detected that the actual NetBIOS name SERVERNAME is not same as the given NetBIOS name provide for 
remote computer SERVERNAME.domain.com.&lt;/td&gt;        &lt;td valign="top" width="76"&gt;80072979&lt;/td&gt;        &lt;td valign="top" width="309"&gt;1.&amp;#160; Correct DNS/WINS issue.          &lt;br /&gt;2.&amp;#160; Try pushing to NetBIOS name&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;&amp;#160;&lt;/td&gt;        &lt;td valign="top" width="76"&gt;&amp;#160;&lt;/td&gt;        &lt;td valign="top" width="309"&gt;&amp;#160;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="362"&gt;&amp;#160;&lt;/td&gt;        &lt;td valign="top" width="76"&gt;&amp;#160;&lt;/td&gt;        &lt;td valign="top" width="309"&gt;&amp;#160;&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=3192043" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/kevinholman/archive/tags/agents/default.aspx">agents</category></item><item><title>Agent Pending Actions can get out of synch between the Console, and the database</title><link>http://blogs.technet.com/kevinholman/archive/2008/09/29/agent-pending-actions-can-get-out-of-synch-between-the-console-and-the-database.aspx</link><pubDate>Mon, 29 Sep 2008 21:57:59 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3130040</guid><dc:creator>kevinhol</dc:creator><slash:comments>4</slash:comments><comments>http://blogs.technet.com/kevinholman/comments/3130040.aspx</comments><wfw:commentRss>http://blogs.technet.com/kevinholman/commentrss.aspx?PostID=3130040</wfw:commentRss><wfw:comment>http://blogs.technet.com/kevinholman/rsscomments.aspx?PostID=3130040</wfw:comment><description>&lt;p&gt;When you look at your agent pending actions in the Administration pane of the console.... 
you will see pending actions for things like approving a manual agent install, agent installation in progress, approving agent updates, like from a hotfix, etc.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;This pending action information is also contained in the SQL table in the OpsDB - agentpendingaction&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;It is possible for the agentpendingaction table to get out of synch with the console, for instance, if the server was in the middle of updating/installing an agent - and the management server Healthservice process crashed or was killed.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;In this case, you might have a lingering pending action, that blocks you from doing something in the future.&amp;#160; For instance - if you had a pending action to install an agent, that did not show up in the pending actions view of the console.&amp;#160; What might happen, is that when you attempt to discover and push the agent to this same server, you get an error message:&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&amp;quot;One or more computers you are trying to manage are already in the process of being managed.&amp;#160; Please resolve these issues via the Pending Management view in Administration, prior to attempting to manage them again&amp;quot;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/AgentPendingActionscangetoutofsynchbetwe_C463/ss2_2.jpg"&gt;&lt;img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="138" alt="ss2" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/AgentPendingActionscangetoutofsynchbetwe_C463/ss2_thumb.jpg" width="501" border="0" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;The problem is - they don't show up in this view!&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;To view the database information on pending actions:&lt;/p&gt;  &lt;blockquote&gt; 
  &lt;p&gt;&lt;strong&gt;select * from agentpendingaction&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;You should be able to find your pending action there - that does not show up in the Pending Action view in the console, if you are affected by this.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;To resolve - we should first try and reject these &amp;quot;ghost&amp;quot; pending actions via the SDK... using powershell.&amp;#160; Open a command shell, and run the following:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;get-agentpendingaction&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;To see a prettier view:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;get-agentpendingaction | ft agentname,agentpendingactiontype&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;To see a specific pending action for a specific agent:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;get-agentPendingAction | where {$_.AgentName -eq &amp;quot;servername.domain.com&amp;quot;}&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;To reject the specific pending action:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;get-agentPendingAction | where {$_.AgentName -eq &amp;quot;servername.domain.com&amp;quot;}|Reject-agentPendingAction&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;We can use the last line - to reject the specific pending action we are interested in.&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;You might get an exception running this:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;Reject-AgentPendingAction : Microsoft.EnterpriseManagement.Common.UnknownServiceE       &lt;br /&gt;xception: The service threw an unknown exception. See inner exception for details        &lt;br /&gt;. 
---&amp;gt; System.ServiceModel.FaultException`1[System.ServiceModel.ExceptionDetail]:        &lt;br /&gt; Exception of type 'Microsoft.EnterpriseManagement.Common.DataItemDoesNotExistExc        &lt;br /&gt;eption' was thrown.&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;If the reject fails with an exception like this, or if the problem pending action doesn't even show up in PowerShell, we have to drop down to the SQL database level.&amp;#160; &lt;strong&gt;This is a LAST resort and NOT SUPPORTED....&lt;/strong&gt; run at your own risk.&lt;/p&gt;  &lt;p&gt;There is a stored procedure to delete pending actions. Here is an example to run in a SQL query window against the OpsDB:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;exec p_AgentPendingActionDeleteByAgentName 'agentname.domain.com'&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Change 'agentname.domain.com' to the agent name that shows up in the SQL table but not in the console view.&lt;/p&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=3130040" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/kevinholman/archive/tags/database/default.aspx">database</category><category domain="http://blogs.technet.com/kevinholman/archive/tags/agents/default.aspx">agents</category></item><item><title>Helper Objects are not copied to gateways</title><link>http://blogs.technet.com/kevinholman/archive/2008/07/11/helper-objects-are-not-copied-to-gateways.aspx</link><pubDate>Fri, 11 Jul 2008 23:47:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3087529</guid><dc:creator>kevinhol</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.technet.com/kevinholman/comments/3087529.aspx</comments><wfw:commentRss>http://blogs.technet.com/kevinholman/commentrss.aspx?PostID=3087529</wfw:commentRss><wfw:comment>http://blogs.technet.com/kevinholman/rsscomments.aspx?PostID=3087529</wfw:comment><description>&lt;P&gt;Something I noticed today..... 
when you deploy a gateway server - the helper&amp;nbsp;object &lt;STRONG&gt;&lt;EM&gt;oomads.msi&lt;/EM&gt;&lt;/STRONG&gt; is not copied to the local \&lt;STRONG&gt;AgentManagement&lt;/STRONG&gt; directory for agent push.&amp;nbsp; &lt;/P&gt;
&lt;P&gt;This means that if you have a DC in the untrusted forest, managed by a gateway, Oomads will not get copied or installed automatically.&amp;nbsp; You will need to manually copy and install Oomads on any DCs you will monitor in the untrusted forest.&lt;/P&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=3087529" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/kevinholman/archive/tags/Active+Directory+MP/default.aspx">Active Directory MP</category><category domain="http://blogs.technet.com/kevinholman/archive/tags/agents/default.aspx">agents</category><category domain="http://blogs.technet.com/kevinholman/archive/tags/UI+Console/default.aspx">UI Console</category></item><item><title>A report to show all agents missing a specific hotfix</title><link>http://blogs.technet.com/kevinholman/archive/2008/06/27/a-report-to-show-all-agents-missing-a-specific-hotfix.aspx</link><pubDate>Sat, 28 Jun 2008 01:11:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3079710</guid><dc:creator>kevinhol</dc:creator><slash:comments>12</slash:comments><comments>http://blogs.technet.com/kevinholman/comments/3079710.aspx</comments><wfw:commentRss>http://blogs.technet.com/kevinholman/commentrss.aspx?PostID=3079710</wfw:commentRss><wfw:comment>http://blogs.technet.com/kevinholman/rsscomments.aspx?PostID=3079710</wfw:comment><description>&lt;p&gt;This is a continuation of my previous post on determining which agents are missing a hotfix:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a title="How do I know which hotfixes have been applied to which agents-" href="http://blogs.technet.com/kevinholman/archive/2008/06/24/how-do-i-know-which-hotfixes-have-been-applied-to-which-agents.aspx" mce_href="http://blogs.technet.com/kevinholman/archive/2008/06/24/how-do-i-know-which-hotfixes-have-been-applied-to-which-agents.aspx"&gt;How do I know which hotfixes have been applied to which agents-&lt;/a&gt;&lt;/p&gt; 
&lt;/blockquote&gt;  &lt;p&gt;I wrote up a report that lets you paste a KB article number into the report as a parameter, and then it will show all agents that are potentially missing that hotfix.&amp;#160; This will help you easily find agents that need to be patched but got missed for some reason.&lt;/p&gt;  &lt;p&gt;You can run this report if you create the SQL reporting data source as specified in my previous post: &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a title="Creating a new data source for reporting against the Operational Database" href="http://blogs.technet.com/kevinholman/archive/2008/06/27/creating-a-new-data-source-for-reporting-against-the-operational-database.aspx" mce_href="http://blogs.technet.com/kevinholman/archive/2008/06/27/creating-a-new-data-source-for-reporting-against-the-operational-database.aspx"&gt;Creating a new data source for reporting against the Operational Database&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Once imported - it will show up in the console.&amp;#160; Open the report, and paste in any KB article number for an OpsMgr hotfix you have applied.&amp;#160; The number MUST begin and end with &amp;quot;%&amp;quot;.... 
such as %951380% as shown:&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areporttoshowallagentsmissingaspecificho_F1C7/image_2.png" mce_href="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areporttoshowallagentsmissingaspecificho_F1C7/image_2.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="392" alt="image" src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areporttoshowallagentsmissingaspecificho_F1C7/image_thumb.png" width="770" border="0" mce_src="http://blogs.technet.com/blogfiles/kevinholman/WindowsLiveWriter/Areporttoshowallagentsmissingaspecificho_F1C7/image_thumb.png" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;font color="#ff0000"&gt;The report is attached below:&lt;/font&gt;&lt;/strong&gt;&lt;/p&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=3079710" width="1" height="1"&gt;</description><enclosure url="http://blogs.technet.com/kevinholman/attachment/3079710.ashx" length="8255" type="application/octet-stream" /><category domain="http://blogs.technet.com/kevinholman/archive/tags/agents/default.aspx">agents</category><category domain="http://blogs.technet.com/kevinholman/archive/tags/Reporting/default.aspx">Reporting</category><category domain="http://blogs.technet.com/kevinholman/archive/tags/Hotfix/default.aspx">Hotfix</category></item></channel></rss>