Kevin Holman's System Center Blog

Posts in this blog are provided "AS IS" with no warranties, and confer no rights. Use of included script samples is subject to the terms specified in the Terms of Use.

Are your agents restarting every 10 minutes? Are you sure?

**Updated 12-21-2009

This post is OLD, and the way this process works has changed.

Please see the updated post at:

http://blogs.technet.com/kevinholman/archive/2009/12/21/the-new-and-improved-guide-on-healthservice-restarts-aka-agents-bouncing-their-own-healthservice.aspx

Comments
  • Kevin, any idea why when trying to override the Health Service Private Bytes Threshold monitor you cannot override it for a group of computer objects that you've created if you choose override "for a group"?  The only groups that appear in the list to choose from are groups that are created by management packs, etc., not any user created groups.  In order to override for a group you've created you have to choose override  "for all objects of another type", view all targets, and then choose the group you've created.

    Conversely, the rule for Monitoring Host Private Bytes Threshold will allow you to override for a group that you've created if you choose "override for a group".

    Great articles, keep up the good work.

  • This is a bug in SP1 - when you try and do this from the Authoring pane.

    Open Health Explorer - create the override there, and you will see your group.  :-(

  • Hi Kevin, as you suggested in the blog above, I changed the Health Service Private Bytes threshold to 200MB for a particular SCOM agent. But I still see my agent not getting restarted.

    I am running the nworks application on this particular computer and also see my healthservicestore.edb size growing to 220 MB. Is this normal?

    I also see the below errors in the event log on the same agent machine.

    Event ID:4506

    Event Source:HealthService

    Description:

    Data was dropped due to too much outstanding data in rule "many" running for instance "many" with id:"many" in management group "XXXXXX".

  • The Nworks MP causes an agent to act as a proxy - and load workflows and collect data for potentially a HUGE number of machines.  Therefore it is common that this agent will need more memory for this.

    This is documented in the Nworks documentation I believe.

    I would disable these monitors for those instances - and measure how much they *consume* and how fast they consume it - and stop bouncing them.

    If you must set a value - I would start at 600-800MB for privatebytes.... and monitor the consumption closely.

  • I've created the custom rules and the custom view per your post, but the 'Source' column on my alerts shows "Microsoft(R) Windows(R) Server 2003, Enterprise Edition" rather than the actual server name as shown on your screenshot. How did you get it to show the server name in the Source column of your custom view?

  • You need to personalize the view - and add "Path" next to "source".

    This is true for any alert view - depending on the target class of an alert, the FQDN will either be in Source or Path.... not always both and not consistent.

  • When working with multihomed agents you need to do the override in the other management group as well.

  • Yep - I just ran into that yesterday with a customer.  He had some restarting all the time - because they were multi-homed with his pre-prod management group.

  • Do you think they restart because they are multi-homed?

  • No - they are restarting because BOTH management groups are monitoring the SAME healthservice process... and the lower value from either MG will bounce the service.

    For this reason, overrides need to be kept in sync across both management groups.

  • By the way - this has changed quite a bit in R2 - when I have some time - I am going to document how that works.... and how it is different than SP1.

  • The agents do not restart, but I get the "MonitoringHost.exe Handle Count Threshold Alert Message". How could I measure the actual value used, so that I could override with something close to it?

    Thanks,

    Dom

  • Hi Kevin,

    I have seen this issue reported in many newsgroups but have not been able to get a satisfactory answer.

    Using your example above, I have created a Group in a new unsealed MP, say 'NewMP'

    Then I go to the monitor for Private Bytes threshold for the health service (which is stored in the DefaultMP)

    I try to override this monitor for a group, and once the group list comes up, I don't see the group I created earlier.

    In fact, even if I create a group and store that in the default MP, I am still not able to view it when I try to override that monitor.

    Can you shed some light on this?

    Thanks,

    Mahmood

  • @ Mahmood:

    What you are describing is a known SP1 bug - when you override a monitor in the authoring pane of the UI - custom groups are not displayed.

    Simply go to discovered inventory - target HealthService - and open health explorer.  From here - you can create the overrides for your groups.

    NOTE - there is a new update for SP1 that YOU MUST apply - which is an MP update which sets these monitors to 300MB by default - up from 100MB - which resolves 95% of the restart problems.  Only a handful of machines will need more than that, such as very large 64bit Exchange and SQL servers, possibly DNS and DHCP roles.
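
Two points from the thread above lend themselves to a quick bit of arithmetic: on a multi-homed agent the lowest threshold from any management group is the one that bounces the service, and before picking an override value it helps to measure how fast private bytes actually grow. The following is only a rough sketch in Python, not anything from the SCOM API. The management group names, the sample values, and both function names are hypothetical and exist purely for illustration.

```python
from typing import Mapping, Sequence, Tuple

MB = 1024 * 1024  # bytes per megabyte


def effective_threshold(thresholds_by_mg: Mapping[str, int]) -> Tuple[str, int]:
    """Return the (management group, threshold) pair with the lowest threshold.

    Assumption taken from the thread: every management group monitors the
    same HealthService process, so the lowest value is the one that triggers.
    """
    if not thresholds_by_mg:
        raise ValueError("at least one management group is required")
    name = min(thresholds_by_mg, key=thresholds_by_mg.get)
    return name, thresholds_by_mg[name]


def growth_per_minute(samples: Sequence[Tuple[float, int]]) -> float:
    """Average growth in bytes per minute between the first and last sample.

    Each sample is (minutes since start, private bytes), oldest first.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (b1 - b0) / (t1 - t0)


if __name__ == "__main__":
    # Hypothetical values: an override exists in Prod but not in PreProd.
    thresholds = {"Prod": 600 * MB, "PreProd": 100 * MB}
    mg, limit = effective_threshold(thresholds)
    print(f"{mg} triggers first at {limit // MB} MB")

    # Hypothetical measurements taken ten minutes apart.
    rate = growth_per_minute([(0, 80 * MB), (10, 120 * MB)])
    print(f"growth: {rate / MB:.1f} MB per minute")
```

With these made-up numbers the pre-production group still restarts the agent at 100 MB even though production was raised to 600 MB, which is why the override has to be repeated in each management group.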
