Just an FYI: we have published a new Knowledge Base article that describes how to troubleshoot issues where an agent, management server or gateway in System Center Operations Manager 2007 or System Center Essentials 2007 and 2010 is in a gray state. Or grey state, depending on your location and preferred spelling.
The article is way too long to post here, but below are some of the scenarios it covers:
Scenario 1: Only a few agents are impacted, and they report to different management servers. The agents stay in this state all the time. Clearing the agent cache resolves the problem temporarily; however, the problem comes back after a few days.
Scenario 2: Only a few agents are impacted, and they report to different management servers. The agents stay in this state all the time. Clearing the agent cache doesn't help.
Scenario 3: All the agents reporting to one particular management server/gateway are grayed out.
Scenario 4: All the agents reporting to one particular management server flip-flop intermittently between gray and healthy.
Scenario 5: All the agents in the environment flip-flop intermittently between gray and healthy.
...and much, much more. If you ever find yourself troubleshooting OpsMgr 2007 or SCE, this article is a must-read.
Check out the following new KB for all the details: KB2288515 - Troubleshooting gray agent states in System Center Operations Manager 2007
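The scenarios differ along a few observable axes: how many agents are affected, whether they share one management server, whether they stay gray or flip-flop, and whether clearing the agent cache helped. As a rough triage aid, here is a minimal Python sketch that maps such observations onto the scenario numbers listed in this post. The `Agent` class, its state labels ("healthy", "gray", "flapping") and `match_scenario` are hypothetical illustrations, not part of OpsMgr or the KB article; the article itself remains the guide for what to do once a scenario matches.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    # Hypothetical record of what an administrator observed for one agent.
    name: str
    management_server: str
    state: str  # assumed labels: "healthy", "gray" (gray all the time), "flapping"
    # True = clearing the agent cache helped (even temporarily),
    # False = it did not help, None = cache not cleared yet.
    cache_clear_helped: Optional[bool] = None


def match_scenario(agents: List[Agent]) -> Optional[int]:
    """Return the matching scenario number (1-5), or None if nothing fits."""
    affected = [a for a in agents if a.state != "healthy"]
    if not affected:
        return None
    flapping = all(a.state == "flapping" for a in affected)
    steady_gray = all(a.state == "gray" for a in affected)

    # Scenario 5: every agent in the environment flip-flops.
    if len(affected) == len(agents) and flapping:
        return 5

    by_server = defaultdict(list)
    for a in agents:
        by_server[a.management_server].append(a)
    affected_servers = {a.management_server for a in affected}

    # Scenarios 3 and 4: all agents of one particular server are affected.
    if len(affected_servers) == 1:
        server = next(iter(affected_servers))
        if len(affected) == len(by_server[server]):
            if steady_gray:
                return 3
            if flapping:
                return 4

    # Scenarios 1 and 2: only some agents, spread over several servers,
    # gray all the time; they differ in whether clearing the cache helped.
    if len(affected_servers) > 1 and len(affected) < len(agents) and steady_gray:
        helped = [a.cache_clear_helped for a in affected]
        if all(h is True for h in helped):
            return 1
        if all(h is False for h in helped):
            return 2
    return None
```

Scenario 5 is checked first, so an environment with a single management server whose agents all flip-flop is reported as scenario 5 rather than 4; that ordering is a design choice of this sketch, not something the KB prescribes.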
Is there a scenario for when the Root Management Server is grayed out and restarting the Health Service doesn't help?
Is there a way to get a view showing only the grayed-out machines?
Heartbeat failure, failed to connect, ping: these are okay, but one more step would help as well...
In my case, all the SCOM servers and DB servers are healthy. The agents (machines) in the domain reporting to the RMS and MS are healthy. However, for some reason the workgroup machines reporting to the RMS are flipping from healthy to gray and vice versa.
The necessary channels and ports are open between the workgroup machines and the SCOM servers. There are no error events on the workgroup machines or the SCOM servers (except the heartbeat failure errors for those workgroup servers).
I can see event 21024 on the workgroup machine saying: "OpsMgr's configuration may be out-of-date for management group XXXXXX, and has requested updated configuration from the Configuration service. The current (out-of-date) state cookie is xx xx xx xx xx xx xx xx xx xx xx".
I have restarted the Health Service several times and also cleared the Health Service State folder. I have tried every option I can think of, but nothing seems to fix this issue.
The certificate is fine, as event ID 20053 is logged every time I restart the service on the workgroup machine or run MOMCertImport on it.
The machines do get monitored; however, most of the time they are grayed out, and they turn healthy on their own without any changes to our IT infrastructure.
Any idea what exactly is causing this issue?
Any help will be much appreciated.
Thanks in advance