Network Bandwidth Utilization for the various OpsMgr 2007 Roles
I recently started owning performance and scale for OpsMgr, and it is definitely one of the most interesting and challenging areas I have ever worked on. I know the first question popping up in most of your minds is: why is console performance so darn slow in OpsMgr 2007? There are various reasons for this, which I will divulge at another time, but the one thing I will assure you is that console performance with Service Pack 1 is a lot faster (Geo Metro to BMW M3 faster, if that is a valid comparison). I wanted to dedicate today’s blog to network bandwidth utilization, as it is a question a lot of customers have been asking. There are essentially three sections to discuss: a) Agent to Root Management Servers\ Management Server\ Gateway Server, b) Root Management Server\ MS to Database, and c) Audit Forwarders to Audit Collectors.
a) Agent to Root Management Servers\ Management Server (MS)\ Gateway Servers
The amount of data sent to an MS depends on the Management Packs (MPs) you have in your environment and on how you as the end user have tuned these MPs. Some MPs by default send more data to MSs than others, and to get an idea of which MPs these are you can view the table attached in the doc. Data sent between agents and MSs is always compressed. In our test environment with the Active Directory, Base OS, DNS and OpsMgr MPs, the estimated bandwidth utilization was about 500 bytes per sec on a single agent and about 75 kilobytes per sec on the server for 150 agents. We also ran a test with all the out of the box MPs plus additional stress test MPs (simulating real world load) and measured about 200 Kbytes per sec received by a Management Server for 2000 agents. As these numbers show, the data sent between the agent and MS is well compressed.
One common question we get is: "The minimum bandwidth requirement for agent to MS in the supported configurations document is 64Kbps, but my customer has a few servers that only have 56Kbps. Can we support this?" If your agent data packet size is small enough for the bandwidth you have, then it should not be a problem. Our Microsoft product support folks (PSS) have always been very accommodating, so while you may be outside the support requirements they will not stop helping you troubleshoot your issues. This is definitely one of the big perks of using Microsoft products and having Microsoft support.
Gateway Servers are basically proxy agents that tunnel data from multiple agents to a Management Server. We have seen bandwidth utilization of about 22 Kbytes per sec received by Gateway Servers for 400 agents with all the out of the box Management Packs and some stress Management Packs.
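To put the figures above side by side, here is a minimal Python sketch that converts each measurement into an average per-agent rate and compares that rate with a 56Kbps link. It assumes 1 KB = 1000 bytes and that load is spread evenly across agents; the function names are made up for illustration, and averages say nothing about bursts.

```python
# Figures quoted in the post: (bytes/sec received, number of agents).
# Assumption: 1 KB is treated as 1000 bytes.
MEASUREMENTS = {
    "MS, AD/Base OS/DNS/OpsMgr MPs": (75_000, 150),
    "MS, all out of the box MPs + stress MPs": (200_000, 2000),
    "Gateway, all out of the box MPs + stress MPs": (22_000, 400),
}


def per_agent_bytes_per_sec(total_bytes_per_sec, agents):
    """Average rate per agent, assuming load is spread evenly."""
    return total_bytes_per_sec / agents


def link_utilization(bytes_per_sec, link_kbps):
    """Fraction of a link (in kilobits/sec) consumed by a byte rate."""
    return (bytes_per_sec * 8) / (link_kbps * 1000)


for name, (total, agents) in MEASUREMENTS.items():
    rate = per_agent_bytes_per_sec(total, agents)
    share = link_utilization(rate, 56)
    print(f"{name}: {rate:.0f} bytes/sec per agent, {share:.1%} of a 56Kbps link")
```

On these averages, even the 500 bytes per sec agent uses only a small fraction of a 56Kbps link, which is consistent with the answer given to the 56Kbps question.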
b) Root Management Server (RMS)\ Management Server (MS) to Database (DB) and Data Warehouse (DW)
In OpsMgr 2007 the RMS and MS both write directly to the DB and DW. There are no DTS jobs like we had in MOM 2005. Since the RMS and MS write directly to the DB and DW, the data is not compressed and the size of the data is larger as well. We recommend that customers keep their RMS\ MS close to the DB and DW and have fast links between them. It is much better to have agents in remote geographic locations report to an MS than to have Management Servers in remote geographic locations write to a DB and DW.
c) Audit Forwarders to Audit Collectors
While I am no expert in Audit Collection as a feature I will share with you the data we have collected for ACS so far. My colleague Joseph is the definitive ACS guru and should be the ultimate source of information on this topic.
ACS forwards events in near real time rather than batching them together. This is different from how MOM 2005 sends events. Therefore, when we say ACS bandwidth utilization is 100 bytes per event and a system generates ~27 events/sec, you can literally translate that to 2.7KB per sec.
If there’s a loss of network connectivity, the forwarder will resend all security events that have not been confirmed as written to the DB by the collector. The forwarder sends a heartbeat (in the form of an event) to the collector every 45 seconds. If the collector does not receive 3 heartbeats (the default) from the forwarder, the collector will drop the connection, and the forwarder will automatically re-initiate the connection (if it is alive).
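As a rough illustration of the heartbeat numbers above, the following Python sketch works out how long a collector would wait before dropping a silent forwarder. It assumes the 3 missed heartbeats are consecutive and that the timeout is simply interval times limit; the names are hypothetical and this is not the actual ACS implementation.

```python
HEARTBEAT_INTERVAL_SEC = 45   # heartbeat period quoted in the post
MISSED_HEARTBEATS_LIMIT = 3   # default quoted in the post


def drop_timeout_sec(interval=HEARTBEAT_INTERVAL_SEC, limit=MISSED_HEARTBEATS_LIMIT):
    """Silence, in seconds, after which the connection is dropped.

    Assumption: the misses are consecutive, so timeout = interval * limit.
    """
    return interval * limit


def should_drop(last_heartbeat_at, now):
    """True when the forwarder has been silent for the whole timeout."""
    return (now - last_heartbeat_at) >= drop_timeout_sec()


print(drop_timeout_sec())        # 135
print(should_drop(0, 100))       # False: only two intervals have elapsed
print(should_drop(0, 135))       # True: three intervals without a heartbeat
```

Under these assumptions a forwarder that goes quiet is dropped after roughly 135 seconds with the default settings.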
The size of a typical security event in transit from the managed system to the ACS collector is usually less than 100 bytes. Once recorded in the ACS SQL database, a typical security event takes less than 0.5KB. With ACS functionality enabled on the managed system, typical agent CPU utilization is less than 1% and memory usage is about 4-6 MB.
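The per-event sizes above can be turned into a quick sizing estimate. This Python sketch treats the "less than" figures as upper bounds (100 bytes on the wire, 0.5KB taken as 500 bytes in the database) and uses the ~27 events/sec example rate from earlier in this section; the function name and the 1 KB = 1000 bytes convention are assumptions for illustration.

```python
SECONDS_PER_DAY = 86_400


def acs_estimate(events_per_sec, wire_bytes_per_event=100, db_bytes_per_event=500):
    """Upper-bound ACS estimates from the per-event sizes quoted in the post.

    Assumption: 0.5KB is taken as 500 bytes and the event rate is constant.
    """
    events_per_day = events_per_sec * SECONDS_PER_DAY
    return {
        "network_bytes_per_sec": events_per_sec * wire_bytes_per_event,
        "events_per_day": events_per_day,
        "db_bytes_per_day": events_per_day * db_bytes_per_event,
    }


estimate = acs_estimate(27)
print(f"network: {estimate['network_bytes_per_sec'] / 1000:.1f} KB/sec")
print(f"events per day: {estimate['events_per_day']:,}")
print(f"database growth: {estimate['db_bytes_per_day'] / 1e9:.2f} GB/day")
```

Swap in an event rate measured on your own systems to get a figure for your environment; real rates vary over the day, so treat the result as a ballpark.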
In one of his mail threads to an OpsMgr discussion alias, Joseph wrote a quick and dirty script, attached to this blog (rename it back to .vbs), that helps you count the number of events generated per sec on the local computer.
(It can run against a remote computer, just supply the computer name as an argument, but it seems to be slow…) Run it like “CScript SecurityEventPerSecond.vbs >> NumOfEvtsGenPerSec.csv” and just load the csv in Excel to calculate the average. This can be useful in situations where it is not possible to install a pilot ACS collector in order to measure the incoming event rate (by looking at the perf counter ACS Collector\Incoming Event per Sec).
Satya Vel | Program Manager | System Center |
As an admin of an RMS with 500+ servers in Europe and very long experience in operating MOM, let me say that this is a very, very bad joke... If your team had done a better job over the last 2 yrs, you would not have to promote silly things like "better op console performance with SP1..."
There are so many, many bugs within SCOM 2007 that I cannot believe it... I myself have raised 12 (!) bug (!) incidents at PSS in the last 2 months alone, all belonging to SCOM... :-(
Hi, any information like this is helpful, please keep it coming.
Where is the table you mention in section a)?
While the console and other issues are painful, there IS light at the end of the tunnel. While we too have documented and experienced numerous issues with Operations Manager, the improvements and enhancement have, thus far, far outweighed the negatives. We will see soon enough whether SP1 resolves these issues and are looking forward to the RC...
As far as bandwidth utilization, our testing confirms Satya's comments. In fact, we are seeing better results than those documented above. See http://helpmemanage.blogspot.com/2007/10/mom-2005-versus-scom-2007-bandwidth.html for a utilization report comparing MOM 2005 with SCOM 2007.
The opsmgr product team just wrote a very nice post about network bandwidth utilization specified per
Guido... I completely feel for you and I promise we are trying our best to make performance better. But please keep filing bugs and DCRs as your feedback is very important to us.
You mentioned in point c) that a security event is typically 100 bytes and takes 0.5KB (so, 512 bytes) in the database. So, do you mean that my DC that creates 1GB of security log per day will take around 5GB of space in the database???
What's the unit on the Average column in the document? Is this Kbps?
Steve, those numbers are the average number of <dataitems> per <MP> per day.
Sylvain, it is on average 0.45KB per event with indexes. The reason is that we do not store the whole security event; we store the GUID of each security event, which gives us all the information about a particular event. The ID references a particular event, so we do not need to store thousands of security events.
Can someone tell me where to find the EventsPerSecond.vbs mentioned in this post?
Thanks in advance!
DISREGARD my last post please :| I did have the .vbs script... I just didn't recognize the name of the .zip file......
A month or so ago, one of our esteemed Program Managers named Satya Vel wrote a great blog post describing
Two blog posts I've seen about this important topic. http://blogs.technet.com/smsandmom/archive/2007