I was brought up in an era before computers and computer science degrees, and so of necessity did an engineering degree. This leaves me at a disadvantage when discussing the details of computer science, such as the nuances of OO or the latest search algorithms. Like all black clouds, however, it does have a silver lining, because much of my engineering training covered how to understand different domains such as schedule, budgets, power and heat as well as the required functionality. In addition, how to do trade-offs between these different domains was extensively and sometimes painfully drummed into us. This is of course much of what computer architecture is about; not so much coding as test, schedule, etc.
I was reminded of this the other day when I went to see a major trading bank in the city. The nice thing about working with city institutions is that they seem to have a large number of very smart people there; not that I am saying there aren’t smart people elsewhere, it’s just that there seem to be more of them in the city. Must be these huge salaries I keep reading about (Note to self: “Why do I work for a pittance at Microsoft?”). I had a long meeting with a couple of their chief architects where we discussed many and varied topics across a wide range of domains. I felt drained at the end of it but it was fun. I think they thought so too; one commented that “it is great to sharpen my mind every now and again”, which is much how I felt.
Anyway, one of the topics that came up was power and space, not typically a subject that comes up in application architecture discussions. The architect explained that they support new applications by adding more blades to the racks in their computer centre. The previous week they had added another blade for a new application and it had overloaded the rack. This in turn had overloaded the power supply for that space, and so a large section of their DP centre had powered down. This of course had very serious implications for their business. So the architect’s question was “How can we avoid this in the future?” Should the application designer provide power requirements for their application, especially in this day of services and distributed design? Whose responsibility is it to ensure that the power consumption of a new application (a strange concept) doesn’t overload the DP centre? How can this be monitored and verified? All good questions without simple answers (we don’t have a Windows power centre manager... yet :-)).
This of course fits well into the DSI initiative and the SDM: you could add power requirements as part of the application design, and these would be stored in the SDM. Then the infrastructure architect would add the power constraints of the DP centre as part of the data centre designer. Finally, a validation program would check one against the other. Whilst all these concepts are supported by Whitehorse, DSI and the SDM, and I know we are working on other designers, I don’t know whether this scenario can be catered for. I will ask around and post the result of my inquiries here.
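To make the "check one against the other" step concrete, here is a minimal sketch in Python. It is not how the SDM or Whitehorse do it; the class names, fields and the idea of a single peak-watts figure per application are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AppRequirement:
    """Power figure the application designer would declare (hypothetical)."""
    name: str
    peak_watts: float


@dataclass
class Rack:
    """Power constraint the infrastructure architect would declare (hypothetical)."""
    name: str
    power_budget_watts: float
    apps: List[AppRequirement] = field(default_factory=list)


def validate_placement(rack: Rack, new_app: AppRequirement) -> Tuple[bool, float]:
    """Check a proposed deployment against the rack's power budget.

    Returns (fits, headroom_watts). A negative headroom is the size of
    the overload that the new application would cause.
    """
    already_used = sum(app.peak_watts for app in rack.apps)
    headroom = rack.power_budget_watts - already_used - new_app.peak_watts
    return headroom >= 0, headroom
```

Run before the blade goes in, a check like this would flag the overload at design time rather than in the DP centre. The same comparison could be repeated one level up, with racks checked against the budget of the room they sit in.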
This whole area of engineering in computing seems like a rich vein of interest so I will revisit it in future blogs.
I believe this issue should be covered by Capacity Planning, which should include power, space and cooling requirements, not just CPUs and terabytes of storage. The problem is that many large companies have stopped doing serious capacity planning because they perceive it as an unnecessary overhead, given that hardware is cheaper than people.
A secondary factor is that nobody wants to pay for infrastructure, so if all projects are funded by business units then investing ahead in adequate environmental controls becomes very difficult. Until we learn the hard way...
Amen to that. I think people are learning at last. What tools can we provide, though?
I'm not sure what tools would be useful for a software architect. Maybe there should be a set of tools to help System Engineers plan entire sites (both data centres and offices). Everything from printer requirements and network requirements (both wired and wireless) to server room size and temperature control. Software such as Exchange servers, Active Directory requirements etc. could also be included. Put in the number of employees and the dimensions of the offices, and the app spits out recommended values for your infrastructure. Starting from the other side, you could add all your existing parameters and the tools would tell you where you would be likely to get issues.
A subset of these tools could then be made available to software architects to help in planning their projects. They could simply add their requirements into the existing site plan, which could then spit out a list of issues that might arise.
That's my braindump anyway, should possibly start a bit smaller :-)
I would hope that any rack design would be specified to remove heat from a system assuming all processors are running continuously at 100% utilisation. Anything else, from an engineering perspective, would be under-specified. Civil engineers would be sued if a bridge collapsed due to a traffic jam, and this should not be any different for IT engineers.
I just don't get why the automatic trackback creation failed, but I posted a message on http://blogs.msdn.com/mikezand/archive/2004/06/24/165134.aspx
If you notice why the trackback failed, please let me know; I'm eager to learn ;-)
Thanks for sharing great stories, Michael!
Thanks Mike. I don't know why the trackback failed, I'm afraid.
Tools that would be of use would include an infrastructure planning tool that included assigning equipment to facilities and enabled power/cooling/space to be defined both for the equipment type and the facility itself. Then to be able to do "What ifs" based on growth trends/consolidation opportunities. So I could drag an application instance onto a server and it would tell me how much capacity to add and what the impact would be on the environmentals.
And of course, to get all the equipment suppliers to provide web services that enabled queries on their equipment capacity and environmental requirements in a standard XML format that could automatically populate the model!
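The "what if" idea in this comment can be sketched in a few lines of Python. This is only an illustration under assumed names: `Footprint`, `headroom` and `shortfalls` are hypothetical, and a real tool would take its figures from the equipment suppliers rather than from hand-entered values.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

# The environmental dimensions tracked for both equipment and facilities.
DIMENSIONS = ("power_watts", "cooling_watts", "rack_units")


@dataclass(frozen=True)
class Footprint:
    """Either what one piece of equipment needs or what a facility offers."""
    power_watts: float
    cooling_watts: float
    rack_units: int


def headroom(capacity: Footprint,
             installed: Iterable[Footprint],
             candidate: Optional[Footprint] = None) -> Dict[str, float]:
    """Remaining capacity per dimension, optionally after adding a candidate."""
    items = list(installed)
    if candidate is not None:
        items.append(candidate)
    return {
        dim: getattr(capacity, dim) - sum(getattr(item, dim) for item in items)
        for dim in DIMENSIONS
    }


def shortfalls(result: Dict[str, float]) -> List[str]:
    """Names of the dimensions that would be over capacity."""
    return sorted(dim for dim, left in result.items() if left < 0)
```

Calling `headroom` with and without the candidate gives the before and after picture for dragging an application instance onto a server, and `shortfalls` names the environmentals that would need more capacity. Growth trends could be modelled by calling it again with scaled footprints.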
OK, so I asked around and the answer I got was that this is all about SDM extensibility and is being covered in the Indy project, http://research.microsoft.com/%7Eefp/, part of System Center. Very cool and well worth a look.