by Frank Chism on October 20, 2006 03:06pm


"`When I use a word,' Humpty Dumpty said, in rather a scornful tone, `it means just what I choose it to mean -- neither more nor less.' "

-- Through the Looking-Glass, Lewis Carroll

Where’s the glory?
I work in the cluster business. I can tell you that all too often I have felt like Alice trying to hold a conversation with Humpty Dumpty in Looking Glass Land. This usually occurs when I’m talking to someone new to cluster computing or someone who comes from a different thread of the industry than I do. My roots are in a thread that used number crunching to mean serious floating point arithmetic done by Fortran programs to simulate physical processes. Of course, some of the support routines and tools and even the operating system might be written in C, but Fortran ruled. Imagine my surprise when I found there was a ‘Number Crunchers Users Group’ in Seattle and they got together to discuss using spreadsheets. “Now where’s the glory in that?” I thought to myself.

Time marches on, but technology runs as fast as it can just to stay in one place.
Fortunately for me, the object-oriented police have provided me with just the right jargon to describe my predicament. Just consider that in any modern object-oriented language it is possible that + can mean any number of things. Humpty would be proud. In OOP, + means just exactly what the developer chooses it to mean. This is called overloading an operator. That may be OK for a compiler, but what about me? When I use cluster I am thinking of something that descended from the original Beowulf. No, not the King of the Geats. I mean the seminal work of those oft-sung NASA nerds who put together the first Beowulf compute clusters. When I say nerds, I am here to praise cluster creators, not heap dirt on them or their work. After all, they ain’t dead yet.
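To make Humpty’s kind of overloading concrete, here is a minimal Python sketch. The toy Vector class is purely illustrative -- not from any real library -- but it shows + meaning exactly what the developer chose it to mean:

```python
class Vector:
    """A toy 2-D vector whose '+' we get to define ourselves."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        # '+' here means component-wise addition -- because we chose it to.
        return Vector(self.x + other.x, self.y + other.y)

    def __repr__(self):
        return f"Vector({self.x}, {self.y})"


print(Vector(1, 2) + Vector(3, 4))  # Vector(4, 6)
```

Same symbol, entirely different meaning from integer addition -- the compiler (well, interpreter) is perfectly happy, and Humpty would be too.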

For example, I work for a company that has several cluster offerings. There are failover clusters, and load-balancing scale-out clusters, and my baby, compute clusters. Now that’s overloading. You can usually tell what kind of cluster we mean by the type of work we talk about feeding it. If you had one type of cluster in mind and I had another, and we kept talking long enough, we’d either figure out the root cause of the confusion or dismiss each other as idiots.

But wait. It gets worse.
Within my own little compute-centric world, two new terms have come into common usage: farm and grid. So how do I tell a farm from a cluster if both are eating compute intensive programs? And worse yet, how is a cluster or a farm related (or not) to a grid? I was recently told by a co-worker not to tell our customer that he had a cluster, because as far as he was concerned it was a grid. This is proof that technical correctness is not nearly as important as political correctness. As in politics, so in life.

I can’t claim to have invented farms, but I can certainly claim to be one of the first of the render farmers. I was working at an early Computer Generated Images (CGI) site that was falling behind schedule for a major (OK, it was a big deal to us) Hollywood movie. If we were to finish in time for the planned release, we needed to get our CGI effects generated at just about twice the rate we were running at on our current machine. Fortunately the little ol’ mainframe we were using, a Cray-1, had just been superseded by the Cray X-MP, which had two CPUs instead of one, and each CPU was about 50% faster than the Cray-1’s. In an example of embarrassingly parallel render farming, we ran odd-numbered frames on one thread, even-numbered frames on another, and ran a third thread to collate the frames and send them to the camera.
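That frame split is the textbook embarrassingly parallel pattern: every frame is independent, so any worker can take any frame and the only coordination needed is collating the results. A minimal modern sketch in Python -- where render_frame is just a stand-in for a real renderer, and a real farm would spread the work across separate processes or machines rather than threads:

```python
from concurrent.futures import ThreadPoolExecutor


def render_frame(n):
    # Stand-in for the real renderer; a production farm would run a
    # heavyweight CGI job here on a dedicated process or machine.
    return f"frame-{n:04d}"


def render_farm(frame_count, workers=2):
    # Embarrassingly parallel: frames are independent, so the work can
    # be divided among workers with no coordination between them.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands frames out to workers and collates the results
        # back into frame order, ready to send to the "camera".
        return list(pool.map(render_frame, range(frame_count)))


print(render_farm(6))
```

With two workers this is essentially the odd/even split on the X-MP; the collation step falls out of map() preserving input order.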

I can’t be blamed for grid at all. Well yes, some of the computers my company sold were ‘on the grid’, but I never thought of the grid as anything other than a route for users to do cool things with our machines. In fact I wasn’t sure that grid was anything other than a buzzword used to get NSF funding. Now, thanks to the efforts of the hardworking and unpaid volunteers at Wikipedia, I have at least one fixed mark to guide my wandering bark.

If a cluster on the grid failed over and no one was there to farm it, would it make any sense?
So, can we all agree on one set of definitions for clusters (several flavors, to be sure), farms, and grids? If not, I’m sure I’ll hear from the more assertive of the Port 25 readers. Perhaps we can then reach a group consensus, and I can start quoting the group mind of an entire community in defense of my own use of these terms without sounding too much like Humpty Dumpty making up meanings as I see fit.

Cluster: Making more than one computer behave as a single resource.

Failover or High Availability Cluster: A cluster specifically designed to perform functions in a manner that makes the service it provides continuous, even in the event of individual computer failures.

Load balancing or Scale-out Cluster: Generally a high availability cluster that, in addition to offering resiliency against individual computer failures, also offers additional capacity to deliver more of the intended service.

Compute Cluster: A cluster that is built as a single unit, treated as a single system, and tuned to perform compute intensive tasks either as a capacity engine (running lots of single-node jobs or many low-scale parallel jobs) or as a capability engine (running much bigger parallel jobs than a single node can accommodate).

Compute Farm: A cluster that uses a collection of computers, generally in a centralized location, to run many similar jobs in parallel for improved time to completion of a particular process. This is very similar to a Compute Cluster in capacity mode but the farm is not necessarily built to look like a single system.

Compute Grid: A heterogeneous farm that is spread out across a wider network or even the Internet but, more importantly, that is controlled by and conforms to the standards, concepts, and tools originating in the Globus Toolkit. It can be used in both capacity and capability mode but is generally a distributed collection of resources, not a single system.

I tried to turn the handle but—

That’s all for now. I enjoyed writing this and hope to hear from some of you about what you think of my proposed definitions and how they can be improved. Other items on my blog-fodder list are ‘The Parallel Imperative’ and ‘What the Heck is Parallel I/O Anyway?’

So, never stop studying and I’ll blog at you later.
- Frank