by anandeep on October 27, 2006 12:27pm


I loved doing development in a research and university environment. You got to write cool code, prove new ideas, break new ground and generally ended up with bragging rights to say “I did an image recognition algorithm on a multi-layer architecture implementing reactive and planning parallelism on an autonomous robot!”  The code had to work on your workstation or maybe on a demo machine once.  Once you wrote the code, the only people who touched the system were hapless graduate students implementing the next big idea. They had to come to you and you could then dazzle them with your insight!  This was “sexy development”!

When I moved to industry and wrote software for day-to-day use – things changed.  Now you had all those people with “manager” titles telling you what to do, and those people called “testers” who told you why your code sucked (you couldn’t logically argue your way out of that because the weasels usually had proof!).  Of course, being a consummate professional, you adapted. You got the religion of “bullet proof code” and worked on making sure the testers only had “fit and finish” bugs filed against you, which the intern could work on.  That was still fun – a different challenge, maybe not as “pure” as designing a neat new algorithm, but pretty good nevertheless!

You got past the testers, but when they integrated the components that you had bullet-proofed to run end-to-end or user acceptance tests, unexpected stuff happened. Who would have thought that they would configure the machine that way, or that another non-surface component could pass you null strings? Now you had to plan not only for the testers – but also for other developers and those pesky sysadmin guys.  How did they become sysadmins? They couldn’t tell a polynomial solution from a log n solution anyway!  But being nothing if not adaptable, you adapted.  You now built bullet proof AND idiot proof code.  (My father, a military pilot and flight instructor, used to say when teaching flight safety, “Nothing is foolproof because fools are so ingenious!”)  It got a little boring at times but you still had the satisfaction of building something that was “engineered”.

I thought I had shipped the product, but I found I couldn’t sit back and relax. The support guys were making insinuations against my code. It didn’t work, they said – and I hadn’t put in the right level of granularity in the logs for them to do a diagnosis.  This had nothing to do with Computer Science – any bozo could write stuff to the log. Why didn’t the intern do it? What do you mean he can’t make sense of my code? Yeah, I do know my code best. I guess it’s the right thing to do. Certainly not as fun as designing, bullet proofing and idiot proofing new code, but good supportability is a “sine qua non” for a well done project!

Is that the end of it? No. Further design and coding needs to be done to make the software more manageable, to make the logs more systematic, and to make sure that the product works when it’s deployed to multiple configurations, performs well and fails gracefully.

Unless you specialize in a certain aspect of manageability, reliability or diagnosis – this is not “sexy” development.  I probably wouldn’t get as much satisfaction from designing event logs as I would from designing a new search algorithm.

I was getting paid to do all this (OK, so it was my own startup, but I was getting paid in VC money!) and it was still very hard. We did do it, but it took lots of coaxing of our developers to pay attention to this.  They all preferred to work on the next release that had all the sexy features, even though they knew that for the startup to succeed and for them to still have jobs, the unsexy stuff needed to be done and done RIGHT!

When you are working for the “love of the game” and not money, like in Open Source – who coaxes you?  Who does the unsexy stuff? Are there enough people who specialize in the esoteric aspects of event logs that this is not a problem? Or do users who need the feature “just do it” and add the code to the community version? Or are things slipping through the cracks?

I did a sweep of the usual-suspect Linux developer mailing lists and found that there is concern about whether unsexy stuff gets done. Here is a typical comment that I saw:

“I think that the only issue with Open Source boils down to this:

The things that nobody wants to do, but somebody has to.

Nobody wants to think about documentation. Or user interfaces. These things are hard, tedious, and a hell of a lot more boring than actually coming up with stuff to "make things work".”  (from here)

Documentation is famously one of those things that is considered “unsexy” (well, OK, in commercial software too).  There are efforts like Grokdoc to make documentation of Open Source projects sexy by making it a priority. But the “who does unsexy?” issue is a real concern in Open Source.

We ran into a similar issue with event logs. You know, the text stuff you write so that you can find out later what happened.  At the lab we just did an investigation of whether we could tell from the syslog and from console messages if one of our boxes had crashed. We were a little taken aback by how many times we couldn’t tell what states the machine had gone through.
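To make the question concrete, here is a rough Python sketch of the kind of check involved. It is only an illustration under loud assumptions: the two marker patterns and the sample lines are made up, real syslog messages differ between distributions and daemons, and this is not the tooling used in the lab investigation. The idea is simply that a boot with no shutdown marker before it is a boot the log cannot explain.

```python
import re

# Hypothetical marker patterns (assumptions for illustration only);
# real syslog text varies by distribution, kernel and logging daemon.
BOOT = re.compile(r"kernel: .*Linux version")
CLEAN_SHUTDOWN = re.compile(r"shutdown: .*(halt|reboot|poweroff)")


def classify_boots(lines):
    """Label each boot found in the log.

    'unknown'     - first boot in the log, nothing earlier to compare with
    'clean'       - a shutdown marker was logged since the previous boot
    'unexplained' - no shutdown marker since the previous boot
    """
    labels = []
    seen_boot = False
    shutdown_seen = False
    for line in lines:
        if BOOT.search(line):
            if not seen_boot:
                labels.append("unknown")
            elif shutdown_seen:
                labels.append("clean")
            else:
                labels.append("unexplained")
            seen_boot = True
            shutdown_seen = False  # start tracking the new uptime period
        elif CLEAN_SHUTDOWN.search(line):
            shutdown_seen = True
    return labels


# Made-up sample log lines, not taken from a real machine.
sample = [
    "Oct 20 08:00:01 box1 kernel: Linux version 2.6.18",
    "Oct 21 17:30:00 box1 shutdown: system reboot requested",
    "Oct 21 17:31:10 box1 kernel: Linux version 2.6.18",
    "Oct 23 03:12:45 box1 kernel: Linux version 2.6.18",
]
print(classify_boots(sample))
```

Note what the sketch cannot do: an “unexplained” boot might be a crash, a power cut, or someone leaning on the reset button. Telling those apart takes log entries that record the states the machine went through, which is exactly what was missing.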

On doing some investigation we found that the most influential project addressing this issue, the Evlog project (mostly supported by IBM), has been quiet since 2004. The code is used internally within IBM but was not mainstreamed into the Linux kernel.

How does one get unsexy stuff like this into the Linux kernel so that it is comparable to UNIX/VMS/Windows?

I contend that it is critical to Open Source that attention be paid to the event logs. They are critical in making any operating system reliable. VMS/UNIX/Windows all went through the process of making their event logs more meaningful – and this has helped make them much more reliable.

We will be addressing this further in the next couple of weeks – stay tuned!