As I mentioned earlier, Microsoft has released the specifications for our binary Office 97-2003 file formats.  Joel Spolsky of Fog Creek Software has a great take on the announcement (and specs):

If you started reading these documents with the hope of spending a weekend writing some spiffy code that imports Word documents into your blog system, or creates Excel-formatted spreadsheets with your personal finance data, the complexity and length of the spec probably cured you of that desire pretty darn quickly. A normal programmer would conclude that Office’s binary file formats:

  • are deliberately obfuscated
  • are the product of a demented Borg mind
  • were created by insanely bad programmers
  • and are impossible to read or create correctly.

You’d be wrong on all four counts. With a little bit of digging, I’ll show you how those file formats got so unbelievably complicated, why it doesn’t reflect bad programming on Microsoft’s part, and what you can do to work around it.

The rest of the post is an enlightening look at the history of the file formats, the design decisions that shaped them, and recommendations on how (and when) to develop against the spec.