I was recently pointed to a presentation about Open XML that piqued my curiosity. It found its way to me because it included my picture, but the content is what's on my mind. I take the Open XML discussion pretty seriously; I've had very interesting and stimulating discussions about Open XML with a lot of folks, but I've also seen a lot of the nonsense that makes the discussion cloudy and difficult (see below).
The slide references a comment I made in a ZDNet Australia interview about the advantages of XML-based document formats over binary formats for enabling better security. Specifically, representing file formats in XML makes parsing simpler, because XML documents are expressed using a pre-defined (in this case public) schema. They can be easier to parse than binary formats, which can be opaque and obscure even when you already have their documentation. Given a choice, I'm sure that 99 out of 100 developers would prefer to work with an XML-based format over a binary format, if only for the sake of simplicity, and my comment illustrates one of those reasons.
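To make the parsing point a little more concrete, here is a minimal Python sketch contrasting the two cases. It isn't taken from either specification: the XML fragment, its `para` element and `style` attribute, and the binary record layout are all hypothetical. A generic XML parser recovers named fields, while the binary reader only works because we already know the exact byte layout.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical XML fragment; element and attribute names are illustrative only.
XML_DOC = '<doc><para style="Heading1">Hello</para></doc>'

def parse_xml(text):
    # A stock XML parser finds fields by name; no layout knowledge is needed.
    root = ET.fromstring(text)
    para = root.find("para")
    return para.get("style"), para.text

# Hypothetical binary record: 2-byte little-endian style id,
# 2-byte little-endian text length, then the UTF-8 text bytes.
BINARY_DOC = struct.pack("<HH", 1, 5) + b"Hello"

def parse_binary(data):
    # Without documentation of this exact layout, these bytes are opaque.
    style_id, length = struct.unpack_from("<HH", data, 0)
    text = data[4:4 + length].decode("utf-8")
    return style_id, text
```

Note that the binary reader also has to trust the length field it reads, which is exactly where careful parsing matters.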
The deck goes on to state that Open XML allows "arbitrary binary blobs of data", citing this as a "security hole" (this isn't really anything new; it has been rehashed on several forums). I'll take a guess and say the presenter probably missed a few important references about ODF (search for "Binary" in the text), or within the ODF spec itself… Section 9.3 of the ODF specification discusses how frames can contain "Objects represented either in the OpenDocument Format or in a Object Specific Binary Format." Section 9.3.5 describes the ability to add "plug-ins" to documents for "a media type that is not usually handled natively by office application software." Base64Binary is a core data type of ODF, as described in section 16.1.
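As a rough illustration of how a base64Binary value carries arbitrary bytes inside an XML document, here is a Python sketch. The unnamespaced `binary-data` element name is an assumption for illustration, not copied from the ODF schema; the point is only that any byte sequence survives the round trip through XML text.

```python
import base64
import xml.etree.ElementTree as ET

def embed(payload: bytes) -> str:
    # Encode arbitrary bytes as base64 text inside a (hypothetical) XML element.
    elem = ET.Element("binary-data")
    elem.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(elem, encoding="unicode")

def extract(xml_text: str) -> bytes:
    # Parse the XML and decode the base64 text back into the original bytes.
    elem = ET.fromstring(xml_text)
    return base64.b64decode(elem.text)
```

For example, `embed(b"\x00\x01")` yields `<binary-data>AAE=</binary-data>`: perfectly valid XML, with contents that are whatever bytes the producer chose.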
Of course, both Open XML and ODF allow the embedding of binary content. So it's not clear to me why we're picking on the binary DevMode structure when (so-called "arbitrary") binary data is supported in both formats (and probably in every other authoring file format in widespread use today). If the implication is that ODF doesn't allow the inclusion of "arbitrary" binary information, the implication is absurd and false. By this logic, I'd guess it's worth asking OASIS whether we should expect binary data to be removed from a future version of the ODF spec. I know the answer to that question; it's not even worth asking.
I haven't heard the deck presented, nor do I plan to tear the rest of it down (might be fun for a rainy day), but it looks to me like whoever created this slide deck is attempting to criticize a fundamental purpose of XML. Or maybe this is a criticism of the entire list of XML-based format specifications. Nothing about this criticism is specific to Open XML… it is an indictment of XML-based document formats in general.
It seems odd to pick a fight with yourself (… very Fight Club-ish… "I am Jack's Self-deprecating Argument"…).
The discussion about parsing XML formats vs. binary formats is equally applicable to Open XML, ODF, UOF, CDF, or (pick your XML-based format of the day). These slides contribute nothing to the XML formats discussion other than confusion. Part of the reason the XML formats debate exists is that (I think) we at least agree that XML offers us better opportunities for document format management than a binary format would… but according to their point of view, I seem to be mistaken on that point. I must also be seeing things, because when I read the ODF spec, I see a lot of "arbitrary" binary data types in there too… obviously I've missed something.
Silly me :)
This is not the first time that OOXML has been attacked by the pro-ODF crowd in a "people in glass houses shouldn't throw stones" kind of way, and I'm sure it won't be the last.
There have been complaints about "rushing", but I contend that more man-hours have been spent on Open XML than on ODF, most likely in both design and review.
I also posted on either Brian's or Jason's blog about the issue raised by a national body (NB) that objected to the term "ole". That term exists in ODF, as does "DDE".
Did they lie awake at night after the ODF finalisation saying "damn, I wish I'd made sure they didn't use the term "ole" in there, I must make sure I stamp this kind of thing out in the next open document format that comes along"?
It would be so refreshing to hear them come out with some honesty like:
ODF is based on Sun's OpenOffice functionality; Open XML is based on Microsoft Office functionality. They were both reviewed and refined based on feedback from experts, and both give developers a much easier way to create and interact with office documents than was previously possible.
They both have their strengths and weaknesses: ODF is probably easier to implement but currently less functionally rich, while Open XML is probably more difficult to implement but currently more functionally rich.
Both standards have been implemented by third parties in their commercial software.
If your target is mainly users of Microsoft Office or applications that interact with Microsoft Office, then it makes good business sense to implement Open XML support.
If your target is mainly users of OpenOffice and its derivatives, or IBM's products and the applications that interact with them, then it makes good business sense to implement ODF support.
If you have a very wide target market, then you should implement support for both.
We will certainly consider implementing ODF in addition to our existing Open XML support, should customer demand for ODF change from when we last measured it.
- An amusing typo (I don't know if it was in the original) - "OOXML allows the inclusion of arbitrary binary blobs of data in ways that could be abused, MY MALICIOUS DOCUMENT AUTHORS"
Was this a statement, or an invitation?
Maybe someone should really write a virus that will make all *.docx files viruses themselves.
So that we would see what should be there and what should not...
From my standpoint, what's interesting is the opportunity Open XML opponents have missed... Open file formats that support Office functionality are an opportunity for others to more fully integrate file formats, much like the countless implementations of Microsoft Office binary formats.
It's worth taking these formats and committing them to standardization for a lot of reasons. Regardless of market, potential, etc., it's just smart to take Open XML as a standard, if only to commit the current spec to the public record.
I'd be surprised if anything positive was said... given the rest of the deck (is that really a dead bear?), it doesn't seem like it was intended to improve the perception of Open XML.
But I think we agree on many points. How document formats are parsed makes a huge difference to their security. http://support.microsoft.com/kb/935865 is an example of some of the work we've done in that area. Parsing is critically important to both binary and XML implementations.
Also, I'd agree that binary formats are going to be around for a while, given how many there are, and their widespread use today.
On the relative maturity of XML vs binary implementations, remember Office has been parsing both since Office 2000 (with XML taking an increasingly prominent role). So even the XML implementations in Office are fairly mature.
But I think we're saying similar things here. ALL XML formats reference binary data, and (at least it seems to me) the parsing and implementation matter a lot.
See my post from back in June-
Thanks Gareth... that comment makes a lot of sense, and I couldn't agree more. (and isn't it the whole point? :)
I was pointed at a document created by the ODF Alliance (with a creation date of Feb 6th) that discusses
[quote]This should be stored in XML, not in some undefined application-dependent format.[/quote]
If the devmode data is loaded directly into a driver, its format is defined by that driver. Storing the data in XML would just mean putting tags around driver-defined data formats. That would only hide the fact that the data format is still an application- or driver-defined binary format, which still allows data manipulation and carries the same security risk as storing the data as binary.
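A small Python sketch of that argument, with everything in it hypothetical (the `devmode` element name and the 2-byte size header are made up, not the real DEVMODE layout): the XML wrapper parses fine whether or not the payload is sane, so the driver-side parser still has to do its own validation.

```python
import base64
import struct
import xml.etree.ElementTree as ET

# Hypothetical driver-defined layout: 2-byte little-endian declared size,
# followed by that many bytes of driver settings.

def wrap(blob: bytes) -> str:
    # Putting tags around the blob; the XML says nothing about its contents.
    elem = ET.Element("devmode")
    elem.text = base64.b64encode(blob).decode("ascii")
    return ET.tostring(elem, encoding="unicode")

def driver_parse(blob: bytes) -> bytes:
    # The driver still has to bounds-check its own binary format.
    if len(blob) < 2:
        raise ValueError("truncated header")
    (declared,) = struct.unpack_from("<H", blob, 0)
    if declared > len(blob) - 2:
        raise ValueError("declared size exceeds payload")
    return blob[2:2 + declared]

def load(xml_text: str) -> bytes:
    # XML parsing succeeds for any base64 payload; only the wrapper is checked.
    blob = base64.b64decode(ET.fromstring(xml_text).text)
    return driver_parse(blob)

honest = struct.pack("<H", 3) + b"abc"
lying = struct.pack("<H", 500) + b"abc"  # claims 500 bytes, carries 3
```

Both `wrap(honest)` and `wrap(lying)` are well-formed XML; only the driver-level check catches the lying size field.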
If the data were truly driver-independent XML, it would require either arbitrary conversion into each and every printer driver's format, or for each and every printer driver to be able to accept standardized XML settings.
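To sketch the first option, here is a hypothetical Python example: a made-up `print-settings` element with two attributes, and two made-up driver layouts. The same XML settings need a separate converter for each driver's binary format, and every new driver adds another entry.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical driver-independent settings; names are illustrative only.
SETTINGS_XML = '<print-settings orientation="landscape" copies="2"/>'

def read_settings(xml_text):
    elem = ET.fromstring(xml_text)
    return {
        "landscape": elem.get("orientation") == "landscape",
        "copies": int(elem.get("copies", "1")),
    }

def to_driver_a(s):
    # Hypothetical layout A: 1-byte orientation flag, 2-byte little-endian copies.
    return struct.pack("<BH", 1 if s["landscape"] else 0, s["copies"])

def to_driver_b(s):
    # Hypothetical layout B: 4-byte big-endian copies, then an orientation code
    # (2 = landscape, 1 = portrait).
    return struct.pack(">IB", s["copies"], 2 if s["landscape"] else 1)

# One converter per driver format.
CONVERTERS = {"driver-a": to_driver_a, "driver-b": to_driver_b}

def convert(xml_text, driver):
    return CONVERTERS[driver](read_settings(xml_text))
```

The alternative, drivers that accept the XML directly, just moves the same per-driver mapping into each driver.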