I was recently pointed to a presentation about Open XML that raised my curiosity. It found its way to me because it included my picture, but the content is what's on my mind. I take the Open XML discussion pretty seriously; I've had very interesting and stimulating discussions about Open XML with a lot of folks, but I've also seen a lot of the nonsense that makes the discussion cloudy and difficult (see below).
The slide references a comment I made in a ZDNet Australia interview about the advantages of XML-based document formats over binary formats for enabling better security. Specifically, having file formats represented in XML makes parsing simpler, because XML documents are expressed using a pre-defined (in this case public) schema. They can be easier to parse than binary formats, which can be opaque and obscure even when you already have their documentation. Given a choice, I'm sure that 99 out of 100 developers would prefer to work with an XML-based format over a binary format, if only for the sake of simplicity, and my comment here illustrates one of those reasons.
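To make the parsing point concrete, here is a minimal sketch in Python. The markup is a simplified, made-up stand-in for document XML (it is not the real Open XML or ODF schema), and the function name is my own. The only point is that a stock XML parser can pull text out by element name, with no byte-offset arithmetic.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified document markup (NOT the actual Open XML schema).
SAMPLE = """<document>
  <body>
    <p><r><t>Hello, </t></r><r><t>world.</t></r></p>
    <p><r><t>Second paragraph.</t></r></p>
  </body>
</document>"""


def paragraphs(xml_text):
    """Return the text of each <p> element, joining the <t> runs inside it."""
    root = ET.fromstring(xml_text)
    return ["".join(t.text or "" for t in p.iter("t")) for p in root.iter("p")]


print(paragraphs(SAMPLE))
```

Doing the same against a binary format would mean walking record headers and offsets by hand, which is where a lot of parsing bugs tend to live.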
The deck goes on to state that Open XML allows "arbitrary binary blobs of data", citing this as a "security hole" (this isn't really anything new; this has been rehashed on several forums). I'll just take a guess and say the presenter probably missed a few important references about ODF (search for "Binary" in the text), or within the ODF spec itself… Section 9.3 of the ODF specification discusses how frames can contain "Objects represented either in the OpenDocument Format or in a Object Specific Binary Format." Section 9.3.5 describes the ability to add "plug-ins" to documents for "a media type that is not usually handled natively by office application software." Base64Binary is a core data type of ODF, as described in section 16.1.
Of course both Open XML and ODF allow the embedding of binary content. So I guess it's not clear to me why we're picking on the binary DevMode structure when (so-called "arbitrary") Binary data is supported in both formats (and probably every other authoring file format that is in widespread use today). If the implication is that ODF doesn't allow the inclusion of "arbitrary" binary information the implication is absurd and false. By this logic I'd guess it's worth a question to OASIS if we should expect binary data to be removed from a future version of the ODF spec? – I know the answer to that question; it's not even worth asking.
I haven't heard the deck presented, nor do I plan to tear the rest of it down (might be fun for a rainy day), but it looks to me like whoever created this slide deck is attempting to criticize a fundamental purpose of XML. Or maybe this is a criticism of the entire list of XML-based format specifications. Nothing about this criticism is specific to Open XML… it is an indictment of XML and document formats.
It seems odd to pick a fight with yourself (… very Fight Club-ish… "I am Jack's Self-deprecating Argument"…).
The discussion about parsing XML formats vs. binary formats is equally applicable to Open XML, ODF, UOF, CDF, or (pick your XML-based format of the day). These slides contribute nothing to the XML formats discussion other than confusion. Part of the reason that the XML formats debate exists is that (I think) we at least agree that XML offers us better opportunities for document format management than a binary format would… but according to their point of view, I seem to be mistaken on that point. I must also be seeing things, because when I read the ODF spec, I see a lot of "arbitrary" binary data types in there too… obviously I've missed something.
Silly me :)
Some things are just good no matter how you look at them. Managing macro-size products can be strange sometimes, because you lose perspective on individuals. Product managers spend a lot of time (at least the good ones do) trying to connect with people to get a tactile "feel" for what their product does, how it is used, and so on. But every once in a while, you get involved with something that you just feel good about from the very first moment.
For me, the Open XML to DAISY Translator is one of those projects.
I'm not sure I've been as proud to be involved in anything in my software career as I am this project. I was only involved in the early planning stages of the thing; Reed Shaffner has been taking point on this for us, working with the DAISY consortium and Sonata on the project. But I am just thrilled to see this project progressing. Open XML and DAISY have permanently raised my awareness and interest level in software for Accessibility. I've now joined the large group of folks at our company who are passionate about the topic. I can only hope to have more chances in the future to do good things in this area.
Yesterday at our press event, I was with Reed and DAISY Consortium Secretary General George Kerscher, where they were discussing the DAISY project. I wanted to point back to this video of the coverage to share it. It is wonderful to see. The full article is here: http://blogs.inquirer.net/techaddicts/2008/01/17/ivdo-daisy-makes-reading-documents-easier-for-the-blind/
Brian is in a unique position of being a TC-45 member and Microsoft employee, and his post really illustrates how much work has to go into the Ecma efforts around ISO standardization. I've been in the trenches with Brian on file formats for quite a while now; I've seen how hard the work is. I am always happy to applaud my fellow Buckeye fans, especially when they work so hard to carry the ball forward on Open XML and interoperability.
If you didn't see Brian's post, it's worth a read before you proceed here. In essence, it says that Microsoft will adjust the existing binary format program so that the documentation will be available directly from the Web, and offered under the Open Specification Promise. It goes on to say that Microsoft has committed to sponsoring a binary to Open XML conversion tool as an Open Source project. These developments are a response to national body comments on Open XML in the ISO/IEC standardization process.
It's important to recognize that binary format documents are important digital assets. This conversion project is important because it effectively makes the conversion of documents in binary formats to Open XML even simpler by providing a reference implementation that can be reused. It also provides more options for people to transition from binary to Open XML formats, with or without Office.
In addition, the OSS project will make it even easier for an array of products that currently support the binaries to transition to a more developer-friendly XML format. If you believe in the OSS model, you'll agree that offering the source code for converting binary documents to Open XML documents should stimulate a community of software products that perform this valuable service. I think of the scores of content management software providers who implement the current binary formats and who are faced with the question of what to do about file formats… They are happy about Open XML because they get an easier file format to develop against, but they are still working out the best way to begin a transition. Having a reference implementation will provide an easier starting place in the transition process.
Adobe, Sun and IBM have already received our binary file format specifications for Word, Excel and PowerPoint. Each of these companies currently ships products which support .DOC, .XLS, or .PPT. (For example, Adobe Acrobat includes a "Save as .DOC" feature, and StarOffice, Notes, and other applications support the existing binaries.) The OSS project should provide them with an additional mechanism to understand how to convert binary to Open XML (and subsequently translate between XML formats if they choose), and also to handle the binary formats in other applications. Many, many other companies have also licensed the formats.
In the end, these announcements are really a "rising tide" for everyone interested in file formats. It benefits our partners, standards participants, competitors, and hopefully answers a lot of national body comments as well.
I don't have the type of job where I distill global economic indicators into sound bites and loose predictions, but I do spend cycles trying to understand what people are going to be thinking about in the future. For any software product manager, keeping an eye on ".next" is as important as any other part of the job. It's especially tricky when you work on a product where decisions that involve "1% of our users" affect somewhere in the neighborhood of 5,000,000 people.
Much of what any product manager does (I've held the role at 3 companies) is answering an impossibly random variety of questions & challenges. I've compared it before to a 24-hour-a-day, 365-day-a-year pop quiz where any topic is fair game. We get asked by many people about many things… what should this cost?, how should it be licensed?, why does my <widget> work this way?, where can I find a partner for <solution>?, when will my bug be fixed?, what features do we need in the next version?, what should we call this feature?, can we have a case study? What does your product adoption look like? Which industry events should we attend? ... Topics emerge, simmer on the surface for a while, and then subside. It can be very tricky to keep this many balls in the air at one time.
Much like the weather, good product management requires a lot of good forecasting. Successfully anticipating the questions makes life a lot easier, given the variety & depth of what we cover on a daily basis. But it is also the opportunity we have to do a bit of agenda-setting; where we can accept the feedback from the community at large, and take action to help shape how our product is behaving in the world. Good forecasting and successful reflection of market requirements is at the very heart of innovation and technology leadership; it is the scrum of technological advancement.
2006 was about preparing for the launch of Office 2007. Last year (and for the foreseeable future), Open XML was a big topic; we knew going in that a file format change in Office was going to be a big deal. Compatibility, standardization, adoption, accessibility are among the many dimensions of the problem that require active management. Security was a bet we made in 2007 also, and our investment there resulted in the Office Security Guide. Groove and InfoPath continue to be important investment areas, as these products become a more prominent solution for front-ending line of business applications.
Emerging interest areas from our customers for 2008 include application virtualization, (even more) accessibility, migration to VBA.NET and the .NET framework for Office solutions, (even more) security, and of course Open XML. There are countless other topics to consider, but these are among the important ones.
Application Virtualization has heated up considerably in the last 9 months. The emergence of SoftGrid and Vista Desktop Optimization is driving strong interest in using application virtualization technology to solve application compatibility issues, to allow the use of multiple Office versions on a single desktop, and even to simplify the deployment, patching and management processes. We're seeing how virtualizing Office with SoftGrid is delivering on its potential of a lower TCO. For us, 2008 is about helping people understand when, why and how to consider using SoftGrid and Office in combination to simplify an IT environment. It's a great opportunity for us to provide some bottom-line benefit.
Accessibility continues to be a critical topic for Office. There was a big push in the political arena last year to confuse the role of a document format with the role of an application in enabling assistive technology, but hopefully this will subside in 2008. Every discussion I had about document format accessibility last year successfully concluded by reasoning that the functionality of the consuming applications matters much, much more than the format itself. Nonetheless, we will deliver the DAISY Translator for Open XML in 2008, and the Open XML Ballot Resolution Meeting will include several important spec changes that incorporate greater accessibility support. Our product-related investments for accessibility will look toward future Office releases, engagement with our assistive technology partners, and a bunch of other stuff. Reed Shaffner on my team will be gearing up a blog in 2008 to share progress (I'll link to him once he gets underway.)
Visual Studio 2008 is an important advancement for getting control of the code written behind custom Office solutions. We're seeing many of the macro developers of the past mature into .NET and managed code solutions, so Visual Studio and VSTO will help these folks migrate into a more secure and stable environment. They're also gearing up on blog activity. For us, like so many other things :), we'll spend a lot of time helping to explain when, why and how customers will use VSTO as a solution for migrating macros into a more robust environment.
For Open XML, the Ballot Resolution Meeting in Geneva is just around the corner. This means that national bodies voting on Open XML will have an opportunity to review the proposed changes that respond to the 3,500 national body comments. The result of this period of the voting is a much improved specification; everything from the clarity and organization of the standard down to the notation syntax for form fields has been updated. We're hopeful that this will address the comments raised by national bodies sufficiently to obtain ISO approval.
InfoPath and Groove are achieving critical mass. Both are central to Office Business Applications, and based on the deployments we're seeing in the wild today, we're feeling pretty good about progress on both products. As we invest more in helping our customers achieve line of business interoperability with the Office client applications, our progress here will only strengthen. This is a real bright spot for Office, and will have a lot of focus in 2008 as well.
Needless to say 2008 will be a huge year for me and my team as well as the folks we're partnered with inside and outside the company.
We're excited, refreshed and geared up for the new year.
The first compatibility pack for Open XML was released in November of 2006. This add-in for Office XP and 2003 (which also works with Office 2000 in some cases) enables users to open, edit and save Open XML files using prior releases of Office. The compatibility pack is designed to ease the pain of introducing a new file format. As we learned in Office 97, changing file formats can create some significant deployment and compatibility challenges. It is a migration that we're handling with all due care and consideration for our customers' business continuity requirements.
The availability of the compatibility pack has been an interesting discussion. Today, the compatibility pack is only available as a manual download. In other words, Microsoft does not "push" the compatibility pack to users using its update tools. IT organizations or end users must manually download the tool, and deploy or install it themselves. Many organizations have (literally) demanded this be made available as an automatic update, while others would be dissatisfied with this, claiming that Microsoft is "forcing" Open XML onto its existing user community.
We decided to make it available as a manual download, and not as an automatic update, and during the first 12 months of its release, the compatibility pack has been successfully downloaded over 20 million times. This means that 20 million people have elected to manually download this 26.2MB software to their computer. This is a significant number of people adding Open XML to their environment.
Why do people download the compatibility pack? – to use Open XML, of course. If a user of Office 2003 or XP tries to read/edit an Open XML file type, Windows will offer the "Use a web service to find the appropriate program" dialog box to direct you to the compatibility pack download site. If you have updated Office with the latest service packs, you will get a similar (but more user-friendly) dialog box that directs you to the same place.
On the download center, users select their language, get the bits and off they go. The 20 million people who have already completed this demonstrate that Open XML is already in widespread use today, about 1 year after its formal introduction with Office 2007. This is in addition to the adoption Open XML is gaining in the broader software community: http://www.openxmlcommunity.org.
What is also interesting about the compatibility pack statistics is that they do not reflect deployment by IT organizations… It takes only one download by the IT desktop management team to prepare thousands of desktops with the compatibility pack (I have worked on a handful of these directly). The usage numbers for the compatibility pack are likely to be significantly higher than the download statistics indicate.
I won't explain in detail how these download numbers compare to things like the ODF Translator for Microsoft Office, but you can look at the download stats on SourceForge for that one and see for yourself. Being a product person (not a standards person) I'm far more interested in what users are doing with the software, so I don't have a positive or negative view of ODF (nor do I care to swordfight with the ODF community). But the statistics do speak pretty clearly about the preference of Microsoft Office users…
I believe in the marketing lexicon this is typically referred to as "rapid traction," but it does come with the responsibility of sustainability (speaking of buzzwords) and maintenance. Our commitment to the standard goes hand-in-hand with our long-term commitment to IT organizations and end users who have taken the opportunity to incorporate Open XML into their Office environment. Instead of the theoretical arguments and "what-if" scenarios that the document format standards community gets into, longevity of Open XML is a real consideration based only on the activity of people who use our products. In other words, Open XML is here to stay.
That's pretty exciting news.
Happy Holidays everybody.
I spend a lot of time working on the adoption of the Open XML Formats,
For IT organizations, it can be a daunting task to migrate document formats in Office, and the benefits are not always immediately obvious. Microsoft spent a fair bit of time on tools and guidance to make the introduction of Open XML easier, and I'll drive deep on those in future posts. But I wanted to use this opportunity to discuss one of the primary reasons why you should let Open XML in, and how it can help. This will be the first in a 3 part series on file size reduction, document "sanitization" and improvements in document format security.
A tangible benefit of Open XML is file size reduction. Reducing file sizes means lower storage costs and reduced bandwidth consumption. Particularly for those paying for bandwidth on a meter, this can be quite helpful.
Why are Open XML files smaller? With Open XML and the Open Packaging Conventions, the file architecture is much more modular, and the parts are stored in a compressed ZIP archive. XML content lends itself very well to ZIP compression, so we do see great results for text-intensive files like word processing documents and spreadsheets. The benefits don’t translate as well for presentation files, because those tend to be image-intensive (and images, which are usually already compressed, gain little from ZIP compression), but even those are smaller.
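If you want to see the effect for yourself, here is a rough sketch in Python using only the standard zipfile module. It builds a small in-memory ZIP package holding one repetitive XML part and reports the size before and after compression. The part content is made up for illustration, and the helper names are my own; this is not how Office writes its packages, just a demonstration of why XML in a ZIP container shrinks so well.

```python
import io
import zipfile


def build_package(parts):
    """Write named parts into an in-memory ZIP archive using DEFLATE."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in parts.items():
            zf.writestr(name, data)
    return buf.getvalue()


def part_sizes(package_bytes):
    """Return {part name: (uncompressed bytes, compressed bytes)}."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        return {i.filename: (i.file_size, i.compress_size) for i in zf.infolist()}


# Made-up content: repetitive XML text, standing in for a text-heavy document.
xml_part = (
    "<p><r><t>The quick brown fox jumps over the lazy dog.</t></r></p>" * 500
).encode("utf-8")
package = build_package({"word/document.xml": xml_part})

for name, (raw, packed) in part_sizes(package).items():
    print(f"{name}: {raw} bytes uncompressed, {packed} bytes in the package")
```

Because an Open XML file is itself a ZIP archive, you can also pass the bytes of a real .docx to part_sizes to see how each part compresses.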
The data in this post is a preview of a more comprehensive study we’re working on, but I thought I’d share some of the early returns. There’s no real magic in the study, it’s a pretty simple project. If you want to try this for yourself, you can do what we’re doing: use your favorite search engine / content store to retrieve 100 documents each for word processing (Word 97-2003), spreadsheet (Excel 97-2003) and presentation (PowerPoint 97-2003) format documents, and convert them to Open XML. Results will always vary slightly depending on your data set, but the results should be somewhat consistent with what we’re showing here.
You can do the document conversion using the desktop products, or the Office Migration Planning Manager (and the Office File Conversion tool, specifically), which has a command line interface. Other conversion tools are also available. Quality / results will vary depending on the translation environment.
This post will only discuss the Word documents converted using Word 2007, but the data will illustrate the survey results clearly.

                   "docx" size   "doc" size   Size change   Storage gain
Median             30KB          69KB         29KB          52%
Minimum            11KB          20KB         -2KB          -2%
Maximum            559KB         975KB        784KB         87%
25th percentile    18KB          35KB         15KB          40%
50th percentile    (same as Median)
75th percentile    76KB          160KB        67KB          62%
A median size reduction of 52% for documents is quite significant, and translates to real savings for disks and network traffic. We can assume a linear correlation between document size and the number of packets transmitted over a network; therefore we can assume a similar result in bandwidth consumption (bandwidth consumption data will be published in the final paper as well.)
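If you run the conversion on your own document set, a summary like the one above can be sketched in a few lines of Python. The sizes below are placeholders I made up for illustration, not the study's data, and the percentile method used here (the default of statistics.quantiles) is an assumption; the study may compute its percentiles differently.

```python
from statistics import median, quantiles


def storage_gain(doc_kb, docx_kb):
    """Percentage of the original .doc size saved by the .docx version."""
    return 100.0 * (doc_kb - docx_kb) / doc_kb


def summarize(pairs):
    """Summarize per-file storage gain for a list of (doc_kb, docx_kb) pairs."""
    gains = [storage_gain(doc, docx) for doc, docx in pairs]
    q1, _, q3 = quantiles(gains, n=4)  # quartile cut points
    return {
        "min": min(gains),
        "p25": q1,
        "median": median(gains),
        "p75": q3,
        "max": max(gains),
    }


# Placeholder sizes for illustration only; these are NOT the study's data.
sample_pairs = [(100, 50), (200, 80), (40, 30), (500, 100), (60, 61)]

for label, value in summarize(sample_pairs).items():
    print(f"{label}: {value:.0f}%")
```

Note that a file can occasionally come out slightly larger after conversion, which is why the minimum can be negative.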
Create a simple document in Word 2007. A great way to generate sample text in Word is by using a formula: “=rand(10,5)”, where 10 is the number of paragraphs in your document, and 5 is the number of sentences per paragraph. You can use this formula to generate documents of increasing length. In doing so, the benefit of compression in Open XML becomes instantly clear. I conducted this test 5 times, on documents ranging from 10 paragraphs of text to over 60 pages. (I have attached them here for you to use.)
I simply added the text, saved the file in binary format first, then saved the file again as Open XML. There is no formatting beyond my default template, and there are no tables, images or anything other than simple paragraphs. As the documents increase in length, the benefit of compression is obvious:
Sample file name   .doc size   .docx size
Test 1             31KB        11KB
Test 2             86KB        13KB
Test 3             147KB       15KB
Test 4             269KB       18KB
Test 5             513KB       26KB
If you’re a graph type, we can make the relationship more clear:
This isn’t to say that 5,000 page documents stored using Open XML are going to be 1 – 2 % of their original size, but this is to point out that it is very easy to demonstrate real space savings with Open XML. Depending on the nature of the documents you are creating, especially if they are text-intensive, the size difference can be quite dramatic.
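For those who prefer numbers to graphs, this small Python sketch works out the ratios from the five test documents listed above (sizes in KB). The function name is my own; the figures are the ones from the tests.

```python
# (name, .doc size in KB, .docx size in KB) for the five test documents.
tests = [
    ("Test 1", 31, 11),
    ("Test 2", 86, 13),
    ("Test 3", 147, 15),
    ("Test 4", 269, 18),
    ("Test 5", 513, 26),
]


def reduction_percent(doc_kb, docx_kb):
    """Percentage by which the .docx is smaller than the .doc."""
    return 100.0 * (1 - docx_kb / doc_kb)


for name, doc_kb, docx_kb in tests:
    share = 100.0 * docx_kb / doc_kb
    print(
        f"{name}: .docx is {share:.0f}% of the .doc size "
        f"({reduction_percent(doc_kb, docx_kb):.0f}% smaller)"
    )
```

The .docx share of the original size falls from roughly 35% for the shortest document to roughly 5% for the longest, which is the trend the tests were meant to show.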
We’ll publish the full data set in a more detailed (and scientific) white paper in late January. But as an introductory post, I thought I’d make this an easy one, with a pretty clear benefit. I’ll let you work out the math for your own storage & bandwidth savings, but if you can ask yourself “what would I gain if my files were half of their current size?” – I’ll bet the answer will usually be a good one.
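As a starting point for that math, here is a back-of-the-envelope sketch in Python. Every input is hypothetical, and applying a median per-file reduction to a total volume is only a rough approximation (a few very large files can skew the real result), so treat the output as an estimate rather than a forecast.

```python
def projected_savings(total_gb, share_convertible, median_reduction):
    """Estimate GB saved if a share of stored files shrinks by a given fraction."""
    return total_gb * share_convertible * median_reduction


# All inputs are hypothetical; substitute your own figures.
saved = projected_savings(
    total_gb=2000,           # total document storage today
    share_convertible=0.6,   # fraction of that storage held in binary Office files
    median_reduction=0.52,   # reduction you measure on your own sample
)
print(f"Estimated savings: {saved:.0f} GB")
```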
Greetings,
I've created this blog to discuss topics related to Microsoft Office. I'm a Group Product Manager on the Office technical product management team. I'll use this blog to discuss important topics, upcoming releases, Open XML, interoperability and other issues. I'm excited to share this information with you, and I look forward to adding insight as we move along.
Gray