<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.technet.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>File size reduction for Open XML</title><link>http://blogs.technet.com/gray_knowlton/archive/2007/12/17/file-size-reduction-for-open-xml.aspx</link><description>Open XML documents can help save a significant amount of disk space and bandwidth usage, as demonstrated in this post.</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Geek Lectures - Things geeks should know about &amp;raquo; Blog Archive   &amp;raquo;  File size reduction for Open XML</title><link>http://blogs.technet.com/gray_knowlton/archive/2007/12/17/file-size-reduction-for-open-xml.aspx#2655400</link><pubDate>Tue, 18 Dec 2007 03:25:08 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:2655400</guid><dc:creator>Geek Lectures - Things geeks should know about » Blog Archive   »  File size reduction for Open XML</dc:creator><description>&lt;p&gt;PingBack from &lt;a rel="nofollow" target="_new" href="http://geeklectures.info/2007/12/17/file-size-reduction-for-open-xml/"&gt;http://geeklectures.info/2007/12/17/file-size-reduction-for-open-xml/&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Great new blog - Gray matter</title><link>http://blogs.technet.com/gray_knowlton/archive/2007/12/17/file-size-reduction-for-open-xml.aspx#2656950</link><pubDate>Tue, 18 Dec 2007 16:57:07 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:2656950</guid><dc:creator>OfficeRocker!</dc:creator><description>&lt;p&gt;Some blogs you wait a long time for and here is one.&amp;amp;#160; Gray Knowlton , Group product manager in Office&lt;/p&gt;
</description></item><item><title>Great new blog - Gray matter</title><link>http://blogs.technet.com/gray_knowlton/archive/2007/12/17/file-size-reduction-for-open-xml.aspx#2657061</link><pubDate>Tue, 18 Dec 2007 17:40:35 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:2657061</guid><dc:creator>Noticias externas</dc:creator><description>&lt;p&gt;Some blogs you wait a long time for and here is one.&amp;amp;#160; Gray Knowlton , Group product manager in Office&lt;/p&gt;
</description></item><item><title>New blog: Gray Matter</title><link>http://blogs.technet.com/gray_knowlton/archive/2007/12/17/file-size-reduction-for-open-xml.aspx#2657468</link><pubDate>Tue, 18 Dec 2007 20:04:23 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:2657468</guid><dc:creator>Doug Mahugh</dc:creator><description>&lt;p&gt;People often ask me how much smaller Open XML documents are than corresponding Office binary documents.&lt;/p&gt;
</description></item><item><title>New blog: Gray Matter</title><link>http://blogs.technet.com/gray_knowlton/archive/2007/12/17/file-size-reduction-for-open-xml.aspx#2657553</link><pubDate>Tue, 18 Dec 2007 20:46:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:2657553</guid><dc:creator>Noticias externas</dc:creator><description>&lt;p&gt;People often ask me how much smaller Open XML documents are than corresponding Office binary documents&lt;/p&gt;
</description></item><item><title>re: File size reduction for Open XML</title><link>http://blogs.technet.com/gray_knowlton/archive/2007/12/17/file-size-reduction-for-open-xml.aspx#2658965</link><pubDate>Wed, 19 Dec 2007 07:46:21 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:2658965</guid><dc:creator>Dave S.</dc:creator><description>&lt;p&gt;In a recent test I found that a text-only Office 97 Word document compressed to about 75% the size of the MSO-XML version of the same text. &lt;/p&gt;
&lt;p&gt;The zipped version of &amp;quot;Test 5 Binary.doc&amp;quot; is 15 KB, vs. 26 KB in the MSO-XML format. That is a 42% reduction; conversely, the MSO-XML version is about 170% the size of the zipped MSO binary version. &lt;/p&gt;
&lt;p&gt;Zipping the MSO-XML format document shrinks it to 23 KB, so there is some room at the bottom. You can do the math on 23/15.&lt;/p&gt;
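&lt;p&gt;To make that arithmetic concrete, here is a minimal sketch in Python. It uses only the rounded sizes quoted above; the function and variable names are made up for illustration.&lt;/p&gt;

```python
def percent_reduction(original_kb, reduced_kb):
    """Percentage by which reduced_kb is smaller than original_kb."""
    return (original_kb - reduced_kb) / original_kb * 100


def percent_of(size_kb, reference_kb):
    """size_kb expressed as a percentage of reference_kb."""
    return size_kb / reference_kb * 100


# Rounded sizes from the test described above, in KB.
zipped_doc_kb = 15      # zipped "Test 5 Binary.doc"
docx_kb = 26            # same text in the MSO-XML format
rezipped_docx_kb = 23   # MSO-XML document after zipping it again

print(round(percent_reduction(docx_kb, zipped_doc_kb)))   # 42
print(round(percent_of(docx_kb, zipped_doc_kb)))          # 173
print(round(percent_of(rezipped_docx_kb, zipped_doc_kb))) # 153
```

&lt;p&gt;Since the inputs are themselves rounded to whole kilobytes, the last digit of each result should not be taken too seriously.&lt;/p&gt;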
&lt;p&gt;As to saving bandwidth, aren't some transmission modes already compressing the data? I recall that compressing an ideally compressed file can make it larger. &lt;/p&gt;</description></item><item><title>re: File size reduction for Open XML</title><link>http://blogs.technet.com/gray_knowlton/archive/2007/12/17/file-size-reduction-for-open-xml.aspx#2663959</link><pubDate>Thu, 20 Dec 2007 20:12:17 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:2663959</guid><dc:creator>Gray Knowlton</dc:creator><description>&lt;P&gt;Hi Dave,&lt;/P&gt;
&lt;P&gt;This is interesting feedback. Quite right that mileage will vary by the specific file in question, the Zip algorithm used, and the transmission mode. And the results you're seeing with the binary compression are similar to (and partly the reason for) the existence of the new XLSB format -- a new binary format for Excel 2007 that uses the Open Packaging Conventions (OPC) like Open XML, but with binary parts instead of XML parts. &lt;/P&gt;
&lt;P&gt;So, acknowledging that one could potentially do more to shrink the document sizes using compression tools, we put the size reduction side by side with the modularity, extensibility, reduced data corruption, custom schema support and so on... where we are today with Open XML is a pretty good spot. But we'll definitely work to improve this for the future. &lt;/P&gt;</description></item><item><title>re: File size reduction for Open XML</title><link>http://blogs.technet.com/gray_knowlton/archive/2007/12/17/file-size-reduction-for-open-xml.aspx#2664355</link><pubDate>Thu, 20 Dec 2007 22:45:03 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:2664355</guid><dc:creator>Dave S.</dc:creator><description>&lt;P&gt;With XML the compression should be good, relative to arbitrarily chosen data, given the small character set (~100 out of 256) and the significant repetition of tags that's likely for the body of the document. &lt;/P&gt;
&lt;P&gt;I had asked another MS blogger almost the same question, but at the time it had not occurred to me to re-zip the .docx file. My question then was what additional information was in the .docx that left it so much larger than the zipped .doc file. &lt;/P&gt;
&lt;P&gt;There was an unsatisfying answer. &lt;/P&gt;
&lt;P&gt;Ideal compression removes all redundancy and leaves only information. A bigger ideally compressed file should generally have more information.&lt;/P&gt;
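&lt;P&gt;A rough way to see this, sketched in Python with the standard zlib module: redundant text shrinks a great deal, while data with no redundancy left comes out slightly larger. The pseudo-random bytes below are only a stand-in for an ideally compressed file, and the sample text is made up.&lt;/P&gt;

```python
import random
import zlib

# Highly redundant input: one sentence repeated many times (made-up sample).
redundant = b"The quick brown fox jumps over the lazy dog. " * 500

# Stand-in for an ideally compressed file: pseudo-random bytes have no
# redundancy left to remove. Seeded so the run is repeatable.
rng = random.Random(0)
random_like = bytes(rng.getrandbits(8) for _ in range(10000))

redundant_packed = zlib.compress(redundant, 9)
random_packed = zlib.compress(random_like, 9)

print(len(redundant), "->", len(redundant_packed))
print(len(random_like), "->", len(random_packed))
```

&lt;P&gt;The second line of output shows a small increase, which is the container overhead with nothing saved to offset it.&lt;/P&gt;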
&lt;P&gt;Now I am curious if information (unused placeholders?) is left out of the .docx that was in the .doc. Perhaps it's the lack of ideal compression. &lt;/P&gt;
&lt;P&gt;&amp;lt;ramble&amp;gt;&lt;/P&gt;
&lt;P&gt;File and data handling has been of some interest to me because of the number of times vendors have (burned me) provided unsatisfying results. Like modems that convert all LFs to LFCRs. Had to write a program that figured out where all the LFs were, pass that info separately, and then use that file to take out the neighboring CRs to get binary files back in shape. Or a printer adapter that specifically filtered a useful character. No soft fix for that. &lt;/P&gt;
&lt;P&gt;Company A told us (many users) there was no way to find the reason behind a software fault. Fortunately, two things occurred. I got a copy of the format manual and I didn't know how the entire thing was supposed to work. So my first attempt failed to run right. But it left a little file, which I read, and in it I found the information Company A said could not be determined. Turns out, in their implementation, they delete that file as part of cleanup. Sigh. It did not require a fix, Company A just didn't know how the software they bought worked either. (Same software as TRON used - from MAGI Corp. Clever name.) &lt;/P&gt;
&lt;P&gt;&amp;lt;\ramble&amp;gt;&lt;/P&gt;</description></item><item><title>http://blogs.msdn.com/dmahugh/default.aspx</title><link>http://blogs.technet.com/gray_knowlton/archive/2007/12/17/file-size-reduction-for-open-xml.aspx#2664920</link><pubDate>Fri, 21 Dec 2007 02:39:16 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:2664920</guid><dc:creator>TrackBack</dc:creator><description /></item><item><title>http://blogs.msdn.com/officerocker/</title><link>http://blogs.technet.com/gray_knowlton/archive/2007/12/17/file-size-reduction-for-open-xml.aspx#2664923</link><pubDate>Fri, 21 Dec 2007 02:39:23 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:2664923</guid><dc:creator>TrackBack</dc:creator><description /></item><item><title>Open XML  blogging in 2007</title><link>http://blogs.technet.com/gray_knowlton/archive/2007/12/17/file-size-reduction-for-open-xml.aspx#2693659</link><pubDate>Mon, 31 Dec 2007 06:14:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:2693659</guid><dc:creator>Noticias externas</dc:creator><description>&lt;p&gt;It&amp;amp;#39;s been quite a year for those who have been blogging about the Open XML file formats. Here&amp;amp;#39;s&lt;/p&gt;
</description></item><item><title>Advantages of Open XML - file size</title><link>http://blogs.technet.com/gray_knowlton/archive/2007/12/17/file-size-reduction-for-open-xml.aspx#2694299</link><pubDate>Mon, 31 Dec 2007 10:45:38 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:2694299</guid><dc:creator>Gerhard´s Marktbeobachtungen</dc:creator><description>&lt;p&gt;One of the most readily measurable advantages of Open XML is the reduction in file sizes. Even in times&lt;/p&gt;
</description></item><item><title>re: File size reduction for Open XML</title><link>http://blogs.technet.com/gray_knowlton/archive/2007/12/17/file-size-reduction-for-open-xml.aspx#2706299</link><pubDate>Thu, 03 Jan 2008 17:06:52 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:2706299</guid><dc:creator>David Carlisle</dc:creator><description>&lt;P&gt;Create a simple document in Word 2007. A great way to generate sample text in Word is by using a formula: “=rand(10,5)”, where 10 is the number of paragraphs&lt;/P&gt;
&lt;P&gt;As far as I can see, this causes the same sentences to be repeated multiple times, and the entire document to consist of these repetitions. This makes a very bad test of file size, as it will give highly exaggerated figures for zip compression (or pretty much any other text compression, for that matter). Your first suggested mechanism, sampling real documents, should (given a good enough sample) give much more realistic results.&lt;/P&gt;</description></item><item><title>re: File size reduction for Open XML</title><link>http://blogs.technet.com/gray_knowlton/archive/2007/12/17/file-size-reduction-for-open-xml.aspx#2719977</link><pubDate>Mon, 07 Jan 2008 21:00:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:2719977</guid><dc:creator>Gray Knowlton</dc:creator><description>&lt;p&gt;Hi David,&lt;/p&gt;
&lt;p&gt;Correct that the &amp;quot;=rand&amp;quot; formula for Word pulls text from the help file, resulting in much repeated text. The size reduction research is based on a more realistic data set, so yes, the results will reflect what is stored in existing documents. &lt;/p&gt;
&lt;p&gt;As for this test, repeated text or not, the size of the XML format compares very well to the (identical content in the) binary format; so in the comparative sense, a valid observation.&lt;/p&gt;
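&lt;p&gt;For anyone who wants to see how much repetition alone can skew a compression figure, here is a small sketch in Python using the standard zlib module. It is not the test from the post: the filler text is random letters rather than real prose or actual &amp;quot;=rand&amp;quot; output, so it exaggerates the gap, and all names are made up for illustration.&lt;/p&gt;

```python
import random
import zlib


def compressed_fraction(data):
    """Compressed size divided by original size (smaller means better compression)."""
    return len(zlib.compress(data, 9)) / len(data)


rng = random.Random(0)  # seeded so the run is repeatable


def random_word():
    # Made-up filler: 3 to 9 random lowercase letters.
    letters = "abcdefghijklmnopqrstuvwxyz"
    return "".join(rng.choice(letters) for _ in range(rng.randint(3, 9)))


# Five sentences repeated many times, loosely imitating repeated sample text.
sentences = [" ".join(random_word() for _ in range(12)) + ". " for _ in range(5)]
repeated = ("".join(sentences) * 40).encode("ascii")

# Text of the same length with no repeated sentences.
# Each word plus its space is at least 4 characters, so this is long enough to trim.
filler = " ".join(random_word() for _ in range(len(repeated) // 3))
varied = filler.encode("ascii")[:len(repeated)]

print("repeated:", round(compressed_fraction(repeated), 3))
print("varied:  ", round(compressed_fraction(varied), 3))
```

&lt;p&gt;The repeated text compresses to a far smaller fraction of its size than the varied text, which is the effect David describes; real documents would be expected to fall somewhere in between.&lt;/p&gt;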
</description></item></channel></rss>