<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.technet.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>More on cleaning up Word's HTML</title><link>http://blogs.technet.com/kclemson/archive/2004/01/29/64461.aspx</link><description>Brian Alvey points to a Word HTML cleaner, and also explains why Word puts all those extra tags into HTML when you save a Word doc as HTML or copy from Word into another HTML editor. Also see my previous post about reducing the size of the HTML that</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Source code?</title><link>http://blogs.technet.com/kclemson/archive/2004/01/29/64461.aspx#64483</link><pubDate>Thu, 29 Jan 2004 18:03:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:64483</guid><dc:creator>Fabrice</dc:creator><description>Too bad there is no source code to clean up Word's HTML. Or is there somewhere? Someone?</description></item><item><title>re: More on cleaning up Word's HTML</title><link>http://blogs.technet.com/kclemson/archive/2004/01/29/64461.aspx#64492</link><pubDate>Thu, 29 Jan 2004 18:19:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:64492</guid><dc:creator>JosephCooney</dc:creator><description>I think HtmlTidy can clean up Word HTML, and they have .NET and COM bindings. You can look at the source too if you really want to.
&lt;br&gt;&lt;br&gt;&lt;a target="_new" href="http://tidy.sourceforge.net/"&gt;http://tidy.sourceforge.net/&lt;/a&gt;</description></item><item><title>re: More on cleaning up Word's HTML</title><link>http://blogs.technet.com/kclemson/archive/2004/01/29/64461.aspx#64552</link><pubDate>Thu, 29 Jan 2004 20:17:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:64552</guid><dc:creator>Jon Galloway</dc:creator><description>I found that the combination of the Office 2000 HTML Filter 2.0 (&lt;a target="_new" href="http://www.microsoft.com/downloads/details.aspx?FamilyID=209ADBEE-3FBD-482C-83B0-96FB79B74DED&amp;amp;displaylang=EN"&gt;http://www.microsoft.com/downloads/details.aspx?FamilyID=209ADBEE-3FBD-482C-83B0-96FB79B74DED&amp;amp;displaylang=EN&lt;/a&gt;) and the Textism web tool worked well for me. I had more than 20K, so the Office tool got me part way and the Textism filter did the remaining cleanup.&lt;br&gt;&lt;br&gt;The tool says it's for Office 2000, but worked okay on Office XP.&lt;br&gt;&lt;br&gt;This should be a feature of Word - export as clean HTML or something.</description></item><item><title>re: More on cleaning up Word's HTML</title><link>http://blogs.technet.com/kclemson/archive/2004/01/29/64461.aspx#64730</link><pubDate>Fri, 30 Jan 2004 00:34:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:64730</guid><dc:creator>KC Lemson</dc:creator><description>Lucky you, Word does have this feature. Save a file as &amp;quot;Web page, filtered&amp;quot; and it will strip out all Office specific tags.
&lt;br&gt;&lt;br&gt;Using the sample of HTML from my previous post (linked above):&lt;br&gt;&lt;br&gt;word.doc: 24KB&lt;br&gt;word-as-web-page.htm: 7KB&lt;br&gt;word-as-filtered-web-page.htm: 3KB</description></item><item><title>re: More on cleaning up Word's HTML</title><link>http://blogs.technet.com/kclemson/archive/2004/01/29/64461.aspx#64864</link><pubDate>Fri, 30 Jan 2004 07:26:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:64864</guid><dc:creator>Jon Galloway</dc:creator><description>Oops. Lucky. Thanks. You can delete my post to cover for me any time... (in my defence, I'm stuck using Office 2000 at work)</description></item><item><title>re: More on cleaning up Word's HTML</title><link>http://blogs.technet.com/kclemson/archive/2004/01/29/64461.aspx#94814</link><pubDate>Tue, 23 Mar 2004 20:33:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:94814</guid><dc:creator>Magnolia</dc:creator><description>I already used the service (Textism) by paying annual subscription and that service is not what it promises.&lt;br&gt;&lt;br&gt;The servers are always down and it only cleans part of the code.&lt;br&gt;&lt;br&gt;There are still a lot of problems with login issues and misrepresentation.&lt;br&gt;&lt;br&gt;Because og being a web-based service, he cuts the service whenever he desires manipulating access. I only got to use it 4 months and we were cut off without previous notice.&lt;br&gt;&lt;br&gt;He does not have ethics and professionalism.&lt;br&gt;&lt;br&gt;I would never recommend it. But I will recommend Word HTML Cleaner by Mambosoft; at last you get to keep the software and servers down won't interfere. You can also have lifetime FREE update.&lt;br&gt;&lt;br&gt;My suggestion: use Textism only to clean small files for free, that is, one-page documents, but not worthy to pay.&lt;br&gt;&lt;br&gt;Magnolia&lt;br&gt;&lt;br&gt;</description></item></channel></rss>