On Brian Jones' blog today, Zeyad Rajabi has posted the source code for the Custom XML Markup Detection tool from my last post. This should give readers a bit more insight into what it is looking for to identify affected documents.
The source is provided both as an example of the Open XML SDK's utility and power and as a starting point: you can customize the tool to scan other areas of documents, or to implement the scanning or reporting differently.
Quick Start: The installer is located here. (Update: a new version of the installer is here: http://blogs.technet.com/cfs-file.ashx/__key/CommunityServer-Components-PostAttachments/00-03-35-69-65/Custom-XML-Markup-Detection-Tool-Setup.zip)
Background reading: What is "Custom XML" . . . and the impact of the i4i judgment on Word; Regarding Custom XML Patch distribution and availability; Associating Data with Content Controls; Using Content Controls vs. Custom XML Elements
After we made a patch available for Word related to the recent court ruling, a handful of customers asked whether there is a way to identify document files or solutions that may be affected. One way to identify affected solutions within an organization is to scan your existing XML-format Word files (.docx and .docm) for the presence of the markup in question.
On my blog today I am posting a no-cost, unsupported tool to help you scan repositories for documents that may contain markup affected by the patch. We are also providing the source code for the tool in case you want to modify it to scan specific directories, or to add functionality. It is written in C#.
Results provided by the tool can help you to identify possible areas of impact for your specific IT environment.
Documents that the tool identifies as containing custom XML markup are not themselves affected by the ruling, and require no action on your part. A positive scan result indicates a document that will behave differently when opened in patched and unpatched versions of Office.
Positive results concentrated on a single machine or set of machines may also indicate the presence of a solution or template that generates the affected markup, and that will therefore behave differently in patched and unpatched versions of Word.
The scanning tool will work when run by a local user on a local machine. For SharePoint and other systems whose directories are identifiable by a UNC path, the scanning tool can be used on a server to examine documents stored within those systems.
IT Administrators can configure a login or startup script for computers in their domain to copy the .exe file to a local machine, and to execute the command line tool. Alternatively, a startup or login script can be created to run the installer locally, and execute the command line tool. For more information on login and startup scripts, visit http://www.technet.microsoft.com, or http://technet.microsoft.com/en-us/magazine/dd630947.aspx.
1. Open a command prompt window (Run as Administrator).
2. Go to the directory where you installed the Custom XML Markup Detection Tool.
   a. By default, this directory is "C:\Program Files\Microsoft\Custom XML Markup Detection Tool\"
3. Run the tool with the following command: DetectCustomXMLMarkup.exe [directory path]
   a. For example: DetectCustomXMLMarkup.exe c:\temp
4. The tool then reports how many files it is scanning and how many files it detected with Custom XML markup.
5. An "output.log" file is created in the directory from which the tool was run, summarizing the tool's findings. This log file lists files that include Custom XML markup and files on which the tool encountered errors while scanning. It is a tab-delimited text file, which can be opened in Notepad or Excel. Here is an example log file opened in Excel:
Notes: The tool works with directories and UNC file paths.
The Installer is located here.
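To give a rough idea of what such a scan involves, here is a minimal sketch in Python using only the standard library. It is not the tool's source (the tool is written in C# on the Open XML SDK), and it rests on an assumption: that custom XML markup shows up as w:customXml elements in the XML parts stored under word/ in the package. The function names has_custom_xml_markup and scan_directory are hypothetical, and the real tool may check different parts or report differently.

```python
import os
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by .docx and .docm files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
CUSTOM_XML_TAG = f"{{{W_NS}}}customXml"


def has_custom_xml_markup(path):
    """Return True if any XML part under word/ contains a w:customXml element.

    Assumption for this sketch: only parts under word/ are inspected.
    """
    with zipfile.ZipFile(path) as package:
        for name in package.namelist():
            if not (name.startswith("word/") and name.endswith(".xml")):
                continue
            with package.open(name) as part:
                # Stream the part so large documents are not fully loaded.
                for _event, element in ET.iterparse(part):
                    if element.tag == CUSTOM_XML_TAG:
                        return True
    return False


def scan_directory(root, log_path="output.log"):
    """Scan .docx/.docm files under root and write a tab-delimited log.

    Returns (files scanned, files flagged). Files that cannot be read are
    logged as errors rather than stopping the scan.
    """
    scanned = flagged = 0
    with open(log_path, "w", encoding="utf-8") as log:
        log.write("File\tResult\n")
        for folder, _dirs, files in os.walk(root):
            for filename in files:
                if not filename.lower().endswith((".docx", ".docm")):
                    continue
                scanned += 1
                full_path = os.path.join(folder, filename)
                try:
                    if has_custom_xml_markup(full_path):
                        flagged += 1
                        log.write(f"{full_path}\tCustom XML markup found\n")
                except (zipfile.BadZipFile, ET.ParseError, OSError) as error:
                    log.write(f"{full_path}\tError: {error}\n")
    return scanned, flagged
```

Calling scan_directory with a local directory or a UNC path returns the two counts and leaves the details in the log file, which mirrors the shape of the workflow described in the steps above.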
Taking a break from Custom XML for a moment. More posts on the topic are imminent.
Jennifer Michelstein announced the Office Ribbon Hero today, a fun thing you can install with Office 2007 or the 2010 beta to test your depth of usage of Office.
The challenge, of course, is for you to beat me. :) Good luck with that.
Really, this is a tool that will help you become more proficient in using some of the capability in Office applications. And let's face it, there's nothing that says "I'm smarter than you" like being the best on your block at using Excel.
You can get it here: http://www.officelabs.com/ribbonhero
Background reading: What is "Custom XML" . . . and the impact of the i4i judgment on Word; Regarding Custom XML Patch distribution and availability; Using Content Controls vs. Custom XML Elements; Scanning tool to detect documents with custom XML
......
In a previous post on Gray's blog, I discussed an approach for building a content publishing system using styles and content controls. This is only one of the scenarios where content controls are useful. Another scenario is where you have a sophisticated document generation system where content controls are replaced with automatically generated text. The replacement instructions could be fairly elaborate - perhaps including the database server name, table name, filter, and column name, for example. In a different scenario, you may have a system that automates testing of code listings that are contained in documents, and you may have build instructions for each snippet in the document. I blogged about this approach in OpenXmlCodeTester: Validating Code in Open XML Documents.
Note: I co-wrote this post with Anil Kumar. Many thanks to Anil for writing the managed add-in code.
In all of the above scenarios, you may have the need to associate arbitrary amounts of data with each content control. You may also have the requirement that the document author can create and edit this auxiliary information. Content controls don't directly have a facility for storing and maintaining such information, but there is a fairly easy approach to solving this problem.
Note: In this post, I refer to 'custom XML parts'. Open XML documents are stored using the Open Packaging Conventions. They are essentially zip files (packages) that contain multiple XML and other types of files (parts) within them. These parts are related to each other by a very specific mechanism called relationships. A custom XML part is a part (a file in XML format) of your own design stored in the package. You can design your own XML vocabulary for this part. I've written an MSDN article, The Essentials of Open Packaging Conventions, which explains what you need to know to work with packages and parts. 'Custom XML parts' in this sense are not affected by the January 2010 update for Office Word that Gray has blogged about previously. Custom XML parts will continue to be supported in Word.
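Because a package is a zip file, you can look at its parts and relationships with very little code. The following is a minimal sketch in Python (standard library only) rather than the Open XML SDK; the function names list_parts and list_relationships are hypothetical, and the default path to the main document's relationship part is the location Word normally uses.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the relationship (.rels) files in a package.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def list_parts(package):
    """Return the sorted names of all zip entries in an Open XML package.

    `package` may be a file path or a file-like object.
    """
    with zipfile.ZipFile(package) as archive:
        return sorted(archive.namelist())


def list_relationships(package, rels_part="word/_rels/document.xml.rels"):
    """Return (Id, Type, Target) for each relationship in one .rels file."""
    with zipfile.ZipFile(package) as archive:
        root = ET.fromstring(archive.read(rels_part))
    return [
        (rel.get("Id"), rel.get("Type"), rel.get("Target"))
        for rel in root.findall(f"{{{RELS_NS}}}Relationship")
    ]
```

A custom XML part typically appears in the first list as an entry such as customXml/item1.xml, and in the second as a relationship whose Type ends in /customXml.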
The gist of the technique for associating data with content controls is as follows:
You create a custom XML part that contains some XML that looks something like this:
If you need to maintain more than one value for each content control, you can have as many child elements of the Content element as necessary.
Each content control contains a unique ID that is assigned by Word upon creation of the content control. The data in the custom XML part is related to the content control using this ID. Following is the markup for one of the content controls that is related to the above XML:
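As an illustration of the linkage (not the original listings from this post), here is a minimal Python sketch. Only the Content element name comes from the text above. The root element name, the id attribute, the child elements, and the ID value are invented for the example. The content control markup (w:sdt, w:sdtPr, w:id) follows standard WordprocessingML, simplified to the essentials.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"

# Hypothetical custom XML part. Only "Content" is named in the post; the
# root element, the "id" attribute, and the child elements are assumptions.
CUSTOM_PART = """
<ContentControlData>
  <Content id="1234567">
    <BuildCommand>csc snippet1.cs</BuildCommand>
    <Language>c#</Language>
  </Content>
</ContentControlData>
""".strip()

# Simplified main-document fragment holding one content control (w:sdt).
DOCUMENT = f"""
<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:sdt>
      <w:sdtPr><w:id w:val="1234567"/></w:sdtPr>
      <w:sdtContent>
        <w:p><w:r><w:t>Console.WriteLine(1);</w:t></w:r></w:p>
      </w:sdtContent>
    </w:sdt>
  </w:body>
</w:document>
""".strip()


def content_control_ids(document_xml):
    """Return the w:id value of every content control in the document XML."""
    root = ET.fromstring(document_xml)
    return [
        id_element.get(f"{W}val")
        for id_element in root.findall(f".//{W}sdt/{W}sdtPr/{W}id")
    ]


def data_for_control(custom_part_xml, control_id):
    """Return {child tag: text} for the Content element matching control_id."""
    root = ET.fromstring(custom_part_xml)
    for content in root.iter("Content"):
        if content.get("id") == control_id:
            return {child.tag: child.text for child in content}
    return None
```

Looking up data_for_control(CUSTOM_PART, control_id) for each ID returned by content_control_ids(DOCUMENT) gives the auxiliary values for that control, however many child elements you choose to store.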
You can make it easy for users to edit this auxiliary information in a custom task pane. To create this functionality, you create an Office managed add-in. When the focus is in a content control, the task pane is updated with information from the custom XML part, and when the user updates the data, the managed add-in updates the custom XML part. The following screen shot shows the task pane that is created by the example presented in this post:
As the user moves from content control to content control, the example updates the contents of the task pane. If the user moves to text in the document that is not in a content control, the example clears the text box in the task pane, and disables the "Update CustomXml Part" button. Another approach that you can take is to hide and show the task pane as the selection moves into and out of content controls. The example contains commented-out code that shows how to do this.
This example relies on the user pressing the "Update CustomXml Part" button. You may want to take another approach of updating the custom XML part data when the user changes any data in the task pane.
Custom Task Panes Overview provides a detailed explanation of custom task panes, and how to create them. Deploying a Visual Studio Tools for the Office System 3.0 Solution for the 2007 Microsoft Office System Using Windows Installer provides what you need to know to deploy an add-in.
There are three source files for this example:
ContentControlInfoAddIn.cs
Implements the add-in, registers various event handlers, and creates and updates the custom XML part.
ContentControlInfo.cs
Contains the event handlers for the user control that is placed on the task pane.
ContentControlInfo.Designer.cs
Contains the designer generated code for the user control.
You can download the code, along with a Visual Studio solution here.
Background reading: What is "Custom XML" . . . and the impact of the i4i judgment on Word; Associating Data with Content Controls; Using Content Controls vs. Custom XML Elements; Scanning tool to detect documents with custom XML
A few questions have popped up about the patch for Custom XML and I thought I'd take a moment to address those. First, please see the patch posted on the public download site.
This patch will not be "pushed" through our update channels, because existing customers are not required to install it.
Will the patch be "pushed" to my system? Is this an automatic update?
No. The patch will not be made available from Microsoft Update or Office Update, and will not be "pushed" to any user's machine by Microsoft services.
What about future Office updates such as hotfixes or security patches? Will the patched version of Word require different updates?
We will continue to provide security updates and hotfixes that work with both patched and unpatched versions of Word.
How can I tell if a document uses CustomXML markup?
The easiest way is to open the document in an unpatched version of Word and look for the "pink tags" that typically delineate CustomXML markup. Alternatively, for large volumes of documents, we will make a document scanning tool available that evaluates .DOCX and .DOCM files for the presence of CustomXML markup. My next blog post will offer more information regarding this tool, which we plan to make available at no charge. Because .DOC files are not affected, we do not plan to offer a scanning tool for them.
Background reading: What is "Custom XML" . . . and the impact of the i4i judgment on Word; Regarding Custom XML Patch distribution and availability; Associating Data with Content Controls; Scanning tool to detect documents with custom XML
In my last post, What is "Custom XML" . . . and the impact of the i4i judgment on Word, I took some steps to identify the areas of Word that are affected by the ruling. This generated a handful of questions from people seeking to understand how they should consider moving forward with solutions in Word.
It is important to understand that using custom XML markup isn't the only way to supply semantic meaning to rich markup. Content controls provide an excellent user experience for this.
The following is a guest post by Eric White, explaining an approach for using styles and content controls for implementing an authoring environment for a publishing system. If you haven't read Eric's blog before, I highly recommend it as a source for learning Open XML and Word development. Eric is among the most knowledgeable about development using Open XML.
..........
I've seen publishing systems that use styles as a means for supplying semantic meaning to content, but this is problematic in some scenarios (but the problems are solvable by content controls). You could format the above document to look like this:
The above screen clipping uses an option in Word that allows you to display the style name for every paragraph to the left of the paragraph.
You could then write a transform from this to the desired format. This approach is problematic because the approach for extracting content involves grouping together adjacent paragraphs of a particular style, and this can lead to an idiosyncratic experience for content writers. For example, I've seen a system where the writer needed to supply a specially formatted line for a code block that would indicate the language for the code. If you needed to supply code snippets for multiple languages, you *needed* to supply a blank line formatted as normal between the code snippets:
This is, of course, problematic. If the writer did not supply this blank line, then the transform would conjoin the two code blocks. I suppose that the developers writing the transform could watch for the magic lines that contain [c#] or [vb], but writing code that looks for magic values is never a good idea. What if there is some valid code that contains that exact string on a line as part of a multi-line string? The transform would be broken, and worse, the writer would need to hack the code in some way to make the transform work properly.
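The conjoining problem is easy to reproduce. Here is a minimal Python sketch of a transform that groups adjacent paragraphs by style. The style name "Code", the sample text, and the function name group_by_style are invented for illustration, and a real transform would read the styles from the document rather than from a list of tuples.

```python
from itertools import groupby

# Each paragraph is modeled as (style name, text). The style name "Code"
# and the sample lines are hypothetical.
paragraphs = [
    ("Normal", "Here is the example in two languages:"),
    ("Code", "[c#]"),
    ("Code", "Console.WriteLine(1);"),
    ("Code", "[vb]"),
    ("Code", "Console.WriteLine(1)"),
    ("Normal", "That is all."),
]


def group_by_style(paragraphs):
    """Group adjacent paragraphs that share a style into (style, [texts])."""
    return [
        (style, [text for _style, text in run])
        for style, run in groupby(paragraphs, key=lambda paragraph: paragraph[0])
    ]


# Without a separator, both snippets land in a single "Code" block.
code_blocks = [texts for style, texts in group_by_style(paragraphs) if style == "Code"]

# Inserting an empty "Normal" paragraph between the snippets splits them.
separated = paragraphs[:3] + [("Normal", "")] + paragraphs[3:]
separated_blocks = [texts for style, texts in group_by_style(separated) if style == "Code"]
```

Here code_blocks holds one block of four lines, while separated_blocks holds two blocks of two lines each. The transform's output depends on a blank line that carries no meaning for the writer.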
If I were designing a content publishing system that needed to allow writers to work in Word and then transform the document to another format upon publishing, I would design it to use content controls. It is far better to use content controls to group multiple lines together, and to supply appropriate metadata about those lines.
The transform can then extract the contents of each code block in a deterministic fashion.
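A minimal Python sketch of such an extraction follows. It assumes that each code block is wrapped in a block-level content control and that the language is stored in the control's tag (w:tag); that choice of where to keep the metadata is an assumption of this example, not something the approach requires. The function name extract_blocks and the sample document are hypothetical, and nested content controls are not handled.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"

# Simplified document body: two adjacent content controls, no blank line
# between them. The tag values "c#" and "vb" are illustrative.
SAMPLE = f"""
<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:sdt>
      <w:sdtPr><w:tag w:val="c#"/></w:sdtPr>
      <w:sdtContent>
        <w:p><w:r><w:t>Console.WriteLine(1);</w:t></w:r></w:p>
        <w:p><w:r><w:t>int x = 2;</w:t></w:r></w:p>
      </w:sdtContent>
    </w:sdt>
    <w:sdt>
      <w:sdtPr><w:tag w:val="vb"/></w:sdtPr>
      <w:sdtContent>
        <w:p><w:r><w:t>Console.WriteLine(1)</w:t></w:r></w:p>
      </w:sdtContent>
    </w:sdt>
  </w:body>
</w:document>
""".strip()


def extract_blocks(document_xml):
    """Return (tag value, [paragraph texts]) for each content control."""
    root = ET.fromstring(document_xml)
    blocks = []
    for sdt in root.iter(f"{W}sdt"):
        tag = sdt.find(f"{W}sdtPr/{W}tag")
        content = sdt.find(f"{W}sdtContent")
        if content is None:
            continue
        # Join the text runs of each paragraph inside the control.
        lines = [
            "".join(t.text or "" for t in paragraph.iter(f"{W}t"))
            for paragraph in content.iter(f"{W}p")
        ]
        blocks.append((tag.get(f"{W}val") if tag is not None else None, lines))
    return blocks
```

Because the boundaries come from the content controls themselves, the two snippets stay separate without any separator paragraph or magic line in the text.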
Here are the characteristics that I would give to a publishing system:
· If there is a direct transform from a single paragraph in a source document to a corresponding construct in the transformed document, then use paragraph styles. You can alter the user experience in Word to allow only valid styles.
· If there is a direct transform from runs in a paragraph to desired constructs in the transformed document, then use character styles.
· Using styles gives the writer a great user experience. The Word user interface is optimized to allow writers to select paragraphs, apply styles, specify what the style for the paragraph following should be, and so on.
· If you need to select multiple paragraphs for transform to some construct in the transformed document, then use content controls to group those paragraphs. This also gives a great experience. The metadata about grouped paragraphs (the code block in the above example) is clearly specified, and not through some magically formatted line or some other questionable technique. If you are the developer for this system, you can supply macros for easy insertion of content controls, although it is super easy to insert content controls using the stock user interface.
Using styles and content controls gives a user experience that is quite different from editing XML. Instead of inserting elements that surround content, you use styles to supply semantic meaning to paragraphs. If you need to associate groups of paragraphs together, you surround them with a content control.
Here are some additional resources to help you work with content controls:
Using LINQ to XML to Retrieve the Contents of Content Controls
Using DocumentBuilder with Content Controls for Document Assembly
Creating Data-Bound Content Controls using the Open XML SDK and LINQ to XML