Background Reading:
· What is "Custom XML" . . . and the impact of the i4i judgment on Word
· Regarding Custom XML Patch distribution and availability
· Associating Data with Content Controls
· Scanning tool to detect documents with custom XML
In my last post, What is "Custom XML" . . . and the impact of the i4i judgment on Word, I identified the areas of Word that are affected by the ruling. That post generated a handful of questions from people trying to understand how to move forward with their solutions in Word.
It is important to understand that custom XML markup isn't the only way to supply semantic meaning to rich content. Content controls can do this too, and they provide an excellent user experience.
The following is a guest post by Eric White, explaining an approach for using styles and content controls for implementing an authoring environment for a publishing system. If you haven't read Eric's blog before, I highly recommend it as a source for learning Open XML and Word development. Eric is among the most knowledgeable about development using Open XML.
I've seen publishing systems that use styles as a means of supplying semantic meaning to content, but this is problematic in some scenarios (problems that content controls can solve). You could format the above document to look like this:
The above screen clipping uses an option in Word that displays the name of each paragraph's style to the left of the paragraph (the "Style area pane width in Draft and Outline views" setting under Word Options, Advanced, Display).
You could then write a transform from this to the desired format. This approach is problematic because extracting content involves grouping adjacent paragraphs of a particular style, and that can lead to an idiosyncratic experience for content writers. For example, I've seen a system where the writer needed to supply a specially formatted line for each code block to indicate the language of the code. If you needed to supply code snippets for multiple languages, you *needed* to supply a blank line formatted with the Normal style between the code snippets:
This is, of course, problematic. If the writer did not supply this blank line, the transform would merge the two code blocks into one. I suppose the developers writing the transform could watch for the magic lines that contain [c#] or [vb], but writing code that looks for magic values is never a good idea. What if some valid code contains that exact string on a line, as part of a multi-line string? The transform would break, and worse, the writer would need to hack the code sample in some way to make the transform work properly.
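To make the failure concrete, here is a minimal Python sketch of the style-grouping approach. It assumes the paragraphs have already been read out of the document as (style name, text) pairs (a real transform would get each paragraph's style from its w:pStyle element), and the style names Code and Normal are illustrative assumptions. Grouping adjacent paragraphs by style keeps the two snippets apart only while the Normal separator paragraph is present; remove it and they collapse into a single block.

```python
from itertools import groupby

def group_by_style(paragraphs):
    """Group runs of adjacent paragraphs that share the same style.

    `paragraphs` is a list of (style_name, text) pairs; this simplified
    input format is an assumption for illustration.
    """
    return [(style, [text for _, text in group])
            for style, group in groupby(paragraphs, key=lambda p: p[0])]

# Hypothetical content: two code snippets, each starting with a "magic"
# language line, separated by an empty Normal paragraph.
with_separator = [
    ("Code", "[c#]"),
    ("Code", "int x = 1;"),
    ("Normal", ""),
    ("Code", "[vb]"),
    ("Code", "Dim x = 1"),
]

# The same content after a writer forgets the blank separator line.
without_separator = [p for p in with_separator if p[0] != "Normal"]

print(group_by_style(with_separator))     # two separate Code groups
print(group_by_style(without_separator))  # one Code group with all four lines
```

Nothing in the grouped output tells the transform where the C# snippet ends and the VB snippet begins, except the magic lines themselves.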
If I were designing a content publishing system that needed to allow writers to work in Word and then transform the document to another format upon publishing, I would design it to use content controls. It is far better to use content controls to group multiple lines together, and to supply appropriate metadata about those lines.
The transform can then extract the contents of each code block in a deterministic fashion.
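As a minimal sketch of what deterministic extraction can look like, the following Python reads block-level content controls (w:sdt elements that are direct children of w:body) from the XML of a document's main part, using only the standard library. In a real .docx you would first read word/document.xml out of the zip package. As a hypothetical convention, it assumes each code block's language is stored in the content control's tag (w:tag); the sample XML is hand-written and trimmed, not taken from a real document. Because each snippet is bounded by its own content control, a line such as [vb] inside C# code is just text.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

def paragraph_text(p):
    """Concatenate the text of every w:t element in a paragraph."""
    return "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))

def extract_code_blocks(document_xml):
    """Return (tag, lines) for each block-level content control in w:body."""
    root = ET.fromstring(document_xml)
    blocks = []
    for sdt in root.findall("w:body/w:sdt", NS):
        tag = sdt.find("w:sdtPr/w:tag", NS)
        content = sdt.find("w:sdtContent", NS)
        if tag is None or content is None:
            continue  # not one of our tagged code blocks
        language = tag.get(f"{{{W}}}val")  # hypothetical: tag holds the language
        lines = [paragraph_text(p) for p in content.findall("w:p", NS)]
        blocks.append((language, lines))
    return blocks

# Hand-written sample: the C# block contains a line that looks like "[vb]".
SAMPLE = f"""<w:document xmlns:w="{W}"><w:body>
<w:sdt><w:sdtPr><w:tag w:val="C#"/></w:sdtPr><w:sdtContent>
<w:p><w:r><w:t>var s = @"</w:t></w:r></w:p>
<w:p><w:r><w:t>[vb]";</w:t></w:r></w:p>
</w:sdtContent></w:sdt>
<w:sdt><w:sdtPr><w:tag w:val="VB"/></w:sdtPr><w:sdtContent>
<w:p><w:r><w:t>Dim x = 1</w:t></w:r></w:p>
</w:sdtContent></w:sdt>
</w:body></w:document>"""

for language, lines in extract_code_blocks(SAMPLE):
    print(language, lines)
```

The two snippets stay separate even though they are adjacent, and no blank separator paragraph or magic language line is needed.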
Here are the characteristics that I would give to a publishing system:
· If there is a direct transform from a single paragraph in the source document to a corresponding construct in the transformed document, use paragraph styles. You can restrict formatting in Word so that writers can apply only valid styles.
· If there is a direct transform from runs in a paragraph to desired constructs in the transformed document, then use character styles.
· Using styles gives the writer a great user experience. The Word user interface is optimized for selecting paragraphs, applying styles, specifying the style of the following paragraph, and so on.
· If you need to group multiple paragraphs and transform them into a single construct in the transformed document, use content controls to group those paragraphs. This also gives a great experience. The metadata about grouped paragraphs (the code block in the above example) is clearly specified, not supplied through some magically formatted line or another questionable technique. If you are the developer for this system, you can supply macros that make it easier to insert content controls, although inserting content controls with the stock user interface is already super easy.
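The following Python sketch shows how these characteristics might map onto a transform. It works on a simplified, hypothetical in-memory model (Paragraph, Run, ContentControl) rather than on Open XML directly, and the style names, tag values, and HTML output are assumptions for illustration. Paragraph styles map one-to-one to block elements, character styles map to inline elements, unknown paragraph styles are rejected, and each content control becomes exactly one code block whose metadata comes from its tag.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Union

@dataclass
class Run:
    text: str
    char_style: str = ""  # empty string means no character style

@dataclass
class Paragraph:
    style: str
    runs: List[Run]

@dataclass
class ContentControl:
    tag: str                   # metadata, e.g. the language of a code block
    paragraphs: List[Paragraph]

Block = Union[Paragraph, ContentControl]

# Hypothetical mappings from Word style names to output constructs.
PARAGRAPH_STYLES = {"Heading1": "h1", "Body": "p"}
CHARACTER_STYLES = {"InlineCode": "code", "Emphasis": "em"}

def render_runs(runs: List[Run]) -> str:
    """Character styles map to inline elements; unstyled runs are plain text."""
    out = []
    for run in runs:
        text = escape(run.text)
        element = CHARACTER_STYLES.get(run.char_style)
        out.append(f"<{element}>{text}</{element}>" if element else text)
    return "".join(out)

def render(blocks: List[Block]) -> str:
    out = []
    for block in blocks:
        if isinstance(block, ContentControl):
            # One content control becomes one code block; no magic lines needed.
            code = "\n".join("".join(r.text for r in p.runs) for p in block.paragraphs)
            out.append(f'<pre data-lang="{escape(block.tag)}">{escape(code)}</pre>')
        else:
            element = PARAGRAPH_STYLES.get(block.style)
            if element is None:
                raise ValueError(f"style not allowed: {block.style}")
            out.append(f"<{element}>{render_runs(block.runs)}</{element}>")
    return "\n".join(out)

doc = [
    Paragraph("Heading1", [Run("Example")]),
    Paragraph("Body", [Run("Call "), Run("Main", "InlineCode"), Run(".")]),
    ContentControl("C#", [Paragraph("Code", [Run("int x = 1;")]),
                          Paragraph("Code", [Run("int y = 2;")])]),
    ContentControl("VB", [Paragraph("Code", [Run("Dim x = 1")])]),
]
print(render(doc))
```

Rejecting unknown paragraph styles in the transform complements restricting styles in Word itself: the writer is steered toward valid styles, and anything that slips through fails loudly instead of producing malformed output.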
Using styles and content controls gives a user experience that is quite different from editing XML. Instead of inserting elements that surround content, you use styles to supply semantic meaning to paragraphs. If you need to associate groups of paragraphs together, you surround them with a content control.
Here are some additional resources to help you work with content controls:
· Using LINQ to XML to Retrieve the Contents of Content Controls
· Using DocumentBuilder with Content Controls for Document Assembly
· Creating Data-Bound Content Controls using the Open XML SDK and LINQ to XML
"I highly recommend it as a source for learning Open XML and Word development. Eric is among the most knowledgeable about development using Open XML."
I completely agree on the merits of Eric's blog for Word documents, but do you guys also happen to have a "go-to blog" for content about Open XML in spreadsheets?
As you are aware, http://www.openxmldeveloper.org is a pretty useful resource, as is the Excel team blog. They can help with any questions you might have.