A few weeks ago we posted a blog entry titled "How to parse the .doc file format". Today's post shows how to use that information to check whether a .doc file is specially crafted to exploit MS08-042, one of the vulnerabilities addressed by today's security updates. This vulnerability is being exploited in the wild, so we believe the benefits of releasing more information to help defenders outweigh the risk of attackers learning more about an already-public vulnerability. We are therefore releasing details about the vulnerability and describing a few tools that will help you analyze documents.

Defragment the .doc file
Wait – defragging is just for file systems, right? Well, if you remember our discussion of the compound binary format, it can be described as a file system within a file. One of the properties of the compound binary format is that the streams can be fragmented in the file. Last time we talked about using APIs to extract the streams of interest from the .doc file. This time we’ll talk about how to read the .doc file itself and detect attempts to exploit the vulnerability. Our technique will only work on a file that is not fragmented. You can learn about how fragmentation works in these files by reading the compound binary format specification.
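
As a rough illustration of what "not fragmented" means here, the following Python sketch checks whether a stream's sector chain occupies consecutive sectors. It assumes the FAT from the compound binary file has already been read into a list of integers. The function name and that list are hypothetical and are not part of the attached template.

```python
ENDOFCHAIN = 0xFFFFFFFE  # FAT value that terminates a sector chain


def is_contiguous(fat, start_sector):
    """Return True if the chain starting at start_sector uses consecutive sectors.

    `fat` is assumed to be the already-parsed FAT: fat[n] holds the number of
    the sector that follows sector n, or ENDOFCHAIN for the last sector.
    """
    sector = start_sector
    while sector != ENDOFCHAIN:
        if sector >= len(fat):
            raise ValueError("sector number is outside the FAT")
        next_sector = fat[sector]
        # Any jump other than "the very next sector" means the stream is fragmented.
        if next_sector != ENDOFCHAIN and next_sector != sector + 1:
            return False
        sector = next_sector
    return True
```

A stream for which this returns False would need to be defragmented before the fixed-offset reads described below can be applied to the raw file.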

A quick note about hex editors
While you can use any hex editor you like to do this analysis, for our examples we’re going to use the 010 Editor v3 from SweetScape Software. We chose this editor for its Binary Template functionality, which is well suited to detecting potential exploits of this vulnerability.

Reading the WordDocument stream
Once you’ve defragmented the .doc file you want to analyze, open up the WordDocument stream, look at the 16-bit value at offset 0xA, and examine the tenth bit (bit 9 when counting from zero; AND the value with 0x200). If that bit is 0, we will be working with the 0Table stream; otherwise we will be working with the 1Table stream.
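
If you prefer a script to a hex editor, the same check can be sketched in a few lines of Python. This assumes the WordDocument stream has already been extracted into a bytes object; the function name is our own for illustration.

```python
import struct


def table_stream_name(word_document: bytes) -> str:
    """Pick the table stream name from the flags at WordDocument offset 0xA."""
    # "<H" reads one little-endian 16-bit value.
    (flags,) = struct.unpack_from("<H", word_document, 0xA)
    # Mask 0x200 selects the bit that chooses between 0Table and 1Table.
    return "1Table" if flags & 0x200 else "0Table"
```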

Next, look at the two DWORDs that start at WordDocument stream offset 0x42A. The first is an offset into the xTable stream, and the second is a length. If the length is 0, then the structure we are interested in does not exist in this file, and so the file is not crafted to exploit MS08-042. Otherwise, read on to find out how we use the offset to determine whether the file is malicious.
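
As a minimal sketch of that read, again assuming the WordDocument stream is available as a bytes object (the function name is hypothetical):

```python
import struct


def read_smart_tag_location(word_document: bytes):
    """Return (offset, length) of the smart tag structure in the table stream."""
    # Two consecutive little-endian DWORDs at offset 0x42A. The "<" prefix
    # disables alignment padding, so the unaligned offset is not a problem.
    offset, length = struct.unpack_from("<II", word_document, 0x42A)
    return offset, length
```

A returned length of 0 means there is nothing further to inspect for this vulnerability.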

To make things easier, I’ll use examples from our 010 binary template for this next section. You can find the entire template attached to this blog entry. In our 010 binary template, after the portion that parses the compound binary format, we loop through the streams looking for the ones we need, starting with the one named WordDocument. The loop excerpt below shows the match for the 0Table stream. Once we have found the streams, we make note of which xTable stream to use, as well as the offset and length we just discussed:

// Loop through all of the streams looking for the ones we are interested in
for( dwCtr = 0; exists( stDir[dwCtr] ); ++dwCtr )
{
   // Memcmp doesn't compare arbitrary memory, just strings...
   if(    stDir[dwCtr].wCbEleName     ==  14
       && stDir[dwCtr].strEleName[ 0] ==  48   // 0
       && stDir[dwCtr].strEleName[ 1] ==  84   // T
       && stDir[dwCtr].strEleName[ 2] ==  97   // a
       && stDir[dwCtr].strEleName[ 3] ==  98   // b
       && stDir[dwCtr].strEleName[ 4] == 108   // l
       && stDir[dwCtr].strEleName[ 5] == 101   // e
       && stDir[dwCtr].strEleName[ 6] ==   0 )
   {
      dwTableStream[0] = dwCtr;
      dwTableStrLen[0] = stDir[dwCtr].dwSizeLow;

      if( dwTableStrLen[0] < OleHeader.dwMiniStrMax )
      {
         qwTableStrOffset[0] = MiniSectOffset( stDir[dwTableStream[0]].dwStartSect );
      }
      else
      {
         qwTableStrOffset[0] = SectOffset( stDir[dwTableStream[0]].dwStartSect );
      }
   }
}

// Determine which table stream to use
FSeek( qwWordStrOffset + 0xA );
WORD Flags;
local uint iWhichTableStream = (Flags & 0x200) >> 9;

// Get the offset into the Table Stream and structure length from the WordDocument stream 
FSeek( qwWordStrOffset + 0x42A );
DWORD fcSttbfBkmkFactoid;
DWORD lcbSttbfBkmkFactoid;

if( lcbSttbfBkmkFactoid == 0 )
{
   Printf( "Clean document.\n" );
   Printf( "This document doesn't contain any smart tags.\n" );
   Warning( "Clean document." );
   Exit( 0 );
}

And finally, reading the xTable stream
If we’ve gotten this far, we have a properly parsed .doc and we’ve determined that there is a smart tag structure in the xTable stream. Now we need to examine the structure in the table stream, so we’ll define the structure in our binary template. You can see which value we’re testing to detect potential exploits as well:

typedef struct
{
   WORD wWordCount;
   byte bOtherData[12];
   if( wWordCount < 6 )
   {
      Printf( "This document is malicious!\n" );
      Warning( "This document is malicious!" );
      Exit( 2 );
   }
} SMARTTAG ;

typedef struct
{
   WORD wAlwaysFFFF;
   WORD wNumSmartTags;
   WORD wExtraData;
} SMARTTAGSECTION ;

Finally, we need to actually read in the SMARTTAGSECTION structure, and then all of the SMARTTAG structures it tells us about:

FSeek( qwTableStrOffset[iWhichTableStream] + fcSttbfBkmkFactoid );
SMARTTAGSECTION stSmartTagSection;
for( dwCtr = 0; dwCtr < stSmartTagSection.wNumSmartTags; ++dwCtr )
{
   SMARTTAG stSmartTag;
   FSeek( startof( stSmartTag ) + ( stSmartTag.wWordCount + 1 ) * sizeof( WORD ) );
}

Having a smart tag with wWordCount < 6 sets up the condition for the vulnerability to occur, but the vulnerability won’t actually be triggered unless there is some other corruption in the document that causes Word to abort loading it. Since a benign .doc should never have a smart tag with wWordCount < 6, we are done with our detection logic! You'll find this logic encapsulated in the attached binary template.
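
For readers who want to script the same walk outside of 010 Editor, here is a rough Python sketch that mirrors the template logic above. It assumes the chosen table stream is available as a bytes object and that the offset and length were read from WordDocument offset 0x42A. The function name is hypothetical, and the bounds check that raises on truncated data is our own addition rather than something the template does.

```python
import struct


def has_short_smart_tag(table_stream: bytes, offset: int, length: int) -> bool:
    """Return True if any smart tag in the section has a word count below 6."""
    if length == 0:
        return False  # no smart tag structure in this document

    # SMARTTAGSECTION: wAlwaysFFFF, wNumSmartTags, wExtraData (three WORDs).
    _always_ffff, num_tags, _extra = struct.unpack_from("<HHH", table_stream, offset)
    pos = offset + 6

    for _ in range(num_tags):
        if pos + 2 > len(table_stream):
            # Assumption: treat a truncated section as an error, not a verdict.
            raise ValueError("smart tag data runs past the end of the stream")
        (word_count,) = struct.unpack_from("<H", table_stream, pos)
        if word_count < 6:
            return True
        # Same step as the template: skip the count WORD plus word_count WORDs.
        pos += (word_count + 1) * 2
    return False
```

As with the template, a True result only says that the precondition for the vulnerability is present in the file.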

Of course, being able to detect attempts to exploit a vulnerability is no substitute for installing the security update! We strongly encourage you to install MS08-042 as soon as possible.

- Security Vulnerability Research & Defense Bloggers
*Postings are provided "AS IS" with no warranties, and confer no rights.*