Hi all,

I’m Lars Opstad, an engineering manager in the MSEC Science group supporting the SDL within Microsoft. I wanted to share with you some of the ways that we are improving our internal security practices, specifically in the area of file fuzzing.

Many fuzzers take a good file (template) as a starting point for creating malformed content. For years, dating back to Writing Secure Code in 2002, the SDL has encouraged people to fuzz their file parsers with a “good representative set” of template files. The question is “how do I tell if I have a good set?” At Microsoft, our answer is code coverage, which is in line with other external research (Guruswami; Miller; and Sutton, Greene, and Amini, to cite only a few).

Code Coverage Definition

Before getting into how we use code coverage for fuzzing, I should briefly define the terms. If you are well versed in these concepts, feel free to skip to “Using Coverage for Better Fuzzing.”

Code coverage is a measure of how well a set of code is exercised based on a set of tests or inputs. The simplest scheme of code coverage is at the function level. Functional coverage measures which functions were called and which functions were not called. While this is useful at a very broad level, it does not really provide the granularity necessary to measure meaningful coverage for a file parser. A better metric is “block” coverage. A basic block in coverage terms is a section of code that always executes in sequence with no jumps in or out. Take the following code sample: 

x += 1;                     // block 1
y += 2;
if (sin(x)
        <                   // block 2 (although it executes after block 3)
    sin(y))                 // block 3
{                           // block 4
    y += 1;
    result = true;
}
else                        // block 5
{
    result = false;
}
x += 1;                     // block 6
return result;

For the above example, there are two main coverage cases depending on whether sin(x)<sin(y) or not. If it is, the blocks covered will be (in order of execution) 1, 3, 2, 4, 6 (not 5). If not, the blocks covered will be 1, 3, 2, 5, 6 (not 4). For full coverage, both those cases are necessary.
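To make this concrete, the coverage of each run can be recorded as a set of basic-block IDs; the union across runs gives the total coverage. This is a minimal Python sketch, not tied to any particular coverage tool:

```python
# Hypothetical coverage records for the two paths through the sample above.
# Each run is recorded as the set of basic-block IDs it executed.
run_true_branch = {1, 3, 2, 4, 6}   # sin(x) < sin(y): block 5 never runs
run_false_branch = {1, 3, 2, 5, 6}  # otherwise: block 4 never runs

all_blocks = {1, 2, 3, 4, 5, 6}

# Neither run alone achieves full block coverage, but together they do.
combined = run_true_branch | run_false_branch
print(len(run_true_branch) / len(all_blocks))  # 5/6, about 0.83 per run
print(combined == all_blocks)                  # True
```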

There are additional coverage concepts that can be useful in more detailed analysis, but for the purpose of this post, that lays a good foundation.

There are many free and commercially available coverage tools (including one built into Visual Studio, described here) that will give you this type of information. The focus of this post is not on any one tool, but using the measurements to improve file fuzzing effectiveness.

Using Coverage for Better Fuzzing

Our first goal with using coverage for fuzzing was simply to evaluate the overall completeness of a template set. This was achieved by measuring each template running through the application and calculating the overall block coverage of the parser code. The most important part of this exercise was performing the gap analysis when coverage was too low (e.g. less than 60%). Through the gap analysis, the team could then find or build additional templates that exercised the remaining code. For example, with a standard RIFF format, it is possible that one type of tag was missing from all the templates. With that knowledge, the owner of that component could easily find or generate a file that contains the missing record type and increase the coverage appropriately.
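A simple sketch of this kind of gap analysis might look like the following. The function and file names here are hypothetical, not taken from any Microsoft tool; the idea is just to compute the overall block coverage of a template set and list the blocks no template exercises, so new templates can target them:

```python
def coverage_gap(template_coverage, all_blocks):
    """Return (overall coverage ratio, blocks no template exercises).

    template_coverage: dict mapping template name -> set of covered block IDs
    all_blocks: set of every basic-block ID in the parser
    """
    covered = set().union(*template_coverage.values()) if template_coverage else set()
    uncovered = all_blocks - covered
    return len(covered) / len(all_blocks), uncovered

# Hypothetical numbers: a parser with 10 blocks and two templates.
templates = {"a.wav": {1, 2, 3, 4}, "b.wav": {1, 2, 5, 6}}
ratio, missing = coverage_gap(templates, set(range(1, 11)))
print(ratio)            # 0.6 -- below a 60% bar, so gap analysis is warranted
print(sorted(missing))  # [7, 8, 9, 10] -- blocks to target with new templates
```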

After doing some work in this area, we realized that many template files covered almost exactly the same blocks of code. Fuzzing one of these templates is roughly equivalent to fuzzing any of the other similar templates. We wanted to increase our odds of hitting the less frequently used parts of the code, under the basic assumption that bugs are much more common in the less tested parts of the code. Thus, we wanted to reduce duplication and choose the fewest templates that still provide coverage equivalent to our full original set.
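As a first step, templates with identical coverage masks are fully interchangeable, so they can be collapsed to a single representative. A short Python sketch of that idea (the file names are hypothetical):

```python
from collections import defaultdict

# Hypothetical per-template coverage masks, stored as frozensets of block IDs.
coverage = {
    "t1.jpg": frozenset({1, 2, 3}),
    "t2.jpg": frozenset({1, 2, 3}),  # identical to t1 -> redundant to fuzz
    "t3.jpg": frozenset({1, 4}),
}

# Group templates by their exact coverage mask.
groups = defaultdict(list)
for name, mask in coverage.items():
    groups[mask].append(name)

# Keep one representative per distinct mask.
representatives = sorted(g[0] for g in groups.values())
print(representatives)  # ['t1.jpg', 't3.jpg']
```

Identical masks are only the extreme case; the greedy optimization described next also eliminates templates whose coverage is merely contained in the coverage of templates already chosen.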

Technical Details of Coverage Based Template Optimization

The algorithm we used to choose the optimized set of templates is a greedy algorithm (the classic greedy approximation for minimum set cover): choose the best fit, then the next best fit, and so on. In this particular case, here is the pseudo-code for our algorithm, run after having gathered coverage for each template:

while (FullTemplateList.Count > 0)
{
    // Choose the template that covers the most not-yet-covered blocks.
    Template candidate = null;
    foreach (Template t in FullTemplateList)
    {
        if (candidate == null || t.BlocksCovered > candidate.BlocksCovered)
        {
            candidate = t;
        }
    }
    FullTemplateList.Remove(candidate);
    OptimalTemplateList.Add(candidate);

    // Exclude blocks from the candidate template from the coverage of the
    // other templates, dropping any template with no unique coverage left.
    // (Iterate over a snapshot so removal is safe mid-loop.)
    foreach (Template t in FullTemplateList.ToList())
    {
        t.CoverageMask = t.CoverageMask & ~candidate.CoverageMask;
        t.BlocksCovered = t.CoverageMask.BlocksSet();
        if (t.BlocksCovered == 0)
        {
            FullTemplateList.Remove(t);
        }
    }
}
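The same loop can be written in runnable form. The following Python sketch is behavior-equivalent to the pseudo-code above, using integer bit masks for coverage. The template masks are hypothetical, chosen so that the run reproduces the outcome of the seven-template example that follows (Template B picked first with six blocks; Templates E and G dropped after the first iteration; a final set of B, C, A, and D):

```python
def optimize_templates(masks):
    """Greedy set cover: repeatedly pick the template covering the most
    not-yet-covered blocks, then subtract those blocks from the rest.

    masks: dict mapping template name -> int bit mask of covered blocks
    Returns the chosen template names in pick order.
    """
    remaining = dict(masks)  # residual (not-yet-covered) mask per template
    optimal = []
    while remaining:
        # Best fit: the template with the most residual blocks covered.
        candidate = max(remaining, key=lambda t: bin(remaining[t]).count("1"))
        chosen_mask = remaining.pop(candidate)
        optimal.append(candidate)
        # Exclude the chosen blocks; drop templates with nothing left.
        for t in list(remaining):
            remaining[t] &= ~chosen_mask
            if remaining[t] == 0:
                del remaining[t]
    return optimal

def mask(*blocks):
    """Build a bit mask with bit i set for each covered block i."""
    m = 0
    for b in blocks:
        m |= 1 << b
    return m

# Hypothetical coverage masks for Templates A through G.
templates = {
    "A": mask(0, 6, 10, 11), "B": mask(0, 1, 2, 3, 4, 5),
    "C": mask(6, 7, 8, 9),   "D": mask(7, 12),
    "E": mask(1, 2),         "F": mask(2, 8, 10),
    "G": mask(3, 4, 5),
}
print(optimize_templates(templates))  # ['B', 'C', 'A', 'D']
```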

 

To illustrate this algorithm, assume you have measured the coverage of seven input files that have given you the following masks:

 

FullTemplateList: Templates A, B, C, D, E, F, and G

[Per-template coverage-mask table not reproduced; each row marked which basic blocks that template covers.]
The best choice from this list is Template B, with six blocks covered. After one iteration of the loop, the lists would look like this (gray blocks indicating blocks of the original template that already have coverage in the OptimalTemplateList):

 

OptimalTemplateList: Template B

[Coverage-mask table not reproduced.]

FullTemplateList (remaining): Templates A, C, D, and F

Removed (no unique coverage remaining after Template B was chosen): Templates E and G

[Coverage-mask tables not reproduced.]
Ultimately, this would result in four templates in the OptimalTemplateList:

Template B, Template C, Template A, and Template D (in the order they were chosen)

[Coverage-mask tables not reproduced.]
Results from Template Optimization

Coverage-based template optimization was first used at Microsoft during the development of Windows Vista. Applying the algorithm above yielded a significant reduction in the number of templates necessary to cover the file parser. In examining 90,000 JPG files parsed by MSHTML.DLL (used in Internet Explorer), we identified fewer than 100 files that gave the same coverage as all 90,000. As a result, the odds of finding issues near blocks of code covered by only one template increased by a factor of roughly 900.

While investigating one MSRC case, Gavin Thomas from MSRC Engineering used template optimization to see how effective a small, optimized set of templates was in comparison to a normal, large set of templates. For this test, he used two different mutation fuzzers and ran each for 500,000 iterations. With either fuzzer, the optimized set found nearly twice as many issues:

[Chart: fuzzing results using optimized vs. non-optimized templates]

Conclusion

Code coverage is only one approach to improving the fuzzing process. While it does not guarantee that you will find all of the bugs in your product, it increases the probability that your fuzzer will reach the less frequently exercised parts of your code because you reduce the time spent exercising more common blocks of code. MSEC encourages software development teams to use this technique to maximize the efficacy of their fuzz testing.

Thanks to Adel Abouchaev and Michael Levin for refinements in the algorithm, Gavin Thomas and Andy Renk for experiments verifying effectiveness, Michael Howard for historical perspective, and Damian Hasse and Matt Miller for technical review.

Thanks,

Lars Opstad, MSEC Science