Okay, maybe it only causes me consternation, but this is exactly the sort of thing that raises my temperature.  Given the academic background of Coverity’s founders, one would expect a certain amount of rigor and care when it comes to analysis and conclusions, but I find myself disappointed.

Jeff, you say, what are you talking about!?!?

It’s been a while now, but you may recall a headline similar to this one, Security research suggests Linux has fewer flaws, or this one, Study: MySQL Hard on Defects.  I read these headlines and think: this is exciting, this is good stuff!  Let’s dig into these, sit at the feet of Coverity as a security Padawan, and learn what we can learn.

Security Research Suggests Linux has Fewer Flaws

Let’s look at the first article.  It refers to a report published by Coverity called “Analysis of the Linux Kernel,” which documents the results of running the Coverity source code analysis tool on various versions of the Linux kernel over four years.  (You can download this report yourself at http://linuxbugs.coverity.com, if you are willing to register with them.)  Back to the article; here is a quotation from the opening paragraph:

The project found 985 bugs in the 5.7 million lines of code that make up the latest version of the Linux core operating system, or kernel. A typical commercial program of similar size usually has more than 5,000 flaws or defects, according to data from Carnegie Mellon University.

So here’s the (so-called) logic:

1. If a typical commercial program of similar size has 5,000 flaws,

2. and the Linux kernel has only 985 flaws, then

3. (obviously) the Linux kernel has fewer flaws (985 < 5000 = TRUE).

So, Jeff, what’s your problem then?!?!  My problem is the second statement, which may or may not be true but should more properly be stated as “Coverity found only 985 flaws.”  For the Linux kernel to have fewer flaws than standard commercial software, we need this to be true:  Coverity_flaws + Other_flaws < 5000.  Essentially, the article assumes that there are no flaws in the Linux kernel other than those found by Coverity (i.e., Other_flaws = 0).  Call me skeptical at this point, but I think there are some checks we can make to find out whether Other_flaws is truly zero.
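To make the gap in the reasoning concrete, here is a minimal sketch in Python.  The 985 and 5,000 are from the article; the function and the treatment of Other_flaws as unknown are mine, for illustration only:

    # A scanner's output is a LOWER BOUND on the true flaw count, not the count itself.
    coverity_flaws = 985     # what Coverity reported for the Linux kernel
    baseline = 5000          # CMU figure for similar-size commercial code
    other_flaws = None       # unknown: flaws outside the scope of Coverity's analysis

    def kernel_beats_baseline(found, unfound, baseline):
        """True only if TOTAL flaws beat the baseline, which requires
        knowing 'unfound' -- exactly what a scan cannot tell you."""
        if unfound is None:
            raise ValueError("total flaw count is unknowable from scan output alone")
        return found + unfound < baseline

    # kernel_beats_baseline(coverity_flaws, other_flaws, baseline) raises an error;
    # it does not return TRUE.  All we really know is: total flaws >= 985.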

You might be thinking, well Jeff, it isn’t Coverity’s fault if the reporter is making this assumption.  They probably have a nicely scientific and accurate report that the writer “spun” a bit for the story.  Hmmm, could be, but let’s examine the quotations by Coverity CEO Seth Hallem:

"Linux is a very good system in terms of bug density," said Seth Hallem, CEO of Coverity.

And

Hallem stressed that the research on Linux--specifically, version 2.6 of the kernel--indicated that the open-source development process produced a secure operating system.

"There are other public reports that describe the bug density of Windows, and I would say that Linux is comparable or better than Windows," he said.

Ouch, this just hurts me.  Here is a test.  Let’s say I run the Coverity tool on some piece of code with 1 million lines and it finds 100 flaws.  What conclusions can you draw concerning the bug density of the code?  Me … I can draw … no conclusions.  Well, none without making some huge assumptions.  If I assume the Coverity tool finds all existing flaws, then yes, I could talk to you about bug density.
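For the record, here is the arithmetic on my hypothetical numbers, showing why the scan yields only a floor on bug density, never the density itself:

    # Hypothetical: 100 flaws found in 1,000,000 lines of code.
    found = 100
    kloc = 1_000_000 / 1000        # size in thousands of lines (KLOC)

    floor_density = found / kloc   # 100 / 1000 = 0.1 defects per KLOC
    # True density = (found + unfound) / kloc, and 'unfound' is unknown,
    # so the only defensible claim is: true density >= 0.1 defects/KLOC.
    print(f"bug density is at least {floor_density} defects/KLOC")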

It gets better.  Based upon the Coverity research (i.e., running Coverity tools on different Linux kernels over time), Hallem is able to say definitively that the 2.6 kernel is a secure operating system.  This is great for all of us and takes such a burden off my mind.  I think this means that if we apply the Coverity learnings, we can probably all produce secure products.

Finally, in a firm scientific comparison with some “other public reports” (unreferenced), Hallem concludes that Linux is comparable to or better than Windows.

So, it seems to me that Coverity may not be entirely blameless for the leaps of imagination taken with the report.

Study: MySQL Hard on Defects

Is the MySQL study any different in terms of assumptions and analysis?  Not in my opinion.  I downloaded the study to read and found this paragraph in the executive summary:

An analysis of the source code for the MySQL database has revealed that the code is very good quality.  The results show that the number of defects detected by the Coverity analysis system is low.  In fact, the analysis found results that are at least four times better than is typical with commercial software, even before MySQL fixed the defects that Coverity found.

Same flawed reasoning as before.  The first statement is not supportable from the analysis performed alone.  The second statement is accurate, but it is not actually a proof point for the first.  The third may be technically accurate, in that the number of Coverity-found flaws is one-quarter that of typical commercial software, but it is misleading unless Coverity found all flaws.

Incidentally, I did a very rough analysis of the MySQL study itself and found the following breakdown:

· 50% of the content was related to the MySQL analysis

· 50% of the content was marketing and value/benefit material concerning Coverity tools

I will leave any conclusions to the reader.

Does Coverity Find all Flaws?

So, I come to the final stretch of my Ironic Diatribe on Coverity Analysis, with the final question that answers all other questions: do Coverity tools find all flaws?  If they do, then the comparisons with “typical commercial software of similar size” are valid and I’ll bow my head and salute them in respect.  If not … then I’ll keep a keen eye out for future studies and try to get the reporter to quote my thoughts.

First, let’s see what Coverity says about their tools’ ability to find all flaws.  In the MySQL study, they say this:

Although the Coverity Inspected™ mark means that software quality is high, it does not mean that the software will be defect-free.  Notwithstanding the foregoing statement, the Coverity Inspected logo does not imply any warranty or guarantee as to the performance of the software in the deployment environment.  Many defect types fall outside the scope of Coverity analysis.

Or, in other words, Other_flaws != 0.  There you have it: Coverity says they don’t find all flaws and that MANY types of defects fall outside their scope.

Let’s do another check, just to be sure.  Coverity analyzed MySQL 4.1.8.  Maybe Coverity does find all security flaws, if not all flaws?  We pop over to nvd.nist.gov, search for MySQL, scan down …. Hmmm.  There is CVE-2006-1517, which affects 4.1.8.  So does CVE-2006-1516.  CVE-2006-0903, maybe CVE-2006-0692, CVE-2005-2573, CVE-2005-1636, … and so on.  So, check #2 also finds security flaws that Coverity did not find, and therefore Coverity’s numbers should not be compared to “all typical flaws in similar-size commercial code.”
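If you want to repeat this spot-check yourself, here is a rough sketch.  I did it by hand in the search box, but NVD also exposes a JSON web service; the endpoint, parameter, and field names below reflect my understanding of that service, not anything from the Coverity study:

    import requests  # third-party HTTP library: pip install requests

    # Query NVD's CVE service for MySQL-related entries and print their IDs.
    # Endpoint and parameter names are assumptions about NVD's current API.
    resp = requests.get(
        "https://services.nvd.nist.gov/rest/json/cves/2.0",
        params={"keywordSearch": "MySQL 4.1.8"},
        timeout=30,
    )
    resp.raise_for_status()
    for vuln in resp.json().get("vulnerabilities", []):
        print(vuln["cve"]["id"])  # e.g. CVE-2006-1517

Any single hit affecting the analyzed version is enough to establish Other_flaws > 0.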

My Final Word on Coverity and Source Code Scanning

I like source code scanning technology.  These tools are really good for automated searches for certain classes of coding problems that can lead to security vulnerabilities.  At Microsoft, under the Security Development Lifecycle, we require the use of source code scanners on all code to be checked into the tree, and this does, no doubt, help find lots of common coding flaws that would otherwise make it into products.  I quote Jon Pincus on three questions related to source code scanners:

1. Do these tools find important defects?  Yes.

2. Is every warning emitted by the tool useful?  No.

3. Do these tools find all defects?  NO, No, no.

For this reason, Microsoft uses source code scanning in conjunction with AppVerifier, the Source Code Annotation Language, code reviews, threat modeling, QA testing, FxCop and much, much more, all in addition to an extensive training program based upon “Writing Secure Code”.

I think Coverity probably produces a fine tool, and it would benefit many software vendors to make it, or a similar product, a part of their development lifecycle.  However, it is what it is, and I am skeptical of reports that assume one can compare Coverity-found flaws with … well, anything.  Especially if the comparison is intended to show that something is more secure.

Regards ~ Jeff