Our guiding vision at the Microsoft Malware Protection Center (MMPC) is to keep every customer safe from malware. Our research team, machine learning systems, and industry engagement teams work around the clock toward this vision.

As part of these efforts, we are working with independent antimalware testing organizations to advance the relevance of independent testing and reporting. Our goal is to enable these organizations to test using malware that has significant customer impact. We have come a long way together, and we can still make significant advances to on-demand file-detection tests.

Current on-demand file-detection tests have some limits. They are typically carried out by first assembling a set of malware samples and then scanning them with antimalware products. For each product, the samples in the test set that it fails to detect are counted and expressed as a percentage of the set. Finally, each product’s undetected percentage is compared with those of the other products to produce the comparative test results. Some testers use prevalence data to choose their sample set, and some apply curves to the results, but ultimately the fundamental test scheme is the same across the board.
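As a minimal sketch of the scheme just described, the undetected percentage can be computed as below. The sample IDs, product names, and scan results are hypothetical and exist only for illustration.

```python
def undetected_percentage(detected):
    """Unweighted miss rate: every sample in the test set counts the same.

    `detected` maps a sample ID to True if the product detected it.
    """
    missed = sum(1 for hit in detected.values() if not hit)
    return 100.0 * missed / len(detected)


# Hypothetical scan results for two products over the same four samples.
product_a = {"s1": True, "s2": True, "s3": False, "s4": True}
product_b = {"s1": False, "s2": True, "s3": True, "s4": True}

print(undetected_percentage(product_a))
print(undetected_percentage(product_b))
```

Both hypothetical products miss one sample out of four, so they receive the same score regardless of which sample each one missed.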

One major issue with the above methodology is that it does not differentiate between samples in the test set. Each sample has a different impact on customers, yet all samples are weighted equally. This has been a concern for us, because it doesn’t take prevalence-based customer impact into account.

Antimalware test methodologies can evolve to solve this problem by weighting samples according to their customer impact, that is, how often customers encounter a particular malware sample. The first step is to apply a weight based on each sample’s prevalence: a sample that has affected a large number of customers gets a relatively large weight, and one that has affected relatively few customers gets a smaller weight. However, this approach isn’t quite enough.

Different malware families have different behaviors. For example, some malware families use polymorphism: they change their files with every infection, so many samples within that family have relatively low prevalence. In this case, if the malware family has a high prevalence but each sample has a low prevalence, then without a family weight these samples are lost in the mix. To address this, a family weight should be included in addition to the specific sample prevalence weight.
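One way to combine the two weights is sketched below. The blending rule is an assumption made for illustration, not a formula from this post: each sample’s weight is a mix of its own share of encounters and an equal slice of its family’s share, controlled by a hypothetical `alpha` parameter. The family names and counts are invented.

```python
from collections import Counter


def combined_weights(samples, family_prevalence, alpha=0.5):
    """Blend per-sample and per-family prevalence into one weight per sample.

    samples: sample ID -> (family name, encounters of that exact file)
    family_prevalence: family name -> encounters of the whole family
    alpha: assumed blend factor (1.0 = sample weight only, 0.0 = family only)
    The returned weights sum to 1.
    """
    per_family = Counter(family for family, _ in samples.values())
    total_sample = sum(encounters for _, encounters in samples.values())
    # Only families represented in the test set contribute to the total.
    total_family = sum(family_prevalence[family] for family in per_family)

    weights = {}
    for sample_id, (family, encounters) in samples.items():
        sample_share = encounters / total_sample
        # Split the family's share evenly across its samples in the test set.
        family_share = family_prevalence[family] / total_family / per_family[family]
        weights[sample_id] = alpha * sample_share + (1 - alpha) * family_share
    return weights


# Hypothetical data: a polymorphic family with many rare files, and a
# static family whose single file is seen often.
samples = {
    "poly1": ("PolyFam", 10),
    "poly2": ("PolyFam", 10),
    "static1": ("StaticFam", 80),
}
family_prevalence = {"PolyFam": 900, "StaticFam": 100}

weights = combined_weights(samples, family_prevalence)
for sample_id, weight in weights.items():
    print(sample_id, round(weight, 3))
```

With sample prevalence alone, each polymorphic file would carry a weight of 0.1. Adding the family term raises it, so the high-prevalence family is no longer lost in the mix.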

After applying the weights described above, it is possible to generate a risk factor that describes how much risk a customer faces depending on which antimalware product they use when exposed to samples in the test set. On top of that, using geographical sample weights and family weights allows for a geographical risk breakout.
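A minimal sketch of such a risk factor follows, assuming it is defined as the weighted share of the test set that a product fails to detect. That definition, the region names, the product names, and all numbers are illustrative assumptions rather than the MMPC’s actual method or data.

```python
def risk_factor(weights, detected):
    """Weighted share of the test set's customer impact that a product misses.

    weights: sample ID -> combined sample and family weight
    detected: sample ID -> True if the product detected the sample
    """
    total = sum(weights.values())
    missed = sum(w for sample_id, w in weights.items() if not detected[sample_id])
    return missed / total


# Hypothetical per-region weights for the same three samples.
regional_weights = {
    "region_x": {"s1": 0.7, "s2": 0.2, "s3": 0.1},
    "region_y": {"s1": 0.1, "s2": 0.2, "s3": 0.7},
}

# Hypothetical detection results: each product misses exactly one sample.
detections = {
    "product_a": {"s1": False, "s2": True, "s3": True},
    "product_b": {"s1": True, "s2": True, "s3": False},
}

# Geographical breakout: one risk factor per product per region.
breakout = {
    product: {
        region: risk_factor(weights, hits)
        for region, weights in regional_weights.items()
    }
    for product, hits in detections.items()
}

for product, by_region in breakout.items():
    print(product, {region: round(risk, 2) for region, risk in by_region.items()})
```

Each product misses one sample out of three, so an unweighted test would rank them equally. Under these assumed weights, the first product’s risk is about 0.7 in one region and about 0.1 in the other, and the second product’s is the reverse.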

This kind of prevalence-weighted test is a game changer. Shifting to a weighted approach will help customers and antimalware vendors understand how their products perform in the real world, based on real malware prevalence and impact.

There are a few caveats to such a test. The most significant is the prevalence data itself. Where would this prevalence data come from, and how would it be validated? Ideally, the data would be generated and compiled through an antimalware industry collaboration. The MMPC is contributing to this data and is working with other independent testers to validate it.

With the participation of the MMPC and other antimalware vendor collaborators, it is possible to produce the best and most meaningful set of on-demand test results yet. This is the next step in our continued journey with independent antimalware testers to drive more relevance into testing.

Joe Blackbird
MMPC