At CanSecWest 2012, we presented the technology we use to analyze malicious samples and proof-of-concept (PoC) files. Because malware these days often attempts to exploit software vulnerabilities, understanding the internals of those vulnerabilities is essential when writing defense logic.

Out of the many methods that can be used for vulnerability analysis, we presented one that uses dynamic binary instrumentation and data flow analysis. These are attractive concepts, but they can be difficult to apply to real-world cases.

We showed a case where we applied data flow analysis to a simple integer overflow vulnerability; visualizing the result helped us understand the vulnerability. But the real question we raised was how to use these technologies in more complicated cases, for example, an uninitialized memory access vulnerability. We used CVE-2011-2462 (a vulnerability in Adobe Reader and Acrobat; this issue was addressed by Adobe, and you can find more information here) as an example of how to trace back to the root cause of a vulnerability using these techniques. (Note: the Adobe Reader X Protected Mode and Acrobat X Protected View mitigations (the Reader X and Acrobat X sandboxes) would prevent exploits of this vulnerability from executing; this is an exercise in analyzing a vulnerability, not an exploit.)

This vulnerability is more complicated, because the data flow alone does not show the whole connection between the user-supplied data and the crash point.

Figure 1 Data flow analysis for crash case

We performed data flow analysis on the data related to the crash point. As Figure 1 shows, the data used at the crash point comes from an area of freed memory. Because execution order in the graph runs from bottom to top, the free operation happens first; the data in that freed area is later used by Adobe Reader in further operations, which leads to an uninitialized memory issue.
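
The backward trace from the crash point can be sketched as a backward slice over a recorded instruction trace. This is a minimal illustration, not our tool: it assumes a simplified, hypothetical trace format (`TraceEntry`) in which each executed instruction lists the registers and memory locations it writes and reads, as a dynamic binary instrumentation framework could log them. The instructions, addresses, and the `free` record are made up, and modeling `free` as a write into the chunk is an assumption standing in for the heap manager overwriting freed memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEntry:
    """One executed instruction in a simplified, hypothetical trace format."""
    index: int                # position in execution order
    text: str                 # disassembly or event name
    writes: tuple[str, ...]   # locations written ("eax", "mem:0x2000", ...)
    reads: tuple[str, ...]    # locations read


def backward_slice(trace, start_locs):
    """Walk the trace from the last entry to the first, keeping every entry
    that produced a location we are still tracking.

    Returns the slice (crash point first, i.e. reverse execution order) and
    any tracked locations whose origin was not found in the trace.
    """
    tracked = set(start_locs)
    slice_entries = []
    for entry in reversed(trace):
        produced = tracked.intersection(entry.writes)
        if not produced:
            continue  # this entry defined nothing we are tracking
        slice_entries.append(entry)
        tracked -= produced          # these locations are now explained...
        tracked.update(entry.reads)  # ...by this entry's inputs
    return slice_entries, tracked


# Hypothetical crash trace: a chunk is freed (assumed to overwrite its
# contents), then its stale contents are loaded and used as a call target.
CRASH_TRACE = [
    TraceEntry(0, "free(0x2000)", ("mem:0x2000",), ()),
    TraceEntry(1, "mov ecx, 5", ("ecx",), ()),
    TraceEntry(2, "mov edx, dword ptr [0x2000]", ("edx",), ("mem:0x2000",)),
    TraceEntry(3, "call dword ptr [edx]", (), ("edx",)),
]

if __name__ == "__main__":
    entries, unresolved = backward_slice(CRASH_TRACE, {"edx"})
    for e in entries:
        print(e.index, e.text)
```

Starting from `edx` at the call, the slice skips the unrelated `mov ecx, 5` and ends at the `free` record. Reading the slice from its last entry to its first gives execution order, the same way Figure 1 is read from bottom to top.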

Figure 2 Data flow analysis for normal case

The data flow graph above comes from a good (non-crashing) sample file that hits the same area of code as the crash case. In this case, however, the data comes from a memory area allocated with malloc.

Figure 3 Crash case and normal case

By performing data flow differential analysis between the crash case and the normal case, we can pinpoint the exact instruction responsible for the divergence in data flow. The following table shows the instruction-level difference behind that divergence; "mov dword ptr [ecx+ebx*4h], eax" is the key instruction that makes the difference.

Figure 4 Crash case and normal case
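
The data flow differential step can be sketched as a comparison of the two slices. This sketch assumes each slice is a list of `(site, text)` pairs ordered from the crash point backward, as a backward slice would produce; the site labels and every instruction text other than the key `mov` from Figure 4 are hypothetical. It also assumes, as one reading of Figures 3 and 4, that the key store appears in the normal case's data flow but not in the crash case's.

```python
def data_flow_diff(crash_slice, normal_slice):
    """Return the entries unique to each slice, preserving each slice's order."""
    crash_sites = {site for site, _ in crash_slice}
    normal_sites = {site for site, _ in normal_slice}
    only_crash = [(s, t) for s, t in crash_slice if s not in normal_sites]
    only_normal = [(s, t) for s, t in normal_slice if s not in crash_sites]
    return only_crash, only_normal


# Hypothetical slices, each ordered from the shared crash point backward.
NORMAL_SLICE = [
    ("crash_read", "mov edx, dword ptr [esi+8]"),
    ("key_store", "mov dword ptr [ecx+ebx*4h], eax"),
    ("src_load", "mov eax, dword ptr [edi]"),
    ("alloc", "malloc"),
]
CRASH_SLICE = [
    ("crash_read", "mov edx, dword ptr [esi+8]"),
    ("dealloc", "free"),
]

only_crash, only_normal = data_flow_diff(CRASH_SLICE, NORMAL_SLICE)

# Walking back from the shared crash point, the first normal-only entry is
# where the two data flows part ways.
key_site, key_text = only_normal[0]
print(key_site, key_text)
```

Comparing by site rather than by instruction text keeps two identical-looking instructions at different locations from being treated as the same step.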

We therefore started control flow differential analysis from that key instruction.

Figure 5 Key instruction from data flow differential analysis

The following graph shows the control flow differential analysis result.

Figure 6 Control flow differential analysis result 

From the graph above, we can see that the basic block at 10009E72 (in red) contains the instruction that determines the direction of the control flow. That branch depends on the value of the eax register, which is the key to creating the crash condition.
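
Control flow differential analysis can be approximated by comparing the sequences of basic blocks executed in the two runs and reporting the last block they share before they split. This is a simplified sketch: it aligns the traces position by position, which only works when both runs follow identical paths up to the divergence (loops that iterate a different number of times would need a proper alignment algorithm). Only 10009E72 comes from the analysis above; the other block labels are hypothetical.

```python
def find_divergence(crash_blocks, normal_blocks):
    """Return the last basic block common to both traces before they differ,
    or None if the traces are identical or differ from the very first block."""
    for i, (c, n) in enumerate(zip(crash_blocks, normal_blocks)):
        if c != n:
            return crash_blocks[i - 1] if i > 0 else None
    if len(crash_blocks) == len(normal_blocks):
        return None  # the traces never diverge
    # One trace is a prefix of the other; they split after its last block.
    shorter = min(len(crash_blocks), len(normal_blocks))
    return crash_blocks[shorter - 1] if shorter else None


# Hypothetical block traces: the normal run reaches the block containing the
# key store, while the crash run branches elsewhere at 10009E72.
NORMAL_BLOCKS = ["blk_entry", "blk_parse", "10009E72", "blk_key_store", "blk_use"]
CRASH_BLOCKS = ["blk_entry", "blk_parse", "10009E72", "blk_skip_store", "blk_use"]

print(find_divergence(CRASH_BLOCKS, NORMAL_BLOCKS))
```

The block this returns ends in the conditional branch whose operands, eax in this case, are worth tracing further.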

We traced this eax value back from that instruction in the crash case and obtained the following graph. Finally, we located the exact position in the file from which the eax value originates. This value is what later controls the crash condition.

Figure 7 EAX Control
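
Mapping the eax value back to a file location can also be sketched with offset-labeled taint tracking. Note that this swaps in forward propagation for the backward trace described above: each byte read from the input file is tagged with its offset, the tags flow through every instruction, and the tags on eax at the branch name the file bytes that control it. The `Step` format, the locations, and the offsets are all hypothetical; a real tool would attach the tags when the file data is read into memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One executed instruction in a simplified, hypothetical trace format."""
    dst: str                            # location written
    srcs: tuple[str, ...] = ()          # locations read
    file_offsets: tuple[int, ...] = ()  # input-file bytes copied into dst, if any


def file_offsets_for(trace, target):
    """Propagate file-offset labels forward and return the labels on target."""
    labels = {}
    for step in trace:
        new_labels = set(step.file_offsets)
        for src in step.srcs:
            new_labels |= labels.get(src, frozenset())
        # Overwrite rather than merge: dst now holds only what this step computed.
        labels[step.dst] = frozenset(new_labels)
    return labels.get(target, frozenset())


TRACE = [
    Step("mem:buf", file_offsets=(0x1A4, 0x1A5, 0x1A6, 0x1A7)),  # dword read from the file
    Step("mem:const"),                  # constant store, no file data
    Step("ecx", srcs=("mem:buf",)),     # mov ecx, [buf]
    Step("edx", srcs=("mem:const",)),   # mov edx, [const]
    Step("eax", srcs=("ecx",)),         # mov eax, ecx
    Step("eax", srcs=("eax", "edx")),   # add eax, edx
]

print(sorted(hex(o) for o in file_offsets_for(TRACE, "eax")))
# prints ['0x1a4', '0x1a5', '0x1a6', '0x1a7']
```

Because `edx` carries no file labels, the `add` leaves eax tied only to the four bytes loaded from the file.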

The point of this post is that data flow analysis is a good tool for vulnerability analysis, but it does not solve every real-world case on its own. Real-world vulnerabilities are often more complicated, so applying this technology requires additional strategies and methods. Data flow differential analysis and control flow differential analysis, which together let us trace an uninitialized memory access back to its root cause, are two such examples.

For the full content of the presentation, please visit this page. It should be available soon.

Jeong Wook Oh