Recently I had a moment to review a group of PDF exploit files. Many exploits use various tricks to obfuscate embedded JavaScript. I thought I could de-obfuscate the samples by throwing them into a sandbox environment and enjoying the beautified source code, but these samples required a different method to coax the legible code into view.

In these examples, which come from Exploit:Win32/Pdfjsc.NJ (SHA1 45d04db8617a85f5359fb1a33ad867ef3d43eb7f), the JavaScript is embedded in an XFA form. By setting an event handler for field initialization, the form makes Adobe Reader run the code when the file is opened. This trick is not new, and it is relatively easy to extract the malicious JavaScript code, as visible in this snippet:

Image 1 – extracted JavaScript

The embedded code contains many useless variable assignments and arithmetic operations. After reviewing the block of code, nothing interesting was readily identifiable; most of the code resembled the image shown below:

Image 2 – arithmetic assignments

This is quite different from the obfuscation seen in other samples. Careful examination reveals several obfuscations that apply the reverse of code optimization. Look at this piece of code:

Image 3 – dead code

The local variable ‘dv’ is assigned a value and then has a subtraction applied to it. After that, however, the variable is never accessed within its scope, which means these two lines are dead code that can be removed without affecting the result of the program. Besides this, fake conditions are added in many places to make the code more confusing:

Image 4 – other meaningless code
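As a rough, runnable illustration of these junk patterns (not the sample's actual code), the sketch below pairs a dead store with a never-true condition. It reuses the names ‘dv’ and ‘dc’ and the constant 7266 mentioned in this post; every other value and both function names are assumptions made up for the example.

```javascript
// Hypothetical reconstruction of the junk-code patterns; all values
// other than 7266 are invented for illustration.
function obfuscated(input) {
  var dv = 4821;        // dead store: dv is assigned a value...
  dv = dv - 17;         // ...has a subtraction applied, and is never read again

  var dc = 1903;        // constant that can never equal 7266
  var out = input * 2;  // the only statement that affects the result

  if (dc == 7266) {     // fake condition: always false
    out = 0;            // unreachable branch
  }
  return out;
}

// The same function after removing the dead store and the fake condition.
function cleaned(input) {
  return input * 2;
}
```

Both functions return the same value for every input, which is what makes this kind of clean-up safe to apply mechanically.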

The variable ‘dc’ holds a constant value when it is compared to the constant ‘7266’, so the code block inside the ‘true’ branch of the ‘if’ statement is never executed, which makes the whole ‘if’ statement useless. After all of this dead code is removed, other interesting things begin to show up. All of the strings are encoded, in two ways:

Image 5 – decoding algorithm

A function named ‘kop’ is used in several places to decode the strings. The above example simply sets the variable ‘g’ to the value ‘substr’. All external functions are called in the following way, so the strings must be decoded before it is possible to know which function is being called.

Therefore, this:

Image 6 – before substituting variables

Is equivalent to this:

Image 7 – after substitution of variables
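The equivalence can be reproduced with a small sketch. The decoder below is a hypothetical stand-in (a one-character shift), since the algorithm inside ‘kop’ is not reproduced in this post; only the calling pattern, a decoded string used as a property name, mirrors the sample.

```javascript
// Hypothetical stand-in for the sample's decoder: shifts every
// character code down by one. The real 'kop' algorithm differs.
function decodeStandIn(encoded) {
  var out = "";
  for (var i = 0; i < encoded.length; i++) {
    out += String.fromCharCode(encoded.charCodeAt(i) - 1);
  }
  return out;
}

// Obfuscated style: the method name only exists after decoding.
var g = decodeStandIn("tvctus");          // decodes to "substr"
var viaDecoded = "shellcode"[g](0, 5);    // bracket call with the decoded name

// Equivalent plain call once the string has been substituted.
var direct = "shellcode".substr(0, 5);
```

Because the method name never appears in the source as a literal, a simple text search for ‘substr’ would not find this call.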

At this point, based on the decoder algorithm found in function ‘kop’, every encoded string can be replaced with its original string, which reveals all of the JavaScript function calls. After the de-obfuscation, the code looks clearer. By examining what each function was doing and renaming the variables and strings accordingly, I derived the following copy of the main function:

Image 8 – de-obfuscated main function
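The string replacement step described above could be automated roughly as follows. This is a sketch under assumptions: the decoder is again a made-up one-character shift rather than the real ‘kop’ algorithm, the helper names are hypothetical, and the regular expressions only handle simple literals with no escaped quotes.

```javascript
// Hypothetical stand-in decoder; the sample's real algorithm differs.
function kopStandIn(encoded) {
  var out = "";
  for (var i = 0; i < encoded.length; i++) {
    out += String.fromCharCode(encoded.charCodeAt(i) - 1);
  }
  return out;
}

// Replace every kop("...") or kop('...') call with its decoded literal.
function inlineDecodedStrings(source) {
  var callPattern = /kop\(\s*(["'])([^"'\\]*)\1\s*\)/g;
  return source.replace(callPattern, function (_match, _quote, encoded) {
    return JSON.stringify(kopStandIn(encoded));
  });
}

// Turn obj["name"] into obj.name when the name is a valid identifier.
function resolveBracketAccess(source) {
  return source.replace(/\[\s*"([A-Za-z_$][\w$]*)"\s*\]/g, ".$1");
}

function deobfuscate(source) {
  return resolveBracketAccess(inlineDecodedStrings(source));
}

var before = 'var r = s[kop("tvctus")](0, 5);';
var after = deobfuscate(before);   // 'var r = s.substr(0, 5);'
```

Working on the source text keeps the sample from ever being executed, at the cost of missing calls whose arguments are built at run time.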

The exploit checks the PDF reader’s version to select the exploit shellcode accordingly, then performs a heap spray to try to exploit the vulnerability described in CVE-2010-0188. The malformed TIFF data is concatenated as a base64 string and assigned to the ‘rawValue’ of the XFA field ‘ska’. After decoding, the TIFF data for PDF reader versions from 8.0 to 8.2.1 looks as follows:

Image 9 – TIFF data as viewed in a hex viewer
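To inspect a payload like this outside the reader, the base64 fragments can be joined, decoded, and dumped as hex. The sketch below assumes a Node.js environment (it relies on the built-in Buffer class); the function names and the two sample fragments are hypothetical and encode only a four-byte little-endian TIFF header, not the malicious data.

```javascript
// Join the base64 fragments and decode them into raw bytes.
function decodeFragments(fragments) {
  return Buffer.from(fragments.join(""), "base64");
}

// Check for the TIFF byte-order marker: "II*\0" (little-endian)
// or "MM\0*" (big-endian).
function looksLikeTiff(bytes) {
  if (bytes.length < 4) {
    return false;
  }
  var magic = bytes.toString("latin1", 0, 4);
  return magic === "II*\u0000" || magic === "MM\u0000*";
}

// Produce a simple hex dump, 16 bytes per row, prefixed with the offset.
function hexDump(bytes) {
  var lines = [];
  for (var offset = 0; offset < bytes.length; offset += 16) {
    var row = bytes.subarray(offset, offset + 16);
    var hex = Array.from(row, function (b) {
      return b.toString(16).padStart(2, "0");
    }).join(" ");
    lines.push(offset.toString(16).padStart(8, "0") + "  " + hex);
  }
  return lines.join("\n");
}

// Made-up fragments that decode to the bytes 49 49 2a 00.
var sampleBytes = decodeFragments(["SUkq", "AA=="]);
var dump = hexDump(sampleBytes);
```

Decoding offline like this avoids opening the file in a vulnerable reader just to see what the field contains.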

The shellcode simply downloads and executes additional malware from a remote server. At the time of this writing, the malware was detected as TrojanDownloader:Win32/Epldr.A.

We can see that malware authors are taking obfuscation more seriously to try to evade security software. Although this exploit uses complex obfuscation methods to avoid being detected and analyzed (which makes it more advanced than other exploits), the technique used here is not new.

As always, be safe and use up-to-date security software.

 

-- Shawn Wang, MMPC