In Windows, the “hosts” file (located in the “%SystemRoot%\System32\drivers\etc” directory by default) is often used by malware authors when hijacking websites. Entries in the local hosts file override DNS resolution, mapping a host name to a particular IP address. Malware authors change affected users’ hosts files to redirect specified host names to IP addresses of the author’s choice. In August last year, I blogged about malware authors using Unicode characters in the hosts file filename in order to trick users and hide the real hosts file. However, it seems that malware writers never stop doing their malicious work. This time, they’re using another trick to mislead people.

Several days ago, one of my friends wanted to buy something from Taobao, which is one of the most popular online trading platforms in China. When he opened the website by typing its URL “http://www.taobao.com” in the address bar of his web browser, he found that the URL changed to “http://www.taobao.com.cn” automatically, with some strings embedded in the URL that looked like an identifier, as in the following example.

He has some basic security knowledge, and thought this might be an attempted website hijacking. So he opened the hosts file using Notepad. But to his surprise, the file seemed to be filled with garbage, as you can see below.

He couldn’t understand this, because he thought that the hosts file was just a text file, and that he could easily remove the website hijacking by deleting the corresponding entries in the hosts file. So he asked me.
At first, I just wanted to see what the real content of this hosts file was. So I opened it with a hex editor.

When I saw the BOM character (0xFEFF) at the beginning of the file and the ASCII text following it, I realized what it was. This hosts file is just an ASCII text file, but with a Unicode byte order mark at the beginning of the file, which misleads a Unicode-aware text editor, such as Notepad, into treating it as a Unicode text file. In the middle of this big hosts file, we can see the entry hijacking www.taobao.com.
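The check I did by eye in the hex editor can be sketched in a few lines of Python. This is only an illustration, not a detection signature: the function name is hypothetical, and it assumes the marker is a UTF-16 BOM and that genuine UTF-16 text for ASCII characters would contain NUL bytes.

```python
import codecs


def looks_like_bom_disguised_ascii(data: bytes) -> bool:
    """Return True if data starts with a UTF-16 BOM but the rest is plain ASCII.

    Hypothetical helper for illustration only.
    """
    for bom in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        if data.startswith(bom):
            body = data[len(bom):]
            # Real UTF-16 text made of ASCII characters has a NUL byte in
            # every 2-byte code unit. Pure 7-bit bytes with no NULs after
            # the BOM suggest the marker was simply glued onto ASCII text.
            return bool(body) and b"\x00" not in body and all(b < 0x80 for b in body)
    return False


# 203.0.113.5 and www.example.com are placeholder values, not from the sample.
disguised = codecs.BOM_UTF16_LE + b"\r\n203.0.113.5 www.example.com\r\n"
genuine = codecs.BOM_UTF16_LE + "127.0.0.1 localhost\r\n".encode("utf-16-le")

print(looks_like_bom_disguised_ascii(disguised))
print(looks_like_bom_disguised_ascii(genuine))
```

To try it on a real file, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes to the function.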

But now the question is, how is this malicious hosts file being interpreted? To answer this question, I used Process Monitor with the following filters to identify which process in the system interprets the hosts file and uses it.

I made some minor modifications to the hosts file, saved it using Notepad, and captured the whole process. After that, using Process Monitor’s stack function, I discovered that the hosts file is interpreted by the “DNS Client” service.

From the picture above, we can see that the “DNS Client” service (dnsrslvr.dll) calls the HostsFile_ReadLine function of dnsapi.dll to get a line from the hosts file, which in turn calls the fgets function of msvcrt.dll to do the real work of reading the line. The fgets function in the CRT library reads narrow (single-byte) characters and does not decode Unicode text. Using this function to read the file means the system only supports hosts files in ASCII format, not Unicode format. The following is a part of a flowchart showing the HostsFile_ReadLine function.

We can easily follow the logic for processing the hosts file from this picture. The system treats the hosts file as an ASCII file and tries to get records from it. If any invalid record is found, it just drops the record, and continues to process the next record.
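The drop-and-continue behavior described above can be approximated with a short Python sketch. This is not the code in dnsapi.dll; the function name, the choice to let the first mapping for a name win, and the sample address are all assumptions made for illustration.

```python
import ipaddress


def parse_hosts_bytes(data: bytes) -> dict:
    """Byte-oriented hosts parser sketch: invalid records are silently dropped."""
    mapping = {}
    for raw_line in data.split(b"\n"):
        # Strip comments (introduced by '#') and surrounding whitespace.
        line = raw_line.split(b"#", 1)[0].strip()
        if not line:
            continue
        try:
            fields = line.decode("ascii").split()
            address = str(ipaddress.ip_address(fields[0]))
        except (UnicodeDecodeError, ValueError, IndexError):
            # Not ASCII, or the first field is not an IP address:
            # drop this record and move on to the next line.
            continue
        for name in fields[1:]:
            # Assumption: the first mapping seen for a name is kept.
            mapping.setdefault(name.lower(), address)
    return mapping


# Placeholder data: a BOM-only first line, one valid record, one invalid record.
sample = b"\xff\xfe\r\n203.0.113.5 www.example.com\r\nnot-an-ip some.host\r\n"
print(parse_hosts_bytes(sample))
```

The first line, which contains the BOM bytes, fails ASCII decoding and is skipped like any other invalid record, while the valid record after it is still picked up.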

Now we can start to understand the whole trick being used by this hosts file. The first line of this file (the characters before the first CRLF) is useless to the system, and will be dropped when building the hosts file records. The rest of this file will be interpreted correctly by the system, as these records are valid, and these websites will be hijacked or diverted on the affected computer. But the first line will mislead Unicode-aware editors, such as Notepad, into rendering the text incorrectly, which in turn prevents users from seeing what’s really going on.
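To make the two views concrete, here is a small Python sketch that decodes the same bytes in two ways. Both functions are hypothetical stand-ins: one imitates an editor that trusts a UTF-16 little-endian BOM, the other imitates a reader that only accepts ASCII lines. Neither is the actual Notepad or DNS Client logic, and the sample record is a placeholder.

```python
import codecs


def editor_like_view(data: bytes) -> str:
    """Decode the way a BOM-trusting editor might (assumption: UTF-16 LE)."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le", errors="replace")
    return data.decode("ascii", errors="replace")


def byte_oriented_view(data: bytes) -> list:
    """Keep only the non-empty lines that are valid ASCII."""
    kept = []
    for line in data.split(b"\r\n"):
        try:
            text = line.decode("ascii")
        except UnicodeDecodeError:
            continue  # the BOM line is not ASCII, so it is dropped
        if text.strip():
            kept.append(text)
    return kept


sample = codecs.BOM_UTF16_LE + b"\r\n203.0.113.5 www.example.com\r\n"

# Each pair of ASCII bytes becomes one unrelated 16-bit character here.
print(repr(editor_like_view(sample)))
# The same bytes still contain a perfectly readable record.
print(byte_oriented_view(sample))
```

In the editor-like view, the host name never appears as readable text, yet the byte-oriented view recovers the record intact.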

In this sample, the malicious server redirects hijacked websites to a Taobao advertisement website. The website itself is legal, and is similar to Google AdWords. Presumably the author will get illegitimate income when people search using the website. This is a very popular way for malware authors in China to get gray income (and may not be viewed quite as severely as other types of more obviously illegal activity).

It’s a fairly straightforward procedure to create a clean hosts file if you think yours has been corrupted in this way. Have a look at this KB article for full instructions.

When we “see” that a file is filled with garbage, is it really useless? Can we believe our eyes? The answer is... not always.

Zhitao Zhou
Microsoft Malware Protection Center