Security Research & Defense

Information from Microsoft about vulnerabilities, mitigations and workarounds, active attacks, security research, tools and guidance

November, 2013

  • Introducing Enhanced Mitigation Experience Toolkit (EMET) 4.1

    In June 2013, we released EMET 4.0 and customer response has been fantastic. Many customers across the world now include EMET as part of their defense-in-depth strategy and appreciate how EMET helps businesses prevent attackers from gaining access to computer systems. Today, we’re releasing a new version, EMET 4.1, with updates that simplify configuration and accelerate deployment.

    EMET anticipates the most common techniques adversaries might use and shields computer systems against those security threats. EMET uses security mitigation technologies such as Data Execution Prevention (DEP), Mandatory Address Space Layout Randomization (ASLR), Structured Exception Handler Overwrite Protection (SEHOP), Export Address Table Access Filtering (EAF), Anti-ROP, and SSL/TLS Certificate Trust Pinning, to help protect computer systems from new or undiscovered threats. EMET can also protect legacy applications or third party line of business applications where you do not have access to the source code.

    Today’s EMET 4.1 release includes new functionality and updates, such as:

    • Updated default protection profiles, Certificate Trust rules, and Group Policy Object configuration.
    • Shared remote desktop environments are now supported on Windows servers where EMET is installed.
    • Improved Windows Event logging mechanism allows for more accurate reporting in multi-user scenarios.
    • Several application-compatibility enhancements and mitigation false positive reporting.

    EMET, built by the Microsoft Security Response Center (MSRC) engineering team, brings the latest in security science to your organization. While many EMET users exchange feedback and ideas at TechNet user forums, a lesser-known fact is that Microsoft Premier Support options are also available for businesses that deploy EMET within their enterprise. Many of our customers deploy EMET at scale through Microsoft System Center Configuration Manager and apply enterprise application, user and account rules through Group Policy. EMET works well with the tools and support options our customers know and use today.

    As we continue to advance EMET, we welcome your feedback on what you like and what additional features would help in protecting your business. If you are attending the RSA Conference in San Francisco or the Black Hat conference in Las Vegas next year, be sure to stop by the Microsoft booth and share your feedback with us. We look forward to hearing from you.

     

    The EMET Team

  • Security Advisory 2868725: Recommendation to disable RC4

    In light of recent research into practical attacks on biases in the RC4 stream cipher, Microsoft is recommending that customers enable TLS1.2 in their services and take steps to retire and deprecate RC4 as used in their TLS implementations. Microsoft recommends TLS1.2 with AES-GCM as a more secure alternative which will provide similar performance.

    Background

    Developed in 1987 by Ron Rivest, RC4 was one of the earliest stream ciphers to see broad use. It was initially used in commercial applications, was faster than alternatives when implemented in software, and over time became pervasive because it was cheap, fast and easy to implement and use.

    Stream vs. Block

    At a high level, a stream cipher generates a pseudorandom stream of bits of the same length as the plaintext and then XORs the pseudorandom stream with the plaintext to generate the ciphertext. This is different from a block cipher, which chunks plaintext into separate blocks, pads the plaintext to the block size and encrypts the blocks.

    A History of Issues

    RC4 consists of a Key Scheduling Algorithm (KSA) which feeds into a Pseudo-Random Generator (PRG), both of which need to be robust for use of the cipher to be considered secure. Beyond implementation issues with RC4, such as document encryption and the 802.11 WEP implementation, there are significant weaknesses in the KSA which lead to biases in the leading bytes of PRG output.
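
    The two stages described above can be sketched in a few lines. This is a minimal illustration of the publicly documented RC4 algorithm, not production code; the function names rc4_keystream and rc4_crypt are chosen here for the example.

```python
def rc4_keystream(key: bytes, length: int) -> bytes:
    """Return `length` bytes of RC4 keystream for `key`."""
    if not 1 <= len(key) <= 256:
        raise ValueError("RC4 keys are 1 to 256 bytes long")

    # Key Scheduling Algorithm (KSA): shuffle the identity permutation
    # under control of the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # Pseudo-Random Generator (PRG): keep shuffling and emit one
    # keystream byte per step.
    out = bytearray()
    i = j = 0
    for _ in range(length):
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(state[(state[i] + state[j]) % 256])
    return bytes(out)


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing the data with the keystream."""
    keystream = rc4_keystream(key, len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))
```

    Because encryption is a plain XOR with the keystream, calling rc4_crypt a second time with the same key recovers the plaintext. Any bias in the early keystream bytes therefore shows up directly in the early ciphertext bytes.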

    By definition, a PRG is only secure if the output is indistinguishable from a stream of random data. In 2001, Mantin and Shamir < http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.6198 > found a significant bias in RC4 output, specifically that the second byte of output is ‘0’ about twice as often as it would be in a random stream. Attacks and research have evolved since 2001; the work of T. Isobe, T. Ohigashi, Y. Watanabe, M. Morii of Kobe University in Japan is especially significant when evaluating the risk of RC4 use. Their findings show additional, significant bias in the first 257 bytes of RC4 output as well as practical plaintext recovery attacks on RC4.

    The plaintext recovery attacks show a passive attacker collecting ciphertexts encrypted with different keys. Given 2^32 ciphertexts with different keys, the first 257 bytes of the plaintext are recovered with a probability of more than 0.5 < http://home.hiroshima-u.ac.jp/ohigashi/rc4/Full_Plaintext_Recovery%20Attack_on%20Broadcast_RC4_pre-proceedings.pdf >.
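
    The second-byte bias can be observed empirically. The sketch below is only an illustration: the 16-byte key length, the trial count and the fixed seed are assumptions made to keep the run short and repeatable, and the helper names are hypothetical. For a random stream the second byte would be zero about 1 time in 256; with RC4 the measured frequency should come out near 2 in 256.

```python
import random


def second_keystream_byte(key: bytes) -> int:
    """Run the RC4 KSA for `key` and return the second PRG output byte."""
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    i = j = 0
    byte = 0
    for _ in range(2):  # two PRG steps; keep only the second byte
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        byte = state[(state[i] + state[j]) % 256]
    return byte


def zero_frequency(trials: int, key_len: int = 16, seed: int = 2013) -> float:
    """Fraction of random keys whose second keystream byte is zero."""
    rng = random.Random(seed)  # fixed seed only for repeatability
    zeros = 0
    for _ in range(trials):
        key = bytes(rng.getrandbits(8) for _ in range(key_len))
        if second_keystream_byte(key) == 0:
            zeros += 1
    return zeros / trials


if __name__ == "__main__":
    freq = zero_frequency(30000)
    print(f"measured {freq:.5f}, uniform would be {1 / 256:.5f}")
```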

    Since early RC4 output cannot be discarded from SSL/TLS implementations without protocol-level changes, this attack demonstrates the practicality of attacks against RC4 in common implementations.

    Internet Use of RC4

    One of the first steps in evaluating the customer impact of new security research and understanding the risks involved is evaluating the state of public and customer environments. Using a sample size of five million sites, we found that 58% of sites do not use RC4, while approximately 43% do. Of the 43% that utilize RC4, only 3.9% require its use. Therefore disabling RC4 by default has the potential to decrease the use of RC4 by almost forty percent.

    Microsoft's Response

    Today's update provides tools for customers to test and disable RC4. The launch of Internet Explorer 11 (IE 11) and Windows 8.1 provides more secure defaults for customers out of the box.

    IE 11 enables TLS1.2 by default and no longer uses RC4-based cipher suites during the TLS handshake.

    More detailed information about these changes can be found in the IE 11 blog <http://blogs.msdn.com/b/ie/archive/2013/11/12/ie11-automatically-makes-over-40-of-the-web-more-secure-while-making-sure-sites-continue-to-work.aspx>

    For application developers, we have implemented additional options in SChannel which allow for its use without RC4.

    Today's Updates

    Today's update KB 2868725 provides support for the Windows 8.1 RC4 changes on Windows 7, Windows 8, Windows RT, Server 2008 R2, and Server 2012. These updates will not change existing settings and customers must implement changes (which are detailed below) to help secure their environments against weaknesses in RC4.

    Call to Action

    Microsoft strongly encourages customers to evaluate, test and implement the options for disabling RC4 below to increase the security of clients, servers and applications. Microsoft recommends enabling TLS1.2 and AES-GCM. Clients and servers running on Windows with custom SSL/TLS implementations, such as Mozilla Firefox and Google Chrome, will not be affected by changes to SChannel.

    How to Completely Disable RC4

    Clients and servers that do not wish to use RC4 cipher suites, regardless of the other party's supported ciphers, can disable RC4 cipher suites completely by setting the following registry keys. A client or server configured this way refuses any connection with a peer that must use RC4: clients that deploy this setting will not be able to connect to sites that require RC4, while servers that deploy this setting will not be able to service clients that must use RC4.

    • [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Ciphers\RC4 128/128]
      • "Enabled"=dword:00000000
    • [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Ciphers\RC4 40/128]
      • "Enabled"=dword:00000000
    • [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Ciphers\RC4 56/128]
      • "Enabled"=dword:00000000
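
    For deployment, the three keys above can be collected into a single .reg file. The following is a minimal sketch that only builds the file text; the helper name and the idea of generating the file from a script are assumptions for illustration, and the result should be reviewed and tested before it is imported anywhere.

```python
# Registry path and subkey names are taken from the list above.
SCHANNEL_CIPHERS = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
    r"\SecurityProviders\SCHANNEL\Ciphers"
)
RC4_SUBKEYS = ("RC4 128/128", "RC4 40/128", "RC4 56/128")


def build_disable_rc4_reg() -> str:
    """Return the text of a .reg file that sets Enabled=0 for each RC4 key."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for subkey in RC4_SUBKEYS:
        lines.append(f"[{SCHANNEL_CIPHERS}\\{subkey}]")
        lines.append('"Enabled"=dword:00000000')
        lines.append("")
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_disable_rc4_reg())
```

    Keeping the subkey names in one tuple makes it easy to check the generated file against the advisory before rolling it out.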

    How Other Applications Can Prevent the Use of RC4 based Cipher Suites

    RC4 is not turned off by default for all applications. Applications that call into SChannel directly will continue to use RC4 unless they opt in to the security options. Applications that use SChannel can block the use of RC4 cipher suites for their connections by passing the SCH_USE_STRONG_CRYPTO flag to SChannel in the SCHANNEL_CRED structure. If compatibility needs to be maintained, then they can also implement a fallback that does not pass this flag.

    Microsoft recommends that customers upgrade to TLS1.2 and utilize AES-GCM. On modern hardware AES-GCM has similar performance characteristics and is a much more secure alternative to RC4.

    - William Peteroy, MSRC

    I would like to thank the Windows, Internet Explorer and .NET teams for their work in this effort as well as Ali Rahbar and Suha Can of the MSRC Engineering team for their hard work and input. I would also like to thank Matthew Green for the excellent write-ups he has for this and other applied cryptography issues on his blog.

  • Technical details of the targeted attack using IE vulnerability CVE-2013-3918

    Over the weekend we became aware of an active attack relying on an unknown remote code execution vulnerability in a legacy ActiveX component used by Internet Explorer. We are releasing this blog to confirm one more time that the code execution vulnerability will be fixed in today’s Update Tuesday release and to clarify some details about the second vulnerability reported.

    The attack was disclosed to us by our security partners. It is a typical targeted attack, delivered as a “drive-by” through a specific legitimate website that was compromised to include an additional piece of code added by the attackers. So far, the samples we have analyzed from the active attack target only older Internet Explorer versions running on Windows XP (IE7 and 8) because of the lack of additional security mitigations on those platforms (Windows 7 is affected but not under active attack). EMET was able to proactively mitigate this exploit.

    The exploit was created by combining two distinct vulnerabilities with different impact and severity ratings:

    1. a remote code execution vulnerability (CVE-2013-3918) in the InformationCardSigninHelper ActiveX component used by Internet Explorer;
    2. an information disclosure vulnerability (no CVE assigned yet) used by attackers only to improve the reliability of the exploit and to create ROP payloads specifically targeted for the victim’s machine;

    The remote code execution vulnerability with higher severity rating will be fixed immediately in today’s Patch Tuesday and we advise customers to prioritize the deployment of MS13-090 for their monthly release. As usual, customers with Automatic Updates enabled will not need to take any action to receive the update and will be automatically protected.

    The information disclosure vulnerability does not allow remote code execution, so it has a lower severity rating: it will typically be used in combination with another high-severity bug (as happened with CVE-2013-3918) to improve the effectiveness of exploitation. Also, this vulnerability requires attackers to have prior knowledge of paths and filenames present on targeted machines in order to be successfully exploited. This vulnerability was not used to bypass ASLR, but simply to remotely determine the exact version of a certain DLL on disk in order to build a more precise ROP payload (it’s a local information disclosure rather than a memory address disclosure).

    We are still investigating the impact and root cause of the information disclosure vulnerability and we may follow up with additional information and mitigations as they become available.

     

    Elia Florio – MSRC Engineering

  • Assessing risk for the November 2013 security updates

    Today we released eight security bulletins addressing 19 CVEs. Three bulletins have a maximum severity rating of Critical while the other five have a maximum severity rating of Important. We hope that the table below helps you prioritize the deployment of the updates appropriately for your environment.

    MS13-090 (ActiveX killbit)
      Most likely attack vector: Victim browses to a malicious webpage.
      Max bulletin severity: Critical
      Max exploitability: 1
      Likely first 30 days impact: Expect to continue seeing driveby-style attacks leveraging CVE-2013-3918.
      Platform mitigations and key notes: Addresses the out-of-bounds memory access vulnerability mentioned on the FireEye blog on Friday: http://www.fireeye.com/blog/technical/2013/11/new-ie-zero-day-found-in-watering-hole-attack.html. More information about this attack can be found on our blog at http://blogs.technet.com/b/srd/archive/2013/11/12/technical-details-of-the-targeted-attack-using-cve-2013-3918.aspx

    MS13-088 (Internet Explorer)
      Most likely attack vector: Victim browses to a malicious webpage.
      Max bulletin severity: Critical
      Max exploitability: 1
      Likely first 30 days impact: Likely to see reliable exploits developed within next 30 days.
      Platform mitigations and key notes: (none)

    MS13-089 (Windows GDI)
      Most likely attack vector: Victim opens a malicious .WRI file in Wordpad.
      Max bulletin severity: Critical
      Max exploitability: 1
      Likely first 30 days impact: Likely to see reliable exploits developed within next 30 days.
      Platform mitigations and key notes: This update addresses a vulnerability in converting a BMP to WMF. While the Wordpad vector would be only “Important” severity, we believe other attack vectors may exist if third party applications are installed. Those attack vectors may not require user interaction. Therefore, out of an abundance of caution, we’ve rated this bulletin “Critical”.

    MS13-091 (Word)
      Most likely attack vector: Victim opens malicious Word document.
      Max bulletin severity: Important
      Max exploitability: 1
      Likely first 30 days impact: Likely to see reliable exploits developed within next 30 days.
      Platform mitigations and key notes: (none)

    MS13-092 (Hyper-V)
      Most likely attack vector: Attacker running code inside a virtual machine can cause bugcheck of host hypervisor system; or potentially execute code in another VM running on same hypervisor system.
      Max bulletin severity: Important
      Max exploitability: 1
      Likely first 30 days impact: Likely to see reliable denial-of-service exploit developed within next 30 days.
      Platform mitigations and key notes: Guest -> Host is denial-of-service (bugcheck). Guest -> Guest has potential for code execution.

    MS13-093 (AFD.sys)
      Most likely attack vector: Attacker running code at low privilege runs malicious EXE to reveal kernel memory addresses and contents.
      Max bulletin severity: Important
      Max exploitability: n/a
      Likely first 30 days impact: No chance for direct code execution. Information disclosure only.
      Platform mitigations and key notes: Affects only 64-bit systems. Does not affect Windows 8.1.

    MS13-094 (Outlook)
      Most likely attack vector: Attacker sends victim S/MIME email that triggers a number of HTTP requests during S/MIME signature validation. Because requests can be sent to an arbitrary host and port, timing differences can reveal to the attacker which hosts and ports are accessible to the victim’s computer.
      Max bulletin severity: Important
      Max exploitability: n/a
      Likely first 30 days impact: No chance for direct code execution. Information disclosure only.
      Platform mitigations and key notes: This vulnerability can be leveraged to “port scan” several thousand ports per S/MIME email opened by victim. Signature verification for multiple S/MIME signers in this way will take some time and will block Outlook during the process.

    MS13-095 (Digital signature parsing denial-of-service)
      Most likely attack vector: Attacker sends malformed X.509 certificate to web service causing temporary resource exhaustion denial-of-service condition.
      Max bulletin severity: Important
      Max exploitability: n/a
      Likely first 30 days impact: No chance for direct code execution. Denial of service only.
      Platform mitigations and key notes: (none)

    - Jonathan Ness, MSRC Engineering

  • Security Advisory 2880823: Recommendation to discontinue use of SHA-1

    Microsoft is recommending that customers and CAs stop using SHA-1 for cryptographic applications, including use in SSL/TLS and code signing. Microsoft Security Advisory 2880823 has been released along with the policy announcement that Microsoft will stop recognizing the validity of SHA-1 based certificates after 2016.

    Background

    Secure Hash Algorithm 1 (SHA-1) is a message digest algorithm published in 1995 as part of NIST’s Secure Hash Standard. A hashing algorithm is considered secure only if it is computationally infeasible to find two inputs that produce the same output and the output cannot be reversed to recover the input (the function only works one-way).

    Since 2005 there have been known collision attacks (where multiple inputs that produce the same output can be found faster than by brute force), meaning that SHA-1 no longer meets the security standards for producing a cryptographically secure message digest.
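
    To make the idea of a collision concrete, the sketch below searches for two different inputs whose SHA-1 digests agree in their first three bytes. This is a plain birthday search on a deliberately truncated digest, not one of the cryptanalytic attacks on full SHA-1; the truncation length, the message format and the function names are assumptions chosen so the search finishes in well under a second.

```python
import hashlib


def truncated_sha1(data: bytes, n_bytes: int) -> bytes:
    """Return the first `n_bytes` of the SHA-1 digest of `data`."""
    return hashlib.sha1(data).digest()[:n_bytes]


def find_truncated_collision(n_bytes: int = 3):
    """Birthday search: hash counter-based messages until two share a tag."""
    seen = {}
    counter = 0
    while True:
        message = b"message-%d" % counter
        tag = truncated_sha1(message, n_bytes)
        if tag in seen:
            return seen[tag], message
        seen[tag] = message
        counter += 1


if __name__ == "__main__":
    first, second = find_truncated_collision()
    print(first, second, truncated_sha1(first, 3).hex())
```

    With a 24-bit tag the search needs only a few thousand hashes. The same generic search against the full 160-bit output would need on the order of 2^80 hashes, which is why attacks that lower this cost matter.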

    For attacks against hashing algorithms, we have seen a pattern of attacks leading up to major real-world impacts:

    Short history of MD5 Attacks

    Source: Marc Stevens, Cryptanalysis of MD5 and SHA-1

    • 1992: MD5 published
    • 1993: Pseudo-collision attack
    • 2004: Identical-prefix collision found in 2^40 calls
    • 2006: Chosen-prefix collision found in 2^49 calls
    • 2009: Identical-prefix and chosen-prefix attacks optimized to 2^16 and 2^39 calls respectively; Rogue CA practical attacks implemented

    It appears that SHA-1 is on a similar trajectory:

    • 1995: SHA-1 published
    • 2005: SHA-1 collision attack published in 2^69 calls
    • 2005: NIST recommendation for movement away from SHA-1
    • 2012: Identical-prefix collision 2^61 calls presented
    • 2012: Chosen-prefix collision 2^77.1 calls presented

    Current Issues

    Microsoft is actively monitoring the situation and has released a policy for deprecating SHA-1 by 2016.

    Microsoft Recommendations

    Microsoft recommends that Certificate Authorities (CAs) stop using SHA-1 for digital signatures and that consumers request SHA-2 certificates from CAs.

    Microsoft Policy

    Microsoft has publicized a new policy that calls for users and CAs to stop using SHA1-based certificates by 2016.

    - William Peteroy, MSRC

    I would like to thank the Microsoft PKI team as well as Ali Rahbar of the MSRC Engineering team for their hard work and input.

  • CVE-2013-3906: a graphics vulnerability exploited through Word documents

    Recently we became aware of a vulnerability in a Microsoft graphics component that is actively exploited in targeted attacks using crafted Word documents sent by email. Today we are releasing Security Advisory 2896666 which includes a proactive Fix it workaround for blocking this attack while we are working on the final update. In this blog, we’ll share details of the vulnerability and the Fix it workaround and provide mitigations and suggestions to layer protections against the attack.

     

    The exploit

    The attacks observed are very limited and carefully carried out against selected computers, largely in the Middle East and South Asia. The exploit needs some user interaction since it arrives disguised as an email that entices potential victims to open a specially crafted Word attachment. This attachment will attempt to exploit the vulnerability by using a malformed graphics image embedded in the document itself.

    In order to achieve code execution, the exploit combines multiple techniques to bypass DEP and ASLR protections. Specifically, the exploit code performs a large memory heap-spray using ActiveX controls (instead of the usual scripting) and uses hardcoded ROP gadgets to allocate executable pages. This also means the exploit will fail on machines hardened to block ActiveX controls embedded in Office documents (e.g. Protected View mode used by Office 2010) or on computers equipped with a different version of the module used to build the static ROP gadgets.

     

     

    [Figure: Heap-spray of memory]

    [Figure: Initial ROP gadgets]

     

    Affected software

    Our initial investigations show that the vulnerability will not affect Office 2013 but will affect older versions such as Office 2003 and 2007. Due to the way Office 2010 uses the vulnerable graphic library, it is affected only when running on older platforms such as Windows XP or Windows Server 2003, and is not affected when running on newer Windows families (7, 8 and 8.1). This is another example that demonstrates the benefits of running recent versions of software in terms of security improvements (consider also that Windows XP support will end in April 2014). For more information and for the complete list of affected software, please refer to Security Advisory 2896666.

     

    Office 2003: Affected
    Office 2007: Affected
    Office 2010: Affected only on Windows XP/Windows Server 2003
    Office 2013: Not affected

     

     

    Fix it workaround

    We created a temporary Fix it workaround that can block this attack. This temporary workaround doesn’t address the root cause of the vulnerability but simply changes the configuration of the computer to block rendering of the vulnerable graphic format that can trigger the bug. The change made by the Fix it consists of adding the following key to the local registry:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Gdiplus\DisableTIFFCodec = 1

    We advise customers to evaluate usage of TIFF images in their environment before applying this workaround.

     

    Other layers of defense

    Users who are not able to deploy the Fix it workaround can still take some important steps to raise the bar for attackers and protect themselves.

    • Install EMET (the Enhanced Mitigation Experience Toolkit)

    Our tests show that EMET is able to mitigate this exploit in advance when any of the following mitigations are enabled for Office binaries:

      1. multiple ROP mitigations (StackPointer, Caller, SimExec, MemProt) available in EMET 4.0;
      2. other mitigations (MandatoryASLR, EAF, HeapSpray ) included in EMET 3.0 and 4.0;

     

     

     

    • Use Protected View and block ActiveX controls in Office documents

    Even though the vulnerability resides in a graphics library, attackers rely heavily on other components to bypass DEP/ASLR and execute code, so users can still make exploitation more difficult and unreliable by using Protected View to open attachments (default for Office 2010) or simply by blocking the execution of ActiveX controls embedded in Office documents. These general recommendations for Office hardening and better protection against attacks have already been suggested in the following blogs, which include examples and more details:

    http://blogs.technet.com/b/srd/archive/2012/04/10/ms12-027-enhanced-protections-regarding-activex-controls-in-microsoft-office-documents.aspx

    http://blogs.technet.com/b/mmpc/archive/2012/08/31/a-technical-analysis-on-cve-2012-1535-adobe-flash-player-vulnerability-part-2.aspx

     

    Finally, we are working with our MAPP partners to provide information that will help to detect samples related to this attack and improve overall coverage of antimalware and security products.

    We’d like to thank Haifei Li of McAfee Labs IPS Team for reporting this vulnerability in a coordinated manner and for collaborating with us.

     

    - Elia Florio, MSRC Engineering

     

  • Software defense: safe unlinking and reference count hardening

    Object lifetime management vulnerabilities represent a very common class of memory safety vulnerability.  These vulnerabilities come in many shapes and sizes, and are typically quite difficult to mitigate generically.  Vulnerabilities of this type result commonly from incorrect accounting with respect to reference counts describing active users of an object, or improper handling of certain object states or error conditions.  While generically mitigating these issues represents an ongoing challenge, Microsoft has taken steps towards mitigating certain, specific classes of these issues in Windows 8 and Windows 8.1.  These mitigations typically involve widespread instrumentation of code to reduce the impact of specific classes of issues.

    Introducing fast fail

    Before we further detail some of the mitigations discussed in this post, it’s important to take a brief moment to outline the mechanism by which these upcoming mitigations report their failures.

    When it comes to memory safety mitigations, one of the most basic (but sometimes overlooked) aspects of a mitigation is what to do when corruption has been detected.  Typical memory safety mitigations attempt to detect some sort of indication that a program has “gone off the guard rails” and severely corrupted some form of its internal state; consequently, it is valuable for the code that detects the corruption to assume the minimum possible about the state of the process, and to depend on as little as possible when dealing with the error condition (often leading to a crash dump being captured, and the faulting program being terminated).

    The mechanisms for dealing with triggering crash dump capture and program termination have historically been very environment-specific.  The APIs often used to do so in user mode Windows, for example, do not exist in the Windows kernel; instead, a different set of APIs must be used.  Furthermore, many existing mechanisms have not been designed to absolutely minimize dependencies on the state of the corrupted program at the time of error reporting.

    Environment-specific mechanisms for critical failure reporting are also problematic for compiler generated code, or code that is compiled once and then linked in to programs that might run in many different environments (such as user mode, kernel mode, early boot, etc.).  Previously, this problem has typically been addressed by providing a small section of stub code that is linked in to a program and which provides an appropriate critical failure reporting abstraction.  However, this approach becomes increasingly problematic as the scope of programs that take dependencies on said stub library increases.  For security features whose error reporting facilities are linked in to vast numbers of programs, the stub code must be extremely circumspect with respect to which APIs it may take dependencies on.

    Take the case of /GS as an example; directly linking to the crash dump writing code would pull that code in to nearly every program built with /GS enabled, which would clearly be highly undesirable.  Some programs might need to run on OS’s before those facilities were even introduced, and even if that were not the case, pulling in additional dependent DLLs (or static linked library code) across such a wide scope of programs would incur unacceptable performance implications.

    To address the needs of both future compiler-based (code generation altering) mitigations, which would strongly prefer to be as environment-agnostic as possible, as well as common framework/library-based mitigations, we introduced a facility called fast fail (sometimes referred to as fail fast) to Windows 8 and Visual Studio 2012.  Fast fail represents a uniform mechanism for requesting immediate process termination in the context of a potentially corrupted process that is enabled by a combination of support in various Microsoft runtime environments (boot, kernel, user, hypervisor, etc.) as well as a new compiler intrinsic, __fastfail.  Code using fast fail has the advantage of being inlineable, compact (from a code generation perspective), and binary portable across multiple runtime environments.

    Internally, fast fail is implemented by several architecture-specific mechanisms:

    Architecture    Instruction       Location of “Code” argument
    AMD64           int 0x29          rcx
    ARM             Opcode 0xDEFB*    r0
    x86             int 0x29          ecx

    * ARM defines a range of Thumb2 opcode space that is permanently undefined, and which will never be allocated for processor use.  These opcodes can be used for platform-specific purposes.

    A single, Microsoft-defined code argument (assigned symbolic constants prefixed with FAST_FAIL_<description> in winnt.h and wdm.h) is provided to the __fastfail intrinsic.  The code argument, intended for use in classifying failure reports, describes the type of failure condition and is incorporated into failure reports in an environment-specific fashion.

    A fast fail request is self-contained and typically requires just two instructions to execute.  The kernel, or equivalent, then takes the appropriate action once a fast fail request has been executed.  In user mode code, there are no memory dependencies involved (beyond the instruction pointer itself) when a fast fail event is raised, maximizing its reliability even in the case of severe memory corruption.

    User mode fast fail requests are surfaced as a second chance non-continuable exception with exception code 0xC0000409 and with at least one exception parameter (the first exception parameter being the fast fail code that was supplied as an argument to the __fastfail intrinsic).  This exception code, previously used exclusively to report /GS stack buffer overrun events, was selected as it is already known to the Windows Error Reporting (WER) and debugging infrastructure as an indication that the process is corrupt and minimal in-process actions should be taken in response to the failure.  Kernel mode fast fail requests are implemented with a dedicated bugcheck code, KERNEL_SECURITY_CHECK_FAILURE (0x139).  In both cases, no exception handlers are invoked (as the program is expected to be in a corrupted state).  The debugger (if present) is given an opportunity to examine the state of the program before it is terminated.

    Pre-Windows 8 operating systems that do not support the fast fail instruction natively will typically treat a fast fail request as an access violation, or UNEXPECTED_KERNEL_MODE_TRAP bugcheck.  In these cases, the program is still terminated, but not necessarily as quickly.

    The compact code-generation characteristics and support across multiple runtime environments without additional dependencies make fast fail ideal for use by mitigations that involve program-side code instrumentation, whether these be compiler-based or library/framework-based.  Since the failure reporting logic can be embedded directly in application code in an environment-agnostic fashion, at the specific point where the corruption or inconsistency was detected, there is minimal disturbance to the active program state at the time of failure detection.  The compiler can also implicitly treat a fast fail site as “no-return”, since the operating system does not allow the program to be resumed after a fast fail request (even in the face of exception handlers), enabling further optimizations to minimize the code generation impact of failure reporting.  We expect that future compiler-based mitigations will take advantage of fast fail to report failures inline and in-context (where possible).

    Safe unlinking retrospective

    Previously, we discussed the targeted addition of safe unlinking integrity checks to the executive pool allocator in the Windows 7 kernel.  Safe unlinking (and safe linking) are a set of general techniques for validating the integrity of a doubly-linked list when a modifying operation, such as a list entry unlink or link, occurs.  These techniques operate by verifying that the neighboring list links for a list entry being acted upon still point to the list entry being linked into or unlinked from the list.

    Safe unlinking operations have historically been an attractive defense to include in the book-keeping data structures of memory allocators as an added defense against pool overrun or heap overrun vulnerabilities.  Windows XP Service Pack 2 first introduced safe unlinking to the Windows heap allocator, and Windows 7 introduced safe unlinking to the executive pool allocator in the kernel.  To understand why this is a valuable defensive technique, it is helpful to examine how memory allocators are often implemented.

    It is common for a memory allocator to include a free list of available memory regions that may be utilized to satisfy an allocation request.  Frequently, the free list is implemented by embedding a doubly linked list entry inside of an available memory block that is logically located on the free list of the allocator, in addition to other metadata about the memory block (such as its size).  This scheme allows an allocator to quickly locate and return a suitable memory block to a caller in response to an allocation request.

    Now, when a memory block is returned to a caller to satisfy an allocation, it is unlinked from the free list.  This involves updating the neighboring list entries (located through the list entry embedded in the free block) to point to one another, instead of to the block that has just been allocated.  In the context of an overrun scenario, where an attacker has managed to overrun a buffer and overwrite the contents of a neighboring, freed memory block header, the attacker may have the opportunity to supply arbitrary values for the next and previous pointers, which will then be written through when the (overwritten) freed memory block is next allocated.

    This yields what is commonly called a “write-what-where” or “write anywhere” primitive that lets an attacker choose a specific value (what) and a specific address (where) to store said value.  This is a powerful primitive from an exploitation perspective, and affords an attacker a high degree of freedom [2].
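
    As a rough illustration of the primitive described above, the following C sketch performs an unchecked unlink on a list entry whose links have been replaced, as a simulated overrun might do.  The names (list_entry, unsafe_unlink, demo_unchecked_unlink) are hypothetical and are not taken from any Windows allocator; the "attacker-chosen" addresses are ordinary local variables so that the example is safe to run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a list entry embedded in a free memory block. */
struct list_entry {
    struct list_entry *flink;   /* next entry     */
    struct list_entry *blink;   /* previous entry */
};

/* Unchecked unlink: trusts whatever values flink and blink currently hold. */
void unsafe_unlink(struct list_entry *entry)
{
    assert(entry != NULL);
    struct list_entry *next = entry->flink;
    struct list_entry *prev = entry->blink;

    prev->flink = next;   /* stores the flink value through the blink pointer */
    next->blink = prev;   /* stores the blink value through the flink pointer */
}

/*
 * Simulates an overrun that replaced the links of a free block, then
 * unlinks it.  Returns 1 if the unlink wrote into both chosen locations
 * and left the real list untouched.
 */
int demo_unchecked_unlink(void)
{
    struct list_entry head;
    struct list_entry free_block;
    struct list_entry target_a = { NULL, NULL };   /* not part of the list */
    struct list_entry target_b = { NULL, NULL };   /* not part of the list */

    /* Legitimate state: head <-> free_block */
    head.flink = &free_block;
    head.blink = &free_block;
    free_block.flink = &head;
    free_block.blink = &head;

    /* Simulated overrun: both links now hold values chosen by the "attacker". */
    free_block.flink = &target_a;
    free_block.blink = &target_b;

    unsafe_unlink(&free_block);

    return target_b.flink == &target_a &&
           target_a.blink == &target_b &&
           head.flink == &free_block;
}
```

    In this sketch the two stores performed by the unlink land in target_a and target_b rather than in the list's real neighbors, which is the essence of the primitive: the overwritten links decide both the value stored and the location it is stored to.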

    In the context of memory allocators, safe unlinking helps mitigate this class of vulnerability by verifying that the list neighbors still point to the elements that the list entry embedded within the freed block says they should.  If the block’s list entry has been overwritten and an attacker has commandeered its list entries, this invariant will typically fail (provided that the logically previous and next list entries are not corrupted as well), enabling the corruption to be detected.
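
    The invariant described above can be sketched in a few lines of C.  This is a minimal illustration under assumed names (list_entry, checked_unlink), not the allocator's actual code; it returns false where a real implementation would report the corruption and stop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical doubly linked list entry. */
struct list_entry {
    struct list_entry *flink;   /* next entry     */
    struct list_entry *blink;   /* previous entry */
};

/*
 * Unlinks 'entry' only if both neighbors still point back to it.
 * Returns true on success.  Returns false, without writing anything,
 * when the links are inconsistent.
 */
bool checked_unlink(struct list_entry *entry)
{
    assert(entry != NULL);
    struct list_entry *next = entry->flink;
    struct list_entry *prev = entry->blink;

    if (next->blink != entry || prev->flink != entry)
        return false;           /* corruption detected before any store */

    prev->flink = next;
    next->blink = prev;
    return true;
}
```

    Because the comparison happens before either store, an entry whose links were redirected to unrelated memory is rejected without modifying that memory, unless the memory it points to was also prepared to point back at the entry.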

    Safe unlinking in Windows 8

    Safe unlinking is broadly applicable beyond simply the internal linked lists of memory allocators; many applications and kernel mode components utilize linked lists within their own data structures.  These data structures also stand to benefit from having safe unlinking (and safe linking) integrity checks inserted.  Beyond providing protection against heap-based overruns overwriting list pointers in application-specific data on the heap [1], linked list integrity checks in application-level code often provide a means to better protect against conditions where an application might erroneously delete an application-specific object containing a linked list entry twice (due to an application-specific object lifetime mismanagement issue), or might otherwise incorrectly use or synchronize access to a linked list.

    Windows provides a generalized library for manipulating doubly-linked lists, in the form of a set of inline functions provided in common Windows headers that are both exposed to third party driver code and heavily used internally.  This library is well-suited as a central location from which to instrument code throughout the Microsoft code base, as well as third party driver code by extension, with safe unlinking (and safe linking) list integrity checks.

    Starting with Windows 8, the “LIST_ENTRY” doubly linked list library is instrumented with list integrity checks that protect code using the library against list corruption.  All list operations that write through a list entry node’s list link pointer will first check that the neighboring list links still point back to the node in question, which enables many classes of issues to be caught before they cause further corruption (for example, a double-remove of a list entry is typically immediately caught at the second remove).  Since the library is designed as an operating-environment-agnostic, inline function library, the fast fail mechanism is used to report failures.
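
    To make the double-remove case concrete, here is a small C sketch of an instrumented list library.  It is modeled loosely on the style of an inline list library but is not the Windows header code; all names (list_init, list_insert_tail, list_remove, report_corruption) are hypothetical.  One deliberate difference is that report_corruption only counts failures and returns, so the example can be exercised in a test, whereas a fast fail never returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical doubly linked list entry. */
struct list_entry {
    struct list_entry *flink;   /* next entry     */
    struct list_entry *blink;   /* previous entry */
};

/* Stand-in for failure reporting; a real fast fail would not return. */
static unsigned integrity_failures;
static void report_corruption(void) { integrity_failures++; }

unsigned list_failure_count(void) { return integrity_failures; }

void list_init(struct list_entry *head)
{
    assert(head != NULL);
    head->flink = head;
    head->blink = head;
}

/* Safe linking: validate the current tail before writing through it. */
bool list_insert_tail(struct list_entry *head, struct list_entry *entry)
{
    struct list_entry *tail = head->blink;

    if (tail->flink != head) {
        report_corruption();
        return false;
    }
    entry->flink = head;
    entry->blink = tail;
    tail->flink = entry;
    head->blink = entry;
    return true;
}

/* Safe unlinking: both neighbors must still point back to 'entry'. */
bool list_remove(struct list_entry *entry)
{
    struct list_entry *next = entry->flink;
    struct list_entry *prev = entry->blink;

    if (next->blink != entry || prev->flink != entry) {
        report_corruption();
        return false;
    }
    prev->flink = next;
    next->blink = prev;
    return true;
}
```

    Removing the same entry twice is caught in this sketch because the first removal leaves the entry's own links stale while its former neighbors now point to each other, so the second call fails the comparison before any pointer is written.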

    Within Microsoft, our experience has been that the safe linking (and safe unlinking) instrumentation has been highly effective at identifying linked list misuse, with over 100 distinct bugs fixed in the Windows 8 development cycle on account of the list integrity checks.  Many Windows components leverage the same doubly linked list library, leading to widespread coverage throughout the Windows code base [1].

    We have also enabled third party code to take advantage of these list integrity checks; drivers that build with the Windows 8 WDK will get the integrity checks by default, no matter what OS is targeted at build time.  The integrity checks are backwards compatible with previous OS releases; however, previous OS releases will react to a list entry integrity check failure in a driver with a more generic bugcheck code such as UNEXPECTED_KERNEL_MODE_TRAP, rather than the dedicated KERNEL_SECURITY_CHECK_FAILURE bugcheck code introduced in Windows 8.

    With any broad form of code instrumentation, one nearly omnipresent concern is the performance impact of the instrumentation.  Our experience has been that the performance impact of safe unlinking (and safe linking) is minimal, even in workloads that involve a large number of list entry manipulation operations.  Since the list entry manipulation operations already inherently involve following pointers through to the neighboring list entries, simply adding an extra comparison (with a branch to a common fast fail reporting label) has proven to be quite inexpensive.

    Reference count hardening

    It is common for objects that have non-trivial lifetime management to utilize reference counts to manage responsibility for keeping a particular object alive, and for cleaning the object up once there are no active users of the object.  Given that object lifetime mismanagement is one of the most common situations where memory corruption vulnerabilities come into play, it is no particular surprise that reference counts are often center stage when it comes to many of these vulnerabilities.

    While there has been research into this area (for example, Mateusz “j00ru” Jurczyk’s November 2012 case study on reference count vulnerabilities [5]), generically mitigating all reference count mismanagement issues remains a difficult problem.  Reference count-related vulnerabilities can generally be broken down into several broad classes:

    • Under-referencing an object (such as forgetting to increase the reference count when taking out a long-lived pointer to an object, or decrementing the reference count of an object improperly).  These vulnerabilities are difficult to cheaply mitigate, as the information needed to ascertain whether a reference count should be decremented at a certain time, based on the lifetime model of a particular object, is often not readily available at the time when the reference count is decremented.  This class of vulnerability can lead to an object being deleted while another user of the object still holds what they believe to be a valid pointer to the object; the object could then be replaced with potentially attacker-controlled data if the attacker can allocate memory on the heap at the same location as the just-deleted object.
    • Over-referencing an object (such as forgetting to decrement a reference count in an error path).  This class of vulnerability is common in situations where a complex section of code has an early-exit path that does not clean up entirely.  Similar to under-referencing, this class of vulnerability can also eventually lead to an object being prematurely deleted should the attacker be able to force the reference count to “wrap around” to zero after repeatedly exercising the code path that obtains (but then forgets to release) a reference to a particular object.  Historically, this class of vulnerability has most often had an impact in the local kernel exploitation arena, where there is typically a rich set of objects exposed to untrusted user mode code, along with a variety of APIs to manipulate the state of said objects.
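
    The over-referencing case can be illustrated with a toy C example.  Everything here is hypothetical (the object type, use_object, and the demo function), and the reference count is deliberately only 8 bits wide so that the wrap happens after 255 leaked references instead of billions; no hardening is applied.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical object with a deliberately narrow, unchecked reference count. */
struct object {
    uint8_t refs;
    bool    destroyed;
};

static void obj_ref(struct object *o)
{
    o->refs++;                      /* silently wraps from 255 to 0 */
}

static void obj_unref(struct object *o)
{
    if (--o->refs == 0)
        o->destroyed = true;        /* last reference gone: object is torn down */
}

/* The error path returns early and forgets to release its reference. */
int use_object(struct object *o, bool fail)
{
    obj_ref(o);
    if (fail)
        return -1;                  /* BUG: missing obj_unref(o) */
    obj_unref(o);
    return 0;
}

/*
 * The owner holds one reference throughout.  Returns true if the object
 * ends up destroyed even though the owner never released it.
 */
bool demo_wrap_to_zero(void)
{
    struct object o = { 1, false };

    for (int i = 0; i < 255; i++)   /* each failing call leaks one reference */
        use_object(&o, true);

    /* The count has wrapped to 0; one ordinary call now destroys the object. */
    use_object(&o, false);
    return o.destroyed;
}
```

    After the leaked references wrap the count to zero, a perfectly ordinary reference and release pair drops the count to zero again and destroys the object while its owner still holds a pointer to it, which is the use after free condition the list item above describes.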

    Starting with Windows 8, the kernel object manager has started enforcing protection against reference count wrap in its internal reference counts.  If a reference count increment operation detects that the reference count has wrapped, then an immediate REFERENCE_BY_POINTER bugcheck is raised, preventing the wrapped reference count condition from being exploited by causing a subsequent use-after-free situation.  This enables the over-referencing class of vulnerabilities to be strongly mitigated in a robust fashion.  We expect that with this hardening in place, it will not be practical to exploit an over-reference condition of kernel object manager objects for code execution, provided that all of the add-reference paths are protected by the hardening instrumentation.

    Furthermore, the object manager also similarly protects against transitions from <= 0 references to a positive number of references, which may make attempts to exploit other classes of reference count vulnerabilities less reliable if an attacker cannot easily prevent other reference count manipulation “traffic” from occurring while attempting to leverage the use after free condition.  However, it should still be noted that this is not a complete mitigation for under-referencing issues.
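
    A hardened increment along these lines might look like the following C11 sketch.  It is an illustration under stated assumptions, not the object manager's code: the names (refcount, refcount_acquire, refcount_release) are hypothetical, and a failed check returns false and leaves the count unchanged, whereas the kernel raises a bugcheck.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pointer-sized reference count. */
struct refcount {
    atomic_intptr_t count;
};

void refcount_init(struct refcount *r)
{
    atomic_init(&r->count, 1);      /* the creator holds the first reference */
}

/*
 * Takes a reference unless doing so would either resurrect an object whose
 * count is already <= 0 or wrap the counter.  Returns false in both cases.
 */
bool refcount_acquire(struct refcount *r)
{
    intptr_t old = atomic_load(&r->count);

    do {
        if (old <= 0 || old == INTPTR_MAX)
            return false;           /* inconsistency: refuse the increment */
    } while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));

    return true;
}

/* Drops a reference.  Returns true if this was the last one. */
bool refcount_release(struct refcount *r)
{
    return atomic_fetch_sub(&r->count, 1) == 1;
}
```

    The compare-and-exchange loop is one design choice among several; it keeps the check and the increment atomic with respect to other threads, at the cost of a retry when the count changes concurrently.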

    In Windows 8.1, we have stepped up reference count hardening in the kernel by adding this level of hardening to certain other portions of the kernel that maintain their own, “private” reference counts for objects not managed by the object manager.  Where possible, code has additionally been converted to use a common set of reference count management logic that implements the same level of best practices that the object manager’s internal reference counts do, including usage of pointer-sized reference counts (which further helps protect against reference count wrap issues, particularly on 64-bit platforms or conditions where an attacker must allocate memory for each leaked reference).  Similar to the list entry integrity checks introduced in Windows 8, where reference count management is provided as an inline function library, fast fail is used as a convenient and low-overhead mechanism to quickly abort the program when a reference count inconsistency is detected.

    A concrete example of a vulnerability that would have been strongly mitigated by the broader adoption of reference count hardening in Windows 8.1 is CVE-2013-1280 (MS13-017), which stemmed from an early-exit code path (in response to an error condition) in the I/O manager, within which the code did not properly release a reference to an internal I/O manager object that had been obtained earlier in the vulnerable function.  If an attacker were able to exercise the code path in question repeatedly, then they might have been able to cause the reference count to wrap around and thus later trigger a use after free condition.  With the reference count hardening in place, an attempt to exploit this vulnerability would have resulted in an immediate bugcheck instead of a potential use after free situation arising.

    Conclusion

    The reference count and list entry hardening changes introduced during Windows 8 and expanded on during Windows 8.1 are designed to drive up the cost of exploitation of certain classes of object lifetime management vulnerabilities.  Situations such as over-referencing or leaked references can be strongly mitigated when protected by the reference count hardening deployed in Windows 8 and Windows 8.1, making it extremely difficult to practically exploit them for code execution.  Pervasively instrumenting list entry operations throughout the Microsoft code base (and increasingly through third party drivers that use the Windows 8, or above, WDK) makes exploiting certain lifetime mismanagement issues less reliable, and improves reliability by catching corruption closer to the cause (and in some cases before corruption can impact other parts of the system).

    That being said, there remain opportunities to increase adoption of these classes of mitigations throughout Microsoft’s code base (and third parties, by extension), as well as potential opportunities for compiler-based or framework-based broad instrumentation to detect other classes of issues.  We expect to continue to research and invest further in compiler-based and framework-based mitigations for object lifetime management (and other vulnerability classes) in the future.

    - Ken Johnson

    References

    [1] Ben Hawkes.  Windows 8 and Safe Unlinking in NTDLL.  July, 2012.

    [2] Kostya Kortchinsky.  Real World Kernel Pool Allocation.  SyScan.  July, 2008.

    [3] Chris Valasek.  Modern Heap Exploitation using the Low Fragmentation Heap.  SyScan Taipei.  November, 2011.

    [4] Adrian Marinescu.  Windows Vista Heap Management Enhancements.  Black Hat USA.  August, 2006.

    [5] Mateusz “j00ru” Jurczyk.  Windows Kernel Reference Count Vulnerabilities – Case Study.  November 2012.