Maintenance (updates, bugfixes) for Windows has typically been through hotfix packages, but these are not as “hot” as one might expect – very often we see prompts to restart the computer after a package is installed.
The reason is fairly simple: the update replaces a file on disk but has no control over the processes that may have the old version loaded, or over their open handles to those files. The installer therefore copies the new version in and sets a registry flag for a post-reboot operation that cleans up the old files, and a restart is the easiest way to ensure everything is in sync.
To allow for more dynamic updates that don’t require an immediate restart, hot patching was introduced. This required a bit of reworking of how modules are laid out in memory once they are compiled.
Even though entire modules are present in hotfix packages, often only 1 or 2 functions inside them have changed. Each function has an entry point: an address that we jump to when we call it.
The first instruction in any hot-patchable function is a 2-byte instruction: MOV EDI,EDI. This says to copy the contents of register EDI into register EDI, i.e. do nothing.
Immediately before the entry point are 5 single-byte filler instructions (such as NOP or, in the example below, INT 3). Because they sit between functions they are never executed in normal operation, so they serve purely as padding in the module.
Here is an example output of debugging CALC.EXE on Windows Server 2008 x86 and unassembling around the function SetRadix:
    0:000> u calc!SetRadix-8
    calc!SwitchModes+0x133
    00476c76 c20c00          ret     0Ch
    00476c79 cc              int     3
    00476c7a cc              int     3
    00476c7b cc              int     3
    00476c7c cc              int     3
    00476c7d cc              int     3
    calc!SetRadix:
    00476c7e 8bff            mov     edi,edi
    00476c80 55              push    ebp
You can see the previous function was SwitchModes. Its exit point is at address 00476c76, a 3-byte instruction that returns to the caller. Addresses 00476c79 through 00476c7d hold the 5 bytes of padding between the functions. At 00476c7e you can see the entry point for the function SetRadix, and the debugger has conveniently used the symbols to reflect this.
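The layout just described can be checked mechanically. The following is a minimal Python sketch that works on the bytes copied from the listing. The function name `is_hot_patchable` is hypothetical, and accepting either `cc` (INT 3) or `90` (NOP) as padding is an assumption made for illustration, not a statement of what Windows itself checks.

```python
# Bytes copied from the debugger listing above, starting at 00476c76.
BASE = 0x00476C76
CODE = bytes.fromhex("c20c00" "cccccccccc" "8bff" "55")

MOV_EDI_EDI = bytes.fromhex("8bff")  # the 2-byte "do nothing" instruction
PADDING_BYTES = {0xCC, 0x90}         # INT 3 or NOP (assumed acceptable fillers)


def is_hot_patchable(code: bytes, base: int, entry: int) -> bool:
    """True if `entry` starts with MOV EDI,EDI and is preceded by 5 padding bytes."""
    off = entry - base
    if off < 5 or off + 2 > len(code):
        return False  # not enough bytes around the entry point to decide
    padding = code[off - 5:off]
    starts_with_mov = code[off:off + 2] == MOV_EDI_EDI
    return starts_with_mov and all(b in PADDING_BYTES for b in padding)


print(is_hot_patchable(CODE, BASE, 0x00476C7E))  # True
```

Passing the address of SetRadix (00476c7e) returns True because the 5 bytes before it are all `cc` and the entry point begins with `8b ff`.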
So why the 2-byte instruction that “does nothing” after 5 1-byte instructions that “do nothing”?
The reason is the way the CPU advances its instruction pointer: once it has executed the current instruction, it increments the pointer by the length of that instruction. After executing MOV EDI,EDI the pointer has moved by 2 in a single step, whereas a pair of 1-byte NOPs would move it by 1 and then by 1 again, leaving a moment when it points at the second byte.
Okay, great, so why is the “do nothing” instruction there at all? To allow us to dynamically load a modified version of the function somewhere else in memory, then hook the original function.
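The hot patch mechanism allows us to copy a fixed version of the function somewhere in the virtual address space of the module being patched. The problem is that we don’t know where this will be, and it could be further away than a 2-byte jump can reach (its signed 8-bit offset covers only -128 to +127 bytes), so we need a 5-byte jump with a 32-bit offset. This article calls the 2-byte form a near jump and the 5-byte form a far jump; in Intel's terminology they are a short jump (JMP rel8) and a near jump (JMP rel32) respectively.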
After the fixed version of the function is in memory, we replace the 5 padding bytes with a far jump to the location of our fixed function. This is safe because those bytes are never executed in normal operation, so the instruction pointer can never be looking at code we are modifying.
Then we replace the MOV EDI,EDI instruction with a near jump to the start of the padding, 5 bytes before the entry point. From now on, future function calls will be trampolined to our fixed version seamlessly. This is safe even if a context switch occurred immediately after a thread made a call to the original function. Either the instruction pointer was saved pointing to the original entry point, in which case the new, fixed version of the function is called when the thread resumes, or the thread got the chance to execute the dummy instruction and advance by 2, in which case it resumes in the original version of the function.
If we used 2 NOPs (1-byte “no operation” instructions) instead of MOV EDI,EDI, the instruction pointer could be incremented by 1 before we replace the 2 bytes. The saved pointer would then be invalid when the thread resumes, as it would be pointing half-way through an instruction.
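To make the argument concrete, here is a small Python sketch that lists where a suspended thread's instruction pointer could be resting for each choice of filler. It is a toy model of the reasoning only, not of a real scheduler, and the helper names `resumable_offsets` and `unsafe_offsets` are hypothetical.

```python
def resumable_offsets(instruction_lengths):
    """Offsets (relative to the entry point) where a saved instruction
    pointer can sit, given the lengths of the instructions placed there."""
    offsets, pos = [], 0
    for length in instruction_lengths:
        offsets.append(pos)  # the pointer can rest at the start of each instruction
        pos += length
    offsets.append(pos)      # ...or just past the last one
    return offsets


def unsafe_offsets(instruction_lengths, patch_size=2):
    """Saved pointers that would land in the middle of the new 2-byte jump."""
    return [o for o in resumable_offsets(instruction_lengths) if 0 < o < patch_size]


print(unsafe_offsets([2]))     # MOV EDI,EDI -> []
print(unsafe_offsets([1, 1]))  # NOP, NOP    -> [1]
```

With the single 2-byte instruction the pointer is only ever at offset 0 or 2, both valid after the patch. With two NOPs it can also be at offset 1, which falls inside the new jump.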
So in the above example from CALC.EXE, pretending we loaded a fixed version of SetRadix at address 12345678, we would then replace the 5 bytes in address range 00476c79-00476c7d with a jump to the explicit address, so it would look like this:
    00476c79 e9fae9ec11      jmp     12345678
e9 is the opcode for what this article calls a FAR jump (strictly JMP rel32, a near jump in Intel's terminology), followed by 32 bits giving the signed number of bytes to add to the instruction pointer after completing the current (5-byte) instruction. fae9ec11 comes from the calculation: destination address - current address - size of current instruction = 12345678 - 00476c79 - 5 = 11ece9fa
(x86 stores multi-byte values in little-endian order, least significant byte first, which is why 11ece9fa appears as fa e9 ec 11 when the instruction is built.)
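The same arithmetic can be reproduced in a few lines of Python. This is a sketch for checking the numbers, not part of any real patching tool, and the function name `encode_jmp_rel32` is made up for the example.

```python
import struct


def encode_jmp_rel32(instr_addr: int, target: int) -> bytes:
    """Encode a jump as opcode e9 plus a signed 32-bit little-endian displacement."""
    # The displacement is measured from the end of the 5-byte instruction.
    rel = target - instr_addr - 5
    return b"\xe9" + struct.pack("<i", rel)


patch = encode_jmp_rel32(0x00476C79, 0x12345678)
print(patch.hex())  # e9fae9ec11
```

`struct.pack("<i", ...)` performs the little-endian byte reversal, and raises an error if the displacement does not fit in 32 signed bits.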
Once the 5 bytes have been patched to do the far jump, the 2-byte instruction at 00476c7e gets replaced with a NEAR jump to our FAR jump:
    00476c7e ebf9            jmp     calc!SwitchModes+0x136 (00476c79)
eb is the opcode for what this article calls a NEAR jump (strictly JMP rel8, a short jump in Intel's terminology), followed by 8 bits giving the signed number of bytes to add to the instruction pointer after completing the current (2-byte) instruction. f9 comes from the calculation: destination address - current address - size of current instruction = 00476c79 - 00476c7e - 2 = -7, which is f9 as a signed 8-bit (two's complement) value.
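Here is the 2-byte case as a Python sketch, including the range check that explains why this form cannot reach the fixed function directly. The name `encode_jmp_rel8` is hypothetical.

```python
import struct


def encode_jmp_rel8(instr_addr: int, target: int) -> bytes:
    """Encode a jump as opcode eb plus a signed 8-bit displacement."""
    # The displacement is measured from the end of the 2-byte instruction.
    rel = target - instr_addr - 2
    if not -128 <= rel <= 127:
        raise ValueError("target is out of range for a 2-byte jump")
    return b"\xeb" + struct.pack("<b", rel)


print(encode_jmp_rel8(0x00476C7E, 0x00476C79).hex())  # ebf9
```

Asking for a jump from 00476c7e straight to 12345678 raises `ValueError`, which is the reason the 5-byte jump in the padding is needed as an intermediate hop.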
Unassembling as we did before, we can see how the code has been altered to insert the trampoline:
    0:000> u calc!SetRadix-8
    calc!SwitchModes+0x133
    00476c76 c20c00          ret     0Ch
    00476c79 e9fae9ec11      jmp     12345678
    calc!SetRadix:
    00476c7e ebf9            jmp     calc!SwitchModes+0x136 (00476c79)
    00476c80 55              push    ebp
(The debugger shows the jump target as an offset from the previous function because the destination lies before the entry point of the current one. This is just a display quirk, as the debugger is trying to make sense of something we deliberately hacked.)
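The order of the two writes matters, so here is a Python sketch that applies them to a copy of the original bytes and compares the result with the listing. It only models the byte changes: a real patcher would also have to make the memory writable and perform the 2-byte write as a single atomic store, neither of which is shown. The names `write`, `PAD`, `ENTRY` and `FIXED` are illustrative.

```python
import struct

BASE = 0x00476C76
# Original bytes from the first listing: ret 0Ch, 5 x int 3, mov edi,edi, push ebp.
mem = bytearray.fromhex("c20c00" "cccccccccc" "8bff" "55")

PAD, ENTRY, FIXED = 0x00476C79, 0x00476C7E, 0x12345678


def write(addr: int, data: bytes) -> None:
    """Overwrite bytes in our in-memory copy of the code."""
    off = addr - BASE
    mem[off:off + len(data)] = data


# Step 1: fill the never-executed padding with the 5-byte jump.
write(PAD, b"\xe9" + struct.pack("<i", FIXED - PAD - 5))
after_step1 = bytes(mem)  # the entry point is still MOV EDI,EDI here

# Step 2: only now redirect the entry point with the 2-byte jump.
write(ENTRY, b"\xeb" + struct.pack("<b", PAD - ENTRY - 2))

print(after_step1.hex())  # c20c00e9fae9ec118bff55
print(mem.hex())          # c20c00e9fae9ec11ebf955
```

After step 1 the function still behaves exactly as before, because nothing jumps into the padding yet. The redirect only takes effect with the second write.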
So now someone makes a call to calc!SetRadix and the following occurs:
- the instruction pointer is set to the entry point for the function: 00476c7e
- the instruction at this address is executed (near jump to our trampoline), changing the instruction pointer to 00476c79
- the instruction at this address is executed (far jump to our modified code), changing the instruction pointer to 12345678
- our modified code is now executed at the new location, which needs to ensure it handles the same input and output as the original function
(Opcodes are well-defined, which is how the processor knows eb should be followed by 1 byte while e9 is followed by 4 bytes.)
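The call sequence above can be traced with a tiny decoder that understands only the two jump opcodes used here. This Python sketch is a toy, nowhere near a full x86 decoder, and `follow_jumps` is a made-up name. The bytes are taken from the patched listing.

```python
import struct


def follow_jumps(memory: dict, ip: int) -> int:
    """Follow eb / e9 jumps from `ip` until some other byte (or unmapped memory) is reached."""
    while True:
        opcode = memory.get(ip)
        if opcode == 0xEB:
            # eb is followed by 1 displacement byte (signed).
            (rel,) = struct.unpack("<b", bytes([memory[ip + 1]]))
            ip = ip + 2 + rel
        elif opcode == 0xE9:
            # e9 is followed by 4 displacement bytes (signed, little-endian).
            raw = bytes(memory[ip + 1 + i] for i in range(4))
            (rel,) = struct.unpack("<i", raw)
            ip = ip + 5 + rel
        else:
            return ip


BASE = 0x00476C76
patched = bytes.fromhex("c20c00e9fae9ec11ebf955")  # bytes from the patched listing
memory = {BASE + i: b for i, b in enumerate(patched)}

print(hex(follow_jumps(memory, 0x00476C7E)))  # 0x12345678
```

Starting at the entry point of SetRadix, the decoder takes the 2-byte jump back to 00476c79, then the 5-byte jump, and stops at 12345678, where the fixed function would live.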
This is the methodology behind the hot patching technique – a hotfix installer package would need to contain the details of what to change in memory in order to achieve this, and be instructed to perform a hot patch in addition to replacing the module on disk with the updated version.