This entry has nothing to do with malware. Just so you know.

Some people know that I like the demo scene. I've been following it for more than 20 years now, but it's even older than that. I like the size-optimisation competitions best, and I've even participated in a few. Most recently: the smallest downloader on 32-bit Windows XP, at 233 bytes (255 bytes on Vista and later), and printing the EICAR test string, at 56 bytes. Of particular interest to me are the demos in 512 bytes or less. They are so small that, in order to have cool effects, a structured file format is unusable, so only a .com file works here. As a result, they run only in DOS or a 32-bit console window (or via an emulator). No 64-bit systems here. Even now, in 2011, there was a 128-byte competition, and the year is not over yet.

How do you make a file that small? Mostly it's just amazing code, but to save a few bytes it's also quite common to rely on the initial register values instead of initialising them manually.
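To get a feel for what is at stake, here is a rough sketch in Python that adds up the size of some initialisation code that could be left out. The opcode bytes are the standard 16-bit encodings, but the selection of instructions is my own illustration and is not taken from any particular demo.

```python
# Hand-assembled 16-bit encodings for instructions that a tiny .com file
# could leave out if it trusts the values that DOS leaves in the registers.
# The choice of instructions is illustrative, not taken from a real demo.
INIT_CODE = {
    "xor ax, ax":    bytes([0x31, 0xC0]),        # ax = 0000
    "xor bx, bx":    bytes([0x31, 0xDB]),        # bx = 0000
    "mov cx, 00ffh": bytes([0xB9, 0xFF, 0x00]),  # cx = 00ff
    "mov si, 0100h": bytes([0xBE, 0x00, 0x01]),  # si = 0100
}

def bytes_saved(instructions):
    """Total size of the initialisation code that can be dropped."""
    return sum(len(INIT_CODE[name]) for name in instructions)

saved = bytes_saved(INIT_CODE)
print(f"{saved} bytes saved, {saved / 128:.1%} of a 128-byte entry")
```

Ten bytes does not sound like much, but in a 128-byte entry it is room for several more instructions.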

The question, though, is which registers hold what values... and why? This is something that I have never seen written down. I suspect that it's just something that "everybody knows".

Let's take a look at a few versions of DOS, to see what I mean:

       DOS version
reg    3.3    4.01   5.0    6.0    7.0
ax     0000   0000   0000   0000   0000
cx     00ff   00ff   00ff   00ff   00ff
dx     cs     cs     cs     cs     cs
bx     0000   0000   0000   0000   0000
bp     0882   091c   091c   091c   091c
si     ip     ip     ip     ip     0100
di     sp     sp     sp     sp     sp

Note that these values are for real DOS. For certain versions of the Windows console, the bp register value is 091e.

So that's the which and the what. As for the why...

bp:
0019:000041DA BC 20 09 MOV SP, 0920
...
0019:000041F9 36 FF 16 EA 05 CALL NEAR WORD PTR SS:[05EA]

The near call pushes a two-byte return address, so the sp register value is now 091e.

0019:00009B6E 55 PUSH BP

Now the sp register value is 091c.

0019:00009B6F 8B EC MOV BP, SP

And now so is the bp register value.
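The stack arithmetic behind that bp value can be replayed in a few lines of Python. This is only a model of the three instructions shown above, with the helper name `push` being my own. It says nothing about where the 0882 in DOS 3.3 comes from.

```python
MASK = 0xFFFF  # registers are 16 bits wide

def push(sp):
    """A 16-bit push (or the return address of a near call) lowers sp by 2."""
    return (sp - 2) & MASK

sp = 0x0920          # MOV SP, 0920
sp = push(sp)        # CALL NEAR pushes the return address
after_call = sp
sp = push(sp)        # PUSH BP
bp = sp              # MOV BP, SP

print(f"sp after call: {after_call:04x}, bp: {bp:04x}")
# prints "sp after call: 091e, bp: 091c"
```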

dx:
0019:00009FA6 8B 56 EE MOV DX, WORD PTR SS:[BP - 12]

This value is the result of a memory allocation, and depends on the size and structure of the image being loaded.

cx:
0019:0000A02F F3 A4 REPE MOVS BYTE PTR ES:[DI], BYTE PTR DS:[SI]

Now the cx register value is 0000.

0019:0000A031 FE C9 DEC CL

And now it's 00ff.
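The same two steps as a small Python model. The starting count of 0040 is arbitrary and only there for illustration, since any count ends at zero once the copy is done.

```python
def rep_movsb(cx):
    """REP MOVSB copies cx bytes and counts cx down to zero (copy omitted)."""
    while cx:
        cx -= 1
    return cx

def dec_cl(cx):
    """DEC CL: decrement only the low byte of cx, wrapping at 8 bits."""
    cl = ((cx & 0x00FF) - 1) & 0xFF
    return (cx & 0xFF00) | cl

cx = rep_movsb(0x0040)   # 0040 is an arbitrary starting count
after_copy = cx
cx = dec_cl(cx)
print(f"cx after copy: {after_copy:04x}, cx after dec cl: {cx:04x}")
# prints "cx after copy: 0000, cx after dec cl: 00ff"
```

Because only the low byte is decremented, the high byte stays at 00. A DEC CX at the same point would have left ffff instead.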

bx:
0019:0000A035 32 FF XOR BH, BH
...
0019:0000A040 32 DB XOR BL, BL

Now the bx register value is 0000.

si:
0019:0000A0AC 36 C5 36 C4 0F LDS SI, DWORD PTR SS:[0FC4]

Now the si register value is assigned, and depends on the structure of the image being loaded (0100 for .com files).

di:
0019:0000A0B1 36 C4 3E C0 0F LES DI, DWORD PTR SS:[0FC0]

Now the di register value is assigned, and depends on the structure of the image being loaded (fffe for .com files).

ss:
0019:0000A0B6 8C C0 MOV AX, ES
...
0019:0000A0E1 8E D0 MOV SS, AX

Here we see that, contrary to what is commonly assumed, the dx register is not the source of the ss register value.

sp:
0019:0000A0E3 8B E7 MOV SP, DI

Now the sp register is assigned, and we see that the di register is its source.

0019:0000A0E6 1E PUSH DS
0019:0000A0E7 56 PUSH SI

The ds and si registers, which stand in for the cs and ip registers, are pushed onto the stack, and we see that the dx register is not the source of the cs register value, either.

ds, es:
0019:0000A0E8 8E C2 MOV ES, DX
0019:0000A0EA 8E DA MOV DS, DX

ax:
0019:0000A0EC 8B C3 MOV AX, BX

Now the ax register value is 0000.

0019:0000A0EE CB RETFW
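Putting the last few steps together, here is a sketch in Python of the sequence from the LDS SI instruction to the far return. The function name, its parameters and the load segment of 1234 are made up for the example. The two far pointers that DOS reads from its own data area are simply passed in as arguments.

```python
MASK = 0xFFFF  # registers are 16 bits wide

def hand_off(entry_cs, entry_ip, stack_ss, stack_sp, dx, bx):
    """Model of the instructions from LDS SI to the far return."""
    stack = {}                              # stack memory, keyed by offset
    ds, si = entry_cs, entry_ip             # LDS SI, SS:[0FC4]
    es, di = stack_ss, stack_sp             # LES DI, SS:[0FC0]
    ax = es                                 # MOV AX, ES
    ss = ax                                 # MOV SS, AX
    sp = di                                 # MOV SP, DI
    sp = (sp - 2) & MASK; stack[sp] = ds    # PUSH DS
    sp = (sp - 2) & MASK; stack[sp] = si    # PUSH SI
    es = dx                                 # MOV ES, DX
    ds = dx                                 # MOV DS, DX
    ax = bx                                 # MOV AX, BX
    ip = stack[sp]; sp = (sp + 2) & MASK    # the far return pops ip first
    cs = stack[sp]; sp = (sp + 2) & MASK    # and then cs
    return dict(ax=ax, bx=bx, dx=dx, si=si, di=di, sp=sp,
                ip=ip, cs=cs, ds=ds, es=es, ss=ss)

# A .com file loaded at the hypothetical segment 1234.
seg = 0x1234
regs = hand_off(entry_cs=seg, entry_ip=0x0100,
                stack_ss=seg, stack_sp=0xFFFE, dx=seg, bx=0x0000)
print({name: f"{value:04x}" for name, value in regs.items()})
```

In this model, si ends up equal to ip and di equal to sp, which matches the table at the top for the versions before 7.0.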

The far return pops the pushed values into the ip and cs registers, the file runs, and the mystery is solved.

- Peter Ferrie