A brief introduction to modern x86 assembly language

Several people have personally requested that I give a brief
introduction to modern x86 (sometimes called IA32) assembly language.
For simplicity’s sake, I’ll stick with the 32-bit version with a flat
memory model. AMD64 (sometimes called x64) just isn’t as popular as
x86 yet, so this seems safe.

For some reason, there’s a mythos around assembly language. People associate it with bearded gurus and assume only ninjas can program in it, when in principle assembly language
is one of the simplest programming languages there is. Any complexity
stems from a particular architecture’s oddities, and even though x86 is one of the
oddest of them all, I’ll show you that it can be easy to read and write.

First, I’ll describe the basic architecture. When programming in assembly,
there are three main concepts:

Instructions are the individual commands that tell the
computer to perform an operation. These include instructions for
adding, multiplying, comparing, copying, performing bit-wise operations,
accessing memory, and communicating with external devices. The
computer executes instructions one after another, in order, unless an
instruction such as a jump tells it to continue somewhere else.

Registers are where temporary values go. Only a small, fixed set of
registers is available, so values don’t stay in them for long; the
registers are soon needed for other purposes.

Memory is where longer-lived data goes. It’s a
giant, flat array of bytes (8-bit quantities). It’s much slower to
access than registers, but there’s a lot of it.

Before I get into some examples, let me describe the registers
available on x86. There are only 8 general-purpose registers, each of
which is 32 bits wide. They are:

  • EAX
  • EBX
  • ECX
  • EDX
  • ESI
  • EDI
  • EBP – the base (frame) pointer, used when accessing local variables or function arguments
  • ESP – the stack pointer, used when calling functions
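To make “small, fixed set” concrete, here is a toy C model of the register file. It is only an illustration (the struct name Registers and the helper register_file_bytes are made up for this sketch), but it shows that all eight general-purpose registers together hold just 32 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of the eight x86 general-purpose registers.
   Each one is a 32-bit unsigned integer, matching the register width. */
typedef struct {
    uint32_t eax, ebx, ecx, edx;
    uint32_t esi, edi;
    uint32_t ebp; /* base pointer: local variables and function arguments */
    uint32_t esp; /* stack pointer: used when calling functions */
} Registers;

/* Total general-purpose register storage, in bytes (8 registers x 4 bytes). */
size_t register_file_bytes(void) {
    return sizeof(Registers);
}
```

Compare those 32 bytes with the megabytes or gigabytes of memory a program typically has, and it’s clear why values can’t live in registers for long.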

On x86, most instructions have two operands, a destination and a
source. For example, let’s add two and three:

mov eax, 2   ; eax = 2
mov ebx, 3   ; ebx = 3
add eax, ebx ; eax = 2 + 3 = 5

add eax, ebx adds the values in registers eax and ebx and stores
the result back in eax. (BTW, this is one of the oddities of x86.
Many other modern architectures keep the destination separate from
both sources, so the same kind of operation would look like add eax, ebx, ecx,
meaning eax = ebx + ecx. On x86, the first operand is both read and written by the same instruction.)
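One way to see the difference is to write both styles as C functions. This is a sketch: add_two_operand and add_three_operand are hypothetical names, and uint32_t stands in for a 32-bit register. Like a real 32-bit add, unsigned arithmetic on uint32_t wraps around modulo 2^32.

```c
#include <stdint.h>

/* x86 style: two operands. The destination is also the first source,
   so its old value is consumed and overwritten.
   add eax, ebx  =>  eax = eax + ebx */
void add_two_operand(uint32_t *dest, uint32_t src) {
    *dest = *dest + src;
}

/* Three-operand style used by many other architectures.
   The destination is written but never read.
   add eax, ebx, ecx  =>  eax = ebx + ecx */
void add_three_operand(uint32_t *dest, uint32_t src1, uint32_t src2) {
    *dest = src1 + src2;
}
```

With the two-operand form, keeping the original value of eax means copying it to another register first; the three-operand form doesn’t need that extra step.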

mov is the data movement instruction. It copies values
from one register to another, from a constant to a register,
from memory to a register, or from a register to memory. It can’t
copy directly from one memory location to another; that has to go
through a register.
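Here is a minimal C sketch of those mov forms, assuming a small simulated memory. The array memory and the helpers load32, store32, and mov_demo are hypothetical, not part of any real API; each statement in mov_demo is annotated with the instruction it mirrors.

```c
#include <stdint.h>
#include <string.h>

/* Simulated memory: a flat array of bytes indexed by address.
   64 bytes is an arbitrary size chosen for this sketch. */
uint8_t memory[64];

/* mov reg, [addr]: load the 4 bytes starting at addr into a 32-bit value.
   memcpy uses the host's byte order, which is fine here because every
   value read back was written by store32. */
uint32_t load32(uint32_t addr) {
    uint32_t value;
    memcpy(&value, &memory[addr], sizeof value);
    return value;
}

/* mov [addr], reg: store a 32-bit value into the 4 bytes starting at addr. */
void store32(uint32_t addr, uint32_t value) {
    memcpy(&memory[addr], &value, sizeof value);
}

/* Each statement mirrors one form of mov. There is no memory-to-memory mov,
   so copying from [esi] to [edi] goes through eax. */
uint32_t mov_demo(void) {
    uint32_t eax, ebx;
    uint32_t esi = 0, edi = 8;  /* two addresses in simulated memory */

    ebx = 7;             /* mov ebx, 7      constant -> register */
    eax = ebx;           /* mov eax, ebx    register -> register */
    store32(esi, eax);   /* mov [esi], eax  register -> memory   */
    eax = load32(esi);   /* mov eax, [esi]  memory   -> register */
    store32(edi, eax);   /* mov [edi], eax  register -> memory   */
    return load32(edi);  /* the value 7, copied from address 0 to address 8 */
}
```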

Speaking of memory, let’s say we want to add 2 and 3 and store the
result at address 32. Since eax is 32 bits (4 bytes) wide, the stored
result actually occupies addresses 32, 33, 34, and 35. Remember, memory is
indexed in bytes.

mov eax, 2
mov ebx, 3
add eax, ebx
mov edi, 32
mov [edi], eax ; copies 5 to address 32 in memory
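To see exactly which bytes that store touches, here is a C sketch that models memory as a byte array. x86 is little-endian: the least significant byte of a value goes at the lowest address, so the 5 lands in byte 32 and bytes 33 to 35 hold zeros. The array mem and the functions store32_le and load32_le are illustrative names, and the byte order is written out explicitly so the result doesn’t depend on the machine running the sketch.

```c
#include <stdint.h>

/* Simulated byte-addressed memory; 64 bytes is an arbitrary size. */
uint8_t mem[64];

/* Store a 32-bit value the way x86 does: little-endian, so the least
   significant byte goes at the lowest address. */
void store32_le(uint32_t addr, uint32_t value) {
    mem[addr]     = (uint8_t)(value);
    mem[addr + 1] = (uint8_t)(value >> 8);
    mem[addr + 2] = (uint8_t)(value >> 16);
    mem[addr + 3] = (uint8_t)(value >> 24);
}

/* Reassemble the 4 bytes starting at addr into a 32-bit value. */
uint32_t load32_le(uint32_t addr) {
    return (uint32_t)mem[addr]
         | (uint32_t)mem[addr + 1] << 8
         | (uint32_t)mem[addr + 2] << 16
         | (uint32_t)mem[addr + 3] << 24;
}
```

Replaying the example: set eax to 2, add 3, set edi to 32, then call store32_le(edi, eax). Afterwards mem[32] through mem[35] contain 05 00 00 00, and load32_le(32) gives back 5.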

What about loading data from memory? (Reads from memory are called
loads. Writes are called stores.) Let’s write a program that copies
1000 4-byte quantities (4000 bytes) from address 10000 to address
20000.

mov esi, 10000 ; by convention, esi is often used as the 'source' pointer
mov edi, 20000 ; similarly, edi often means 'destination' pointer
mov ecx, 1000 ; let's copy 1000 32-bit items

begin_loop:
mov eax, [esi] ; load from source
mov [edi], eax ; store to destination
add esi, 4
add edi, 4

sub ecx, 1 ; ecx -= 1
cmp ecx, 0 ; is ecx 0?

; if ecx does not equal 0, jump to the beginning of the loop
jne begin_loop
; otherwise, we're done
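The loop can also be traced in C at the register level. This sketch assumes a simulated byte array ram in place of real memory (ram and copy_loop are hypothetical names); goto stands in for jne, and memcpy stands in for the 4-byte load and store.

```c
#include <stdint.h>
#include <string.h>

/* Simulated byte-addressed memory. The example copies from address 10000
   to address 20000, so 24000 bytes covers both regions. */
uint8_t ram[24000];

/* Register-level translation of the assembly copy loop. */
void copy_loop(void) {
    uint32_t esi = 10000;  /* mov esi, 10000 : source pointer */
    uint32_t edi = 20000;  /* mov edi, 20000 : destination pointer */
    uint32_t ecx = 1000;   /* mov ecx, 1000  : number of 32-bit items */
    uint32_t eax;

begin_loop:
    memcpy(&eax, &ram[esi], 4);  /* mov eax, [esi] : load from source */
    memcpy(&ram[edi], &eax, 4);  /* mov [edi], eax : store to destination */
    esi += 4;                    /* add esi, 4 */
    edi += 4;                    /* add edi, 4 */
    ecx -= 1;                    /* sub ecx, 1 */
    if (ecx != 0)                /* cmp ecx, 0 and jne begin_loop */
        goto begin_loop;
}
```

One subtlety this makes visible: the test comes at the bottom, so the body always runs at least once. If ecx started at 0, sub ecx, 1 would wrap it around to 0xFFFFFFFF and the loop would run about four billion times instead of copying nothing. The C while loop in the post checks its count first, so it would copy nothing.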

This is essentially what the C memcpy function does, although real
memcpy implementations take a byte count and are heavily optimized.
Not so bad, is it? For reference, this is what our x86 code would look like in C:

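// The fixed addresses mirror the assembly and are illustrative only;
// run as-is in a typical user-mode program, this would most likely crash.
int* src = (int*)10000;   // int is 32 bits on x86, matching the 4-byte items
int* dest = (int*)20000;
int count = 1000;
while (count--) {
    *dest++ = *src++;     // copy one item, then advance both pointers
}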
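To actually run that loop, the fixed addresses have to go. Here is a sketch that takes pointers instead; the function name copy_items is hypothetical, and int32_t is used so the item size matches the 4-byte loads and stores in the assembly regardless of platform.

```c
#include <stdint.h>

/* The post's C loop, with the absolute addresses replaced by parameters.
   Copies count 32-bit items from src to dest, one at a time. */
void copy_items(int32_t *dest, const int32_t *src, int count) {
    while (count--) {
        *dest++ = *src++;  /* copy one item, then advance both pointers */
    }
}
```

Called as copy_items(b, a, 1000) on two arrays of 1000 int32_t, it leaves b with the same contents as memcpy(b, a, 1000 * sizeof(int32_t)) would. Note that memcpy counts bytes, not items.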

From here, all it takes is a good instruction reference, some
memorization, and a bit of practice. x86 is full
of arcane details (it’s 30 years old!), but once you’ve got the basic
concepts down, you can mostly ignore them. I hope I’ve shown you that writing x86
is easy. Perhaps more importantly, I hope you won’t be intimidated the next time Visual Studio
shows you the assembly for your program. Understanding how the machine is executing your code
can be invaluable when debugging.

2 thoughts on “A brief introduction to modern x86 assembly language”

  1. This is pretty good stuff to know, but any time I’m tempted to play with asm I get caught up in little details like what assembler do I use, how do I call built in OS functions from it, and where do I find libraries for asm rather than C. Just so I know that I can produce some results. I never did decide on where to start with that.
