Lecture 2: x86 Assembly and Call Stack¶
Compiler, Assembler, Linker, Loader¶
- Compiler: Converts C code into assembly code (RISC-V, x86)
- Assembler: Converts assembly code into machine code (raw bits)
  - Think 61C’s RISC-V “green sheet”
- Linker: Deals with dependencies and libraries
  - You can ignore this part for 161
- Loader: Sets up memory space and runs the machine code
Endianness¶
Word (32-bit machine)¶
On each row of the grid, we put 4 bytes = 1 word.
We mainly focus on 32-bit machines in this class :)
We can combine all the bytes on a row to form a word.
We need to care about 2 things:
- how to depict a byte
- how to depict a word
To be specific:
- If I ask for the byte at address 0x00000000, you should say 0x11.
- If I ask for the word at that same address, you should say 0x44332211.
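To make the byte-versus-word distinction concrete, here is a minimal C sketch. It assumes the first row of the grid holds the bytes 0x11, 0x22, 0x33, 0x44 at addresses 0 through 3 (inferred from the two answers above). The helper names `byte_at` and `word_at` are hypothetical and only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the single byte stored at address addr. */
uint8_t byte_at(const uint8_t *mem, size_t size, size_t addr) {
    assert(addr < size);
    return mem[addr];
}

/* Return the 32-bit word that starts at addr, read the way x86 reads it:
 * the byte at the lowest address becomes the least significant byte. */
uint32_t word_at(const uint8_t *mem, size_t size, size_t addr) {
    assert(size >= 4 && addr <= size - 4);
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}
```

With `mem = {0x11, 0x22, 0x33, 0x44}`, `byte_at(mem, 4, 0)` gives 0x11 and `word_at(mem, 4, 0)` gives 0x44332211, matching the two answers above.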
Little-endian words¶
- We can combine four bytes on a row to form a word.
- However, x86 is little-endian, which means the word formed from the first four bytes is actually 0x44332211!
- This is just like the dates: each group of 4 bytes is a word. The only difference is how you interpret those bytes (the order they appear).
Why is it called a little-endian word?
The least significant byte (LSB) is stored at the smallest address.
In 0x44332211, the LSB is 0x11, and it is stored at the smallest address.
You can understand "smallest address" in two ways:
1) The picture above is memory, and the smallest address is the leftmost one.
2) The Digital table
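The rule "LSB at the smallest address" can also be seen from the store side. The sketch below is only an illustration: `store_word_le` and `store_word_be` are hypothetical helper names, and the big-endian version is included purely for contrast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit word into out[0..3] in little-endian order:
 * the least significant byte goes to the smallest address. */
void store_word_le(uint8_t *out, uint32_t word) {
    assert(out != NULL);
    for (size_t i = 0; i < 4; i++) {
        out[i] = (uint8_t)(word >> (8 * i));
    }
}

/* For contrast only: big-endian order puts the most significant byte
 * at the smallest address. */
void store_word_be(uint8_t *out, uint32_t word) {
    assert(out != NULL);
    for (size_t i = 0; i < 4; i++) {
        out[i] = (uint8_t)(word >> (8 * (3 - i)));
    }
}
```

Storing 0x44332211 with `store_word_le` leaves the bytes 0x11, 0x22, 0x33, 0x44 at increasing addresses. `store_word_be` would leave 0x44, 0x33, 0x22, 0x11.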
Memory Layout¶
Registers
- Registers are located on the CPU, so they are not part of the memory layout
- Memory is referred to by addresses, which are 32-bit numbers
- Registers are referred to by names (ebp, esp, eip), not addresses
Why is Struct so weird???
Ask David for help :(
X86 Architecture¶
- Little-endian
  - The least-significant byte of multi-byte numbers is placed at the first/lowest memory address
  - Same as RISC-V
- Variable-length instructions
  - When assembled into machine code, instructions can be anywhere from 1 to 15 bytes long
  - Contrast with RISC-V, which has fixed-length, 4-byte instructions
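As a rough illustration of variable-length instructions, the table below lists a few 32-bit x86 instructions together with the machine-code bytes an assembler commonly emits for them. The struct and function names are hypothetical, and other valid encodings of the same instructions may exist. Note that the 4-byte immediate in the last entry is stored little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct encoded_insn {
    const char *text;   /* AT&T-syntax assembly */
    uint8_t bytes[8];   /* machine code, padded with zeros */
    size_t len;         /* number of bytes actually used */
};

static const struct encoded_insn EXAMPLES[] = {
    { "ret",            {0xC3},                         1 },
    { "push %ebp",      {0x55},                         1 },
    { "mov %esp, %ebp", {0x89, 0xE5},                   2 },
    { "sub $8, %esp",   {0x83, 0xEC, 0x08},             3 },
    { "mov $1, %eax",   {0xB8, 0x01, 0x00, 0x00, 0x00}, 5 },
};

/* Number of entries in the example table. */
size_t example_count(void) {
    return sizeof EXAMPLES / sizeof EXAMPLES[0];
}

/* Return a pointer to the i-th example. */
const struct encoded_insn *example(size_t i) {
    assert(i < example_count());
    return &EXAMPLES[i];
}
```

Because lengths differ, a decoder cannot jump to "instruction number n" by multiplying by a fixed size. It has to decode each instruction to learn where the next one starts.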
Register¶
Syntax¶
- Register references are preceded with a percent sign %
  - Example: %eax, %esp, %edi
- Immediates (constant values) are preceded with a dollar sign $
  - Example: $1, $161, $0x4
- Memory references use parentheses and can have immediate offsets
  - Example: (%esp) dereferences memory at the address contained in ESP
  - Example: 8(%esp) dereferences memory 8 bytes above the address contained in ESP
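The three operand kinds above can be modeled in a few lines of C. This is a toy model under stated assumptions, not how an assembler works: `struct machine`, `MEM_SIZE`, and the `operand_*` helpers are hypothetical names, only ESP is modeled, and memory operands are read as 32-bit little-endian words.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE 64u   /* hypothetical tiny memory for illustration */

struct machine {
    uint32_t esp;            /* stands in for the register %esp */
    uint8_t mem[MEM_SIZE];   /* byte-addressed memory */
};

/* $imm: an immediate is just the constant itself. */
uint32_t operand_imm(uint32_t imm) {
    return imm;
}

/* %esp: a register operand is the value held in the register. */
uint32_t operand_reg_esp(const struct machine *m) {
    return m->esp;
}

/* off(%esp): a memory operand reads the word stored at address esp + off. */
uint32_t operand_mem_esp(const struct machine *m, uint32_t off) {
    uint32_t addr = m->esp + off;
    assert(addr <= MEM_SIZE - 4u);
    return (uint32_t)m->mem[addr]
         | ((uint32_t)m->mem[addr + 1] << 8)
         | ((uint32_t)m->mem[addr + 2] << 16)
         | ((uint32_t)m->mem[addr + 3] << 24);
}
```

With `esp = 16`, `operand_reg_esp` returns 16 (the address itself), `operand_mem_esp(&m, 0)` returns the word stored at address 16, and `operand_mem_esp(&m, 8)` returns the word stored at address 24.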
Stack Layout¶
You can see the whole process here :)
One thing to note:
The order in which things are placed on the stack:
- local variables: ...
Struct
Objects: ...
Steps to Function Calling
- Push arguments on the stack
- Push old eip (rip) on the stack
- Push old ebp (sfp) on the stack
- Adjust the stack frame
- Execute the function
- Restore everything
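The steps above can be sketched as a small simulation in C. This is a simplified model, not real compiler output: `struct cpu`, `STACK_WORDS`, `enter_function`, and `leave_function` are hypothetical names, the function takes a single argument, and `eip` stands for the address of the caller's next instruction.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 32u                 /* hypothetical stack size in words */
#define STACK_TOP   (STACK_WORDS * 4u)  /* address just above the stack */

struct cpu {
    uint32_t eip, ebp, esp;
    uint32_t stack[STACK_WORDS];        /* word i lives at address 4*i */
};

static void push(struct cpu *c, uint32_t value) {
    assert(c->esp >= 4u && c->esp <= STACK_TOP && c->esp % 4u == 0);
    c->esp -= 4u;                       /* the stack grows toward lower addresses */
    c->stack[c->esp / 4u] = value;
}

static uint32_t pop(struct cpu *c) {
    assert(c->esp + 4u <= STACK_TOP && c->esp % 4u == 0);
    uint32_t value = c->stack[c->esp / 4u];
    c->esp += 4u;
    return value;
}

/* Steps 1 to 4: set up the callee's stack frame. */
void enter_function(struct cpu *c, uint32_t arg,
                    uint32_t callee_addr, uint32_t local_bytes) {
    push(c, arg);             /* 1. push arguments on the stack */
    push(c, c->eip);          /* 2. push old eip (rip) */
    c->eip = callee_addr;     /*    control moves to the callee */
    push(c, c->ebp);          /* 3. push old ebp (sfp) */
    c->ebp = c->esp;          /* 4. adjust the frame: ebp marks the new frame */
    assert(local_bytes % 4u == 0 && local_bytes <= c->esp);
    c->esp -= local_bytes;    /*    and esp moves down to make room for locals */
}

/* Step 6: restore everything in reverse order. */
void leave_function(struct cpu *c) {
    c->esp = c->ebp;          /* discard the locals */
    c->ebp = pop(c);          /* restore old ebp from the sfp */
    c->eip = pop(c);          /* restore old eip from the rip */
    c->esp += 4u;             /* remove the single argument */
}
```

After `enter_function`, the saved ebp (sfp) sits at the address in ebp, the saved eip (rip) 4 bytes above it, and the argument 8 bytes above it. After `leave_function`, all three registers hold their original values again.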