
Lecture 2: x86 Assembly and Call Stack

Compiler, Assembler, Linker, Loader


  • Compiler: Converts C code into assembly code (RISC-V, x86)
  • Assembler: Converts assembly code into machine code (raw bits)
    • Think 61C’s RISC-V “green sheet”
  • Linker: Deals with dependencies and libraries
    • You can ignore this part for 161
  • Loader: Sets up memory space and runs the machine code
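To make the four stages concrete, here is a tiny C function with, in comments, the commands that would stop after each stage. This is only a sketch: it assumes a GCC toolchain with 32-bit support, and the names `add.c`, `main.o`, `prog`, and `add` are made up for illustration.

```c
/* add.c: a hypothetical file used only to illustrate the four stages.
 *
 * Assuming GCC with 32-bit support is installed:
 *   gcc -m32 -S add.c               compiler:  add.c -> add.s (x86 assembly)
 *   gcc -m32 -c add.s               assembler: add.s -> add.o (machine code)
 *   gcc -m32 add.o main.o -o prog   linker:    object files + libraries -> prog
 *   ./prog                          loader:    the OS maps prog into memory and runs it
 *
 * main.o stands for some other (hypothetical) object file that defines main.
 */
int add(int a, int b)
{
    return a + b;
}
```

Opening `add.s` after the first command is a quick way to see the assembly the compiler produced for this function.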

Endianness

Word (32-bit machine)

On each row of the grid, we put 4 bytes = 1 word.

We mainly focus on 32-bit machines in this class :)

[Figure: memory drawn as a grid, 4 bytes (1 word) per row]

We can combine all the bytes on a row to form a word.

We need to care about two things:

  1. how to read off a byte
  2. how to read off a word

To be specific:

  • If I ask for the byte at address 0x00000000, you should say 0x11.
  • If I ask for the word at that same address, you should say 0x44332211.
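The two questions above can be sketched in C. This is a minimal model, not real hardware access: the array `memory` stands in for the first row of the grid, and the helper names `byte_at` and `word_at` are made up for this example. The word is assembled by hand in little-endian order, so the result does not depend on the machine running the code.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the first row of the grid: addresses 0x0 through 0x3. */
const uint8_t memory[4] = { 0x11, 0x22, 0x33, 0x44 };

/* Read the single byte stored at addr. */
uint8_t byte_at(const uint8_t *mem, size_t addr)
{
    return mem[addr];
}

/* Read the 32-bit word starting at addr the way a little-endian
 * machine such as x86 interprets it: the byte at the lowest address
 * becomes the least significant byte of the word. */
uint32_t word_at(const uint8_t *mem, size_t addr)
{
    return (uint32_t)mem[addr]
         | (uint32_t)mem[addr + 1] << 8
         | (uint32_t)mem[addr + 2] << 16
         | (uint32_t)mem[addr + 3] << 24;
}
```

With this model, `byte_at(memory, 0)` gives 0x11 and `word_at(memory, 0)` gives 0x44332211, matching the two answers above.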

Little-endian words

  1. We can combine four bytes on a row to form a word.
  2. However, x86 is little-endian, which means the word formed from the first four bytes is actually 0x44332211!
  3. This is just like dates, where the same day, month, and year can be written in different orders: each group of 4 bytes is still a word. The only difference is how you interpret those bytes (the order they appear).

[Figure: little-endian word layout in memory]

Why is it called a little-endian word?

The least significant byte is stored at the smallest address.

0x44332211

The least significant byte (LSB) is 0x11, and it is stored at the smallest address.
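Going the other way, here is a small sketch of how a word is laid out in memory in little-endian order. The function name `store_le32` is made up for this example; index 0 of the output array plays the role of the smallest address.

```c
#include <stdint.h>

/* Store a 32-bit word into out[0..3] in little-endian order:
 * out[0] is the smallest address and receives the least significant byte. */
void store_le32(uint8_t out[4], uint32_t word)
{
    out[0] = (uint8_t)(word & 0xFF);          /* LSB, smallest address */
    out[1] = (uint8_t)((word >> 8) & 0xFF);
    out[2] = (uint8_t)((word >> 16) & 0xFF);
    out[3] = (uint8_t)((word >> 24) & 0xFF);  /* MSB, largest address  */
}
```

Storing 0x44332211 this way leaves 0x11 in `out[0]` and 0x44 in `out[3]`.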

You can think of the "smallest address" in two ways:

1) The picture above shows memory, and the smallest address is the leftmost one.

2) The Digital table

Memory Layout

[Figure: memory layout]

Register

Registers are located on the CPU

They are not part of memory, so they do not appear in the memory layout

Memory: addresses are 32-bit numbers

Registers are referred to by names (ebp, esp, eip), not addresses


Why is the struct layout so weird???

Ask David for help :(

x86 Architecture

  • Little-endian

    • The least-significant byte of multi-byte numbers is placed at the first/lowest memory address
    • Same as RISC-V
  • Variable-length instructions

    • When assembled into machine code, instructions can be anywhere from 1 to 15 bytes long
    • Contrast with RISC-V, whose base instructions are a fixed 4 bytes long
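As an illustration of variable-length encoding, here are the machine-code bytes of a few common 32-bit x86 instructions, written as C arrays. The array names are made up for this sketch; the AT&T-syntax instruction each array encodes is given in the comment. Note that the 4-byte immediate in the last one is stored in little-endian order.

```c
#include <stdint.h>

/* Machine code for a few 32-bit x86 instructions (AT&T syntax in comments). */
const uint8_t push_ebp[]    = { 0x55 };                          /* push %ebp      : 1 byte  */
const uint8_t mov_esp_ebp[] = { 0x89, 0xE5 };                    /* mov %esp, %ebp : 2 bytes */
const uint8_t sub_8_esp[]   = { 0x83, 0xEC, 0x08 };              /* sub $8, %esp   : 3 bytes */
const uint8_t mov_161_eax[] = { 0xB8, 0xA1, 0x00, 0x00, 0x00 };  /* mov $161, %eax : 5 bytes */
```

Because the lengths differ, a disassembler has to decode each instruction to find out where the next one starts.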


Register

[Figure: x86 registers]

Syntax

  • Register references are preceded with a percent sign %
    • Example: %eax, %esp, %edi
  • Immediates (constant values) are preceded with a dollar sign $
    • Example: $1, $161, $0x4
  • Memory references use parentheses and can have immediate offsets
    • Example: (%esp) dereferences memory at the address contained in ESP
    • Example: 8(%esp) dereferences memory 8 bytes above the address contained in ESP
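One way to read the three operand forms is to model them in C. The sketch below is a toy: the `struct machine` type, its tiny 64-byte memory, and all function names are assumptions made for this example, and each function mimics one `mov` into %eax.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy machine state; the struct and helper names are made up for this sketch. */
struct machine {
    uint32_t eax;
    uint32_t esp;
    uint8_t  mem[64];   /* a tiny memory, addresses 0..63 */
};

/* Read the 32-bit little-endian word stored at addr. */
static uint32_t load32(const struct machine *m, uint32_t addr)
{
    assert(addr <= sizeof m->mem - 4);   /* stay inside the toy memory */
    return (uint32_t)m->mem[addr]
         | (uint32_t)m->mem[addr + 1] << 8
         | (uint32_t)m->mem[addr + 2] << 16
         | (uint32_t)m->mem[addr + 3] << 24;
}

/* mov $imm, %eax : the immediate itself is copied into the register. */
void mov_imm_to_eax(struct machine *m, uint32_t imm)
{
    m->eax = imm;
}

/* mov off(%esp), %eax : the word at address ESP + off is copied.
 * off = 0 models (%esp); off = 8 models 8(%esp). */
void mov_off_esp_to_eax(struct machine *m, uint32_t off)
{
    m->eax = load32(m, m->esp + off);
}
```

The key difference is that `$161` is the value 161, while `8(%esp)` is whatever value happens to be stored in memory 8 bytes above the address in ESP.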


Stack Layout

You can see the whole process here :)

[Figure: stack layout]

One thing to note:

The order in which things are placed on the stack:

  • local variables: ...
  • Struct Objects: ...


Steps to Function Calling

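  1. Push the arguments onto the stack
  2. Push the old eip onto the stack (the saved copy is called the rip)
  3. Push the old ebp onto the stack (the saved copy is called the sfp)
  4. Adjust the stack frame (move ebp and esp to set up the new frame)
  5. Execute the function
  6. Restore everything (esp, ebp, and eip) and return to the caller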
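The steps above can be sketched as a toy simulation in C. This is a model under stated assumptions, not real x86: the `struct cpu` type, the 64-word stack, and all function names are made up; arguments are assumed to be pushed in reverse order and removed by the caller; and `eip` is assumed to already hold the address to return to when the call happens. Step 5 (running the function body) is left out.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Toy 32-bit machine. Addresses are byte addresses; the stack grows
 * toward lower addresses, so a push decrements esp by 4. */
struct cpu {
    uint32_t eip, ebp, esp;
    uint32_t stack[STACK_WORDS];   /* stack[i] holds the word at address 4*i */
};

void push(struct cpu *c, uint32_t value)
{
    assert(c->esp >= 4);
    c->esp -= 4;
    c->stack[c->esp / 4] = value;
}

uint32_t pop(struct cpu *c)
{
    assert(c->esp + 4 <= 4 * STACK_WORDS);
    uint32_t value = c->stack[c->esp / 4];
    c->esp += 4;
    return value;
}

/* Steps 1-4: set up the callee's stack frame. */
void call_function(struct cpu *c, uint32_t arg1, uint32_t arg2,
                   uint32_t callee_addr, uint32_t local_bytes)
{
    push(c, arg2);            /* 1. arguments (assumed: reverse order)  */
    push(c, arg1);
    push(c, c->eip);          /* 2. old eip, saved on the stack as rip  */
    c->eip = callee_addr;
    push(c, c->ebp);          /* 3. old ebp, saved on the stack as sfp  */
    c->ebp = c->esp;          /* 4. new frame: ebp points at the sfp    */
    c->esp -= local_bytes;    /*    and esp moves down to fit locals    */
}

/* Step 6: undo the setup in reverse order. */
void return_from_function(struct cpu *c, uint32_t arg_bytes)
{
    c->esp = c->ebp;          /* discard locals                         */
    c->ebp = pop(c);          /* restore old ebp from sfp               */
    c->eip = pop(c);          /* restore old eip from rip               */
    c->esp += arg_bytes;      /* assumed: caller removes the arguments  */
}
```

In this model, once the frame is set up, the sfp sits at the address in ebp, the rip is 4 bytes above it, and the first argument is 8 bytes above it.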
