1.13 Worth noting: global vs. local variables
As we knew before that Global Variables that the OS zeros them automatically unlike Local Variables
Sometimes, you have a global variable and forgot (initialize) and your program depends on it being zero from the beginning. After that you modify the program and move this global variable inside a function to make it local. Then it won't be zeroed during initialization again, and this can bring annoying Bugs
1.14 Accessing passed arguments
The author started saying that we understood that the function that calls another function sends the parameters through the stack. But how does the called function (callee) access these parameters?
A simple example:
#include <stdio.h>
int f (int a, int b, int c) { // defines function f that takes three integers and returns an integer return a * b + c; // returns the result of a multiplied by b plus c}
int main() { // program entry point; defines the main function printf("%d\\n", f(1, 2, 3)); // calls printf to print the result of f(1,2,3) followed by a newline return 0; // returns success; indicates successful execution}1.14.1 x86
MSVC
Here this is what we see after the Compilation (MSVC 2010 Express):
_TEXT SEGMENT ; starts the text segment for code_a$ = 8 ; size = 4 ; defines offset for parameter a (8 bytes from EBP)_b$ = 12 ; size = 4 ; defines offset for parameter b (12 bytes from EBP)_c$ = 16 ; size = 4 ; defines offset for parameter c (16 bytes from EBP)
_f PROC ; starts the procedure for function f push ebp ; pushes EBP onto the stack to save the previous base pointer mov ebp, esp ; sets EBP to current ESP, establishing the stack frame mov eax, DWORD PTR _a$[ebp] ; moves the value of a (at [EBP+8]) into EAX imul eax, DWORD PTR _b$[ebp] ; multiplies EAX by the value of b (at [EBP+12]) and stores in EAX add eax, DWORD PTR _c$[ebp] ; adds the value of c (at [EBP+16]) to EAX pop ebp ; pops the saved EBP to restore the previous stack frame ret 0 ; returns from the function, with 0 bytes to clean up (callee cleans)_f ENDP ; ends the procedure for f
main PROC ; starts the procedure for main push ebp ; pushes EBP onto the stack mov ebp, esp ; sets EBP to ESP push 3 ; third parameter ; pushes 3 onto the stack as third argument push 2 ; second parameter ; pushes 2 onto the stack as second argument push 1 ; first parameter ; pushes 1 onto the stack as first argument call _f ; calls function f add esp, 12 ; cleans up 12 bytes from the stack (3 arguments * 4 bytes) push eax ; pushes the return value from f (in EAX) onto the stack for printf push OFFSET $SG2463 ; '%d', 0aH, 00H ; pushes the address of the format string for printf call _printf ; calls printf add esp, 8 ; cleans up 8 bytes from the stack (2 arguments * 4 bytes)
; return 0 ; comment for returning 0 xor eax, eax ; sets EAX to 0 using XOR (faster than mov eax, 0) pop ebp ; restores EBP ret 0 ; returns from main_main ENDP ; ends the main procedure
What we see here is that main() pushes 3 numbers onto the stack and calls f(int, int, int).
Accessing the parameters inside f() is done using macros like: _a$ = 8, in the same way we handle local variables but with positive offsets (addresses are determined by addition).
Meaning here, we point to the outer stack frame by adding the macro _a$ to the value in register EBP.
After that, the value of A is stored in register EAX. After executing IMUL instructions, the value in EAX will be the result of multiplying the value in it with the content of _b.
Then, using ADD we add the value of _c to EAX.
The value in EAX doesn't need to be moved, it's in the place it needs to be.
And upon returning to the caller, the value in EAX goes in the next call as a parameter to printf().
MSVC + OllyDbg
We will do this on the x32dbg as every time
We start first to compile the C with this Command:
cl /Zi /Od test.c ; compiles the C file test.c with debug information (/Zi) and optimization disabled (/Od)After that, we start running it on x32 dbg and we set Breakpoint at Main
The first element in the stack frame is the stored old value of EBP, the second is RA (return address), the third is the first parameter in the function, then the second, then the third.
And I said to do another experiment to confirm my understanding of the Assembly code, which is that I wanted to make it output finally like 13 for example, and the supposed equation is a*b+c, and to return number 13, I had to change the b and make it 10 so that it becomes 1*10+3 = 13, and I changed it in this Instruction imul eax, dword ptr ss:[ebp+0x0C] and made it imul eax, eax,A and indeed it gave me the result I want
GCC
Let’s compile the same in GCC 4.4.1 and see the results in IDA:
;-------------------------; Function: f;------------------------- ; header comment for function fpublic f ; declares f as publicf proc near ; starts the near procedure for f
arg_0 = dword ptr 8 ; 1st argument ; defines offset for first argument (at [EBP+8])arg_4 = dword ptr 0Ch ; 2nd argument ; defines offset for second argument (at [EBP+0Ch])arg_8 = dword ptr 10h ; 3rd argument ; defines offset for third argument (at [EBP+10h])
push ebp ; pushes EBP onto the stack to save previous base pointer mov ebp, esp ; moves ESP to EBP, setting up stack frame
mov eax, [ebp+arg_0] ; load 1st argument ; moves first argument into EAX imul eax, [ebp+arg_4] ; multiply by 2nd argument ; multiplies EAX by second argument add eax, [ebp+arg_8] ; add 3rd argument ; adds third argument to EAX
pop ebp ; pops EBP to restore previous stack frame retn ; returns from function
f endp ; ends the procedure for f
;-------------------------; Function: main;------------------------- ; header comment for function mainpublic main ; declares main as publicmain proc near ; starts the near procedure for main
var_10 = dword ptr -10h ; defines local variable at [ESP-10h]var_C = dword ptr -0Ch ; defines local variable at [ESP-0Ch]var_8 = dword ptr -8 ; defines local variable at [ESP-8]
push ebp ; pushes EBP onto stack mov ebp, esp ; sets EBP to ESP and esp, 0FFFFFFF0h ; align stack ; aligns ESP to 16-byte boundary by ANDing with 0xFFFFFFF0 sub esp, 10h ; allocate 16 bytes ; subtracts 16 from ESP to allocate space
mov [esp+10h+var_8], 3 ; 3rd argument ; stores 3 at [ESP+8] (third argument for f) mov [esp+10h+var_C], 2 ; 2nd argument ; stores 2 at [ESP+4] (second argument for f) mov [esp+10h+var_10], 1 ; 1st argument ; stores 1 at [ESP] (first argument for f) call f ; calls function f
mov edx, offset aD ; "%d\\n" ; moves address of format string to EDX mov [esp+10h+var_C], eax ; result from f() ; stores result from f (in EAX) at [ESP+4] for printf mov [esp+10h+var_10], edx ; stores format string address at [ESP] for printf call _printf ; calls printf
mov eax, 0 ; sets EAX to 0 (return value) leave ; leaves stack frame (mov esp, ebp; pop ebp) retn ; returns from function
main endp ; ends the procedure for mainThe result is almost like the previous one with some small differences we talked about before.
The stack pointer doesn't return to its place after the 2 functions (f and printf),
Because the LEAVE before the last one takes care of that in the end
1.14.2 x64
The story is a bit different in x86-64.
The arguments (first 4 or 6 of them) are sent through registers, meaning the callee reads them from registers not from the stack.
Listing 1.93: Optimizing MSVC 2012 x64 ; listing header
$SG2997 DB '%d', 0aH, 00H ; defines format string "%d\n" with null terminator
main PROC ; starts main procedure sub rsp, 40 ; allocates 40 bytes on stack mov edx, 2 ; sets EDX to 2 (second argument) lea r8d, QWORD PTR [rdx+1] ; R8D = 3 ; loads 3 into R8D (third argument) using LEA (rdx+1) lea ecx, QWORD PTR [rdx-1] ; ECX = 1 ; loads 1 into ECX (first argument) using LEA (rdx-1) call f ; calls f lea rcx, OFFSET FLAT:$SG2997 ; '%d' ; loads format string address into RCX mov edx, eax ; moves result from f (in EAX) to EDX (second argument for printf) call printf ; calls printf xor eax, eax ; sets EAX to 0 add rsp, 40 ; restores stack by adding 40 ret 0 ; returnsmain ENDP ; ends main
f PROC ; starts f procedure ; ECX - 1st argument ; comment: first arg in ECX ; EDX - 2nd argument ; comment: second arg in EDX ; R8D - 3rd argument ; comment: third arg in R8D imul ecx, edx ; multiplies ECX by EDX, result in ECX lea eax, DWORD PTR [r8+rcx] ; loads (r8 + rcx) into EAX using LEA ret 0 ; returnsf ENDP ; ends fLet's explain things in a simpler way
In x64 (64-bit) there is a new calling convention in Windows called Microsoft x64 calling convention.
This convention says first 4 arguments → are sent in registers
Meaning function f(a, b, c, d) receives them like this:
- a in ECX
- b in EDX
- c in R8D
- d in R9D
And this is what appears in the Assembly code above
And the LEA instruction here is used for addition, and the compiler clearly saw it as faster than ADD.
And also LEA was used in main() to prepare the first and third argument for function f(). The compiler probably decided that this would be faster than doing MOV.
Now let's see the non-optimizing version from MSVC:
Listing 1.94: MSVC 2012 x64 ; listing header
f proc near ; starts f ; shadow space: ; comment for shadow space arg_0 = dword ptr 8 ; defines arg_0 at [RSP+8] arg_8 = dword ptr 10h ; defines arg_8 at [RSP+10h] arg_10 = dword ptr 18h ; defines arg_10 at [RSP+18h]
; ECX - 1st argument ; comment ; EDX - 2nd argument ; comment ; R8D - 3rd argument ; comment
mov [rsp+arg_10], r8d ; stores third arg (R8D) at [RSP+18h] mov [rsp+arg_8], edx ; stores second arg (EDX) at [RSP+10h] mov [rsp+arg_0], ecx ; stores first arg (ECX) at [RSP+8]
mov eax, [rsp+arg_0] ; loads first arg into EAX imul eax, [rsp+arg_8] ; multiplies EAX by second arg add eax, [rsp+arg_10] ; adds third arg to EAX retn ; returnsf endp ; ends f
main proc near ; starts main sub rsp, 28h ; allocates 40 bytes (0x28h) on stack mov r8d, 3 ; 3rd argument ; sets R8D to 3 mov edx, 2 ; 2nd argument ; sets EDX to 2 mov ecx, 1 ; 1st argument ; sets ECX to 1 call f ; calls f mov edx, eax ; moves result to EDX for printf lea rcx, $SG2931 ; "%d\\n" ; loads format string into RCX call printf ; calls printf
; return 0 ; comment xor eax, eax ; sets EAX to 0 add rsp, 28h ; restores stack retn ; returnsmain endp ; ends mainThe view is a bit confusing because the 3 arguments coming from registers were stored in the stack for some reason.
This is called "Shadow Space":
shadow space = 4 places reserved on the stack always before any call in x64
Why? For two things?
- Every function in Windows must expect that there is space on the stack ready to receive the 4 arguments even if there are no arguments at all and this makes it easier for the debugger and ABI.
- The function can copy the parameters from registers to the stack if it needs to use them as variables
And this we saw here:
mov [rsp+arg_10], r8d ; copy 3rd arg ; stores R8D (third arg) to shadow spacemov [rsp+arg_8], edx ; copy 2nd arg ; stores EDX (second arg) to shadow spacemov [rsp+arg_0], ecx ; copy 1st arg ; stores ECX (first arg) to shadow spaceGCC
The GCC which is made for Optimization outputs somewhat understandable code:
f: ; label for function f ; EDI - 1st argument ; comment: first arg in EDI ; ESI - 2nd argument ; comment: second arg in ESI ; EDX - 3rd argument ; comment: third arg in EDX imul esi, edi ; multiplies ESI by EDI, result in ESI lea eax, [rdx + rsi] ; loads (RDX + RSI) into EAX using LEA ret ; returns
main: ; label for main sub rsp, 8 ; allocates 8 bytes on stack mov edx, 3 ; sets EDX to 3 (third arg) mov esi, 2 ; sets ESI to 2 (second arg) mov edi, 1 ; sets EDI to 1 (first arg) call f ; calls f
mov edi, OFFSET FLAT:.LC0 ; "%d\\n" ; sets EDI to format string address mov esi, eax ; sets ESI to result from f xor eax, eax ; number of vector registers passed ; zeros EAX (for vector registers count) call printf ; calls printf
xor eax, eax ; zeros EAX (return 0) add rsp, 8 ; restores stack ret ; returnsNon-optimizing GCC
f: ; label for f; EDI - Argument ; comment: first arg in EDI; ESI - Argument ; comment: second arg in ESI; EDX - Argument ; comment: third arg in EDXpush rbp ; pushes RBPmov rbp, rsp ; sets RBP to RSPmov DWORD PTR [rbp-4], edi ; stores first arg at [RBP-4]mov DWORD PTR [rbp-8], esi ; stores second arg at [RBP-8]mov DWORD PTR [rbp-12], edx ; stores third arg at [RBP-12]mov eax, DWORD PTR [rbp-4] ; loads first arg into EAXimul eax, DWORD PTR [rbp-8] ; multiplies EAX by second argadd eax, DWORD PTR [rbp-12] ; adds third arg to EAXleave ; leaves frameret ; returns
main: ; label for mainpush rbp ; pushes RBPmov rbp, rsp ; sets RBP to RSPmov edx, 3 ; sets EDX to 3mov esi, 2 ; sets ESI to 2mov edi, 1 ; sets EDI to 1call f ; calls fmov edx, eax ; moves result to EDXmov eax, OFFSET FLAT:.LC0 ; "%d\\n" ; sets EAX to format stringmov esi, edx ; sets ESI to resultmov rdi, rax ; sets RDI to format string (from EAX)mov eax, 0 ; sets EAX to 0 (vector registers)call printf ; calls printfmov eax, 0 ; sets EAX to 0leave ; leavesret ; returnsThere is no such thing as Shadow Space in System V of UNIX systems, but the callee (the function that was called) can save the arguments in the place it wants if there is a shortage in registers.
GCC: uint64_t instead of int
Our example works on int 32-bit, that's why the code uses 32-bit parts of registers that start with (E).
We can change it a bit to work with 64-bit values:
#include <stdio.h> ; includes stdio.h#include <stdint.h> ; includes stdint.h for uint64_t
uint64_t f (uint64_t a, uint64_t b, uint64_t c) ; defines f with uint64_t types{ return a*b+c; ; returns a*b + c};
int main() ; main function{ printf ("%lld\\n", f(0x1122334455667788, ; calls printf with f result 0x1111111122222222, 0x3333333344444444)); return 0; ; returns 0};Listing 1.97: Optimizing GCC 4.4.6 x64
f proc near ; starts fimul rsi, rdi ; multiplies RSI by RDIlea rax, [rdx+rsi] ; loads (RDX + RSI) into RAXretn ; returnsf endp ; ends f
main proc near ; starts mainsub rsp, 8 ; allocates 8 bytesmov rdx, 3333333344444444h ; 3rd argument ; sets RDX to 0x3333333344444444mov rsi, 1111111122222222h ; 2nd argument ; sets RSI to 0x1111111122222222mov rdi, 1122334455667788h ; 1st argument ; sets RDI to 0x1122334455667788call f ; calls fmov edi, offset format ; "%lld\\n" ; sets EDI to format stringmov rsi, rax ; sets RSI to resultxor eax, eax ; number of vector registers passed ; zeros EAXcall _printf ; calls printfxor eax, eax ; zeros EAXadd rsp, 8 ; restores stackretn ; returnsmain endp ; ends mainThe code is the same,
The only difference is that this time the full registers (that start with R-) are the ones used
1.14.3 ARM
Non-optimizing Keil 6/2013 (ARM mode)
.text:000000A4 00 30 A0 E1 MOV R3, R0 ; moves the value from R0 (first argument) to R3.text:000000A8 93 21 20 E0 MLA R0, R3, R1, R2 ; multiplies R3 by R1, adds R2, and stores the result in R0.text:000000AC 1E FF 2F E1 BX LR ; branches to the address in LR (return), possibly switching mode
.text:000000B0 main ; starts the main function.text:000000B0 10 40 2D E9 STMFD SP!, {R4, LR} ; stores R4 and LR on the stack (decrement before).text:000000B4 03 20 A0 E3 MOV R2, #3 ; sets R2 to 3 (third argument).text:000000B8 02 10 A0 E3 MOV R1, #2 ; sets R1 to 2 (second argument).text:000000BC 01 00 A0 E3 MOV R0, #1 ; sets R0 to 1 (first argument).text:000000C0 F7 FF FF EB BL f ; branches to f and links (calls f).text:000000C4 00 40 A0 E1 MOV R4, R0 ; moves the result from R0 to R4.text:000000C8 04 10 A0 E1 MOV R1, R4 ; moves R4 (result) to R1 (argument for printf).text:000000CC 5A 0F 8F E2 ADR R0, aD_0 ; "%d\\n" ; gets the address of the format string into R0.text:000000D0 E3 18 00 EB BL __2printf ; calls printf.text:000000D4 00 00 A0 E3 MOV R0, #0 ; sets R0 to 0 (return value).text:000000D8 10 80 BD E8 LDMFD SP!, {R4, PC} ; loads R4 and PC from stack (return)
The main() function simply calls two functions, and sends 3 values to the first function — which is f().
As we said before, in ARM the first 4 values are usually sent in the first 4 registers (R0-R3).
And function f(), as seen, uses the first 3 registers (R0–R2) as arguments.
The MLA (Multiply Accumulate) instruction multiplies the first two operands (R3 and R1), then adds the third (R2), and places the result in the zero register (R0), and it's like this:
R0 = R3 * R1 + R2
Multiplication and addition at once (Fused multiply–add) is a very useful operation. By the way, there was no such instruction in x86 before the FMA-instructions appeared in SIMD.
The first instruction MOV R3, R0 seems redundant (could do one MLA and done).
The compiler didn't do Optimization for it because we are here in non-optimizing compilation.
The BX instruction returns control to the address stored in LR, and if needed, changes the processor mode from Thumb to ARM or vice versa.
And this may be necessary, because as you see, the function f() doesn't know it might be called from ARM code or Thumb code.
So if someone called it from Thumb, the BX instruction not only returns control, it also returns the mode to Thumb.
And if it's called from ARM then it doesn't change anything.
Optimizing Keil 6/2013 (ARM mode)
.text:00000098 f ; starts function f.text:00000098 91 20 20 E0 MLA R0, R1, R0, R2 ; multiplies R1 by R0, adds R2, stores in R0.text:0000009C 1E FF 2F E1 BX LR ; returns to LR, possibly switching modeIn the -O3 the function f() was assembled like this:
Here MOV was removed (or reduced), and MLA now uses all the input registers and places the result directly in R0 — which is the place the caller will read it from directly
Optimizing Keil 6/2013 (Thumb mode)
.text:0000005E 48 43 MULS R0, R1 ; multiplies R0 by R1, sets flags, result in R0.text:00000060 80 18 ADDS R0, R0, R2 ; adds R2 to R0, sets flags.text:00000062 70 47 BX LR ; returns to LRThe MLA instruction is not available in Thumb mode, so compiler generates code that does the two operations (multiplication and addition) each separately.
The first instruction MULS multiplies R0 × R1, and places the result in R0.
The second instruction ADDS adds the result with R2 and leaves the result in R0.
ARM64
Optimizing GCC (Linaro) 4.9
Everything here is simple.
The MADD instruction is just an instruction that does multiplication + addition at the same time (similar to the MLA we saw before). The three arguments are sent in the 32-bit part of the X-registers. Indeed, the variable types are 32-bit ints.
The result is returned in W0.
f: ; starts function f madd w0, w0, w1, w2 ; multiplies w0 by w1, adds w2, stores in w0 ret ; returns
main: ; starts main ; save FP and LR to stack frame ; comment stp x29, x30, [sp, -16]! ; stores x29 (FP) and x30 (LR) on stack, decrements SP by 16
mov w2, 3 ; sets w2 to 3 (third argument) mov w1, 2 ; sets w1 to 2 (second argument) add x29, sp, 0 ; sets x29 to current SP (frame pointer) mov w0, 1 ; sets w0 to 1 (first argument)
bl f ; branches to f and links (calls f) mov w1, w0 ; moves result from w0 to w1 (for printf)
adrp x0, .LC7 ; gets page address of .LC7 into x0 add x0, x0, :lo12:.LC7 ; adds low 12 bits to get full address of format string bl printf ; calls printf
; return 0 ; comment mov w0, 0 ; sets w0 to 0
; restore FP & LR ; comment ldp x29, x30, [sp], 16 ; loads x29 and x30 from stack, increments SP by 16 ret ; returns
.LC7: ; label for string .string "%d\\n" ; defines the format string "%d\n"Now we extend all data types to 64-bit uint64_t and try:
f: ; starts f madd x0, x0, x1, x2 ; multiplies x0 by x1, adds x2, stores in x0 ret ; returns
main: ; starts main mov x1, 13396 ; moves lower part of value to x1 adrp x0, .LC8 ; gets page address of .LC8 stp x29, x30, [sp, -16]! ; saves FP and LR
movk x1, 0x27d0, lsl 16 ; moves next 16 bits to x1 shifted left by 16 add x0, x0, :lo12:.LC8 ; adds low bits for format string movk x1, 0x122, lsl 32 ; moves next 16 bits shifted left by 32 add x29, sp, 0 ; sets FP movk x1, 0x58be, lsl 48 ; moves upper 16 bits shifted left by 48
bl printf ; calls printf
mov w0, 0 ; sets w0 to 0 ldp x29, x30, [sp], 16 ; restores FP and LR ret ; returns
.LC8: ; label .string "%lld\\n" ; defines "%lld\n"Non-optimizing GCC (Linaro) 4.9
f: ; starts f sub sp, sp, #16 ; subtracts 16 from SP (allocates space)
str w0, [sp,12] ; stores w0 (first arg) at [SP+12] str w1, [sp,8] ; stores w1 (second arg) at [SP+8] str w2, [sp,4] ; stores w2 (third arg) at [SP+4]
ldr w1, [sp,12] ; loads first arg into w1 ldr w0, [sp,8] ; loads second arg into w0 mul w1, w1, w0 ; multiplies w1 by w0, result in w1
ldr w0, [sp,4] ; loads third arg into w0 add w0, w1, w0 ; adds w1 to w0, result in w0
add sp, sp, 16 ; adds 16 to SP (deallocates) ret ; returnsThe code here saves the input arguments in the local stack, in case someone (or something) inside the function needs to use the registers W0…W2.
This prevents the original arguments from being overwritten because they might be needed again later.
And this is called Register Save Area.
From Procedure Call Standard for ARM64 (AArch64), 2013.
But the callee is not obligated to save them.
And this is a bit like the “Shadow Space” in: 1.14.2 page 129.
Why did the optimizing GCC 4.9 remove all the argument saving code?
Because it did extra optimization and concluded that arguments won't be needed again, and the registers W0…W2 won't be used.
And also we saw a pair of MUL / ADD instructions instead of one MADD.
1.14.4 MIPS
Optimizing GCC 4.4.5
.text:00000000 f: ; starts function f ; $a0 = a ; comment: first arg in $a0 ; $a1 = b ; comment: second arg in $a1 ; $a2 = c ; comment: third arg in $a2
mult $a1, $a0 ; multiply a0 * a1 ; multiplies $a1 by $a0, result in HI:LO mflo $v0 ; get low 32 bits of result into v0 ; moves low part of multiplication result to $v0 jr $ra ; return ; jumps to return address in $ra addu $v0, $a2, $v0 ; branch delay slot: v0 = v0 + c ; adds $a2 to $v0 (in delay slot)
; result returned in $v0 ; comment
.text:00000010 main: ; starts main
var_10 = -0x10 ; defines local var_10var_4 = -4 ; defines local var_4
lui $gp, (__gnu_local_gp >> 16) ; loads upper 16 bits of __gnu_local_gp into $gp addiu $sp, -0x20 ; subtracts 0x20 from $sp (allocates stack) la $gp, (__gnu_local_gp & 0xFFFF) ; loads lower part of __gnu_local_gp into $gp
sw $ra, 0x20+var_4($sp) ; stores $ra at [SP + 0x1C] sw $gp, 0x20+var_10($sp) ; stores $gp at [SP + 0x10]
; set c ; comment li $a2, 3 ; loads 3 into $a2 (third arg)
; set a ; comment li $a0, 1 ; loads 1 into $a0 (first arg)
jal f ; call f() ; jumps to f and links li $a1, 2 ; branch delay slot: set b ; loads 2 into $a1 (second arg) in delay slot
; result now in $v0 ; comment
lw $gp, 0x20+var_10($sp) ; loads $gp from [SP + 0x10] lui $a0, ($LC0 >> 16) ; loads upper bits of $LC0 into $a0 lw $t9, (printf & 0xFFFF)($gp) ; loads printf address into $t9 la $a0, ($LC0 & 0xFFFF) ; loads lower part of $LC0 into $a0
jalr $t9 ; jumps to printf address in $t9 move $a1, $v0 ; branch delay slot: pass result to printf ; moves $v0 to $a1 (argument for printf)
lw $ra, 0x20+var_4($sp) ; loads $ra from [SP + 0x1C] move $v0, $zero ; sets $v0 to 0 jr $ra ; jumps to $ra (return) addiu $sp, 0x20 ; branch delay slot: restore stack ; adds 0x20 to $spThe first four arguments for the function are sent in 4 registers, which are the registers that start with A-.
There are two special registers in the MIPS architecture:
HI and LO — these are where the 64-bit multiplication result is placed while MULT is running.
These registers can only be accessed using MFLO and MFHI instructions.
Here MFLO takes the low part of the multiplication result and places it in $v0.
The high part of the multiplication result (in HI) is discarded.
And this is normal because we are dealing with 32-bit int.
And then the ADDU instruction (“addition without sign”) adds the value of the third argument to the result.
There are two addition instructions in MIPS: ADD and ADDU.
The difference is not in signed or unsigned…
The difference is that ADD can do exception if overflow occurs, and this is sometimes useful.
But ADDU doesn't do exceptions.
And since C/C++ doesn't do exceptions in overflow,
That's why they used ADDU.
The 32-bit result is left in $v0.
In main() there is a new instruction: JAL (“Jump and Link”).
The difference between it and JALR:
- The JAL has a relative offset (relative address).
- The JALR jumps to the address inside a register.
And since f() and main() are in the same file, then the relative address of f is fixed and known.
If this article helped you, please share it with others!
Some information may be outdated





