Reverse Engineering for Beginners (CH1.5 Hello, world!) {Part_2}

Posted Oct 21, 2025 Updated Oct 24, 2025

By 0X_V3n0m


ARM

The author mentions that several compilers were used for the ARM experiments in the book:

Keil 6/2013, well known in the embedded systems field.
Apple Xcode 4.6.3 IDE with the LLVM-GCC 4.2 compiler.
GCC 4.9 (Linaro), for the ARM64 architecture.

In all the examples in this book, ARM 32-bit code is used (including Thumb and Thumb-2 modes), unless stated otherwise.

When we talk about ARM 64-bit, we call it ARM64.

Keil 6/2013 Without Optimization (in ARM Mode)

Let’s start compiling our example on Keil:

Terminal


armcc.exe --arm --c90 -O0 1.c

The armcc compiler produces assembly listings in Intel-style syntax, but with ARM-specific macros. We care more about seeing the instructions exactly as they are, so let’s look at the compiled result in IDA:

Assembly


.text:00000000                         main
.text:00000000 10 40 2D E9             STMFD             SP!, {R4,LR}
.text:00000004 1E 0E 8F E2             ADR               R0, aHelloWorld ; "hello, world"
.text:00000008 15 19 00 EB             BL                __2printf
.text:0000000C 00 00 A0 E3             MOV               R0, #0
.text:00000010 10 80 BD E8             LDMFD             SP!, {R4,PC}
.text:000001EC 68 65 6C 6C+aHelloWorld DCB "hello, world",0

In this example, we can easily see that each instruction is 4 bytes in size. This is because we compiled it in ARM mode, not Thumb.

The first instruction:

Assembly


STMFD SP!, {R4, LR}

This means:

"Store Multiple Full Descending"

1. The SP (Stack Pointer) is decreased enough to save the values.
2. The values in R4 and LR are written into the Stack.

It’s similar to the `PUSH` instruction in x86, except that here several registers can be pushed at once.

Be careful: armcc shows this instruction as PUSH {r4,lr} in its listings for simplicity, but that is not accurate, because PUSH is only available in Thumb mode.

That’s why IDA shows it in its real form: STMFD.

The second instruction:

Assembly


ADR R0, aHelloWorld

ADR adds an offset to (or subtracts it from) the value in the PC (Program Counter) to get the address of the string `"hello, world"`.

This is called Position-Independent Code (code that is not tied to a fixed address).

This code can be executed anywhere in memory because it relies on the difference (offset) between the code’s location and the data’s location.

That offset is fixed when the program is built and stays the same wherever the OS loads the code, so adding it to the current PC at runtime always gives the actual address of our string.
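To make the PC-relative idea concrete, here is a small Python sketch that decodes the bytes of the ADR instruction from the listing above (1E 0E 8F E2 at address 0x4). It assumes the usual ARM-mode encoding of ADR as ADD Rd, PC, #imm with a rotated 8-bit immediate; the helper names are made up for illustration.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def decode_arm_adr(address, raw_bytes):
    """Address produced by an ARM-mode ADR (assumed to be the ADD Rd, PC, #imm form)."""
    word = int.from_bytes(raw_bytes, "little")  # opcode bytes are stored little-endian
    rotate = (word >> 8) & 0xF                  # 4-bit rotation field
    imm8 = word & 0xFF                          # 8-bit immediate
    offset = ror32(imm8, 2 * rotate)            # immediate = imm8 rotated right by 2*rotate
    pc = address + 8                            # in ARM mode, PC reads as instruction address + 8
    return pc + offset


target = decode_arm_adr(0x00000004, bytes.fromhex("1E0E8FE2"))
print(hex(target))  # 0x1ec
```

The result matches the address of aHelloWorld in the IDA listing (0x1EC). Only the offset is stored in the instruction, never the absolute address.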

The next instruction:

Assembly


BL __2printf

This makes a call to the `printf()` function, and here’s how it works:

It saves the address after the BL (0xC) in the LR (Link Register).
Then, it transfers control to the address of printf() by writing it into the PC.

When printf() finishes, it needs to return, and that’s done by the address stored in LR.

The difference here is that ARM processors (which are RISC) store the return address in LR, while x86 processors (CISC) place it on the stack.

The author will explain this in more detail in another section.

By the way, the BL instruction cannot hold a full 32-bit address or offset, because it only has 24 bits for this purpose.

Since each instruction in ARM mode is 4 bytes (32 bits), it can only be placed at an address divisible by 4 (i.e., the last 2 bits are always zero). Those two bits do not need to be encoded, which effectively gives us a 26-bit offset.

This is enough to cover about ±32 MB around the current PC.
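The arithmetic behind the 24-bit field can be checked with a short Python sketch. It decodes the BL bytes from the listing (15 19 00 EB at address 0x8), assuming the standard ARM-mode BL encoding; the function name is hypothetical, and the computed target is simply what the encoding implies, since the listing does not show where __2printf lives.

```python
def decode_arm_bl(address, raw_bytes):
    """Branch target of an ARM-mode BL, assuming the standard 24-bit offset encoding."""
    word = int.from_bytes(raw_bytes, "little")
    imm24 = word & 0x00FFFFFF
    if imm24 & 0x800000:          # sign-extend the 24-bit field
        imm24 -= 1 << 24
    offset = imm24 << 2           # restore the two implicit zero bits
    return address + 8 + offset   # PC reads as instruction address + 8


target = decode_arm_bl(0x8, bytes.fromhex("151900EB"))
print(hex(target))  # 0x6464

# Reach of the signed 26-bit byte offset:
max_forward = ((1 << 23) - 1) << 2
max_backward = (-(1 << 23)) << 2
print(max_forward, max_backward)
```

The two limits come out just under +32 MiB and exactly -32 MiB, which is where the "about ±32 MB" figure comes from.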

Afterwards, MOV R0, #0 writes the value 0 into register R0.

This is because the main function returns 0 at the end, and this return value is stored in R0.

The final instruction:

Assembly


LDMFD SP!, {R4,PC}

This reads the values from the stack and stores them in R4 and PC, then increments SP — essentially performing a POP.

Note:

The first STMFD instruction saved R4 and LR on the stack.
The LDMFD instruction restores them into R4 and PC.

This is logical: LR holds the address that main must return to, and it has to be saved because BL __2printf overwrites LR. Loading the saved value straight into PC returns control to the caller of main.

After main finishes, control goes back to the OS or the CRT.

Because the return happens as part of LDMFD, there’s no need for a separate BX LR at the end of the function.
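The save-LR, restore-into-PC trick can be modeled in a few lines of Python. This is only a toy model with made-up register values (0x1234, 0x8000, 0x1000 are assumptions); it treats the stack as a dictionary and follows the full-descending convention described above.

```python
def stmfd(regs, memory, names):
    """STMFD SP!, {...}: decrement SP first, then store the registers upward from SP."""
    regs["SP"] -= 4 * len(names)
    for i, name in enumerate(names):
        memory[regs["SP"] + 4 * i] = regs[name]


def ldmfd(regs, memory, names):
    """LDMFD SP!, {...}: load the registers upward from SP, then increment SP."""
    for i, name in enumerate(names):
        regs[name] = memory[regs["SP"] + 4 * i]
    regs["SP"] += 4 * len(names)


# Hypothetical state on entry to main: LR holds the address main must return to.
regs = {"R4": 0x1234, "LR": 0x8000, "PC": 0x0, "SP": 0x1000}
memory = {}

stmfd(regs, memory, ["R4", "LR"])   # prologue
regs["LR"] = 0xC                    # BL __2printf overwrites LR with the address after the BL
ldmfd(regs, memory, ["R4", "PC"])   # epilogue: the saved LR lands in PC

print(hex(regs["PC"]), hex(regs["SP"]))  # 0x8000 0x1000
```

The value that was in LR on entry ends up in PC, and SP is back where it started, so the function has returned without a separate BX LR.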

Finally, DCB is a directive in assembly used to define an array of bytes or strings, similar to DB in x86 assembly.

Non-optimizing Keil 6/2013 (Thumb mode)

Let’s compile the same example, but this time in Thumb mode:

Terminal


armcc.exe --thumb --c90 -O0 1.c

The result in IDA looked like this:

Assembly


.text:00000000             main
.text:00000000 10 B5         PUSH        {R4,LR}
.text:00000002 C0 A0         ADR         R0, aHelloWorld ; "hello, world"
.text:00000004 06 F0 2E F9   BL          __2printf
.text:00000008 00 20         MOVS        R0, #0
.text:0000000A 10 BD         POP         {R4,PC}
.text:00000304 68 65 6C 6C+aHelloWorld DCB "hello, world",0

We can easily notice that most of the opcodes are 2 bytes (16 bits) in size, which means this code is indeed in Thumb mode.

The BL instruction, however, is composed of two 16-bit instructions, as it’s not possible to fit the full offset for printf() within a single 16-bit opcode.

Here’s what happens:

The first 16-bit instruction carries the upper 11 bits of the offset (including the sign).
The second 16-bit instruction carries the lower 11 bits of the offset.

Since every Thumb instruction is 2 bytes, no instruction can be placed at an odd address. Therefore, the last bit of the address is always zero and is not encoded.

In the end, the BL in Thumb mode can encode an offset of around:

current_PC ± approximately 4 MB.

As for the other instructions in the example:

PUSH / POP work here just like the STMFD / LDMFD we explained earlier,
but the difference is that SP is not mentioned explicitly.
ADR does the same job as in the previous example.
MOVS R0, #0 writes 0 into R0 so the function returns 0 at the end.
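As a quick check that ADR really does the same job in Thumb mode, this Python sketch decodes the two bytes C0 A0 at address 0x2 from the listing. It assumes the 16-bit Thumb ADR encoding (an 8-bit immediate counted in words, added to the word-aligned PC); the function name is made up.

```python
def decode_thumb_adr(address, raw_bytes):
    """Address produced by a 16-bit Thumb ADR (ADD Rd, PC, #imm8*4)."""
    halfword = int.from_bytes(raw_bytes, "little")
    imm8 = halfword & 0xFF
    pc = (address + 4) & ~3      # in Thumb mode PC reads as address + 4, aligned down to 4
    return pc + imm8 * 4


print(hex(decode_thumb_adr(0x2, bytes.fromhex("C0A0"))))  # 0x304
```

The result is 0x304, the address of aHelloWorld in the Thumb listing.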

Optimizing Xcode 4.6.3 (LLVM) (ARM mode)

The author here used Xcode 4.6.3, but this time enabled Optimization (maximum level) using the switch:

Terminal

-O3

Without optimization, unnecessary code is generated, so the author opted for the version with the fewest possible instructions.

Listing 1.27: Optimizing Xcode 4.6.3 (LLVM) (ARM mode)

Assembly


__text:000028C4                       _hello_world
__text:000028C4 80 40 2D E9   STMFD    SP!, {R7,LR}
__text:000028C8 86 06 01 E3   MOV      R0, #0x1686
__text:000028CC 0D 70 A0 E1   MOV      R7, SP
__text:000028D0 00 00 40 E3   MOVT     R0, #0
__text:000028D4 00 00 8F E0   ADD      R0, PC, R0
__text:000028D8 C3 05 00 EB   BL       _puts
__text:000028DC 00 00 A0 E3   MOV      R0, #0
__text:000028E0 80 80 BD E8   LDMFD    SP!, {R7,PC}

__cstring:00003F62 48 65 6C 6C+aHelloWorld_0 DCB "Hello world!",0

The STMFD and LDMFD instructions are already familiar to us.

As for MOV, here it writes the value 0x1686 into R0, which is an offset pointing to the location of the string `"Hello world!"`.

The R7 register (as mentioned in the iOS ABI Function Call Guide - 2010) is used as the Frame Pointer (which will be explained later).

The MOVT R0, #0 (MOVe Top) writes zero into the upper 16 bits of the register.

Why?

In ARM mode, this form of MOV can only carry a 16-bit immediate, which goes into the lower 16 bits of the register.

To write the upper 16 bits of the register, you must use MOVT.

However, in this case MOVT was unnecessary, because the initial MOV already cleared the upper part to zero.

This is a small shortcoming of the compiler (a redundant instruction).
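The effect of the two instructions on a 32-bit register can be modeled in Python. This is a sketch of the semantics only (movw and movt are hypothetical helper names, and 0x1234/0x5678 are example values not taken from the listing).

```python
def movw(imm16):
    """MOV/MOVW with a 16-bit immediate: low half = imm16, high half cleared."""
    return imm16 & 0xFFFF


def movt(reg, imm16):
    """MOVT: high half = imm16, low half left untouched."""
    return ((imm16 & 0xFFFF) << 16) | (reg & 0xFFFF)


r0 = movw(0x1686)          # MOV  R0, #0x1686
after_mov = r0
r0 = movt(r0, 0)           # MOVT R0, #0
print(hex(after_mov), hex(r0))  # 0x1686 0x1686

# Building a full 32-bit constant needs both instructions:
print(hex(movt(movw(0x5678), 0x1234)))  # 0x12345678
```

The register holds the same value before and after MOVT R0, #0, which is why that instruction is redundant here.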

Then:

Assembly


ADD R0, PC, R0

This adds the value of PC to R0 to get the full address of the string.

This follows the same idea of position-independent code that we discussed earlier.
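The numbers in the listing can be checked directly. The sketch below assumes, as in the earlier ARM-mode examples, that PC reads as the address of the current instruction plus 8.

```python
ADD_ADDRESS = 0x28D4           # address of ADD R0, PC, R0 in the listing
r0 = 0x1686                    # value loaded by MOV R0, #0x1686

pc = ADD_ADDRESS + 8           # ARM mode: PC is two instructions ahead
r0 = (pc + r0) & 0xFFFFFFFF    # ADD R0, PC, R0

print(hex(r0))  # 0x3f62
```

0x3F62 is exactly where the __cstring section holds "Hello world!" in the listing.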

The BL _puts instruction calls the puts() function instead of printf().

LLVM replaced printf with puts here, and this makes sense: printf() with a single string argument prints the same text as puts() (which appends the newline itself), as long as the string contains no format specifiers like %d or %s.

But if there is a %, the result will differ.

The reason for this replacement?

puts is faster: it simply prints the string without scanning it for format specifiers.

At the end:

Assembly


MOV R0, #0

This writes zero into R0 as the return value.

Optimizing Xcode 4.6.3 (LLVM) (Thumb-2 mode)

By default, Xcode 4.6.3 generates code for Thumb-2 in this manner:

Assembly


__text:00002B6C                 _hello_world
__text:00002B6C 80 B5           PUSH       {R7,LR}
__text:00002B6E 41 F2 D8 30     MOVW       R0, #0x13D8
__text:00002B72 6F 46           MOV        R7, SP
__text:00002B74 C0 F2 00 00     MOVT.W     R0, #0
__text:00002B78 78 44           ADD        R0, PC
__text:00002B7A 01 F0 38 EA     BLX        _puts
__text:00002B7E 00 20           MOVS       R0, #0
__text:00002B80 80 BD           POP        {R7,PC}

__cstring:00003E70 48 65 6C...  "Hello world!",0xA,0

In Thumb mode, the BL and BLX instructions are encoded as pairs of 16-bit instructions.

In Thumb-2, these pair opcodes were extended so that new instructions can be encoded as 32-bit instructions.

This is easy to spot, because the opcodes of 32-bit Thumb-2 instructions always start with 0xFx or 0xEx.

In the IDA listing, however, the opcode bytes appear swapped, because the instructions are stored in little-endian order:

In ARM or ARM64 mode: the byte order is 4-3-2-1.
In Thumb mode: 2-1.
In a pair of 16-bit halves in Thumb-2: 2-1-4-3.

That’s why, once the bytes are put back in order, instructions like MOVW, MOVT.W, and BLX all start with 0xFx.
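Here is a small Python sketch of that byte shuffling, using opcode bytes copied from the two Xcode listings. The helper names are made up; the check for a 32-bit Thumb-2 instruction (top five bits of the first halfword equal to 11101, 11110 or 11111) and the MOVW field layout are the standard encodings, stated here as assumptions of the sketch.

```python
def arm_word(raw):
    """ARM mode: four bytes, stored little-endian (4-3-2-1)."""
    return int.from_bytes(raw, "little")


def thumb_halfwords(raw):
    """Thumb/Thumb-2: each 16-bit half is stored little-endian (2-1 or 2-1-4-3)."""
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def is_32bit_thumb2(first_halfword):
    """True if this halfword is the first half of a 32-bit Thumb-2 instruction."""
    return (first_halfword >> 11) in (0b11101, 0b11110, 0b11111)


print(hex(arm_word(bytes.fromhex("80 40 2D E9"))))   # 0xe92d4080  (STMFD SP!, {R7,LR})

movw = thumb_halfwords(bytes.fromhex("41 F2 D8 30"))
print([hex(h) for h in movw])                        # ['0xf241', '0x30d8']
print(is_32bit_thumb2(movw[0]))                      # True
print(is_32bit_thumb2(thumb_halfwords(bytes.fromhex("80 B5"))[0]))  # False (PUSH is 16-bit)

# Reassemble the 16-bit immediate of MOVW from its scattered fields (imm4:i:imm3:imm8):
hw1, hw2 = movw
imm16 = ((hw1 & 0xF) << 12) | (((hw1 >> 10) & 1) << 11) | (((hw2 >> 12) & 7) << 8) | (hw2 & 0xFF)
print(hex(imm16))                                    # 0x13d8
```

After reordering, the first halfword of MOVW is 0xF241, and the immediate decodes to the 0x13D8 that IDA shows.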

For example:

Assembly


MOVW R0, #0x13D8

This places a 16-bit value in the lower part of the R0 register and zeroes out the upper bits.

Similarly, MOVT.W R0, #0 works the same as the MOVT in the previous example, but it’s the Thumb-2 version.

The other difference is that here we see BLX instead of BL.

BLX not only transfers execution to the puts() function and stores the return address in LR, but it also switches the processor mode from Thumb/Thumb-2 to ARM (or vice versa).

This instruction is needed because the code we are jumping to is encoded in ARM mode (32-bit instructions):

Assembly


__symbolstub1:00003FEC    _puts
__symbolstub1:00003FEC    44 F0 9F E5    LDR PC, =__imp__puts

This is simply a jump to the location where the real address of the puts() function is stored in the import section.

You might ask: “Why don’t we just call puts() directly instead of this roundabout way?”

The answer is that calling it directly would not be space-efficient.

Almost every program uses dynamic libraries (like DLLs in Windows, .so in Linux, or .dylib in macOS).

These libraries contain ready-made functions like puts().

In the executable file (like EXE, ELF, or Mach-O), there’s an import section, which contains the names of the functions or variables that the program imports from other libraries, along with the name of the library they come from.

The OS loader reads this list and retrieves the real addresses of these functions from memory.

In our case, __imp__puts is a 32-bit variable that stores the address of the puts() function in memory.

Then, the LDR instruction reads this address and puts it in PC so execution transfers to the function.

That’s why instead of writing the address of puts() every time, we write it once in a designated place.

Also, it’s not possible to place the full 32-bit value into a register in a single instruction without touching memory.

So the optimal solution is to create a small function in ARM mode whose job is to jump to the original function in the library.

This is called a Thunk Function.

The code in Thumb mode calls this small function.
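As a loose analogy only (this is not how Mach-O stubs are implemented, and every name below is hypothetical), the indirection can be pictured in Python as a table slot that a loader fills in and a tiny function that jumps through it:

```python
# One slot per imported function; empty until "load time".
import_table = {"puts": None}


def libc_puts(text):
    """Stand-in for the real puts() that lives in a shared library."""
    return text + "\n"


def loader_resolve_imports():
    """Stand-in for the OS loader writing real addresses into the import slots."""
    import_table["puts"] = libc_puts


def puts_thunk(text):
    """The thunk: fetch the real address from its slot and transfer control there."""
    return import_table["puts"](text)


loader_resolve_imports()
result = puts_thunk("Hello world!")
print(repr(result))  # 'Hello world!\n'
```

Every call site only needs to know where the thunk is; the real address is written once, in a single slot.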

By the way, in the previous example (compiled for ARM mode), control is passed with a regular BL because the processor mode doesn’t change (that’s why there is no "X" in the mnemonic).

About Thunk Functions:

Many people get confused about Thunk functions because the name is strange.

But the idea is simple: a Thunk is an adapter or a wrapper between two different modes or systems.

That’s why sometimes they’re also called Wrappers (around another function).

Examples:

According to researcher P.Z. Ingerman, who invented the Thunk in 1961, a thunk is "a piece of code that calculates a certain value and leaves the address of the result in a known place."
Later, Microsoft and IBM also used the term when they made 16-bit and 32-bit systems work together (like WOW – Windows on Windows).
Another example is the LAPACK library (used for linear algebra), which is written in FORTRAN. C/C++ developers want to use it, but rewriting all of its code is not realistic. So they write small C functions (thunks) that call the original FORTRAN functions from within.

Example:

C/C++


double Blas_Dot_Prod(const LaVectorDouble &dx, const LaVectorDouble &dy){
    assert(dx.size() == dy.size());
    integer n = dx.size();
    integer incx = dx.inc(), incy = dy.inc();
    return F77NAME(ddot)(&n, &dx(0), &incx, &dy(0), &incy);
}

These small functions are also called Wrappers.

ARM64

GCC

Let’s compile this example using GCC 4.8.1 on the ARM64 architecture:

Assembly


0000000000400590 <main>:
400590:   a9bf7bfd   stp    x29, x30, [sp,#-16]!
400594:   910003fd   mov    x29, sp
400598:   90000000   adrp   x0, 400000 <_init-0x3b8>
40059c:   91192000   add    x0, x0, #0x648
4005a0:   97ffffa0   bl     400420 <puts@plt>
4005a4:   52800000   mov    w0, #0x0
4005a8:   a8c17bfd   ldp    x29, x30, [sp],#16
4005ac:   d65f03c0   ret

In the .rodata section (which contains the string data):

Assembly


400640 01000200 00000000 48656c6c6f210a00 ........Hello!..

In ARM64, there’s neither Thumb nor Thumb-2 mode, meaning everything is in ARM mode only, and all instructions are 32-bit in size.

The number of registers has doubled. The registers here are 64-bit and begin with X, whereas the lower part of them (32-bit only) is written with W.

The STP (Store Pair) instruction stores two registers at once: here X29 and X30. The pair could be stored anywhere in memory, but the base register here is SP (Stack Pointer), so the pair is stored on the stack.

Since the register is 64-bit, each of them is 8 bytes, so together they occupy 16 bytes.

The exclamation mark ! after the operand means that this value (16) is subtracted from SP first, and then the values are written to the stack — this is called pre-index. (This is the opposite of post-index, which increases SP afterward).

Comparing with x86, the first instruction here is essentially the same as:

Assembly


PUSH X29
PUSH X30

X29 is the Frame Pointer (FP), and X30 is the Link Register (LR). These are stored at the beginning of the function (prologue) and restored at the end (epilogue).

The second instruction mov x29, sp copies the address of the Stack Pointer into X29 to prepare the stack frame for the function.

Next, we have the ADRP and ADD instructions. Both are used to place the address of the string "Hello!" into register X0 (because the first parameter of the function is passed in X0).

There’s no single ARM instruction that can place a large number or a full address into a register (because the instruction size is limited to 4 bytes).

Thus, it is done in two steps:

ADRP places the address of the page (4KB Page) containing the string.
ADD adds the remaining part of the address.

The resulting address:

Assembly


0x400000 + 0x648 = 0x400648

This is indeed the location of the string "Hello!" that we found in .rodata.
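The same result can be recovered from the raw opcodes in the listing. The Python sketch below assumes the standard A64 field layout for ADRP (a 21-bit page offset split into immlo/immhi) and for ADD with a 12-bit immediate; the function names are made up.

```python
def decode_adrp(address, word):
    """Page address produced by ADRP: page of the current instruction + imm21 pages of 4 KiB."""
    immlo = (word >> 29) & 0x3
    immhi = (word >> 5) & 0x7FFFF
    imm21 = (immhi << 2) | immlo
    if imm21 & (1 << 20):             # sign-extend the 21-bit field
        imm21 -= 1 << 21
    return (address & ~0xFFF) + (imm21 << 12)


def decode_add_imm12(word):
    """12-bit immediate of ADD Xd, Xn, #imm (shift field assumed to be zero)."""
    return (word >> 10) & 0xFFF


x0 = decode_adrp(0x400598, 0x90000000)    # adrp x0, 400000
print(hex(x0))                            # 0x400000
x0 += decode_add_imm12(0x91192000)        # add x0, x0, #0x648
print(hex(x0))                            # 0x400648
```

The .rodata dump starts at 0x400640 and the string bytes begin 8 bytes in, which is 0x400648.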

After that, BL puts@plt is used to call the puts() function (as we saw earlier).
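For comparison with the 32-bit BL, here is a Python sketch that decodes the ARM64 BL opcode from the listing (97ffffa0 at 0x4005a0). It assumes the standard A64 encoding: a signed 26-bit word offset relative to the address of the BL itself. The function name is hypothetical.

```python
def decode_a64_bl(address, word):
    """Branch target of an A64 BL: address + sign-extended imm26 * 4."""
    imm26 = word & 0x03FFFFFF
    if imm26 & (1 << 25):          # sign-extend the 26-bit field
        imm26 -= 1 << 26
    return address + (imm26 << 2)  # no +8 here: A64 uses the instruction's own address


print(hex(decode_a64_bl(0x4005A0, 0x97FFFFA0)))  # 0x400420
```

The negative offset lands on 0x400420, the puts@plt entry shown in the disassembly.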

Then MOV W0, #0 places zero into register W0, which is the lower 32 bits of register X0:

Ascii


| High 32 bits | Low 32 bits |
|<----------- X0 ----------->|
|              |<--- W0 ---->|
The function returns a result in X0, and since main() returns an int (which is 32 bits), it’s sufficient to fill only the lower part W0.

To confirm, let’s modify the example to make it return a uint64_t (64 bits):



C/C++


#include <stdio.h>
#include <stdint.h>

uint64_t main() {
    printf("Hello!\n");
    return 0;
}

The result is the same, except that the MOV at 0x4005a4 now looks like this:

Assembly


4005a4: d2800000  mov x0, #0x0

This means that when the function returns 64 bits, the value must be written in X0, not W0.

Next, LDP X29, X30, [SP], #16 reads the values we previously stored (X29 and X30) from the stack. But here there’s no exclamation mark — this means the values are pulled first, and then SP is increased by 16 (this is called post-index).
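The difference between pre-index and post-index is just the order of "update SP" and "access memory". A toy Python model makes this visible; the register values are made up (0x7FF0, 0x1111, 0x2222 are assumptions), and memory is a plain dictionary.

```python
def stp_pre_index(regs, memory, a, b, offset):
    """stp a, b, [sp, #offset]!  -> update SP first, then store the pair."""
    regs["sp"] += offset
    memory[regs["sp"]] = regs[a]
    memory[regs["sp"] + 8] = regs[b]


def ldp_post_index(regs, memory, a, b, offset):
    """ldp a, b, [sp], #offset  -> load the pair first, then update SP."""
    regs[a] = memory[regs["sp"]]
    regs[b] = memory[regs["sp"] + 8]
    regs["sp"] += offset


regs = {"x29": 0x1111, "x30": 0x2222, "sp": 0x7FF0}   # hypothetical values
memory = {}

stp_pre_index(regs, memory, "x29", "x30", -16)   # prologue
sp_inside = regs["sp"]
regs["x29"], regs["x30"] = 0, 0                  # the function body clobbers them
ldp_post_index(regs, memory, "x29", "x30", 16)   # epilogue

print(hex(sp_inside), hex(regs["sp"]), hex(regs["x29"]), hex(regs["x30"]))
# 0x7fe0 0x7ff0 0x1111 0x2222
```

SP drops by 16 before the store and climbs back by 16 after the load, so both registers and the stack pointer end up exactly as they were on entry.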

Finally, RET is a new instruction in ARM64. It does the same job as BX LR, plus a hint bit that tells the processor this is a return from a function rather than just another jump, so the CPU can execute it more optimally.

Because the function is so simple, GCC produces exactly the same code with optimization enabled.

MIPS

There is a very important concept in the MIPS architecture called the “Global Pointer” (GP).

As we know, every instruction in MIPS is 32 bits in size, and that means it’s impossible to include a full 32-bit address inside a single instruction.

That’s why we have to use two instructions to load the full address (just like what GCC did in the previous example when loading the address of a text string).

But we can load data from an address within the range from register - 32768 up to register + 32767 using only one instruction.

That’s because 16 bits of signed offset can be encoded inside a single instruction.

So, we can dedicate a special register for this purpose, and also allocate a 64KiB space that contains the most frequently used data.

This register is called the “Global Pointer”, and it points exactly to the middle of that 64KiB space.

That space usually contains global variables and addresses of ready-made functions like printf().
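The size of that window follows directly from the signed 16-bit offset. Here is a minimal Python sketch; the GP value 0x418860 is only an example, and the helper names are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of `value` as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def gp_window(gp):
    """Lowest and highest address reachable with a single `lw reg, offset($gp)`."""
    return gp + sign_extend16(0x8000), gp + sign_extend16(0x7FFF)


def reachable_from_gp(gp, address):
    low, high = gp_window(gp)
    return low <= address <= high


gp = 0x418860                      # example value, assumed for illustration
low, high = gp_window(gp)
print(hex(low), hex(high), high - low + 1)
# 0x410860 0x42085f 65536
```

The window is exactly 64 KiB wide with GP in the middle, so anything inside it can be loaded with one instruction.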

That’s because the GCC developers decided that getting the address of any function should take only one instruction instead of two.

In ELF files (which are the executable file format in Linux), this 64KiB space is divided into two sections:

.sbss (“small BSS”) for uninitialized data
.sdata (“small data”) for initialized data

This means the programmer can choose which data should be accessed quickly and place it either in .sdata or .sbss as needed.

Some old programmers might remember the MS-DOS system, which divided the entire memory into 64KiB blocks — it’s basically the same idea here.

This concept isn’t exclusive to MIPS; at least the PowerPC architecture also used this same technique.

Optimizing GCC

Now let’s look at this example that explains the idea called the “Global Pointer” in the MIPS architecture.

Listing 1.32: Optimizing GCC 4.4.5 (assembly output)

Assembly


$LC0:
; \000 is zero byte in octal base:
    .ascii  "Hello, world!\012\000"
main:
; function prologue.
; set the GP:
    lui     $28,%hi(__gnu_local_gp)
    addiu   $sp,$sp,-32
    addiu   $28,$28,%lo(__gnu_local_gp)
; save the RA to the local stack:
    sw      $31,28($sp)
; load the address of the puts() function from the GP to $25:
    lw      $25,%call16(puts)($28)
; load the address of the text string to $4 ($a0):
    lui     $4,%hi($LC0)
; jump to puts(), saving the return address in the link register:
    jalr    $25
    addiu   $4,$4,%lo($LC0) ; branch delay slot
; restore the RA:
    lw      $31,28($sp)
; copy 0 from $zero to $v0:
    move    $2,$0
; return by jumping to the RA:
    j       $31
; function epilogue:
    addiu   $sp,$sp,32 ; branch delay slot + free local stack

Non-optimizing GCC

The Non-optimizing GCC version tends to be more verbose in its assembly output.

Listing 1.34: Non-optimizing GCC 4.4.5 (assembly output)

Assembly


$LC0:
    .ascii  "Hello, world!\012\000"

main:
; function prologue.
; save the RA ($31) and FP in the stack:
    addiu   $sp, $sp, -32
    sw      $31, 28($sp)
    sw      $fp, 24($sp)

; set the FP (stack frame pointer):
    move    $fp, $sp

; set the GP:
    lui     $28, %hi(__gnu_local_gp)
    addiu   $28, $28, %lo(__gnu_local_gp)

; load the address of the text string:
    lui     $2, %hi($LC0)
    addiu   $4, $2, %lo($LC0)

; load the address of puts() using the GP:
    lw      $2, %call16(puts)($28)
    nop

; call puts():
    move    $25, $2
    jalr    $25
    nop     ; branch delay slot

; restore the GP from the local stack:
    lw      $28, 16($fp)

; set register $2 ($V0) to zero:
    move    $2, $0

; function epilogue.
; restore the SP:
    move    $sp, $fp

; restore the RA:
    lw      $31, 28($sp)

; restore the FP:
    lw      $fp, 24($sp)
    addiu   $sp, $sp, 32

; jump to the RA:
    j       $31
    nop     ; branch delay slot

Apparently the routines that generate these listings are not critical to GCC users, which is probably why a few cosmetic bugs in the output remain unfixed.

Here, we can see that the FP register is used as a stack frame pointer.

We can also notice three NOPs (empty instructions) — the second and third of them appear after branch instructions.

It’s possible that GCC always places NOPs after branch instructions (because of branch delay slots). When optimization is enabled, it removes them — but in this case, it left them as they are.

Listing 1.35: Non-optimizing GCC 4.4.5 (IDA)

Assembly


.text:00000000 main:
.text:00000000
.text:00000000 var_10 = -0x10
.text:00000000 var_8  = -8
.text:00000000 var_4  = -4
.text:00000000

; function prologue.
; save the RA and FP in the stack:
.text:00000000 addiu $sp, -0x20
.text:00000004 sw    $ra, 0x20+var_4($sp)
.text:00000008 sw    $fp, 0x20+var_8($sp)

; set the FP (stack frame pointer):
.text:0000000C move  $fp, $sp

; set the GP:
.text:00000010 la    $gp, __gnu_local_gp
.text:00000018 sw    $gp, 0x20+var_10($sp)

; load the address of the text string:
.text:0000001C lui   $v0, (aHelloWorld >> 16)   # "Hello, world!"
.text:00000020 addiu $a0, $v0, (aHelloWorld & 0xFFFF)  # "Hello, world!"

; load the address of puts() using the GP:
.text:00000024 lw    $v0, (puts & 0xFFFF)($gp)
.text:00000028 or    $at, $zero                 ; NOP

; call puts():
.text:0000002C move  $t9, $v0
.text:00000030 jalr  $t9
.text:00000034 or    $at, $zero                 ; NOP

; restore the GP from local stack:
.text:00000038 lw    $gp, 0x20+var_10($fp)

; set register $2 ($V0) to zero:
.text:0000003C move  $v0, $zero

; function epilogue.
; restore the SP:
.text:00000040 move  $sp, $fp

; restore the RA:
.text:00000044 lw    $ra, 0x20+var_4($sp)

; restore the FP:
.text:00000048 lw    $fp, 0x20+var_8($sp)
.text:0000004C addiu $sp, 0x20

; jump to the RA:
.text:00000050 jr    $ra
.text:00000054 or    $at, $zero                 ; NOP

The interesting part here is that IDA recognized that the two instructions LUI and ADDIU together actually perform a Load Address operation, so it combined them into a single pseudo-instruction called LA.

This isn’t a real instruction in MIPS, but a pseudo-instruction representing both combined.

The LA instruction occupies 8 bytes, since it is actually composed of two real instructions.

Also, IDA doesn’t display NOPs as NOP; they appear as OR $AT, $ZERO.

This ORs the contents of register $AT with zero, so the result is unchanged and the instruction effectively does nothing.

Like several other architectures, MIPS doesn’t have a dedicated NOP instruction, so it uses an operation like this as a replacement.

The Role of the Stack Frame in This Example

The text string’s address is passed through a register — so why do we even need a local stack?

The reason is that the values of RA and GP must be stored somewhere before calling printf() (or puts() in this case), because the call might modify them.

So, we use the stack to save them.

If this function were a leaf function (meaning it doesn’t call any other functions), we could skip the entire prologue and epilogue altogether.

Optimizing GCC: load it into GDB


Listing 1.36: sample GDB session


root@debian-mips:~# gcc hw.c -O3 -o hw
root@debian-mips:~# gdb hw
GNU gdb (GDB) 7.0.1-debian
...
Reading symbols from /root/hw...(no debugging symbols found)...done.
(gdb) b main
Breakpoint 1 at 0x400654
(gdb) run
Starting program: /root/hw
Breakpoint 1, 0x00400654 in main ()
(gdb) set step-mode on
(gdb) disas
Dump of assembler code for function main:
0x00400640 <main+0>:  lui     gp,0x42
0x00400644 <main+4>:  addiu   sp,sp,-32
0x00400648 <main+8>:  addiu   gp,gp,-30624
0x0040064c <main+12>: sw      ra,28(sp)
0x00400650 <main+16>: sw      gp,16(sp)
0x00400654 <main+20>: lw      t9,-32716(gp)
0x00400658 <main+24>: lui     a0,0x40
0x0040065c <main+28>: jalr    t9
0x00400660 <main+32>: addiu   a0,a0,2080
0x00400664 <main+36>: lw      ra,28(sp)
0x00400668 <main+40>: move    v0,zero
0x0040066c <main+44>: jr      ra
0x00400670 <main+48>: addiu   sp,sp,32
End of assembler dump.
(gdb) s
0x00400658 in main ()
(gdb) s
0x0040065c in main ()
(gdb) s
0x2ab2de60 in printf () from /lib/libc.so.6
(gdb) x/s $a0
0x400820: "hello, world"
(gdb)

In this session, the binary produced by optimizing GCC is loaded into GDB and executed step by step inside the debugger.

We start compiling with:

gcc hw.c -O3 -o hw

This compiles the file hw.c with the highest optimization level (-O3).

Then we run the debugger:

gdb hw

And we set a breakpoint at the beginning of main:

(gdb) b main

After the program stops at the breakpoint, we look at the disassembly (the code generated by the compiler).

disas (short for disassemble) prints the instructions of the function main.

This part of the code:


0x00400654 <main+20>: lw t9,-32716(gp)
0x00400658 <main+24>: lui a0,0x40
0x0040065c <main+28>: jalr t9
0x00400660 <main+32>: addiu a0,a0,2080

This sequence performs the call to the output function (printf() in this GDB session, puts() in the earlier listings).

The register a0 holds the address of the string "hello, world".

This is confirmed when we type:


(gdb) x/s $a0
0x400820: "hello, world"

Here, GDB displays the text stored at the address held in the a0 register.
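The constants in the disassembly can be tied back to the %hi/%lo pairs from the assembly listing. The Python sketch below uses only numbers that appear in the GDB session; the helper names are made up, and the rule for splitting an address (add 0x8000 before taking the upper half, because ADDIU sign-extends its operand) is the usual convention, stated here as an assumption.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of `value` as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def lui_addiu(upper, lower):
    """Value built by `lui reg, upper` followed by `addiu reg, reg, lower`."""
    return ((upper << 16) + sign_extend16(lower)) & 0xFFFFFFFF


def split_hi_lo(address):
    """Split an address into the (%hi, %lo) pair that lui/addiu would need."""
    hi = ((address + 0x8000) >> 16) & 0xFFFF
    lo = sign_extend16(address)
    return hi, lo


gp = lui_addiu(0x42, -30624)       # lui gp,0x42 ; addiu gp,gp,-30624
a0 = lui_addiu(0x40, 2080)         # lui a0,0x40 ; addiu a0,a0,2080
slot = gp - 32716                  # lw t9,-32716(gp): where the function address is read from

print(hex(gp), hex(a0), hex(slot))  # 0x418860 0x400820 0x410894
print(split_hi_lo(gp))              # (66, -30624), i.e. hi = 0x42
```

a0 comes out as 0x400820, the same address GDB prints for the string, and the slot that holds the function address lies inside the 64 KiB window around GP.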

1.5.5 Summary:

The main difference between x86/ARM and x64/ARM64 code is that the pointer that points to the string became 64-bit instead of 32-bit.

This is because modern processors have become 64-bit, since memory is now cheaper and programs require more of it.

As a result, we can fit far more memory into a computer than 32-bit pointers are able to address.

Therefore, all pointers are now 64-bit.

Reverse, Books


This post is licensed under CC BY 4.0 by the author.