Reverse Engineering for Beginners (CH1.5 Hello, world!) {Part_1}

1.5 Hello, world!

The author starts with the famous example that prints "hello, world":

C/C++

#include <stdio.h>
int main()
{
    printf("hello, world\n"); 
    return 0;
}

1.5.1 x86: MSVC

Let’s compile this code using MSVC 2010:

Terminal

cl 1.cpp /Fa1.asm

(The option /Fa makes the compiler generate an Assembly listing file.)

Here’s the generated code:

Assembly

CONST SEGMENT
$SG3830 DB 'hello, world', 0AH, 00H
CONST ENDS

PUBLIC _main
EXTRN _printf:PROC
; Function compile flags: /Odtp 

_TEXT SEGMENT
_main PROC
    push ebp
    mov  ebp, esp
    push OFFSET $SG3830
    call _printf
    add  esp, 4
    xor  eax, eax
    pop  ebp
    ret  0
_main ENDP
_TEXT ENDS

Note that MSVC generates assembly listings in Intel syntax. The difference between Intel syntax and AT&T syntax is explained later.

The compiler produces an object file called 1.obj, which is then linked to create 1.exe.

In our case, that file contains two segments:

  • CONST → for constant data (like strings).
  • _TEXT → for the code itself.

The string "hello, world" in C/C++ is of type const char[], but since it doesn’t have an explicit name, the compiler gives it an internal name like $SG3830.

So we can write the code like this:

C/C++

#include <stdio.h>
const char $SG3830[] = "hello, world\n";

int main()
{
    printf($SG3830);
    return 0;
}

If we look again at the assembly listing, we’ll notice the string is terminated by a zero byte (00H), which is standard for C/C++ strings.


Analyzing the Assembly code

1. CONST SEGMENT

This part contains constant data (like the texts inside the program).

The compiler stores the string "hello, world\n" here and gives it an internal name so the code can refer to it later.

Assembly

$SG3830 DB 'hello, world', 0AH, 00H
  • $SG3830 → the name chosen by the compiler.
  • DB means “Define Byte”, i.e., emit the following bytes as data.
  • 'hello, world' → the actual text.
  • 0AH = the newline character \n.
  • 00H = the zero byte marking the end of the string.

2. PUBLIC _main

This declares the symbol _main (the main() function) as public, i.e., visible to the linker and to other object files.

3. EXTRN _printf:PROC

This means there’s an external function called printf that’s not written here but will come from another library (the C standard library).

After that comes the _TEXT SEGMENT part, which is where the actual executable code resides.


_main PROC

This marks the beginning of the main() function.

push ebp

This is the first instruction in almost any function. It saves the caller’s value of EBP (the base pointer) on the stack so that it can be restored later.

mov ebp, esp

Here we say: “Make the base pointer (EBP) point to the same place as the stack pointer (ESP).”

That means we’ve started a new stack frame for this function’s work.

push OFFSET $SG3830

Here we push the address of the string "hello, world" onto the stack. It is the only argument to printf(), which is called by the next instruction, call _printf.

After printf() finishes and returns, the address we pushed is still on the stack, but we no longer need it, so we fix the stack pointer with:

Assembly

ADD ESP, 4

Why 4? Because the program is 32-bit, and an address takes exactly 4 bytes. If it were 64-bit, we’d need 8 bytes.

The instruction:

Assembly

ADD ESP, 4

is almost the same as:

Assembly

POP register

but without actually using a register.

Some compilers (like Intel C++ Compiler) prefer:

Assembly

POP ECX

to make the code smaller (1 byte instead of 3).

Example from Oracle RDBMS code:

Assembly

.text:0800029A  push ebx
.text:0800029B  call qksfroChild
.text:080002A0  pop  ecx

Even MSVC can do that sometimes:

Assembly

.text:0102106F  push 0
.text:01021071  call ds:time
.text:01021077  pop ecx

After calling printf(), the original C/C++ code has return 0;

In Assembly, that turns into:

Assembly

XOR EAX, EAX

XOR means “exclusive OR”. XORing a register with itself always yields zero, and the compiler uses it instead of MOV EAX, 0 because the encoding is shorter (2 bytes instead of 5).

Out of curiosity, I wanted to know why XOR EAX, EAX is shorter than MOV EAX, 0. It turns out the reason is simple. When encoded in x86 machine code, XOR EAX, EAX becomes:

hex

31 C0

That’s only 2 bytes.

While MOV EAX, 0 becomes:

hex

B8 00 00 00 00

That’s 5 bytes in total: 1 opcode byte plus a 4-byte immediate value.

This was just something extra I wanted to understand better, so I decided to write it down as well.

Some other compilers use:

Assembly

SUB EAX, EAX

which means “subtract EAX from itself”, so the result is also zero.


Finally:

RET

This returns control to the program that called main() (usually the C runtime code), which then returns back to the operating system.


GCC

Now let’s try compiling the same C/C++ β€œHello, world” code, but this time using GCC on a Linux system, with this command:

Terminal

gcc 1.c -o 1

Then we’ll use a program called IDA Disassembler to see how the function main() was built after compilation. IDA uses the same Intel-syntax style as MSVC.

Assembly

main                 proc near
var_10              = dword ptr -10h

    push    ebp
    mov     ebp, esp
    and     esp, 0FFFFFFF0h
    sub     esp, 10h
    mov     eax, offset aHelloWorld ; "hello, world\n"
    mov     [esp+10h+var_10], eax
    call    _printf
    mov     eax, 0
    leave
    retn
main endp

The result is almost identical to the code generated by MSVC. The address of the string "hello, world" (stored in the read-only data section, .rodata) is first loaded into the EAX register, then stored on the stack.

Also, at the beginning of the function, there’s this line:

Assembly

AND ESP, 0FFFFFFF0h

Here, GCC performs something called stack alignment.

That means it rounds ESP down to a multiple of 16: the lowest four bits are cleared, so the address ends with the hex digit 0.

Why? Because the CPU reads memory in “blocks”, and values located at addresses aligned on a 4-byte or 16-byte boundary (like 0x1000 instead of 0x1003) can be accessed faster.

So this line aligns the stack for better performance.

Then we have this line:

Assembly

SUB ESP, 10h

This allocates 16 bytes on the stack (since 10h = 16). In reality, we only need 4 bytes, but the compiler reserves 16 to maintain proper alignment.

After that, the address of the string is stored on the stack directly without using PUSH.

The variable var_10 is a local variable, and it’s also used as the argument to the printf() function.

Then the function printf() is called.

When GCC is running without optimization, it uses:

Assembly

MOV EAX, 0

instead of shorter instructions like XOR EAX, EAX.

The last instruction LEAVE is equivalent to:

Assembly

MOV ESP, EBP
POP EBP

This restores the stack to its original state and recovers the previous EBP value that existed before the function started.


GCC: AT&T syntax

Now let’s see how this code looks when written in AT&T syntax. This style is more common on UNIX systems.

Terminal

cc -S 1_1.c

This command tells GCC to generate Assembly code instead of an executable file.

Here’s the generated code:

Assembly

.file "1_1.c"
.section .rodata
.LC0:
    .string "hello, world\n"
.text
.globl main
.type main, @function
main:
.LFB0:
    .cfi_startproc
    pushl %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl %esp, %ebp
    .cfi_def_cfa_register 5
    andl $-16, %esp
    subl $16, %esp
    movl $.LC0, (%esp)
    call printf
    movl $0, %eax
    leave
    ret
    .cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3"
.section .note.GNU-stack,"",@progbits

The code contains many lines starting with a dot (.). These are assembler directives, and we don’t need to worry about them now. We can safely ignore them, except for .string, because it’s what stores the text "hello, world\n" in memory as a C string (terminated by a zero byte).

After removing unnecessary lines, the simplified version looks like this:

Assembly

.LC0:
    .string "hello, world\n"
main:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-16, %esp
    subl    $16, %esp
    movl    $.LC0, (%esp)
    call    printf
    movl    $0, %eax
    leave
    ret

Differences between Intel and AT&T syntax

1 - The order of source and destination is reversed:

  • Intel: <instruction> <destination>, <source>
  • AT&T: <instruction> <source>, <destination>

So, in Intel:

mov eax, ebx

In AT&T, it becomes:

movl %ebx, %eax

To remember: think of Intel syntax as an “equals sign (=)” and AT&T syntax as an “arrow (→)”, meaning the value moves from left to right.


2 - In AT&T:

  • Registers start with % (e.g., %eax).
  • Constants start with $ (e.g., $16).
  • Parentheses ( ) are used instead of square brackets [ ].

3 - AT&T also adds a suffix to the instruction mnemonic to indicate the operand size:

  • q → quad (64-bit)
  • l → long (32-bit)
  • w → word (16-bit)
  • b → byte (8-bit)

Back to our code:

The generated code looks very similar to what IDA produces, but there’s a small difference:

The value 0FFFFFFF0h appears here as $-16. They’re actually the same:

  • In decimal: -16
  • In hexadecimal: 0xFFFFFFF0

Both represent the same 32-bit value: 0xFFFFFFF0 is the two’s-complement encoding of -16.


Another note: The return value is set using MOV instead of XOR.

So we see:

movl $0, %eax

This copies the value 0 into %eax. The word “move” is a bit misleading: it doesn’t move, it copies. In other architectures, you’ll find similar instructions called “LOAD” or “STORE”.


String patching (Win32)

We can easily locate the string “hello, world” inside the executable file. It looks something like this:

Hex Dump

h  e  l  l  o  ,     w  o  r  l  d  \n  \0
68 65 6C 6C 6F 2C 20 77 6F 72 6C 64 0A 00

If we wanted to translate it into Spanish, it would look like:

Hex Dump

h  o  l  a  \n  \0
68 6F 6C 61 0A 00

The original string length is 14 bytes (like in the example above), and the new string is 6 bytes long. You can replace it directly and leave the remaining bytes as-is, or pad them with 00 until the space is filled. Example after modification:

Hex Dump

68 6F 6C 61 0A 00 00 00 00 00 00 00 00 00

If we wanted to insert a longer message, there might be some zero bytes (00) after the original English text that could be overwritten. It’s not always safe to do so, because they might be used by the CRT code. So only do it if you know what you’re doing.

The author shared a real story about software cracking:

There was an image-processing program that, when unregistered, would add a watermark like: “This image was processed by the evaluation version of [Program Name]”.

Trying at random, they found this string inside the executable and replaced it with spaces, and the watermark disappeared! Technically, the program was still adding the watermark, but the text had become invisible.


Software localization in the MS-DOS era

This method was common for translating MS-DOS programs into Russian during the 1980s and 1990s. It was suitable even for people unfamiliar with machine code or executable file formats.

The new text couldn’t be longer than the original, because adding bytes might overwrite nearby code or data. Russian words were often longer, so translated versions had many abbreviations to make text fit.

The same might have happened for other languages too. And with Delphi strings, the string length field also had to be updated if needed.


1.5.2 x86-64


Now let’s compile the same example with 64-bit MSVC:

Assembly

$SG2989 DB  'hello, world', 0AH, 00H

main PROC
    sub     rsp, 40
    lea     rcx, OFFSET FLAT:$SG2989
    call    printf
    xor     eax, eax
    add     rsp, 40
    ret     0
main ENDP

  

In x86-64, all registers were extended to 64 bits, and their names now have an R- prefix.

To use the stack less often (i.e., to access memory/cache less often), there is a common convention to pass function arguments in registers rather than on the stack. This is commonly called fastcall.

That means some arguments are passed in registers, and the rest (if any) go on the stack.

On Win64, the first four arguments of any function are passed in these registers:

  • RCX
  • RDX
  • R8
  • R9

That’s what we see here: the pointer to the string passed to printf() is now passed in RCX instead of being pushed on the stack.

Also, pointers are now 64-bit, so they are passed in the 64-bit registers (the ones starting with R-).

For backward compatibility you can still access the lower 32-bit part via the E- prefix.

Example: the RAX / EAX / AX / AL hierarchy:

Byte number:
+-----+-----+-----+-----+-----+-----+-----+-----+
| 7th | 6th | 5th | 4th | 3rd | 2nd | 1st | 0th |
+-----+-----+-----+-----+-----+-----+-----+-----+
|                 RAX (64-bit)                  |
+-----------------------+-----------------------+
|                       |     EAX (32-bit)      |
+-----------------------+-----------+-----------+
|                                   |AX (16-bit)|
+-----------------------------------+-----+-----+
|                                   | AH  | AL  |
+-----------------------------------+-----+-----+

AH and AL are the two 8-bit halves of AX.

The main() function returns an int, and in C/C++ int is still 32-bit, for better backward compatibility and portability. That is why the compiler clears EAX (the 32-bit part of the register) at the end of the function rather than the whole RAX.

Also, the function allocates 40 bytes on the stack: 32 bytes of so-called shadow space (explained later) plus 8 bytes that keep the stack 16-byte aligned at the call.


GCC: x86-64

Now let’s try GCC on a 64-bit Linux system:

Assembly

Listing 1.23: GCC 4.4.6 x64
.string "hello, world\n"

main:
    sub     rsp, 8
    mov     edi, OFFSET FLAT:.LC0 ; "hello, world\n"
    xor     eax, eax ; number of vector registers passed
    call    printf
    xor     eax, eax
    add     rsp, 8
    ret

  

On Linux/BSD/macOS the calling convention also passes arguments in registers. According to the System V ABI (used by Unix-like systems), the first six arguments are passed in registers:

Argument #   Register
1            RDI
2            RSI
3            RDX
4            RCX
5            R8
6            R9

If there are more than six arguments the rest go on the stack as usual.

In the example above the pointer to the string is passed in EDI (the lower 32-bit part of RDI).

Why EDI and not RDI? This is an optimization trick by the compiler:

  • Writing to the 32-bit subregister (e.g., EDI) automatically clears the upper 32 bits of the full 64-bit register (RDI).
  • A mov edi, imm32 instruction is 5 bytes long, while the same instruction writing a 64-bit value into RDI takes 7 bytes with a sign-extended 32-bit immediate (or 10 bytes with a full 64-bit immediate), so the 32-bit form saves space in the binary.

Example machine code bytes (from the object file) show this size saving:

Opcodes

.text:00000000004004D0  48 83 EC 08      sub  rsp, 8
.text:00000000004004D4  BF E8 05 40 00   mov  edi, offset format ; "hello, world"
.text:00000000004004D9  31 C0            xor  eax, eax
.text:00000000004004DB  E8 D8 FE FF FF   call _printf
.text:00000000004004E0  31 C0            xor  eax, eax
.text:00000000004004E2  48 83 C4 08      add  rsp, 8
.text:00000000004004E6  C3               retn

As you can see, the instruction writing into EDI at 0x4004D4 is 5 bytes long; the same instruction writing a 64-bit value into RDI would take 7 bytes. GCC chooses the shorter encoding because it’s safe (the string address is below 4 GB in this example) and saves space.

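Also note that EAX is zeroed before the call to printf(). printf() is a variadic function, and the System V ABI requires the caller of such a function to pass the number of vector registers used for arguments in AL (the lowest byte of EAX). No vector registers are used here, so the value is zero.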


Address patching (Win64)

If we compile this example with MSVC 2013 and the /MD option (which links against the external MSVCR*.DLL), the main() function is easy to find in the binary. The instruction that loads the pointer might look like:

Assembly

lea rcx, [0000000000002400]

As an experiment, if we increment that address by 1:

Assembly

lea rcx, [0000000000002401]

The program will read from the second byte of the string, and the output becomes ello, world instead of hello, world. Running the patched executable indeed prints that altered string.


This post is licensed under CC BY 4.0 by the author.
