0xV3n0m


1.5 Hello, world!

The author starts with the classic program that prints “hello, world”:

#include <stdio.h>
int main()
{
    printf("hello, world\n");
    return 0;
}

1.5.1 x86 — MSVC

Let’s compile this code using MSVC 2010:

Terminal


cl 1.cpp /Fa1.asm

(The option /Fa makes the compiler generate an Assembly listing file.)

Here’s the generated code:

Assembly

CONST SEGMENT
$SG3830 DB 'hello, world', 0AH, 00H
CONST ENDS
PUBLIC _main
EXTRN _printf:PROC
; Function compile flags: /Odtp
_TEXT SEGMENT
_main PROC
    push ebp
    mov ebp, esp
    push OFFSET $SG3830
    call _printf
    add esp, 4
    xor eax, eax
    pop ebp
    ret 0
_main ENDP
_TEXT ENDS

Note that MSVC generates assembly listings in Intel syntax. We’ll explain how it differs from AT&T syntax later.

The compiler produces a file called 1.obj, which is later linked to create 1.exe.

In our case the listing contains two segments:

CONST → for constant data (like strings).
_TEXT → for the code itself.

The string "hello, world" in C/C++ is of type const char[], but since it doesn’t have an explicit name, the compiler gives it an internal name like $SG3830.

So we can write the code like this:

#include <stdio.h>
const char $SG3830[] = "hello, world\n";

int main()
{
    printf($SG3830);
    return 0;
}

If we look again at the Assembly, we’ll notice the string is terminated by a zero byte (00H), which is the standard way to end C/C++ strings.

Analyzing the Assembly code

1. `CONST SEGMENT`

This part contains constant data (like the texts inside the program).

The compiler stores the string "hello, world\n" here and gives it an internal name so the code can refer to it later.

$SG3830 DB 'hello, world', 0AH, 00H

$SG3830 → is the name chosen by the compiler.
DB means “Define Bytes”, i.e., store bytes.
'hello, world' → the actual text.
0AH = newline code \n.
00H = zero byte marking “end of string”.

2. `PUBLIC _main`

This declares _main as a public symbol, so the linker and other modules can see the main() function.

3. `EXTRN _printf:PROC`

This means there’s an external function called printf that’s not written here but will come from another library (the C standard library).

After that comes the _TEXT SEGMENT part — this is where the actual executable code resides.

`_main PROC`

This marks the beginning of the main() function.

`push ebp`

This is the first instruction in almost every function. It saves the caller’s value of EBP (the base pointer) on the stack so it can be restored before returning.

`mov ebp, esp`

Here we say: “Make the base pointer (ebp) point to the same place as the stack pointer (esp).”

That means we’ve started a “new frame” on the stack for this function’s work.

`push OFFSET $SG3830`

Here we push the address of the string "hello, world" onto the stack. It is the only argument of printf(), which is then called with call _printf.

After printf() finishes and returns, the address we pushed is still on the stack — but we no longer need it, so we fix the stack pointer by:

ADD ESP, 4

Why 4? Because the program is 32-bit, and an address takes exactly 4 bytes. If it were 64-bit, we’d need 8 bytes.

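The instruction:

ADD ESP, 4

has almost the same effect on the stack pointer as:

POP register

but without writing the popped value into a register.

Some compilers (like the Intel C++ Compiler) prefer:

POP ECX

because the code is smaller (1 byte instead of 3). ECX is overwritten by the POP, which is acceptable here because ECX is a scratch register whose value is not preserved across a call anyway.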

Example from Oracle RDBMS code:

.text:0800029A  push ebx
.text:0800029B  call qksfroChild
.text:080002A0  pop  ecx

Even MSVC can do that sometimes:

.text:0102106F  push 0
.text:01021071  call ds:time
.text:01021077  pop  ecx

After calling printf(), the original C/C++ code has return 0;

In Assembly, that turns into:

XOR EAX, EAX

XOR means “exclusive OR”. XORing a register with itself always gives zero, and compilers prefer it to MOV EAX, 0 because the encoding is shorter (2 bytes instead of 5).

Out of curiosity, I wanted to know why XOR EAX, EAX is shorter than MOV EAX, 0. Turns out the reason is simple — when encoded in x86 machine code:

31 C0

That’s only 2 bytes.

While MOV EAX, 0 becomes:

B8 00 00 00 00

That’s 5 bytes in total (1 + 4).

This was just something extra I wanted to understand better, so I decided to write it down as well.
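The size comparison can also be checked mechanically. The following is a minimal sketch in C that stores the two byte sequences quoted above and compares their lengths; the array names and the helper bytes_saved_by_xor are made up for illustration and are not part of any toolchain.

```c
#include <assert.h>
#include <stddef.h>

/* Machine-code bytes quoted in the text above. */
const unsigned char XOR_EAX_EAX[] = { 0x31, 0xC0 };                   /* xor eax, eax */
const unsigned char MOV_EAX_0[]   = { 0xB8, 0x00, 0x00, 0x00, 0x00 }; /* mov eax, 0   */

/* Hypothetical helper: bytes saved by using the XOR form instead of MOV. */
size_t bytes_saved_by_xor(void)
{
    assert(sizeof MOV_EAX_0 > sizeof XOR_EAX_EAX);
    return sizeof MOV_EAX_0 - sizeof XOR_EAX_EAX;
}
```

Calling bytes_saved_by_xor() gives the difference between the 5-byte and the 2-byte encoding.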

Some other compilers use:

SUB EAX, EAX

which means “subtract EAX from itself” → result is also zero.

Finally:

RET

This returns control to the program that called main() (usually the C runtime code), which then returns back to the operating system.

GCC

Now let’s try compiling the same C/C++ “Hello, world” code, but this time using GCC on a Linux system, with this command:

gcc 1.c -o 1

Then we’ll use the IDA disassembler to see how main() looks after compilation. IDA uses the same Intel syntax as MSVC.

main            proc near
var_10          = dword ptr -10h

    push    ebp
    mov     ebp, esp
    and     esp, 0FFFFFFF0h
    sub     esp, 10h
    mov     eax, offset aHelloWorld ; "hello, world\n"
    mov     [esp+10h+var_10], eax
    call    _printf
    mov     eax, 0
    leave
    retn
main            endp

The result is almost identical to the code generated by MSVC. The address of the string "hello, world" (stored in the read-only data section, .rodata) is first loaded into the EAX register, then stored on the stack.

Also, at the beginning of the function, there’s this line:

AND ESP, 0FFFFFFF0h

Here, GCC performs something called stack alignment.

That means it rounds ESP down to a multiple of 16: the low four bits are cleared, so in hexadecimal the value ends with 0.

Why? Because the CPU accesses memory more efficiently when values sit at aligned addresses (like 0x1000 instead of 0x1003), and some SSE instructions require 16-byte-aligned operands.

So this line aligns the stack for better performance.
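The effect of the mask is easy to try out in C. This is only a model of the arithmetic, assuming a 32-bit stack pointer value; the function name align_down_16 is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Models AND ESP, 0FFFFFFF0h: clear the low four bits of a 32-bit value,
   which rounds it down to the nearest multiple of 16. */
uint32_t align_down_16(uint32_t esp)
{
    uint32_t aligned = esp & 0xFFFFFFF0u;
    assert(aligned % 16 == 0 && aligned <= esp);
    return aligned;
}
```

For example, align_down_16(0x1003) yields 0x1000, while an already aligned value such as 0x1000 is left unchanged. The stack grows downward, so rounding down never touches memory that is already in use.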

Then we have this line:

SUB ESP, 10h

This allocates 16 bytes on the stack (since 10h = 16). In reality, we only need 4 bytes, but the compiler reserves 16 to maintain proper alignment.
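One way to picture why 4 needed bytes turn into 16 reserved bytes is rounding a size up to the next multiple of 16. The sketch below shows that arithmetic only; it is an assumption about the general idea, not a description of how GCC actually computes its frame size, and round_up_16 is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count up to the next multiple of 16.
   Adding 15 and then clearing the low four bits does the rounding. */
uint32_t round_up_16(uint32_t nbytes)
{
    assert(nbytes <= UINT32_MAX - 15u);   /* avoid wrap-around */
    return (nbytes + 15u) & ~15u;
}
```

With this model, round_up_16(4) is 16, which matches the 10h in the listing.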

After that, the address of the string is stored on the stack directly without using PUSH.

The variable var_10 is a local variable, and it’s also used as the argument to the printf() function.

Then the function printf() is called.

When GCC compiles without optimization, it uses:

MOV EAX, 0

instead of shorter instructions like XOR EAX, EAX.

The last instruction LEAVE is equivalent to:

MOV ESP, EBP
POP EBP

This restores the stack to its original state and recovers the previous EBP value that existed before the function started.

GCC: AT&T syntax

Now let’s see how this code looks when written in AT&T syntax. This style is more common on UNIX systems.

cc -S 1_1.c

This command tells GCC to generate Assembly code instead of an executable file.

Here’s the generated code:

.file "1_1.c"
.section .rodata
.LC0:
    .string "hello, world\n"
.text
.globl main
.type main, @function
main:
.LFB0:
    .cfi_startproc
    pushl %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl %esp, %ebp
    .cfi_def_cfa_register 5
    andl $-16, %esp
    subl $16, %esp
    movl $.LC0, (%esp)
    call printf
    movl $0, %eax
    leave
    ret
    .cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3"
.section .note.GNU-stack,"",@progbits

The code contains many lines starting with a dot (.). Most of these are assembler directives rather than instructions, and we can safely ignore them for now. The exception is .string, which stores the text "hello, world\n" in memory as a C string (terminated by a zero byte).

After removing unnecessary lines, the simplified version looks like this:

.LC0:
    .string "hello, world\n"
main:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-16, %esp
    subl    $16, %esp
    movl    $.LC0, (%esp)
    call    printf
    movl    $0, %eax
    leave
    ret

Differences between Intel and AT&T syntax

1 - The order of source and destination is reversed:

Intel: <instruction> <destination>, <source>
AT&T: <instruction> <source>, <destination>

So, in Intel:

mov eax, ebx

In AT&T, it becomes:

movl %ebx, %eax

To remember: Think of Intel like an “equals sign (=)” and AT&T like an “arrow →”, meaning the value moves from left to right.

2 - In AT&T:

Registers start with % (e.g., %eax).
Constants start with $ (e.g., $16).
Parentheses ( ) are used instead of square brackets [ ].

3 - AT&T also adds a suffix to the instruction mnemonic to indicate the operand size:

q → quad (64-bit)
l → long (32-bit)
w → word (16-bit)
b → byte (8-bit)

Back to our code:

The generated code looks very similar to what IDA produces, but there’s a small difference:

The value 0FFFFFFF0h appears here as $-16. They’re actually the same:

In decimal: -16
In hexadecimal: 0xFFFFFFF0

Both are the same 32-bit pattern: 0xFFFFFFF0 is the two’s complement representation of -16.
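The equivalence can be confirmed with a one-line conversion in C. This is a small sketch with a made-up function name, mask_from_signed; the C standard defines the signed-to-unsigned conversion as reduction modulo 2^32, which is what makes it portable.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the 32-bit pattern for a signed value, i.e. what AT&T's $-16
   and Intel's 0FFFFFFF0h both denote when value is -16. */
uint32_t mask_from_signed(int32_t value)
{
    uint32_t bits = (uint32_t)value;   /* conversion is modulo 2^32 */
    assert((int64_t)bits ==
           (value < 0 ? (int64_t)value + 4294967296LL : (int64_t)value));
    return bits;
}
```

mask_from_signed(-16) returns 0xFFFFFFF0, the same mask IDA shows in the AND instruction.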

Another note: The return value is set using MOV instead of XOR.

So we see:

movl $0, %eax

This copies the value 0 into %eax. The word “move” is a bit misleading — it doesn’t move, it copies. In other architectures, you’ll find similar instructions called “LOAD” or “STORE”.

String patching (Win32)

We can easily locate the string “hello, world” inside the executable file. It looks something like this:

h  e  l  l  o  ,     w  o  r  l  d  \n \0
68 65 6C 6C 6F 2C 20 77 6F 72 6C 64 0A 00

If we wanted to translate it into Spanish, it would look like:

h  o  l  a  \n \0
68 6F 6C 61 0A 00

The original string length is 14 bytes (like in the example above), and the new string is 6 bytes long. You can replace it directly and leave the remaining bytes as-is, or pad them with 00 until the space is filled. Example after modification:

68 6F 6C 61 0A 00 00 00 00 00 00 00 00 00

If we wanted to insert a longer message, there might be some null bytes (00) after the original English text. It’s not always safe to overwrite them — they might be used by CRT code. So only do it if you know what you’re doing.
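The replacement procedure described above can be sketched in C. This is a minimal illustration that works on a buffer already loaded into memory; the function patch_bytes and its parameters are hypothetical, and a real patcher would also have to read and write the executable file. It refuses replacements longer than the original, for the reason given above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the first occurrence of old_bytes in buf and
   overwrite it with new_bytes, padding the leftover space with 00.
   Returns 1 on success, 0 if the pattern is missing or the replacement
   is longer than the original (which could clobber neighbouring data). */
int patch_bytes(unsigned char *buf, size_t buf_len,
                const unsigned char *old_bytes, size_t old_len,
                const unsigned char *new_bytes, size_t new_len)
{
    assert(buf != NULL && old_bytes != NULL && new_bytes != NULL);
    if (old_len == 0 || new_len > old_len || old_len > buf_len)
        return 0;

    for (size_t i = 0; i + old_len <= buf_len; i++) {
        if (memcmp(buf + i, old_bytes, old_len) == 0) {
            memcpy(buf + i, new_bytes, new_len);                 /* new text  */
            memset(buf + i + new_len, 0x00, old_len - new_len);  /* 00 padding */
            return 1;
        }
    }
    return 0;
}
```

Passing the 14 bytes of "hello, world\n" (including the terminating zero) as old_bytes and the 6 bytes of "hola\n" as new_bytes produces the padded byte sequence shown above.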

The author shared a real story about software cracking:

There was an image-processing program that, when unregistered, would add watermarks like: “This image was processed by the evaluation version of [Program Name]”.

By coincidence, they found this string inside the executable, and replaced it with spaces — the watermark disappeared! Technically, the program was still adding the watermark, but the text became invisible.

Software localization in the MS-DOS era

This method was common for translating MS-DOS programs into Russian during the 1980s and 1990s. It was suitable even for people unfamiliar with machine code or executable file formats.

The new text couldn’t be longer than the original, because adding bytes might overwrite nearby code or data. Russian words were often longer, so translated versions had many abbreviations to make text fit.

The same might have happened for other languages too. With Delphi strings, which store their length alongside the text, the length field also had to be corrected when the text length changed.

1.6 x86-64

Now let’s compile the same example with 64-bit MSVC:

$SG2989 DB  'hello, world', 0AH, 00H

main PROC
    sub     rsp, 40
    lea     rcx, OFFSET FLAT:$SG2989
    call    printf
    xor     eax, eax
    add     rsp, 40
    ret     0
main ENDP

In x86-64 all general-purpose registers were extended to 64 bits, and their names now start with R.

To reduce stack usage (i.e., avoid repeated memory/cache access) there is a common convention to pass function arguments in registers rather than on the stack — commonly called fastcall.

That means some arguments are passed in registers, and the rest (if any) go on the stack.

On Win64, the first four integer or pointer arguments of a function are passed in these registers:

RCX
RDX
R8
R9

That’s what we see here: the pointer to the string passed to printf() is now passed in RCX instead of being pushed on the stack.

Also, pointers are now 64-bit, so they are passed in the 64-bit registers (the ones starting with R-).

For backward compatibility you can still access the lower 32-bit part via the E- prefix.

Example: the RAX / EAX / AX / AL hierarchy (each smaller register is the low part of the larger one):

Byte number:
┌────┬────┬────┬────┬────┬────┬────┬────┐
│7th │6th │5th │4th │3rd │2nd │1st │0th │
├────┴────┴────┴────┴────┴────┴────┴────┤
│              RAX (64-bit)             │
├───────────────────┬───────────────────┤
│                   │   EAX (32-bit)    │
├───────────────────┴─────────┬─────────┤
│                             │AX 16-bit│
├─────────────────────────────┼────┬────┤
│                             │ AH │ AL │
└─────────────────────────────┴────┴────┘
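The same layout can be expressed with shifts and masks in C. The helpers below are a sketch that models the sub-registers as views of a 64-bit value; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers that model how the sub-registers overlay RAX. */
uint32_t eax_of(uint64_t rax) { return (uint32_t)(rax & 0xFFFFFFFFu); } /* bytes 0..3 */
uint16_t ax_of(uint64_t rax)  { return (uint16_t)(rax & 0xFFFFu); }     /* bytes 0..1 */
uint8_t  ah_of(uint64_t rax)  { return (uint8_t)((rax >> 8) & 0xFFu); } /* byte 1     */
uint8_t  al_of(uint64_t rax)  { return (uint8_t)(rax & 0xFFu); }        /* byte 0     */

/* Sanity check: AX is AH:AL, and AX is the low half of EAX. */
void check_overlay(uint64_t rax)
{
    assert(ax_of(rax) == (uint16_t)((ah_of(rax) << 8) | al_of(rax)));
    assert((eax_of(rax) & 0xFFFFu) == ax_of(rax));
}
```

For RAX = 0x1122334455667788 this gives EAX = 0x55667788, AX = 0x7788, AH = 0x77 and AL = 0x88.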

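The main() function returns an int, and in C/C++ int is still 32-bit. That is why the compiler clears EAX (the 32-bit part) at the end of the function rather than RAX. Nothing is lost: on x86-64 a write to EAX also clears the upper 32 bits of RAX.

Also, the function allocates 40 bytes on the stack. 32 of them are the shadow space that Win64 callers reserve for the four register arguments (explained later); the remaining 8 keep the stack 16-byte aligned.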

GCC: x86-64

Now let’s try GCC on a 64-bit Linux system:

Listing 1.23: GCC 4.4.6 x64

.string "hello, world\n"

main:
    sub     rsp, 8
    mov     edi, OFFSET FLAT:.LC0 ; "hello, world\n"
    xor     eax, eax ; number of vector registers passed
    call    printf
    xor     eax, eax
    add     rsp, 8
    ret

On Linux/BSD/macOS the calling convention also passes arguments in registers. According to the System V AMD64 ABI (used by Unix-like systems), the first six integer or pointer arguments are passed in registers:

Argument #	Register
1	RDI
2	RSI
3	RDX
4	RCX
5	R8
6	R9

If there are more than six arguments the rest go on the stack as usual.
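To compare the two conventions side by side, the register lists from the text can be put into small lookup tables. This is a sketch covering integer and pointer arguments only; the function names are hypothetical, and "stack" simply stands for any argument that no longer fits in a register.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Register names taken from the two lists in the text. */
static const char *const WIN64_ARG_REGS[] = { "RCX", "RDX", "R8", "R9" };
static const char *const SYSV_ARG_REGS[]  = { "RDI", "RSI", "RDX", "RCX", "R8", "R9" };

/* Where does the n-th (1-based) integer or pointer argument go on Win64? */
const char *win64_arg_location(size_t n)
{
    const size_t count = sizeof WIN64_ARG_REGS / sizeof WIN64_ARG_REGS[0];
    assert(n >= 1);
    return n <= count ? WIN64_ARG_REGS[n - 1] : "stack";
}

/* The same question for the System V convention. */
const char *sysv_arg_location(size_t n)
{
    const size_t count = sizeof SYSV_ARG_REGS / sizeof SYSV_ARG_REGS[0];
    assert(n >= 1);
    return n <= count ? SYSV_ARG_REGS[n - 1] : "stack";
}

/* True if the location names a register rather than the stack. */
int passed_in_register(const char *location)
{
    return strcmp(location, "stack") != 0;
}
```

For the single printf() argument in this chapter, win64_arg_location(1) is "RCX" and sysv_arg_location(1) is "RDI", matching the two listings.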

In the example above the pointer to the string is passed in EDI (the lower 32-bit part of RDI).

Why EDI and not RDI? This is an optimization by the compiler:

Writing to a 32-bit subregister (e.g., EDI) automatically clears the upper 32 bits of the full 64-bit register (RDI).
This means mov edi, imm32 (5 bytes) can replace the longer RDI forms: 7 bytes with a sign-extended 32-bit immediate, or 10 bytes with a full 64-bit immediate.
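The zero-extension rule can be modelled in C. The sketch below contrasts a 32-bit write with a 16-bit write; the 16-bit case is added here for contrast and is not discussed in the text (on x86-64, 8-bit and 16-bit writes leave the rest of the register untouched). Both function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Models a 32-bit write such as mov edi, imm32 on x86-64:
   the upper 32 bits of the full register are cleared. */
uint64_t write_32bit(uint64_t old_reg, uint32_t value)
{
    uint64_t result = (uint64_t)value;
    (void)old_reg;                      /* previous contents do not survive */
    assert((result >> 32) == 0);
    return result;
}

/* Models a 16-bit write such as mov di, imm16:
   only the low 16 bits change, the rest is preserved. */
uint64_t write_16bit(uint64_t old_reg, uint16_t value)
{
    return (old_reg & ~(uint64_t)0xFFFFu) | value;
}
```

Starting from a register full of one bits, write_32bit(..., 0x4005E8) leaves exactly 0x00000000004005E8, which is why the 5-byte instruction is enough to load a full 64-bit pointer whose value fits in 32 bits.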

Example machine code bytes (from the object file) show this size saving:

Opcodes


.text:00000000004004D0  48 83 EC 08      sub  rsp, 8
.text:00000000004004D4  BF E8 05 40 00   mov  edi, offset format ; "hello, world"
.text:00000000004004D9  31 C0            xor  eax, eax
.text:00000000004004DB  E8 D8 FE FF FF   call _printf
.text:00000000004004E0  31 C0            xor  eax, eax
.text:00000000004004E2  48 83 C4 08      add  rsp, 8
.text:00000000004004E6  C3               retn

As you can see, the instruction writing into EDI at 0x4004D4 is 5 bytes long; the same instruction writing into RDI would take 7 bytes (10 with a full 64-bit immediate). GCC chooses the shorter encoding because it is safe here: the string’s address fits in 32 bits, and the write to EDI clears the upper half of RDI.

Also note that EAX is zeroed before the call to printf(). printf() is a variadic function, and the System V x86-64 calling convention requires AL to hold the number of vector registers used to pass arguments to such a function (an upper bound is enough); here there are none.

Address patching (Win64)

If we compile this example with MSVC 2013 and the /MD option (linking to external MSVCR*.DLL), the main() function is typically easy to find in the binary. The pointer load might look like:

Assembly


lea rcx, [0000000000002400]

As an experiment, if we increment that address by 1:

Assembly


lea rcx, [0000000000002401]

The program will read from the second byte of the string, and the output becomes ello, world instead of hello, world. Running the patched executable indeed prints that altered string.
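The same effect can be reproduced at the source level with pointer arithmetic. This is only an analogy to the binary patch, written as a small sketch; patched_pointer is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer delta bytes into the string, mimicking a patched
   address operand. delta must not run past the terminating zero. */
const char *patched_pointer(const char *original, size_t delta)
{
    assert(original != NULL);
    assert(delta <= strlen(original));
    return original + delta;
}
```

Passing patched_pointer("hello, world\n", 1) to fputs() or printf("%s", ...) prints the string starting from its second byte, just like the patched executable.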


CH1.5 - Hello, World! (Part 1)

https://v3nn00m.github.io/posts/re4b/chapter1_5_1_6/

Author

0xV3n0m

Published at

2025-10-23

License

0xV3n0m's Personal Blog License

Some information may be outdated
