1.5 Hello, world!
The author starts with the famous example that prints "hello, world":
#include <stdio.h>
int main()
{
    printf("hello, world\n");
    return 0;
}
1.5.1 x86: MSVC
Let’s compile this code using MSVC 2010:
(The option /Fa makes the compiler generate an Assembly listing file.)
Here’s the generated code:
Note that MSVC generates assembly listings in Intel syntax; the differences between Intel and AT&T syntax are explained later.
The compiler produces a file called 1.obj, which is later linked to create 1.exe.
That file contains several sections:
- CONST → for constant data (like strings).
- _TEXT → for the code itself.
The string "hello, world" in C/C++ is of type const char[],
but since it doesn’t have an explicit name, the compiler gives it an internal name like $SG3830.
So we can write the code like this:
#include <stdio.h>
const char $SG3830[] = "hello, world\n";
int main()
{
    printf($SG3830);
    return 0;
}
If we look at the assembly listing again, we’ll notice the string is terminated by a zero byte, which is standard for C/C++ strings.
Analyzing the Assembly code
1. CONST SEGMENT
This part contains constant data (like the texts inside the program).
The compiler stores the string "hello, world\n" here
and gives it an internal name so the code can refer to it later.
$SG3830 DB 'hello, world', 0AH, 00H
- $SG3830 → the name chosen by the compiler.
- DB → “Define Byte”, i.e., store the bytes that follow.
- 'hello, world' → the actual text.
- 0AH → the newline character \n.
- 00H → the zero byte marking the end of the string.
2. PUBLIC _main
This declares the symbol _main (our main() function) as public, so the linker and other modules can see it.
3. EXTRN _printf:PROC
This means there’s an external function called printf
that’s not written here but will come from another library (the C standard library).
After that comes the _TEXT SEGMENT part —
this is where the actual executable code resides.
_main PROC
This marks the beginning of the main() function.
push ebp
This is the first line in almost any function. The computer saves the old value of ebp (the base pointer) to return to it later.
mov ebp, esp
Here we say: “Make the base pointer (ebp) point to the same place as the stack pointer (esp).”
That means we’ve started a “new frame” on the stack for this function’s work.
push OFFSET $SG3830
Here we push the address of the string "hello, world" onto the stack. It is the only argument passed to printf().
call _printf
This transfers control to printf(), which reads its argument from the stack.
After printf() finishes and returns,
the address we pushed is still on the stack, but we no longer need it,
so we fix the stack pointer with:
ADD ESP, 4
Why 4? Because the program is 32-bit, and an address takes exactly 4 bytes. If it were 64-bit, we’d need 8 bytes.
The instruction:
ADD ESP, 4
is almost the same as:
POP register
but without actually using a register.
Some compilers (like Intel C++ Compiler) prefer:
POP ECX
to make the code smaller (1 byte instead of 3).
Example from Oracle RDBMS code:
.text:0800029A push ebx
.text:0800029B call qksfroChild
.text:080002A0 pop ecx
Even MSVC can do that sometimes:
.text:0102106F push 0
.text:01021071 call ds:time
.text:01021077 pop ecx
After calling printf(), the original C/C++ code has return 0;
In Assembly, that turns into:
XOR EAX, EAX
The word XOR means “Exclusive OR”,
but the compiler uses it instead of MOV EAX, 0 because the code becomes shorter (2 bytes instead of 5).
Out of curiosity, I wanted to know why XOR EAX, EAX is shorter than MOV EAX, 0.
Turns out the reason is simple. When encoded in x86 machine code, XOR EAX, EAX becomes:
31 C0
That’s only 2 bytes.
While MOV EAX, 0 becomes:
B8 00 00 00 00
That’s 5 bytes in total (1 opcode byte + 4 bytes for the immediate value).
This was just something extra I wanted to understand better, so I decided to write it down as well.
Some other compilers use:
SUB EAX, EAX
which means “subtract EAX from itself” → result is also zero.
Finally:
RET
This returns control to the program that called main()
(usually the C runtime code), which then returns back to the operating system.
GCC
Now let’s try compiling the same C/C++ “Hello, world” code, but this time using GCC on a Linux system, with this command:
gcc 1.c -o 1
Then we’ll use a program called IDA Disassembler to see how the function main()
was built after compilation.
IDA uses the same Intel-syntax style as MSVC.
main proc near
var_10 = dword ptr -10h
    push ebp
    mov ebp, esp
    and esp, 0FFFFFFF0h
    sub esp, 10h
    mov eax, offset aHelloWorld ; "hello, world\n"
    mov [esp+10h+var_10], eax
    call _printf
    mov eax, 0
    leave
    retn
main endp
The result is almost identical to the code generated by MSVC.
The address of the string "hello, world" (stored in the read-only data section, .rodata)
is first loaded into the EAX register, then stored on the stack.
Also, at the beginning of the function, there’s this line:
AND ESP, 0FFFFFFF0h
Here, GCC performs something called stack alignment.
The instruction clears the lowest four bits of ESP, so its value becomes a multiple of 16 (its last hexadecimal digit is 0).
Why? Because the CPU works more efficiently with data located at aligned addresses (like 0x1000 instead of 0x1003).
So this line aligns the stack for better performance.
Then we have this line:
SUB ESP, 10h
This allocates 16 bytes on the stack (since 10h = 16). In reality, we only need 4 bytes, but the compiler reserves 16 to maintain proper alignment.
After that, the address of the string is stored on the stack directly
without using PUSH.
The variable var_10 is a local variable,
and it’s also used as the argument to the printf() function.
Then the function printf() is called.
When GCC is running without optimization, it uses:
MOV EAX, 0
instead of shorter instructions like XOR EAX, EAX.
The last instruction LEAVE is equivalent to:
MOV ESP, EBP
POP EBP
This restores the stack to its original state and recovers the previous EBP value that existed before the function started.
GCC: AT&T syntax
Now let’s see how this code looks when written in AT&T syntax. This style is more common on UNIX systems.
cc -S 1_1.c
This command tells GCC to generate an assembly file instead of an executable.
Here’s the generated code:
    .file "1_1.c"
    .section .rodata
.LC0:
    .string "hello, world\n"
    .text
    .globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    pushl %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl %esp, %ebp
    .cfi_def_cfa_register 5
    andl $-16, %esp
    subl $16, %esp
    movl $.LC0, (%esp)
    call printf
    movl $0, %eax
    leave
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3"
    .section .note.GNU-stack,"",@progbits
The code contains many directives starting with a dot (.).
These are assembler directives rather than CPU instructions, and we don’t need to worry about them now.
We can safely ignore them, except for .string,
because it’s what stores the text "hello, world\n" in memory as a C string (terminated by a zero byte).
After removing unnecessary lines, the simplified version looks like this:
.LC0:
    .string "hello, world\n"
main:
    pushl %ebp
    movl %esp, %ebp
    andl $-16, %esp
    subl $16, %esp
    movl $.LC0, (%esp)
    call printf
    movl $0, %eax
    leave
    ret
Differences between Intel and AT&T syntax
1- The order of source and destination is reversed:
- Intel: <instruction> <destination>, <source>
- AT&T: <instruction> <source>, <destination>
So, in Intel:
mov eax, ebx
In AT&T, it becomes:
movl %ebx, %eax
To remember: Think of Intel like an “equals sign (=)” and AT&T like an “arrow →”, meaning the value moves from left to right.
2 - In AT&T:
- Registers start with % (e.g., %eax).
- Constants start with $ (e.g., $16).
- Parentheses ( ) are used instead of square brackets [ ].
3 - AT&T also adds a suffix letter to the instruction name to indicate the operand size:
- q → quad (64-bit)
- l → long (32-bit)
- w → word (16-bit)
- b → byte (8-bit)
Back to our code:
The generated code looks very similar to what IDA produces, but there’s a small difference:
The value 0FFFFFFF0h appears here as $-16.
They’re actually the same:
- In decimal: -16
- In hexadecimal: 0xFFFFFFF0
Both represent the same number in 32-bit systems.
Another note:
The return value is set using MOV instead of XOR.
So we see:
movl $0, %eax
This copies the value 0 into %eax.
The word “move” is a bit misleading — it doesn’t move, it copies.
In other architectures, you’ll find similar instructions called “LOAD” or “STORE”.
String patching (Win32)
We can easily locate the string “hello, world” inside the executable file. It looks something like this:
h  e  l  l  o  ,     w  o  r  l  d  \n \0
68 65 6C 6C 6F 2C 20 77 6F 72 6C 64 0A 00
If we wanted to translate it into Spanish, it would look like:
h  o  l  a  \n \0
68 6F 6C 61 0A 00
The original string length is 14 bytes (like in the example above),
and the new string is 6 bytes long.
You can replace it directly and leave the remaining bytes as-is,
or pad them with 00 until the space is filled. Example after modification:
68 6F 6C 61 0A 00 00 00 00 00 00 00 00 00
If we wanted to insert a longer message,
there might be some null bytes (00) after the original English text.
It’s not always safe to overwrite them — they might be used by CRT code.
So only do it if you know what you’re doing.
The author shared a real story about software cracking:
There was an image-processing program that, when unregistered, would add watermarks like: “This image was processed by the evaluation version of [Program Name]”.
By coincidence, they found this string inside the executable, and replaced it with spaces — the watermark disappeared! Technically, the program was still adding the watermark, but the text became invisible.
Software localization in the MS-DOS era
This method was common for translating MS-DOS programs into Russian during the 1980s and 1990s. It was suitable even for people unfamiliar with machine code or executable file formats.
The new text couldn’t be longer than the original, because adding bytes might overwrite nearby code or data. Russian words were often longer, so translated versions had many abbreviations to make text fit.
The same might have happened for other languages too. And with Delphi strings, the string length field also had to be updated if needed.
1.6 x86-64
Now let’s compile the same example with 64-bit MSVC:
$SG2989 DB 'hello, world', 0AH, 00H
main PROC
    sub rsp, 40
    lea rcx, OFFSET FLAT:$SG2989
    call printf
    xor eax, eax
    add rsp, 40
    ret 0
main ENDP
In x86-64, the general-purpose registers were extended to 64 bits, and their names now start with R.
To reduce stack usage (i.e., avoid repeated memory/cache access) there is a common convention to pass function arguments in registers rather than on the stack — commonly called fastcall.
That means some arguments are passed in registers, and the rest (if any) go on the stack.
On Win64, the first four arguments of any function are passed in these registers:
- RCX
- RDX
- R8
- R9
That’s what we see here: the pointer to the string passed to printf() is now passed in RCX instead of being pushed on the stack.
Also, pointers are now 64-bit, so they are passed in the 64-bit registers (the ones starting with R-).
For backward compatibility you can still access the lower 32-bit part via the E- prefix.
Example: the RAX / EAX / AX / AL hierarchy:
Byte number:
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│ 7th │ 6th │ 5th │ 4th │ 3rd │ 2nd │ 1st │ 0th │
├─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┤
│                 RAX (64-bit)                  │
├───────────────────────┬───────────────────────┤
│                       │     EAX (32-bit)      │
├───────────────────────┴───────────┬───────────┤
│                                   │AX (16-bit)│
├───────────────────────────────────┼─────┬─────┤
│                                   │ AH  │ AL  │
└───────────────────────────────────┴─────┴─────┘
The main() function returns an int, and in C/C++ int is still 32-bit.
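Therefore the compiler zeroes EAX (the 32-bit subregister) rather than the whole RAX. On x86-64, writing to EAX also clears the upper 32 bits of RAX, so nothing is lost.
Also, the function allocates 40 bytes on the stack. They include the 32-byte shadow space that the Win64 calling convention reserves for the called function, plus 8 bytes that keep the stack aligned to 16 bytes (explained later).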
GCC: x86-64
Now let’s try GCC on a 64-bit Linux system:
Listing 1.23: GCC 4.4.6 x64
.string "hello, world\n"
main:
    sub rsp, 8
    mov edi, OFFSET FLAT:.LC0 ; "hello, world\n"
    xor eax, eax ; number of vector registers passed
    call printf
    xor eax, eax
    add rsp, 8
    ret
On Linux/BSD/macOS the calling convention also passes arguments in registers. According to the System V ABI (used by Unix-like systems), the first six arguments are passed in registers: RDI, RSI, RDX, RCX, R8, R9.
If there are more than six arguments the rest go on the stack as usual.
In the example above the pointer to the string is passed in EDI (the lower 32-bit part of RDI).
Why EDI and not RDI? — this is an optimization trick by the compiler:
- Writing to the 32-bit subregister (e.g., EDI) automatically clears the upper 32 bits of the full 64-bit register (RDI).
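- This means a mov edi, imm32 instruction is shorter (5 bytes) than the equivalent 64-bit form that writes a sign-extended 32-bit immediate into RDI (7 bytes), saving space in the binary. Loading a full 64-bit immediate into RDI would take 10 bytes.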
Example machine code bytes (from the object file) show this size saving:
As you see, the instruction writing into EDI at 0x4004D4 is 5 bytes long; the same instruction writing into RDI would be 7 bytes. GCC chooses the shorter encoding because it’s safe (string addresses are typically below 4GB in these examples) and saves space.
Also note that EAX is zeroed before the call to printf(). According to the System V calling convention, when a variadic function such as printf() is called, the number of vector registers used for arguments must be placed in AL (the lowest byte of EAX).
Address patching (Win64)
If we compile this example with MSVC 2013 and the /MD option (linking to external MSVCR*.DLL), the main() function is typically easy to find in the binary. The pointer load might look like:
As an experiment, if we increment that address by 1:
The program will read from the second byte of the string, and the output becomes ello, world instead of hello, world. Running the patched executable indeed prints that altered string.