0xV3n0m

Announcement

Welcome To My Personal Blog

1.26 Arrays

Let's first define what an Array is: it is a group of variables stored next to each other in memory, all of the same type.

Here is a simple example:


#include <stdio.h> // include standard I/O header
int main() {
int a[20]; // declare an array of 20 integers (20 × 4 = 80 bytes on stack)
int i;     // loop counter
1
for (i = 0; i &lt; 20; i++)   // iterate from 0 to 19
2
    a[i] = i * 2;           // fill each element with index × 2
3

4
for (i = 0; i &lt; 20; i++)   // iterate again to print
5
    printf(&quot;a[%d]=%d\n&quot;, i, a[i]); // print index and value
6

7
return 0; // return 0 to indicate successful termination

}

x86

MSVC

Let's compile it and it will look like this:


_TEXT   SEGMENT
_i$     = -84       ; variable i (4 bytes) offset on stack
_a$     = -80       ; array a (80 bytes) offset on stack
_main   PROC
push    ebp                                     ; save base pointer
mov     ebp, esp                                ; set up stack frame
sub     esp, 84                                 ; allocate 84 bytes: 80 for array + 4 for i
1
; === First loop: fill the array ===
2
mov     DWORD PTR _i$[ebp], 0                   ; i = 0
3
jmp     SHORT $LN6@main                         ; jump to loop condition check

 $LN5@main: ; loop increment label mov eax, DWORD PTR _i$ [ebp]                 ; load i into EAX
add     eax, 1                                  ; EAX = i + 1
mov     DWORD PTR _i$[ebp], eax                 ; i = i + 1
 $LN6@main: ; loop condition label cmp DWORD PTR _i$ [ebp], 20                  ; compare i with 20
jge     SHORT $LN4@main                         ; if i >= 20, exit loop
1
mov     ecx, DWORD PTR _i$[ebp]                 ; load i into ECX
2
shl     ecx, 1                                  ; ECX = i * 2 (left shift by 1 bit = multiply by 2)
3
mov     edx, DWORD PTR _i$[ebp]                 ; load i into EDX (used as index)
4
mov     DWORD PTR _a$[ebp + edx*4], ecx         ; a[i] = i*2  (address = base + i*4)
5

6
jmp     SHORT $LN5@main                         ; jump back to increment

$LN4@main:                                          ; after first loop
1
; === Second loop: print the array ===
2
mov     DWORD PTR _i$[ebp], 0                   ; i = 0
3
jmp     SHORT $LN3@main                         ; jump to condition check

 $LN2@main: ; loop increment label mov eax, DWORD PTR _i$ [ebp]
add     eax, 1
mov     DWORD PTR _i$[ebp], eax                 ; i = i + 1
 $LN3@main: ; loop condition label cmp DWORD PTR _i$ [ebp], 20                  ; compare i with 20
jge     SHORT $LN1@main                         ; if i >= 20, exit loop
1
mov     ecx, DWORD PTR _i$[ebp]                 ; load i as index
2
mov     edx, DWORD PTR _a$[ebp + ecx*4]         ; load a[i] into EDX
3
push    edx                                     ; push a[i] as third argument to printf
4
mov     eax, DWORD PTR _i$[ebp]                 ; load i
5
push    eax                                     ; push i as second argument to printf
6
push    OFFSET $SG2463                          ; push format string &quot;a[%d]=%d\n&quot;
7
call    _printf                                 ; call printf
8
add     esp, 12                                 ; clean up stack (3 arguments × 4 bytes)
9

10
jmp     SHORT $LN2@main                         ; jump back to increment

$LN1@main:                                          ; after second loop
xor     eax, eax                                ; return value = 0
mov     esp, ebp                                ; restore stack pointer
pop     ebp                                     ; restore base pointer
ret     0                                       ; return
_main   ENDP

Nothing very strange, just two loops: the first loop fills the array (multiplying each element by 2), and the second loop prints the elements. But let's analyze the Assembly together a bit more for better understanding.

Imagine you have 20 drawers in a cabinet, all next to each other, and each drawer holds one number of type int. Therefore each drawer takes 4 bytes (because we said before that int in 32-bit takes 4 bytes), meaning the entire array takes 20 × 4 = 80 contiguous bytes.

Let's start with the instruction sub esp, 84 which allocates 84 bytes on the stack. Why 84 and not 80? In short, there are 4 extra bytes for the variable i.

Now let's go through the first loop — filling the array:

* mov _i$, 0 — as we know, start with i = 0

* cmp _i$, 20 — if i is greater than or equal to 20, exit the loop

* shl ecx, 1 — the fastest way to multiply i × 2 (instead of a regular multiply, a left shift by 1 bit = ×2)

* calculating the address of a[i]: mov DWORD PTR _a$[ebp + edx*4], ecx

An important question — why did the compiler use shl instead of imul? Because it is safer on the stack (no overflow if the array is small).

Let's try this example in x32dbg. We can see how the array is being filled: each element is a 32-bit int word and its value is the index multiplied by 2.

Since this array lives on the stack, we can see all 20 of its elements there.

GCC

This is what GCC produced:


main proc near
; DATA XREF: _start+17
var_70  = dword ptr -70h  ; stack slot used for printf arg (value)
var_6C  = dword ptr -6Ch  ; stack slot used for printf arg (index)
var_68  = dword ptr -68h  ; stack slot used for printf arg (format string pointer)
i_2     = dword ptr -54h  ; array a starts here (20 ints = 80 bytes)
i       = dword ptr -4    ; loop counter i
1
push    ebp
2
mov     ebp, esp
3
and     esp, 0FFFFFFF0h         ; align stack to 16 bytes
4
sub     esp, 70h                ; allocate 112 bytes on stack
5

6
mov     [esp+70h+i], 0          ; i = 0
7
jmp     short loc_804840A       ; jump to loop condition

loc_80483F7:                        ; loop body
mov     eax, [esp+70h+i]        ; load i into EAX (used as array index)
mov     edx, [esp+70h+i]        ; load i into EDX
add     edx, edx                ; EDX = i * 2 (add to itself = multiply by 2)
mov     [esp+eax4+70h+i_2], edx ; a[i] = i2  (address = array_base + i*4)
add     [esp+70h+i], 1          ; i++
loc_804840A:                        ; loop condition
cmp     [esp+70h+i], 13h        ; compare i with 19 (0x13)
jle     short loc_80483F7       ; if i <= 19, continue loop
1
mov     [esp+70h+i], 0          ; i = 0 (reset for second loop)
2
jmp     short loc_8048441       ; jump to second loop condition

loc_804841B:                        ; second loop body
mov     eax, [esp+70h+i]        ; load i as index
mov     edx, [esp+eax*4+70h+i_2]; load a[i] into EDX
mov     eax, offset aADD        ; load address of format string “a[%d]=%d\n”
mov     [esp+70h+var_68], edx   ; push a[i] as third arg on stack
mov     edx, [esp+70h+i]        ; load i
mov     [esp+70h+var_6C], edx   ; push i as second arg on stack
mov     [esp+70h+var_70], eax   ; push format string as first arg on stack
call    _printf                 ; call printf
add     [esp+70h+i], 1          ; i++
loc_8048441:                        ; second loop condition
cmp     [esp+70h+i], 13h        ; compare i with 19
jle     short loc_804841B       ; if i <= 19, continue loop
1
mov     eax, 0                  ; return value = 0
2
leave                           ; restore stack frame
3
retn                            ; return

main endp

By the way, the variable a is of type int (a pointer to int). You can pass a pointer to an array to another function, but it is more accurate to say "a pointer to the first element of the array is being passed" (and the rest of the addresses are calculated in a straightforward way). If you do indexing on this pointer like a[idx], idx is just added to the pointer and the element at that address is returned.

An interesting example: a string like "string" is an array of characters of type const char[]. You can do indexing on this pointer too. That is why it is possible to write "string"[i] and that is a completely valid expression in C/C++!

ARM

Non-optimizing Keil 6/2013 (ARM mode)


EXPORT _main
_main
    STMFD   SP!, {R4,LR}            ; save R4 and link register on stack
    SUB     SP, SP, #0x50           ; allocate 80 bytes (0x50) for 20 int array
1
; First loop: fill the array
2
MOV     R4, #0                  ; i = 0 (R4 is the loop counter)
3
B       loc_4A0                 ; jump to loop condition

loc_494:                            ; loop body
MOV     R0, R4,LSL#1            ; R0 = R4 * 2 (left shift by 1 = multiply by 2)
STR     R0, [SP,R4,LSL#2]       ; store R0 at address SP + R44  (a[i] = i2)
ADD     R4, R4, #1              ; i++
loc_4A0:                            ; loop condition
CMP     R4, #20                 ; compare i with 20
BLT     loc_494                 ; if i < 20, continue loop
1
; Second loop: print the array
2
MOV     R4, #0                  ; i = 0 (reset counter)
3
B       loc_4C4                 ; jump to loop condition

loc_4B0:                            ; loop body
LDR     R2, [SP,R4,LSL#2]       ; R2 = a[i]  (load from SP + i*4)
MOV     R1, R4                  ; R1 = i (second argument to printf)
ADR     R0, aADD                ; R0 = address of format string “a[%d]=%d\n”
BL      __2printf               ; call printf(format, i, a[i])
ADD     R4, R4, #1              ; i++
loc_4C4:                            ; loop condition
CMP     R4, #20                 ; compare i with 20
BLT     loc_4B0                 ; if i < 20, continue loop
1
MOV     R0, #0                  ; return value = 0
2
ADD     SP, SP, #0x50           ; deallocate array space
3
LDMFD   SP!, {R4,PC}            ; restore R4 and return

The int type takes 32 bits (4 bytes), so 20 elements need 80 bytes (0x50). The SUB SP, SP, #0x50 instruction at the beginning of the function allocates exactly that space.

In both loops, the counter i lives in R4. The number written into the array = i*2, and this is translated into MOV R0, R4,LSL#1 (left shift by 1 bit = multiply by 2). STR R0, [SP,R4,LSL#2] writes to the array. The element address = SP (start of array) + R4*4 (because each element is 4 bytes).

Optimizing Keil 6/2013 (Thumb mode)


_main
    PUSH    {R4,R5,LR}              ; save R4, R5 and link register
    SUB     SP, SP, #0x54           ; allocate 84 bytes: 80 for array + 4 extra (alignment)
1
MOVS    R0, #0                  ; i = 0
2
MOV     R5, SP                  ; R5 = base address of the array

loc_1CE:                            ; first loop body
LSLS    R1, R0, #1              ; R1 = i * 2 (left shift = multiply by 2)
LSLS    R2, R0, #2              ; R2 = i * 4 (byte offset into array)
ADDS    R0, R0, #1              ; i++
CMP     R0, #20                 ; compare i with 20
STR     R1, [R5,R2]             ; store i2 at address (R5 + i4)  → a[i] = i*2
BLT     loc_1CE                 ; if i < 20, continue loop
1
MOVS    R4, #0                  ; i = 0 (reset for second loop)

loc_1DC:                            ; second loop body
LSLS    R0, R4, #2              ; R0 = i * 4 (byte offset)
LDR     R2, [R5,R0]             ; R2 = a[i]  (load from base + i*4)
MOVS    R1, R4                  ; R1 = i (second argument to printf)
ADR     R0, aADD                ; R0 = address of format string “a[%d]=%d\n”
BL      __2printf               ; call printf(format, i, a[i])
ADDS    R4, R4, #1              ; i++
CMP     R4, #20                 ; compare i with 20
BLT     loc_1DC                 ; if i < 20, continue loop
1
MOVS    R0, #0                  ; return value = 0
2
ADD     SP, SP, #0x54           ; deallocate stack space
3
POP     {R4,R5,PC}              ; restore and return

The Thumb code is similar to the previous one but with dedicated shift instructions (LSLS). The compiler allocated 84 bytes instead of 80, the last 4 bytes are unused.

Non-optimizing GCC 4.9.1 (ARM64)

Listing 1.230


.LC0:
    .string "a[%d]=%d\n"             ; format string for printf
main:
stp     x29, x30, [sp, -112]!   ; save frame pointer and link register, allocate 112 bytes
add     x29, sp, 0              ; set frame pointer
1
str     wzr, [x29,108]          ; i = 0  (wzr is always zero)
2
b       .L2                     ; jump to loop condition

.L3:                                ; first loop body
ldr     w0, [x29,108]           ; load i into W0
lsl     w2, w0, 1               ; W2 = i * 2 (left shift by 1)
add     x0, x29, 24             ; X0 = base address of array a (x29 + 24)
ldrsw   x1, [x29,108]           ; load i as 64-bit signed value into X1
str     w2, [x0,x1,lsl 2]       ; a[i] = i2  (store at base + i4)
ldr     w0, [x29,108]           ; load i
add     w0, w0, 1               ; W0 = i + 1
str     w0, [x29,108]           ; i++
.L2:                                ; first loop condition
ldr     w0, [x29,108]           ; load i
cmp     w0, 19                  ; compare i with 19
ble     .L3                     ; if i <= 19, continue loop
1
str     wzr, [x29,108]          ; i = 0 (reset for second loop)
2
b       .L4                     ; jump to second loop condition

.L5:                                ; second loop body
add     x0, x29, 24             ; X0 = base address of array
ldrsw   x1, [x29,108]           ; load i as 64-bit index
ldr     w2, [x0,x1,lsl 2]       ; W2 = a[i]  (load from base + i*4)
adrp    x0, .LC0                ; load page address of format string
add     x0, x0, :lo12:.LC0      ; add page offset to get full address
ldr     w1, [x29,108]           ; W1 = i (second argument to printf)
bl      printf                  ; call printf(format, i, a[i])
ldr     w0, [x29,108]           ; load i
add     w0, w0, 1               ; W0 = i + 1
str     w0, [x29,108]           ; i++
.L4:                                ; second loop condition
ldr     w0, [x29,108]           ; load i
cmp     w0, 19                  ; compare i with 19
ble     .L5                     ; if i <= 19, continue loop
1
mov     w0, 0                   ; return value = 0
2
ldp     x29, x30, [sp], 112     ; restore frame pointer and link register
3
ret                             ; return

MIPS

The function uses many (S-registers) which must be preserved, so their values are saved at the beginning of the function (prologue) and restored at the end (epilogue).

Listing 1.231: Optimizing GCC 4.4.5 (IDA)


main:
    ; function prologue: save registers and set up stack
    lui     $gp, (__gnu_local_gp >> 16)     ; load upper 16 bits of global pointer
    addiu   $sp, -0x80                      ; allocate 128 bytes on stack
    la      $gp, (__gnu_local_gp & 0xFFFF)  ; load lower 16 bits of global pointer
    sw      $ra, 0x80+var_4($sp)            ; save return address
    sw      $s3, 0x80+var_8($sp)            ; save S3
    ; ... (save more S-registers)
1
addiu   $s1, $sp, 0x80+var_68           ; S1 = base address of array a on stack
2
move    $v1, $s1                        ; V1 = pointer to current array element
3
move    $v0, $zero                      ; V0 = 0 (value to write, starts at 0)
4
li      $a0, 0x28                       ; A0 = 40 (loop stops when V0 == 40, i.e. i*2 reaches 40)

loc_34:                                     ; first loop body
sw       $v0, 0($ v1)                     ; store current value (i*2) into current element
addiu    $v0, 2 ; V0 += 2 (next value = (i+1)*2) bne$ v0,  $a0, loc_34 ; if V0 != 40, continue loop addiu$ v1, 4                          ; delay slot: advance pointer by 4 bytes (next element)
1
la      $s3, $LC0                       ; S3 = address of format string &quot;a[%d]=%d\n&quot;
2
move    $s0, $zero                      ; S0 = 0 (i counter for second loop)
3
li      $s2, 0x14                       ; S2 = 20 (loop limit)

loc_54:                                     ; second loop body
lw       $t9, (printf &amp; 0xFFFF)($ gp)     ; load address of printf from GOT
lw       $a2, 0($ s1)                     ; A2 = a[i] (load current element)
move     $a1,$ s0                        ; A1 = i (second argument)
move     $a0,$ s3                        ; A0 = format string (first argument)
jalr     $t9 ; call printf addiu$ s0, 1                          ; delay slot: i++
lw       $gp, 0x80+var_70($ sp)           ; restore global pointer after call
bne      $s0,$ s2, loc_54               ; if i != 20, continue loop
addiu   $s1, 4                          ; delay slot: advance array pointer by 4 bytes
1
; epilogue: restore registers and return
2
jr      $ra                             ; return
3
addiu   $sp, 0x80                       ; delay slot: deallocate stack

$LC0: .ascii “a[%d]=%d\n”                  ; format string constant

An interesting thing: in both loops, the first loop does not need i at all! It only needs the value (i*2) which increases by +2 each time, and the memory address which increases by +4.

So we see two variables: one in $v0 that increases by +2, and another in $v1 that increases by +4.

The second loop is the one that calls printf and tells the user the value of i, which is why there is a variable that increases by +1 (in $s0) and an address that increases by +4 (in $s1).

This reminds us of loop optimizations that we talked about before. The goal is to completely eliminate multiplication operations.

1.26.2 Buffer overflow

Reading outside array bounds

When dealing with an array like array[index], if you look carefully at the code that the compiler generates, you will notice there is no bounds checking at all — meaning nothing checks whether the index is less than 20 or not.

What if the index becomes 20 or greater? This is exactly one of the most criticized things about C/C++. Here is code that compiles and runs just fine:


#include <stdio.h> // include standard I/O header
int main() {
int a[20]; // array of 20 integers
int i;     // loop counter
1
for (i = 0; i &lt; 20; i++) // fill array with i*2
2
    a[i] = i * 2;
3

4
printf(&quot;a[20]=%d\n&quot;, a[20]); // read one element BEYOND the array bounds (index 20 is invalid)
5

6
return 0;

}

Compile result (Non-optimizing MSVC 2008)

Listing 1.232


$SG2474 DB 'a[20]=%d', 0aH, 00H   ; format string with newline and null terminator
_i $= -84 ; size = 4 ; variable i offset on stack a$       = -80   ; size = 80 ; array a offset on stack (20 × 4 bytes)
_main PROC
push    ebp                                 ; save base pointer
mov     ebp, esp                            ; set up stack frame
sub     esp, 84                             ; allocate 84 bytes: 80 for array + 4 for i
1
mov     DWORD PTR _i$[ebp], 0               ; i = 0
2
jmp     SHORT $LN3@main                     ; jump to loop condition

 $LN2@main: ; loop increment mov eax, DWORD PTR _i$ [ebp]             ; load i
add     eax, 1                              ; i + 1
mov     DWORD PTR _i$[ebp], eax             ; i++
 $LN3@main: ; loop condition cmp DWORD PTR _i$ [ebp], 20              ; compare i with 20
jge     SHORT $LN1@main                     ; if i >= 20, exit loop
1
mov     ecx, DWORD PTR _i$[ebp]             ; load i into ECX
2
shl     ecx, 1                              ; ECX = i * 2
3
mov     edx, DWORD PTR _i$[ebp]             ; load i as index
4
mov     DWORD PTR _a$[ebp+edx*4], ecx       ; a[i] = i*2
5
jmp     SHORT $LN2@main                     ; loop back

 $LN1@main: mov eax, DWORD PTR _a$ [ebp+80]          ; read from a[20] — 80 bytes past start of array (OUT OF BOUNDS!)
push    eax                                 ; push value as argument to printf
push    OFFSET $SG2474                      ; push format string “a[20]=%d\n”
call    DWORD PTR __imp__printf             ; call printf
add     esp, 8                              ; clean up stack (2 arguments)
1
xor     eax, eax                            ; return value = 0
2
mov     esp, ebp                            ; restore stack pointer
3
pop     ebp                                 ; restore base pointer
4
ret     0                                   ; return

_main ENDP

The result that came out was a[20]=1638280, and it is obviously clear that this number is not an array value at all — it is a value that was sitting on the stack 80 bytes past the first element of a[].

Let's try to figure out where that value came from. We will do the same example on x32dbg and look at the value that is sitting exactly after the last element of the array.

What is this?

From the look of the stack, this is the saved value of register EBP (which was saved at the beginning of the function).

Let's follow further and see how this value comes back:

Indeed, how could it be otherwise? The compiler could generate extra code to check that the index is always within the array bounds (like what happens in higher-level languages), but that would make the code slower.

Writing beyond array bounds

Alright, we read values from the stack illegally, but what if we could write something into it?

Here is what happens:


#include <stdio.h> // include standard I/O header
int main() {
int a[20]; // array of only 20 integers
int i;
1
for (i = 0; i &lt; 30; i++) // loop goes to 30 — writes 10 elements BEYOND array bounds!
2
    a[i] = i;             // overwrites stack memory past the array
3

4
return 0;

}

MSVC

Here is what we found:


_i$     = -84       ; variable i offset on stack
a$      = -80       ; array a offset on stack
_main PROC
push    ebp                                 ; save base pointer
mov     ebp, esp                            ; set up stack frame
sub     esp, 84                             ; allocate 84 bytes on stack
1
mov     DWORD PTR _i$[ebp], 0               ; i = 0
2
jmp     SHORT $LN3@main                     ; jump to loop condition

 $LN2@main: ; loop increment mov eax, DWORD PTR _i$ [ebp]             ; load i
add     eax, 1                              ; i + 1
mov     DWORD PTR _i$[ebp], eax             ; i++
 $LN3@main: ; loop condition cmp DWORD PTR _i$ [ebp], 30              ; compare i with 30 (NOT 20 — loop goes beyond array!)
jge     SHORT $LN1@main                     ; if i >= 30, exit loop
1
mov     ecx, DWORD PTR _i$[ebp]             ; load i as index
2
mov     edx, DWORD PTR _i$[ebp]             ; load i as value (redundant load — could be optimized)
3
mov     DWORD PTR _a$[ebp+ecx*4], edx       ; a[i] = i  (writes past end of array for i &gt;= 20!)
4
jmp     SHORT $LN2@main                     ; loop back

$LN1@main:
xor     eax, eax                            ; return value = 0
mov     esp, ebp                            ; restore stack pointer
pop     ebp                                 ; restore base pointer (but EBP was overwritten!)
ret     0                                   ; return (but return address was overwritten too!)
_main ENDP

The result is of course that the program crashes. Let's see exactly where it crashes. We will run it on x32dbg, and to compile it without issues we will use this command:


cl /Od /Zi /GS- /sdl- overflow.c /link /SUBSYSTEM:CONSOLE
; /Od  = no optimization
; /Zi  = generate debug info
; /GS- = disable stack security cookie (buffer security check)
; /sdl-= disable additional security checks

We load the program in x32dbg and follow until it writes all 30 elements:

We follow until the end of the function:

When we focus on the registers we will find that EIP is now 0x16, and that is an illegal address for code (at least in Win32)! We ended up there without intending to. Also EBP became 0x15, and ECX and EDX became 0x1D.

Let's study the stack shape more carefully. After control entered main(), the value of EBP was saved on the stack. Then 84 bytes were allocated for the array and variable i (20+1 elements × 4 bytes). ESP now points to the variable _i, and after any new PUSH, the new thing appears next to it.

This is the shape of the stack while inside main():

The instruction a[19]=something writes the last int inside the array bounds (still safe). The instruction a[20]=something writes to the location that holds the saved value of EBP.

Look at the register state at the time of the crash: in our case, 20 was written into element number 20. At the end of the function, the epilogue restores the old EBP value (decimal 20 = 0x14 hex). Then RET executes, which is essentially like POP EIP. RET takes the return address from the stack (the address in the CRT that called main()), and now in its place is written 21 (0x16 hex). The CPU tries to execute at address 0x16, but there is no executable code there, so an exception occurs.

Welcome to the world of Buffer Overflow!

Replace an int array with a string (char array), intentionally craft a long string, and pass it to a program or function that does not check the string length and copies it into a small buffer, and you can redirect the program to jump to any address you want. It is not easy in practice, but that is the core idea.

GCC

Let's try the same code:


main proc near
a       = dword ptr -54h  ; array a offset on stack (20 ints = 80 bytes)
i       = dword ptr -4    ; variable i offset on stack
1
push    ebp
2
mov     ebp, esp
3
sub     esp, 60h                        ; allocate 96 bytes on stack
4

5
mov     [ebp+i], 0                      ; i = 0
6
jmp     short loc_80483D1               ; jump to loop condition

loc_80483C3:                                ; loop body
mov     eax, [ebp+i]                    ; load i as index
mov     edx, [ebp+i]                    ; load i as value
mov     [ebp+eax*4+a], edx             ; a[i] = i  (writes past array end for i >= 20!)
add     [ebp+i], 1                      ; i++
loc_80483D1:                                ; loop condition
cmp     [ebp+i], 1Dh                    ; compare i with 29 (0x1D); loop runs 0..29 = 30 iterations
jle     short loc_80483C3               ; if i <= 29, continue loop
1
mov     eax, 0                          ; return value = 0
2
leave                                   ; restore stack frame (EBP already corrupted!)
3
retn                                    ; return (return address already corrupted!)

main endp

When we run it on Linux it gives us a Segmentation fault (core dumped).

Let's do this practically on Linux and use it as a GDB exercise. First we compile it with this command:


gcc -m32 -O0 -ggdb -fno-stack-protector -z execstack test.c -o overflow
; -m32              = compile as 32-bit
; -O0               = no optimization
; -ggdb             = include GDB debug symbols
; -fno-stack-protector = disable stack canary (security cookie)
; -z execstack      = allow executable stack (needed for this demo)

And we try to run it:

And indeed it gave a Segmentation fault.

Now let's run it on GDB. First we do gdb overflow then run (or its shortcut r):

We will find it stopped by itself. Now let's look at the registers using info registers:

The register values are slightly different from Win32 because the stack layout is slightly different.

1.26.3 Buffer overflow protection methods

Let's talk about ways to protect against buffer overflow even if the C/C++ programmer is not paying attention.

MSVC has options like these:

* /RTCs → Stack Frame runtime checking

* /GZ → Enable stack checks (same as /RTCs)

One of the methods is that it writes a random value between the local variables on the stack at the beginning of the function (prologue), and then checks it at the end of the function (epilogue) before exiting.

For example, if that value changed → do not execute the last RET instruction, but instead stop the program (or suspend it). The program will stop, but that is much better than someone attacking your machine remotely.

This random value is called a "Canary", and this is connected to the canaries that mine workers used in the old days. They used canaries to detect toxic gases quickly. The canary is very sensitive to gases — if there is danger it becomes very distressed or dies.

If we compile the first example with the /RTC1 and /RTCs options, we will find at the end of the function a call to the function @_RTC_CheckStackVars@8 which checks that the "canary" is still intact.

Now let's see how it is done in GCC:


#ifdef __GNUC__
#include <alloca.h>   // GCC: use alloca from alloca.h
#else
#include <malloc.h>   // MSVC: use alloca from malloc.h
#endif
#include <stdio.h>
void f() {
char buf = (char)alloca(600); // allocate 600 bytes on the stack dynamically
#ifdef GNUC
snprintf(buf, 600, “hi! %d, %d, %d\n”, 1, 2, 3);  // GCC version
#else
_snprintf(buf, 600, “hi! %d, %d, %d\n”, 1, 2, 3); // MSVC version
#endif
1
puts(buf); // print the result

}

Without any extra options, GCC 4.7.3 inserts canary checking code automatically:

Listing 1.235: GCC 4.7.3


.LC0: .string "hi! %d, %d, %d\n"   ; format string constant
f:
push    ebp                             ; save base pointer
mov     ebp, esp                        ; set up stack frame
push    ebx                             ; save EBX
sub     esp, 676                        ; allocate stack space (600 + alignment + canary)
lea     ebx, [esp+39]                   ; EBX = aligned buffer start address
and     ebx, -16                        ; align buffer to 16-byte boundary
1
mov     DWORD PTR [esp+20], 3           ; push argument 3
2
mov     DWORD PTR [esp+16], 2           ; push argument 2
3
mov     DWORD PTR [esp+12], 1           ; push argument 1
4
mov     DWORD PTR [esp+8], OFFSET FLAT:.LC0  ; push format string address
5
mov     DWORD PTR [esp+4], 600          ; push buffer size
6
mov     DWORD PTR [esp], ebx            ; push buffer pointer
7

8
mov     eax, DWORD PTR gs:20            ; ← load canary value from thread-local storage (TLS)
9
mov     DWORD PTR [ebp-12], eax         ; save canary onto stack
10
xor     eax, eax                        ; clear EAX (security: don&#39;t leave canary in register)
11

12
call    _snprintf                       ; call snprintf(buf, 600, format, 1, 2, 3)
13

14
mov     DWORD PTR [esp], ebx            ; push buffer pointer
15
call    puts                            ; call puts(buf)
16

17
mov     eax, DWORD PTR [ebp-12]         ; load canary from stack
18
xor     eax, DWORD PTR gs:20            ; XOR with original canary → result should be 0 if unchanged
19
jne     .L5                             ; if not zero (canary was modified), jump to fail handler
20

21
mov     ebx, DWORD PTR [ebp-4]          ; restore EBX
22
leave                                   ; restore stack frame
23
ret                                     ; return normally

.L5:
call    __stack_chk_fail                ; canary was corrupted → terminate program with error

The random value lives in gs:20. It is written onto the stack, and then at the end of the function the value on the stack is compared with the original canary in gs:20. If they are not equal, __stack_chk_fail is called.

In the console (Ubuntu 13.04 x86) it will show something like this:


*** buffer overflow detected ***: ./2_1 terminated
======= Backtrace: =========
/lib/i386-linux-gnu/libc.so.6(__fortify_fail+0x63)[0xb7699bc3]
...
Aborted (core dumped)

In short: in Linux, the gs register always points to TLS (Thread-Local Storage — information specific to the current thread). In win32, the fs register plays the same role and points to TIB.

Optimizing Xcode 4.6.3 (LLVM) (Thumb-2 mode)

Now let's see how LLVM checks the canary:


_main
; local variable offsets:
var_64 = -0x64  ; array element [0]
var_60 = -0x60  ; array element [1]
var_5C = -0x5C  ; array element [2]
; ... (more array elements)
canary = -0x14  ; canary value slot on stack
var_10 = -0x10  ; saved R8
PUSH    {R4-R7,LR}              ; save registers and link register
ADD     R7, SP, #0xC            ; set frame pointer
STR.W   R8, [SP,#0xC+var_10]!  ; save R8 on stack
SUB     SP, SP, #0x54           ; allocate local stack space
MOVW    R0, #aObjc_methtype     ; load address of canary source (TLS pointer)
MOVS    R2, #0
MOVT.W  R0, #0
MOVS    R5, #0
ADD     R0, PC
LDR.W   R8, [R0]                ; R8 = pointer to canary source
LDR.W   R0, [R8]                ; R0 = canary value
STR     R0, [SP,#0x64+canary]   ; save canary onto stack
; … (array is filled here with unrolled writes — no loop, each element written individually)
; === canary check at end of function ===
LDR.W   R0, [R8]                ; reload original canary value
LDR     R1, [SP,#0x64+canary]   ; load canary copy from stack
CMP     R0, R1                  ; compare: are they still equal?
ITTTT   EQ                      ; if-then block: execute next 4 instructions only if EQ is true
MOVEQ   R0, #0                  ; set return value to 0
ADDEQ   SP, SP, #0x54           ; deallocate local stack
LDREQ.W R8, [SP+0x64+var_64],#4 ; restore R8
POPEQ   {R4-R7,PC}              ; restore registers and return normally
BLX     ___stack_chk_fail       ; canary mismatch → call stack check fail (terminates program)

The first thing we notice: LLVM unrolled the loop and wrote every value into the array one by one, because it calculated that this is faster.

At the end of the function we see the canary comparison (the one on the stack and the one in R8). If they are equal → execute the 4-instruction block (ITTTT EQ) and exit. If they are not equal → go to ___stack_chk_fail which will terminate the program.

1.26.4 One more word about arrays

Now we understand why it is impossible to write something like this in C/C++ code:


void f(int size) {
    int a[size]; // ERROR in standard C/C++: size must be known at compile time
    ...
};

This is because the compiler must know the exact size of the array at compile time, in order to be able to allocate its space on the local stack.

If you need an array with a variable (runtime) size, you must allocate it with malloc(), and then use the allocated block as an array of the desired type:


int *a = (int*)malloc(size * sizeof(int)); // allocate size*4 bytes on the Heap at runtime
// now you can use a[0] through a[size-1] normally
// at the end you must call:
free(a); // release the allocated memory back to the heap

The space is allocated on the Heap (not the stack), and the size can be determined at runtime.

Alternatively, you can use a C99 feature (present in the ISO/IEC 9899:TC3 standard):


void f(int size) {
    int a[size]; // Variable Length Array (VLA) — valid in C99 and supported by GCC/Clang
    // works internally like alloca() — allocates on the stack
    // NOT supported by MSVC
    ...
}

This works internally like alloca() (allocates on the stack), but it is not available in all compilers (MSVC does not support it).

It is also possible to use Garbage Collection libraries for C, or Smart Pointer libraries in C++.

Share

If this article helped you, please share it with others!

CH1.26(part1) Arrays

https://v3nn00m.github.io/posts/re4b/chapter1_26_part1/

Author

0xV3n0m

Published at

2026-03-18

License

0xV3n0m's Personal Blog License

Some information may be outdated

CH1.26 (part2) Array of Pointers to Strings

CH1.23 - Floating-Point Unit (Part 2)

0xV3n0m

1.26 Arrays

x86

MSVC

GCC

ARM

Non-optimizing Keil 6/2013 (ARM mode)

Optimizing Keil 6/2013 (Thumb mode)

Non-optimizing GCC 4.9.1 (ARM64)

MIPS

1.26.2 Buffer overflow

Reading outside array bounds

Writing beyond array bounds

MSVC

GCC

1.26.3 Buffer overflow protection methods

Optimizing Xcode 4.6.3 (LLVM) (Thumb-2 mode)

1.26.4 One more word about arrays

Table of Contents