mobile wallpaper 1mobile wallpaper 2mobile wallpaper 3
5400 words
27 minutes
CH1.26(part1) Arrays
2026-03-18

1.26 Arrays


Let's first define what an Array is: it is a group of variables stored next to each other in memory, all of the same type.

Here is a simple example:


#include <stdio.h> // include standard I/O header

int main() { int a[20]; // declare an array of 20 integers (20 × 4 = 80 bytes on stack) int i; // loop counter

for (i = 0; i &lt; 20; i++) // iterate from 0 to 19
a[i] = i * 2; // fill each element with index × 2
for (i = 0; i &lt; 20; i++) // iterate again to print
printf(&quot;a[%d]=%d\n&quot;, i, a[i]); // print index and value
return 0; // return 0 to indicate successful termination

}


x86

MSVC

Let's compile it and it will look like this:


_TEXT   SEGMENT
_i$     = -84       ; variable i (4 bytes) offset on stack
_a$     = -80       ; array a (80 bytes) offset on stack

_main PROC push ebp ; save base pointer mov ebp, esp ; set up stack frame sub esp, 84 ; allocate 84 bytes: 80 for array + 4 for i

; === First loop: fill the array ===
mov DWORD PTR _i$[ebp], 0 ; i = 0
jmp SHORT $LN6@main ; jump to loop condition check

LN5@main:;loopincrementlabelmoveax,DWORDPTRiLN5@main: ; loop increment label mov eax, DWORD PTR _i[ebp] ; load i into EAX add eax, 1 ; EAX = i + 1 mov DWORD PTR _i$[ebp], eax ; i = i + 1

LN6@main:;loopconditionlabelcmpDWORDPTRiLN6@main: ; loop condition label cmp DWORD PTR _i[ebp], 20 ; compare i with 20 jge SHORT $LN4@main ; if i >= 20, exit loop

mov ecx, DWORD PTR _i$[ebp] ; load i into ECX
shl ecx, 1 ; ECX = i * 2 (left shift by 1 bit = multiply by 2)
mov edx, DWORD PTR _i$[ebp] ; load i into EDX (used as index)
mov DWORD PTR _a$[ebp + edx*4], ecx ; a[i] = i*2 (address = base + i*4)
jmp SHORT $LN5@main ; jump back to increment

$LN4@main: ; after first loop

; === Second loop: print the array ===
mov DWORD PTR _i$[ebp], 0 ; i = 0
jmp SHORT $LN3@main ; jump to condition check

LN2@main:;loopincrementlabelmoveax,DWORDPTRiLN2@main: ; loop increment label mov eax, DWORD PTR _i[ebp] add eax, 1 mov DWORD PTR _i$[ebp], eax ; i = i + 1

LN3@main:;loopconditionlabelcmpDWORDPTRiLN3@main: ; loop condition label cmp DWORD PTR _i[ebp], 20 ; compare i with 20 jge SHORT $LN1@main ; if i >= 20, exit loop

mov ecx, DWORD PTR _i$[ebp] ; load i as index
mov edx, DWORD PTR _a$[ebp + ecx*4] ; load a[i] into EDX
push edx ; push a[i] as third argument to printf
mov eax, DWORD PTR _i$[ebp] ; load i
push eax ; push i as second argument to printf
push OFFSET $SG2463 ; push format string &quot;a[%d]=%d\n&quot;
call _printf ; call printf
add esp, 12 ; clean up stack (3 arguments × 4 bytes)
jmp SHORT $LN2@main ; jump back to increment

$LN1@main: ; after second loop xor eax, eax ; return value = 0 mov esp, ebp ; restore stack pointer pop ebp ; restore base pointer ret 0 ; return _main ENDP

Nothing very strange, just two loops: the first loop fills the array (multiplying each element by 2), and the second loop prints the elements. But let's analyze the Assembly together a bit more for better understanding.

Imagine you have 20 drawers in a cabinet, all next to each other, and each drawer holds one number of type int. Therefore each drawer takes 4 bytes (because we said before that int in 32-bit takes 4 bytes), meaning the entire array takes 20 × 4 = 80 contiguous bytes.

Let's start with the instruction sub esp, 84 which allocates 84 bytes on the stack. Why 84 and not 80? In short, there are 4 extra bytes for the variable i.

Now let's go through the first loop — filling the array:

* mov _i$, 0 — as we know, start with i = 0

* cmp _i$, 20 — if i is greater than or equal to 20, exit the loop

* shl ecx, 1 — the fastest way to multiply i × 2 (instead of a regular multiply, a left shift by 1 bit = ×2)

* calculating the address of a[i]: mov DWORD PTR _a$[ebp + edx*4], ecx

An important question — why did the compiler use shl instead of imul? Because it is safer on the stack (no overflow if the array is small).

Let's try this example in x32dbg. We can see how the array is being filled: each element is a 32-bit int word and its value is the index multiplied by 2.

array_1

Since this array lives on the stack, we can see all 20 of its elements there.


GCC

This is what GCC produced:


main proc near
; DATA XREF: _start+17
var_70  = dword ptr -70h  ; stack slot used for printf arg (value)
var_6C  = dword ptr -6Ch  ; stack slot used for printf arg (index)
var_68  = dword ptr -68h  ; stack slot used for printf arg (format string pointer)
i_2     = dword ptr -54h  ; array a starts here (20 ints = 80 bytes)
i       = dword ptr -4    ; loop counter i
push ebp
mov ebp, esp
and esp, 0FFFFFFF0h ; align stack to 16 bytes
sub esp, 70h ; allocate 112 bytes on stack
mov [esp+70h+i], 0 ; i = 0
jmp short loc_804840A ; jump to loop condition

loc_80483F7: ; loop body mov eax, [esp+70h+i] ; load i into EAX (used as array index) mov edx, [esp+70h+i] ; load i into EDX add edx, edx ; EDX = i * 2 (add to itself = multiply by 2) mov [esp+eax4+70h+i_2], edx ; a[i] = i2 (address = array_base + i*4) add [esp+70h+i], 1 ; i++

loc_804840A: ; loop condition cmp [esp+70h+i], 13h ; compare i with 19 (0x13) jle short loc_80483F7 ; if i <= 19, continue loop

mov [esp+70h+i], 0 ; i = 0 (reset for second loop)
jmp short loc_8048441 ; jump to second loop condition

loc_804841B: ; second loop body mov eax, [esp+70h+i] ; load i as index mov edx, [esp+eax*4+70h+i_2]; load a[i] into EDX mov eax, offset aADD ; load address of format string “a[%d]=%d\n” mov [esp+70h+var_68], edx ; push a[i] as third arg on stack mov edx, [esp+70h+i] ; load i mov [esp+70h+var_6C], edx ; push i as second arg on stack mov [esp+70h+var_70], eax ; push format string as first arg on stack call _printf ; call printf add [esp+70h+i], 1 ; i++

loc_8048441: ; second loop condition cmp [esp+70h+i], 13h ; compare i with 19 jle short loc_804841B ; if i <= 19, continue loop

mov eax, 0 ; return value = 0
leave ; restore stack frame
retn ; return

main endp

By the way, the variable a is of type int (a pointer to int). You can pass a pointer to an array to another function, but it is more accurate to say "a pointer to the first element of the array is being passed" (and the rest of the addresses are calculated in a straightforward way). If you do indexing on this pointer like a[idx], idx is just added to the pointer and the element at that address is returned.

An interesting example: a string like "string" is an array of characters of type const char[]. You can do indexing on this pointer too. That is why it is possible to write "string"[i] and that is a completely valid expression in C/C++!


ARM

Non-optimizing Keil 6/2013 (ARM mode)


EXPORT _main
_main
    STMFD   SP!, {R4,LR}            ; save R4 and link register on stack
    SUB     SP, SP, #0x50           ; allocate 80 bytes (0x50) for 20 int array
; First loop: fill the array
MOV R4, #0 ; i = 0 (R4 is the loop counter)
B loc_4A0 ; jump to loop condition

loc_494: ; loop body MOV R0, R4,LSL#1 ; R0 = R4 * 2 (left shift by 1 = multiply by 2) STR R0, [SP,R4,LSL#2] ; store R0 at address SP + R44 (a[i] = i2) ADD R4, R4, #1 ; i++

loc_4A0: ; loop condition CMP R4, #20 ; compare i with 20 BLT loc_494 ; if i < 20, continue loop

; Second loop: print the array
MOV R4, #0 ; i = 0 (reset counter)
B loc_4C4 ; jump to loop condition

loc_4B0: ; loop body LDR R2, [SP,R4,LSL#2] ; R2 = a[i] (load from SP + i*4) MOV R1, R4 ; R1 = i (second argument to printf) ADR R0, aADD ; R0 = address of format string “a[%d]=%d\n” BL __2printf ; call printf(format, i, a[i]) ADD R4, R4, #1 ; i++

loc_4C4: ; loop condition CMP R4, #20 ; compare i with 20 BLT loc_4B0 ; if i < 20, continue loop

MOV R0, #0 ; return value = 0
ADD SP, SP, #0x50 ; deallocate array space
LDMFD SP!, {R4,PC} ; restore R4 and return

The int type takes 32 bits (4 bytes), so 20 elements need 80 bytes (0x50). The SUB SP, SP, #0x50 instruction at the beginning of the function allocates exactly that space.

In both loops, the counter i lives in R4. The number written into the array = i*2, and this is translated into MOV R0, R4,LSL#1 (left shift by 1 bit = multiply by 2). STR R0, [SP,R4,LSL#2] writes to the array. The element address = SP (start of array) + R4*4 (because each element is 4 bytes).


Optimizing Keil 6/2013 (Thumb mode)


_main
    PUSH    {R4,R5,LR}              ; save R4, R5 and link register
    SUB     SP, SP, #0x54           ; allocate 84 bytes: 80 for array + 4 extra (alignment)
MOVS R0, #0 ; i = 0
MOV R5, SP ; R5 = base address of the array

loc_1CE: ; first loop body LSLS R1, R0, #1 ; R1 = i * 2 (left shift = multiply by 2) LSLS R2, R0, #2 ; R2 = i * 4 (byte offset into array) ADDS R0, R0, #1 ; i++ CMP R0, #20 ; compare i with 20 STR R1, [R5,R2] ; store i2 at address (R5 + i4) → a[i] = i*2 BLT loc_1CE ; if i < 20, continue loop

MOVS R4, #0 ; i = 0 (reset for second loop)

loc_1DC: ; second loop body LSLS R0, R4, #2 ; R0 = i * 4 (byte offset) LDR R2, [R5,R0] ; R2 = a[i] (load from base + i*4) MOVS R1, R4 ; R1 = i (second argument to printf) ADR R0, aADD ; R0 = address of format string “a[%d]=%d\n” BL __2printf ; call printf(format, i, a[i]) ADDS R4, R4, #1 ; i++ CMP R4, #20 ; compare i with 20 BLT loc_1DC ; if i < 20, continue loop

MOVS R0, #0 ; return value = 0
ADD SP, SP, #0x54 ; deallocate stack space
POP {R4,R5,PC} ; restore and return

The Thumb code is similar to the previous one but with dedicated shift instructions (LSLS). The compiler allocated 84 bytes instead of 80, the last 4 bytes are unused.


Non-optimizing GCC 4.9.1 (ARM64)

Listing 1.230


.LC0:
    .string "a[%d]=%d\n"             ; format string for printf

main: stp x29, x30, [sp, -112]! ; save frame pointer and link register, allocate 112 bytes add x29, sp, 0 ; set frame pointer

str wzr, [x29,108] ; i = 0 (wzr is always zero)
b .L2 ; jump to loop condition

.L3: ; first loop body ldr w0, [x29,108] ; load i into W0 lsl w2, w0, 1 ; W2 = i * 2 (left shift by 1) add x0, x29, 24 ; X0 = base address of array a (x29 + 24) ldrsw x1, [x29,108] ; load i as 64-bit signed value into X1 str w2, [x0,x1,lsl 2] ; a[i] = i2 (store at base + i4) ldr w0, [x29,108] ; load i add w0, w0, 1 ; W0 = i + 1 str w0, [x29,108] ; i++

.L2: ; first loop condition ldr w0, [x29,108] ; load i cmp w0, 19 ; compare i with 19 ble .L3 ; if i <= 19, continue loop

str wzr, [x29,108] ; i = 0 (reset for second loop)
b .L4 ; jump to second loop condition

.L5: ; second loop body add x0, x29, 24 ; X0 = base address of array ldrsw x1, [x29,108] ; load i as 64-bit index ldr w2, [x0,x1,lsl 2] ; W2 = a[i] (load from base + i*4) adrp x0, .LC0 ; load page address of format string add x0, x0, :lo12:.LC0 ; add page offset to get full address ldr w1, [x29,108] ; W1 = i (second argument to printf) bl printf ; call printf(format, i, a[i]) ldr w0, [x29,108] ; load i add w0, w0, 1 ; W0 = i + 1 str w0, [x29,108] ; i++

.L4: ; second loop condition ldr w0, [x29,108] ; load i cmp w0, 19 ; compare i with 19 ble .L5 ; if i <= 19, continue loop

mov w0, 0 ; return value = 0
ldp x29, x30, [sp], 112 ; restore frame pointer and link register
ret ; return

MIPS

The function uses many (S-registers) which must be preserved, so their values are saved at the beginning of the function (prologue) and restored at the end (epilogue).

Listing 1.231: Optimizing GCC 4.4.5 (IDA)


main:
    ; function prologue: save registers and set up stack
    lui     $gp, (__gnu_local_gp >> 16)     ; load upper 16 bits of global pointer
    addiu   $sp, -0x80                      ; allocate 128 bytes on stack
    la      $gp, (__gnu_local_gp & 0xFFFF)  ; load lower 16 bits of global pointer
    sw      $ra, 0x80+var_4($sp)            ; save return address
    sw      $s3, 0x80+var_8($sp)            ; save S3
    ; ... (save more S-registers)
addiu $s1, $sp, 0x80+var_68 ; S1 = base address of array a on stack
move $v1, $s1 ; V1 = pointer to current array element
move $v0, $zero ; V0 = 0 (value to write, starts at 0)
li $a0, 0x28 ; A0 = 40 (loop stops when V0 == 40, i.e. i*2 reaches 40)

loc_34: ; first loop body sw v0,0(v0, 0(v1) ; store current value (i*2) into current element addiu v0,2;V0+=2(nextvalue=(i+1)2)bnev0, 2 ; V0 += 2 (next value = (i+1)*2) bne v0, a0,loc34;ifV0!=40,continueloopaddiua0, loc_34 ; if V0 != 40, continue loop addiu v1, 4 ; delay slot: advance pointer by 4 bytes (next element)

la $s3, $LC0 ; S3 = address of format string &quot;a[%d]=%d\n&quot;
move $s0, $zero ; S0 = 0 (i counter for second loop)
li $s2, 0x14 ; S2 = 20 (loop limit)

loc_54: ; second loop body lw t9, (printf &amp; 0xFFFF)(gp) ; load address of printf from GOT lw a2,0(a2, 0(s1) ; A2 = a[i] (load current element) move a1,a1, s0 ; A1 = i (second argument) move a0,a0, s3 ; A0 = format string (first argument) jalr t9;callprintfaddiut9 ; call printf addiu s0, 1 ; delay slot: i++ lw gp,0x80+var70(gp, 0x80+var_70(sp) ; restore global pointer after call bne s0,s0, s2, loc_54 ; if i != 20, continue loop addiu $s1, 4 ; delay slot: advance array pointer by 4 bytes

; epilogue: restore registers and return
jr $ra ; return
addiu $sp, 0x80 ; delay slot: deallocate stack

$LC0: .ascii “a[%d]=%d\n” ; format string constant

An interesting thing: in both loops, the first loop does not need i at all! It only needs the value (i*2) which increases by +2 each time, and the memory address which increases by +4.

So we see two variables: one in $v0 that increases by +2, and another in $v1 that increases by +4.

The second loop is the one that calls printf and tells the user the value of i, which is why there is a variable that increases by +1 (in $s0) and an address that increases by +4 (in $s1).

This reminds us of loop optimizations that we talked about before. The goal is to completely eliminate multiplication operations.

1.26.2 Buffer overflow

Bufferoverflow

Reading outside array bounds

When dealing with an array like array[index], if you look carefully at the code that the compiler generates, you will notice there is no bounds checking at all — meaning nothing checks whether the index is less than 20 or not.

What if the index becomes 20 or greater? This is exactly one of the most criticized things about C/C++. Here is code that compiles and runs just fine:


#include <stdio.h> // include standard I/O header

int main() { int a[20]; // array of 20 integers int i; // loop counter

for (i = 0; i &lt; 20; i++) // fill array with i*2
a[i] = i * 2;
printf(&quot;a[20]=%d\n&quot;, a[20]); // read one element BEYOND the array bounds (index 20 is invalid)
return 0;

}

Compile result (Non-optimizing MSVC 2008)

Listing 1.232


$SG2474 DB 'a[20]=%d', 0aH, 00H   ; format string with newline and null terminator

_i=84;size=4;variableioffsetonstacka = -84 ; size = 4 ; variable i offset on stack a = -80 ; size = 80 ; array a offset on stack (20 × 4 bytes)

_main PROC push ebp ; save base pointer mov ebp, esp ; set up stack frame sub esp, 84 ; allocate 84 bytes: 80 for array + 4 for i

mov DWORD PTR _i$[ebp], 0 ; i = 0
jmp SHORT $LN3@main ; jump to loop condition

LN2@main:;loopincrementmoveax,DWORDPTRiLN2@main: ; loop increment mov eax, DWORD PTR _i[ebp] ; load i add eax, 1 ; i + 1 mov DWORD PTR _i$[ebp], eax ; i++

LN3@main:;loopconditioncmpDWORDPTRiLN3@main: ; loop condition cmp DWORD PTR _i[ebp], 20 ; compare i with 20 jge SHORT $LN1@main ; if i >= 20, exit loop

mov ecx, DWORD PTR _i$[ebp] ; load i into ECX
shl ecx, 1 ; ECX = i * 2
mov edx, DWORD PTR _i$[ebp] ; load i as index
mov DWORD PTR _a$[ebp+edx*4], ecx ; a[i] = i*2
jmp SHORT $LN2@main ; loop back

LN1@main:moveax,DWORDPTRaLN1@main: mov eax, DWORD PTR _a[ebp+80] ; read from a[20] — 80 bytes past start of array (OUT OF BOUNDS!) push eax ; push value as argument to printf push OFFSET $SG2474 ; push format string “a[20]=%d\n” call DWORD PTR __imp__printf ; call printf add esp, 8 ; clean up stack (2 arguments)

xor eax, eax ; return value = 0
mov esp, ebp ; restore stack pointer
pop ebp ; restore base pointer
ret 0 ; return

_main ENDP

The result that came out was a[20]=1638280, and it is obviously clear that this number is not an array value at all — it is a value that was sitting on the stack 80 bytes past the first element of a[].

Let's try to figure out where that value came from. We will do the same example on x32dbg and look at the value that is sitting exactly after the last element of the array.

What is this?

overflow_1

From the look of the stack, this is the saved value of register EBP (which was saved at the beginning of the function).

Let's follow further and see how this value comes back:

overflow_2

Indeed, how could it be otherwise? The compiler could generate extra code to check that the index is always within the array bounds (like what happens in higher-level languages), but that would make the code slower.


Writing beyond array bounds

Alright, we read values from the stack illegally, but what if we could write something into it?

Here is what happens:


#include <stdio.h> // include standard I/O header

int main() { int a[20]; // array of only 20 integers int i;

for (i = 0; i &lt; 30; i++) // loop goes to 30 — writes 10 elements BEYOND array bounds!
a[i] = i; // overwrites stack memory past the array
return 0;

}

MSVC

Here is what we found:


_i$     = -84       ; variable i offset on stack
a$      = -80       ; array a offset on stack

_main PROC push ebp ; save base pointer mov ebp, esp ; set up stack frame sub esp, 84 ; allocate 84 bytes on stack

mov DWORD PTR _i$[ebp], 0 ; i = 0
jmp SHORT $LN3@main ; jump to loop condition

LN2@main:;loopincrementmoveax,DWORDPTRiLN2@main: ; loop increment mov eax, DWORD PTR _i[ebp] ; load i add eax, 1 ; i + 1 mov DWORD PTR _i$[ebp], eax ; i++

LN3@main:;loopconditioncmpDWORDPTRiLN3@main: ; loop condition cmp DWORD PTR _i[ebp], 30 ; compare i with 30 (NOT 20 — loop goes beyond array!) jge SHORT $LN1@main ; if i >= 30, exit loop

mov ecx, DWORD PTR _i$[ebp] ; load i as index
mov edx, DWORD PTR _i$[ebp] ; load i as value (redundant load — could be optimized)
mov DWORD PTR _a$[ebp+ecx*4], edx ; a[i] = i (writes past end of array for i &gt;= 20!)
jmp SHORT $LN2@main ; loop back

$LN1@main: xor eax, eax ; return value = 0 mov esp, ebp ; restore stack pointer pop ebp ; restore base pointer (but EBP was overwritten!) ret 0 ; return (but return address was overwritten too!) _main ENDP

The result is of course that the program crashes. Let's see exactly where it crashes. We will run it on x32dbg, and to compile it without issues we will use this command:


cl /Od /Zi /GS- /sdl- overflow.c /link /SUBSYSTEM:CONSOLE
; /Od  = no optimization
; /Zi  = generate debug info
; /GS- = disable stack security cookie (buffer security check)
; /sdl-= disable additional security checks

  

We load the program in x32dbg and follow until it writes all 30 elements:

overflow_3

We follow until the end of the function:

overflow_4

When we focus on the registers we will find that EIP is now 0x16, and that is an illegal address for code (at least in Win32)! We ended up there without intending to. Also EBP became 0x15, and ECX and EDX became 0x1D.

Let's study the stack shape more carefully. After control entered main(), the value of EBP was saved on the stack. Then 84 bytes were allocated for the array and variable i (20+1 elements × 4 bytes). ESP now points to the variable _i, and after any new PUSH, the new thing appears next to it.

This is the shape of the stack while inside main():

stack layout diagram

The instruction a[19]=something writes the last int inside the array bounds (still safe). The instruction a[20]=something writes to the location that holds the saved value of EBP.

Look at the register state at the time of the crash: in our case, 20 was written into element number 20. At the end of the function, the epilogue restores the old EBP value (decimal 20 = 0x14 hex). Then RET executes, which is essentially like POP EIP. RET takes the return address from the stack (the address in the CRT that called main()), and now in its place is written 21 (0x16 hex). The CPU tries to execute at address 0x16, but there is no executable code there, so an exception occurs.

Welcome to the world of Buffer Overflow!

Replace an int array with a string (char array), intentionally craft a long string, and pass it to a program or function that does not check the string length and copies it into a small buffer, and you can redirect the program to jump to any address you want. It is not easy in practice, but that is the core idea.


GCC

Let's try the same code:


main proc near
a       = dword ptr -54h  ; array a offset on stack (20 ints = 80 bytes)
i       = dword ptr -4    ; variable i offset on stack
push ebp
mov ebp, esp
sub esp, 60h ; allocate 96 bytes on stack
mov [ebp+i], 0 ; i = 0
jmp short loc_80483D1 ; jump to loop condition

loc_80483C3: ; loop body mov eax, [ebp+i] ; load i as index mov edx, [ebp+i] ; load i as value mov [ebp+eax*4+a], edx ; a[i] = i (writes past array end for i >= 20!) add [ebp+i], 1 ; i++

loc_80483D1: ; loop condition cmp [ebp+i], 1Dh ; compare i with 29 (0x1D); loop runs 0..29 = 30 iterations jle short loc_80483C3 ; if i <= 29, continue loop

mov eax, 0 ; return value = 0
leave ; restore stack frame (EBP already corrupted!)
retn ; return (return address already corrupted!)

main endp

When we run it on Linux it gives us a Segmentation fault (core dumped).

Let's do this practically on Linux and use it as a GDB exercise. First we compile it with this command:


gcc -m32 -O0 -ggdb -fno-stack-protector -z execstack test.c -o overflow
; -m32              = compile as 32-bit
; -O0               = no optimization
; -ggdb             = include GDB debug symbols
; -fno-stack-protector = disable stack canary (security cookie)
; -z execstack      = allow executable stack (needed for this demo)

  

And we try to run it:

overflow_5

And indeed it gave a Segmentation fault.

Now let's run it on GDB. First we do gdb overflow then run (or its shortcut r):

overflow_6

We will find it stopped by itself. Now let's look at the registers using info registers:

overflow_7

The register values are slightly different from Win32 because the stack layout is slightly different.

1.26.3 Buffer overflow protection methods

overflow_1

Let's talk about ways to protect against buffer overflow even if the C/C++ programmer is not paying attention.

MSVC has options like these:

* /RTCs → Stack Frame runtime checking

* /GZ → Enable stack checks (same as /RTCs)

One of the methods is that it writes a random value between the local variables on the stack at the beginning of the function (prologue), and then checks it at the end of the function (epilogue) before exiting.

For example, if that value changed → do not execute the last RET instruction, but instead stop the program (or suspend it). The program will stop, but that is much better than someone attacking your machine remotely.

This random value is called a "Canary", and this is connected to the canaries that mine workers used in the old days. They used canaries to detect toxic gases quickly. The canary is very sensitive to gases — if there is danger it becomes very distressed or dies.

If we compile the first example with the /RTC1 and /RTCs options, we will find at the end of the function a call to the function @_RTC_CheckStackVars@8 which checks that the "canary" is still intact.

Now let's see how it is done in GCC:


#ifdef __GNUC__
#include <alloca.h>   // GCC: use alloca from alloca.h
#else
#include <malloc.h>   // MSVC: use alloca from malloc.h
#endif
#include <stdio.h>

void f() { char buf = (char)alloca(600); // allocate 600 bytes on the stack dynamically

#ifdef GNUC snprintf(buf, 600, “hi! %d, %d, %d\n”, 1, 2, 3); // GCC version #else _snprintf(buf, 600, “hi! %d, %d, %d\n”, 1, 2, 3); // MSVC version #endif

puts(buf); // print the result

}

Without any extra options, GCC 4.7.3 inserts canary checking code automatically:

Listing 1.235: GCC 4.7.3


.LC0: .string "hi! %d, %d, %d\n"   ; format string constant

f: push ebp ; save base pointer mov ebp, esp ; set up stack frame push ebx ; save EBX sub esp, 676 ; allocate stack space (600 + alignment + canary) lea ebx, [esp+39] ; EBX = aligned buffer start address and ebx, -16 ; align buffer to 16-byte boundary

mov DWORD PTR [esp+20], 3 ; push argument 3
mov DWORD PTR [esp+16], 2 ; push argument 2
mov DWORD PTR [esp+12], 1 ; push argument 1
mov DWORD PTR [esp+8], OFFSET FLAT:.LC0 ; push format string address
mov DWORD PTR [esp+4], 600 ; push buffer size
mov DWORD PTR [esp], ebx ; push buffer pointer
mov eax, DWORD PTR gs:20 ; ← load canary value from thread-local storage (TLS)
mov DWORD PTR [ebp-12], eax ; save canary onto stack
xor eax, eax ; clear EAX (security: don&#39;t leave canary in register)
call _snprintf ; call snprintf(buf, 600, format, 1, 2, 3)
mov DWORD PTR [esp], ebx ; push buffer pointer
call puts ; call puts(buf)
mov eax, DWORD PTR [ebp-12] ; load canary from stack
xor eax, DWORD PTR gs:20 ; XOR with original canary → result should be 0 if unchanged
jne .L5 ; if not zero (canary was modified), jump to fail handler
mov ebx, DWORD PTR [ebp-4] ; restore EBX
leave ; restore stack frame
ret ; return normally

.L5: call __stack_chk_fail ; canary was corrupted → terminate program with error

The random value lives in gs:20. It is written onto the stack, and then at the end of the function the value on the stack is compared with the original canary in gs:20. If they are not equal, __stack_chk_fail is called.

In the console (Ubuntu 13.04 x86) it will show something like this:


*** buffer overflow detected ***: ./2_1 terminated
======= Backtrace: =========
/lib/i386-linux-gnu/libc.so.6(__fortify_fail+0x63)[0xb7699bc3]
...
Aborted (core dumped)

  

In short: in Linux, the gs register always points to TLS (Thread-Local Storage — information specific to the current thread). In win32, the fs register plays the same role and points to TIB.


Optimizing Xcode 4.6.3 (LLVM) (Thumb-2 mode)

Now let's see how LLVM checks the canary:


_main
; local variable offsets:
var_64 = -0x64  ; array element [0]
var_60 = -0x60  ; array element [1]
var_5C = -0x5C  ; array element [2]
; ... (more array elements)
canary = -0x14  ; canary value slot on stack
var_10 = -0x10  ; saved R8

PUSH {R4-R7,LR} ; save registers and link register ADD R7, SP, #0xC ; set frame pointer STR.W R8, [SP,#0xC+var_10]! ; save R8 on stack SUB SP, SP, #0x54 ; allocate local stack space

MOVW R0, #aObjc_methtype ; load address of canary source (TLS pointer) MOVS R2, #0 MOVT.W R0, #0 MOVS R5, #0 ADD R0, PC LDR.W R8, [R0] ; R8 = pointer to canary source LDR.W R0, [R8] ; R0 = canary value STR R0, [SP,#0x64+canary] ; save canary onto stack

; … (array is filled here with unrolled writes — no loop, each element written individually)

; === canary check at end of function === LDR.W R0, [R8] ; reload original canary value LDR R1, [SP,#0x64+canary] ; load canary copy from stack CMP R0, R1 ; compare: are they still equal? ITTTT EQ ; if-then block: execute next 4 instructions only if EQ is true MOVEQ R0, #0 ; set return value to 0 ADDEQ SP, SP, #0x54 ; deallocate local stack LDREQ.W R8, [SP+0x64+var_64],#4 ; restore R8 POPEQ {R4-R7,PC} ; restore registers and return normally BLX ___stack_chk_fail ; canary mismatch → call stack check fail (terminates program)

The first thing we notice: LLVM unrolled the loop and wrote every value into the array one by one, because it calculated that this is faster.

At the end of the function we see the canary comparison (the one on the stack and the one in R8). If they are equal → execute the 4-instruction block (ITTTT EQ) and exit. If they are not equal → go to ___stack_chk_fail which will terminate the program.


1.26.4 One more word about arrays

onemorewordaboutarrays

Now we understand why it is impossible to write something like this in C/C++ code:


void f(int size) {
    int a[size]; // ERROR in standard C/C++: size must be known at compile time
    ...
};

  

This is because the compiler must know the exact size of the array at compile time, in order to be able to allocate its space on the local stack.

If you need an array with a variable (runtime) size, you must allocate it with malloc(), and then use the allocated block as an array of the desired type:


int *a = (int*)malloc(size * sizeof(int)); // allocate size*4 bytes on the Heap at runtime
// now you can use a[0] through a[size-1] normally
// at the end you must call:
free(a); // release the allocated memory back to the heap

  

The space is allocated on the Heap (not the stack), and the size can be determined at runtime.

Alternatively, you can use a C99 feature (present in the ISO/IEC 9899:TC3 standard):


void f(int size) {
    int a[size]; // Variable Length Array (VLA) — valid in C99 and supported by GCC/Clang
    // works internally like alloca() — allocates on the stack
    // NOT supported by MSVC
    ...
}

  

This works internally like alloca() (allocates on the stack), but it is not available in all compilers (MSVC does not support it).

It is also possible to use Garbage Collection libraries for C, or Smart Pointer libraries in C++.

Share

If this article helped you, please share it with others!

CH1.26(part1) Arrays
https://v3nn00m.github.io/posts/re4b/chapter1_26_part1/
Author
0xV3n0m
Published at
2026-03-18

Some information may be outdated

Table of Contents