Reverse Engineering for Beginners : 1.12 scanf()(CH1.11) {Part1}

scanf()


Let's start with a simple example that uses scanf():

C

#include <stdio.h>    // Include the standard I/O header

int main()                // Program entry point
{
    int x;                // Declare an integer variable x
    printf("Enter X:\n"); // Print prompt for user input
    scanf("%d", &x);      // Read integer input from user and store in x (using address of x)
    printf("You entered %d...\n", x); // Print the entered value
    return 0;             // Return success
}

  

The author notes that using scanf() for user interaction is not a smart choice these days. Nevertheless, it is a good way to illustrate how a pointer to a variable of type int is passed to a function.

About pointers


The author explains that pointers are one of the fundamental concepts in computer science. When you have large data (such as arrays or objects), passing a copy to another function costs time and space. If you pass only the address, you save both, and the function can access the data directly.

He gives an example: if you are going to print a string on the console, it is much easier to pass its address to the OS kernel. Likewise, if the called function needs to modify a large array or structure that it received as a parameter and then return the entire structure, the situation becomes almost absurd. The simplest thing to do is to pass the address of the array or structure to the called function and let it change whatever it needs to change.

If that is still unclear, here is a small example I wrote to make it concrete:

C

#include <stdio.h>    // Include the standard I/O header

void changeValue(int *p) {  // Function that takes a pointer to an int
    *p = 20;                // Dereference the pointer and set the value at that address to 20
}

int main() {                // Program entry point
    int x = 5;              // Declare and initialize x to 5
    printf("Before change: %d\n", x);  // Print the value before modification

    // Pass the address of x to the function
    changeValue(&x);        // &x gets the address of x

    printf("After change: %d\n", x);   // Print the value after modification
    return 0;               // Return success
}

  

Let's go through what happens here step by step:
In main we have a variable x with the value 5, and we print it. Then we pass the address of x to changeValue using &x. Inside changeValue, the pointer is dereferenced to change the value stored at that address, so x changes from 5 to 20. In other words, instead of passing the value of x to another function, we pass its address, which lets the function change the value directly in memory without having to return the modified value.

In x86, an address is represented as a 32-bit number (4 bytes), and in x86-64 as a 64-bit number (8 bytes). By the way, this is why some people are annoyed by the move to x86-64: every pointer on x64 takes twice as much space, including in cache memory, which is "expensive" memory.

We can also work with untyped pointers, with a little extra effort.

First, what are untyped pointers? They are pointers that are not tied to a specific data type (void* in C), so you do not specify the type of the data they point to. The extra effort is that such a pointer has to be converted to a typed pointer before it can be dereferenced.

A standard C example is memcpy(), which copies a block from one place in memory to another. Its two pointer arguments are of type void* because it is impossible to predict the type of the data you want to copy. The data type is not important; only the size of the block matters.

To make this concrete, here is a small program that uses memcpy():

C

#include <stdio.h>    // Include the standard I/O header
#include <string.h>  // Include the string header for memcpy

int main() {                // Program entry point
    int source[] = {1, 2, 3, 4, 5};  // Array of integers as source
    int destination[5];     // Destination array to copy into

    // Use memcpy to copy data from source to destination
    memcpy(destination, source, sizeof(source));  // Copy the entire size of source array

    // Print the destination after copying
    printf("Values in destination: ");  // Print label
    for (int i = 0; i < 5; i++) {       // Loop through the array
        printf("%d ", destination[i]);  // Print each element
    }
    printf("\n");               // New line

    return 0;                   // Return success
}

  

Here memcpy() takes 3 arguments:

  • destination - the place the data is copied to.
  • source - the place the data is copied from.
  • size - how much data to copy (number of bytes).

The void* pointers: memcpy() takes pointers of type void* so that it can copy any type of data, not only numbers.

Pointers are also widely used when a function needs to return more than one value; more on this later.

The scanf() function is an example of this case.

Besides reporting how many values were read successfully, it also needs to return all of those values.

In C/C++ the pointer type is required only for compile-time type checking.

Inside the compiled code there is no information about pointer types at all.

x86

X86 Scanf()

Here is what we get after compiling with MSVC 2010:

Assembly

; ================================
;   x86 MSVC 2010 Output
; ================================

CONST SEGMENT                          ; Start of constant data segment
    $SG3831 DB 'Enter X:', 0Ah, 00h    ; Format string "Enter X:\n" (null-terminated)
    $SG3832 DB '%d', 00h               ; Format string "%d" for scanf (null-terminated)
    $SG3833 DB 'You entered %d...', 0Ah, 00h  ; Format string "You entered %d...\n" (null-terminated)
CONST ENDS                             ; End of constant data segment

PUBLIC _main                           ; Declare main as public
EXTRN _scanf:PROC                      ; External declaration for scanf function
EXTRN _printf:PROC                     ; External declaration for printf function

; Function compile flags: /Odtp         ; Compiler flags (debug info, etc.)

_TEXT SEGMENT                         ; Start of code segment

_x$ = -4                               ; Define macro for local variable x offset (-4 from EBP)

_main PROC                             ; Start of main procedure
    push ebp                           ; Save old EBP (base pointer) on stack
    mov  ebp, esp                      ; Set EBP to current ESP (new stack frame)

    push ecx                           ; Reserve 4 bytes on stack for local var x (not saving ECX, no pop later)

    ; printf("Enter X:\n");
    push OFFSET $SG3831                ; Push address of "Enter X:\n" string
    call _printf                       ; Call printf
    add  esp, 4                        ; Clean stack (1 argument * 4 bytes)

    ; scanf("%d", &x)
    lea  eax, [ebp+_x$]                ; Load effective address of x (EBP - 4) into EAX
    push eax                           ; Push address of x (2nd argument for scanf)
    push OFFSET $SG3832                ; Push address of "%d" string (1st argument)
    call _scanf                        ; Call scanf
    add  esp, 8                        ; Clean stack (2 arguments * 4 bytes)

    ; printf("You entered %d...\n", x)
    mov  ecx, [ebp+_x$]                ; Load value of x (from EBP - 4) into ECX
    push ecx                           ; Push value of x (2nd argument)
    push OFFSET $SG3833                ; Push address of "You entered %d...\n" (1st argument)
    call _printf                       ; Call printf
    add  esp, 8                        ; Clean stack (2 arguments * 4 bytes)

    ; return 0
    xor eax, eax                       ; Set EAX to 0 (return value)
    mov esp, ebp                       ; Restore ESP from EBP (clean locals)
    pop ebp                            ; Restore old EBP
    ret 0                              ; Return from function
_main ENDP                             ; End of main procedure

_TEXT ENDS                             ; End of code segment

  

Here x is a local variable.

According to the C/C++ standard, it must be visible only inside this function and not from any external scope.

Traditionally, local variables are stored on the stack. There are other possible ways to store them, but in x86 this is how it is done.

Look at this instruction:

Assembly

push ecx                               ; Reserve 4 bytes on stack for local variable x (decrements ESP by 4; not saving ECX since no pop ecx later)

  

This instruction allocates 4 bytes on the stack for the variable x. It is not there to save the state of ECX: note that there is no matching POP ECX at the end of the function.

The variable x is accessed with the help of the _x$ macro (which equals -4) and the EBP register, which points to the current frame.

Throughout the execution of the function, EBP points to the current stack frame, which makes it possible to access local variables and function arguments via EBP+offset.

ESP could be used for the same purpose, but that is not very convenient because it changes frequently. The value of EBP can be seen as a "frozen" copy of the value ESP had at the start of the function's execution.

Here is a typical stack frame layout in a 32-bit environment:

Stack Frame


---------------------------------------------------------------
| EBP-8                  |   local variable #2  (var_8)        |  ; Example slot for another local variable (8 bytes below EBP)
---------------------------------------------------------------
| EBP-4                  |   local variable #1  (x = var_4)    |  ; Slot for variable x (4 bytes below EBP)
---------------------------------------------------------------
| EBP                    |   saved EBP                         |  ; Saved previous EBP value (at EBP+0)
---------------------------------------------------------------
| EBP+4                  |   return address                    |  ; Return address after function call
---------------------------------------------------------------
| EBP+8                  |   argument #1 (arg_0)               |  ; First function argument
---------------------------------------------------------------
| EBP+0xC                |   argument #2 (arg_4)               |  ; Second function argument
---------------------------------------------------------------
| EBP+0x10               |   argument #3 (arg_8)               |  ; Third function argument
---------------------------------------------------------------
| ...                    |  ...                                |  ; More arguments or stack space
---------------------------------------------------------------

  

In our example, scanf() has 2 arguments.

The first is a pointer to the string containing %d, and the second is the address of the variable x.

First, the address of x is loaded into the EAX register by this instruction:

lea eax, DWORD PTR _x$[ebp]

LEA stands for load effective address, and it is often used to form an address.

In this case, LEA simply stores the sum of the value of EBP and the _x$ macro in EAX.

This is the same as:

lea eax, [ebp-4]

That is, it subtracts 4 from the value of EBP and places the result in EAX.

Then the value of EAX is pushed onto the stack and scanf() is called.

After that, printf() is called with its first argument, a pointer to the string:

"You entered %d...\n"

The second argument is prepared with:

mov ecx, [ebp-4]

This instruction puts the value of x, not its address, into ECX.

Then the value of ECX is pushed onto the stack and the last printf() is called.

MSVC + OllyDbg


The author runs this example in OllyDbg, but I did it in x32dbg, so I will walk through it there in detail.

After writing the C code, compile it with this command:

Terminal

cl /Od /Zi test.c

  

Here is what the options mean:

  • /Od → disables optimization
  • /Zi → generates debug information (symbols)

This makes the generated code match the listing in the book as closely as possible.

Open the executable in x32dbg. At first you will notice that execution is paused inside ntdll.dll.

Go to the Symbols tab and search for main; this brings up the code we are interested in.

Set a breakpoint on PUSH EBP by selecting it and pressing F2.

Then press F9 (Run) until execution stops at the breakpoint.

The instruction push test.6DD000 pushes the address of the string Enter X:\n.

The instruction lea eax, [ebp-4] is the step where the CPU computes the address of the variable x and puts it in EAX.

The instruction call test.6511c2 is where you type your value in the console; let's say I enter 123.

Once we reach the instruction mov ecx,dword ptr ss:[ebp-4], the stack slot for x contains

the value 0x7B, which is 123 in hexadecimal.

GCC


Let's try compiling this code with GCC 4.4.1 under Linux:

Assembly (GCC 4.4.1 x86)

main proc near

    var_20 = dword ptr -20h    ; temporary space for arguments
    var_1C = dword ptr -1Ch    ; second argument
    var_4  = dword ptr -4      ; local variable x (our input)

    push    ebp                ; save old base pointer
    mov     ebp, esp           ; set up new stack frame
    and     esp, 0FFFFFFF0h    ; align stack to 16-byte boundary
    sub     esp, 20h           ; allocate 32 bytes on stack

    ; printf("Enter X:\n") -> replaced by puts("Enter X:"), which appends the newline itself
    mov     [esp+20h+var_20], offset aEnterX   ; address of "Enter X:"
    call    _puts              ; puts is simpler than printf when no formatting is needed

    ; scanf("%d", &x)
    mov     eax, offset aD                     ; "%d"
    lea     edx, [esp+20h+var_4]               ; address of x (on stack)
    mov     [esp+20h+var_1C], edx              ; second argument: &x
    mov     [esp+20h+var_20], eax              ; first argument: "%d"
    call    ___isoc99_scanf                    ; call scanf

    ; printf("You entered %d...\n", x)
    mov     edx, [esp+20h+var_4]               ; load the value user entered
    mov     eax, offset aYouEnteredD___        ; "You entered %d...\n"
    mov     [esp+20h+var_1C], edx              ; second argument: user's number
    mov     [esp+20h+var_20], eax              ; first argument: format string
    call    _printf                            ; print result

    mov     eax, 0             ; return 0
    leave                      ; equivalent to: mov esp,ebp / pop ebp
    retn
main endp

  

GCC replaced the printf("Enter X:\n") call with a call to puts(). The reason was explained earlier: puts() is lighter, faster, and simpler than printf() when no formatting is needed.

As in the MSVC example, the arguments are passed on the stack, but here they are written with MOV into space that was already reserved, instead of being pushed with PUSH.

This simple example is a great demonstration of the fact that the compiler translates a sequence of expressions in a C/C++ block into a sequential list of machine instructions. There is nothing between the expressions in C/C++, and so there is nothing between them in the resulting machine code either. Control flow simply passes from one expression to the next.


MSVC 2012 x64

Assembly (MSVC 2012 x64)

_DATA SEGMENT
$SG1289 DB 'Enter X:', 0aH, 00H              ; "Enter X:\n"
$SG1291 DB '%d', 00H                        ; "%d"
$SG1292 DB 'You entered %d...', 0aH, 00H    ; "You entered %d...\n"
_DATA ENDS

_TEXT SEGMENT
x$ = 32                                      ; local variable x at [rsp+32]

main PROC
$LN3:
    sub     rsp, 56                          ; allocate shadow space + locals (56 bytes)
    lea     rcx, OFFSET FLAT:$SG1289         ; first argument: "Enter X:\n"
    call    printf                           ; print prompt

    lea     rdx, QWORD PTR x$[rsp]           ; second argument: address of x
    lea     rcx, OFFSET FLAT:$SG1291         ; first argument: "%d"
    call    scanf                            ; read integer from user

    mov     edx, DWORD PTR x$[rsp]           ; load the entered value
    lea     rcx, OFFSET FLAT:$SG1292         ; first argument: result string
    call    printf                           ; print "You entered ..."

    xor     eax, eax                         ; return 0
    add     rsp, 56                          ; deallocate stack space
    ret     0
main ENDP
_TEXT ENDS

  

Optimizing GCC 4.4.6 x64

Assembly (GCC 4.4.6 x64)

.LC0:
    .string "Enter X:"                       ; prompt string
.LC1:
    .string "%d"                             ; scanf format
.LC2:
    .string "You entered %d...\n"            ; output format

main:
    sub     rsp, 24                          ; allocate 24 bytes (aligned)
    mov     edi, OFFSET FLAT:.LC0            ; argument: "Enter X:"
    call    puts                             ; optimized from printf

    lea     rsi, [rsp+12]                    ; address of x (on stack)
    mov     edi, OFFSET FLAT:.LC1            ; "%d"
    xor     eax, eax                         ; clear AL (no floating-point args)
    call    __isoc99_scanf                   ; read input

    mov     esi, DWORD PTR [rsp+12]          ; load entered value into ESI
    mov     edi, OFFSET FLAT:.LC2            ; format string
    xor     eax, eax                         ; clear AL again
    call    printf                           ; print result

    xor     eax, eax                         ; return 0
    add     rsp, 24                          ; restore stack
    ret

  

ARM: Optimizing Keil 6/2013 (Thumb mode)

ARM Thumb (Keil)

.text:00000042                 scanf_main
.text:00000042 var_8           = -8

.text:00000042 08 B5           PUSH    {R3,LR}              ; save LR and reserve space
.text:00000044 A9 A0           ADR     R0, aEnterX          ; "Enter X:\n"
.text:00000046 06 F0 D3 F8     BL      __2printf            ; print prompt
.text:0000004A 69 46           MOV     R1, SP               ; R1 = address of x (SP points to free space)
.text:0000004C AA A0           ADR     R0, aD               ; "%d"
.text:0000004E 06 F0 CD F8     BL      __0scanf             ; read integer
.text:00000052 00 99           LDR     R1, [SP,#8+var_8]    ; load entered value from stack
.text:00000054 A9 A0           ADR     R0, aYouEnteredD___  ; "You entered %d...\n"
.text:00000056 06 F0 CB F8     BL      __2printf            ; print result
.text:0000005A 00 20           MOVS    R0, #0               ; return 0
.text:0000005C 08 BD           POP     {R3,PC}              ; restore and return

  

For scanf() to read the input, it needs a pointer to an int, and since int is 32-bit, only 4 bytes of memory are needed. The value would fit exactly in a 32-bit register, but scanf() needs an address, so the local variable x is placed on the stack (IDA named it var_8).

There was no need to allocate space explicitly: PUSH {R3,LR} already did it. R3 does not need to be preserved here; it is pushed only to move SP down by an extra 4 bytes, and after the PUSH, SP points at that slot. That is why MOV R1, SP is used: it passes the address of that slot, which serves as x, to scanf().

Note: PUSH and POP in ARM are synonyms for:

  • STMDB (Store Multiple Decrement Before)
  • LDMIA (Load Multiple Increment After)
PUSH decrements SP first, then writes the registers.
POP reads the registers first, then increments SP. So after PUSH {R3,LR}, SP points to the slot holding the saved R3, whose contents are not needed, and scanf() can safely overwrite it. Then LDR loads the value back from the stack into R1 to pass it to printf().


ARM64: Non-optimizing GCC 4.9.1

ARM64 (aarch64)

.LC0:
    .string "Enter X:"
.LC1:
    .string "%d"
.LC2:
    .string "You entered %d...\n"

scanf_main:
    stp     x29, x30, [sp, -32]!     ; save frame pointer and link register
    add     x29, sp, 0               ; set up frame pointer

    adrp    x0, .LC0
    add     x0, x0, :lo12:.LC0
    bl      puts                     ; print "Enter X:"

    adrp    x0, .LC1
    add     x0, x0, :lo12:.LC1       ; format "%d"

    add     x1, x29, 28              ; address of x = FP + 28 (why 28? see below)
    bl      __isoc99_scanf           ; read input

    ldr     w1, [x29,28]             ; load the 32-bit value user entered

    adrp    x0, .LC2
    add     x0, x0, :lo12:.LC2
    bl      printf                   ; print result

    mov     w0, 0                    ; return 0
    ldp     x29, x30, [sp], 32       ; restore FP/LR and deallocate 32 bytes
    ret

  

Here the compiler allocated 32 bytes for the stack frame, even though only 4 bytes are needed for x (plus 16 for the saved x29 and x30), most likely for alignment reasons.

The most important part is where the variable x is stored (the line with add x1, x29, 28). Why 28? The compiler decided to place the variable at the end of the stack frame instead of the beginning. The address is passed to scanf(), which stores the user input there. Then the 32-bit int value is loaded back by ldr w1, [x29,28] and passed to printf().

1.12.2 The classic mistake


A very common mistake (or typo) is to pass the value of x instead of a pointer to x:

C

#include <stdio.h>    // Standard I/O header for printf and scanf

int main()            // Program entry point
{
    int x;            // Uninitialized: x holds whatever was on the stack
    printf("Enter X:\n");             // Print the prompt
    scanf("%d", x);   // BUG (intentional): passes the value of x instead of &x; scanf expects a pointer
    printf("You entered %d...\n", x); // x was never written by scanf
    return 0;         // Return success
}

  

Here is what happens.

x is not initialized and contains some random noise (garbage) from the local stack.

When scanf() is called, it takes the string typed by the user, converts it to a number, and tries to write that number into x, but it treats the value of x as an address in memory.

At that point, a crash will most likely occur.

Let me put it more simply.

Suppose x happens to contain a garbage value like this:

C

x = 0x41414141; // assume x holds this uninitialized garbage value

scanf() will then interpret this as a memory address and try to write there.

Where does it write?

To 0x41414141.

This is most likely an unmapped, reserved, or otherwise inaccessible address, so the program crashes.

Conveniently, some CRT libraries in debug builds fill memory that has just been allocated with a distinctive pattern, such as 0xCCCCCCCC or 0x0BADF00D. In that case, x may contain 0xCCCCCCCC, and scanf() will try to write to the address 0xCCCCCCCC.

So if you notice something in the process trying to write to 0xCCCCCCCC, you know that a variable (or pointer) was used before it was initialized. This is better than having newly allocated memory filled with zeros.

This post is licensed under CC BY 4.0 by the author.
