Reverse Engineering for Beginners : More about results returning

1.15 More about results returning

The author explains that on x86 the result of a function is usually returned in the EAX register. If the result is a byte or a char, only the lowest part of EAX, the AL register, is used. If the function returns a float, the FPU register ST(0) is used instead. On ARM, the result is usually returned in the R0 register.
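
As a rough illustration, the three functions below each return a different type. The function names are invented for this sketch, and the comments describe where a typical 32-bit x86 compiler would usually leave the result; the exact register is decided by the compiler and the calling convention, not by the C source.

```c
/* Hypothetical helpers, named only for this illustration. */

/* A char-sized result usually comes back in AL, the low byte of EAX. */
char first_letter(const char *s)
{
    return s[0];
}

/* An int-sized result usually comes back in EAX (R0 on ARM). */
int add_one(int x)
{
    return x + 1;
}

/* A floating-point result usually comes back in ST(0) on 32-bit x86. */
double half(double x)
{
    return x / 2.0;
}
```

Compiling such a file to assembly (for example with an option like -S in GCC) is one way to check which register a particular compiler actually uses.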

1.15.1 Attempt to use the result of a function returning void

So what happens if main() is declared as returning void instead of int?

The so-called startup-code calls main() approximately like this:

nasm

push envp ; push the environment pointer onto the stack
push argv ; push the argument vector onto the stack
push argc ; push the argument count onto the stack
call main ; call the main function
push eax ; push the return value from main (in EAX) onto the stack
call exit ; call the exit function with the pushed value

  

In other words:

C

exit(main(argc, argv, envp)); // call exit with the return value of main as argument

  

If you write void main() instead of int main(), what happens?

  • void main() means that main() does not explicitly return a value. Nothing sets EAX on purpose, so it still holds whatever earlier instructions left there.
  • When the startup code executes push eax after call main, that leftover value is passed to exit(). The exit code is therefore not meaningful: it is usually the return value of the last function that was called (such as puts() or printf()).
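
The startup sequence described above can be sketched in plain C. This is only a model under stated assumptions: fake_startup, fake_exit, app_main and last_status are hypothetical names, and fake_exit records the status instead of ending the process so that the value can be inspected.

```c
#include <stddef.h>

/* Stand-in for exit(): records the status instead of ending the process. */
static int recorded_status = -1;

static void fake_exit(int status)
{
    recorded_status = status;
}

/* Stand-in for the program's main(); it returns a well-defined value. */
static int app_main(int argc, char **argv, char **envp)
{
    (void)argc; (void)argv; (void)envp;
    return 3;
}

/* Mirrors exit(main(argc, argv, envp)) from the startup code. */
void fake_startup(int argc, char **argv, char **envp)
{
    fake_exit(app_main(argc, argv, envp));
}

/* Lets a caller inspect what fake_exit() received. */
int last_status(void)
{
    return recorded_status;
}
```

Because app_main is declared as returning int, the value handed to fake_exit is well defined. With a void main() there is no such guarantee, which is the situation discussed here.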

We can illustrate this with code like this:

C

#include <stdio.h> // include standard I/O header
void main() // main function declared as void, no return value
{
    printf("Hello, world!\n"); // print "Hello, world!" followed by newline
}

  

GCC here might replace printf with puts.

  • puts() returns a non-negative value in EAX on success; here that value is the number of characters it printed. Since main() does not set EAX afterwards, the value is still there when main() returns.

nasm

.LC0: ; label for the string
.string "Hello, world!" ; define the string "Hello, world!"
main: ; start of main function
      push ebp ; save base pointer
      mov ebp, esp ; set base pointer to stack pointer
      and esp, -16 ; align stack to 16-byte boundary
      sub esp, 16 ; allocate 16 bytes on stack
      mov DWORD PTR [esp], OFFSET FLAT:.LC0 ; store string address on stack
      call puts ; call puts to print the string, result is left in EAX
      leave ; restore base and stack pointers
      ret ; return from function, EAX is untouched since puts
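
The value that puts() leaves behind can also be looked at directly from C. In the minimal sketch below, greeting_result is a made-up name. The C standard only promises a non-negative result on success (and EOF on failure), so the exact number depends on the C library in use.

```c
#include <stdio.h>

/* Returns whatever puts() reports after printing the greeting. */
int greeting_result(void)
{
    return puts("Hello, world!");
}
```

Printing the result of greeting_result() is a quick way to see which value a given C library would leak into the exit code in the void main() case.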

  

We write a bash script that displays the exit status:

Listing 1.101: tst.sh

bash

#!/bin/sh
./hello_world # run the hello_world executable
echo $? # print the exit status of the previous command

  

And we run it:

$ tst.sh
Hello, world!
14

14 is the number of characters that were printed: 13 visible characters plus the newline.

The character count returned by printf() (or puts()) leaked through EAX/RAX and ended up as the "exit code".
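
What tst.sh does with echo $? can be approximated in C as well. The sketch below assumes a POSIX system, where system() and the WIFEXITED/WEXITSTATUS macros from <sys/wait.h> are available; exit_status_of is a hypothetical helper name.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Runs a shell command and returns its exit status, or -1 on failure. */
int exit_status_of(const char *command)
{
    int status = system(command);

    /* system() failed, or the command did not terminate normally. */
    if (status == -1 || !WIFEXITED(status))
        return -1;

    /* Extract the exit code, the same value the shell exposes as $?. */
    return WEXITSTATUS(status);
}
```

Calling exit_status_of("./hello_world") should report the same value that echo $? prints in the script.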

By the way, when we decompile C++ with Hex-Rays, sometimes we encounter a function that ends with a class destructor:

nasm

...
call ??1CString@@QAE@XZ ; CString::~CString(void), the CString destructor
mov ecx, [esp+30h+var_C] ; load a saved value from the stack into ECX
pop edi ; restore EDI
pop ebx ; restore EBX
mov large fs:0, ecx ; store ECX at FS:0 (thread information block)
add esp, 28h ; release 28h bytes of stack
retn ; return to the caller

  

According to the C++ standard, a destructor does not return anything. But when Hex-Rays does not know this and assumes that both the destructor and the enclosing function return int, the output looks like this:

C

...
return CString::~CString(&Str); // Hex-Rays mistakenly shows destructor as returning value
}

  

Put more plainly: the function simply returns to its caller, and EAX still holds whatever the destructor left in it. Hex-Rays assumes that this leftover is a real return value, although nothing meaningful is returned.

1.15.3 Returning a structure

The author then returns to the fact that the return value is left in the EAX register.

That is why old C compilers could not create functions that return something that does not fit in one register (usually an int).

If a function needs to return something bigger, it has to write the data through pointers passed as arguments.

So it is very common for a function to return only one value directly and everything else through pointers.
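
A classic shape for this pattern is a function with one direct result and one result written through a pointer. The sketch below is only an illustration; divide and its parameter names are made up here.

```c
/* Returns the quotient directly and the remainder through a pointer.
   Assumes divisor is not zero and remainder points to valid storage. */
int divide(int dividend, int divisor, int *remainder)
{
    *remainder = dividend % divisor;  /* second result, via the pointer */
    return dividend / divisor;        /* first result, via the register */
}
```

The caller declares an int, passes its address as the third argument, and reads it after the call.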

Returning a whole struct is possible now, but it is still not widely used.

If a function has to return a large struct, the caller must allocate it and pass a pointer to it as the first argument. All of this happens transparently to the programmer.

It is the same as passing a pointer in the first argument by hand, except that the compiler hides it.
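
Written by hand, the "pointer in the first argument" version could look like the sketch below. It uses a three-field struct like the one in the example that follows; get_some_values_by_pointer is a hypothetical name chosen for this illustration.

```c
struct s {
    int a;
    int b;
    int c;
};

/* Hand-written equivalent: the caller passes the destination explicitly
   and the function fills it in instead of returning a struct. */
void get_some_values_by_pointer(struct s *out, int a)
{
    out->a = a + 1;
    out->b = a + 2;
    out->c = a + 3;
}
```

The caller would declare a struct s variable and pass its address, which is the allocation step that the compiler otherwise performs behind the scenes.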

A small example:

C

struct s { // define structure s
    int a; // field a
    int b; // field b
    int c; // field c
};

struct s get_some_values(int a) // function that returns struct s
{
    struct s rt; // local struct rt
    rt.a = a+1; // set rt.a to a+1
    rt.b = a+2; // set rt.b to a+2
    rt.c = a+3; // set rt.c to a+3
    return rt; // return the struct
}

  

What we got (MSVC 2010 /Ox):

Assembly

$T3853 = 8 ; size = 4, hidden pointer to the struct to fill
_a$ = 12 ; size = 4, parameter a
?get_some_values@@YA?AUs@@H@Z PROC ; get_some_values, start of function
mov ecx, DWORD PTR _a$[esp-4] ; load a into ECX
mov eax, DWORD PTR $T3853[esp-4] ; load the struct pointer into EAX
lea edx, DWORD PTR [ecx+1] ; EDX = a+1
mov DWORD PTR [eax], edx ; store a+1 in field a
lea edx, DWORD PTR [ecx+2] ; EDX = a+2
add ecx, 3 ; ECX = a+3
mov DWORD PTR [eax+4], edx ; store a+2 in field b
mov DWORD PTR [eax+8], ecx ; store a+3 in field c
ret 0 ; return, EAX still holds the struct pointer
?get_some_values@@YA?AUs@@H@Z ENDP ; get_some_values, end of function

  

The macro name that the compiler uses internally for the pointer to the struct is $T3853.

We can write the same example using a C99 compound literal with designated initializers:

C

struct s { // define structure s
    int a; // field a
    int b; // field b
    int c; // field c
};

struct s get_some_values(int a) // function that returns struct s
{
    return (struct s){.a=a+1, .b=a+2, .c=a+3}; // return initialized struct
}

  
What we got (GCC 4.8.1):

Assembly

_get_some_values proc near ; start of function
ptr_to_struct = dword ptr 4 ; hidden pointer to the struct to fill
a = dword ptr 8 ; parameter a
mov edx, [esp+a] ; load a into EDX
mov eax, [esp+ptr_to_struct] ; load the struct pointer into EAX
lea ecx, [edx+1] ; ECX = a+1
mov [eax], ecx ; store a+1 in field a
lea ecx, [edx+2] ; ECX = a+2
add edx, 3 ; EDX = a+3
mov [eax+4], ecx ; store a+2 in field b
mov [eax+8], edx ; store a+3 in field c
retn ; return, EAX still holds the struct pointer
_get_some_values endp ; end of function

  

As we see, the function simply fills the fields of a struct that the caller allocated beforehand, exactly as if a pointer to the struct had been passed as an argument.

So there is no performance penalty.

To make this part easier to follow, here is a simplified walk-through.

First, suppose the struct looks like this:

C

struct s { // define structure s
   int a; // field a
   int b; // field b
   int c; // field c
};

  

This is how it is laid out in memory:

┌─────────┐
│    a    │
├─────────┤
│    b    │
├─────────┤
│    c    │
└─────────┘
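
The layout can be checked with offsetof. The sketch below assumes a platform where int is 4 bytes, which matches the [eax], [eax+4] and [eax+8] stores in the listings; the offset_of_* helpers are hypothetical names.

```c
#include <stddef.h>

struct s {
    int a;
    int b;
    int c;
};

/* Byte offsets of each field from the start of the struct. */
size_t offset_of_a(void) { return offsetof(struct s, a); }
size_t offset_of_b(void) { return offsetof(struct s, b); }
size_t offset_of_c(void) { return offsetof(struct s, c); }
```

With a 4-byte int these are expected to be 0, 4 and 8, so the three fields sit back to back in memory.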

Before calling get_some_values(a), the caller allocates space for the struct in its own memory:

Caller memory:
┌──────────────────────────────┐
│ Empty space for the struct   │  ← the result will be written here
│ Address = 5000               │
└──────────────────────────────┘

The caller then passes the address of this space to the function as a hidden argument:

Caller
   │
   │  sends pointer = 5000
   ▼
Callee (get_some_values)

The function receives a pointer to that empty space and writes the values into it:

Address 5000:
┌─────────┐
│  a=a+1  │
├─────────┤
│  b=a+2  │
├─────────┤
│  c=a+3  │
└─────────┘

And this is the final state:

Caller memory:
┌────────────────────────────┐
│ struct at 5000:            │
│   a = a+1                  │
│   b = a+2                  │
│   c = a+3                  │
└────────────────────────────┘
              ↑
              │
   callee wrote the values here

When the function finishes, it does not return the struct itself. It returns the (hidden) pointer that the caller passed in.

So the caller finds the fully populated struct at that address:

return value ← same address 5000

Caller now sees:
a = a+1
b = a+2
c = a+3
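
The whole walk-through can be modelled in C with the hidden pointer made visible. This is a sketch of what the compiled code effectively does, not actual compiler output; get_some_values_lowered, hidden_ret and sum_of_fields are hypothetical names.

```c
struct s {
    int a;
    int b;
    int c;
};

/* Model of the compiled callee: fill the caller's buffer and hand
   the same address back as the return value. */
struct s *get_some_values_lowered(struct s *hidden_ret, int a)
{
    hidden_ret->a = a + 1;
    hidden_ret->b = a + 2;
    hidden_ret->c = a + 3;
    return hidden_ret;
}

/* Model of the caller for: struct s v = get_some_values(a); */
int sum_of_fields(int a)
{
    struct s v;                                    /* caller allocates   */
    struct s *p = get_some_values_lowered(&v, a);  /* hidden argument    */
    return p->a + p->b + p->c;                     /* p points back at v */
}
```

The pointer that comes back is the one that went in, which corresponds to "return value ← same address 5000" in the diagram.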

To summarize all of the above:

[Figure: summary flowchart]
This post is licensed under CC BY 4.0 by the author.
