Global variables
The author asks: what happens if the variable x from the previous example is global rather than local?
In that case it is accessible from anywhere in the code, not just from inside the function body.
Global variables are considered an anti-pattern, but for the sake of the experiment we can use one.
#include <stdio.h> // include the standard I/O header - this includes the library for printf and scanf
// now x is a global variable - this declares x as a global integer, accessible from anywhere
int x;
int main() // program entry point - this defines the main function
{
printf("Enter X:\n"); // print the prompt
scanf("%d", &x); // read an integer and store it in the global x
printf("You entered %d...\n", x); // print the value stored in x
return 0; // return success - this ends the program
}
MSVC: x86
_DATA SEGMENT
COMM _x:DWORD ; define global variable x as DWORD - this declares x as a common (global) 32-bit variable
$SG2456 DB 'Enter X:', 0Ah, 0 ; string "Enter X:\n" - this defines the prompt string with newline
$SG2457 DB '%d', 0 ; string "%d" - this is the format for scanf
$SG2458 DB 'You entered %d...', 0Ah, 0 ; string "You entered %d...\n" - this is the output format string
_DATA ENDS
PUBLIC _main ; make main public - this declares main as a public procedure
EXTRN _scanf:PROC ; external scanf procedure - this declares scanf as an external function
EXTRN _printf:PROC ; external printf procedure - this declares printf as an external function
_TEXT SEGMENT
_main PROC ; start of main procedure
push ebp ; save base pointer - this saves the caller's frame pointer
mov ebp, esp ; set up new frame - this sets EBP to current ESP
; printf("Enter X:\n");
push OFFSET $SG2456 ; push address of prompt - this pushes the prompt string address
call _printf ; call printf - this calls printf to print the prompt
add esp, 4 ; clean up stack - this removes the argument from stack
; scanf("%d", &x);
push OFFSET _x ; address of variable - this pushes the address of global x
push OFFSET $SG2457 ; "%d" - this pushes the format string
call _scanf ; call scanf - this calls scanf to read input
add esp, 8 ; clean up stack - this removes two arguments (8 bytes)
; printf("You entered %d...\n", x);
mov eax, DWORD PTR _x ; load x value - this loads the value of x into EAX
push eax ; push value - this pushes the value of x
push OFFSET $SG2458 ; push format - this pushes the output format string
call _printf ; call printf - this calls printf to print the result
add esp, 8 ; clean up stack - this removes two arguments
xor eax, eax ; return 0 - this sets EAX to 0
pop ebp ; restore base pointer - this restores the caller's frame pointer
ret 0 ; return - this returns from main
_main ENDP ; end of main procedure
_TEXT ENDS ; end of text segment
In this case, the variable x is defined in the _DATA segment and no memory is allocated for it on the local stack. It is accessed directly, not through the stack.
Uninitialized global variables take no space in the executable file (why allocate space for variables that start out as zero anyway?), but as soon as someone accesses their address, the OS provides a block of zeros there.
Let's now assign a value to the variable x:
int x = 10; // default value - this initializes the global x to 10
Then we get:
_DATA SEGMENT
_x DD 0aH ; initialized to 10 (0xA in hex) - this defines x as a DWORD with value 0xA (10 decimal)
...
Here we see the value 0xA of type DWORD (DD stands for DWORD = 32 bits) for the variable.
If you open the compiled .exe in IDA, you will find the variable x at the beginning of the _DATA segment, with the text strings right after it.
And if we open the .exe from the previous example in IDA, where x had no initial value, we see something like this:
.data:0040FA80 _x dd ? ; DATA XREF: _main+10 - this is x, uninitialized (?)
.data:0040FA80 ; _main+22
.data:0040FA84 dword_40FA84 dd ? ; DATA XREF: _memset+1E - another uninitialized dword
.data:0040FA84 ; unknown_libname_1+28
.data:0040FA88 dword_40FA88 dd ? ; DATA XREF: ___sbh_find_block+5 - another
.data:0040FA88 ; ___sbh_free_block+2BC
.data:0040FA8C lpMem dd ? ; DATA XREF: ___sbh_find_block+B - pointer to memory
.data:0040FA8C ; ___sbh_free_block+2CA
.data:0040FA90 dword_40FA90 dd ? ; DATA XREF: _V6_HeapAlloc+13 - another
.data:0040FA90 ; __calloc_impl+72
.data:0040FA94 dword_40FA94 dd ? ; DATA XREF: ___sbh_free_block+2FE - another
x is marked with ? like the other variables that do not need to be initialized. This means that after the .exe is loaded into memory, space is allocated for all these variables and filled with zeros (as the C99 standard requires). Inside the .exe file itself, however, the uninitialized variables take no space. This is very convenient for large arrays, for example.
MSVC: x86 + OllyDbg
The author says that things are simpler here. As before, I will use x32dbg instead of OllyDbg.
So let's start, insha'allah.
First, after writing the C code and compiling it, we open the EXE file in x32dbg.
As usual, we go to Symbols and choose the main function, and its disassembly appears.
The variable x shows up in the instruction push test._x.
To see it in the Dump and watch its value, we open Symbols, go to Search and search for _x.
We double-click it, and the Dump shows that its value is still 0, because an uninitialized global variable is zero-initialized.
Next we let scanf() run: we set a breakpoint at call test.6311c2, and the console opens automatically and asks us to enter the value of X.
After entering 123, we see in the Dump that the value of x has changed to 0x7B, which is 123 in decimal.
GCC: x86
The situation in Linux is almost the same, with the difference that uninitialized variables are placed in the _bss segment.
In an ELF file, this segment has these attributes:
; Segment type: Uninitialized
; Segment permissions: Read/Write
But if you initialize the variable with a value, for example 10, it is placed in the _data segment,
and this segment has the following attributes:
; Segment type: Pure data
; Segment permissions: Read/Write
MSVC x64
_DATA SEGMENT
COMM x:DWORD ; define global variable x as 32-bit - this declares x as a common (global) DWORD
$SG2924 DB 'Enter X:', 0aH, 00H ; string for input prompt - this defines "Enter X:\n"
$SG2925 DB '%d', 00H ; scanf format string - this is "%d"
$SG2926 DB 'You entered %d...', 0aH, 00H ; string for printing - this is "You entered %d...\n"
_DATA ENDS
_TEXT SEGMENT
main PROC ; start of main procedure
sub rsp, 40 ; prepare the stack - this allocates 40 bytes on the stack (shadow space)
lea rcx, OFFSET FLAT:$SG2924 ; printf("Enter X:") - load address of prompt into RCX
call printf ; call printf - this prints the prompt
lea rdx, OFFSET FLAT:x ; address of variable x - load address of x into RDX (second arg)
lea rcx, OFFSET FLAT:$SG2925 ; "%d" - load format into RCX (first arg)
call scanf ; scanf("%d", &x) - call scanf to read input
mov edx, DWORD PTR x ; load value of x - move x's value into EDX (second arg for printf)
lea rcx, OFFSET FLAT:$SG2926 ; "You entered..." - load format into RCX
call printf ; call printf - this prints the result
xor eax, eax ; return 0 - set EAX to 0
add rsp, 40 ; restore stack - deallocate the 40 bytes
ret 0 ; return - return from main
main ENDP ; end of main
_TEXT ENDS ; end of text segment
The code is almost the same as for x86.
Notice that the address of the variable x is passed to scanf() using the LEA instruction,
while the value of the variable is passed to the second printf() using MOV.
DWORD PTR is part of the assembly language and has nothing to do with the machine code.
It only indicates that the data size is 32 bits, so that the MOV instruction is encoded accordingly.
ARM: Optimizing Keil 6/2013 (Thumb mode)
Listing 1.79: IDA
.text:00000000 main ; start of main
PUSH {R4,LR} ; save registers - push R4 and LR onto stack
ADR R0, aEnterX ; printf("Enter X:") - load address of prompt into R0
BL __2printf ; call printf - branch with link to printf
LDR R1, =x ; load address of variable x - load x's address into R1
ADR R0, aD ; "%d" - load format into R0
BL __0scanf ; scanf("%d", &x) - call scanf
LDR R0, =x ; load address of x - into R0
LDR R1, [R0] ; load value of x - dereference to get value into R1
ADR R0, aYouEnteredD___ ; "You entered..." - load format into R0
BL __2printf ; call printf - print result
MOVS R0, #0 ; set R0 to 0 - return 0
POP {R4,PC} ; restore registers - pop R4 and PC (return)
The variable x is now global, so it is stored in another segment, namely .data.
You might ask why the text strings stay in the .text segment while x alone goes to .data.
The reason is that the value of the variable changes, so it cannot be placed in read-only memory (ROM). The strings are constant, so they are placed in the code segment itself. The code segment may reside in ROM, because we are dealing with embedded devices where memory is scarce, while changeable variables live in RAM.
It also makes no sense to keep constant data in RAM when ROM is available.
Next we find a pointer to the variable x in the code segment, and all operations on the variable go through this pointer. This is because x may be located far away from the code,
and its address must be stored close to the code because the LDR instruction has a limited range:
- in Thumb mode it can reach data within 1020 bytes of its location
- in ARM mode it can reach data within ±4095 bytes
That is why the address itself must be placed next to the code.
If the variable is declared const, the Keil compiler places it in the .constdata segment,
and the linker may then place that segment in ROM together with the code.
ARM64
Listing 1.80: Non-optimizing GCC 4.9.1 ARM64
.comm x,4,4 ; define global variable named x size 4 bytes - this declares x as a common (global) 4-byte variable with 4-byte alignment
.LC0:
.string "Enter X:" ; string - this defines "Enter X:"
.LC1:
.string "%d" ; scanf string - this is "%d"
.LC2:
.string "You entered %d...\n" ; printf string - this is "You entered %d...\n"
; ---------- main() ----------
f5: ; function label
stp x29, x30, [sp, -16]! ; save FP and LR on the stack - store pair, decrement SP by 16
add x29, sp, 0 ; FP = SP - set frame pointer
; printf("Enter X:")
adrp x0, .LC0 ; load page address of .LC0 - get high bits of label address
add x0, x0, :lo12:.LC0 ; add low 12 bits - complete address of string
bl puts ; call puts - branch with link to puts (optimized printf)
; scanf("%d", &x)
adrp x0, .LC1 ; load page address of .LC1
add x0, x0, :lo12:.LC1 ; complete "%d" address
adrp x1, x ; load page address of x - high bits of x's address
add x1, x1, :lo12:x ; complete &x
bl __isoc99_scanf ; call scanf - read input
; printf("You entered %d...", x)
adrp x0, x ; load page address of x
add x0, x0, :lo12:x ; complete address
ldr w1, [x0] ; load value of x - load 32-bit word from address into W1
adrp x0, .LC2 ; load page address of .LC2
add x0, x0, :lo12:.LC2 ; complete format address
bl printf ; call printf - print result
mov w0, 0 ; return 0 - set W0 to 0
ldp x29, x30, [sp], 16 ; restore FP and LR - load pair, increment SP by 16
ret ; return - return from function
MIPS
Uninitialized global variable
Now the variable x is global. We compile the executable file instead of object file and open it in IDA.
IDA shows the variable x in the .sbss segment of ELF (remember the Global Pointer?).
This is because the variable is not initialized at the beginning.
.text:004006C0 main: ; start of main
var_10 = -0x10 ; local variables
var_4 = -4
; ---------- Function prologue ----------
lui $gp, 0x42 ; load upper immediate for GP - set high bits of GP
addiu $sp, -0x20 ; allocate stack frame - decrement SP by 32
li $gp, 0x418940 ; set GP to specific value - complete GP address
sw $ra, 0x20+var_4($sp) ; save return address - store RA on stack
sw $gp, 0x20+var_10($sp) ; save GP on stack - store GP
; ---------- puts("Enter X:") ----------
la $t9, puts ; load address of puts - into T9
lui $a0, 0x40 ; high bits of prompt address
jalr $t9 ; call puts - jump and link register
la $a0, aEnterX ; branch delay slot - load "Enter X:" in delay slot
; ---------- scanf("%d", &x) ----------
lw $gp, 0x20+var_10($sp) ; restore GP
lui $a0, 0x40 ; high bits of "%d"
la $t9, __isoc99_scanf ; load scanf address
la $a1, x ; address of variable x - load &x
jalr $t9 ; call scanf
la $a0, aD ; branch delay slot: "%d" - load in delay slot
; ---------- printf("You entered %d...", x) ----------
lw $gp, 0x20+var_10($sp) ; restore GP
lui $a0, 0x40 ; high bits of format
la $v0, x ; address of x - into V0
la $t9, printf ; load printf address
lw $a1, (x - 0x41099C)($v0) ; load value of x - from memory using offset
jalr $t9 ; call printf
la $a0, aYouEnteredD___ ; branch delay slot - load format
; ---------- epilogue ----------
lw $ra, 0x20+var_4($sp) ; restore RA
move $v0, $zero ; return 0 - set V0 to 0
jr $ra ; return - jump to RA
addiu $sp, 0x20 ; branch delay slot - restore SP in delay slot
After the IDA listing, here is the same code from objdump, with comments added:
004006c0 main:
# -----------------------
# Function Prologue
# -----------------------
4006c0: 3c1c0042 lui gp,0x42 # load high part for Global Pointer - set upper 16 bits of GP
4006c4: 27bdffe0 addiu sp,sp,-32 # prepare stack frame (-32 bytes) - allocate 32 bytes on stack
4006c8: 279c8940 addiu gp,gp,-30400 # adjust gp to the correct point - complete GP value
4006cc: afbf001c sw ra,28(sp) # save return address - store RA
4006d0: afbc0010 sw gp,16(sp) # save gp on stack - store GP
# -----------------------
# call puts("Enter X:")
# -----------------------
4006d4: 8f998034 lw t9,-32716(gp) # load address of puts into t9 - from GOT
4006d8: 3c040040 lui a0,0x40 # high part of string address
4006dc: 0320f809 jalr t9 # call puts - jump and link
4006e0: 248408f0 addiu a0,a0,2288 # (Delay Slot) load "Enter X:" - complete address
# -----------------------
# call scanf("%d", &x)
# -----------------------
4006e4: 8fbc0010 lw gp,16(sp) # restore gp
4006e8: 3c040040 lui a0,0x40 # high part for "%d"
4006ec: 8f998038 lw t9,-32712(gp) # load scanf address
4006f0: 8f858044 lw a1,-32700(gp) # load address of variable x (the pointer) - from GOT
4006f4: 0320f809 jalr t9 # call scanf
4006f8: 248408fc addiu a0,a0,2300 # (Delay Slot) load "%d"
# -----------------------
# call printf("...", x)
# -----------------------
4006fc: 8fbc0010 lw gp,16(sp) # restore gp
400700: 3c040040 lui a0,0x40 # high part for printf string
400704: 8f828044 lw v0,-32700(gp) # load address of x - into V0
400708: 8f99803c lw t9,-32708(gp) # load printf address
40070c: 8c450000 lw a1,0(v0) # load value of x from memory - dereference
400710: 0320f809 jalr t9 # call printf
400714: 24840900 addiu a0,a0,2304 # (Delay Slot) load format printf
# -----------------------
# Function Epilogue
# -----------------------
400718: 8fbf001c lw ra,28(sp) # restore ra
40071c: 00001021 move v0,zero # return 0
400720: 03e00008 jr ra # return
400724: 27bd0020 addiu sp,sp,32 # (Delay Slot) free the stack - restore SP
# -----------------------
# Alignment NOPs
# -----------------------
400728: 00200825 move at,at # NOP - no operation
40072c: 00200825 move at,at # NOP - no operation
In the end we see that the address of the variable x is read from a 64KiB data buffer using GP with a negative offset added to it.
The addresses of the three functions (puts / scanf / printf) are also taken from the same buffer using GP.
GP points to the middle of the buffer, and the offsets we see mean that the addresses of these functions and of x are stored near the beginning of the buffer. This makes sense, because our example is tiny.
One more thing: at the end of the function there are NOPs (MOVE $AT,$AT instructions) that align the start of the next function on a 16-byte boundary.
Initialized global variable
Let's change our example by giving the variable x a default value:
int x=10; // default value - this initializes x to 10
Now IDA shows that the x variable is residing in the .data section:
Listing 1.83: Optimizing GCC 4.4.5 (IDA)
; -------------------- main --------------------
.text:004006A0 main: ; start of main
.text:004006A0 var_10 = -0x10 ; locals
.text:004006A0 var_8 = -8
.text:004006A0 var_4 = -4
.text:004006A0 lui $gp, 0x42 ; load GP high
.text:004006A4 addiu $sp, -0x20 ; allocate stack
.text:004006A8 li $gp, 0x418930 ; set GP
.text:004006AC sw $ra, 0x20+var_4($sp) ; save RA
.text:004006B0 sw $s0, 0x20+var_8($sp) ; save S0
.text:004006B4 sw $gp, 0x20+var_10($sp) ; save GP
.text:004006B8 la $t9, puts ; load puts
.text:004006BC lui $a0, 0x40 ; prompt high
.text:004006C0 jalr $t9 ; puts - call puts
.text:004006C4 la $a0, aEnterX ; "Enter X:" - load prompt
.text:004006C8 lw $gp, 0x20+var_10($sp) ; restore GP
; --- prepare high part of x address ---
.text:004006CC lui $s0, 0x41 ; high part into S0
.text:004006D0 la $t9, __isoc99_scanf ; load scanf
.text:004006D4 lui $a0, 0x40 ; format high
; --- add low part of x address ---
.text:004006D8 addiu $a1, $s0, (x - 0x410000) ; complete &x into A1
; now x address is in $a1
.text:004006DC jalr $t9 ; scanf - call scanf
.text:004006E0 la $a0, aD ; "%d" - load format
.text:004006E4 lw $gp, 0x20+var_10($sp) ; restore GP
; --- load x value from memory ---
.text:004006E8 lw $a1, x ; a1 = value of x - load x
.text:004006EC la $t9, printf ; load printf
.text:004006F0 lui $a0, 0x40 ; format high
.text:004006F4 jalr $t9 ; printf - call printf
.text:004006F8 la $a0, aYouEnteredD___ ; "You entered %d...\n" - load format
.text:004006FC lw $ra, 0x20+var_4($sp) ; restore RA
.text:00400700 move $v0, $zero ; return 0
.text:00400704 lw $s0, 0x20+var_8($sp) ; restore S0
.text:00400708 jr $ra ; return
.text:0040070C addiu $sp, 0x20 ; restore SP
Why not in .sdata? Perhaps this depends on some GCC option. Anyway, x is now in .data, which is a general memory area, and we can look at how variables there are handled.
The address of the variable is formed using two instructions. In our case they are LUI (Load Upper Immediate) and ADDIU (Add Immediate Unsigned Word).
Here is also the objdump listing for closer examination:
004006a0 main:
4006a0: 3c1c0042 lui gp,0x42 ; load GP high bits
4006a4: 27bdffe0 addiu sp,sp,-32 ; allocate stack
4006a8: 279c8930 addiu gp,gp,-30416 ; complete GP
4006ac: afbf001c sw ra,28(sp) ; save RA
4006b0: afb00018 sw s0,24(sp) ; save S0
4006b4: afbc0010 sw gp,16(sp) ; save GP
4006b8: 8f998034 lw t9,-32716(gp) ; load puts
4006bc: 3c040040 lui a0,0x40 ; prompt high
4006c0: 0320f809 jalr t9 ; call puts
4006c4: 248408d0 addiu a0,a0,2256 ; complete prompt
4006c8: 8fbc0010 lw gp,16(sp) ; restore GP
; --- prepare high part of x address ---
4006cc: 3c100041 lui s0,0x41 ; high part into S0
4006d0: 8f998038 lw t9,-32712(gp) ; load scanf
4006d4: 3c040040 lui a0,0x40 ; format high
; --- add low part of x address ---
4006d8: 26050920 addiu a1,s0,2336 ; complete &x
; address of x is now in a1
4006dc: 0320f809 jalr t9 ; call scanf
4006e0: 248408dc addiu a0,a0,2268 ; complete "%d"
4006e4: 8fbc0010 lw gp,16(sp) ; restore GP
; high part still in s0: load x value
4006e8: 8e050920 lw a1,2336(s0) ; load x value
4006ec: 8f99803c lw t9,-32708(gp) ; load printf
4006f0: 3c040040 lui a0,0x40 ; format high
4006f4: 0320f809 jalr t9 ; call printf
4006f8: 248408e0 addiu a0,a0,2272 ; complete format
4006fc: 8fbf001c lw ra,28(sp) ; restore RA
400700: 00001021 move v0,zero ; return 0
400704: 8fb00018 lw s0,24(sp) ; restore S0
400708: 03e00008 jr ra ; return
40070c: 27bd0020 addiu sp,sp,32 ; restore SP
We see that the address is formed using LUI and ADDIU, but the high part of the address stays in the register S0. The low part can then be encoded as the offset of an LW (Load Word) instruction, so a single LW is enough to load the value of the variable and pass it to printf().
Registers that hold temporary data have names starting with T, but here we also see one starting with S. The contents of S registers must be preserved by a function before it uses them (saved somewhere and restored afterwards).
That is why the value of S0 is set at address 0x4006cc and used again at address 0x4006e8, after the call to scanf(): scanf() does not change it.
scanf()
As mentioned before, using scanf() today is somewhat old-fashioned.
But if we have to use it, we must check that scanf() finished correctly, without an error.
#include <stdio.h> // include the standard I/O header
int main() // program entry point
{
int x; // declare an integer variable x
printf("Enter X:\n"); // print prompt for user input
if (scanf("%d", &x) == 1) // read an integer into x and check if one field was successfully read
printf("You entered %d...\n", x); // print the entered value if successful
else
printf("What you entered? Huh?\n"); // print error message if not successful
return 0; // return success
}
According to the standard, scanf() returns the number of fields it has read successfully.
In our case, if everything goes well and the user enters a number, scanf() returns 1. If the input cannot be converted, it returns 0, and on a read error or end of file it returns EOF.
The code above checks the value returned by scanf() and prints an error message if something went wrong.
This works as expected:
C:\...>ex3.exe // run the executable
Enter X: // prompt for input
123 // user input
You entered 123... // output if successful
C:\...>ex3.exe // run the executable again
Enter X: // prompt for input
ouch // invalid user input
What you entered? Huh? // error output
MSVC: x86
Here is what we get in the assembly output (MSVC 2010):
lea eax, DWORD PTR _x$[ebp] ; load effective address of x into EAX
push eax ; push the address of x onto the stack
push OFFSET $SG3833 ; push the address of the format string "%d" onto the stack
call _scanf ; call the scanf function
add esp, 8 ; clean up the stack by adding 8 bytes (two arguments)
cmp eax, 1 ; compare the return value in EAX with 1
jne SHORT $LN2@main ; jump if not equal to $LN2@main
mov ecx, DWORD PTR _x$[ebp] ; move the value of x into ECX
push ecx ; push the value of x onto the stack
push OFFSET $SG3834 ; push the address of "You entered %d..." onto the stack
call _printf ; call the printf function
add esp, 8 ; clean up the stack by adding 8 bytes
jmp SHORT $LN1@main ; jump to $LN1@main
$LN2@main: ; label for error case
push OFFSET $SG3836 ; push the address of "What you entered? Huh?" onto the stack
call _printf ; call the printf function
add esp, 4 ; clean up the stack by adding 4 bytes
$LN1@main: ; label for end
xor eax, eax ; set EAX to 0 (return 0)
The calling function (main()) needs the result of the called function (scanf()),
so scanf() returns the result in the EAX register.
We then check it with the instruction CMP EAX, 1 (CMP means compare),
that is, we compare the value in EAX with the number 1.
After the CMP there is a conditional jump, JNE.
JNE = Jump if Not Equal.
So if the value in EAX is not 1, the CPU passes execution to the address given in the JNE operand,
in our case $LN2@main.
Jumping there makes the CPU execute the printf() that prints:
"What you entered? Huh?"
But if everything is fine (scanf() returned 1),
the JNE is not taken, and the other message (You entered %d...) is printed.
Since the second printf() must not be executed on success,
there is a JMP (unconditional jump) before it.
It transfers execution to the point after the second printf(),
right before the instruction XOR EAX, EAX, which implements return 0.
So comparing one value with another is usually done with the pair:
CMP / Jcc
where cc means condition code.
CMP compares two values and sets the processor flags.
Jcc checks these flags and decides whether to jump or not.
This may sound strange, but the CMP instruction is in fact SUB (subtraction).
All arithmetic instructions set the flags, not just CMP.
If we compare 1 and 1:
1 - 1 = 0, so the ZF (Zero Flag) is set.
ZF is set by this comparison only when the two values are equal.
JNE looks only at ZF, and jumps if the flag is not set.
JNE is actually a synonym for JNZ (Jump if Not Zero).
The two names produce the same opcode.
So CMP can be replaced with SUB in most cases;
the only difference is that SUB changes the value of the first operand.
CMP = SUB without storing the result; it only changes the flags.
MSVC: x86: IDA
The author now moves on to IDA, and we will try a few things together.
By the way, for beginners it is better to use the /MD option in MSVC. It means that the standard functions are not linked into the EXE file but are imported from MSVCR*.DLL instead.
This makes it easier to see which standard functions are used and where.
We will start with the easiest case and go through it step by step; I will try to simplify things as much as I can.
First, after writing the C code, we compile it with this command:
cl /MD ex3.c /Fotest.obj /Fetest.exe ; compile ex3.c with /MD, write the object file to test.obj and the executable to test.exe
This links the standard functions from the DLL and makes the IDA analysis clearer.
Then we open the file in IDA. I chose Intel 80x86 processors and, under that, MetaPC.
As a side note, because I like to understand everything, I searched a bit for why MetaPC is the right choice:
- It supports the full x86 instruction set, from the 8086 up to modern IA-32
- It is compatible with Windows PE executables
- IDA and the Hex-Rays decompiler use it
- It produces the same code layout as in the book
After that we press OK and IDA opens the file.
I ran into a problem: the main function did not show up. After some searching, this is how I found it.
Open the Strings window, either via View → Open subviews → Strings or by pressing Shift+F12.
You will see the strings that you wrote in the C code.
Double-click Enter X:
and IDA leads us to the main function.
While analyzing code in IDA, it is very useful to leave notes for yourself (and others).
Example: while analyzing this example, we see that the JNZ is taken in case of an error, so you can move the cursor to the label, press "n", and rename it to error.
Then rename another label to exit.
The code now looks like this.
If we press the Space key, IDA displays the code as a graph.
After each conditional jump there are two arrows: the green one is followed if the condition is met, the red one if it is not.
This view is very useful.
A very important part of reverse engineering work is reducing the amount of information you have to deal with.
MSVC: x86 + OllyDbg
We will try to hack this program in x32dbg and make it believe that scanf() always succeeds.
As usual, we start by stepping until the main code appears.
We keep pressing F8 until we reach call test.271110. The console then asks us to enter the value of X; suppose I enter the name V3n0m.
scanf() returns 0 in EAX, and we change the value of EAX to 1.
The program then prints its result as if the input had been valid.
MSVC: x86 + Hiew
This part of the book explains that we can patch the program to bypass the check. The book does it in Hiew; I did it in x32dbg, and the idea is the same.
We repeat the previous steps, and in addition we select the instruction jnz 0x0027103A, press Space and replace it with NOPs. The program then skips the check and continues as if scanf() had succeeded.
MSVC: x64
Since we are working with variables of type int, which are still 32-bit on x86-64, the 32-bit parts of the registers (prefixed with E-) are used here as well. When working with pointers, however, the 64-bit registers (prefixed with R-) are used.
_DATA SEGMENT ; start of data segment
$SG2924 DB 'Enter X:', 0aH, 00H ; define string "Enter X:" with newline and null terminator
$SG2926 DB '%d', 00H ; define format string "%d" with null terminator
$SG2927 DB 'You entered %d...', 0aH, 00H ; define string "You entered %d..." with newline and null terminator
$SG2929 DB 'What you entered? Huh?', 0aH, 00H ; define string "What you entered? Huh?" with newline and null terminator
_DATA ENDS ; end of data segment
_TEXT SEGMENT ; start of text segment
x$ = 32 ; offset for variable x on stack
main PROC ; start of main procedure
$LN5: ; label
sub rsp, 56 ; subtract 56 from RSP to allocate stack space
lea rcx, OFFSET FLAT:$SG2924 ; load address of "Enter X:" into RCX
call printf ; call printf to print the prompt
lea rdx, QWORD PTR x$[rsp] ; load address of x into RDX
lea rcx, OFFSET FLAT:$SG2926 ; load address of "%d" into RCX
call scanf ; call scanf to read input
cmp eax, 1 ; compare return value in EAX with 1
jne SHORT $LN2@main ; jump if not equal to $LN2@main (error case)
mov edx, DWORD PTR x$[rsp] ; move value of x into EDX
lea rcx, OFFSET FLAT:$SG2927 ; load address of "You entered %d..." into RCX
call printf ; call printf to print success message
jmp SHORT $LN1@main ; jump to $LN1@main
$LN2@main: ; label for error case
lea rcx, OFFSET FLAT:$SG2929 ; load address of "What you entered? Huh?" into RCX
call printf ; call printf to print error message
$LN1@main: ; label for end
; return 0
xor eax, eax ; set EAX to 0
add rsp, 56 ; add 56 to RSP to deallocate stack space
ret 0 ; return from function
main ENDP ; end of main procedure
_TEXT ENDS ; end of text segment
END ; end of assembly
ARM
ARM: Optimizing Keil 6/2013 (Thumb mode)
var_8 = -8 ; define stack offset for variable
PUSH {R3,LR} ; push R3 and LR (link register) onto the stack
ADR R0, aEnterX ; "Enter X:\n" ; load address of "Enter X:\n" into R0
BL __2printf ; branch with link to printf
MOV R1, SP ; move SP (stack pointer) into R1 (address for input)
ADR R0, aD ; "%d" ; load address of "%d" into R0
BL __0scanf ; branch with link to scanf
CMP R0, #1 ; compare return value in R0 with 1
BEQ loc_1E ; branch if equal to loc_1E (success case)
ADR R0, aWhatYouEntered ; "What you entered? Huh?\n" ; load address of error message into R0
BL __2printf ; branch with link to printf
loc_1A: ; CODE XREF: main+26 ; label, referenced from the B at the end
MOVS R0, #0 ; move 0 into R0 (return value)
POP {R3,PC} ; pop R3 and PC (return)
loc_1E: ; CODE XREF: main+12 ; label, referenced from the BEQ
LDR R1, [SP,#8+var_8] ; load value from stack into R1
ADR R0, aYouEnteredD___ ; "You entered %d...\n" ; load address of success message into R0
BL __2printf ; branch with link to printf
B loc_1A ; branch to loc_1A
The new instructions here are CMP and BEQ.
CMP
It is similar to the x86 instruction of the same name: it subtracts one operand from the other and updates the condition flags.
BEQ
Jumps to another address if the operands were equal, or if the result of the last computation was 0, or if the Z flag is 1. It behaves like JZ in x86.
Everything else is simple: the execution flow splits into two branches, and the branches then meet at the point where 0 is written to R0 as the function's return value, after which the function ends.
ARM64
.LC0: ; label for string
.string "Enter X:" ; define string "Enter X:"
.LC1: ; label for string
.string "%d" ; define format string "%d"
.LC2: ; label for string
.string "You entered %d...\n" ; define string "You entered %d...\n"
.LC3: ; label for string
.string "What you entered? Huh?" ; define string "What you entered? Huh?"
f6: ; function label (main)
; save FP and LR in stack frame
stp x29, x30, [sp, -32]! ; store pair X29 and X30 on stack, pre-decrement SP by 32
; set frame pointer FP to SP
add x29, sp, 0 ; add 0 to SP and store in X29
; load address of "Enter X:"
adrp x0, .LC0 ; load page address of .LC0 into X0
add x0, x0, :lo12:.LC0 ; add low 12 bits to get full address
bl puts ; branch with link to puts
; load address of "%d"
adrp x0, .LC1 ; load page address of .LC1 into X0
add x0, x0, :lo12:.LC1 ; add low 12 bits to get full address
; calculate address of x in local stack
add x1, x29, 28 ; add 28 to X29 and store in X1
bl __isoc99_scanf ; branch with link to scanf
; check result returned by scanf in W0
cmp w0, 1 ; compare W0 with 1
; BNE means Branch if Not Equal
; if W0 != 1, branch to .L2
bne .L2 ; branch not equal to .L2
; load value of x from local stack
ldr w1, [x29,28] ; load word from [X29+28] into W1
; load address of "You entered %d...\n"
adrp x0, .LC2 ; load page address
add x0, x0, :lo12:.LC2 ; add low bits
bl printf ; branch with link to printf
; skip the code that prints "What you entered? Huh?"
b .L3 ; branch to .L3
.L2: ; label for error
; load address of "What you entered? Huh?"
adrp x0, .LC3 ; load page address
add x0, x0, :lo12:.LC3 ; add low bits
bl puts ; branch with link to puts
.L3: ; label for end
; return 0
mov w0, 0 ; move 0 into W0
; restore FP and LR from stack
ldp x29, x30, [sp], 32 ; load pair X29 and X30, post-increment SP by 32
ret ; return
This code shows the use of CMP and BNE (Branch if Not Equal) instructions
MIPS
.text:004006A0 main: ; start of main
var_18 = -0x18 ; define stack offsets
var_10 = -0x10
var_4 = -4
lui $gp, 0x42 ; load upper immediate into GP
addiu $sp, -0x28 ; add immediate unsigned to SP (allocate stack)
li $gp, 0x418960 ; load immediate into GP
sw $ra, 0x28+var_4($sp) ; store word RA to stack
sw $gp, 0x28+var_18($sp) ; store word GP to stack
la $t9, puts ; load address of puts into T9
lui $a0, 0x40 ; load upper immediate into A0
jalr $t9 ; puts ; jump and link register to puts
la $a0, aEnterX # "Enter X:" ; load address of "Enter X:" (branch delay slot)
lw $gp, 0x28+var_18($sp) ; load word GP from stack
lui $a0, 0x40 ; load upper immediate into A0
la $t9, __isoc99_scanf ; load address of scanf into T9
la $a0, aD # "%d" ; load address of "%d"
jalr $t9 ; __isoc99_scanf ; jump and link to scanf
addiu $a1, $sp, 0x28+var_10 # branch delay slot ; add immediate to SP for address
li $v1, 1 ; load immediate 1 into V1
lw $gp, 0x28+var_18($sp) ; load GP
beq $v0, $v1, loc_40070C ; branch if equal to loc_40070C (success)
or $at, $zero # branch delay slot, NOP ; or zero (NOP)
la $t9, puts ; load address of puts
lui $a0, 0x40 ; load upper into A0
jalr $t9 ; puts ; jump to puts
la $a0, aWhatYouEntered # "What you entered? Huh?" ; load error message (delay slot)
lw $ra, 0x28+var_4($sp) ; load RA
move $v0, $zero ; move zero to V0
jr $ra ; jump register RA (return)
addiu $sp, 0x28 ; add to SP (deallocate, delay slot)
loc_40070C: ; label for success
la $t9, printf ; load address of printf
lw $a1, 0x28+var_10($sp) ; load from stack into A1
lui $a0, 0x40 ; load upper into A0
jalr $t9 ; printf ; jump to printf
la $a0, aYouEnteredD___ # "You entered %d...\n" ; load success message (delay slot)
lw $ra, 0x28+var_4($sp) ; load RA
move $v0, $zero ; move zero to V0
jr $ra ; return
addiu $sp, 0x28 ; deallocate stack (delay slot)
scanf() returns its result in the $V0 register. It is checked at address 0x004006E4 by comparing the value in $V0 with the value in $V1 (which was set to 1 just before). BEQ stands for "Branch if Equal": if the two values are equal (meaning the read succeeded), execution jumps to address 0x0040070C.