Post

Reverse Engineering for Beginners : Replacing arithmetic instructions to other ones

Reverse Engineering for Beginners : Replacing arithmetic instructions to other ones

For the sake of optimization, an instruction can be replaced with another instruction, or even with a group of instructions.

For example, ADD and SUB can replace each other.

Also, the LEA instruction is often used for simple arithmetic calculations.

1.24.1 Multiplication

1.24.1 Multiplication

Multiplication using addition

A simple example:

C

unsigned int f(unsigned int a)
{
    return a*8;
};

  

Multiplication by 8 was replaced with 3 addition instructions, which do the same thing.

It is obvious that the MSVC optimizer decided that this code could be faster.

Assembly

_TEXT SEGMENT
_a$ = 8                                  ; size = 4
_f PROC
    mov     eax, DWORD PTR _a$[esp-4]    ; load a into EAX
    add     eax, eax                     ; EAX = EAX + EAX = a*2
    add     eax, eax                     ; EAX = EAX + EAX = a*4
    add     eax, eax                     ; EAX = EAX + EAX = a*8
    ret     0
_f ENDP
_TEXT ENDS
END

  

Multiplication using shifting

Multiplication and division instructions by numbers that are powers of 2 are often replaced with shift instructions.

C

unsigned int f(unsigned int a)
{
    return a*4;
};

  
Assembly

_a$ = 8                                  ; size = 4
_f PROC
    push    ebp
    mov     ebp, esp
    mov     eax, DWORD PTR _a$[ebp]      ; load a
    shl     eax, 2                       ; shift left by 2 bits = multiply by 4
    pop     ebp
    ret     0
_f ENDP

  

The instruction named SHL is an abbreviation for SHift Left.

Multiplication by 4 is just shifting the number left by 2 bits and adding two zero bits on the right (as the least significant bits).

This is like multiplying 3 by 100 — we simply add two zeros on the right.

This is how the shift left instruction works:

Shift left example

The bits added on the right are always zeros.

To make it clearer in case you got confused:

Let us say:

C

a = 3

  

3 in binary:

Text

3 = 00000011

  

If we do:

Assembly

shl eax, 2

  

Meaning shift left twice:

Text

00000011 << 2 = 00001100

  

The result 00001100 in decimal = 12

Which is indeed: 3 * 4 = 12

Multiplication by 4 on ARM

Assembly

f PROC
    LSL r0, r0, #2                       ; shift left by 2 bits = multiply by 4
    BX  lr
ENDP

  

Multiplication by 4 on MIPS

Assembly

    jr      $ra
    sll     $v0, $a0, 2                  ; shift left by 2 = multiply by 4 (delay slot)

  

SLL means "Shift Left Logical".

Multiplication using shifting, subtracting, and adding

It is still possible to get rid of multiplication when multiplying by numbers like 7 or 17, also using shifting. The math used here is relatively simple.

32-bit

C

#include <stdint.h>

int f1(int a)
{
    return a*7;
};

int f2(int a)
{
    return a*28;
};

int f3(int a)
{
    return a*17;
};

  

x86

Assembly

; a*7
_a$ = 8
_f1 PROC
    mov     ecx, DWORD PTR _a$[esp-4]    ; ECX = a
    lea     eax, DWORD PTR [ecx*8]       ; EAX = ECX*8 = a*8
    sub     eax, ecx                     ; EAX = EAX - ECX = a*8 - a = a*7
    ret     0
_f1 ENDP

; a*17
_a$ = 8
_f3 PROC
    mov     eax, DWORD PTR _a$[esp-4]    ; EAX = a
    shl     eax, 4                       ; EAX = EAX << 4 = a*16
    add     eax, DWORD PTR _a$[esp-4]    ; EAX = EAX + a = a*16 + a = a*17
    ret     0
_f3 ENDP

  

ARM

Keil in ARM mode exploits the second operand shift modifiers:

Assembly

; a*7
||f1|| PROC
    RSB     r0, r0, r0, LSL #3           ; R0 = (R0 << 3) - R0 = a*8 - a = a*7
    BX      lr
ENDP

; a*28
||f2|| PROC
    RSB     r0, r0, r0, LSL #3           ; R0 = (R0 << 3) - R0 = a*8 - a = a*7
    LSL     r0, r0, #2                   ; R0 = R0 << 2 = a*7 * 4 = a*28
    BX      lr
ENDP

; a*17
||f3|| PROC
    ADD     r0, r0, r0, LSL #4           ; R0 = R0 + (R0 << 4) = a + a*16 = a*17
    BX      lr
ENDP

  

But there are no such modifiers in Thumb mode. It also cannot optimize f2():

Assembly

; a*7
||f1|| PROC
    LSLS    r1, r0, #3                   ; R1 = R0 << 3 = a*8
    SUBS    r0, r1, r0                    ; R0 = R1 - R0 = a*8 - a = a*7
    BX      lr
ENDP

; a*28
||f2|| PROC
    MOVS    r1, #0x1c                    ; R1 = 28
    MULS    r0, r1, r0                   ; R0 = 28 * a
    BX      lr
ENDP

; a*17
||f3|| PROC
    LSLS    r1, r0, #4                   ; R1 = R0 << 4 = a*16
    ADDS    r0, r0, r1                   ; R0 = a + a*16 = a*17
    BX      lr
ENDP

  

MIPS

Assembly

_f1:
    sll     $v0, $a0, 3                  ; $v0 = $a0 << 3 = a*8
    jr      $ra
    subu    $v0, $a0                     ; $v0 = $v0 - $a0 = a*8 - a = a*7 (delay slot)

_f2:
    sll     $v0, $a0, 5                  ; $v0 = $a0 << 5 = a*32
    sll     $a0, 2                       ; $a0 = $a0 << 2 = a*4
    jr      $ra
    subu    $v0, $a0                     ; $v0 = a*32 - a*4 = a*28 (delay slot)

_f3:
    sll     $v0, $a0, 4                  ; $v0 = $a0 << 4 = a*16
    jr      $ra
    addu    $v0, $a0                     ; $v0 = a*16 + a = a*17 (delay slot)

  

64-bit

C

#include <stdint.h>

int64_t f1(int64_t a)
{
    return a*7;
};

int64_t f2(int64_t a)
{
    return a*28;
};

int64_t f3(int64_t a)
{
    return a*17;
};

  

x64

Assembly

; a*7
f1:
    lea     rax, [0+rdi*8]               ; RAX = RDI*8 = a*8
    sub     rax, rdi                     ; RAX = RAX - RDI = a*8 - a = a*7
    ret

; a*28
f2:
    lea     rax, [0+rdi*4]               ; RAX = RDI*4 = a*4
    sal     rdi, 5                       ; RDI = RDI << 5 = a*32
    sub     rdi, rax                     ; RDI = a*32 - a*4 = a*28
    mov     rax, rdi
    ret

; a*17
f3:
    mov     rax, rdi
    sal     rax, 4                       ; RAX = RAX << 4 = a*16
    add     rax, rdi                     ; RAX = a*16 + a = a*17
    ret

  

ARM64

GCC 4.9 for ARM64 is also concise, thanks to shift modifiers:

Assembly

; a*7
f1:
    lsl     x1, x0, 3                    ; X1 = X0 << 3 = a*8
    sub     x0, x1, x0                   ; X0 = X1 - X0 = a*8 - a = a*7
    ret

; a*28
f2:
    lsl     x1, x0, 5                    ; X1 = X0 << 5 = a*32
    sub     x0, x1, x0, lsl 2             ; X0 = X1 - (X0 << 2) = a*32 - a*4 = a*28
    ret

; a*17
f3:
    add     x0, x0, x0, lsl 4            ; X0 = X0 + (X0 << 4) = a + a*16 = a*17
    ret

  

Booth’s multiplication algorithm

There was a time when computers were big and expensive to the point that some did not have hardware support for multiplication inside the CPU, like the Data General Nova.

When multiplication was needed, it could be done at the software level, for example using Booth’s multiplication algorithm.

This is a multiplication algorithm that uses only addition and shifts. What modern optimizing compilers do is not the same thing, but the goal (multiplication) and resources (faster operations) are the same.

1.24.2 Division

arithmetic optimizations

Division using shifts

An example of division by 4:

C

unsigned int f(unsigned int a)
{
    return a/4;
};

  

We get (MSVC 2010):

Assembly

_a$ = 8
_f PROC
    mov     eax, DWORD PTR _a$[esp-4]
    shr     eax, 2                       ; shift right by 2 bits = divide by 4
    ret     0
_f ENDP

  

The SHR (SHift Right) instruction in this example shifts the number right by 2 bits.

The two bits that were freed on the left (most significant bits) are filled with zeros.

The two bits on the right (least significant bits) are discarded.

In fact, these two discarded bits are the division remainder.

The SHR instruction works exactly like SHL, but in the opposite direction.

Shift right example

It is easy to understand if you imagine the number 23 in the decimal system.

23 can be easily divided by 10 by simply removing the last digit (3 — this is the remainder).

What remains after the operation is 2 as the quotient.

The remainder is discarded, but that's fine because we are working on integer values anyway, these are not real numbers!

Division by 4 on ARM

Assembly

f PROC
    LSR     r0, r0, #2                   ; logical shift right by 2 bits = divide by 4
    BX      lr
ENDP

  

Division by 4 on MIPS

Assembly

    jr      $ra
    srl     $v0, $a0, 2                  ; shift right logical by 2 = divide by 4 (delay slot)

  

SRL means "Shift Right Logical".

This post is licensed under CC BY 4.0 by the author.

Trending Tags