Reverse Engineering for Beginners: Floating-point unit (Part 1)

Posted Feb 21, 2026

By 0X_V3n0m

17 min read

1.25 Floating-point unit

The author begins by explaining that the FPU is a unit inside the main CPU specialized in handling floating-point numbers.

In the past it was called a "coprocessor", and it was somewhat separate from the main processor.

1.25.1 IEEE 754

A number in IEEE 754 format consists of:

* a sign

* a fractional part (significand or fraction)

* an exponent

1.25.2 x86

The author says it is worth getting familiar with stack machines, or learning the basics of the Forth language, before studying the x86 FPU.

Interestingly, before the 80486 the coprocessor was a separate chip and was not always installed on the motherboard. It could be bought separately and added later. Starting with the 80486 DX, the FPU is integrated into the CPU itself.

The FWAIT instruction is a reminder of that era: it makes the CPU wait until the FPU has finished its work.

Another remnant is that FPU instructions begin with the so-called "escape" opcodes (D8..DF). These opcodes were originally passed on to a separate coprocessor.

The FPU has a stack of 8 registers, each 80 bits wide.

These registers are named ST(0) .. ST(7).

For simplicity, IDA and OllyDbg display ST(0) as ST, which books often describe as the "Stack Top".

1.25.3 ARM, MIPS, x86/x64 SIMD

In ARM and MIPS the FPU is not a stack. It is a set of registers, any of which can be accessed directly, just like the GPRs. The same idea applies to the SIMD extensions of x86/x64.

1.25.4 C/C++

The C and C++ languages provide at least two floating types:

* float → single precision (32-bit)

* double → double precision (64-bit)

The names refer to the storage size:

* single precision means the number is stored in a single 32-bit word

* double precision means it is stored in two words (64 bits)

GCC also supports the long double type with extended precision (80 bits). MSVC accepts the keyword but treats long double as an ordinary 64-bit double.

The float type takes the same number of bits as int in a 32-bit environment, but the representation of the number is completely different.

1.25.5 Simple example

Let's look at this simple example:


#include <stdio.h> // include standard I/O header

double f (double a, double b) // define function f taking two double parameters
{
    return a/3.14 + b*4.1; // divide a by 3.14, multiply b by 4.1, return their sum
}

int main() // program entry point
{
    printf ("%f\n", f(1.2, 3.4)); // call f with 1.2 and 3.4, print the double result
    return 0; // indicate successful termination
}

x86

MSVC

Let's compile it in MSVC 2010:

Listing 1.207: MSVC 2010: f()



CONST SEGMENT
__real@4010666666666666 DQ 04010666666666666r ; 4.1 ; constant 4.1 stored as IEEE 754 64-bit
CONST ENDS

CONST SEGMENT
__real@40091eb851eb851f DQ 040091eb851eb851fr ; 3.14 ; constant 3.14 stored as IEEE 754 64-bit
CONST ENDS

_TEXT SEGMENT

_a$ = 8  ; size = 8 ; parameter a offset on stack
_b$ = 16 ; size = 8 ; parameter b offset on stack

_f PROC
    push ebp                                    ; save base pointer
    mov  ebp, esp                               ; set up stack frame

    fld  QWORD PTR _a$[ebp]                     ; load a (8 bytes) from stack into ST(0)
    ; current stack state: ST(0) = _a

    fdiv QWORD PTR __real@40091eb851eb851f      ; divide ST(0) by 3.14
    ; current stack state:
    ; ST(0) = result of _a divided by 3.14

    fld  QWORD PTR _b$[ebp]                     ; load b (8 bytes) from stack, push onto FPU stack
    ; current stack state:
    ; ST(0) = _b
    ; ST(1) = result of _a divided by 3.14

    fmul QWORD PTR __real@4010666666666666      ; multiply ST(0) by 4.1
    ; current stack state:
    ; ST(0) = result of _b * 4.1
    ; ST(1) = result of _a divided by 3.14

    faddp ST(1), ST(0)                          ; ST(1) = ST(1) + ST(0), then pop; the sum ends up in ST(0)
    ; current stack state:
    ; ST(0) = result of addition

    pop ebp                                     ; restore base pointer
    ret 0                                       ; return (result is in ST(0))
_f ENDP

The FLD instruction takes 8 bytes from the stack and loads the number into register ST(0), and it is automatically converted to the internal 80-bit format (extended precision).

The FDIV instruction divides the value in ST(0) by the number stored at the address __real@40091eb851eb851f. That number is 3.14 encoded in IEEE 754 64-bit format. The assembly listing does not write floating-point numbers directly, so what we see is their hexadecimal representation.

After executing FDIV, ST(0) contains the result of the division.

By the way, there is also an FDIVP instruction. It divides ST(1) by ST(0), stores the quotient in ST(1), and pops ST(0), so the two operands are replaced by their result on top of the stack. If you know Forth, you will quickly recognize this as a stack machine.

After that, the second FLD instruction pushes the value of b onto the stack.

As a result:

* ST(0) = b

* ST(1) = result of a/3.14

After that, the FMUL instruction multiplies b (in ST(0)) by the number stored at __real@4010666666666666, which is 4.1. The product is left in ST(0).

The last instruction FADDP adds the two values on top of the stack:

* The result is placed in ST(1)

* Then ST(0) is popped

So the final result ends up in ST(0).

The function must return its result in ST(0), because that is the 32-bit x86 calling convention for floating-point return values. That is why nothing but the function epilogue follows FADDP.

MSVC + OllyDbg

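Instead of OllyDbg, we will use x32dbg here. We compile the example like this:

cl /arch:IA32 /fp:precise /Od test.c

The /arch:IA32 option makes MSVC emit x87 FPU instructions instead of SSE2, and /Od disables optimization. Then we load the resulting test.exe into x32dbg.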

Two pairs of 32-bit words are highlighted in red on the stack. Each pair is a double in IEEE 754 format, passed from main().

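The first FLD instruction has loaded the value 1.2 from the stack into ST(0).

The debugger shows 1.1999… rather than 1.2. The widening itself is not the cause, because every 64-bit double converts exactly to the FPU's 80-bit format. The real reason is that 1.2 has no exact binary representation. The stored double is the nearest representable value, slightly below 1.2, and the debugger displays it with more digits.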

Now EIP is pointing to the next instruction (FDIV), which loads a double (constant) number from memory.

The FDIV instruction has executed, and ST(0) now contains 0.382…, the result of the division.

The next FLD instruction has executed and loaded 3.4 into ST(0). The debugger shows the approximate value 3.39999….

At the same time, the result of the division moved down to ST(1). EIP now points to the next instruction, FMUL, which loads the constant 4.1 from memory.

Next, the FMUL instruction has executed, so the product is now in ST(0).

Finally, the FADDP instruction has executed. The sum is in ST(0), and ST(1) has been cleared.

The result remained in ST(0), because the function returns its value in ST(0). main() then takes this value from the register.

We also see something slightly strange: the value 13.93… is now present in ST(7). Why?

As we read earlier in the book, the FPU registers form a stack. But that is a simplification. If the hardware implemented it literally that way, the contents of the other 7 registers would have to be moved or copied on every push or pop, which is a lot of work.

In reality, the FPU has just 8 registers and a pointer called TOP, which holds the number of the register that is currently the "top of the stack".

When a value is pushed, TOP moves to the next available register and the value is written there.

A pop does the reverse, but the vacated register is not zeroed out. It is only marked as empty, because zeroing it would be extra work and would reduce performance.

That is why the product 13.93… is still visible in ST(7), even though FADDP popped it off the stack.

We could say that FADDP stored the sum on the stack and then popped an element. In reality, the instruction stored the result and then moved the TOP pointer.

More precisely, the FPU registers form a circular buffer.

GCC

GCC 4.4.1 (with the -O3 option) produces almost the same code, but with a small difference:

Listing 1.208: Optimizing GCC 4.4.1



public f
f proc near

arg_0 = qword ptr 8  ; first argument (a) offset on stack
arg_8 = qword ptr 10h ; second argument (b) offset on stack

    push ebp                        ; save base pointer

    fld  ds:dbl_8048608             ; load constant 3.14 into ST(0)
    ; stack state now: ST(0) = 3.14

    mov  ebp, esp                   ; set up stack frame

    fdivr [ebp+arg_0]               ; reverse divide: ST(0) = arg_0 / ST(0)  (a / 3.14)
    ; stack state now: ST(0) = result of division

    fld  ds:dbl_8048610             ; load constant 4.1, push onto FPU stack
    ; stack state now:
    ; ST(0) = 4.1
    ; ST(1) = result of division

    fmul [ebp+arg_8]                ; multiply ST(0) by b (arg_8)
    ; stack state now:
    ; ST(0) = result of multiplication
    ; ST(1) = result of division

    pop ebp                         ; restore base pointer

    faddp st(1), st                 ; ST(1) = ST(1) + ST(0), then pop; the sum ends up in ST(0)
    ; stack state now: ST(0) = result of addition

    retn                            ; return (result is in ST(0))
f endp

The difference is that 3.14 is loaded onto the stack (into ST(0)) first, and then arg_0 is divided by the value in ST(0). FDIVR means Reverse Divide: it swaps the dividend and divisor, computing operand / ST(0) instead of ST(0) / operand.

There is no such reverse instruction for multiplication. Multiplication is commutative, so plain FMUL suffices.

FADDP adds the two values and also pops one of them. After this operation, ST(0) contains the sum.

ARM: Optimizing Xcode 4.6.3 (LLVM) (ARM mode)

The author mentioned that before ARM unified its floating point support, many companies used to add their own custom extensions. Then VFP (Vector Floating Point) became standard.

An important difference from x86 is that ARM has no FPU stack: you work with registers directly.

Listing 1.209: Optimizing Xcode 4.6.3 (LLVM) (ARM mode)



f
    VLDR D16, =3.14          ; load constant 3.14 into D16
    VMOV D17, R0, R1         ; load "a" from R0:R1 pair into D17
    VMOV D18, R2, R3         ; load "b" from R2:R3 pair into D18

    VDIV.F64 D16, D17, D16   ; D16 = D17 / D16  (a / 3.14)

    VLDR D17, =4.1           ; load constant 4.1 into D17
    VMUL.F64 D17, D18, D17   ; D17 = D18 * D17  (b * 4.1)

    VADD.F64 D16, D17, D16   ; D16 = D17 + D16  (b*4.1 + a/3.14)

    VMOV R0, R1, D16         ; move result from D16 into R0:R1 pair for return
    BX LR                    ; return

dbl_2C98 DCFD 3.14           ; constant 3.14 stored in memory
dbl_2CA0 DCFD 4.1            ; constant 4.1 stored in memory

We are seeing new registers with the letter D. These are 64-bit registers, and there are 32 of them. They can be used for double, and also for SIMD (NEON). There are also 32 S registers (32-bit) for float.

Easy to remember:

* D = Double

* S = Single

The constants 3.14 and 4.1 are stored in memory in IEEE 754 format. VLDR and VMOV are like LDR and MOV but they work on D-registers.

Functions receive arguments in R-registers, but a double is 64 bits wide, so it needs two 32-bit registers.

VMOV D17, R0, R1 combines R0 (low half) and R1 (high half) into one 64-bit value in D17. VMOV R0, R1, D16 does the reverse when returning the result.

VDIV, VMUL, VADD are floating point instructions for division, multiplication and addition.

ARM (Thumb mode – without FPU)

Keil here generated code for a processor that has no FPU or NEON:



BL __aeabi_dmul  ; call library function to emulate double multiplication
BL __aeabi_ddiv  ; call library function to emulate double division
BL __aeabi_dadd  ; call library function to emulate double addition

Instead of using FPU instructions, it calls library functions that emulate these operations. The two approaches are known as:

* soft float / armel (emulation in software)

* hard float / armhf (using a real FPU)

ARM64: Optimizing GCC (Linaro) 4.9

Very concise code:

Listing 1.210



f:
    ; D0 = a, D1 = b

    ldr d2, .LC25        ; load constant 3.14 into D2
    fdiv d0, d0, d2      ; D0 = D0 / D2  (a / 3.14)

    ldr d2, .LC26        ; load constant 4.1 into D2
    fmadd d0, d1, d2, d0 ; D0 = D1*D2 + D0  (b*4.1 + a/3.14) in one instruction

    ret                  ; return (result in D0)

FMADD computes D0 = D1*D2 + D0 in a single instruction.
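FMADD is a fused multiply-add: the product is not rounded before the addition, so there is one rounding instead of two. Below, fl(·) denotes rounding to the nearest double and q = fl(a/3.14). Separate FMUL and FADD instructions would compute r₂, while FMADD computes r₁:

```latex
\begin{aligned}
q &= \mathrm{fl}(a / 3.14) \\
\text{FMUL, FADD:}\quad r_2 &= \mathrm{fl}\bigl(\mathrm{fl}(b \cdot 4.1) + q\bigr) \\
\text{FMADD:}\quad r_1 &= \mathrm{fl}(b \cdot 4.1 + q)
\end{aligned}
```

The two results usually agree but can differ in the last bit. GCC performs this contraction by default in its GNU C modes (`-ffp-contract=fast`). Passing `-ffp-contract=off` keeps the multiplication and addition as separate instructions.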

ARM64: Non-optimizing GCC

The code is much longer, with many value transfers between registers and memory, and some FMOV instructions that are clearly redundant. Evidently GCC 4.9 was not yet very good at generating ARM64 code.

An important point: ARM64 general-purpose registers are 64 bits wide, so a double can be stored directly in a single GPR. This is not possible on a 32-bit CPU.

1.25.6 Passing floating point numbers via arguments


#include <math.h>   // include math header for pow()
#include <stdio.h>  // include standard I/O header

int main () // program entry point
{
    printf ("32.01 ^ 1.54 = %lf\n", pow (32.01,1.54)); // compute 32.01 raised to power 1.54 and print result
    return 0; // return 0 to indicate successful termination
}

x86

Let's see what MSVC 2010 produces:

Listing 1.212: MSVC 2010



CONST SEGMENT
__real@40400147ae147ae1 DQ 040400147ae147ae1r ; 32.01 ; constant 32.01 stored as IEEE 754 64-bit
__real@3ff8a3d70a3d70a4 DQ 03ff8a3d70a3d70a4r ; 1.54  ; constant 1.54 stored as IEEE 754 64-bit
CONST ENDS

_main PROC
    push ebp                                   ; save base pointer
    mov  ebp, esp                              ; set up stack frame

    sub  esp, 8                                ; allocate 8 bytes on stack for second argument
    fld  QWORD PTR __real@3ff8a3d70a3d70a4     ; load 1.54 into ST(0)
    fstp QWORD PTR [esp]                       ; store ST(0) (1.54) onto stack, pop FPU stack

    sub  esp, 8                                ; allocate 8 bytes on stack for first argument
    fld  QWORD PTR __real@40400147ae147ae1     ; load 32.01 into ST(0)
    fstp QWORD PTR [esp]                       ; store ST(0) (32.01) onto stack, pop FPU stack

    call _pow                                  ; call pow(32.01, 1.54)

    add  esp, 8                                ; release 8 of the 16 argument bytes; the other 8 are reused for printf's double
    ; result is in ST(0)

    fstp QWORD PTR [esp]                       ; store result (double) onto stack for printf
    push OFFSET $SG2651                        ; push format string address
    call _printf                               ; call printf

    add  esp, 12                               ; clean up stack (format string + double)
    xor  eax, eax                              ; set return value to 0
    pop  ebp                                   ; restore base pointer
    ret  0                                     ; return
_main ENDP

FLD and FSTP transfer values between the data segment and the FPU stack. pow() takes the two values from the stack and returns the result in ST(0). printf() takes 8 bytes from the local stack and interprets them as a double.

By the way, a pair of MOV instructions could have been used for each value instead of FLD/FSTP. The constants in memory are already in IEEE 754 format, and pow() expects its arguments in the same format, so no conversion is needed.

ARM + Non-optimizing Xcode 4.6.3 (Thumb-2)



_main
    PUSH {R7,LR}              ; save R7 and link register
    MOV  R7, SP               ; set frame pointer
    SUB  SP, SP, #4           ; allocate stack space

    VLDR D16, =32.01          ; load constant 32.01 into D16
    VMOV R0, R1, D16          ; move D16 into R0:R1 pair (first argument to pow)

    VLDR D16, =1.54           ; load constant 1.54 into D16
    VMOV R2, R3, D16          ; move D16 into R2:R3 pair (second argument to pow)

    BLX _pow                  ; call pow(32.01, 1.54)

    VMOV D16, R0, R1          ; move result from R0:R1 back into D16

    MOV  R0, 0xFC1            ; load format string offset
    ADD  R0, PC               ; calculate absolute address of format string
    VMOV R1, R2, D16          ; move result into R1:R2 pair for printf
    BLX _printf               ; call printf

    MOVS R1, 0                ; set return value to 0
    MOV  R0, R1               ; move 0 into R0

    ADD  SP, SP, #4           ; deallocate stack space
    POP  {R7,PC}              ; restore and return

dbl_2F90 DCFD 32.01           ; constant 32.01 stored in memory
dbl_2F98 DCFD 1.54            ; constant 1.54 stored in memory

As said before, double numbers (64-bit) are passed in pairs of R-registers. _pow takes:

* the first argument in R0 and R1

* the second in R2 and R3

And returns the result in R0 and R1. The result is moved to D16 and then to R1 and R2 so that printf() can take it. The code has some redundancy because optimization is disabled.

ARM + Non-optimizing Keil (ARM mode)



_main
    STMFD SP!, {R4-R6,LR}        ; save registers and link register on stack

    LDR R2, =0xA3D70A4           ; load low 32 bits of 1.54 (IEEE 754)
    LDR R3, =0x3FF8A3D7          ; load high 32 bits of 1.54

    LDR R0, =0xAE147AE1          ; load low 32 bits of 32.01 (IEEE 754)
    LDR R1, =0x40400147          ; load high 32 bits of 32.01

    BL pow                       ; call pow(32.01, 1.54); R0:R1 = first arg, R2:R3 = second arg

    MOV R4, R0                   ; save low 32 bits of result
    MOV R2, R4                   ; move low bits to R2 for printf
    MOV R3, R1                   ; move high bits to R3 for printf

    ADR R0, a32_011_54Lf         ; load address of format string
    BL __2printf                 ; call printf

    MOV R0, #0                   ; set return value to 0
    LDMFD SP!, {R4-R6,PC}        ; restore registers and return

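Here D-registers are not used at all, only pairs of R-registers. Note also that printf() receives the double in R2:R3, not R1:R2. Under the ARM AAPCS, a 64-bit argument must start in an even-numbered register, so R1 is left unused.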

ARM64 + Optimizing GCC (Linaro) 4.9

Listing 1.213



f:
    stp x29, x30, [sp, -16]!     ; save frame pointer and link register on stack
    add x29, sp, 0               ; set frame pointer

    ldr d1, .LC1                 ; load constant 1.54 into D1 (second argument to pow)
    ldr d0, .LC0                 ; load constant 32.01 into D0 (first argument to pow)

    bl  pow                      ; call pow(32.01, 1.54); result returned in D0
    ; result is in D0

    adrp x0, .LC2                ; load page address of format string
    add  x0, x0, :lo12:.LC2     ; add page offset to get full address
    bl   printf                  ; call printf(format, D0); D0 passed directly

    mov  w0, 0                   ; set return value to 0
    ldp  x29, x30, [sp], 16     ; restore frame pointer and link register
    ret                          ; return

.LC0:
    .word -1374389535            ; low 32 bits of 32.01 (IEEE 754)
    .word 1077936455             ; high 32 bits of 32.01

.LC1:
    .word 171798692              ; low 32 bits of 1.54 (IEEE 754)
    .word 1073259479             ; high 32 bits of 1.54

.LC2:
    .string "32.01 ^ 1.54 = %lf\n" ; format string for printf

The constants are loaded into D0 and D1, where pow() expects them, and pow() returns its result in D0. That value is passed to printf() without any modification, because:

* Integers and pointers are passed in X-registers

* Floating point numbers are passed in D-registers


This post is licensed under CC BY 4.0 by the author.