1.18.3 Ternary conditional operator
The ternary conditional operator in C/C++ has this syntax:
expression ? expression : expression // ternary operator syntax; evaluates first expression, returns second if true, third if false
Example:
const char* f(int a) // define function returning const char pointer; takes int parameter a
{
return a == 10 ? "it is ten" : "it is not ten"; // return "it is ten" if a equals 10, otherwise "it is not ten"
};
x86 – Old and non-optimizing compiler
Old and non-optimizing compilers generate the same code as they would for an if/else statement:
Listing 1.127: Non-optimizing MSVC 2008
$SG746 DB 'it is ten', 00H ; define null-terminated string "it is ten"
$SG747 DB 'it is not ten', 00H ; define null-terminated string "it is not ten"
tv65 = -4 ; temporary variable
_a$ = 8 ; parameter a offset
_f PROC ; start of function procedure
push ebp ; save base pointer
mov ebp, esp ; set up stack frame
push ecx ; allocate space for temporary
cmp DWORD PTR _a$[ebp], 10 ; compare a with 10
jne SHORT $LN3@f ; jump if not equal
mov DWORD PTR tv65[ebp], OFFSET $SG746 ; load address of "it is ten" into temporary
jmp SHORT $LN4@f ; jump to exit
$LN3@f: ; label
mov DWORD PTR tv65[ebp], OFFSET $SG747 ; load address of "it is not ten" into temporary
$LN4@f: ; label
mov eax, DWORD PTR tv65[ebp] ; move temporary address to EAX (return value)
mov esp, ebp ; restore stack pointer
pop ebp ; restore base pointer
ret 0 ; return
_f ENDP ; end of procedure
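The non-optimizing listing above can be read back into C roughly as follows. This is a hand-written sketch, not compiler output: the function name f_unoptimized and the local variable tmp (standing in for the compiler's tv65 slot) are assumptions made for illustration.

```c
// Hand-written C reading of the non-optimizing MSVC 2008 output.
// f_unoptimized and tmp are illustrative names; tmp plays the role of tv65.
const char *f_unoptimized(int a)
{
    const char *tmp;             // stack slot reserved by PUSH ECX
    if (a == 10)                 // CMP / JNE
        tmp = "it is ten";       // fall-through path, then JMP to the exit
    else
        tmp = "it is not ten";   // path at $LN3@f
    return tmp;                  // MOV EAX, tv65[ebp]
}
```

The result always passes through the temporary before it reaches the return value, which is exactly the detour the optimizing compiler removes.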
x86 – Optimizing compiler (MSVC 2008)
Listing 1.128: Optimizing MSVC 2008
$SG792 DB 'it is ten', 00H ; define string "it is ten"
$SG793 DB 'it is not ten', 00H ; define string "it is not ten"
_a$ = 8 ; size = 4 ; parameter offset
_f PROC ; start of procedure
cmp DWORD PTR _a$[esp-4], 10 ; compare a with 10
mov eax, OFFSET $SG792 ; load address of "it is ten" into EAX
je SHORT $LN4@f ; jump if equal
mov eax, OFFSET $SG793 ; load address of "it is not ten" into EAX
$LN4@f: ; label
ret 0 ; return
_f ENDP ; end of procedure
Newer compilers generate shorter and simpler code.
x64 – Optimizing MSVC 2012
Listing 1.129: Optimizing MSVC 2012 x64
$SG1355 DB 'it is ten', 00H ; define string
$SG1356 DB 'it is not ten', 00H ; define string
a$ = 8 ; parameter offset
f PROC ; start of procedure
lea rdx, OFFSET FLAT:$SG1355 ; load effective address of "it is ten" into RDX
lea rax, OFFSET FLAT:$SG1356 ; load effective address of "it is not ten" into RAX
cmp ecx, 10 ; compare input with 10
cmove rax, rdx ; move RDX to RAX if equal
ret 0 ; return
f ENDP ; end of procedure
Optimizing GCC 4.8 on x86 also uses CMOVcc, while non-optimizing uses conditional jumps.
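The CMOVcc pattern in the x64 listing can be mirrored in C: both candidate results are loaded first, and one then conditionally overwrites the other. This is only a sketch of the data flow; f_cmov_style and the variable names borrowed from the registers are assumptions, and a compiler is free to translate this C code differently.

```c
// Hand-written C reading of the CMOVE-based x64 output.
// Variable names mirror the registers and are illustrative only.
const char *f_cmov_style(int a)
{
    const char *rdx = "it is ten";       // LEA RDX, ...
    const char *rax = "it is not ten";   // LEA RAX, ...
    if (a == 10)                         // CMP ECX, 10
        rax = rdx;                       // CMOVE RAX, RDX
    return rax;                          // RET
}
```

Unlike the if/else shape, no code path is skipped: both addresses are always computed, and only the final assignment depends on the comparison.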
ARM – Optimizing Keil (ARM mode)
Listing 1.130: Optimizing Keil 6/2013 (ARM mode)
f PROC ; start of procedure
CMP r0,#0xa ; compare input with 10
ADREQ r0,|L0.16| ; load address of "it is ten" if equal
ADRNE r0,|L0.28| ; load address of "it is not ten" if not equal
BX lr ; return
ENDP ; end of procedure
|L0.16| ; label
DCB "it is ten",0 ; define string
|L0.28| ; label
DCB "it is not ten",0 ; define string
The ADREQ and ADRNE instructions test opposite conditions, so without manual intervention exactly one of them is executed on any given run.
ARM – Optimizing Keil (Thumb mode)
Listing 1.131: Optimizing Keil 6/2013 (Thumb mode)
f PROC ; start of procedure
CMP r0,#0xa ; compare with 10
BEQ |L0.8| ; branch if equal
ADR r0,|L0.12| ; load "it is not ten"
BX lr ; return
|L0.8| ; label
ADR r0,|L0.28| ; load "it is ten"
BX lr ; return
ENDP ; end of procedure
|L0.12| ; label
DCB "it is not ten",0 ; define string
|L0.28| ; label
DCB "it is ten",0 ; define string
Thumb code has to use conditional jumps here, since Thumb mode has no load instructions that can be executed conditionally.
ARM64 – GCC (Linaro) 4.9
Listing 1.132: Optimizing GCC (Linaro) 4.9
f: ; function label
cmp x0, 10 ; compare input with 10
beq .L3 ; branch if equal
adrp x0, .LC1 ; load page address of "it is not ten"
add x0, x0, :lo12:.LC1 ; add low 12 bits
ret ; return
.L3: ; label
adrp x0, .LC0 ; load page address of "it is ten"
add x0, x0, :lo12:.LC0 ; add low 12 bits
ret ; return
.LC0: ; label
.string "it is ten" ; define string
.LC1: ; label
.string "it is not ten" ; define string
ARM64 has no conditional load instruction like ADRcc in ARM32 or CMOVcc in x86. There is Conditional SELect (CSEL) but GCC 4.9 does not use it here.
MIPS – GCC 4.4.5
Listing 1.133: Optimizing GCC 4.4.5 (assembly output)
$LC0: ; label
.ascii "it is not ten\000" ; define string
$LC1: ; label
.ascii "it is ten\000" ; define string
f: ; function label
li $2,10 # 0xa ; load immediate 10 into $2
beq $4,$2,$L2 ; branch if input equal to 10
nop # branch delay slot
lui $2,%hi($LC0) ; load upper immediate of address
j $31 ; jump to return address
addiu $2,$2,%lo($LC0) ; add lower immediate
$L2: ; label
lui $2,%hi($LC1) ; load upper immediate
j $31 ; jump to return
addiu $2,$2,%lo($LC1) ; add lower immediate
Alternative if/else in C
const char* f(int a) // define function
{
if (a==10) // check equality
return "it is ten"; // return string
else
return "it is not ten"; // return other string
};
Interestingly, optimizing GCC 4.8 on x86 was able to use CMOVcc in this case:
Listing 1.134: Optimizing GCC 4.8
.LC0: ; label
.string "it is ten" ; define string
.LC1: ; label
.string "it is not ten" ; define string
f: ; function
.LFB0: ; label
cmp DWORD PTR [esp+4], 10 ; compare input with 10
mov edx, OFFSET FLAT:.LC1 ; load "it is not ten"
mov eax, OFFSET FLAT:.LC0 ; load "it is ten"
cmovne eax, edx ; move EDX to EAX if not equal
ret ; return
In summary, optimizing compilers try to get rid of conditional jumps, since branchless code tends to run faster.
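To illustrate what avoiding a conditional jump can look like at the source level, here is a mask-based select. It is a different technique from CMOVcc and is shown only as a sketch; select_u32 is a hypothetical helper, and whether a given compiler emits branch-free code for it is not guaranteed.

```c
#include <stdint.h>

// Returns x if cond is non-zero, y otherwise, without an if statement.
// select_u32 is an illustrative name, not a library function.
uint32_t select_u32(int cond, uint32_t x, uint32_t y)
{
    // mask is 0xFFFFFFFF when cond is non-zero, 0 otherwise
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (x & mask) | (y & ~mask);
}
```

For example, select_u32(1, 5, 9) yields 5 and select_u32(0, 5, 9) yields 9.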
1.18.4 Getting minimal and maximal values
32-bit
int my_max(int a,int b) // define max function
{
if (a > b) // check if a greater than b
return a; // return a
else
return b; // return b
};
int my_min(int a,int b) // define min function
{
if (a < b) // check if a less than b
return a; // return a
else
return b; // return b
};
x86 – Non-optimizing MSVC 2013
Listing 1.135: Non-optimizing MSVC 2013
_a$ = 8 ; parameter a
_b$ = 12 ; parameter b
_my_min PROC ; min procedure
push ebp ; save base pointer
mov ebp, esp ; set stack frame
mov eax, DWORD PTR _a$[ebp] ; load a into EAX
cmp eax, DWORD PTR _b$[ebp] ; compare a and b
jge SHORT $LN2@my_min ; jump if a >= b
mov eax, DWORD PTR _a$[ebp] ; return a
jmp SHORT $LN3@my_min ; jump to exit
jmp SHORT $LN3@my_min ; redundant jump
$LN2@my_min: ; label
mov eax, DWORD PTR _b$[ebp] ; return b
$LN3@my_min: ; label
pop ebp ; restore base pointer
ret 0 ; return
_my_min ENDP ; end procedure
_my_max PROC ; max procedure
push ebp ; save base pointer
mov ebp, esp ; set stack frame
mov eax, DWORD PTR _a$[ebp] ; load a
cmp eax, DWORD PTR _b$[ebp] ; compare a and b
jle SHORT $LN2@my_max ; jump if a <= b
mov eax, DWORD PTR _a$[ebp] ; return a
jmp SHORT $LN3@my_max ; jump to exit
jmp SHORT $LN3@my_max ; redundant jump
$LN2@my_max: ; label
mov eax, DWORD PTR _b$[ebp] ; return b
$LN3@my_max: ; label
pop ebp ; restore base pointer
ret 0 ; return
_my_max ENDP ; end procedure
The difference between the two functions is in the jump instruction: JGE in the first and JLE in the second.
There is a redundant JMP in each function, which MSVC presumably left in by mistake.
ARM – Thumb mode (conditional branches)
Listing 1.136: Optimizing Keil 6/2013 (Thumb mode)
my_max PROC ; max procedure
; R0=A, R1=B
CMP r0, r1 ; compare a and b
BGT |L0.6| ; branch if a > b
MOVS r0, r1 ; return b if a <= b
|L0.6| ; label
BX lr ; return
ENDP ; end procedure
my_min PROC ; min procedure
; R0=A, R1=B
CMP r0, r1 ; compare a and b
BLT |L0.14| ; branch if a < b
MOVS r0, r1 ; return b if a >= b
|L0.14| ; label
BX lr ; return
ENDP ; end procedure
The difference is in the jump instructions: BGT and BLT.
ARM – ARM mode (MOVcc)
Listing 1.137: Optimizing Keil 6/2013 (ARM mode)
my_max PROC ; max procedure
CMP r0, r1 ; compare a and b
MOVLE r0, r1 ; move b to r0 if a <= b
BX lr ; return
ENDP ; end procedure
my_min PROC ; min procedure
CMP r0, r1 ; compare a and b
MOVGE r0, r1 ; move b to r0 if a >= b
BX lr ; return
ENDP ; end procedure
MOVcc is executed only if the condition is met.
x86 – Optimizing MSVC 2013 (CMOVcc)
Listing 1.138: Optimizing MSVC 2013
my_max: ; max label
mov edx, DWORD PTR [esp+4] ; load a into EDX
mov eax, DWORD PTR [esp+8] ; load b into EAX
cmp edx, eax ; compare a and b
cmovge eax, edx ; move a to EAX if a >= b
ret ; return
my_min: ; min label
mov edx, DWORD PTR [esp+4] ; load a into EDX
mov eax, DWORD PTR [esp+8] ; load b into EAX
cmp edx, eax ; compare a and b
cmovle eax, edx ; move a to EAX if a <= b
ret ; return
64-bit – C code
#include <stdint.h> // include standard integer types
int64_t my_max(int64_t a,int64_t b) // define 64-bit max
{
if (a > b) // check greater
return a;
else
return b;
};
int64_t my_min(int64_t a,int64_t b) // define 64-bit min
{
if (a < b) // check less
return a;
else
return b;
};
ARM64 – Non-optimizing GCC 4.9.1
Listing 1.139: Non-optimizing GCC 4.9.1 ARM64
my_max: ; max label
sub sp, sp, #16 ; allocate stack
str x0, [sp,8] ; store a
str x1, [sp] ; store b
ldr x1, [sp,8] ; load a
ldr x0, [sp] ; load b
cmp x1, x0 ; compare
ble .L2 ; branch if a <= b
ldr x0, [sp,8] ; load a
b .L3 ; branch to exit
.L2: ; label
ldr x0, [sp] ; load b
.L3: ; label
add sp, sp, 16 ; deallocate
ret ; return
my_min: ; min label
sub sp, sp, #16 ; allocate
str x0, [sp,8] ; store a
str x1, [sp] ; store b
ldr x1, [sp,8] ; load a
ldr x0, [sp] ; load b
cmp x1, x0 ; compare
bge .L5 ; branch if a >= b
ldr x0, [sp,8] ; load a
b .L6 ; branch
.L5: ; label
ldr x0, [sp] ; load b
.L6: ; label
add sp, sp, 16 ; deallocate
ret ; return
There is some extra value shuffling, but the code is understandable.
x64 – Optimizing GCC 4.9.1 (Branchless)
Listing 1.140: Optimizing GCC 4.9.1 x64
my_max: ; max label
; RDI=A, RSI=B
cmp rdi, rsi ; compare a and b
mov rax, rsi ; prepare b for return
cmovge rax, rdi ; move a if a >= b
ret ; return
my_min: ; min label
; RDI=A, RSI=B
cmp rdi, rsi ; compare
mov rax, rsi ; prepare b
cmovle rax, rdi ; move a if a <= b
ret ; return
MSVC 2013 does almost the same thing.
ARM64 – Optimizing GCC 4.9.1 (CSEL)
Listing 1.141: Optimizing GCC 4.9.1 ARM64
my_max: ; max label
; X0=A, X1=B
cmp x0, x1 ; compare
csel x0, x0, x1, ge ; select a if >= else b
ret ; return
my_min: ; min label
; X0=A, X1=B
cmp x0, x1 ; compare
csel x0, x0, x1, le ; select a if <= else b
ret ; return
CSEL works like MOVcc in ARM or CMOVcc in x86.
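The selection performed by CSEL corresponds directly to the C ternary operator. The sketch below restates the two functions in that form; the names my_max_csel and my_min_csel are assumptions chosen to avoid clashing with the originals.

```c
#include <stdint.h>

// "cmp x0, x1" followed by "csel x0, x0, x1, ge": pick a if a >= b, else b
int64_t my_max_csel(int64_t a, int64_t b)
{
    return (a >= b) ? a : b;
}

// "cmp x0, x1" followed by "csel x0, x0, x1, le": pick a if a <= b, else b
int64_t my_min_csel(int64_t a, int64_t b)
{
    return (a <= b) ? a : b;
}
```

When a equals b either operand may be selected, so using GE and LE instead of the strict comparisons in the original source does not change the result.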
MIPS – GCC 4.4.5
Listing 1.142: Optimizing GCC 4.4.5 (IDA)
my_max: ; max label
slt $v1, $a1, $a0 ; set $v1 to 1 if b < a
beqz $v1, locret_10 ; branch if $v1 == 0
move $v0, $a1 ; branch delay slot: always executed, preloads b as the result
move $v0, $a0 ; executed only if the branch is not taken: result is a
locret_10: ; label
jr $ra ; return
or $at, $zero ; delay slot NOP
my_min: ; min label
slt $v1, $a0, $a1 ; swap operands, set if a < b
beqz $v1, locret_28 ; branch if not
move $v0, $a1 ; branch delay slot: always executed, preloads b as the result
move $v0, $a0 ; executed only if the branch is not taken: result is a
locret_28: ; label
jr $ra ; return
or $at, $zero ; delay slot NOP
Note the branch delay slots:
- The first MOVE sits in the delay slot of BEQZ, so it is executed whether or not the branch is taken.
- The second MOVE is executed only if the branch is not taken.
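The effect of the delay slot in my_max can be modelled in C by doing the first move unconditionally and the second one conditionally. This is a sketch of the control flow only; my_max_mips and the variable names taken from the registers are assumptions.

```c
// C model of the MIPS my_max listing, with register names as variables.
int my_max_mips(int a0, int a1)
{
    int v1 = (a1 < a0);   // slt $v1, $a1, $a0
    int v0 = a1;          // move $v0, $a1: delay slot, always executed
    if (v1 != 0)          // beqz $v1 skips the next move when $v1 is 0
        v0 = a0;          // move $v0, $a0
    return v0;            // jr $ra
}
```

For instance, my_max_mips(2, 5) and my_max_mips(5, 2) both yield 5.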
1.18.5 Conclusion
x86
This is the general structure of a conditional jump:
Listing 1.143: x86
CMP register, register/value ; compare
Jcc true ; cc = condition code, jump if true
false: ; label
; ... some code executed if comparison false ...
JMP exit ; jump to exit
true: ; label
; ... some code executed if comparison true ...
exit: ; label
ARM
Listing 1.144: ARM
CMP register, register/value ; compare
Bcc true ; cc = condition code
false: ; label
; ... some code if false ...
B exit ; unconditional branch to exit
true: ; label
; ... some code if true ...
exit: ; label
MIPS
Listing 1.145: Check if value is zero
BEQZ REG, label ; branch if equal to zero
...
Listing 1.146: Check if value is less than zero
BLTZ REG, label ; branch if less than zero
...
Listing 1.147: Check equality
BEQ REG1, REG2, label ; branch if equal
...
Listing 1.148: Check not equal
BNE REG1, REG2, label ; branch if not equal
...
Listing 1.149: Check less than (signed)
SLT REG1, REG2, REG3 ; set REG1 to 1 if REG2 < REG3 (signed)
BNEZ REG1, label ; branch if REG1 is set, i.e. REG2 < REG3
...
Listing 1.150: Check less than (unsigned)
SLTU REG1, REG2, REG3 ; set REG1 to 1 if REG2 < REG3 (unsigned)
BNEZ REG1, label ; branch if REG1 is set, i.e. REG2 < REG3
...
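The difference between the signed and unsigned comparisons can be shown with two small C helpers. They are models of the semantics only; the names slt_model and sltu_model are hypothetical.

```c
#include <stdint.h>

// Model of SLT: operands are compared as signed 32-bit values.
int slt_model(int32_t a, int32_t b)
{
    return a < b;
}

// Model of SLTU: the same bit patterns are compared as unsigned values.
int sltu_model(uint32_t a, uint32_t b)
{
    return a < b;
}
```

The bit pattern 0xFFFFFFFF is -1 when read as signed, so slt_model(-1, 1) yields 1, while sltu_model(0xFFFFFFFFu, 1u) yields 0.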
Without jumps (Branchless)
If the body of the conditional statement is very small, a conditional move instruction can be used instead of a jump:
- MOVcc in ARM (ARM mode)
- CSEL in ARM64
- CMOVcc in x86
ARM – Conditional suffixes
In ARM mode, it is possible to use conditional suffixes for some instructions:
Listing 1.151: ARM (ARM mode)
CMP register, register/value ; compare
instr1_cc ; this instruction executed if condition true
instr2_cc ; another instruction if another condition true
; ... and so on ...
Of course, there is no limit to the number of instructions with conditional suffixes, as long as none of them changes the CPU flags.
ARM – Thumb mode (IT instruction)
Thumb mode has the IT instruction, which makes up to four of the instructions following it conditional.
Listing 1.152: ARM (Thumb mode)
CMP register, register/value ; compare
ITEEE EQ ; if-then-else-else-else
instr1 ; executed if condition true
instr2 ; executed if condition false
instr3 ; executed if condition false
instr4 ; executed if condition false
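How the letters of an IT mnemonic map onto the instructions that follow can be modelled with a small C function. This is an illustrative sketch, not part of any toolchain: it_slot_executes is a hypothetical helper that treats the first T as covering the first instruction and each further T or E as covering the next one.

```c
#include <string.h>

// Returns 1 if the instruction in position idx (0-based) after an IT
// instruction executes, 0 if it is skipped, and -1 if the pattern is not a
// valid IT mnemonic or idx is outside the block.
// pattern is the mnemonic such as "ITEEE"; cond is the tested condition.
int it_slot_executes(const char *pattern, int idx, int cond)
{
    size_t n = strlen(pattern);
    if (n < 2 || n > 5 || pattern[0] != 'I' || pattern[1] != 'T')
        return -1;                    // must be IT plus up to three T/E
    if (idx < 0 || (size_t)idx >= n - 1)
        return -1;                    // position is outside the IT block
    char kind = pattern[idx + 1];     // letter governing this position
    if (kind == 'T')
        return cond != 0;             // "then": runs when condition holds
    if (kind == 'E')
        return cond == 0;             // "else": runs when condition fails
    return -1;                        // unexpected letter
}
```

With the pattern "ITEEE" and a true condition, only position 0 executes; with a false condition, positions 1 to 3 execute instead.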