0xV3n0m

1.28 Manipulating specific bit(s)

The author opens this topic by noting that many functions define their input arguments as flags packed into bit fields. The idea is that we sometimes need to pass more than one option or setting to a function in a single small number, instead of using a separate variable for each option. The most common approach is a bit field (flag bits): each bit in the number (a 32-bit number, for example) represents one option that is either enabled or disabled. For example, bit 0 could mean "read", bit 1 "write", and so on.

To make it clearer, the advantage of this approach is that it saves both memory space and CPU processing time.

The author said: "Of course we could have used bool variables for each option, but that is not economical." It would cost more space (each bool takes a byte for example instead of a single bit) and things would run slower.
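To make the comparison concrete, here is a minimal sketch of both approaches. The option names (OPT_READ, OPT_WRITE, OPT_APPEND) and the helper has_option() are made up for illustration and do not come from any real API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical option flags: one bit per option. */
enum {
    OPT_READ   = 1u << 0, /* bit 0 */
    OPT_WRITE  = 1u << 1, /* bit 1 */
    OPT_APPEND = 1u << 2  /* bit 2 */
};

/* The same three options as separate bool variables:
   at least one byte each instead of one bit each. */
struct options_as_bools {
    bool read;
    bool write;
    bool append;
};

/* Returns true if every bit of `flag` is present in `options`. */
static bool has_option(uint32_t options, uint32_t flag)
{
    return (options & flag) == flag;
}
```

A caller would pass something like OPT_READ | OPT_WRITE as a single argument, and the callee would test each option with has_option().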

1.28.1 Specific bit checking

An example using the Win32 API:

HANDLE fh;
fh=CreateFile ("file", GENERIC_WRITE | GENERIC_READ, // combine two flags using bitwise OR
    FILE_SHARE_READ, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

What we get (from MSVC 2010):

Listing 1.266: MSVC 2010

push    0
push    128                     ; 00000080H = FILE_ATTRIBUTE_NORMAL
push    4                       ; OPEN_ALWAYS
push    0                       ; NULL (security attributes)
push    1                       ; FILE_SHARE_READ
push    -1073741824             ; C0000000H = GENERIC_READ | GENERIC_WRITE combined
push    OFFSET $SG78813         ; pointer to "file" string
call    DWORD PTR __imp__CreateFileA@28 ; call CreateFileA via import table
mov     DWORD PTR _fh$[ebp], eax ; store returned file handle

Let's take a look inside the WinNT.h file:

Listing 1.267: WinNT.h

#define GENERIC_READ    (0x80000000L) // bit 31: read access flag
#define GENERIC_WRITE   (0x40000000L) // bit 30: write access flag
#define GENERIC_EXECUTE (0x20000000L) // bit 29: execute access flag
#define GENERIC_ALL     (0x10000000L) // bit 28: all access flag

Everything here is clear.
GENERIC_READ | GENERIC_WRITE = 0x80000000 | 0x40000000 = 0xC0000000, and that value is used as the second parameter (argument) to the CreateFile() function.

To explain how that value was formed:

* GENERIC_READ = 0x80000000 in binary: 1000 0000 0000 0000 0000 0000 0000 0000 (highest bit — bit 31)

* GENERIC_WRITE = 0x40000000 in binary: 0100 0000 ... (bit 30)

* GENERIC_READ | GENERIC_WRITE = 0xC0000000 in binary: 1100 0000 ... (both bits 31 and 30 are set)

When we do OR, the result = 0xC0000000. The book confirms that this value is the one used as the second argument to the function.

Now let's connect this to the assembly listing above. Interpreted as a signed 32-bit integer, 0xC0000000 is -1073741824, which is exactly the value pushed just before the file name. That confirms this push is dwDesiredAccess.
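The arithmetic above can be checked with a few lines of C. This is only a sketch: the two constants are copied from the WinNT.h listing and given an EX_ prefix so they do not clash with the real Windows headers, and the helper names are made up.

```c
#include <stdint.h>

/* Values copied from the WinNT.h listing; the EX_ prefix is ours. */
#define EX_GENERIC_READ  0x80000000u /* bit 31 */
#define EX_GENERIC_WRITE 0x40000000u /* bit 30 */

/* Combine both access flags with bitwise OR. */
static uint32_t desired_access(void)
{
    return EX_GENERIC_READ | EX_GENERIC_WRITE;
}

/* Reinterpret a 32-bit pattern as a signed two's-complement number
   without relying on an implementation-defined cast. */
static int64_t as_signed32(uint32_t bits)
{
    if (bits & 0x80000000u)
        return (int64_t)bits - INT64_C(4294967296); /* bits - 2^32 */
    return (int64_t)bits;
}
```

Here desired_access() yields 0xC0000000, and as_signed32() of that value is the -1073741824 the compiler printed in the push.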

How does CreateFile() check these flags?

If we look inside KERNEL32.DLL on Windows XP SP3 x86, we find this code section in CreateFileW:

Listing 1.268: KERNEL32.DLL (Windows XP SP3 x86)

.text:7C83D429 test byte ptr [ebp+dwDesiredAccess+3], 40h ; check bit 30 (GENERIC_WRITE) in the top byte
.text:7C83D42D mov [ebp+var_8], 1                         ; set local variable to 1
.text:7C83D434 jz short loc_7C83D417                      ; if bit was 0 (not set), jump away
.text:7C83D436 jmp loc_7C810817                           ; bit was 1 (set), jump to write handling

Here we see the TEST instruction, but it does not take the full second parameter — it only takes the highest byte (ebp+dwDesiredAccess+3) and checks it against the flag 0x40 (which here means the GENERIC_WRITE flag). TEST is essentially the same as AND, but without saving the result.

Let's go through each instruction step by step to make things easier and review together:

test byte ptr [ebp+dwDesiredAccess+3], 40h — This instruction performs a bitwise AND between the two operands but does not store the result anywhere. It only affects the processor flags (FLAGS register) — specifically ZF (Zero Flag), SF (Sign Flag), PF, etc. The byte ptr means it works on a single byte (8 bits) only, not all 4 bytes.

[ebp+dwDesiredAccess+3]: ebp is the stack frame base pointer and dwDesiredAccess is the name of the function parameter. The +3 selects the highest byte of the 4-byte dwDesiredAccess DWORD: x86 is little-endian, so the least significant byte sits at offset +0 and the most significant at offset +3. That top byte covers bits 24 through 31, and the GENERIC_WRITE flag lives in bit 30, inside it. Since 0x40000000 >> 24 = 0x40, testing the top byte against 0x40 checks bit 30 of the original value.

mov [ebp+var_8], 1 — simply moves the value 1 into a location on the stack.

The equivalent C code for this logic is:

if ((dwDesiredAccess & 0x40000000) == 0) goto loc_7C83D417; // if GENERIC_WRITE bit is NOT set, skip

If the AND left that bit set, the ZF flag will be cleared and the JZ conditional jump will not fire. The jump only happens if the bit 0x40000000 is not present in dwDesiredAccess — in that case the AND result is 0, ZF gets set to 1, and the jump fires.
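To see why testing one byte gives the same answer as testing the whole DWORD, here is a small sketch. It lays the value out least significant byte first (the way x86 stores it) with a helper called store_le32(), which is our own name, and then looks only at the byte at offset 3.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_GENERIC_WRITE 0x40000000u /* bit 30, value from the WinNT.h listing */

/* Store a 32-bit value the way x86 does: least significant byte first. */
static void store_le32(uint32_t value, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(value >> (8 * i));
}

/* Mirror of `test byte ptr [..+3], 40h`: look only at the byte at offset 3. */
static bool write_requested_bytewise(uint32_t desired_access)
{
    uint8_t bytes[4];
    store_le32(desired_access, bytes);
    return (bytes[3] & 0x40) != 0;
}

/* The full-width check from the C equivalent above. */
static bool write_requested(uint32_t desired_access)
{
    return (desired_access & EX_GENERIC_WRITE) != 0;
}
```

Both functions agree for every input, because 0x40 in the top byte and 0x40000000 in the full DWORD are the same bit.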

Quick comparison — TEST vs AND, and their relation to CMP vs SUB

Since we mentioned before that CMP is the same as SUB but without saving the result, here is a quick side-by-side for review:

* AND (e.g. and eax, 0x40) — performs AND between EAX and 0x40, stores the result in EAX (first operand), and updates ZF, SF, PF.

* TEST (e.g. test eax, 0x40) — same AND operation but does not store the result. Only updates the flags.

* SUB (e.g. sub eax, ebx) — subtracts and stores the result in EAX, and updates the flags.

* CMP (e.g. cmp eax, ebx) — same subtraction but without storing the result. Only updates the flags.
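The four bullets can be modelled with a toy CPU state. This is a deliberately simplified sketch: it tracks only one register and the zero flag, and the struct and function names are ours.

```c
#include <stdbool.h>
#include <stdint.h>

/* A toy CPU state: one register and the zero flag. */
struct toy_cpu {
    uint32_t eax;
    bool     zf;
};

/* AND: writes the result back and updates ZF. */
static void op_and(struct toy_cpu *cpu, uint32_t operand)
{
    cpu->eax &= operand;
    cpu->zf = (cpu->eax == 0);
}

/* TEST: computes the same AND, updates ZF, leaves EAX alone. */
static void op_test(struct toy_cpu *cpu, uint32_t operand)
{
    cpu->zf = ((cpu->eax & operand) == 0);
}

/* SUB: writes the difference back and updates ZF. */
static void op_sub(struct toy_cpu *cpu, uint32_t operand)
{
    cpu->eax -= operand;
    cpu->zf = (cpu->eax == 0);
}

/* CMP: computes the same difference, updates ZF, leaves EAX alone. */
static void op_cmp(struct toy_cpu *cpu, uint32_t operand)
{
    cpu->zf = ((uint32_t)(cpu->eax - operand) == 0);
}
```

After op_test() or op_cmp() the register still holds its old value, which is why these instructions are the ones you see right before a conditional jump.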

Let's try GCC 4.4.1 on Linux:

#include <fcntl.h>      // file control flags (O_RDWR, O_CREAT, etc.)
#include <sys/types.h>  // POSIX types
#include <unistd.h>     // POSIX API (close, read, write...)

void main()
{
    int handle;
    handle=open ("file", O_RDWR | O_CREAT); // open file for read+write, create if not exists
};

And this is what comes out:

Listing 1.269: GCC 4.4.1

public main
main proc near
var_20 = dword ptr -20h   ; first argument to open(): pointer to the file name
var_1C = dword ptr -1Ch   ; second argument to open(): flags
var_4  = dword ptr -4     ; local variable to store returned handle

    push    ebp
    mov     ebp, esp
    and     esp, 0FFFFFFF0h         ; align stack to 16-byte boundary
    sub     esp, 20h                ; allocate 32 bytes of local stack space
    mov     [esp+20h+var_1C], 42h   ; flags = 0x42 = O_RDWR | O_CREAT
    mov     [esp+20h+var_20], offset aFile ; first argument = pointer to "file" string
    call    _open                   ; call open(filename, flags)
    mov     [esp+20h+var_4], eax    ; store returned file descriptor
    leave                           ; restore stack frame
    retn                            ; return
main endp

If we look inside the open() function in libc.so, we will find it is just a syscall (system call):

Listing 1.270: open() (libc.so.6)

.text:000BE69B mov edx, [esp+4+mode]     ; load mode argument
.text:000BE69F mov ecx, [esp+4+flags]    ; load flags argument (O_RDWR | O_CREAT = 0x42)
.text:000BE6A3 mov ebx, [esp+4+filename] ; load filename pointer
.text:000BE6A7 mov eax, 5                ; syscall number 5 = sys_open
.text:000BE6AC int 80h                   ; invoke Linux kernel via interrupt 0x80

So the bit fields of open() are checked somewhere deep inside the Linux kernel. Of course, it is easy to download the Glibc source and the Linux kernel source, but we are interested in understanding the topic without them.

So, starting from Linux 2.6, when the syscall sys_open is invoked, control eventually transfers to do_sys_open, and from there to the function do_filp_open() (located in the kernel source tree at fs/namei.c).

Note: besides passing parameters via the stack, there is also a method called fastcall (which we will explain in detail later), which passes some of them via registers. This is faster because the CPU does not need to reach into stack memory to read the parameter values. GCC has the regparm option, through which you can specify how many parameters can be passed via registers. The Linux 2.6 kernel is compiled with the -mregparm=3 option. What this means for us is that the first 3 parameters will be passed via registers EAX, EDX, and ECX, and the rest via the stack. Of course, if there are fewer than 3 parameters, only part of that register set will be used.

The author downloaded the Linux 2.6.31 kernel, compiled it on Ubuntu with make vmlinux, opened it in IDA, and found the do_filp_open() function. Here is a part of it with his comments:

Listing 1.271: do_filp_open() (Linux kernel 2.6.31)

do_filp_open proc near
...
    push    ebp
    mov     ebp, esp
    push    edi
    push    esi
    push    ebx
    mov     ebx, ecx                ; EBX = open_flag (3rd argument, passed in ECX via regparm)
    add     ebx, 1
    sub     esp, 98h                ; allocate local stack space
    mov     esi, [ebp+arg_4]        ; ESI = acc_mode (5th argument, from stack)
    test    bl, 3                   ; test lowest 2 bits of open_flag + 1 (EBX was incremented above)
    mov     [ebp+var_80], eax       ; save dfd (1st argument, was in EAX)
    mov     [ebp+var_7C], edx       ; save pathname (2nd argument, was in EDX)
    mov     [ebp+var_78], ecx       ; save open_flag (3rd argument, was in ECX)
    jnz     short loc_C01EF684      ; if bits set, jump to flag-checking block
    mov     ebx, ecx                ; EBX <- open_flag

GCC saves the values of the first 3 arguments on the local stack. If it did not, it would have to leave EAX, EDX, and ECX untouched for the rest of the function, and that would leave the compiler's register allocator too few registers to work with.

Let's find this specific code section:

Listing 1.272: do_filp_open() (Linux kernel 2.6.31)

loc_C01EF684:                   ; CODE XREF: do_filp_open+4F
    test    bl, 40h             ; check bit 6 of open_flag = O_CREAT flag (0x40)
    jnz     loc_C01EF810        ; if O_CREAT is set, jump to file-creation path
    mov     edi, ebx            ; EDI = open_flag
    shr     edi, 11h            ; shift right by 17 bits
    xor     edi, 1              ; flip bit 0
    and     edi, 1              ; keep only bit 0
    test    ebx, 10000h         ; check another flag bit
    jz      short loc_C01EF6D3  ; if not set, skip
    or      edi, 2              ; set bit 1 in EDI

The value 0x40 is what the O_CREAT macro equals. The open_flag is being checked to see if bit 0x40 is present, and if that bit is 1, the JNZ instruction that follows will fire.
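In C terms, the check at the top of that block is a plain mask test. The sketch below takes O_CREAT from <fcntl.h> rather than hard-coding 0x40, since the numeric value is platform-specific. The second helper is our reading of the shr/xor/and sequence further down in the listing; the source does not name the flag in bit 17, so the helper name bit17_is_clear() is only descriptive.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Same idea as `test bl, 40h` / `jnz`: is the O_CREAT bit present? */
static bool wants_create(int open_flag)
{
    return (open_flag & O_CREAT) != 0;
}

/* `shr edi, 11h` / `xor edi, 1` / `and edi, 1` written in C:
   yields 1 when bit 17 is clear and 0 when it is set. */
static unsigned bit17_is_clear(unsigned open_flag)
{
    return ((open_flag >> 17) ^ 1u) & 1u;
}
```

The shift moves bit 17 down to position 0, the XOR inverts it, and the final AND throws away everything else.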

ARM

The O_CREAT bit is checked differently in Linux kernel 3.8.0:

Listing 1.273: Linux kernel 3.8.0

struct file *do_filp_open(int dfd, struct filename *pathname,
    const struct open_flags *op)
{
    ...
    filp = path_openat(dfd, pathname, &nd, op, flags | LOOKUP_RCU); // pass flags into path_openat
    ...
}

static struct file *path_openat(int dfd, struct filename *pathname,
    struct nameidata *nd, const struct open_flags *op, int flags)
{
    ...
    error = do_last(nd, &path, file, op, &opened, pathname); // delegate to do_last
    ...
}

static int do_last(struct nameidata *nd, struct path *path,
    struct file *file, const struct open_flags *op,
    int *opened, struct filename *name)
{
    ...
    if (!(open_flag & O_CREAT)) {   // if O_CREAT flag is NOT set
        ...
        error = lookup_fast(nd, path, &inode); // just look up existing file
        ...
    } else {                        // O_CREAT IS set: need to create
        ...
        error = complete_walk(nd);  // complete path resolution for creation
    }
    ...
}

And here is how the ARM-compiled kernel looks in IDA:

Listing 1.274: do_last() from vmlinux (IDA)

.text:C0169EA8  MOV     R9, R3          ; R3 = 4th argument = open_flag pointer
...
.text:C0169ED4  LDR     R6, [R9]        ; R6 = open_flag value (load from pointer)
...
.text:C0169F68  TST     R6, #0x40       ; test O_CREAT bit (0x40) in open_flag
.text:C0169F6C  BNE     loc_C016A128    ; if O_CREAT is set, branch to creation path
.text:C0169F70  LDR     R2, [R4,#0x10]
.text:C0169F74  ADD     R12, R4, #8
.text:C0169F78  LDR     R3, [R4,#0xC]
.text:C0169F7C  MOV     R0, R4
.text:C0169F80  STR     R12, [R11,#var_50]
.text:C0169F84  LDRB    R3, [R2,R3]
.text:C0169F88  MOV     R2, R8
.text:C0169F8C  CMP     R3, #0
.text:C0169F90  ORRNE   R1, R1, #3
.text:C0169F94  STRNE   R1, [R4,#0x24]
.text:C0169F98  ANDS    R3, R6, #0x200000
.text:C0169F9C  MOV     R1, R12
.text:C0169FA0  LDRNE   R3, [R4,#0x24]
.text:C0169FA4  ANDNE   R3, R3, #1
.text:C0169FA8  EORNE   R3, R3, #1
.text:C0169FAC  STR     R3, [R11,#var_54]
.text:C0169FB0  SUB     R3, R11, #-var_38
.text:C0169FB4  BL      lookup_fast     ; O_CREAT not set: just look up existing file

...

.text:C016A128  loc_C016A128            ; CODE XREF: do_last.isra.14+DC
.text:C016A128  MOV     R0, R4
.text:C016A12C  BL      complete_walk   ; O_CREAT is set: complete path for file creation

Here TST is the ARM equivalent of the TEST instruction in x86. We can clearly see the two code paths — lookup_fast() executes in one case and complete_walk() in the other. This matches the source code of do_last() exactly. The O_CREAT macro equals 0x40 here as well.

Quick comparison — TST vs TEST in practice

* x86 TEST: test reg, imm — performs AND and updates SF, ZF, PF. Supports operands of any size (byte/word/dword).

* ARM TST: TST Rn, Rm or TST Rn, #imm — performs AND and updates the N (Negative), Z (Zero), and C (Carry in shift cases) flags. The most common use is checking the Z flag. It does not store the result either.

1.28.2 Setting and clearing specific bits

An example:

#include <stdio.h>

// check if a specific bit is set in flag
#define IS_SET(flag, bit) ((flag) & (bit))

// set a specific bit in var (force it to 1)
#define SET_BIT(var, bit) ((var) |= (bit))

// clear a specific bit in var (force it to 0)
#define REMOVE_BIT(var, bit) ((var) &= ~(bit))

int f(int a)
{
    int rt = a;
    SET_BIT(rt, 0x4000);    // set bit 14
    REMOVE_BIT(rt, 0x200);  // clear bit 9
    return rt;
}

int main()
{
    f(0x12340678);
}

Let's quickly explain the C code so we can connect everything together:

#define IS_SET(flag, bit) ((flag) & (bit)) — This macro checks whether the desired bit is set (i.e. equals 1). It uses the & (AND) operation between flag (the number containing the bits) and bit (the mask with only one bit set). If the result is non-zero, the bit is set. If zero, the bit is clear.

#define SET_BIT(var, bit) ((var) |= (bit)) — Uses the |= (OR assignment) operation, meaning var = var | bit. OR forces the bits in bit to become 1 in var, without affecting any other bits (if they were already 1 they stay 1, if 0 they stay 0).

#define REMOVE_BIT(var, bit) ((var) &= ~(bit)) — Uses &= with ~(bit). The ~ is bitwise NOT (every bit flips: 0 becomes 1, 1 becomes 0). So ~(bit) produces a mask where all bits are 1 except the one we want to clear which becomes 0. When we do var & mask, the bit that is 0 in the mask gets cleared in var, while all other bits stay unchanged.
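To follow the macros step by step, here is a variant of f() that records the value after each operation. The struct trace and the function trace_f() are our own additions for illustration; the macros are the ones from the example, applied to an unsigned type.

```c
#include <stdint.h>

#define IS_SET(flag, bit)    ((flag) & (bit))
#define SET_BIT(var, bit)    ((var) |= (bit))
#define REMOVE_BIT(var, bit) ((var) &= ~(bit))

/* Intermediate and final values of f(), kept for inspection. */
struct trace {
    uint32_t after_set;    /* value after SET_BIT(rt, 0x4000)   */
    uint32_t after_remove; /* value after REMOVE_BIT(rt, 0x200) */
};

static struct trace trace_f(uint32_t a)
{
    struct trace t;
    uint32_t rt = a;

    SET_BIT(rt, 0x4000u);    /* force bit 14 to 1 */
    t.after_set = rt;

    REMOVE_BIT(rt, 0x200u);  /* force bit 9 to 0 */
    t.after_remove = rt;

    return t;
}
```

For the input 0x12340678, after_set is 0x12344678 and after_remove is 0x12344478.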

x86

Non-optimizing MSVC

What we get (MSVC 2010):

_rt$ = -4       ; size = 4 ; local variable rt on stack
_a$  = 8        ; size = 4 ; argument a on stack

_f PROC
    push    ebp
    mov     ebp, esp
    push    ecx                             ; allocate space for local variable rt
    mov     eax, DWORD PTR _a$[ebp]         ; load argument a into EAX
    mov     DWORD PTR _rt$[ebp], eax        ; rt = a

    mov     ecx, DWORD PTR _rt$[ebp]        ; load rt into ECX
    or      ecx, 16384                      ; ECX |= 0x4000: set bit 14 (SET_BIT)
    mov     DWORD PTR _rt$[ebp], ecx        ; store updated rt

    mov     edx, DWORD PTR _rt$[ebp]        ; reload rt into EDX
    and     edx, -513                       ; EDX &= 0xFFFFFDFF: clear bit 9 (REMOVE_BIT)
    mov     DWORD PTR _rt$[ebp], edx        ; store updated rt

    mov     eax, DWORD PTR _rt$[ebp]        ; load final rt into EAX (return value)
    mov     esp, ebp
    pop     ebp
    ret     0
_f ENDP

The OR instruction sets one bit in the register, leaving all other bits untouched.

The AND instruction clears one bit. We can say that AND copies all bits except one. In the second operand of AND, the bits you want to keep are set to 1, and the one bit you want to clear is set to 0 in the mask. That is the easiest way to remember the logic.

x32dbg

Let's try this example in x32dbg.

First, let's look at the binary representations of the constants we will be using:

The inverted value of 0x200 is 0xFFFFFDFF (0b11111111111111111111110111111111).

0x4000 (0b00000000000000100000000000000) is bit 14 (the 15th bit if you count from 1).

The input value is: 0x12340678 (0b10010001101000000011001111000). Let's see how it loads:

And here is the result after the OR executes:

I was verifying the result with a calculator before it appeared, just to make sure I understood and was following along correctly.

Bit 14 got set (the 15th bit counting from 1): 0x12344678 (0b10010001101000100011001111000).

The value gets reloaded (because the compiler is not in optimization mode):

After the AND instruction executes:

Bit 9 got cleared (the 10th bit counting from 1; in other words, all bits were copied except bit 9), and the final value is now: 0x12344478 (0b10010001101000100010001111000).

Optimizing MSVC

If we compile it in MSVC with optimization enabled (/Ox), the code becomes shorter:

Listing 1.276: Optimizing MSVC

_a$ = 8        ; size = 4 ; argument a passed on stack
_f PROC
    mov     eax, DWORD PTR _a$[esp-4]  ; load argument a directly from stack into EAX
    and     eax, -513                  ; EAX &= 0xFFFFFDFF: clear bit 9 first
    or      eax, 16384                 ; EAX |= 0x4000: set bit 14
    ret     0                          ; return EAX as result
_f ENDP

Non-optimizing GCC

Let's try GCC 4.4.1 without optimization:

public f
f proc near
var_4 = dword ptr -4    ; local variable rt on stack
arg_0 = dword ptr 8     ; argument a on stack

    push    ebp
    mov     ebp, esp
    sub     esp, 10h                    ; allocate local stack space
    mov     eax, [ebp+arg_0]            ; load argument a into EAX
    mov     [ebp+var_4], eax            ; rt = a
    or      [ebp+var_4], 4000h          ; rt |= 0x4000: set bit 14 (directly on memory)
    and     [ebp+var_4], 0FFFFFDFFh     ; rt &= 0xFFFFFDFF: clear bit 9 (directly on memory)
    mov     eax, [ebp+var_4]            ; load final rt into EAX (return value)
    leave
    retn
f endp

There is some redundant code present, but it is shorter than the non-optimizing MSVC version.

Now let's try GCC with optimization -O3 enabled:

Optimizing GCC

Listing 1.278: Optimizing GCC

public f
f proc near
arg_0 = dword ptr 8     ; argument a on stack

    push    ebp
    mov     ebp, esp
    mov     eax, [ebp+arg_0]    ; load argument a into EAX
    pop     ebp
    or      ah, 40h             ; set bit 14: AH is bits 8-15 of EAX, so 0x40 in AH = 0x4000 in EAX
    and     ah, 0FDh            ; clear bit 9: 0xFD in AH is the 0xFFFFFDFF mask restricted to bits 8-15
    retn
f endp

That came out shorter. Worth noting that the compiler worked with a portion of the EAX register via the AH register — which is bits 8 through 15 (inclusive) of EAX.

Note: The old 16-bit 8086 processor had an accumulator called AX, which was composed of two 8-bit halves: AL (low byte) and AH (high byte). In the 80386, almost all registers were extended to 32 bits — the accumulator became EAX — but for compatibility, the old parts of it remain accessible as AX/AH/AL.

Since all x86 processors are descendants of the 16-bit 8086, the old 16-bit instructions are shorter in size than the new 32-bit ones. That is why or ah, 40h takes only 3 bytes. It would have been more natural to use or eax, 04000h but that takes 5 bytes, or even 6 (if the first operand register is not EAX).
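The AH trick can be imitated in C by pulling out byte 1 of the value, modifying it, and putting it back. This is only a sketch of the idea with made-up function names; a real compiler does this with a single partial-register instruction.

```c
#include <stdint.h>

/* Apply `or ah, 40h` / `and ah, 0FDh` to a 32-bit value by touching
   only byte 1 (bits 8..15), the part of EAX that AH names. */
static uint32_t f_via_ah(uint32_t eax)
{
    uint8_t ah = (uint8_t)(eax >> 8);  /* extract bits 8..15 */
    ah |= 0x40;                        /* or  ah, 40h  */
    ah &= 0xFD;                        /* and ah, 0FDh */
    return (eax & ~0xFF00u) | ((uint32_t)ah << 8);
}

/* The full-width form for comparison. */
static uint32_t f_full_width(uint32_t eax)
{
    return (eax | 0x4000u) & ~0x200u;
}
```

Both bits we care about (14 and 9) fall inside bits 8 through 15, which is why the 8-bit form is enough here.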

Optimizing GCC and regparm

It becomes even shorter if we enable both the -O3 optimization flag and regparm=3.

Listing 1.279: Optimizing GCC

public f
f proc near
    push    ebp
    or      ah, 40h         ; argument a is already in EAX (regparm=3): set bit 14 via AH
    mov     ebp, esp
    and     ah, 0FDh        ; clear bit 9 via AH
    pop     ebp
    retn
f endp

Indeed, the first argument is already loaded in EAX, so we can operate on it directly in place. Worth noting that both the function prologue (push ebp / mov ebp, esp) and epilogue (pop ebp) could easily be removed here, but GCC is apparently not aggressive enough to do that level of code-size optimization. Either way, short functions like this are best candidates for inlining.

ARM + Optimizing Keil 6/2013 (ARM mode)

Listing 1.280: Optimizing Keil 6/2013 (ARM mode)

02 0C C0 E3   BIC R0, R0, #0x200   ; clear bit 9: BIC = Bitwise bit Clear (AND with inverted mask)
01 09 80 E3   ORR R0, R0, #0x4000  ; set bit 14: ORR = logical OR
1E FF 2F E1   BX LR                ; return

BIC (Bitwise bit Clear) is an instruction for clearing specific bits. It works exactly like AND but with an inverted (NOT) operand — meaning it is equivalent to a NOT + AND pair.

ORR is "logical or", equivalent to OR in x86.
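Written as C helpers, the two instructions look like this. The helper names bic() and orr() simply mirror the mnemonics, and f_keil_arm() replays the three-instruction listing above.

```c
#include <stdint.h>

/* BIC Rd, Rn, Op2 computes Rn AND (NOT Op2). */
static uint32_t bic(uint32_t rn, uint32_t op2)
{
    return rn & ~op2;
}

/* ORR Rd, Rn, Op2 computes Rn OR Op2. */
static uint32_t orr(uint32_t rn, uint32_t op2)
{
    return rn | op2;
}

/* The Keil ARM-mode listing, expressed with the helpers. */
static uint32_t f_keil_arm(uint32_t r0)
{
    r0 = bic(r0, 0x200u);   /* BIC R0, R0, #0x200  */
    r0 = orr(r0, 0x4000u);  /* ORR R0, R0, #0x4000 */
    return r0;              /* BX LR               */
}
```

Note that BIC takes the bit you want to clear (0x200), not the inverted mask (0xFFFFFDFF) that AND needs on x86.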

ARM + Optimizing Keil 6/2013 (Thumb mode)

Listing 1.281: Optimizing Keil 6/2013 (Thumb mode)

01 21 89 03   MOVS R1, #0x4000     ; R1 = 0x4000 (the bit we want to set)
08 43         ORRS R0, R1          ; R0 |= R1: set bit 14
49 11         ASRS R1, R1, #5      ; R1 = 0x4000 >> 5 = 0x200 (generate 0x200 from 0x4000)
88 43         BICS R0, R1          ; R0 &= ~R1: clear bit 9
70 47         BX LR                ; return

Here Keil, in Thumb mode, decided to generate 0x200 from 0x4000 rather than load it directly, since that is more compact. Using ASRS (arithmetic shift right), the value is computed as 0x4000 >> 5.

ARM + Optimizing Xcode 4.6.3 (LLVM) (ARM mode)

Listing 1.282: Optimizing Xcode 4.6.3 (LLVM) (ARM mode)

42 0C C0 E3   BIC R0, R0, #0x4200  ; clear bits covered by 0x4200 mask (bits 14 and 9 together)
01 09 80 E3   ORR R0, R0, #0x4000  ; set bit 14
1E FF 2F E1   BX LR                 ; return

The code executed by LLVM, if written as C code, would look something like this:

REMOVE_BIT(rt, 0x4200); // clear bits 14 and 9 together using combined mask
SET_BIT(rt, 0x4000);    // then set bit 14 back

It does exactly what we want. But why 0x4200? Clearing bit 14 is unnecessary, because the very next instruction sets it again. This is probably an artifact of LLVM's optimizer; the compiled code works correctly regardless.
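A short sketch shows that the LLVM variant and the original function agree. The function names are ours; the constants come from the listing.

```c
#include <stdint.h>

/* What the LLVM listing does: clear bits 14 and 9, then set bit 14 again. */
static uint32_t f_llvm(uint32_t r0)
{
    r0 &= ~0x4200u;  /* BIC R0, R0, #0x4200 */
    r0 |= 0x4000u;   /* ORR R0, R0, #0x4000 */
    return r0;
}

/* What the C source asks for: set bit 14, clear bit 9. */
static uint32_t f_reference(uint32_t a)
{
    return (a | 0x4000u) & ~0x200u;
}
```

Whatever bit 14 was before, it ends up as 1 in both versions, and bit 9 ends up as 0, so the extra clearing has no visible effect.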

ARM: more about the BIC instruction

Let's reformulate the example in a simpler way:

int f(int a)
{
    int rt = a;
    REMOVE_BIT(rt, 0x1234); // clear bits defined by mask 0x1234
    return rt;
}

Then Keil 5.03 optimizing in ARM mode produces:

f PROC
    BIC r0, r0, #0x1000     ; clear upper part of mask (0x1000)
    BIC r0, r0, #0x234      ; clear lower part of mask (0x234)
    BX  lr                  ; return
ENDP

Two BIC instructions — meaning the bits of 0x1234 were cleared in two steps. This is because 0x1234 cannot be encoded into a single BIC instruction, but 0x1000 and 0x234 can each be encoded separately.
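The reason lies in how ARM mode encodes immediates: the operand is an 8-bit value rotated right by an even number of bits. The text above does not spell this rule out, so treat the following as a sketch built on that assumption; the function names are ours.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 <= n < 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return n == 0 ? x : (x << n) | (x >> (32 - n));
}

/* True if `value` is an 8-bit constant rotated right by an even amount,
   i.e. rotating it left by some even amount yields a value <= 0xFF. */
static bool arm_imm_encodable(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2)
        if (rotl32(value, rot) <= 0xFFu)
            return true;
    return false;
}
```

With this check, 0x1234 is rejected because its set bits span more than 8 positions, while 0x1000 and 0x234 each fit, which matches the two BIC instructions Keil produced.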

ARM64: Optimizing GCC (Linaro) 4.9

Optimizing GCC targeting ARM64 can use the AND instruction instead of BIC:

Listing 1.283: Optimizing GCC (Linaro) 4.9

f:
    and w0, w0, -513        ; W0 &= 0xFFFFFDFF: clear bit 9 (W0 is a 32-bit register)
    orr w0, w0, 16384       ; W0 |= 0x4000: set bit 14
    ret                     ; return

ARM64: Non-optimizing GCC (Linaro) 4.9

Non-optimizing GCC generates more redundant code, but it works exactly the same as the optimizing version:

f:
    sub     sp, sp, #32             ; allocate local stack space
    str     w0, [sp, 12]            ; spill argument a onto stack
    ldr     w0, [sp, 12]            ; reload a
    str     w0, [sp, 28]            ; rt = a
    ldr     w0, [sp, 28]            ; reload rt
    orr     w0, w0, 16384           ; W0 |= 0x4000: set bit 14
    str     w0, [sp, 28]            ; store rt
    ldr     w0, [sp, 28]            ; reload rt
    and     w0, w0, -513            ; W0 &= 0xFFFFFDFF: clear bit 9
    str     w0, [sp, 28]            ; store rt
    ldr     w0, [sp, 28]            ; reload rt (return value)
    add     sp, sp, 32              ; deallocate local stack
    ret                             ; return

MIPS

Listing 1.285: Optimizing GCC 4.4.5 (IDA)

f:
; $a0 = a (input argument)
    ori     $a0, 0x4000         ; $a0 = a | 0x4000: set bit 14 (ORI = OR with Immediate)
    li      $v0, 0xFFFFFDFF     ; load mask 0xFFFFFDFF into $v0 (cannot embed in AND directly)
    jr      $ra                 ; return; the instruction in the branch delay slot below still executes
    and     $v0, $a0, $v0       ; $v0 = (a | 0x4000) & 0xFFFFFDFF: clear bit 9, final result

ORI is of course OR. The "I" in the instruction name means the value is embedded (immediate) in the machine code. But then we have AND. There is no way to use ANDI here, because its immediate field is only 16 bits wide and is zero-extended, so 0xFFFFFDFF does not fit. The compiler had to load 0xFFFFFDFF into register $v0 first and then generate an AND that takes all its values from registers.
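The constraint can be sketched in C. The helper fits_logical_imm16() is a made-up name and assumes the 16-bit zero-extended immediate of ORI/ANDI; f_mips() replays the listing one instruction at a time.

```c
#include <stdbool.h>
#include <stdint.h>

/* ORI/ANDI carry a 16-bit immediate that is zero-extended to 32 bits,
   so only values up to 0xFFFF can be embedded. */
static bool fits_logical_imm16(uint32_t value)
{
    return value <= 0xFFFFu;
}

/* The instruction sequence from the listing, step by step. */
static uint32_t f_mips(uint32_t a0)
{
    uint32_t v0;

    a0 |= 0x4000u;      /* ori $a0, 0x4000     */
    v0 = 0xFFFFFDFFu;   /* li  $v0, 0xFFFFFDFF */
    return a0 & v0;     /* and $v0, $a0, $v0   */
}
```

So 0x4000 fits into ORI directly, while the 0xFFFFFDFF mask has to travel through a register.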

1.28.3 Shifts

Bit shifts in C/C++ are performed using the << and >> operators. The x86 architecture has the SHL (Shift Left) and SHR (Shift Right) instructions for this purpose. Shift instructions are used extensively in division and multiplication by powers of 2 (such as 1, 2, 4, 8, etc.).

Shift operations are also very important because they are heavily used for isolating a specific bit or for building a value from several scattered bits.
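A few small helpers illustrate these uses. They are a sketch with made-up names, restricted to unsigned values so the shifts are well defined.

```c
#include <stdint.h>

/* Multiply an unsigned value by 2^n with a left shift (n < 32). */
static uint32_t mul_pow2(uint32_t x, unsigned n)
{
    return x << n;
}

/* Divide an unsigned value by 2^n with a right shift (n < 32). */
static uint32_t div_pow2(uint32_t x, unsigned n)
{
    return x >> n;
}

/* Isolate bit n of x as 0 or 1. */
static unsigned get_bit(uint32_t x, unsigned n)
{
    return (x >> n) & 1u;
}

/* Build one 32-bit value from four separate bytes. */
static uint32_t pack_bytes(uint8_t b3, uint8_t b2, uint8_t b1, uint8_t b0)
{
    return ((uint32_t)b3 << 24) | ((uint32_t)b2 << 16) |
           ((uint32_t)b1 << 8)  |  (uint32_t)b0;
}
```

For example, get_bit(value, 14) answers the same question as IS_SET(value, 0x4000), but returns exactly 0 or 1.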

CH1.28 Manipulating specific bit(s) (Part1)

https://v3nn00m.github.io/posts/re4b/chapter1_28_part1/

Author

0xV3n0m

Published at

2026-05-01

License

0xV3n0m's Personal Blog License

Some information may be outdated
