Assembly Language Succinctly®
by Christopher Rose

CHAPTER 7

Instruction Reference

The following instruction reference is intended to summarize some of the information in the Intel and AMD programmer's manuals, and provide an easy-to-reference but detailed description of the most common and useful instructions. Full details of all instructions can be found in the Intel and AMD manuals (see the Recommended Reading section for a link to these documents).

This reference covers only application programming instruction sets. System programming (privileged) instructions are not included, nor are instructions that are obsolete and have been removed from x64 assembly, even where they are still supported in compatibility mode (for instance, the Binary Coded Decimal (BCD) instructions). Only the most common and useful instructions have been included; there are many hundreds more.

CISC Instruction Sets

Modern x64 CPUs are CISC (Complex Instruction Set Computing), as opposed to RISC (Reduced Instruction Set Computing). This means there are a very large number of specialized instructions, which are almost useless for general purpose programming, but have been added to the instruction sets for particular purposes such as 3-D graphics algorithms, encryption, and others.

There is almost no consistent logic to the naming of the instructions because they have been added over several decades and belong to different instruction sets.

Many instructions require hardware support from the CPU, such as each of the SIMD instruction sets and the conditional moves. Please refer to the CPUID instruction for details on how to detect if hardware is capable of particular instructions.

Parameter Format

The following table lists the shorthand for instruction parameter types I have used in this reference:

Table 9: Shorthand Instruction Parameters

Shorthand	Meaning
reg	x86 register eax, ebx etc.
mmx	64-bit MMX register
xmm	128-bit SSE register
ymm	256-bit AVX register
mem	Memory operand, data segment variable
imm	Immediate value, literal or constant
st	Floating point unit (x87) register

It is very important to note that you can never use two explicit memory operands as parameters for an instruction, despite what the parameter shorthand appears to state. For instance, one of the parameters to the MOV instruction can be a memory operand, but both of them cannot; the other must be an immediate or a register. (The string instructions, such as MOVS and CMPS, do touch two memory locations, but their operands are implied by RSI and RDI.) There is only one address generation unit per arithmetic logic unit. By the time the arithmetic logic unit has the instruction to execute, it can only generate at most one memory address.

Unless expressly stated otherwise, all parameters to an instruction must match in size. You cannot move a word into a dword nor a dword into a word. There is no notion of implicit casting in assembly. There are some instructions (for instance, the move and sign/zero extend instructions) that are designed to convert one data size to another and must necessarily take operands of differing sizes, but almost all other instructions adhere to this rule.

The possible sizes of the operands have been included in the shorthand for the instructions. Some instructions do not work for all sized operands. For example, the mnemonic and parameters for the conditional move instructions might look like this:

CMOVcc [reg16/32/64], [reg16/32/64/mem16/32/64]

This means the instructions take two operands. Each operand is shown in square brackets here, though in code operands are not surrounded by square brackets unless they are pointers. The first can be an x86 register of size 16 bits, 32 bits, or 64 bits, and the second can be another x86 register of the same size or a memory operand of the same size.

CMOVE ax, bx; This would be fine, CMOVcc [reg16], [reg16]

CMOVE al, bl; This will not work because AL and BL are 8 bit registers

Note: As mentioned previously, the high byte forms of the original x86 registers cannot be used with the low byte forms of the new x64 registers. Something like “MOV AH, R8B” will not compile, as it uses the high byte form AH along with a new byte form R8B. The high byte forms are included in x64 only for backwards compatibility. The CPU knows no machine code to achieve “MOV AH, R8B” in a single instruction.

Flags Register

Many of the x86 instructions alter the state of the flags register so that subsequent conditional jumps or moves can be used based on the results of the previous instructions. The flags register's abbreviations appear differently in Visual Studio compared to almost all other sources. The meaning of the flags and their abbreviations in this manual and Visual Studio's Register Window are as follows:

Table 10: Flags Register Abbreviations

Flag Name	Abbreviation	Visual Studio
Carry Flag	CF	CY
Parity Flag	PF	PE
Zero Flag	ZF	ZR
Sign Flag	SF	PL
Direction Flag	DF	UP
Overflow Flag	OF	OV

A flags field of carry, zero would mean that both the carry flag and the zero flag are altered by the instruction in some way. This means that all other flags are either not altered or undefined. It is not safe to trust that an instruction will not modify a flag when the flag is undefined. Where it is not obvious how an instruction would alter the flags, see the instruction's description for details. If more information is required on whether flags are modified or left undefined, see the programmer's manuals of your CPU manufacturer.

If an instruction does not affect the flags register (such as the MOV instruction), the flags field will appear as Flags: (None). If the flags field to an instruction is (None), then the instruction will not alter the flags register at all.

Almost all of the SIMD instructions do not modify the x86 flags register, so the flags field has been left out for their descriptions.

Prefixes

Some instructions allow prefixes that alter the way the instructions work. If an instruction allows prefixes, it will have a prefix field in its description.

Repeat Prefixes

The repeat prefixes are used with the string instructions to enable blocks of memory to be searched, set to a particular value, or copied. They are not designed to be used with any other instructions, even where the compiler allows it. The results of using the repeat prefixes are undefined when they are used with non-string instructions.

REP: Repeats the following instruction the number of times in RCX. REP is used in conjunction with store string (STOS) and move string (MOVS) instructions. Although this prefix can also be used with LODS, there is no point in doing this. Each repetition of the instruction following the REP prefix decrements RCX.
REPZ, REPE: Repeat while zero and repeat while equal are two different names for exactly the same prefix. It repeats the following instruction while the zero flag is set to 1 and while RCX is not zero. As with the REP prefix, this prefix also decrements RCX at each repetition of the instruction following it. This prefix is used to scan arrays (SCAS instruction) and compare arrays (CMPS instruction).
REPNZ, REPNE: Repeat while not zero or repeat while not equal are the opposites of REPZ or REPE. This prefix will repeatedly execute the instruction while the zero flag is set to 0 and RCX is not 0. Like the other repeat prefixes it decrements RCX at each repetition. This prefix is used with the same instructions as the REPZ and REPE prefixes.
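
As a minimal sketch of the repeat prefixes in use, the following fills a block of memory with REP STOSB and then finds the length of a zero-terminated string with REPNE SCASB. The names buffer and myString are hypothetical variables assumed to be defined in the data segment, and the direction flag is cleared so that RDI counts upwards. Note that RDI is a nonvolatile register under the Windows x64 calling convention, so a real procedure would save and restore it.

```asm
; Assumes buffer (at least 64 bytes) and myString (zero-terminated)
; are hypothetical variables defined in the .data segment.

cld                 ; Clear direction flag so RDI counts upwards
lea rdi, buffer     ; RDI points to the memory to fill
xor eax, eax        ; AL = 0, the byte to store
mov ecx, 64         ; RCX = number of repetitions
rep stosb           ; Store AL to [RDI] 64 times, RCX ends at 0

lea rdi, myString   ; RDI points to the string to scan
xor eax, eax        ; AL = 0, the byte to search for
mov rcx, -1         ; Largest possible count
repne scasb         ; Scan until a byte equals AL (or RCX reaches 0)
not rcx             ; RCX = bytes scanned, including the terminator
dec rcx             ; RCX = length of the string
```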

Lock Prefix

Assembly instructions are not atomic (they do not happen in a single uninterruptible move by the CPU) by default.

add dword ptr [rcx], 2

This sample code will result in what is called a read-modify-write operation. The original value in RAM that RCX is pointing to will be read, 2 will be added, and the result will be written. There are three steps to this operation (read-modify-write). In multithreaded applications, while one thread is in the middle of this three-step operation, another thread may begin reading, writing, or modifying exactly the same address. This could lead to the second thread reading the same value as the first and only one of the threads successfully writing the actual result of the value +2.

This is known as a race condition; threads are racing to read-modify-write the same value. The problem is that the programmer is no longer in control of which threads will successfully complete their instructions and which will overlap and produce some other results. If there are race conditions in a multithreaded application, then by definition the output of the code cannot be ascertained and is potentially any one of a number of scenarios.

The LOCK prefix makes the following instruction atomic; it guarantees that only one thread is able to operate on some particular point in RAM at a time. While only valid for instructions that reference memory, it prevents another thread from accessing the memory until the current thread has finished the operation. This assures no race conditions occur, but at the cost of speed. Adding the LOCK prefix to every line of code will make any threads that try to access the same RAM work in sequence, not parallel, thus negating the performance increase that would otherwise be gained through multithreading.

lock add dword ptr [rcx], 2

In the example, the LOCK prefix has been placed before the instruction. Now no matter how many threads try to access this dword, whether they are running this exact code or any other code that references this exact point in RAM, they will be queued and their accesses will become sequential. This ADD instruction is atomic; it is guaranteed not to be interrupted.

The LOCK prefix is useful for creating multithreading synchronization primitives such as mutexes and semaphores. There are no such primitives inherent to assembly and programmers must create their own or use a library.
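
One possible way to build such a primitive is a simple spin lock, sketched below using LOCK with the bit test and set instruction (BTS). The variable spinLock and the label AcquireLock are hypothetical; spinLock is assumed to be a dword in the data segment where 0 means free and 1 means held. This is only an illustration and omits refinements a production lock would normally have, such as the PAUSE instruction inside the waiting loop.

```asm
; spinLock is a hypothetical dword in the .data segment:
; 0 = free, 1 = held.

AcquireLock:
lock bts dword ptr [spinLock], 0  ; Atomically set bit 0, old bit goes to carry
jc AcquireLock                    ; Carry set: another thread holds the lock

; ... critical section ...

mov dword ptr [spinLock], 0       ; Release the lock
```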

x86 Data Movement Instructions

Move

MOV [reg8/16/32/64/mem8/16/32/64], [reg8/16/32/64/mem8/16/32/64/imm8/16/32/64]

The MOV instruction copies data from the second operand to the first. Both operands must be the same size. Although its name suggests that data will be moved, the data is actually copied; it will remain in the second operand after the instruction.

The MOV instruction is the standard assignment operator.

// C++ assignment

rax = 25;

; Assembly equivalent

mov rax, 25

Note: When the first operand is a 32-bit register, this instruction clears the top 32 bits of the 64-bit version of the register to 0. This leads to a special use for MOV in x64. When you wish to clear the top 32 bits of an x86 register (for example, RDX), you can use the 32-bit version of the register as both operands: mov edx, edx; Clears the top 32 bits of RDX to 0

Flags: (None)

Conditional Moves

CMOVcc [reg16/32/64], [reg16/32/64/mem16/32/64]

This moves data from the second operand to the first operand, but only if the condition specified is true. This instruction reads the flags register to determine whether to perform a MOV or not. The condition code is placed in the mnemonic where the cc is; some common condition codes are listed in the following table. Simply replace the cc with the appropriate condition code to find the mnemonic you require.

Table 11: Some Useful Conditions for CMOVcc

Condition Code	Meaning
O	Overflow, signed overflow
NO	Not overflow, no signed overflow
Z or E	Zero or equal to, signed and unsigned
NZ or NE	Not zero or not equal to, signed and unsigned
B	Below, unsigned less than
A	Above, unsigned greater than
AE	Above or equal, unsigned
BE	Below or equal, unsigned
L	Less, signed less than
G	Greater, signed greater than
GE	Greater or equal, signed
LE	Less or equal, signed
C	Carry, unsigned overflow
NC	Not carry, no unsigned overflow
S	Sign, answer was negative
NS	Not sign, answer was positive
PE	Parity even, 1's count in low byte is even
PO	Parity odd, 1's count in low byte is odd

If the second operand is a memory location, it must be readable whether the instruction's condition is true or not. These instructions cannot be used with 8-bit operands, only 16 bits and above.

It is often better to use conditional moves in place of conditional jumps. Conditional moves can be much faster than branching (using Jcc instructions), particularly when the branch is hard to predict. A modern CPU reads ahead of the code it is actually executing, so that the next instructions have already been fetched from RAM when it requires them. When it finds a conditional branch, it guesses which of the two paths is most likely to be taken, using a manufacturer-specific mechanism called a branch predictor. If it guesses incorrectly, there is a large performance penalty: all the machine code it had read and attempted to execute needs to be flushed from the CPU, and it has to read the code from the actual branch. It is for this reason that CMOVcc instructions were invented and why they can be so much faster than poorly predicted Jcc instructions.

; To move data from AX to CX only if the signed value in DX is

; equal to the value in R8W:

cmp dx, r8w

cmove cx, ax ; Only moves if dx = r8w

; To move data from AX to CX only if the unsigned value in DX is

; above (unsigned greater than) the value in R8W:

cmp dx, r8w

cmova cx, ax ; Only moves if dx > r8w (unsigned)

Note: With a similar behavior to that of the MOV instruction, this instruction clears the top 32 bits of the 64-bit version of first operand when the operands are 32-bit registers. Even if the condition is false, the top will be cleared to 0, while the low 32 bits will be unchanged. If the condition is true, the top will be cleared to 0 and the value of the second operand will be moved into the low 32 bits.

Flags: (None)

CPUID: Function 1; read bit 15 of EDX to ensure a CPU is capable of conditional moves.
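
The check described above might be sketched as follows. NoCmov is a hypothetical label for a fallback code path that uses jumps instead. Keep in mind that CPUID overwrites EAX, EBX, ECX, and EDX, so any values needed in those registers should be saved first. In practice, x64-capable CPUs are expected to support conditional moves, so this check matters mainly for 32-bit code.

```asm
; NoCmov is a hypothetical label for a fallback code path.
; CPUID overwrites EAX, EBX, ECX, and EDX.

mov eax, 1      ; Select CPUID function 1
cpuid           ; Feature bits are returned in ECX and EDX
bt edx, 15      ; Copy bit 15 (CMOV) into the carry flag
jnc NoCmov      ; Carry clear: conditional moves are not supported
```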

Nontemporal Move

MOVNTI [mem32/64], [reg32/64]

The nontemporal move instruction moves a value from a register into memory and lets the CPU know that the value will not be needed in cache. Different CPUs will do different things based on this. The CPU may ignore the nontemporal hint completely, placing the value in cache regardless of your instruction. Some CPUs will use the nontemporal hint to ensure the data is not cached, thus allowing more space in cache for data that will be needed again in the near future.

Flags: (None)

CPUID: Function 1; read bit 26 (SSE2) of EDX to ensure a CPU is capable of the MOVNTI instruction.

Move and Zero Extend

MOVZX [reg16/32/64], [reg8/16/mem8/16]

This moves the value from the second operand into the first operand, but extends it to the first operand's size by adding zeros to the left. The source operand can only be 8 bits or 16 bits wide and it can be extended to 16 bits, 32 bits, or 64 bits.

There is no limitation on the difference between the operands. This means you can use a byte as the second and extend it to a 64-bit qword.

Flags: (None)

Move and Sign Extend

MOVSX [reg16/32/64], [reg8/16/mem8/16]

This converts a smaller signed integer to a larger type by copying the smaller source value to the low bits of the destination, and then copying the sign of the source across the remaining upper bits of the destination. This instruction cannot sign extend from a 32-bit source to a 64-bit destination, which requires using the MOVSXD instruction instead.

There is no limitation on the difference between the operands, meaning one can use a byte as the second and extend it to a 64-bit qword.

Flags: (None)

Move and Sign Extend Dword to Qword

MOVSXD [reg64], [reg32/mem32]

This converts a 32-bit signed integer to a 64-bit signed integer. The source is moved into the low half of the destination and the sign bit of the source is copied across the upper half of the destination.

Flags: (None)
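
The difference between the three extending moves can be seen in a short sketch. The register choices here are arbitrary; the byte F0h is 240 when read as unsigned and -16 when read as signed.

```asm
mov cl, 0F0h    ; CL = F0h (240 unsigned, -16 signed)
movzx eax, cl   ; Zero extend: EAX = 000000F0h (240)
movsx edx, cl   ; Sign extend: EDX = FFFFFFF0h (-16)
movsx r8, cl    ; Sign extend byte to qword: R8 = FFFFFFFFFFFFFFF0h
movsxd r9, edx  ; Sign extend dword to qword: R9 = FFFFFFFFFFFFFFF0h
```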

Exchange

XCHG [reg8/16/32/64/mem8/16/32/64], [reg8/16/32/64/mem8/16/32/64]

This swaps or exchanges the values of the two operands. This instruction can be used in place of BSWAP for the 16-bit registers since BSWAP does not allow 16-bit operands; instead of bswap ax you can use xchg al, ah.

This instruction is automatically atomic (applies the LOCK prefix automatically) if a memory operand is used.

Flags: (None)

Prefix: LOCK

Translate Table

XLAT [mem8]

XLATB

This instruction translates the value in AL to that of the table pointed to by RBX. Point RBX to a byte array of up to 256 different values and set AL to the index in the array you want to translate.

This instruction does not affect the top 7 bytes of RAX; only AL is altered. The instruction accomplishes something like the otherwise illegal address calculation of adding RBX to AL.

mov al, byte ptr [rbx+al]

The memory operand version does exactly the same thing, and the memory operand is completely ignored. The only purpose of the memory operand is to document where RBX may be pointing. Do not be misled; no matter what the memory operand, the table is specified by RBX as a pointer.

XLAT myTable ; myTable is completely ignored, [RBX] is always used!

Flags: (None)
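
As an illustration, XLATB can convert a number from 0 to 15 into its hexadecimal character. The table hexDigits is a hypothetical variable assumed to be defined in the data segment as shown in the comment. RBX is a nonvolatile register under the Windows x64 calling convention, so a real procedure would save and restore it.

```asm
; hexDigits is a hypothetical table in the .data segment:
; hexDigits db "0123456789ABCDEF"

lea rbx, hexDigits  ; RBX points to the table
mov al, 11          ; Index to translate (0 to 15 for this table)
xlatb               ; AL = byte at [RBX+11], the character 'B'
```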

Sign Extend AL, AX, and EAX

CBW

CWDE

CDQE

These instructions sign extend the various versions of RAX to larger versions. The operand is implied, and it is always AL for CBW, AX for CWDE, and EAX for CDQE.

CBW copies the sign of AL across AH, effectively making AX the sign extended version of what was in AL. CWDE copies the sign of AX across the upper half of EAX, effectively sign extending from AX to EAX. CDQE copies the sign of EAX across the upper half of RAX, effectively sign extending from EAX to RAX.

Flags: (None)

Copy Sign of RAX across RDX

CWD

CDQ

CQO

These instructions create the signed combination of RDX:RAX used by the signed division instruction IDIV (for the unsigned DIV, clear RDX to 0 instead). They copy the sign of AX, EAX, or RAX across the same sized version of the RDX register.

CWD copies the sign of AX across DX such that DX is FFFFh if AX was negative, or 0000h if AX was positive. This creates the 32-bit composite register DX:AX. CDQ copies the sign of EAX across EDX such that EDX becomes FFFFFFFFh if EAX is negative, and 0 if EAX is positive. This creates the 64-bit composite register EDX:EAX. CQO copies the sign of RAX across all bits of RDX. This creates the 128-bit composite register RDX:RAX.

Flags: (None)
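
A brief sketch of both families of sign extension follows: CWD preparing DX:AX for a signed 16-bit division, and CDQE widening EAX to RAX. The values are chosen only for illustration.

```asm
mov ax, -5      ; Dividend
cwd             ; DX = FFFFh, so DX:AX is -5 as a 32-bit value
mov cx, 2       ; Divisor
idiv cx         ; AX = -2 (quotient), DX = -1 (remainder)

mov eax, -1     ; EAX = FFFFFFFFh, top half of RAX is 0
cdqe            ; RAX = FFFFFFFFFFFFFFFFh (-1 as a qword)
```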

Push Data to Stack

PUSH [reg16/32/64/mem16/32/64/imm16/32/64/seg16]

This pushes a value to the stack. This results in the stack pointer being decremented by the size of the value in bytes and the value being moved into RAM. This is used to pass parameters between procedures, and also to save the return address prior to calling a procedure.

In addition to its role as the backbone to passing parameters, the stack is also used to save temporary values to free the register for some other use.

8-bit operands cannot be pushed to the stack, but you can push the segment registers FS and GS. Pushing an odd number of 16-bit values results in a misaligned stack pointer (one that is no longer on an address divisible by 4). The x64 calling conventions also expect the stack pointer to be on an address divisible by 16 when a procedure is called. You should always push an even number of 16-bit values, as a misaligned stack pointer can result in a crash.

Flags: (None)

Pop Data from Stack

POP [reg16/32/64/mem16/32/64/seg16]

This pops data previously pushed onto the stack. This results in incrementing the stack pointer to point to the next data to be popped, and the last pushed data item being read from memory.

Flags: (None)
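
The usual pattern for freeing registers temporarily is to push them, use them, and pop them in the reverse order. The following sketch assumes 64-bit code, where each push or pop of a 64-bit register moves the stack pointer by 8 bytes.

```asm
push rbx        ; Save RBX, RSP decreases by 8
push rsi        ; Save RSI, RSP decreases by 8

; ... code that overwrites RBX and RSI ...

pop rsi         ; Restore in reverse order: last pushed, first popped
pop rbx         ; RSP is now back where it started
```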

Push Flags Register

PUSHF

PUSHFQ

This pushes the flags register to the stack. You have the option of pushing only the low 16 bits (PUSHF) or the entire 64-bit rflags register (PUSHFQ). This instruction is useful for saving the exact state of the flags register prior to calling procedures, since the procedures will most likely alter its state. This instruction and the pop flags register instructions can also be used to set the bits of the flags register:

PUSHF ; Push the state of the flags register

POP AX ; Pop the flags register into ax

OR AX, 64 ; Set the bits using OR, BTS, BTR, or AND

PUSH AX ; Push the altered flags

POPF ; Pop the altered flags back into the real flags register

There are instructions to easily set and clear the carry and direction flags. See CLD, CLC, STC, and STD. Pushing and popping the flags register need not be used to set or clear these particular flags.

Flags: (None)

Pop Flags Register

POPF

POPFQ

This pops the values from the stack into the flags register. The flags register cannot be directly manipulated like the general purpose registers (excepting the CLD, CLC, STC, and STD instructions). If you need to set particular bits of rflags, follow the example in the push flags register instruction.

Flags: Carry, Parity, Zero, Sign, Direction, Overflow

Load Effective Address

LEA [reg16/32/64], [mem]

This loads the effective address of the memory location into the destination (the first operand). If the destination is 16 bits, only the lowest 16 bits of the address are loaded; if the destination is 32 bits, then only the low 32 bits of the address are loaded. Usually the destination is 64 bits and the LEA instruction loads the entire effective address of the memory operand.

This instruction actually calculates an address and moves this into the destination. It is similar to the MOV instruction but LEA does not actually read memory. It just calculates an address.

.data

myVar dq 23 ; Define a variable and set it to 23

.code

MyProc proc

mov rax, myVar; MOV will read the contents of myVar into RAX

lea rax, myVar; LEA loads the address of myVar to RAX

; From the LEA instruction RAX has the address of myVar

mov qword ptr [rax], 0 ; So we could set myVar to 0 like this

ret

MyProc endp

End

Note: Because this instruction just calculates an address, complex addressing modes (e.g. [RBX+RCX*8]) can be used, and because it makes no attempt to read from the address, it can be used to perform fast arithmetic.

For example, to set RAX to 5 * RCX you could use the following:

LEA RAX, [RCX+RCX*4]

To set RBX to R9+12 you could use the following:

LEA RBX, [R9+12]

This optimization technique and a multitude more are detailed in Michael Abrash's Graphics Programming Black Book (see the Recommended Reading section for a link).

Flags: (None)

Byte Swap

BSWAP [reg32/64]

This reverses the order of the bytes in the operand. This instruction was designed to swap the endianness of values. That is, it is used to change from little endian to big endian and vice versa. With the dominance of x86-based CPUs (x86 uses little endian), the need to change endianness is less common than it once was, but the instruction is also useful in reversing character strings.

Note: If you need to "BSWAP reg16" you should use XCHG instruction. The BSWAP instruction does not allow for 16-bit parameters, so instead of “BSWAP AX” you can use “XCHG AL, AH”.

Flags: (None)
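
The effect of BSWAP, and of the XCHG substitute for 16-bit values, can be seen in this short sketch with illustrative constants:

```asm
mov eax, 12345678h
bswap eax           ; EAX = 78563412h

mov ax, 1234h
xchg al, ah         ; AX = 3412h, the 16-bit equivalent
```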

x86 Arithmetic Instructions

Addition and Subtraction

ADD [reg8/16/32/64/mem8/16/32/64], [reg8/16/32/64/mem8/16/32/64/imm8/16/32]

SUB [reg8/16/32/64/mem8/16/32/64], [reg8/16/32/64/mem8/16/32/64/imm8/16/32]

This adds the second operand to the first, or subtracts the second operand from the first, and stores the result in the first operand. For addition the order does not matter, but when using SUB it is important to note that the second operand is subtracted from the first, not the other way round.

These instructions can be used for both signed and unsigned arithmetic; it is important to know how to read the flags, since the flags reflect both the signed and unsigned result.

If you are doing unsigned arithmetic, you should read the carry flag. The carry flag will be 0 if there was no final overflow. If there was a final overflow (indicating a carry or borrow on the final bit of the operation), it will be set to 1.

If you are doing signed arithmetic, you should read the overflow flag. The overflow flag will be 0 if the result fits in the signed range of the operands. If the result is too large or too small to be represented (the carry or borrow into the final bit differs from the carry or borrow out of it, since the final bit is the sign bit), the overflow flag will be set to 1.

If the result of the addition or subtraction is exactly 0, then the zero flag will be set.

If the result was a negative number (this can be ignored if unsigned arithmetic is being done), then the sign flag will be set.

Flags: Carry, Parity, Zero, Sign, Overflow

Prefix: LOCK
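
The following sketch shows how the same 8-bit additions read differently as signed and unsigned arithmetic. The values are chosen to sit on the edge of each range.

```asm
mov al, 127     ; AL = 7Fh, the largest signed byte
add al, 1       ; AL = 80h: overflow = 1, sign = 1, carry = 0, zero = 0

mov al, 255     ; AL = FFh, the largest unsigned byte (-1 if signed)
add al, 1       ; AL = 00h: carry = 1, zero = 1, overflow = 0, sign = 0
```

In the first addition the unsigned answer (128) is correct but the signed answer has overflowed; in the second the signed answer (-1 + 1 = 0) is correct but the unsigned answer has overflowed.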

Add with Carry and Subtract with Borrow

ADC [reg8/16/32/64/mem8/16/32/64], [reg8/16/32/64/mem8/16/32/64/imm8/16/32]

SBB [reg8/16/32/64/mem8/16/32/64], [reg8/16/32/64/mem8/16/32/64/imm8/16/32]

These instructions do the same as the ADD and SUB instructions, except that they also add or subtract the carry flag (they add or subtract an additional 1 or 0 depending on the state of the carry flag). They are useful for performing arbitrarily large integer additions or subtractions where the number being worked on does not fit inside a 64-bit register, but is broken into multiple 64-bit digits.

You can also use these instructions to set a register to the carry flag. To set EAX to the carry flag you can use the following:

MOV EAX, 0 ; Clear EAX to 0.

; You can't use XOR here because that would clear

; the carry flag to 0.

ADC EAX, EAX ; Sets EAX to 1 or 0 based on the carry flag

Flags: Carry, Parity, Zero, Sign, Overflow

Prefix: LOCK
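
As a small sketch of multi-digit arithmetic, a 128-bit addition can be built from one ADD and one ADC. The choice of registers is an assumption for illustration: one number is held in RDX:RAX and the other in R9:R8, with the high qword named first.

```asm
; Adds the 128-bit value in R9:R8 to the 128-bit value in RDX:RAX.
add rax, r8     ; Add the low qwords, carry flag = carry out
adc rdx, r9     ; Add the high qwords plus the carry
```

The same pattern extends to longer numbers by following the first ADD with one ADC per additional qword, working from the lowest to the highest.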

Increment and Decrement

INC [reg8/16/32/64/mem8/16/32/64]

DEC [reg8/16/32/64/mem8/16/32/64]

These instructions increment (add 1 to) or decrement (subtract 1 from) a register or memory variable. They are often used in conjunction with a register to create looping structures. A common pattern is something like the following which will loop 100 times:

mov cx, 100 ; Number of times to loop

LoopHead: ; Start of the loop

; Body of the loop

dec cx ; Decrement counter

jnz LoopHead ; Loop if there's more, i.e. 100 times

Note: INC and DEC do not alter the carry flag. If you need to perform INC or DEC but also update the carry flag, it is recommended to use ADD or SUB with 1 as the second operand.

Flags: Parity, Zero, Sign, Overflow

Prefix: LOCK

Negate

NEG [reg8/16/32/64/mem8/16/32/64]

This negates a signed number so that negative values become their positive counterparts and vice versa. This is the equivalent to flipping each of the bits and adding 1 to the result. This is called two’s complement of a number, as opposed to the one’s complement, which can be obtained with the NOT instruction.

Note: x86 and x64 CPUs perform multiplication slowly compared to many of the bit manipulation instructions; if you need to multiply a number by -1 it is generally quicker to use NEG than IMUL.

Flags: Carry, Parity, Zero, Sign, Overflow

Prefix: LOCK

Compare

CMP [reg8/16/32/64/mem8/16/32/64], [reg8/16/32/64/mem8/16/32/64/imm8/16/32]

This compares the two operands and sets the flags register to indicate the relationship between the two operands.

This instruction actually does exactly the same as the SUB instruction, but it does not store the result, it just sets the flags. The second operand is subtracted from the first operand and the flags are set accordingly, but the destination operand is not altered. Usually the compare instruction is followed by conditional jumps or conditional moves.

This instruction is used to set the flags and subsequently perform some conditional operation based on the results. It is very important to note how the operands are being compared by the CMP instruction, since comparisons such as >, >=, <, and <= depend on the order of the operands.

cmp dx, ax

jg SomeLabel ; Jump if DX > AX

Note: CMP op1, op2 is the same as asking, “what relation does the first operand have to the second,” not the other way round. The second operand is subtracted from the first.

Flags: Carry, Parity, Zero, Sign, Overflow
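
Because CMP sets the flags for both interpretations at once, the jump that follows decides whether the comparison is signed or unsigned. In this sketch SignedGreater and UnsignedAbove are hypothetical labels; conditional jumps do not alter the flags, so both can test the result of the same CMP.

```asm
; SignedGreater and UnsignedAbove are hypothetical labels.

mov eax, -1         ; FFFFFFFFh: -1 signed, 4294967295 unsigned
cmp eax, 1          ; Sets the flags for EAX - 1, EAX is unchanged
jg SignedGreater    ; Not taken: -1 is not greater than 1
ja UnsignedAbove    ; Taken: FFFFFFFFh is above 1
```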

Multiply

MUL [reg8/16/32/64/mem8/16/32/64]

IMUL [reg8/16/32/64/mem8/16/32/64]

IMUL [reg16/32/64], [reg16/32/64/mem16/32/64/imm8/16/32]

IMUL [reg16/32/64], [reg16/32/64/mem16/32/64], [imm8/16/32]

MUL performs unsigned integer multiplication and IMUL performs signed integer multiplication.

There is only a single-operand version of MUL, whereas IMUL has three versions. In the single-operand version of IMUL or MUL, the second operand is implied and the answer is stored in predefined implied registers. The implied second operand is the appropriate size of the RAX register, so if the operand is 8 bits, then the second implied operand is AL. If the source operand is 64 bits then the implied second operand is RAX.

The answer to the multiplication is stored in AX for 8-bit multiplications. For the other data sizes (16-bit, 32-bit, and 64-bit operands), the answer is stored with the upper half in the appropriate size of RDX and the lower half in the appropriate size of RAX. This is because the original 16-bit CPUs did not possess registers large enough to store the possible 32-bit result from a 16-bit multiplication, so the composite 32-bit of DX:AX was used. When 32-bit CPUs came about, exactly the same thing happened. The 64-bit answer from a 32-bit multiplication could not be stored in a 32-bit register, so the composite of EDX:EAX is used. And now with our 64-bit CPUs, the 128-bit answer is stored in the composite of RDX:RAX.

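If anything significant ends up in the top half of the answer (AH, DX, EDX, or RDX), then the carry and overflow flags are set to 1, otherwise they are 0. For MUL this means the top half is not zero; for IMUL it means the top half is not simply the sign extension of the bottom half.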

Table 12: Summary of Single-Operand Multiply Operands and Results

Operand 1	Implied Operand 2	Answer
8 bits	AL	AX
16 bits	AX	DX:AX
32 bits	EAX	EDX:EAX
64 bits	RAX	RDX:RAX

The two-operand version of IMUL simply multiplies the second operand by the first and stores the result in the first. The overflow (any bits from the result that do not fit into the first operand) are lost and the carry and overflow flags are set to 1. If there is no overflow, the entire answer fits into the first operand and the carry and overflow flags are set to 0.

In the three-operand version of IMUL, the second operand is multiplied by the third operand (an immediate value) and the result is stored in the first operand. Once again, if the result overflows, both the carry and overflow flags are set to 1, otherwise they are cleared.

Note: These instructions are quite slow, so if it is possible it may be quicker to swap a multiplication for a shift (SHL) or use the LEA instruction.

Flags: Carry, Overflow
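
The three styles of multiplication described above might look like the following sketch. The register choices and constants are arbitrary and only for illustration.

```asm
mov eax, 80000000h  ; EAX = 2 to the power of 31
mov ecx, 4
mul ecx             ; EDX:EAX = 2:00000000h, carry and overflow = 1

mov r9, 7
imul r8, r9, 10     ; R8 = R9 * 10 = 70, carry and overflow = 0

mov rax, 5
shl rax, 3          ; RAX = 5 * 8 = 40, using a shift instead
```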

Signed and Unsigned Division

DIV [reg8/16/32/64/mem8/16/32/64]

IDIV [reg8/16/32/64/mem8/16/32/64]

Unlike IMUL, there are only single-operand versions of the division instructions. DIV divides unsigned integers and IDIV divides signed integers. These instructions return both the quotient and remainder of the division.

The single operand given to the instruction is the divisor (the y in x/y of the division). The dividend (the x in the x/y division) is implied. See the examples in Table 13 for the location of the implied dividend. The quotient ends up in the appropriate size of RAX and the remainder goes in the appropriate size of RDX (or AH for 8-bit divisors).

Note: Division has always been one of the slowest instructions (perhaps 30–40 times slower than addition). This is still the case today. If possible, division should be avoided completely in tight loops. If a number is to be divided by a power of 2, use the SAR (Arithmetic Shift Right) instead of signed division and SHR instead of unsigned. Be aware that SAR rounds negative results toward negative infinity, whereas IDIV rounds toward zero, so the two can differ by 1 for negative dividends. If there are many divisions to be performed, consider using SSE.

Note: Be very careful about what is in RDX. Even if the number being divided is small enough to fit entirely in the appropriate size of RAX, RDX is still part of the dividend and must be set correctly! Either clear RDX using XOR for unsigned division or copy RAX's sign across it using CWD, CDQ, or CQO for signed division.

For example, if we wanted to calculate 100/43 using signed dwords (this code would work for -100/43 as well), use something like the following:

mov eax, 100 ; Move implied dividend into EAX

mov ecx, 43 ; Move divisor into ECX

cdq ; Copy sign of EAX across EDX

idiv ecx ; Perform division, EAX gets quotient, EDX gets remainder!

Table 13: Summary of Divide Instruction Operands and Results

Operand 1 (Divisor)	Implied Dividend	Quotient	Remainder
8 bits	AX	AL	AH
16 bits	DX:AX	AX	DX
32 bits	EDX:EAX	EAX	EDX
64 bits	RDX:RAX	RAX	RDX

Flags: None (All flags are undefined after a divide!)
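
For comparison with the signed example above, an unsigned division clears EDX with XOR rather than using CDQ. The sketch below also shows a shift standing in for an unsigned division by a power of 2. The values are illustrative only.

```asm
mov eax, 100    ; Dividend
xor edx, edx    ; Unsigned: clear EDX so EDX:EAX = 100
mov ecx, 7      ; Divisor
div ecx         ; EAX = 14 (quotient), EDX = 2 (remainder)

mov eax, 100
shr eax, 2      ; EAX = 25, unsigned divide by 4 using a shift
```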

x86 Boolean Instructions

Boolean And, Or, Xor

AND [reg8/16/32/64/mem8/16/32/64], [reg8/16/32/64/mem8/16/32/64/imm8/16/32]

OR [reg8/16/32/64/mem8/16/32/64], [reg8/16/32/64/mem8/16/32/64/imm8/16/32]

XOR [reg8/16/32/64/mem8/16/32/64], [reg8/16/32/64/mem8/16/32/64/imm8/16/32]

These instructions AND, OR, or XOR the operands and store the result in the first operand. Each pair of bits (one from the source and the corresponding one from the destination) has the operation applied and the answer stored, exactly the same as the C++ bitwise operators (&, |, and ^).

Table 14: AND Truth Table

Bit 1	Bit 2	Result
0	0	0
0	1	0
1	0	0
1	1	1

Table 15: OR Truth Table

Bit 1	Bit 2	Result
0	0	0
0	1	1
1	0	1
1	1	1

Table 16: XOR Truth Table

Bit 1	Bit 2	Result
0	0	0
0	1	1
1	0	1
1	1	0

The overflow and carry flags are cleared to 0 while the sign and zero flags indicate the result.

Note: Traditionally the XOR instruction is faster than MOV so programmers usually use XOR to clear a register to 0. If both operands to an XOR have exactly the same value, then XOR returns 0, so to clear RAX to 0 “XOR RAX, RAX” is more common than “MOV RAX, 0” even though today's CPUs probably perform the XOR no faster.

Flags: Carry, Parity, Zero, Sign, Overflow

Prefix: LOCK
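Because these instructions behave like the C++ bitwise operators, their results can be checked with a few one-line functions. The following is a minimal sketch; the function names `and32`, `or32`, and `xor32` are invented for illustration and model only the stored result, not the flags.

```cpp
#include <cstdint>

// Each function mirrors one instruction applied to 32-bit operands
// and returns what would be stored in the first operand.
inline std::uint32_t and32(std::uint32_t dst, std::uint32_t src) { return dst & src; }
inline std::uint32_t or32 (std::uint32_t dst, std::uint32_t src) { return dst | src; }
inline std::uint32_t xor32(std::uint32_t dst, std::uint32_t src) { return dst ^ src; }
```

Calling `xor32(x, x)` returns 0 for any `x`, which is the property the XOR-to-clear idiom relies on.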

Boolean Not (Flip Every Bit)

NOT [reg8/16/32/64/mem8/16/32/64]

This instruction flips every bit in the operand given such that zeroes become ones and ones become zeroes. It is the bitwise or Boolean NOT and is sometimes called the one's complement, as opposed to the NEG instruction, which returns the two's complement.

Flags: (None)

Prefix: LOCK

Test Bits

TEST [reg8/16/32/64/mem8/16/32/64], [reg8/16/32/64/mem8/16/32/64/imm8/16/32]

This instruction is to bitwise tests as CMP is to arithmetic tests. It performs a Boolean AND instruction between the source and destination, but does not set the result in the destination. Instead it just alters the flags.

The carry and overflow flags are always reset to 0, and the parity, zero, and sign flags reflect the result of the Boolean AND.

For example, if you wish to know if any bits in the third byte of EAX are set to 1, you could use TEST as follows:

test eax, 00ff0000h ; 00ff0000h is only 1's in the 3rd byte

jnz ThereAreOnes ; If zero flag isn't set, EAX has something in 3rd byte

jz ThirdByteIsClear ; If zero flag is set then EAX has nothing in 3rd byte

If you wish to test whether RDX contains an even number, you can employ the TEST instruction as follows:

test rdx, 1 ; Is the first bit 1?

jz EvenNumber ; If it is not, the number is even

jnz OddNumber ; Otherwise the number in RDX is odd

Flags: Carry, Parity, Zero, Sign, Overflow
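The two TEST examples above can be expressed in C++ as an AND whose result is only compared with zero and never stored. This is a sketch under that assumption; `thirdByteHasOnes` and `isEven` are hypothetical names.

```cpp
#include <cstdint>

// Models "test eax, 00ff0000h" followed by JNZ:
// returns true when the zero flag would be 0.
inline bool thirdByteHasOnes(std::uint32_t eax) {
    return (eax & 0x00FF0000u) != 0;
}

// Models "test rdx, 1" followed by JZ:
// returns true when the zero flag would be 1.
inline bool isEven(std::uint64_t rdx) {
    return (rdx & 1u) == 0;
}
```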

Shift Right and Left

SHL [reg8/16/32/64/mem8/16/32/64], [CL/imm8]

SHR [reg8/16/32/64/mem8/16/32/64], [CL/imm8]

SAR [reg8/16/32/64/mem8/16/32/64], [CL/imm8]

This shifts the bits in the first operand by the amount specified in the second operand. These instructions shift the bits left (SHL), right (SHR), or arithmetically right (SAR). The second operand can be the CL register or an immediate 8-bit value (there is also a special version of this instruction when this operand is the immediate value 1).

SHL can be used to multiply a signed or unsigned number by a power of 2. SHR can be used to divide an unsigned number by a power of 2.

shl rax, 5 ; RAX = RAX * (2 ^ 5)

shr rdx, 3 ; RDX = RDX / (2 ^ 3) where RDX is unsigned, use SAR for signed

With the SHL and SHR instructions, the vacated bits on the right and left side are filled with 0, just like the shift operations on unsigned values in C++. The arithmetic right shift (SAR) shifts the bits right, but fills the vacant bits on the left with the sign bit, so it can be used to divide a signed number by a power of 2.

If the second operand is 0 (whether it is immediate or CL), the flags are not affected.

If the shift is not zero, then the flags are affected. The carry flag holds the final bit that was shifted out of the destination.

Flags: Carry, Parity, Zero, Sign, Overflow
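The difference between the logical and arithmetic right shifts is easiest to see with a negative number. The following C++ sketch models the three instructions for 64-bit operands; the names `shl64`, `shr64`, and `sar64` are invented here, and the shift count is assumed to be less than 64 (the model does not reproduce how the CPU treats larger counts).

```cpp
#include <cstdint>

// SHL and SHR: vacated bits are filled with 0. Assumes n < 64.
inline std::uint64_t shl64(std::uint64_t v, unsigned n) { return v << n; }
inline std::uint64_t shr64(std::uint64_t v, unsigned n) { return v >> n; }

// SAR: vacated bits on the left are filled with the sign bit.
// Written with unsigned arithmetic so it does not depend on how the
// compiler implements >> for negative values. Assumes n < 64.
inline std::int64_t sar64(std::int64_t v, unsigned n) {
    std::uint64_t u = static_cast<std::uint64_t>(v) >> n;
    if (v < 0 && n > 0) {
        u |= ~std::uint64_t{0} << (64 - n);  // Copy the sign into the top n bits
    }
    return static_cast<std::int64_t>(u);
}
```

Note that `sar64(-7, 2)` yields -2, whereas the C++ expression `-7 / 4` (like IDIV) yields -1: the shift rounds toward negative infinity rather than toward zero.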

Rotate Left and Right

ROL [reg8/16/32/64/mem8/16/32/64], [CL/imm8]

ROR [reg8/16/32/64/mem8/16/32/64], [CL/imm8]

This rotates the first operand by the number of bits specified in the second operand. The rotate operation is the same as bit shifting, except that as bits are shifted out on the right (ROR) they reenter on the left, and as bits are shifted out on the left (ROL) they reenter on the right.

Note: There are special versions of these rotate and shift instructions. When the count is exactly 1, the overflow flag is defined; for any other count it is undefined. It indicates whether the sign of the first operand has changed: if the overflow flag is 1 after the instruction, then the sign of the destination operand has changed; otherwise it has stayed the same.

Flags: Carry, Overflow
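A rotate can be modeled in C++ as two shifts combined with OR, which makes the "bits reenter on the other side" behavior explicit. This is a sketch for 32-bit operands only; `rol32` and `ror32` are hypothetical names and the flags are not modeled.

```cpp
#include <cstdint>

// Rotate left: bits leaving on the left reenter on the right.
inline std::uint32_t rol32(std::uint32_t v, unsigned n) {
    n &= 31;  // Keep the count within the operand width
    return n == 0 ? v : (v << n) | (v >> (32 - n));
}

// Rotate right: bits leaving on the right reenter on the left.
inline std::uint32_t ror32(std::uint32_t v, unsigned n) {
    n &= 31;
    return n == 0 ? v : (v >> n) | (v << (32 - n));
}
```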

Rotate Left and Right Through the Carry Flag

RCL [reg8/16/32/64/mem8/16/32/64], [CL/imm8]

RCR [reg8/16/32/64/mem8/16/32/64], [CL/imm8]

This rotates the destination left (RCL) or right (RCR) through the carry flag. These instructions are the same as the ROL and ROR rotate instructions, only they also rotate the carry flag from the flags register as if it was part of the destination.

For RCL (rotate left through the carry flag), the operand being rotated can be thought of as having the carry flag as one extra most significant bit (the ninth bit of an 8-bit operand, for example). For RCR (rotate right through the carry flag), the carry flag can be thought of as one extra least significant bit.

Flags: Carry, Overflow
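To illustrate the "extra bit" idea, the following C++ sketch models a one-bit RCL and RCR on an 8-bit operand, carrying the flag alongside the value. `ByteWithCarry`, `rcl8`, and `rcr8` are names made up for this example.

```cpp
#include <cstdint>

// Hypothetical pair: an 8-bit operand plus the carry flag.
struct ByteWithCarry {
    std::uint8_t value;
    bool carry;
};

// One-bit RCL: the old carry enters at bit 0, the old bit 7 becomes the carry.
inline ByteWithCarry rcl8(ByteWithCarry in) {
    bool newCarry = (in.value & 0x80u) != 0;
    std::uint8_t v = static_cast<std::uint8_t>((in.value << 1) | (in.carry ? 1u : 0u));
    return ByteWithCarry{ v, newCarry };
}

// One-bit RCR: the old carry enters at bit 7, the old bit 0 becomes the carry.
inline ByteWithCarry rcr8(ByteWithCarry in) {
    bool newCarry = (in.value & 0x01u) != 0;
    std::uint8_t v = static_cast<std::uint8_t>((in.value >> 1) | (in.carry ? 0x80u : 0u));
    return ByteWithCarry{ v, newCarry };
}
```

Applying `rcr8` to the result of `rcl8` restores both the original value and the original carry, since the two operations rotate the same nine bits in opposite directions.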

Shift Double Left or Right

SHLD [reg/mem16/32/64], [reg16/32/64], [CL/imm8]

SHRD [reg/mem16/32/64], [reg16/32/64], [CL/imm8]

This shifts the first operand left (SHLD) or right (SHRD), and shifts in the bits of the second operand from the left (SHRD) or right (SHLD). The number of bits to shift is specified in the third operand. This instruction lets you shift the contents of a register into another register or memory location. The instruction does not alter the second operand.

Note: There is no version of these instructions that takes 8-bit operands; if an 8-bit SHLD or SHRD is required, you should use one of the 16-bit x86 registers. For example, you can shift AX to move bits between AL and AH.

Flags: Overflow, Sign, Zero, Parity, Carry
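The double shifts can be modeled in C++ as a shift of the first operand combined with the bits borrowed from the second. This sketch covers 32-bit operands only; `shld32` and `shrd32` are hypothetical names, and the second operand is passed by value to reflect that the instruction does not alter it.

```cpp
#include <cstdint>

// SHLD: shift dst left, filling from the right with the top bits of src.
inline std::uint32_t shld32(std::uint32_t dst, std::uint32_t src, unsigned n) {
    n &= 31;
    return n == 0 ? dst : (dst << n) | (src >> (32 - n));
}

// SHRD: shift dst right, filling from the left with the low bits of src.
inline std::uint32_t shrd32(std::uint32_t dst, std::uint32_t src, unsigned n) {
    n &= 31;
    return n == 0 ? dst : (dst >> n) | (src << (32 - n));
}
```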

Bit Test

BT [reg16/32/64/mem16/32/64], [reg16/32/64/imm8]

BTC [reg16/32/64/mem16/32/64], [reg16/32/64/imm8]

BTR [reg16/32/64/mem16/32/64], [reg16/32/64/imm8]

BTS [reg16/32/64/mem16/32/64], [reg16/32/64/imm8]

This copies the bit at the zero-based index specified by the second operand from the first operand into the carry flag.

bt eax, 4 ; Copy bit 4 (zero-based, so the fifth bit) of EAX into the Carry Flag

A special version of these instructions is used when the first operand is memory and the second is a register. In this case, the entirety of RAM becomes a bit array instead of the regular byte array! The parameter passed becomes the base of the bit array (its zero bit, the rightmost, is the start of the bit array whose expanse is the remainder of RAM). All the rules for accessing memory still apply, so an access violation will be generated if the addressed bit lies in memory the program is not allowed to access.

mov eax, 27873 ; We wish to know what bit 27873 is.

bt MyVariable, eax ; Beginning from rightmost bit in MyVariable.

BT tests the bit and copies its value to the carry flag. BTC tests the bit and then complements it in the first operand. BTR tests the bit and then resets it to 0 in the first operand. BTS tests the bit and then sets it to 1 in the first operand.

Flags: Carry (all others except for direction are undefined)

Prefix: LOCK (But not on BT since it cannot write to memory)
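The four bit test instructions with a register first operand can be sketched in C++ as follows. `BitResult` and the function names are invented for this illustration; the model returns the possibly modified operand together with the bit that would land in the carry flag.

```cpp
#include <cstdint>

// Hypothetical pair: the first operand after the instruction, plus the carry flag.
struct BitResult {
    std::uint32_t value;
    bool carry;
};

// Reads the bit at a zero-based index (taken modulo 32 for a 32-bit register).
inline bool bitAt(std::uint32_t v, unsigned index) {
    return ((v >> (index & 31)) & 1u) != 0;
}

inline BitResult bt32 (std::uint32_t v, unsigned i) { return { v, bitAt(v, i) }; }                       // Test only
inline BitResult btc32(std::uint32_t v, unsigned i) { return { v ^  (1u << (i & 31)), bitAt(v, i) }; }   // Complement
inline BitResult btr32(std::uint32_t v, unsigned i) { return { v & ~(1u << (i & 31)), bitAt(v, i) }; }   // Reset to 0
inline BitResult bts32(std::uint32_t v, unsigned i) { return { v |  (1u << (i & 31)), bitAt(v, i) }; }   // Set to 1
```

In every case the carry holds the bit's value from before any modification.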

Bit Scan Forward and Reverse

BSF [reg16/32/64], [reg16/32/64/mem16/32/64]

BSR [reg16/32/64], [reg16/32/64/mem16/32/64]

This searches the second operand right to left (forward, BSF) or left to right (reverse, BSR) for the first bit set to 1. If a bit is found set to 1, the first operand is set to its index and the zero flag is cleared. If there is no bit set to 1 at all in the second operand, the zero flag is set to 1 and the contents of the first operand should not be relied upon.

The bit indices do not change regardless of the scan's direction. If there is only one bit set in the operand, both BSF and BSR will return exactly the same value. If there is more than one bit set, they will return different values.

mov ax, 2

bsf bx, ax ; Places 1 into bx

bsr bx, ax ; Places 1 into bx

Flags: Zero (all the rest except for direction are undefined)
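A simple loop-based model in C++ shows why BSF and BSR agree when only one bit is set and differ otherwise. This is a sketch, not how the hardware works; `bsf32` and `bsr32` are hypothetical names, and the return value stands in for "zero flag is 0" (a set bit was found).

```cpp
#include <cstdint>

// Scan from bit 0 upward. Returns false (zero flag = 1) when src is 0,
// in which case index is left untouched.
inline bool bsf32(std::uint32_t src, unsigned& index) {
    if (src == 0) return false;
    unsigned i = 0;
    while (((src >> i) & 1u) == 0) ++i;
    index = i;
    return true;
}

// Scan from bit 31 downward. The index is still counted from bit 0.
inline bool bsr32(std::uint32_t src, unsigned& index) {
    if (src == 0) return false;
    unsigned i = 31;
    while (((src >> i) & 1u) == 0) --i;
    index = i;
    return true;
}
```

For the value 50h (bits 4 and 6 set), the forward scan reports index 4 and the reverse scan reports index 6.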

Conditional Byte Set

SETO [reg8/mem8]	Overflow OF = 1
SETNO [reg8/mem8]	Overflow OF = 0
SETB, SETC, SETNAE [reg8/mem8]	Below, carry CF = 1
SETNB, SETNC, SETAE [reg8/mem8]	Above or equal, carry CF = 0
SETZ, SETE [reg8/mem8]	Equal, zero ZF = 1
SETNZ, SETNE [reg8/mem8]	Not equal, zero ZF = 0
SETBE, SETNA [reg8/mem8]	Below or equal, CF = 1 or ZF = 1
SETNBE, SETA [reg8/mem8]	Above, CF = 0 and ZF = 0
SETS [reg8/mem8]	Sign SF = 1
SETNS [reg8/mem8]	Sign SF = 0
SETP, SETPE [reg8/mem8]	Parity is even PF = 1
SETNP, SETPO [reg8/mem8]	Parity is odd PF = 0
SETL, SETNGE [reg8/mem8]	Less than SF <> OF
SETNL, SETGE [reg8/mem8]	Not less than SF = OF
SETLE, SETNG [reg8/mem8]	Less or equal ZF = 1 or SF <> OF
SETNLE, SETG [reg8/mem8]	Greater than ZF = 0 and SF = OF

These instructions set the operand to 0 or 1 based on whether the flags meet the specified condition. The destination becomes 1 if the condition is met, otherwise it becomes 0. The conditions all reference the flags so this instruction is usually placed after a CMP or TEST; it is similar to the CMOVcc instructions, only it moves 0 or 1 instead of moving the second operand into the first like the CMOVcc instructions.

Flags: (None)
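To see how the flag conditions in the table correspond to signed and unsigned comparisons, the following C++ sketch computes the flags a 32-bit CMP would produce and then evaluates three of the conditions. `Flags`, `cmp32`, `setb`, `setl`, and `setg` are names invented for this illustration; only the four flags needed here are modeled.

```cpp
#include <cstdint>

// Hypothetical subset of the flags register.
struct Flags {
    bool cf, zf, sf, of;
};

// Flags produced by "cmp a, b" (a subtraction whose result is discarded).
inline Flags cmp32(std::uint32_t a, std::uint32_t b) {
    std::uint32_t r = a - b;
    Flags f;
    f.cf = a < b;                                // Unsigned borrow
    f.zf = r == 0;
    f.sf = (r >> 31) != 0;                       // Sign bit of the result
    f.of = (((a ^ b) & (a ^ r)) >> 31) != 0;     // Signed overflow of a - b
    return f;
}

inline std::uint8_t setb(const Flags& f) { return f.cf ? 1 : 0; }                       // Below (unsigned)
inline std::uint8_t setl(const Flags& f) { return f.sf != f.of ? 1 : 0; }               // Less (signed)
inline std::uint8_t setg(const Flags& f) { return (!f.zf && f.sf == f.of) ? 1 : 0; }    // Greater (signed)
```

Comparing FFFFFFFFh with 1 gives `setl` = 1 but `setb` = 0: the same bits are -1 when read as signed (less than 1) and 4294967295 when read as unsigned (not below 1).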

Set and Clear the Carry or Direction Flags

STC	Set carry flag CF = 1
CLC	Clears the carry flag to 0
STD	Set direction flag DF = 1
CLD	Clears the direction flag

STC sets the carry flag to 1 while CLC clears it to 0. Likewise, STD sets the direction flag to 1 while CLD clears it to 0. Setting or clearing the direction flag is useful for setting the direction the string instructions move their automatic pointers.

Flags: Carry (STC and CLC), Direction (STD and CLD)

Jumps

JMP	Unconditionally jump
JO	Jump on overflow
JNO	Jump on no overflow
JB,JC,JNAE	Jump if below, CF = 1, not above or equal
JNB,JNC,JAE	Jump if not below, CF = 0, above or equal
JZ,JE	ZF = 1, jump if equal
JNZ,JNE	ZF = 0, jump if not equal
JBE,JNA	Jump if below or equal, not above, CF or ZF = 1
JNBE,JA	Jump if not below or equal, above, CF and ZF = 0
JS	Jump on sign, SF = 1
JNS	Jump on no sign, SF = 0
JP,JPE	Jump on parity, parity even, PF = 1
JNP,JPO	Jump on no parity, parity odd, PF = 0
JL,JNGE	Jump if less, not greater or equal, SF != OF
JNL,JGE	Jump if not less, greater or equal, SF = OF
JLE,JNG	Jump if less or equal, not greater than, ZF = 1 or SF != OF
JNLE,JG	Jump if not less or equal, greater than, ZF = 0 and SF = OF
JCXZ	Jump if CX = 0
JECXZ	Jump if ECX = 0
JRCXZ	Jump if RCX = 0

Each of the jump instructions takes a single operand. This operand is usually a label defined somewhere in the code but it is actually fairly flexible. The addressing modes available to the Jxx instructions are as follows:

JMP [reg/mem/imm]

Jcc [imm8/16/32]

JcCX [imm8]

These instructions are sometimes called branches; the RIP register will jump to the operand if the condition is true, otherwise the RIP will fall through to the next line of code.

Usually the operand is a label.

cmp edx, eax

jg SomeLabel ; Jump if greater

; Some code to skip

SomeLabel:

Flags: (None)

Call a Function

CALL [reg16/32/64/mem16/32/64]

CALL [imm16/32]

This calls a procedure. This instruction pushes the address of the next instruction to the stack and jumps the RIP register to the procedure or label given as the first operand. It is essentially the same as a jump instruction, except that the return address is pushed to the stack so the RIP can return and resume execution of the calling function using a RET instruction from within the body of the subprocedure.

Note: There used to be a distinction between near and far calls. Far calls ended up in another code segment. However, since x64 uses a flat memory model, all calls are near calls.

Flags: (None)

Return from Function

RET

This instruction returns from a function called with the CALL instruction. This is achieved by popping the return address from the stack into the RIP.

Flags: (None)

x86 String Instructions

Load String

LODS [mem8/16/32/64]
LODSB	Load byte
LODSW	Load word
LODSD	Load dword
LODSQ	Load qword

These instructions load a byte, word, dword, or qword into the appropriate size of the RAX register, and then they increment (or decrement depending on the direction flag) RSI to point to the next byte, word, dword, or qword. They read whatever RSI (the source index register) is pointing to into RAX and then move RSI to point to the next data of the same size.

The REP prefix can be used, but it is pointless since no operation can be performed on the consecutive values being stored in RAX; the loop will simply run through the string and leave you with only the final value in RAX.

Note: Even the versions with a memory operand read only from whatever RSI is pointing to. The memory operand is almost completely ignored. Its only purpose is to indicate both what size data should be read and into what version of RAX it should be placed.

Note: If the direction flag, DF, is 1 as set by the STD instruction, the string instructions will decrement RDI and RSI instead of incrementing. Otherwise the instruction will increment.

Flags: (None)

Prefix: REP

Store String

STOS [mem8/16/32/64]
STOSB	Store byte
STOSW	Store word
STOSD	Store dword
STOSQ	Store qword

This stores AL, AX, EAX, or RAX to the memory pointed to by RDI and increments (or decrements depending on the direction flag) RDI. This instruction can be used to quickly set a large number of values to the same thing. RDI is incremented or decremented by the size of the data type each repetition.

To set 100 words to 56, make sure RDI is pointing to the start of the 100 words in memory.

lea rdi, someArray ; Point RDI to the start of the array

mov rcx, 100

mov ax, 56

rep stosw

Note: Even the versions with a memory operand only store to RDI. The memory operand is almost completely ignored. The memory operand's only purpose is to indicate which of AL, AX, EAX, or RAX should be stored and how much to increment RDI.

Note: If the direction flag, DF, is 1 as set by the STD instruction, the string instructions will decrement RDI and RSI instead of incrementing. Otherwise the instruction will increment.

Flags: (None)

Prefix: REP
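The behavior of `rep stosw` with the direction flag clear can be modeled in C++ as a counted loop. This is a sketch under that assumption; `repStosw` is a hypothetical name, and the parameters are named after the registers they stand in for.

```cpp
#include <cstddef>
#include <cstdint>

// Models "rep stosw" with DF = 0: store AX at *RDI, advance RDI by one
// word, decrement RCX, and repeat until RCX is 0.
// Returns the final value of RDI (one past the last word written).
inline std::uint16_t* repStosw(std::uint16_t* rdi, std::size_t rcx, std::uint16_t ax) {
    while (rcx != 0) {
        *rdi++ = ax;
        --rcx;
    }
    return rdi;
}
```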

Move String

MOVS [mem8/16/32/64], [mem8/16/32/64]

MOVSB

MOVSW

MOVSD

MOVSQ

This moves the byte, word, dword, or qword pointed to by RSI to that pointed to by RDI and increments both RSI and RDI to point to the next (or decrements depending on the direction flag). Both RSI and RDI are incremented by the size of the data type each repetition. This instruction can be used to quickly move data from one array to another. Set RSI at the start of the source array and RDI to the start of the destination and place the number of elements to copy into RCX.

lea rsi, SomeArray

lea rdi, SomeOtherArray

mov rcx, 1000

rep movsq ; Copy 8000 bytes

Note: Even the versions with memory operands copy data from RSI to RDI; the memory operand's only use is to specify the size of the data to copy.

Note: If the direction flag, DF, is 1, as set by the STD instruction, the string instructions will decrement RDI and RSI instead of incrementing. Otherwise the instruction will increment.

Prefix: REP

Scan String

SCAS [mem8/16/32/64]

SCASB

SCASW

SCASD

SCASQ

This compares the byte, word, dword, or qword pointed to by RDI to the appropriate size of RAX and sets the flags accordingly. It then increments (or decrements depending on the direction flag) RDI to point to the next element of the same size in RAM. This instruction is meant to be used with the REPE, REPZ, REPNE, and REPNZ prefixes, so that it scans a string until the element in RAX is found (or until the first element that differs from it) or until the count in RCX reaches 0.

To scan an array of bytes up to 100 bytes to find the first occurrence of the character “a,” use the following:

lea rdi, arr ; Point RDI to some array

mov rcx, 100 ; Load max into RCX

mov al, 'a' ; Load value to seek into AL

repnz scasb ; Search for AL in *RDI

jnz NotFound ; If the zero flag is not set after the

; scan AL is not in arr

lea rax, [arr+1] ; Otherwise we can find the index of the

; first occurrence of AL

sub rdi, rax ; By subtracting arr+1 from the address where we found AL

Note: Even the versions with a memory operand scan only whatever RDI is pointing to. The memory operand is almost completely ignored. The memory operand’s only purpose is to indicate which of AL, AX, EAX, or RAX should be compared to the data at RDI and how much to increment RDI.

Note: If the direction flag, DF, is 1 as set by the STD instruction, the string instructions will decrement RDI and RSI instead of incrementing. Otherwise the instruction will increment.

Flags: Overflow, Sign, Zero, Parity, Carry

Prefix: REPE, REPZ, REPNE, REPNZ
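The index calculation in the scan example can be confusing, because RDI has already moved one element past the match by the time the scan stops. The following C++ sketch models `repnz scasb` with the direction flag clear; `findByte` is a hypothetical name, and -1 is this sketch's own convention for "not found".

```cpp
#include <cstddef>

// Models "repnz scasb" followed by the index calculation.
// Returns the zero-based index of the first byte equal to al,
// or -1 if it does not occur in the first count bytes.
inline std::ptrdiff_t findByte(const unsigned char* arr, std::size_t count, unsigned char al) {
    const unsigned char* rdi = arr;
    std::size_t rcx = count;
    bool zf = false;
    while (rcx != 0) {
        zf = (*rdi == al);  // Compare AL with the byte at RDI
        ++rdi;              // RDI always advances, even on a match
        --rcx;
        if (zf) break;      // REPNZ stops as soon as the zero flag is set
    }
    if (!zf) return -1;
    return rdi - (arr + 1); // RDI is one past the match, hence arr + 1
}
```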

Compare String

CMPS [mem8/16/32/64], [mem8/16/32/64]

CMPSB

CMPSW

CMPSD

CMPSQ

These instructions compare the data pointed to by RSI to the data pointed to by RDI, and set the flags accordingly. They increment (or decrement depending on the direction flag) RSI and RDI depending on the operand size. They can be used to scan *RSI and *RDI for the first byte, word, dword, or qword that is different or the first that is the same between the two arrays.

Note: Even the versions with a memory operand compare only RSI to RDI. The memory operand is almost completely ignored. The memory operand's only purpose is to indicate how much RDI and RSI should be incremented or decremented each round.

Note: If the direction flag, DF, is 1 as set by the STD instruction, the string instructions will decrement RDI and RSI instead of incrementing. Otherwise the instruction will increment.

Prefix: REPE, REPZ, REPNE, REPNZ
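As a rough model of `repe cmpsb` with the direction flag clear, the following C++ sketch counts how many leading bytes of two arrays are equal before the first difference. `equalPrefix` is a name invented for this illustration.

```cpp
#include <cstddef>

// Models "repe cmpsb": compare *RSI with *RDI, advance both, decrement RCX,
// and keep going only while the bytes are equal and RCX is not 0.
// Returns the number of leading bytes that matched.
inline std::size_t equalPrefix(const unsigned char* rsi, const unsigned char* rdi, std::size_t rcx) {
    std::size_t matched = 0;
    while (rcx != 0) {
        bool zf = (*rsi == *rdi);
        ++rsi;
        ++rdi;
        --rcx;
        if (!zf) break;  // REPE stops when the zero flag is cleared
        ++matched;
    }
    return matched;
}
```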

x86 Miscellaneous Instructions

No Operation

NOP

This instruction does nothing but wait for a clock cycle. However, it is useful for optimizing pipeline usage and patching code.

Flags: (None)

Pause

PAUSE

This instruction is similar to NOP, but it also indicates to the CPU that the thread is in a spin loop so that the CPU can use any power-saving features it has.

Flags: (None)

Read Time Stamp Counter

RDTSC

This instruction loads the time stamp counter into EDX:EAX. The time stamp counter is the number of clock cycles that have elapsed since the CPU was reset. This is useful for getting extremely fine grained timing readings.

The following could be a small function to read the time stamp counter:

ReadTimeStamp proc

rdtsc

shl rdx, 32

or rax, rdx

ret

ReadTimeStamp endp
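The shift and OR in ReadTimeStamp simply join the two 32-bit halves into one 64-bit value. A C++ sketch of the same combination is shown below; `combineTimeStamp` is a hypothetical name and the parameters stand in for the registers RDTSC fills.

```cpp
#include <cstdint>

// Joins the high half (EDX) and low half (EAX) returned by RDTSC
// into a single 64-bit count, as "shl rdx, 32" and "or rax, rdx" do.
inline std::uint64_t combineTimeStamp(std::uint32_t edx, std::uint32_t eax) {
    return (static_cast<std::uint64_t>(edx) << 32) | eax;
}
```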

Getting performance readings at the level of single clock cycles is difficult, since Windows is constantly switching between the running applications and multitasking. The best thing to do is run tests repeatedly. You should test how long the ReadTimeStamp procedure takes, and subtract this from subsequent tests, and then take the average or best clock cycle readings as the benchmark.

Flags: (None)

Loop

LOOP [Label]

LOOPE [Label]

LOOPNE [Label]

LOOPZ [Label]

LOOPNZ [Label]

These will decrement the RCX counter and jump to the specified label if a condition is met. For example, the LOOP instruction decrements RCX and repeats from the label until it is 0. Then it does not branch, but falls through to execute the code following the loop. The LOOP instructions are almost never used, because the manual decrement and jump is faster.

dec rcx

jnz LoopTop

In addition to being faster, the manual two-line version allows the programmer to specify which register is used as the counter where LOOPxx makes use of RCX.

The LOOP instructions are interesting in that they do not alter the flags register at all, whereas the manual DEC and JNZ does. When RCX reaches 0 in the LOOP, the RIP register will fall through, but the zero flag will not be altered from the last setting it had in the body of the loop.

If it is important for a loop’s structural components not to alter the flags register, then using the LOOP instruction in place of the manual two-line loops may be worth investigating. LOOPE and LOOPZ repeat the loop only while the zero flag is 1; if the zero flag is 0, the loop falls through. LOOPNE and LOOPNZ repeat the loop only while the zero flag is 0; if the zero flag is 1, the loop falls through.

The loops can be broken either by RCX becoming 0 or on the condition of the zero flag. This may lead to some confusion. If the zero flag happens to be clear at the end of the first iteration of a long loop, then the LOOPE instruction will still decrement RCX but will not repeat the loop. The loop will break on the condition of the zero flag.

As mentioned previously, the LOOP instructions are often not used. They are slower than the two-line manual loop tail in the last sample because they do more than simply DEC the counter and jump. If the LOOP instructions happen to perform exactly what you need, they may give a good performance increase as opposed to checking the zero flag. However, in the vast majority of cases a simple DEC and JNZ will be faster.
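The branch decision made by each LOOP variant can be summarized in a few lines of C++. This is a sketch of the decision only: `loop`, `loope`, and `loopne` are hypothetical functions that decrement a counter standing in for RCX and return true when the jump back to the label would be taken.

```cpp
#include <cstdint>

// LOOP: decrement RCX, branch while RCX is not 0.
inline bool loop(std::uint64_t& rcx) {
    --rcx;
    return rcx != 0;
}

// LOOPE/LOOPZ: decrement RCX, branch while RCX is not 0 and the zero flag is 1.
inline bool loope(std::uint64_t& rcx, bool zf) {
    --rcx;
    return rcx != 0 && zf;
}

// LOOPNE/LOOPNZ: decrement RCX, branch while RCX is not 0 and the zero flag is 0.
inline bool loopne(std::uint64_t& rcx, bool zf) {
    --rcx;
    return rcx != 0 && !zf;
}
```

In all three cases the counter is decremented whether or not the branch is taken, and the zero flag is only read, never written.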

CPUID

MOV EAX, [function] ; Move the function number into EAX first

CPUID

This instruction returns information on the executing CPU, including data on the CPU vendor, cache size, number of cores, and the available instruction sets.

The CPUID instruction itself may not be available on older CPUs. The recommended method for testing if the CPUID instruction can be executed is to toggle bit 21 (the ID flag) of the flags register. If this bit can be set to 1 by the program, then the CPU understands the CPUID instruction. Otherwise, the CPU does not understand the CPUID instruction.

The following is an example of testing for the availability of the CPUID instruction:

pushfq ; Save the flags register

push 200000h ; Push nothing but bit 21

popfq ; Pop this into the flags

pushfq ; Push the flags again

pop rax ; This time popping it back into RAX

popfq ; Restore the original flag's state

test rax, 200000h ; Check only bit 21; other flag bits may also be set

jz No_CPUID ; If it reverted back to 0, there's no CPUID

Not all CPUs are able to execute all instructions. Modern CPUs are usually capable of executing more instruction sets than older ones. In order to know if the CPU executing your code is aware of any particular instruction set, you can call the special CPUID instruction.

The CPUID instruction takes no operands, but EAX is implied. The value in EAX when the instruction is called is read by the CPU as the function number.

There are a great number of different functions, and each CPU manufacturer is able to define its own. Manufacturer-specific functions usually have the top 16 bits of EAX set to 8000h, for example. The functions for determining many instruction sets are standard across Intel, AMD, and VIA.

To call a particular function, first MOV the function number into EAX and then use the CPUID instruction.

mov eax, 1 ; Function number 1

cpuid ; No formal parameters but EAX is implied!

CPUID function 1 (calling CPUID when EAX is set to 1) lists the feature identifiers; feature identifiers are the instruction sets that the CPU knows. It lists the possible instruction sets by storing a series of bit flags in ECX and EDX. Bits are set to 1 to indicate that the CPU is capable of a particular feature, and 0 to indicate that it is not. In the following table, some of the most useful features have been listed with the register (ECX or EDX) and the bit number to check for the feature. There are many more features, and more are added with each new generation of CPU.

Table 17: Abridged Feature Identifiers

Function Number (EAX)	Register (ECX/EDX)	Bit Index in ECX/EDX	Feature
1	ECX	28	AVX
1	ECX	25	AES
1	ECX	23	Pop Count, POPCNT
1	ECX	20	SSE4.2
1	ECX	19	SSE4.1
1	ECX	9	SSSE3
1	ECX	0	SSE3
1	EDX	26	SSE2
1	EDX	25	SSE
1	EDX	23	MMX
1	EDX	15	Conditional Moves
1	EDX	4	RDTSC
1	EDX	0	x87 FPU

The following example tests for MMX and SSE4.2. In the assembly file, the register (ECX or EDX) and the bit number can be changed to test for any feature. For a full list of what CPUID can do on AMD chips, consult CPUID Specification by AMD. For a full list of what CPUID can do on Intel chips, consult Intel Processor Identification and the CPUID Instruction by Intel. Links to the manuals and other recommended reading can be found at the end of this book.

// This is the C++ file

#include <iostream>

using namespace std;

extern "C" bool MMXCapable();

extern "C" bool SSE42Capable();

int main()

{

if(MMXCapable()) cout<<"This CPU is MMX capable!"<<endl;

else cout<<"This CPU does not have the MMX instruction set :("<<endl;

if(SSE42Capable()) cout<<"This CPU is SSE4.2 capable!"<<endl;

else cout<<"This CPU does not have the SSE4.2 instruction set"<<endl;

cin.get();

return 0;

}

; This is the assembly file

.code

; bool MMXCapable()

; Returns true if the current CPU knows MMX

; else returns false

MMXCapable proc

mov r8, rbx ; CPUID overwrites RBX, which the caller expects to be preserved

mov eax, 1 ; Move function 1 into EAX

cpuid ; Call CPUID

shr edx, 23 ; Shift the MMX bit to position 0

and edx, 1 ; Mask out all but this bit in EDX

mov eax, edx ; Move this answer, 1 or 0, into EAX

mov rbx, r8 ; Restore the caller's RBX

ret ; And return it

MMXCapable endp

; bool SSE42Capable()

; Returns true if the current CPU knows SSE4.2

; else returns false

SSE42Capable proc

mov r8, rbx ; CPUID overwrites RBX, which the caller expects to be preserved

mov eax, 1 ; Move function 1 into EAX

cpuid ; Call CPUID

shr ecx, 20 ; Shift SSE4.2 bit to position 0

and ecx, 1 ; Mask out all but this bit

mov eax, ecx ; Move this bit into EAX

mov rbx, r8 ; Restore the caller's RBX

ret ; And return it

SSE42Capable endp

end
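The shift-and-mask step used in both procedures can be written once in C++ and reused for any row of Table 17. This is only a sketch of the bit extraction: `hasFeature` is a hypothetical helper, and the register value passed to it is assumed to have been obtained from CPUID function 1 by some other means (such as the assembly procedures above).

```cpp
#include <cstdint>

// Returns true if the given bit of a CPUID result register (ECX or EDX) is 1.
// Equivalent to "shr reg, bit" followed by "and reg, 1".
inline bool hasFeature(std::uint32_t reg, unsigned bit) {
    return ((reg >> bit) & 1u) != 0;
}
```

For example, with the EDX value from function 1, `hasFeature(edx, 23)` would report MMX and `hasFeature(edx, 26)` would report SSE2.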

Note: It was common in the past to simply call an instruction and let an exception be thrown by the CPU if the instruction set was not supported. There is a slight possibility that a given machine code will execute on an older CPU without throwing an exception, but it will actually execute some other instruction. For this reason, the CPUID instruction is the recommended method for testing if instruction sets are available.
