Assembly Language Succinctly®
by Christopher Rose

CHAPTER 6

C Calling Convention

A calling convention is a set of steps that must be undertaken by a caller (the code calling the procedure) and a callee (the procedure being called). High-level languages take care of all the calling convention intricacies, and one can simply pass parameters to and from functions without caring about how they are being passed. When programming in assembly, the callee needs to know where or how the caller has passed the function's parameters, and the caller needs to know how the callee will return the answer. At the assembly level, the calling convention is not restricted at all, and programmers are free to define their own. The C++ compilers that ship with Visual Studio use the C calling convention, so it is usually advantageous to adopt this when programming assembly routines, especially if the routines are called from C++ or if they themselves call procedures written in C++.

The Stack

The stack is a portion of memory that is used as a semiautomatic last-in-first-out data structure. It allows function calls to be recursive, handles parameter passing and return addresses, and is used to save registers or other values temporarily. Values are added to the stack using the PUSH and CALL instructions, and they are removed from the stack using the POP and RET instructions in the opposite order to that in which they were pushed. When a procedure is called, the CALL instruction pushes the return address (the address in the .code segment of the instruction following the CALL) onto the stack, so that when the subroutine is finished, the return address can be popped from the stack (using the RET instruction) and control can resume from the caller's position in code.

The stack is pointed to by a special pointer, RSP (the stack pointer). The stack grows toward lower addresses: PUSH decrements RSP and then stores its operand at the address RSP points to, while POP loads the value at the address RSP points to and then increments RSP. In this way, the next value to be pushed is written to the next free address in the stack segment.
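The behavior of PUSH and POP can be modeled in a few lines of C++. This is only a sketch for illustration: the ToyStack type and its members are hypothetical names, each vector element stands in for one 8-byte stack slot, and no bounds checking is performed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a stack that grows toward lower addresses.
// Illustrative only; a real stack is ordinary memory addressed through RSP.
struct ToyStack {
    std::vector<std::uint64_t> memory;  // one element per 8-byte slot
    std::size_t rsp;                    // slot index standing in for RSP

    // Start with RSP just past the highest slot, i.e. an empty stack.
    explicit ToyStack(std::size_t slots) : memory(slots, 0), rsp(slots) {}

    // PUSH: move the stack pointer down one slot, then store the value there.
    void push(std::uint64_t value) {
        --rsp;
        memory[rsp] = value;
    }

    // POP: read the value at the stack pointer, then move the pointer up.
    std::uint64_t pop() {
        std::uint64_t value = memory[rsp];
        ++rsp;
        return value;
    }
};
```

In this model, popping a value does not erase it from memory; only the stack pointer moves, and a later push overwrites the old slot.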

In the past, passing parameters and saving return addresses were exclusively the task of the stack, but in x64 some parameters are passed via registers. It is common to avoid the PUSH and POP instructions in favor of adjusting the stack pointer manually and using MOV instructions. Manually manipulating the stack is common in x64 because PUSH and POP accept only certain operand sizes. It is also often faster to set the position of RSP once using SUB and ADD and then store values with MOV than to execute PUSH repeatedly. The stack is simply another segment in RAM that has been marked as read/write. The only difference between the stack and any other segment in the program is that the stack pointer (RSP) happens to point to it.

Scratch versus Non-Scratch Registers

In the C calling convention used by Visual Studio, some of the registers are expected to maintain the same values across function calls. Functions should not change the value of these registers in their code without restoring the original values prior to returning. These registers are called non-scratch.

Table 7: Register's Scratch/Non-Scratch Status

Register	Scratch/Non-Scratch
RAX	Scratch
RBX	Non-Scratch
RCX	Scratch
RDX	Scratch
RSI	Non-Scratch
RDI	Non-Scratch
RBP	Non-Scratch
RSP	Non-Scratch
R8 to R11	Scratch
R12 to R15	Non-Scratch
XMM0 to XMM5	Scratch
XMM6 to XMM15	Non-Scratch
ST(0) to ST(7)	Scratch
MM0 to MM7	Scratch
YMM0 to YMM5	Scratch
YMM6 to YMM15	Non-Scratch (lower 128 bits only; the upper halves are scratch)

Some of the registers can be modified at will by a subprocedure or function, and the caller does not expect that the subprocedure will maintain any particular values. These registers are called scratch.

There is nothing wrong with using a non-scratch register in your code. The following example uses RBX and RSI to sum the values from 100 to 1 together (both RBX and RSI are non-scratch). The important thing to note is that the non-scratch registers are pushed to the stack at the start of the procedure and popped just prior to returning.

Sum100 proc

push rbx ; Save RBX

push rsi ; Save RSI

xor rsi, rsi

mov rbx, 100

MyLoop:

add rsi, rbx

dec rbx

jnz MyLoop

mov rax, rsi

pop rsi ; Restore RSI

pop rbx ; Restore RBX

ret

Sum100 endp

The push instruction saves the value of the register to the stack, and the pop instruction pops it back into the register again. By the time the subprocedure returns, all of the non-scratch registers will have exactly the same values they had when the subprocedure was called.
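For readers more comfortable with C++, the arithmetic performed by Sum100 can be written roughly as follows. This is a sketch only: the function name Sum100Model is hypothetical, and the local variables are named after the registers they stand in for. Saving and restoring RBX and RSI has no C++ counterpart here, because the compiler handles that automatically.

```cpp
#include <cstdint>

// C++ rendering of the Sum100 loop; each line notes the matching instruction.
std::uint64_t Sum100Model() {
    std::uint64_t rsi = 0;    // xor rsi, rsi
    std::uint64_t rbx = 100;  // mov rbx, 100
    do {
        rsi += rbx;           // add rsi, rbx
        --rbx;                // dec rbx
    } while (rbx != 0);       // jnz MyLoop
    return rsi;               // mov rax, rsi
}
```

The loop adds 100, 99, and so on down to 1, so the function returns 5050.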

It is often better to use scratch registers instead of pushing and popping non-scratch registers. Pushing and popping require reading from and writing to memory, which is slower than working in registers alone.

Passing Parameters

When we specify a procedure as using the C calling convention in x64 applications, Microsoft's C++ compiler uses fastcall, which means that some parameters are passed via the registers instead of using the stack. Only the first four parameters are passed via registers. Any additional parameters are passed via the stack.

Table 8: Integer and Float Parameters

Parameter Number	If integer	If float
1	RCX	XMM0
2	RDX	XMM1
3	R8	XMM2
4	R9	XMM3

Integer parameters are passed in RCX, RDX, R8, and R9 while floating point parameters use the first four SSE registers (XMM0 to XMM3). The appropriate size of the register is used such that if you are passing integers (32-bit values), then ECX, EDX, R8D, and R9D will be used. If you are passing bytes, then CL, DL, R8B, and R9B will be used. Likewise, if a floating point parameter is 32 bits (float in C++), it will occupy the lowest 32 bits of the appropriate SSE register, and if it is 64 bits (a C++ double), then it will occupy the lowest 64 bits of the SSE register.

Note: The first parameter is always passed in RCX or XMM0; the second is always passed in RDX or XMM1. If the first parameter is an integer and the second is a float, then the second will be passed in XMM1 and XMM0 will go unused. If the first parameter is a floating point value and the second is an integer, then the second will be passed in RDX and RCX will go unused.

As an example, consider the following C++ function prototype:

int SomeProc(int a, int b, float c, int d);

This procedure takes four parameters, each of which is a floating point or integer value, so all of them are going to be passed via the registers (only the fifth and subsequent parameters require the stack).

The following is how the C++ compiler will pass the parameters, or how it will expect you to pass them if you are calling a C++ procedure from assembly:

a will be passed in ECX
b will be passed in EDX
c will be passed in the lowest dword of XMM2
d will be passed in R9D
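The rule that a parameter's position, not its type alone, selects the register can be sketched in C++. The names ParamKind and AssignParameterLocations are hypothetical, and the sketch assumes only plain integer and floating point parameters. It reports full 64-bit or SSE register names and ignores operand size.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class ParamKind { Integer, Float };

// Returns where each parameter is passed: a register name for the first four
// positions, or "stack" for the fifth and later parameters.
std::vector<std::string> AssignParameterLocations(
        const std::vector<ParamKind>& params) {
    static const char* const kIntegerRegs[4] = {"RCX", "RDX", "R8", "R9"};
    static const char* const kFloatRegs[4] = {"XMM0", "XMM1", "XMM2", "XMM3"};

    std::vector<std::string> locations;
    for (std::size_t i = 0; i < params.size(); ++i) {
        if (i >= 4) {
            locations.push_back("stack");          // fifth parameter onward
        } else if (params[i] == ParamKind::Float) {
            locations.push_back(kFloatRegs[i]);    // slot i of the SSE set
        } else {
            locations.push_back(kIntegerRegs[i]);  // slot i of the integer set
        }
    }
    return locations;
}
```

For the SomeProc prototype (int, int, float, int), this yields RCX, RDX, XMM2, and R9, matching the list above once the 32-bit parts of those registers are taken.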

Integer values are always returned in RAX and floating point values are returned in XMM0. Pointers or references are also always returned in RAX.

The following example takes two integer parameters from a caller and adds them together, returning the result in EAX (the low 32 bits of RAX):

; First parameter is passed in ECX, second is passed in EDX

; The prototype would be something like: int AddInts(int a, int b);

AddInts proc

add ecx, edx ; Add the second parameter's value to the first

mov eax, ecx ; Place this result into EAX for return

ret ; Caller will read EAX for the return value

AddInts endp

Shadow Space

In the past, all parameters were passed to procedures via the stack. In the C calling convention, the caller still has to allocate blank stack space as if parameters were being passed on the stack, even though the values are being passed in the registers. The space you create on the stack in place of passing parameters when calling a function or subprocedure is called shadow space. It is the space where the parameters would have been passed had they not been placed into registers instead.

The amount of shadow space is supposed to be no less than 32 bytes, regardless of the number of parameters being passed. Even if you are passing a single byte, you reserve 32 bytes on the stack.
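As a rough worked formula, the stack space a caller sets aside for one call can be computed as follows. The function name StackBytesForCall is hypothetical. The sketch assumes that every parameter beyond the fourth occupies one further 8-byte slot, and it does not model any extra padding a compiler might add.

```cpp
#include <cstddef>

// Bytes the caller reserves on the stack for a call: one 8-byte slot per
// parameter, with a minimum of four slots (the 32 bytes of shadow space).
constexpr std::size_t StackBytesForCall(std::size_t parameterCount) {
    return 8 * (parameterCount < 4 ? 4 : parameterCount);
}
```

Under these assumptions, a call with one parameter reserves 32 bytes (20h), and a call with six parameters reserves 48 bytes (30h).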

Note: This wasteful use of the stack is possibly due to it being easier to program the C++ compiler. Many things on this level of programming have little to no clear documentation or explanation available. The exact reasons for the Microsoft C calling convention using shadow space the way it does are not clear.

Consider calling a function with the following prototype:

void Uppercase(char a);

The C++ compiler would use something like the following:

sub rsp, 20h ; Make 32 bytes of shadow space

mov cl, 'a' ; Move the parameter into CL

call Uppercase ; Call the function

add rsp, 20h ; Deallocate the shadow space from the stack

Now consider calling a function with six parameters:

void Sum(int a, int b, int c, int d, int e, int f);

Some parameters must be passed on the stack; only the first four will be passed using the registers.

push f ; Push the parameters that do not fit in registers, last one first

push e

sub rsp, 20h ; Allocate 32 bytes of shadow space below the stack parameters

mov ecx, a ; Move the four register parameters into their registers

mov edx, b

mov r8d, c

mov r9d, d

call Sum ; Call the function

add rsp, 30h ; Delete shadow space (20h) and the two stack parameters (2 x 8 bytes)

Note: Parameters passed via the stack are not actually removed from memory when the subroutine returns. The stack pointer is simply incremented such that newly pushed parameters will overwrite the old values.

To call a function written in C++ from an external assembly file, both C++ and assembly must have an extern keyword to say the function is available externally.

// C++ File:

#include <iostream>

using namespace std;

extern "C" void SubProc();

extern "C" int SumIntegers(int a, int b, int c, int d, int e, int f)

{

return a + b + c + d + e + f;

}

int main()

{

SubProc();

return 0;

}

; Assembly file in the same project

extern SumIntegers: proc

.code

SubProc proc

push 60 ; Push the two params that don't

push 50 ; fit in regs. Opposite order!

sub rsp, 20h ; Allocate shadow space

mov ecx, 10 ; Move the first four params

mov edx, 20 ; into their regs in any order

mov r8d, 30

mov r9d, 40

call SumIntegers

add rsp, 30h ; Deallocate shadow space

; and space from params

; this is 6x8=48 bytes.

ret

SubProc endp

End

The stack pointer is decreased as parameters are pushed onto the stack. Parameters are pushed from right to left (the reverse of their order in the function's C++ prototype).

Bytes and dwords cannot be pushed onto the stack, as the PUSH instruction only takes a word or qword for its operand. For this reason, the stack pointer can instead be decremented to its final position in the same instruction that allocates the shadow space. The amount to subtract is the number of parameters multiplied by 8, and never less than 32 bytes. Parameters can then be moved into their appropriate positions in the stack segment with MOV instructions in place of the pushes.

The first parameter is moved into RCX, then the second into RDX, the third into R8, and the fourth into R9. The subsequent parameters are moved into RAM starting at RSP+20h, then RSP+28h, RSP+30h, and so on, leaving 8 bytes of space for each parameter on the stack whether they are qwords or bytes. Each additional parameter is at RSP+xxh, where xx is 8 multiplied by the zero-based parameter index (4 for the fifth parameter, 5 for the sixth, and so on).

Note: As an alternative to hexadecimal, it may be more natural to use octal. In octal, the fifth parameter is passed at RSP+40o, the sixth at RSP+50o, and the seventh at RSP+60o. This pattern continues through RSP+70o for the eighth parameter and RSP+100o for the ninth.
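The offset rule can be checked with a small C++ sketch. The function name StackOffsetForParameter is hypothetical, and the index is zero-based, so the fifth parameter has index 4. The offsets are relative to RSP at the point of the CALL instruction. The static_assert lines use C++ hexadecimal (0x) and octal (leading 0) literals to show the same offsets in both bases.

```cpp
#include <cstddef>

// Offset from RSP, at the point of the CALL, of a stack-passed parameter.
// parameterIndex is zero-based, so the fifth parameter has index 4.
constexpr std::size_t StackOffsetForParameter(std::size_t parameterIndex) {
    return 8 * parameterIndex;
}

// Hexadecimal view: the fifth and sixth parameters.
static_assert(StackOffsetForParameter(4) == 0x20, "fifth parameter at RSP+20h");
static_assert(StackOffsetForParameter(5) == 0x28, "sixth parameter at RSP+28h");

// Octal view: the index appears directly as the leading digits.
static_assert(StackOffsetForParameter(4) == 040, "fifth parameter at RSP+40o");
static_assert(StackOffsetForParameter(8) == 0100, "ninth parameter at RSP+100o");
```

Because each slot is 8 bytes, multiplying the index by 8 is the same as appending a zero digit in octal, which is why the octal offsets are so easy to read.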

; Assembly file alternate version without PUSH

extern SumIntegers: proc

.code

SubProc proc

sub rsp, 30h ; Sub enough for 6 parameters from RSP

mov ecx, 10 ; Move the first four params

mov edx, 20 ; into their regs in any order

mov r8d, 30

mov r9d, 40

; And we can use MOV to move dwords

; bytes or whatever we need to the stack

; as if we'd pushed them!

mov dword ptr [rsp+20h], 50

mov dword ptr [rsp+28h], 60