CHAPTER 5
The data segment is the place in RAM where a program stores its global and static data. This data is defined at compile time. The data segment does not hold variables that are allocated at run time (the heap is used for this purpose) or variables that are local to subprocedures (the stack is used to hold these). Most of the information presented here applies to any segment. For instance, variables can also be declared in the uninitialized data segment (.data?) or the constant data segment (.const).
Note: All variables declared in your data segment will become bytes in your actual .exe file. They are not generated when the program is run; they are read from the .exe file. Creating a data segment with 150-MB of variables will generate a 150-MB .exe file and will take a very long time to compile.
Scalar data defined in the data segment is given a name, size, and optional initial value. To declare a variable in the data segment is to name an offset from the start of the data segment. All the variable names are actually pointers; they are a number referring to an offset into the data segment in RAM, so programmers do not have to remember the offsets as numbers.
Note: Variables in assembly are sometimes referred to as labels, but to avoid confusion with labels in the code segment, I will refer to data segment labels as variables.
To define a variable in the data segment, the general layout is as follows:
[VarName] [Type] [Initial Value]
Where [VarName] is any legal variable name and will be the name of the point in data that you wish to use as a reference.
Note: The rules for variable names are the same as those for C++. They cannot begin with a digit, and they can contain letters, digits, and underscores. You can also use some additional symbols that are illegal in C++, such as @ and ?.
[Type] is the data type and can be any one of the data types or short versions in the ASM column of the Fundamental Data Types table.
The initial value can be either a literal value or it can be “?”. The question mark means the data is not given an initial value. In actuality, data will be given a value even if it is uninitialized. The point of the “?” is to declare that the programmer does not care what value the data is set to initially and presumably the program will set some other initial value prior to using the data.
Here are some examples of defining simple scalar data types in a data segment.
.data
myByte       db 0         ; Defines a byte set to 0 called myByte
patientID    dw ?         ; Defines a word, uninitialized, called patientID
averageSpeed dt 0.0       ; Defines a 10-byte real; reals must have a decimal
                          ; point if initialized
totalCost    sdword 5000  ; Defines a signed dword set to 5000, called totalCost
Note: The first variable is placed at DS:0 (the first byte of the data segment) and the second immediately after it (no padding is placed between variables). If the first variable is 1 byte, the second is at DS:1. If the first variable is a word, the second is at DS:2. The way consecutive variables are positioned in RAM is called alignment, and it is important for performance because some of the fastest data processing instructions require data to be aligned to 16 bytes.
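The offsets described in the note can be checked from C++. The following is only a sketch: the struct name DataSegment and the two helper functions are hypothetical, and the packing pragma is used to imitate the "no padding" behavior the note describes, since a C++ compiler would otherwise insert padding.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical C++ mirror of a data segment that starts with a byte,
// followed by a word and a signed dword. Packing to 1 removes the padding
// a C++ compiler would normally add, imitating the layout in the note.
#pragma pack(push, 1)
struct DataSegment {
    std::uint8_t  myByte;     // offset 0, like DS:0
    std::uint16_t patientID;  // offset 1, directly after the byte
    std::int32_t  totalCost;  // offset 3, directly after the word
};
#pragma pack(pop)

// Offsets are compile-time constants, much like variable names in assembly.
constexpr std::size_t offsetOfPatientID() { return offsetof(DataSegment, patientID); }
constexpr std::size_t offsetOfTotalCost() { return offsetof(DataSegment, totalCost); }
```

With the packing removed, a typical compiler would place patientID at offset 2 and totalCost at offset 4, which is the aligned layout.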
After scalar data types, the next most fundamental data type is probably the array. An array is a list of elements of the same data type in contiguous memory. In assembly, an array is just a block of memory and the first element of the array is given a name.
You can declare the elements of an array separated by commas.
MyWord dw 1, 2, 3, 4 ; Makes a 4 word array with 1, 2, 3, and 4 as elements
If you need to use more than one line, finish the line with a comma and continue on the next.
MyWord dw 1, 2, 3, 4, ; Four words in an array
       5, 6, 7, 8     ; Another four words in the same array!
This is legal because you actually do not need the label at all. The MyWord name of the variable is completely optional.
You can create larger arrays in your data segment using the duplicate syntax, but remember that every byte in your data segment is a byte in your final file.
To create larger arrays you can declare an array of values with the following pattern (the duplicate syntax):
[Name] [type] [n] [dup (?)]
Where [Name] is the array name, any legal variable name. [Type] is one of the data types from the Fundamental Data Types table, and [n] is the number of items in the array. DUP is short for duplicate, and the data that it duplicates is in the parentheses. To make 50 words all set to 25 in an array called MyArray, the array declaration using the duplicate syntax would be the following:
MyArray word 50 dup (25)
You can combine the simple comma separated array definition syntax with the duplicate syntax and produce arrays of repeating patterns.
MyArray byte 50 dup (1, 6, 8)
This will define an array 150 bytes long (50×3) with the repeating pattern 1, 6, 8, 1, 6, 8, 1, 6, 8....
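To make the expansion concrete, here is a small C++ sketch that builds the same byte sequence at run time. The function name duplicatePattern is hypothetical; MASM performs this expansion while assembling, so this only models the resulting bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: repeat `pattern` `count` times,
// modeling the bytes produced by "count dup (pattern)".
std::vector<std::uint8_t> duplicatePattern(std::size_t count,
                                           const std::vector<std::uint8_t>& pattern) {
    std::vector<std::uint8_t> out;
    out.reserve(count * pattern.size());
    for (std::size_t i = 0; i < count; ++i) {
        out.insert(out.end(), pattern.begin(), pattern.end());
    }
    return out;
}
```

Calling duplicatePattern(50, {1, 6, 8}) yields 150 bytes, matching the 50×3 size given above.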
You can nest the duplicate directive to create multidimensional arrays. For example, to create a 10×25 dimensional byte array and to set all elements to A, you could use the following:
MyMArray byte 10 dup (25 dup ('A'))
Note: RAM is linear. Whether the sample code actually defines a 10×25 or a 25×10 array must be decided by the programmer. To the CPU, it is just a block of linear RAM and there is no such thing as a multidimensional array.
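One common way for the programmer to impose two dimensions on linear memory is row-major indexing. The sketch below assumes that convention (the text does not prescribe one), and the function name linearIndex is hypothetical.

```cpp
#include <cstddef>

// Row-major convention (an assumption): element (row, col) of an array
// with `cols` columns lives at this offset, in elements, from the start.
constexpr std::size_t linearIndex(std::size_t row, std::size_t col, std::size_t cols) {
    return row * cols + col;
}
```

The same 250-byte block can be read as 10 rows of 25 with linearIndex(r, c, 25), or as 25 rows of 10 with linearIndex(r, c, 10). Only the arithmetic changes, not the memory.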
For a three-dimensional array, you could use something like this:
My3dArray byte 10 dup (25 dup (100 dup (0)))
This will create a 10×25×100 3-D array of bytes all set to 0. From the CPU's point of view, this 3-D array is exactly the same as the following:
My3dArray byte 25000 dup (0)
Once defined, MASM has some directives to retrieve information about the array:
lengthof: Returns the length of the array in elements.
sizeof: Returns the length of the array in bytes.
type: Returns the element size of the array in bytes.
For example, if you have an array called myArray and you want to move information about it into AX, you would do the following:
mov ax, lengthof myArray ; Move the length of the array in elements into AX
mov ax, sizeof myArray   ; Move the size of the array in bytes into AX
mov ax, type myArray     ; Move the element size in bytes into AX
Note: These directives are translated to immediate values prior to assembling the file; therefore, "mov lengthof myArray, 200" is actually translated to "mov 16, 200". Moving a value into a literal constant means nothing (we cannot change the meaning of 16, even in assembly), so the line is illegal.
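For readers coming from C++, the three directives have rough compile-time counterparts. This is only an analogy: the helper names lengthOf, sizeOf, and typeOf are hypothetical, and myArray is assumed here to be 16 words so that its length matches the 16 used in the note.

```cpp
#include <cstddef>
#include <cstdint>

// Rough C++ counterparts of the MASM operators (names are hypothetical).
template <typename T, std::size_t N>
constexpr std::size_t lengthOf(const T (&)[N]) { return N; }              // like lengthof

template <typename T, std::size_t N>
constexpr std::size_t sizeOf(const T (&)[N]) { return N * sizeof(T); }    // like sizeof

template <typename T, std::size_t N>
constexpr std::size_t typeOf(const T (&)[N]) { return sizeof(T); }        // like type

// Assumed array, mirroring "myArray word 16 dup (0)".
std::uint16_t myArray[16] = {};
```

As in MASM, all three results are constants known before the program runs, which is why none of them can be the destination of an assignment.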
In MASM, single and double quotes are exactly the same. They are used for both strings and single characters.
Note: A string in C and C++ is a byte array often with a null, 0, at the end. These types of strings are called zero delimited strings. Many C++ functions are designed to work with zero delimited strings.
To define a string of text characters, you can use this string syntax:
errMess db 'You do not have permission to do this thing, lol', 0
This is equivalent to defining a byte array with the values set to the ASCII numbers of the characters in the string. The Y at the start will be the first byte of the array (the byte at the lowest address).
Note: The comma and zero at the end are the final null. This makes the string a null-terminated string as understood by cout and other C++ functions. Cout stops writing a string when it reaches the 0 character. In C++, the 0 is added automatically when we use the double quotes; in assembly, it must be explicit.
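The role of the explicit 0 can be seen in a short C++ sketch. The array contents and the function name zeroTerminatedLength are made up for illustration; the loop is a simplified model of how such functions find the end of a string, not the implementation of any particular library routine.

```cpp
#include <cstddef>

// A byte array written out with an explicit trailing 0,
// the way the assembly version has to spell it.
const char errMess[] = { 'N', 'o', ' ', 'a', 'c', 'c', 'e', 's', 's', 0 };

// Count characters up to, but not including, the terminating 0.
std::size_t zeroTerminatedLength(const char* s) {
    std::size_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}
```

Without the final 0, the loop would keep reading whatever bytes follow the string in memory.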
To define a string that does not fit on a single line, you can use the following:
myLongString db "This is a ",
             "string that does not ",
             "fit on a single line!", 0
In each of the previous examples, the single quote could also have been used to the same effect:
myLongString db 'This is a ',
             'string that does not ',
             'fit on a single line!', 0
If you need to use either a single quote in a single quote array or a double quote in a double quote array, place two of them together:
myArr1 db "This ""IS"" Tom's array!", 0   ; This "IS" Tom's array!
myArr2 db 'That''s good, who''s Tom?', 0  ; That's good, who's Tom?
You can declare your own names for data types with the type definition (typedef) directive.
integer typedef sword ; Defines “integer” to mean sword
MyInteger integer ? ; Defines a new sword called MyInteger
You cannot use reserved words for your typedefs, so trying to make a signed dword type called “int” will not work, since “int” is the x86 instruction to call an interrupt.
Note: You can use typedef to define new names for user-defined types, fundamental types, structures, unions, and records.
To define a structure (analogous to a C++ struct), you can use the struc (or struct) directive.
ExampleStructure struct ; Structure name followed by "struct" or "struc"
    X  word 0
    Y  word 0
    Z  word 0
    ID byte 0
ExampleStructure ends   ; The name followed by “ends” closes the definition
This would create a structure with four variables; three are words set to 0 and called X, Y, and Z, and the final variable is a byte called ID, also set to 0 by default.
The previous example was the prototype. To create an instance of the previous structure prototype, you can use the following:
person1 ExampleStructure { }              ; Declares person1 with default values
person2 ExampleStructure { 10, 25, 8, ? } ; Declares person2 with specific values
                                          ; and an ID of ?, which will probably be 0
Each field of the instance of the structure can be initialized with the value supplied in the respective position in the curly brackets. Use “?” to initialize to MASM's default value (0). You can supply fewer initializers than the structure has fields; the remaining fields are automatically given their default values as per the structure's prototype.
person2 ExampleStructure { 10 } ; Declares person2 with 10 for X
                                ; but the rest are as per the
                                ; structure's prototype
With a prototype declaration, you can create an instance of this structure with some of the values initialized, and others with their defaults, by not including any value. Just place whitespace with a comma to indicate where the value would have been.
MyStructure struct
    x word 5
    y word 7
MyStructure ends

InstanceOfStruct MyStructure { 9, } ; Change x to 9 but keep y
                                    ; as 7 as per prototype
To change the values of a previously instantiated structure from code, you can use a period in a similar manner to accessing structure elements in C++.
mov person1.X, 25  ; Moves 25 into person1's X
mov person2.ID, 90 ; Moves 90 into person2's ID
Note: When structures are passed to functions from C++, they are not passed by reference. They are copied to the registers and stack depending on the size of the structure. If a structure has two integers, then the whole instance of the structure will be copied to RCX (since two 32-bit dwords fit into the 64-bit RCX). This is awkward because you cannot reference the separate elements of the structure when they are in a register. For instance, there is no way to reference the top dword of RCX. For this reason, it may be easier to pass structures from C++ as pointers.
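Although there is no register name for the top dword of RCX, the two halves can still be separated with a shift. The sketch below models that in C++; the function names are hypothetical, and it assumes the first structure member occupies the low dword, as it does for a little-endian 8-byte structure held in a 64-bit register.

```cpp
#include <cstdint>

// Model of two 32-bit members held in one 64-bit register.
// Assumption: the first member sits in the low dword.
constexpr std::uint64_t packPair(std::uint32_t first, std::uint32_t second) {
    return (static_cast<std::uint64_t>(second) << 32) | first;
}

// The low dword is what ECX would hold.
constexpr std::uint32_t lowDword(std::uint64_t reg) {
    return static_cast<std::uint32_t>(reg);
}

// The high dword has no register name; shift it down first.
constexpr std::uint32_t highDword(std::uint64_t reg) {
    return static_cast<std::uint32_t>(reg >> 32);
}
```

The extra shift for every access to the second member is part of what makes the pointer approach more convenient.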
You can load the effective address of a previously instantiated structure with the LEA instruction (load effective address). To use a register (RCX in this example) as a pointer to an instance of a structure, you must tell MASM the address, type of structure being pointed to, and the field.
lea rcx, person1                  ; Loads the address of person1 into RCX
mov [rcx].ExampleStructure.X, 200 ; Moves 200 into person1.X using
                                  ; RCX as a pointer
The CPU does not check to make sure RCX is actually pointing to an ExampleStructure instance. RCX could be pointing to anything. [RCX].ExampleStructure.X simply means find what RCX is pointing to and add the amount that X was offset in the ExampleStructure prototype to this address. In other words, [RCX].ExampleStructure.X translates to RCX+0, since X was at byte number 0 in the prototype of ExampleStructure. [RCX].ExampleStructure.Y translates to RCX+2, since Y was the second element after the two byte word X.
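The "base address plus field offset" arithmetic can be modeled in C++ with offsetof. This is a sketch only: storeWord is a hypothetical helper, and the structure is packed to match the unpadded layout assumed in the text.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)
struct ExampleStructure {
    std::uint16_t X;   // offset 0
    std::uint16_t Y;   // offset 2
    std::uint16_t Z;   // offset 4
    std::uint8_t  ID;  // offset 6
};
#pragma pack(pop)

// Store a word at base + offset. Like the CPU, this does not check
// what `base` really points to; the caller is trusted.
void storeWord(void* base, std::size_t offset, std::uint16_t value) {
    std::memcpy(static_cast<unsigned char*>(base) + offset, &value, sizeof value);
}
```

Calling storeWord(&person, offsetof(ExampleStructure, Y), 89) plays the role of mov [rcx].ExampleStructure.Y, 89: the structure name contributes nothing but the constant 2.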
To pass an instance of a structure as a parameter to a function, it is usual to pass its address and manipulate it as per the previous point. This is passing by reference, and the initial object will be changed, but it is much faster than copying the data of the structure to the registers and stack in the manner of C++.
; This is the function that is initially called
Function1 proc
    lea rcx, person2 ; Load the address of person2 into RCX to be passed to Fiddle
    call Fiddle      ; Call Fiddle with RCX as parameter 1
    ret
Function1 endp

; Fiddle, RCX = pointer to an ExampleStructure
Fiddle proc
    mov [rcx].ExampleStructure.Y, 89 ; Change something
    ret
Fiddle endp
To define a structure that has a smaller substructure as its member variables, declare the smaller one first. Then place the instances of the substructure inside the declaration of the larger structure.
; This is the smaller substructure
Point struct
    X word 0
    Y word 0
Point ends

; This is the larger structure that owns Points as its members:
Square struct
    cnr1 Point { 7, 4 } ; This one uses 7 and 4
    cnr2 Point { }      ; Use default parameters!
Square ends
To declare an instance of a structure that contains substructures in the data segment, you can use nested curly braces.
MySquare Square { { 9, 8 }, { ?, 7 } }
Note: If you do not want to set any of the values of a struct, you can use {} to mean defaults for all values, even if the structure has substructures within it.
To set the value of a structure's substructure, append a period to specify which variable you wish to change.
mov MySquare.cnr1.Y, 5
You can use a register as a pointer and reference the nested structure’s elements as follows:
mov word ptr [rcx].Square.cnr1.X, 10
A union is similar to a structure, except the actual memory used for each of the elements in the union is physically at the same place in RAM. Unions are a way to reference the same address in RAM as more than one data type.
MyUnion union
    w1 word 0
    d1 dword 0
MyUnion ends ; Note that it is ends, not endu!
Here, MyUnion.w1 has exactly the same address as MyUnion.d1. The dword version is 4 bytes long and the word is only 2 bytes, but the least significant byte of both has the same address.
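C++ unions overlay their members in the same way, which gives a quick way to check the claim. The helper names below are hypothetical. The word is read by copying bytes rather than through the other member, because reading an inactive union member is not well defined in standard C++, and the result noted afterwards assumes a little-endian machine such as x86.

```cpp
#include <cstdint>
#include <cstring>

union MyUnion {
    std::uint16_t w1;
    std::uint32_t d1;
};

// True when both members start at the same address.
bool membersShareAddress(const MyUnion& u) {
    return static_cast<const void*>(&u.w1) == static_cast<const void*>(&u.d1);
}

// Read the two lowest-addressed bytes of d1 as a word.
std::uint16_t lowWordOf(const MyUnion& u) {
    std::uint16_t w;
    std::memcpy(&w, &u.d1, sizeof w);
    return w;
}
```

On a little-endian machine, setting d1 to 12345678h and calling lowWordOf gives 5678h, the least significant word.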
Records are another complex data type of MASM. They are like structures, but they work on and are defined at the bit level. The syntax for definition is as follows:
[name] RECORD [fldName:sz], [fldName:sz]...
Where [name] is the name of the record, [fldName] is the name of a field, and [sz] is the size of the field in bits.
color RECORD blBit:1, hueNib:4
The sample code in the data segment is the prototype to a record called color. The record can then be accessed by the following:
mov cl, blBit
This would move 4 into CL, since blBit was defined as bit number 4 in the record. hueNib takes bits 0, 1, 2, and 3, and blBit comes after this.
You cannot use a record to access bits directly.
mov [rax].color.blBit, 1 ; Won't change the 4th bit from RAX to 1
A record is just a form of directive; it defines a set of constants to be used with bitwise operations. The constants are bit indices. You can use a record for rotating.
mov cl, blBit ; Move the index of the record's blBit into cl
rol rax, cl   ; Use this to rotate the bits in RAX
You can define records in your data segment and initialize the values of the bit fields just as you can with a structure. This is the only time you can set each element of a record without using bitwise operations.
.data
color RECORD qlBit:3, blBit:1, hueNib:4 ; Defines a record
; Following defines a new byte with the bits set as specified
; by the record declaration:
; qlBit gets 0, blBit gets 1 and the hueNib gets 2
; So MyColor will actually be a byte set to 00010010b
MyColor color { 0, 1, 2 } ; Declare a color record with initializers

.code _text
Function1 proc
    mov cl, MyColor ; Moves 000:1:0010b, or 18 in decimal
    ret
Function1 endp
Note: The qlBit, blBit, and hueNib from the previous record become constants of their bit indices: hueNib = 0, blBit = 4, qlBit = 5.
You can get the width in bits of a field in a record by using MASM's WIDTH directive.
mov ax, WIDTH color.hueNib
You can get a bit mask of the record's field by using MASM's MASK directive.
and al, mask myCol.blBit ; AND contents of AL with bit mask of defined color
                         ; record
You can specify NOT prior to the MASK directive to flip the bit mask.
and al, NOT MASK myCol.blBit
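The constants that a record produces are ordinary shift counts, widths, and masks, so they can be reproduced by hand. The C++ sketch below does this for the three-field color record defined earlier; the constant and function names are hypothetical, and the values follow the bit indices given in the note above (hueNib = 0, blBit = 4, qlBit = 5).

```cpp
#include <cstdint>

// Shift counts and widths for: color RECORD qlBit:3, blBit:1, hueNib:4
constexpr unsigned hueNibShift = 0, hueNibWidth = 4;
constexpr unsigned blBitShift  = 4, blBitWidth  = 1;
constexpr unsigned qlBitShift  = 5, qlBitWidth  = 3;

// Bit mask for a field: `width` one-bits moved up to the field's position.
constexpr std::uint8_t fieldMask(unsigned shift, unsigned width) {
    return static_cast<std::uint8_t>(((1u << width) - 1u) << shift);
}

// Build a color byte from its three fields, as the { ql, bl, hue }
// initializer does in the data segment.
constexpr std::uint8_t makeColor(unsigned ql, unsigned bl, unsigned hue) {
    return static_cast<std::uint8_t>((ql << qlBitShift) |
                                     (bl << blBitShift) |
                                     (hue << hueNibShift));
}
```

Here makeColor(0, 1, 2) gives 18, the same byte as MyColor, and ANDing a byte with the inverse of fieldMask(blBitShift, blBitWidth) clears blBit while leaving the other fields alone.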
You can define a numerical constant using the = symbol, and you can define numerical and text constants using the equ directive. This is short for “equates to.”
SomeVar = 25     ; SomeVar becomes a constant immediate value 25
name equ 237     ; "name" is the symbol for the constant
mov eax, name    ; Translates to “mov eax, 237”
mov ecx, SomeVar ; Sets ECX to 25
You can also use the EQU directive to define text constants by surrounding the value with angle brackets.
quickMove equ <mov eax, 23>
quickMove ; Translates to “mov eax, 23”
You can use the equates directive to define machine code by using a db (define byte) in the value.
NoOperation equ <db 90h> ; 90h is machine code for the NOP instruction
NoOperation              ; Translates to NOP or 90h
This usage of db in the code segment is fine because db does nothing more than place the exact byte values you specify at the position in the file. Using db in the code segment effectively enables us to program in pure machine code.
; This procedure returns 1 if ECX is odd;
; otherwise it returns 0. It is programmed
; in pure machine code using db.
IsOdd proc
    db 83h, 0E1h, 01h, ; and ecx, 1
       8Bh, 0C1h,      ; mov eax, ecx
       0C3h            ; ret
IsOdd endp
The point of using pure machine code is that sometimes an assembler may not understand some instructions that the CPU can understand. An older assembler may not understand the SSE instructions. By using EQU and db in the manner described previously, a programmer can define his or her own way of specifying SSE instructions, whether the assembler understands them naturally or not.
You can define macro functions using the macro directive.
[name] MACRO [p1], [p2]...
; Macro body
ENDM
Where [name] is the symbol associated with the macro, MACRO and ENDM are keywords, and [p1], [p2], and any other symbols are the parameters as they are referred to in the body of the macro.
Halve macro dest, input ; dest and input are the parameters
    mov dest, input     ;; Refer to parameters in body
    shr dest, 1
endm                    ; endm with no macro name preceding

; And later in your code:
Halve ecx, 50  ; Moves 25 into ecx
Halve eax, ecx ; Moves 12 into eax
Halve ecx, ecx ; Moves 12 into ecx
Halve 25, ecx  ; Error, ecx/2 cannot be stored in 25!
The symbol name is swapped for the corresponding code each time MASM finds the macro name when assembling. This means that if there are labels in the macro code (if the code has jumps to other points within its code), MASM will write the labels again and again. Each time the macro is used, the labels will appear. Since MASM cannot allow duplicate labels and still know where to jump, you can define labels as local in the macro definition. Labels defined as local will actually be replaced by an automatically generated, unique label.
SomeMacro macro dest, input
    local label1, label2
    test dest, 1
    jnz label1
    jz label2
label1:         ;; Automatically renamed ??0000
    mov eax, 3
label2:         ;; Automatically renamed ??0001
    mov ecx, 12
                ;; Each label each time SomeMacro is
                ;; called will increment the counter,
                ;; next will be ??0002, then ??0003, etc.
endm
Note: You may have noticed the “;;” comments in the body of the macros; these are macro comments. They are useful when generating listing files, since these comments will not appear every time a macro function is referenced in code, only once at the macro's definition. If you use the single “;” comments the same comments will appear over and over throughout the generated listing file.
In your macro definition you can specify default values for any parameters, allowing you to call the macro without specifying every parameter (place := and then the default value after the parameter's name, somevariable:=<eax>). You can also indicate that particular parameters are required (place a colon followed by req, somevariable:req).
Note: When specifying the default values, the syntax is similar to the “equ” directive; instead of “eax” you must use “<eax>”.
SomeMacro macro p1:=<eax>, p2:req, p3:=<49>
    ;; Macro body
endm
The macro definition in the sample code would allow us to omit values for both first and third parameters. Only the second is required, and the others can be left to defaults.
; Specify all parameters:
SomeMacro ecx, 389, 12 ; p1 = ecx
                       ; p2 = 389
                       ; p3 = 12
; Just specify parameter 2:
SomeMacro , ebx,       ; p1 = eax from default
                       ; p2 = ebx
                       ; p3 = 49 from default