Assembly Language Succinctly®
by Christopher Rose

CHAPTER 5

Data Segment

The data segment is the place in RAM where a program stores its global and static data. This data is defined at compile time. The data segment does not hold variables that are allocated at run time (the heap is used for this purpose) or variables that are local to subprocedures (the stack holds these). Most of the information presented here applies to any data segment. For instance, variables can be declared in the initialized data segment (.data), the uninitialized data segment (.data?), or the constant data segment (.const).

Note: All variables declared in your data segment become bytes in your actual .exe file. They are not generated when the program is run; they are read from the .exe file. Creating a data segment with 150 MB of initialized variables will generate a 150-MB .exe file and will take a very long time to compile. (The uninitialized .data? segment normally does not add its size to the file.)

Scalar Data

Scalar data defined in the data segment is given a name, a size, and an optional initial value. Declaring a variable in the data segment amounts to naming an offset from the start of the data segment. All the variable names are actually pointers: each is a number referring to an offset into the data segment in RAM, so programmers do not have to remember the offsets as numbers.

Note: Variables in assembly are sometimes referred to as labels, but to avoid confusion with labels in the code segment, I will refer to data segment labels as variables.

To define a variable in the data segment, the general layout is as follows:

[VarName] [Type] [Initial Value]

Where [VarName] is any legal variable name and will be the name of the point in the data that you wish to use as a reference.

Note: The rules for variable names are the same as those for C++. They cannot begin with a digit, and they can contain letters, digits, and underscores. You can also use some additional symbols that are illegal in C++, such as @ and ?.

[Type] is the data type and can be any one of the data types or short versions in the ASM column of the Fundamental Data Types table.

The initial value can be either a literal value or it can be “?”. The question mark means the data is not given an initial value. In actuality, data will be given a value even if it is uninitialized. The point of the “?” is to declare that the programmer does not care what value the data is set to initially and presumably the program will set some other initial value prior to using the data.

Here are some examples of defining simple scalar data types in a data segment.

.data

myByte db 0 ; Defines a byte set to 0 called myByte

patientID dw ? ; Defines an uninitialized word called patientID

averageSpeed dt 0.0 ; Defines a 10-byte real; reals must have a decimal point if initialized

totalCost sdword 5000 ; Defines signed dword set to 5000, called totalCost

Note: The first variable is placed at DS:0 (the first byte of the data segment), and the second is placed straight after it (no padding is placed between variables). If the first variable were 1 byte, the second would be at DS:1; if the first variable were a word, the second would be at DS:2. Whether a variable's address is a multiple of its size (or of some larger power of 2) is called alignment. Alignment is important for performance, and some of the fastest data processing instructions, such as the SSE instruction MOVAPS, require their data to be aligned to 16 bytes.
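The layout rule in this note can be modeled in a few lines. The following is a C++ sketch, since C++ is the language this book pairs with assembly. packedOffsets and isAligned are hypothetical helpers, not part of MASM or any library. Applied to the four variables defined earlier (1, 2, 10, and 4 bytes), the model places them at offsets 0, 1, 3, and 13.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: models how MASM lays out consecutive data
// definitions with no padding. Each offset is the running total of
// the sizes of everything defined before it.
std::vector<std::size_t> packedOffsets(const std::vector<std::size_t>& sizes) {
    std::vector<std::size_t> offsets;
    std::size_t next = 0;
    for (std::size_t size : sizes) {
        offsets.push_back(next);  // this variable starts where the previous one ended
        next += size;
    }
    return offsets;
}

// An offset is n-byte aligned when it is a multiple of n.
bool isAligned(std::size_t offset, std::size_t n) {
    return offset % n == 0;
}
```

Because totalCost lands at offset 13, it is not even 4-byte aligned. When alignment matters, MASM's ALIGN directive can insert padding before a variable.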

Arrays

After scalar data types, the next most fundamental data type is probably the array. An array is a list of elements of the same data type in contiguous memory. In assembly, an array is just a block of memory and the first element of the array is given a name.

Arrays Declared with Commas

You can declare the elements of an array separated by commas.

MyWord dw 1, 2, 3, 4 ; Makes a 4 word array with 1, 2, 3, and 4 as elements

If you need to use more than one line, finish the line with a comma and continue on the next.

MyWord dw 1, 2, 3, 4, ; Four words in an array

5, 6, 7, 8 ; Another four words in the same array!

This is legal because a data definition does not need a name at all; the MyWord name of the variable is completely optional. The second line simply places its four words straight after the first four.

Duplicate Syntax for Larger Arrays

You can create larger arrays in your data segment using the duplicate syntax, but remember that every byte in your data segment is a byte in your final file.

To create larger arrays you can declare an array of values with the following pattern (the duplicate syntax):

[Name] [type] [n] [dup (?)]

Where [Name] is the array name, any legal variable name. [Type] is one of the data types from the Fundamental Data Types table, and [n] is the number of items in the array. DUP is short for duplicate, and the data that it duplicates is in the parentheses. To make 50 words all set to 25 in an array called MyArray, the array declaration using the duplicate syntax would be the following:

MyArray word 50 dup (25)

You can combine the simple comma separated array definition syntax with the duplicate syntax and produce arrays of repeating patterns.

MyArray byte 50 dup (1, 6, 8)

This will define an array 150 bytes long (50×3) with the repeating pattern 1, 6, 8, 1, 6, 8, 1, 6, 8....
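To make the dup expansion concrete, here is a small C++ sketch. expandDup is a hypothetical helper that mimics what the assembler does with n dup (pattern); it is not MASM code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: models "n dup (pattern)" by repeating the
// whole pattern n times, one copy straight after another.
std::vector<std::uint8_t> expandDup(std::size_t n, const std::vector<std::uint8_t>& pattern) {
    std::vector<std::uint8_t> out;
    out.reserve(n * pattern.size());
    for (std::size_t i = 0; i < n; ++i) {
        out.insert(out.end(), pattern.begin(), pattern.end());
    }
    return out;
}
```

expandDup(50, {1, 6, 8}) yields the 150 bytes described above. Nesting calls, as in expandDup(10, expandDup(25, {'A'})), mirrors a nested dup and yields 250 bytes.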

You can nest the duplicate directive to create multidimensional arrays. For example, to create a 10×25 byte array with all elements set to the character 'A', you could use the following:

MyMArray byte 10 dup (25 dup ('A'))

Note: RAM is linear. Whether the sample code actually defines a 10×25 or a 25×10 array must be decided by the programmer. To the CPU, it is just a block of linear RAM and there is no such thing as a multidimensional array.

For a three-dimensional array, you could use something like this:

My3dArray byte 10 dup (25 dup (100 dup (0)))

This will create a 10×25×100 3-D array of bytes all set to 0. From the CPU's point of view, this 3-D array is exactly the same as the following:

My3dArray byte 25000 dup (0)
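The CPU only sees one linear block, so the programmer decides how three indices map to a byte offset. The C++ sketch below assumes row-major order, in which the last index varies fastest. flatIndex and the D0/D1/D2 names are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Dimensions of My3dArray from the text: 10 x 25 x 100.
constexpr std::size_t D0 = 10, D1 = 25, D2 = 100;

// Row-major mapping: the last index varies fastest. This is one
// convention the programmer might choose; the CPU only sees the
// resulting byte offset.
constexpr std::size_t flatIndex(std::size_t i, std::size_t j, std::size_t k) {
    return (i * D1 + j) * D2 + k;
}

static_assert(D0 * D1 * D2 == 25000, "same size as 25000 dup (0)");
```

Under this convention, element [1][0][0] is at byte 2500 and the last element, [9][24][99], is at byte 24999.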

Getting Information about an Array

Once an array is defined, MASM has some directives to retrieve information about it:

lengthof: Returns the length of the array in elements.

sizeof: Returns the length of the array in bytes.

type: Returns the element size of the array in bytes.

For example, if you have an array called myArray and you want to move information about it into AX, you would do the following:

mov ax, lengthof myArray ; Move length in elements of the array

mov ax, sizeof myArray ; Move the size in bytes of the array

mov ax, type myArray ; Move the element size into AX

Note: These directives are translated to immediate values prior to assembling the file. If myArray has 16 elements, for example, "mov lengthof myArray, 200" is translated to "mov 16, 200". Moving a value into a literal constant means nothing (we cannot change the meaning of 16, even in assembly), so the line is illegal.
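For readers coming from C++, the three directives have close compile-time counterparts. This sketch assumes a 16-element word array named myArray, matching the 16 in the note. It is an analogy only, not how MASM implements the directives.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>  // std::size

// A 16-element word array, like "myArray word 16 dup (0)".
std::uint16_t myArray[16] = {};

// Rough C++ counterparts of MASM's directives; all are compile-time constants.
constexpr std::size_t lengthofMyArray = std::size(myArray);  // LENGTHOF: element count
constexpr std::size_t sizeofMyArray = sizeof(myArray);       // SIZEOF: total bytes
constexpr std::size_t typeMyArray = sizeof(myArray[0]);      // TYPE: bytes per element
```

Like their MASM counterparts, these values are fixed before the program runs, so they can only ever be read, never assigned to.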

Defining Strings

In MASM, single and double quotes are exactly the same. They are used for both strings and single characters.

Note: A string in C and C++ is a byte array, usually with a null (0) at the end. These are called null-terminated (or zero-terminated) strings, and many C++ functions are designed to work with them.

To define a string of text characters, you can use this string syntax:

errMess db 'You do not have permission to do this thing, lol', 0

This is equivalent to defining a byte array with the values set to the ASCII numbers of the characters in the string. The Y at the start will be the first (lowest-addressed) byte of the array.

Note: The comma and zero at the end supply the final null. This makes the string a null-terminated string as understood by cout and other C++ functions: cout stops writing a string when it reaches the 0 character. In C++, the 0 is added automatically when we use double quotes; in assembly, it must be explicit.
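The difference between the implicit C++ terminator and the explicit one in assembly can be checked directly. In this C++ sketch, the array names are illustrative.

```cpp
#include <cassert>
#include <cstring>

// C++ adds the terminating 0 to a string literal automatically...
const char fromLiteral[] = "Hi";
// ...whereas this spelled-out array mirrors "db 'Hi', 0", where the 0 is explicit.
const char spelledOut[] = {'H', 'i', 0};

// An illustrative array with a 0 in the middle: strlen (like cout)
// stops at the first 0 byte and never sees "cd".
const char embedded[] = {'a', 'b', 0, 'c', 'd', 0};
```

Both fromLiteral and spelledOut occupy 3 bytes with identical contents. strlen reports 2 for embedded even though the array is 6 bytes long.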

To define a string that does not fit on a single line, you can use the following:

myLongString db "This is a ",

"string that does not ",

"fit on a single lion!", 0

In each of the previous examples, the single quote could also have been used to the same effect:

myLongString db 'This is a ',

'string that does not ',

'fit on a single lion!', 0

If you need to use a single quote inside a single-quoted string, or a double quote inside a double-quoted string, place two of them together:

myArr1 db "This ""IS"" Tom's array!", 0 ; This "IS" Tom's array!

myArr2 db 'That''s good, who''s Tom?', 0 ; That's good, who's Tom?
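The quote-doubling rule is mechanical, so it is easy to sketch. toMasmLiteral is a hypothetical C++ helper, not a MASM feature. It produces the literal you would type into a db line for a given piece of text.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turns plain text into a MASM string literal
// delimited by `quote`, doubling every occurrence of that quote character.
// The other kind of quote is copied unchanged.
std::string toMasmLiteral(const std::string& text, char quote) {
    std::string out(1, quote);
    for (char c : text) {
        out += c;
        if (c == quote) {
            out += quote;  // "" inside "...", or '' inside '...'
        }
    }
    out += quote;
    return out;
}
```

Feeding it This "IS" Tom's array! with a double quote as the delimiter produces the myArr1 literal shown above.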

Typedef

You can declare your own names for data types with the type definition (typedef) directive.

integer typedef sword ; Defines “integer” to mean sword

MyInteger integer ? ; Defines a new sword called MyInteger

You cannot use reserved words for your typedefs, so trying to make a signed dword type called “int” will not work, since “int” is the x86 instruction to call an interrupt.

Note: You can use typedef to define new names for user-defined types, fundamental types, structures, unions, and records.

Structures and Unions

To define a structure (analogous to a C++ struct), you can use the struc (or struct) directive.

ExampleStructure struct ; Structure name followed by "struct" or "struc"

X word 0

Y word 0

Z word 0

ID byte 0

ExampleStructure ends ; The name followed by “ends” closes the definition

This would create a structure with four variables; three are words set to 0 and called X, Y, and Z, and the final variable is a byte called ID, also set to 0 by default.

The previous example was the prototype. To create an instance of the previous structure prototype, you can use the following:

person1 ExampleStructure { } ; Declares person1 with default values

person2 ExampleStructure { 10, 25, 8, ? } ; Declares person2 with specific values and an ID of ?, which will most likely be 0

Each field of the instance of the structure is initialized with the value supplied in the respective position in the curly brackets. Use “?” to leave a field without a specific initial value (in the data segment, MASM uses 0). You can supply fewer values than the structure has fields; the remaining fields are given their default values as per the structure's prototype.

person3 ExampleStructure { 10 } ; Declares person3 with 10 for X, but the rest are as per the structure's prototype

With a prototype declaration, you can also create an instance of the structure with some of the values initialized and others left at their defaults, by not including a value for them. Just leave the position empty and use the comma to indicate where the value would have been.

MyStructure struct

x word 5

y word 7

MyStructure ends

InstanceOfStruct MyStructure { 9, } ; Change x to 9 but keep y as 7, as per the prototype
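C++ default member initializers behave much like the initial values in a MASM structure prototype. This C++ sketch mirrors MyStructure; the variable names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// C++ counterpart of the MyStructure prototype: the default member
// initializers play the role of the prototype's initial values.
struct MyStructure {
    std::uint16_t x = 5;
    std::uint16_t y = 7;
};

// Like "InstanceOfStruct MyStructure { 9, }": x is given, y keeps its default.
MyStructure instanceOfStruct{9};
// Like "{ }": every field keeps its default.
MyStructure defaults{};
```

One difference: plain C++ aggregate initialization cannot skip a field in the middle of the list. Designated initializers (C++20), such as MyStructure{.y = 3}, fill that role.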

To change the values of a previously instantiated structure from code, you can use a period in a similar manner to accessing structure elements in C++.

mov person1.X, 25 ; Moves 25 into person1's X

mov person2.ID, 90 ; Moves 90 into person2's ID

Note: When structures are passed to functions from C++, they are not always passed by reference. Under the Microsoft x64 calling convention, a structure whose size is 1, 2, 4, or 8 bytes is copied into a register (or onto the stack once the parameter registers are used up); a structure of any other size is copied to memory by the caller, and a pointer to that copy is passed. If a structure has two integers and is the first parameter, then the whole instance of the structure will be copied to RCX (since two 32-bit dwords fit into the 64-bit RCX). This is awkward because you cannot reference the separate elements of the structure when they are in a register. For instance, there is no way to reference the top dword of RCX directly; you would have to shift it down first. For this reason, it may be easier to pass structures from C++ as pointers.

You can load the effective address of a previously instantiated structure with the LEA instruction (load effective address). To use a register (RCX in this example) as a pointer to an instance of a structure, you must tell MASM the address, type of structure being pointed to, and the field.

lea rcx, person1 ; Loads the address of person1 into RCX

mov [rcx].ExampleStructure.X, 200 ; Moves 200 into person1.X using RCX as a pointer

The CPU does not check to make sure RCX is actually pointing to an ExampleStructure instance; RCX could be pointing to anything. [RCX].ExampleStructure.X simply means: take the address in RCX and add the offset that X has in the ExampleStructure prototype. In other words, [RCX].ExampleStructure.X translates to RCX+0, since X is at byte 0 in the prototype of ExampleStructure, and [RCX].ExampleStructure.Y translates to RCX+2, since Y comes straight after the 2-byte word X.
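The offset arithmetic described above is what C++'s offsetof reports. This C++ sketch mirrors ExampleStructure. readYAt is a hypothetical helper that does what [rcx].ExampleStructure.Y does: it adds Y's offset to a base address without checking what is really there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// C++ mirror of ExampleStructure. C++ may add trailing padding, so
// sizeof can differ from the MASM structure's size; the field offsets match.
struct ExampleStructure {
    std::uint16_t X;
    std::uint16_t Y;
    std::uint16_t Z;
    std::uint8_t ID;
};

static_assert(offsetof(ExampleStructure, X) == 0, "X at base + 0");
static_assert(offsetof(ExampleStructure, Y) == 2, "Y at base + 2");
static_assert(offsetof(ExampleStructure, Z) == 4, "Z at base + 4");
static_assert(offsetof(ExampleStructure, ID) == 6, "ID at base + 6");

// What "[rcx].ExampleStructure.Y" computes: base address plus Y's offset.
// Nothing checks that `base` really points at an ExampleStructure.
std::uint16_t readYAt(const void* base) {
    std::uint16_t value;
    std::memcpy(&value,
                static_cast<const unsigned char*>(base) + offsetof(ExampleStructure, Y),
                sizeof value);
    return value;
}
```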

To pass an instance of a structure as a parameter to a function, it is usual to pass its address and manipulate it as shown previously. This is passing by reference, so the original object will be changed, but it is usually much faster than copying the data of the structure to the registers and stack in the manner of C++.

; This is the function that is initially called

Function1 proc

lea rcx, person2 ; Load &person2 (the address of person2) into RCX to be passed to Fiddle

call Fiddle ; Call Fiddle with RCX param 1

ret

Function1 endp

; Fiddle, RCX = ExampleStructure* (a pointer to an ExampleStructure)

Fiddle proc

mov [rcx].ExampleStructure.Y, 89 ; Change something

ret

Fiddle endp
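In C++ terms, Fiddle is a function that takes a pointer. The sketch below contrasts it with a by-value parameter, which works on a copy. The function names fiddle and fiddleCopy are illustrative.

```cpp
#include <cassert>
#include <cstdint>

struct ExampleStructure {
    std::uint16_t X = 0, Y = 0, Z = 0;
    std::uint8_t ID = 0;
};

// Like Fiddle: receives a pointer (the address in RCX) and changes the caller's object.
void fiddle(ExampleStructure* p) {
    p->Y = 89;
}

// For contrast: a by-value parameter works on a copy,
// so the caller's object is left untouched.
void fiddleCopy(ExampleStructure s) {
    s.Y = 89;
}
```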

Structures of Structures

To define a structure that has a smaller substructure as its member variables, declare the smaller one first. Then place the instances of the substructure inside the declaration of the larger structure.

; This is the smaller sub-structure

Point struct

X word 0

Y word 0

Point ends

; This is the larger structure that owns Points as its members:

Square struct

cnr1 Point { 7, 4 } ; This one uses 7 and 4

cnr2 Point { } ; Use default values!

Square ends

To declare an instance of a structure that contains substructures in the data segment, you can use nested curly braces.

MySquare Square { { 9, 8 }, { ?, 7 } }

Note: If you do not want to set any of the values of a struct, you can use {} to mean defaults for all values, even if the structure has substructures within it.

To set the value of a field inside a structure's substructure, chain the field names with periods to specify which variable you wish to change.

mov MySquare.cnr1.Y, 5

You can use a register as a pointer and reference the nested structure’s elements as follows (here, RCX must hold the address of a Square instance):

mov word ptr [rcx].Square.cnr1.X, 10
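Nested field references simply add the offsets together. This C++ sketch mirrors Point and Square and computes the byte offsets that [rcx].Square.cnr1.Y and [rcx].Square.cnr2.X would use. The constant names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Point {
    std::uint16_t X = 0;
    std::uint16_t Y = 0;
};

struct Square {
    Point cnr1{7, 4};  // like "cnr1 Point { 7, 4 }"
    Point cnr2{};      // like "cnr2 Point { }"
};

// A nested reference is base + offset of the substructure + offset of the field.
constexpr std::size_t offsetCnr1Y = offsetof(Square, cnr1) + offsetof(Point, Y);
constexpr std::size_t offsetCnr2X = offsetof(Square, cnr2) + offsetof(Point, X);
```

In C++, the MySquare declaration above would be written Square mySquare{{9, 8}, {0, 7}}.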

Unions

A union is similar to a structure, except the actual memory used for each of the elements in the union is physically at the same place in RAM. Unions are a way to reference the same address in RAM as more than one data type.

MyUnion union

w1 word 0

d1 dword 0

MyUnion ends ; Note that it is ends, not endu!

Here, MyUnion.w1 has exactly the same address as MyUnion.d1. The dword version is 4 bytes long and the word is only 2 bytes, but both start at the same address. Because x86 is little-endian, w1 therefore overlaps the two least significant bytes of d1.
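The overlap can be demonstrated in C++ with an equivalent union. Reading an inactive union member is undefined behavior in C++, so this sketch copies the bytes with memcpy instead. lowWordOf is a hypothetical helper, and the expected value in the usage note assumes a little-endian CPU such as x86.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

union MyUnion {
    std::uint16_t w1;
    std::uint32_t d1;
};

// Both members start at offset 0; the union is as big as its largest member.
static_assert(offsetof(MyUnion, w1) == offsetof(MyUnion, d1), "same address");
static_assert(sizeof(MyUnion) == 4, "size of the dword member");

// Reads the first two bytes of the union's storage (where w1 lives).
// memcpy sidesteps C++'s rule against reading an inactive union member.
std::uint16_t lowWordOf(const MyUnion& u) {
    std::uint16_t w;
    std::memcpy(&w, &u, sizeof w);
    return w;
}
```

After storing 12345678h in d1, the first two bytes read back as 5678h on a little-endian machine.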

Records

Records are another complex data type of MASM. They are like structures, but they work on and are defined at the bit level. The syntax for definition is as follows:

[name] RECORD [fldName:sz], [fldName:sz]...

Where [name] is the name of the record, [fldName] is the name of a field, and [sz] is the size of the field in bits.

color RECORD blBit:1, hueNib:4

The sample code in the data segment is the prototype to a record called color. The record can then be accessed by the following:

mov cl, blBit

This would move 4 into CL, since blBit was defined as bit number 4 in the record. hueNib takes bits 0, 1, 2, and 3, and blBit comes after this.

You cannot use a record to access bits directly.

mov [rax].color.blBit, 1 ; Won't set bit 4 of the data at RAX to 1

A record is just a form of directive; it defines a set of constants to be used with bitwise operations. The constants are bit indices, so you can use them as shift or rotate counts.

mov cl, blBit ; Move the index of the record's blBit into cl

rol rax, cl ; Use this to rotate the bits in RAX

You can define records in your data segment and initialize the values of the bit fields just as you can with a structure. This is the only time you can set each element of a record without using bitwise operations.

.data

color RECORD qlBit:3, blBit:1, hueNib:4 ; Defines a record

; Following defines a new byte with the bits set as specified

; by the record declaration:

; qlBit gets 0, blBit gets 1 and the hueNib gets 2

; So MyColor will actually be a byte set to 00010010b

MyColor color { 0, 1, 2 } ; Declare a color record with initializers

.code _text

Function1 proc

mov cl, MyColor ; Moves 000:1:0010b, or 18 in decimal

ret

Function1 endp

Note: The qlBit, blBit, and hueNib fields from the previous record become constants of their bit indices: hueNib = 0, blBit = 4, qlBit = 5.
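The bit positions follow a simple rule: the last field listed sits at bit 0, and each earlier field sits directly above the fields listed after it. This C++ sketch models that rule with hypothetical helpers, recordShifts and packRecord, and assumes every field is narrower than 32 bits. It models the layout only; it is not MASM's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of RECORD layout: fields are listed from the most
// significant end, and the last field occupies bit 0.
std::vector<unsigned> recordShifts(const std::vector<unsigned>& widths) {
    std::vector<unsigned> shifts(widths.size());
    unsigned shift = 0;
    for (std::size_t i = widths.size(); i-- > 0;) {  // walk from the last field back
        shifts[i] = shift;
        shift += widths[i];
    }
    return shifts;
}

// Packs field values into one integer, like "MyColor color { 0, 1, 2 }".
std::uint32_t packRecord(const std::vector<unsigned>& widths,
                         const std::vector<unsigned>& values) {
    std::vector<unsigned> shifts = recordShifts(widths);
    std::uint32_t packed = 0;
    for (std::size_t i = 0; i < widths.size(); ++i) {
        std::uint32_t fieldMask = (1u << widths[i]) - 1;  // keep only the field's bits
        packed |= (values[i] & fieldMask) << shifts[i];
    }
    return packed;
}
```

For qlBit:3, blBit:1, hueNib:4, the shifts come out as 5, 4, and 0, and packing { 0, 1, 2 } gives 18, matching MyColor.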

You can get the width in bits of a field in a record by using MASM's WIDTH directive.

mov ax, WIDTH hueNib

You can get a bit mask of the record's field by using MASM's MASK directive.

and al, MASK blBit ; AND the contents of AL with the bit mask of the color record's blBit field

You can specify NOT prior to the MASK directive to invert the bit mask, so the AND clears blBit and keeps every other bit.

and al, NOT MASK blBit
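For the color record with qlBit:3, blBit:1, and hueNib:4, the values these directives produce can be written out by hand. This C++ sketch hard-codes them (the constant and function names are hypothetical). It shows what an AND with the mask, and with the inverted mask, does to a byte.

```cpp
#include <cassert>
#include <cstdint>

// The color record "qlBit:3, blBit:1, hueNib:4" worked out by hand.
constexpr unsigned blBitShift = 4;   // the bit index that blBit stands for
constexpr unsigned hueNibWidth = 4;  // the field width of hueNib

// MASK: the field's bits set in place, all other bits clear.
constexpr std::uint8_t maskBlBit = static_cast<std::uint8_t>(1u << blBitShift);          // 00010000b
constexpr std::uint8_t maskHueNib = static_cast<std::uint8_t>((1u << hueNibWidth) - 1);  // 00001111b

// AND with the mask of blBit: keep only blBit.
constexpr std::uint8_t keepBlBit(std::uint8_t al) {
    return static_cast<std::uint8_t>(al & maskBlBit);
}

// AND with NOT the mask of blBit: clear blBit, keep everything else.
constexpr std::uint8_t clearBlBit(std::uint8_t al) {
    return static_cast<std::uint8_t>(al & ~maskBlBit);
}
```

Applied to MyColor's value, 00010010b, keepBlBit leaves 00010000b and clearBlBit leaves 00000010b.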

Constants Using Equates To

You can define a numerical constant using the = symbol, and you can define numerical and text constants using the equ directive. This is short for “equates to.”

SomeVar = 25 ; SomeVar becomes a constant immediate value 25

MyConst equ 237 ; "MyConst" is the symbol for the constant (NAME is reserved in MASM)

mov eax, MyConst ; Translates to “mov eax, 237”

mov ecx, SomeVar ; Sets ECX to 25

You can also use the EQU directive to define text constants by surrounding the value with angle brackets (< and >).

quickMove equ <mov eax, 23>

quickMove ; Translates to “mov eax, 23”

You can use the equates directive to define machine code by using a db (define byte) in the value.

NoOperation equ <db 90h> ; 90h is machine code for the NOP instruction

NoOperation ; Translates to NOP or 90h

This usage of db in the code segment is fine because db does nothing more than place the exact byte values you specify at the position in the file. Using db in the code segment effectively enables us to program in pure machine code.

; This procedure returns 1 if ECX is odd

; otherwise it returns 0. It is programmed

; in pure machine code using db.

IsOdd proc

db 83h, 0E1h, 01h, ; and ecx, 1

8Bh, 0C1h, ; mov eax, ecx

0C3h ; ret

IsOdd endp
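Each hand-written byte can be checked against the x86 encoding. Opcode 83h applies an 8-bit immediate to a register or memory operand, and the reg field of the following ModRM byte selects the operation (4 is AND). Opcode 8Bh is MOV r32, r/m32. This C++ sketch splits a ModRM byte into its mod, reg, and rm fields; decodeModRM and isOdd are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// The three fields of a ModRM byte: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0).
struct ModRM {
    unsigned mod, reg, rm;
};

constexpr ModRM decodeModRM(std::uint8_t b) {
    return ModRM{unsigned(b >> 6), unsigned((b >> 3) & 7), unsigned(b & 7)};
}

// The bytes of IsOdd from the text.
constexpr std::uint8_t isOddBytes[] = {0x83, 0xE1, 0x01, 0x8B, 0xC1, 0xC3};

// Reference behavior of IsOdd in C++: "and ecx, 1" then "mov eax, ecx".
constexpr std::uint32_t isOdd(std::uint32_t ecx) {
    return ecx & 1u;
}
```

E1h decodes to mod 3 (register operand), reg 4 (AND), and rm 1 (ECX). C1h decodes to mod 3, reg 0 (EAX, the destination), and rm 1 (ECX, the source).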

The point of using pure machine code is that sometimes an assembler may not understand some instructions that the CPU can understand. An older assembler may not understand the SSE instructions. By using EQU and db in the manner described previously, a programmer can define his or her own way of specifying SSE instructions, whether the assembler understands them naturally or not.

Macros

You can define macro functions using the macro directive.

[name] MACRO [p1], [p2]...

; Macro body

ENDM

Where [name] is the symbol associated with the macro, MACRO and ENDM are keywords, and [p1], [p2], and any other symbols are the parameters as they are referred to in the body of the macro.

Halve macro dest, input ; dest and input are the parameters

mov dest, input ;; Refer to parameters in body

shr dest, 1

endm ; endm with no macro name preceding

; And later in your code:

Halve ecx, 50 ; Moves 25 into ecx

Halve eax, ecx ; Moves 12 into eax

Halve ecx, ecx ; Moves 12 into ecx

Halve 25, ecx ; Error, ecx/2 cannot be stored in 25!
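The C preprocessor is a close analogy to MASM macros, since both paste text in at each use. This C++ sketch replays the calls above on ordinary variables; the names HALVE and halveDemo are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Like the MASM macro, this text is pasted in at each use. The parameters
// are parenthesized so that pasted expressions stay intact.
#define HALVE(dest, input) ((dest) = (input), (dest) >>= 1)

// Replays the sequence from the text; ecx and eax stand in for the registers.
void halveDemo(std::uint32_t& ecx, std::uint32_t& eax) {
    HALVE(ecx, 50);   // ecx = 25
    HALVE(eax, ecx);  // eax = 12
    HALVE(ecx, ecx);  // ecx = 12
    // HALVE(25, ecx); would not compile: 25 cannot be assigned to.
}
```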

The symbol name is swapped for the corresponding code each time MASM finds the macro name when assembling. This means that if there are labels in the macro code (if the code has jumps to other points within its code), MASM will write the labels again and again. Each time the macro is used, the labels will appear. Since MASM cannot allow duplicate labels and still know where to jump, you can define labels as local in the macro definition. Labels defined as local will actually be replaced by an automatically generated, unique label.

SomeMacro macro dest, input

local label1, label2

test dest, 1

jnz label1

jz label2

label1: ;; Automatically renamed ??0000

mov eax, 3

label2: ;; Automatically renamed ??0001

mov ecx, 12 ;; Each label each time SomeMacro is

;; called will increment the counter,

;; next will be ??0002, then ??0003, etc.

Endm
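The renaming scheme amounts to one counter shared by every expansion. This C++ sketch models it with a hypothetical LocalLabelCounter class. The four-hex-digit format is an assumption based on the ??0000 names shown above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical model of LOCAL: one counter shared by every macro expansion,
// so each expansion gets label names that no other expansion uses.
// Formatting the counter as four hexadecimal digits is an assumption.
class LocalLabelCounter {
public:
    std::string next() {
        char buf[8];
        std::snprintf(buf, sizeof buf, "??%04X", counter_++);
        return buf;
    }

private:
    unsigned counter_ = 0;
};
```

Two expansions of SomeMacro would draw ??0000 and ??0001, then ??0002 and ??0003, so no label is ever defined twice.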

Note: You may have noticed the “;;” comments in the body of the macros; these are macro comments. They are useful when generating listing files, since these comments will not appear every time a macro function is referenced in code, only once at the macro's definition. If you use the single “;” comments the same comments will appear over and over throughout the generated listing file.

In your macro definition you can specify default values for any parameters, allowing you to call the macro without specifying every parameter (place := and then the default value after the parameter's name, somevariable:=<eax>). You can also indicate that particular parameters are required (place a colon followed by req, somevariable:req).

Note: When specifying the default values, the syntax is similar to the “equ” directive; instead of “eax” you must use “<eax>”.

SomeMacro macro p1:=<eax>, p2:req, p3:=<49>

;; Macro body

Endm

The macro definition in the sample code would allow us to omit values for both the first and third parameters. Only the second is required, and the others can be left to defaults.

; Specify all parameters:

SomeMacro ecx, 389, 12 ; p1 = ecx

; p2 = 389

; p3 = 12

; Just specify parameter 2:

SomeMacro , ebx, ; p1 = eax from default

; p2 = ebx

; p3 = 49 from default
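The way MASM fills in omitted arguments can be modeled as a small lookup. This C++ sketch is a hypothetical model; Param and resolveArgs are illustrative names, not MASM internals. An empty argument takes the parameter's default, and an empty required argument is an error.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical description of one macro parameter: either required (:req)
// or with a default value (:=<...>).
struct Param {
    bool required;
    std::string defaultValue;
};

// Resolves the comma-separated arguments of one macro call. An empty
// argument means "omitted". Throws if a required argument is missing.
std::vector<std::string> resolveArgs(const std::vector<Param>& params,
                                     const std::vector<std::string>& args) {
    std::vector<std::string> result;
    for (std::size_t i = 0; i < params.size(); ++i) {
        std::string arg = i < args.size() ? args[i] : std::string();
        if (!arg.empty()) {
            result.push_back(arg);
        } else if (params[i].required) {
            throw std::invalid_argument("missing required parameter");
        } else {
            result.push_back(params[i].defaultValue);
        }
    }
    return result;
}
```

With the parameters of SomeMacro ({false, "eax"}, {true, ""}, {false, "49"}), the call "SomeMacro , ebx," corresponds to the arguments {"", "ebx", ""} and resolves to eax, ebx, 49.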
