CHAPTER 7
We’ve used the High Level Shader Language (HLSL) a little in our code so far, and now we’ll look at the language in more detail. It is a C-based language, but it is designed specifically for the parallel architecture of the GPU. Code written in HLSL usually runs many times at once. For instance, a vertex shader’s code might execute thousands of times, once for every vertex in a 3-D scene. The GPU does not do this sequentially; instead, it executes thousands of instances in parallel.
HLSL also has special data types and functions that are designed to make 3-D programming easier. The GPU has its own machine code that is completely different from the CPU’s, and its instruction set is full of fast operations for vectors and matrices.
Most of the regular fundamental data types from C are available in HLSL. Depending on the hardware you are targeting, some may not be available. For instance, the double is only available on newer hardware.
The majority of HLSL variables are structures, vectors, or matrices. HLSL is far better at dealing with large amounts of similarly structured data than it is at dealing with single values. This is the opposite of the CPU. A CPU would tend to treat a 4x4 floating point matrix as 16 distinct floating point variables, whereas the GPU tends to operate on all 16 floating point values at once, in SIMD fashion.
The scalar types are single values like int or bool. They are used either as single values or in small collections as matrices, vectors, and structures.
Note: The 16-bit IEEE half floating point type was a way to compress 32-bit floats. With a loss in precision, a 32-bit float could be stored in 16 bits. The data type is excellent for storing large amounts of data, but it is now deprecated and it is recommended that you do not use it.
We have used HLSL semantic names already; these names describe what a particular element of a structure will be used for. We specify the semantic names when we describe the layout of the data in the C++ code, and we specify them again in the structure definitions in the shader’s HLSL code. For instance, the following code is used in our VertexShader class. By the end of the last chapter we’d not yet included the NORMAL element shown in the following code table. We will add this element in Chapter 8, Lighting.
// Describe the layout of the data
const D3D11_INPUT_ELEMENT_DESC vertexDesc[] =
{
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "NORMAL", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT, 0, 24, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};
The semantic names here are POSITION, NORMAL and TEXCOORD. These names are fairly self-explanatory. Elements with a POSITION semantic are generally going to be used to specify the positions of vertices and I’m sure you can guess the use of the other two semantics. For a complete list of the semantic names, visit the MSDN website:
http://msdn.microsoft.com/en-us/library/windows/desktop/bb509647%28v=vs.85%29.aspx.
The second parameter for each element in the previous code is a number that can be used to differentiate the values where the same semantic is used more than once. For instance, if you have a vertex type which uses more than one texture, like for bump mapping or various other tricks, you might include two TEXCOORD elements in the layout description. In this case, the first one could have the number 0 and the second could have the number 1. These numbers are referenced in the shaders after the semantic names. Note the zeros after the semantic names in the code below. The following code table shows the HLSL structure corresponding to the C++ description above, as it might appear in a vertex shader.
// The input vertices
struct VertexShaderInput
{
    float3 position : POSITION0;
    float3 normal : NORMAL0;
    float2 tex : TEXCOORD0;
};
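The byte offsets 0, 12, and 24 in the C++ layout description follow from the sizes of the elements that come before each entry. As a minimal sketch, assuming a hypothetical CPU-side struct named Vertex (it is not part of the original code) whose members are plain 4-byte floats, the offsets can be checked at compile time:

```cpp
#include <cstddef>

// Hypothetical CPU-side vertex type; the name and members are assumptions
// chosen to mirror the POSITION, NORMAL, and TEXCOORD elements.
struct Vertex
{
    float position[3]; // DXGI_FORMAT_R32G32B32_FLOAT, 12 bytes
    float normal[3];   // DXGI_FORMAT_R32G32B32_FLOAT, 12 bytes
    float tex[2];      // DXGI_FORMAT_R32G32_FLOAT, 8 bytes
};

// These values match the byte offset field of each layout entry.
static_assert(offsetof(Vertex, position) == 0, "POSITION starts at byte 0");
static_assert(offsetof(Vertex, normal) == 12, "NORMAL starts at byte 12");
static_assert(offsetof(Vertex, tex) == 24, "TEXCOORD starts at byte 24");
static_assert(sizeof(Vertex) == 32, "one vertex occupies 32 bytes");
```

If an element were added or reordered, the static_assert lines would fail to compile until the offsets in the layout description were updated to match.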
You may have noticed that the vertex shader input specifies POSITION as the semantic and the output specifies SV_POSITION. This SV_POSITION is also used as the semantic in the pixel shader’s HLSL code. The SV stands for system value. The output of a vertex shader, or the input of a pixel shader, must specify all four components of an SV_POSITION at a very minimum. The pipeline inherently knows the meaning of an SV_POSITION output from the vertex shader. An SV_POSITION, when it is the output from a vertex shader, has been transformed by the vertex shader to its final state; it is ready to be rasterized by the rasterizer stage of the pipeline.
Vectors are small arrays, between one and four components, of the same scalar data type. The term vector here has absolutely nothing to do with the C++ STL vector class that we used to read our model data earlier. The GPU executes operations on vectors in SIMD style, and they are very efficient. We have seen and used vectors in our code so far to store and manipulate the positions of vertices, the colors of the vertices, and the texture coordinates. The term vector, as used here, is more flexible than the standard definition in mathematics. In mathematics, a vector often specifies a magnitude or length and a direction. Vectors in DirectX are simply small collections of data. We can use a vector of floats in any way we please, and it need not specify a direction and magnitude.
Any of the scalar types mentioned above can be used to create a vector. There are two syntaxes, and the following is the first.
typeCount variableName;
Where type is one of the scalar types, count is 1, 2, 3, or 4, and represents the number of components in the vector; and variableName is the name of the variable being declared. You can initialize the values for a vector using curly braces. Here are some examples.
int4 myIntVector;
float2 fv = { 0.5f, 0.5f };
double1 d = 0.6;
The other syntax uses the vector keyword. Other than the syntax, this does exactly the same thing.
vector<type, count> variableName;
The following examples are the same as those above, only the syntax uses the vector keyword.
vector<int, 4> myIntVector;
vector<float, 2> fv = { 0.5f, 0.5f };
vector<double, 1> d = 0.6;
You can also initialize vectors using the following syntax.
int4 myVector = int4(1, 2, 3, 4);
The elements can be accessed using the standard array notation, or you can use the color space notation (RGBA) or the coordinate notation (XYZW). These notations are sometimes called the color component namespace and the coordinate component namespace.
vector<int, 4> myVector;
// These three lines all do the same thing.
// They set the first element of the vector to 50.
myVector[0] = 50;
myVector.r = 50;
myVector.x = 50;
// These three lines all set the second element to 10.
myVector[1] = 10;
myVector.g = 10;
myVector.y = 10;
// These three lines set the third element to 20.
myVector[2] = 20;
myVector.b = 20;
myVector.z = 20;
// These lines set the final element of the vector to 42.
myVector[3] = 42;
myVector.a = 42;
myVector.w = 42;
A swizzle is a method of accessing the components of a vector or matrix as a collection.
int4 v1 = { 1, 3, 9, 27 };
int4 v2 = { 2, 4, 6, 8 };
v1.rb = v2.gr;
In the above code there are two vectors being declared. The third line contains the swizzle. It assigns the g and r components of the v2 vector to the r and b components of the v1 vector. The v1 vector will contain { 4, 3, 2, 27 } after this operation; the vector v2 is not changed.
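The assignment can be traced on the CPU. The following is a minimal C++ sketch, not HLSL; the Int4 alias, the Component enum, and the function name assignRbFromGr are assumptions introduced only for this illustration.

```cpp
#include <array>

using Int4 = std::array<int, 4>;

// Component indices shared by both notations: r/x = 0, g/y = 1, b/z = 2, a/w = 3.
enum Component { R = 0, G = 1, B = 2, A = 3 };

// Emulates the HLSL statement "dst.rb = src.gr" on the CPU.
Int4 assignRbFromGr(Int4 dst, const Int4& src)
{
    dst[R] = src[G]; // first destination component takes the first source component
    dst[B] = src[R]; // second destination component takes the second source component
    return dst;      // the g and a components of dst are untouched
}
```

Calling assignRbFromGr with { 1, 3, 9, 27 } and { 2, 4, 6, 8 } returns { 4, 3, 2, 27 }, which matches the result described above.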
You cannot mix the color and coordinate component namespaces in a swizzle. The following is illegal because it uses both r and a from RGBA and x and y from XYZW.
v1.rx = v2.ya;
You can reuse the same element more than once as the source.
v1.rgba = v2.rrrr; // Broadcast the r element of v2 across v1
You cannot reuse the same element more than once as the destination (v1 in the following) because a single component cannot hold two values at once.
v1.rr = v2.gb; // Illegal, cannot set v1.r to multiple values
Matrix types are similar to vectors, only they can store more values and they can be 2-D. Each dimension of a matrix can range from one to four elements, so a matrix can hold anything from one to sixteen elements (a 1x1 matrix up to a 4x4 matrix). Like vectors, matrices are created as a collection of scalar data types, and every element in a matrix must have the same type. Also like vectors, there are two syntaxes for declaring a matrix, with and without the matrix keyword.
typeRowsxCols variableName;
Where type is the data type, Rows is the number of rows in the matrix, and Cols is the number of columns.
int3x2 neo;
float4x4 trinity;
float1x2 morpheus;
The other syntax using the matrix keyword achieves exactly the same thing.
matrix <type, rows, cols> variableName;
For example:
matrix <int, 3, 2> neo;
matrix <float, 4, 4> trinity;
matrix <float, 1, 2> morpheus;
You can also initialize the values in a matrix using the curly braces, { and }.
matrix <int, 3, 2> neo = {
    1, 2, // First row
    3, 4, // Second row
    5, 6  // Third row
};
Matrix elements can be accessed using either a zero based notation or a one based notation.
matrix<float, 4, 4> myMatrix = {
    1.0f, 2.0f, 3.0f, 4.0f,
    2.0f, 3.0f, 5.0f, 7.0f,
    1.0f, 2.0f, 6.0f, 24.0f,
    1.0f, 1.0f, 2.0f, 3.0f
};
myMatrix._m00 = 0.0f; // Change first element to 0, zero-based notation
myMatrix._11 = 12.9f; // Change first element to 12.9, one-based notation
The element indexes begin with an underscore. For zero based notation, the indexes are prefixed with ‘_m’. For one based notation, the indexes are prefixed with ‘_’ only. Note that depending on your display, some of the underscores ‘_’ may appear as spaces in this text; be very careful not to mix the two. In both cases, the prefixes are followed by the index of the row and then column of the element to access.
To set the element of the previous matrix that presently holds 24.0 (third row, fourth column) to 100.0, we could do either of the following.
myMatrix._m23 = 100.0f;
myMatrix._34 = 100.0f;
Matrix elements can also be addressed using the standard C array notation. This is always zero based.
myMatrix[2][3] = 100.0f;
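The three notations differ only in where counting starts. The following minimal C++ sketch (not HLSL) shows that _m23, _34, and [2][3] all name the same element; the Float4x4 type and the helper names zeroBased and oneBased are assumptions made up for this illustration.

```cpp
#include <cstddef>

// A 4x4 float matrix stored row by row, like the myMatrix example.
struct Float4x4
{
    float m[4][4];
};

// Zero-based access, like HLSL's _mRC names and the [row][col] array notation.
float& zeroBased(Float4x4& mat, std::size_t row, std::size_t col)
{
    return mat.m[row][col];
}

// One-based access, like HLSL's _RC names: subtract one from each index.
float& oneBased(Float4x4& mat, std::size_t row, std::size_t col)
{
    return mat.m[row - 1][col - 1];
}
```

With these helpers, zeroBased(mat, 2, 3) and oneBased(mat, 3, 4) return references to the same float.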
You can use swizzles to access and set the elements of matrices.
someMatrix._32_12 = someOtherMatrix._11_33;
This copies the elements at one-based positions (row 1, column 1) and (row 3, column 3) of someOtherMatrix to positions (row 3, column 2) and (row 1, column 2) of someMatrix, respectively.
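Because C-style array indices are zero-based, it can help to see the same swizzle written out with array indices. This is a minimal C++ sketch, not HLSL; the Float4x4 type and the function name are assumptions used only for illustration.

```cpp
// Hypothetical 4x4 matrix type used only for this illustration.
struct Float4x4
{
    float m[4][4];
};

// Emulates "dst._32_12 = src._11_33" by converting each one-based
// (row, column) pair to zero-based array indices.
void assign32_12From11_33(Float4x4& dst, const Float4x4& src)
{
    dst.m[2][1] = src.m[0][0]; // dst._32 = src._11
    dst.m[0][1] = src.m[2][2]; // dst._12 = src._33
}
```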
Once again, you cannot mix the namespaces, so the one based indexing cannot be mixed with the zero based indexing in a single swizzle. This is illegal.
someMatrix._32_m12 = someOtherMatrix._11_33;
Also, it does not make sense to set the same element in a matrix to two or more different values. This is also illegal.
someMatrix._32_32 = someOtherMatrix._11_33;
There are other data types, some of which we have seen. The cbuffer is used to hold data that is constant (from the GPU’s perspective). The Texture2D is used to hold the texture for a pixel shader, and the SamplerState is used to hold data governing how a texture is sampled.
You can create structures with a syntax similar to C structures; the struct keyword is followed by the name and then the elements of the structure in curly braces.
struct someStructure
{
    float x;
    float y;
};
Scalar, vector, and matrix data types can all be manipulated using the standard operators (+, -, *, and /). With scalar data these operations perform as expected, but it is important to know that with vector and matrix types, the operators work element by element. This means the standard matrix product is not calculated using the multiplication operator “*”; instead, this operator multiplies corresponding elements from the two source operands. To calculate a standard matrix product, we must use the mul intrinsic. We have seen this when multiplying by the model, world, and projection matrices in our code. For a detailed explanation of matrix multiplication, have a look at the following websites.
From Wolfram:
http://mathworld.wolfram.com/MatrixMultiplication.html
From Wikipedia:
http://en.wikipedia.org/wiki/Matrix_multiplication
From Khan Academy:
The GPU has its own processing unit, memory, and architecture. It even has its own machine code, with a completely different set of instructions from the CPU’s. Many of the machine code instructions the GPU understands are designed specifically to assist in 3-D graphics. It is very efficient at matrix operations and operations on vector data types. Many of the instructions the GPU can perform have no corresponding operator (in the way that + and * exist for addition and multiplication). Instead, these instructions are invoked using intrinsics that resemble regular function calls. These are special functions designed to closely match the machine code of the hardware. Not all GPUs are capable of the same instructions, and each new generation of GPU is usually capable of more instructions than the previous one.
The following is a small reference of some useful intrinsic instructions available in HLSL. The intrinsics are organized in alphabetical order based on the mnemonic or method name. There are around 140 intrinsics available in the HLSL language in total. Many of the intrinsics not mentioned can save a lot of time. For instance, the “lit” intrinsic calculates the lighting coefficient with a single, extremely fast instruction. There are hyperbolic trigonometric functions, as well as arcsine, arccosine, and arctangent. There are also functions that calculate both sine and the cosine at once. In short, once you are familiar with some of these fundamental intrinsics, you might like to look at the complete list and see some of the more advanced capabilities of the GPU. The entire list of intrinsics can be found at Microsoft’s MSDN.
http://msdn.microsoft.com/en-us/library/windows/desktop/ff471376%28v=vs.85%29.aspx
Mnemonic: ret abs(x) Shader Model: 1.1
Parameters: x can be scalar, vector, or matrix; ret will have the same type as x.
Description: Returns the absolute value of x. The absolute value is the non-negative magnitude of the scalar or scalars contained in x. If you use a vector or matrix, the absolute value of each element will be calculated.
Mnemonic: ret ceil(x) Shader Model: 1.1
Parameters: x can be scalar, vector, or matrix; ret will have the same type as x.
Description: This function returns the smallest integer greater than or equal to the value or values of x. Ceiling rounds numbers up to the nearest integer. If x is a vector or matrix, each element will be rounded up to the nearest integer.
Clamp
Mnemonic: ret clamp(x, min, max) Shader Model: 1.1
Parameters: x, min and max can be scalar, vector, or matrix, but they must be the same type; the calculated ret value will have the same type as the inputs.
Description: This function clamps the value or values of the elements in x to between the corresponding values specified by the min and max parameters. Any element in x that is less than the corresponding element in min will be set to the value in min. Any element in x that is greater than the corresponding element in max will be set to the value in max.
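The element-by-element behavior can be sketched on the CPU. The following is a minimal C++ illustration, not HLSL; the Float4 alias and the name clampEach are assumptions, and the sketch assumes every element of lo is less than or equal to the matching element of hi.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

using Float4 = std::array<float, 4>;

// Clamps each element of x between the matching elements of lo and hi.
Float4 clampEach(const Float4& x, const Float4& lo, const Float4& hi)
{
    Float4 ret{};
    for (std::size_t i = 0; i < ret.size(); ++i)
    {
        // Raise values below lo[i], then lower values above hi[i].
        ret[i] = std::min(std::max(x[i], lo[i]), hi[i]);
    }
    return ret;
}
```

For example, clamping { -1, 0.5, 2, 3 } between { 0, 0, 0, 0 } and { 1, 1, 1, 2 } gives { 0, 0.5, 1, 2 }.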
Mnemonic: ret cos(x) Shader Model: 1.1
Parameters: x can be floating point scalar, vector, or matrix; ret will have the same type as x.
Description: This function calculates and returns the cosine of the value or values in x, which are read as radians.
Mnemonic: ret cross(x, y) Shader Model: 1.1
Parameters: x and y must be 3-D float vectors; ret is also a 3-D float vector.
Description: This function calculates and returns the cross product of two 3-D vectors. The return value is a vector that is perpendicular to both the inputs; it returns the normal to a plane that contains the input vectors.
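For reference, the cross product of two 3-D vectors can be written out component by component. This is a minimal C++ sketch of the standard formula, not HLSL; the Float3 alias is an assumption made for the illustration.

```cpp
#include <array>

using Float3 = std::array<float, 3>;

// Standard 3-D cross product, written out per component.
Float3 cross(const Float3& x, const Float3& y)
{
    return {
        x[1] * y[2] - x[2] * y[1],
        x[2] * y[0] - x[0] * y[2],
        x[0] * y[1] - x[1] * y[0]
    };
}
```

With x = { 1, 0, 0 } and y = { 0, 1, 0 }, the result is { 0, 0, 1 }, which is perpendicular to both inputs.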
Mnemonic: ret degrees(x) Shader Model: 1.1
Parameters: x can be scalar, vector, or matrix; ret will have the same type as x.
Description: This function converts the angles in x, which are read as radians, to degrees. If x is a vector or matrix, the conversion is performed on every element of x. This is the same as multiplying the values in x by 180/Pi.
Mnemonic: ret distance(x, y) Shader Model: 1.1
Parameters: x and y are vectors of any size; ret is a scalar float.
Description: This function calculates the distance between the two points x and y. x and y must have the same number of elements.
Mnemonic: ret dot(x, y) Shader Model: 1
Parameters: x and y are vectors of any size; ret is a scalar float.
Description: This function calculates the dot product between the two vectors x and y. This is the sum of the products of each of the corresponding elements in x and y. x can be any vector size, but y must match it.
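The sum of products can be sketched in a few lines. This is a minimal C++ illustration for 3-component vectors, not HLSL; the Float3 alias is an assumption.

```cpp
#include <array>
#include <cstddef>

using Float3 = std::array<float, 3>;

// Sum of the products of corresponding elements.
float dot(const Float3& x, const Float3& y)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        sum += x[i] * y[i];
    }
    return sum;
}
```

For x = { 1, 2, 3 } and y = { 4, 5, 6 }, the result is 1*4 + 2*5 + 3*6 = 32.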
Mnemonic: ret floor(x) Shader Model: 1.1
Parameters: x can be scalar, vector, or matrix; ret will have the same type as x.
Description: This function returns the largest integer less than or equal to the value or values of x. Floor rounds numbers down to the nearest integer. If x is a vector or matrix, each element will be rounded down to the nearest integer.
Mnemonic: ret length(x) Shader Model: 1.1
Parameters: x is a float vector; ret will be a scalar float.
Description: This function calculates the length or magnitude of the vector x.
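The length and distance intrinsics are closely related: the distance between two points is the length of their difference. The following minimal C++ sketch (not HLSL) shows both for 3-component vectors; the Float3 alias and the names lengthOf and distanceBetween are assumptions chosen for this illustration.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using Float3 = std::array<float, 3>;

// Length (magnitude): square root of the sum of the squared elements.
float lengthOf(const Float3& x)
{
    float sumOfSquares = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        sumOfSquares += x[i] * x[i];
    }
    return std::sqrt(sumOfSquares);
}

// Distance between two points: the length of their difference.
float distanceBetween(const Float3& x, const Float3& y)
{
    Float3 difference{};
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        difference[i] = x[i] - y[i];
    }
    return lengthOf(difference);
}
```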
Mnemonic: ret max(x, y) Shader Model: 1.1
Parameters: x, y, and ret must all be the same type; they can be scalar, vector, or matrix.
Description: This function compares the corresponding elements of x and y and returns the larger of each pair.
Mnemonic: ret min(x, y) Shader Model: 1.1
Parameters: x, y, and ret must all be the same type; they can be scalar, vector, or matrix.
Description: This function compares the corresponding elements of x and y and returns the smaller of each pair.
Mnemonic: ret mul(x, y) Shader Model: 1.0
Parameters: There are many overloaded versions of mul for different input types.
Description: This function multiplies matrices, scalars, or vectors. x and y need not be the same data type. Depending on their data types, different operations are performed by the function. For instance, if x is a scalar and y is a matrix, then each component in y will be multiplied by x, and thus the matrix will be scaled.
This function is used to perform a standard matrix multiplication. If both inputs are matrices, then the resulting output will be the standard matrix product of the two inputs. Note that the multiplication operator (*) with two matrices as operands will perform an element by element multiplication; it does not calculate the matrix product.
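The difference between the two operations is easy to see with small matrices. The following minimal C++ sketch (not HLSL) emulates both on 2x2 matrices; the Float2x2 alias and the function names elementWise and matrixProduct are assumptions made for this illustration.

```cpp
#include <array>
#include <cstddef>

using Float2x2 = std::array<std::array<float, 2>, 2>;

// Emulates the HLSL expression "a * b": corresponding elements are multiplied.
Float2x2 elementWise(const Float2x2& a, const Float2x2& b)
{
    Float2x2 ret{};
    for (std::size_t r = 0; r < 2; ++r)
        for (std::size_t c = 0; c < 2; ++c)
            ret[r][c] = a[r][c] * b[r][c];
    return ret;
}

// Emulates mul(a, b) for two matrices: each row of a is combined with each column of b.
Float2x2 matrixProduct(const Float2x2& a, const Float2x2& b)
{
    Float2x2 ret{}; // value-initialized, so every element starts at 0.0f
    for (std::size_t r = 0; r < 2; ++r)
        for (std::size_t c = 0; c < 2; ++c)
            for (std::size_t k = 0; k < 2; ++k)
                ret[r][c] += a[r][k] * b[k][c];
    return ret;
}
```

With a = { 1, 2, 3, 4 } and b = { 5, 6, 7, 8 } (listed row by row), elementWise gives { 5, 12, 21, 32 } while matrixProduct gives { 19, 22, 43, 50 }.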
Mnemonic: ret normalize(x) Shader Model: 1.1
Parameters: x is a vector; ret will have the same size and type as x.
Description: This function calculates the normalized vector of x. This vector has the same direction as the input vector, but it has a length of exactly 1.0. This is very useful, as many algorithms and other functions expect normalized vectors.
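Normalizing amounts to dividing each element by the vector's length. This is a minimal C++ sketch, not HLSL; the Float3 alias is an assumption, and the sketch assumes the input is not the zero vector (a zero length would cause a division by zero).

```cpp
#include <array>
#include <cmath>

using Float3 = std::array<float, 3>;

// Divides each element by the vector's length so the result has length 1.0.
// Assumes x is not the zero vector.
Float3 normalize(const Float3& x)
{
    const float len = std::sqrt(x[0] * x[0] + x[1] * x[1] + x[2] * x[2]);
    return { x[0] / len, x[1] / len, x[2] / len };
}
```

For example, { 3, 0, 4 } has length 5, so its normalized form is approximately { 0.6, 0, 0.8 }.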
Mnemonic: ret pow(x, y) Shader Model: 1.1
Parameters: x and y can be scalars, vectors, or matrices; x, y, and ret must all be the same type and size.
Description: This function raises x to the power of y. If x and y are vectors or matrices, each element of x is raised to the power of the corresponding element of y.
Mnemonic: ret radians(x) Shader Model: 1.0
Parameters: x can be scalar, vector, or matrix; ret will have the same type as x.
Description: This function converts the angles in x, which are read as degrees, to radians. If x is a vector or matrix, the conversion is performed on every element of x. This is the same as multiplying the values in x by Pi/180.
Mnemonic: ret rcp(x) Shader Model: 5.0
Parameters: x can be scalar, vector, or matrix; ret will have the same type as x.
Description: This function calculates an approximation of the reciprocal of the element or elements in x; the reciprocal is 1.0 divided by the elements in the parameter x.
Mnemonic: ret reflect(x, y) Shader Model: 1.0
Parameters: x and y are float vectors of the same size; x is the vector of incidence and y is the normal of the surface that x is striking.
Description: This function calculates and returns the reflection vector given a ray of incidence (the x value) and a surface normal (the y value). It calculates the result from the following formula: ret = x-2*y*dot(x, y).
The y parameter is a surface normal vector and should be normalized; that is, it should have a length of exactly 1.0.
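The formula can be checked with a small CPU-side example. This is a minimal C++ sketch of ret = x-2*y*dot(x, y), not HLSL; the Float3 alias and the helper dot3 are assumptions made for the illustration.

```cpp
#include <array>
#include <cstddef>

using Float3 = std::array<float, 3>;

// Helper: dot product of two 3-component vectors.
float dot3(const Float3& a, const Float3& b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Applies ret = x - 2 * y * dot(x, y), where x is the incident vector
// and y is the (normalized) surface normal.
Float3 reflect(const Float3& x, const Float3& y)
{
    const float d = dot3(x, y);
    Float3 ret{};
    for (std::size_t i = 0; i < ret.size(); ++i)
    {
        ret[i] = x[i] - 2.0f * y[i] * d;
    }
    return ret;
}
```

A ray travelling down and to the right, { 1, -1, 0 }, that strikes a floor with the normal { 0, 1, 0 } is reflected to { 1, 1, 0 }: the component along the normal is flipped and the rest is unchanged.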
Mnemonic: ret refract(x, y, z) Shader Model: 1.1
Parameters: x and y are float vectors of the same size; x is the vector of the entering ray, and y is the normal of the surface the ray is entering. z is a scalar that is the refraction index.
Description: This function calculates and returns the refraction vector given the vector of the entering ray (x), the normal of the surface that the ray is entering (y), and the refraction index (z) of the substance that the ray is entering. For instance, water has a refraction index of about 1.333f, air is around 1.000f, and diamond is around 2.419f.
Mnemonic: ret round(x) Shader Model: 1.1
Parameters: x can be floating scalar, vector, or matrix; ret will have the same type as x.
Description: This function rounds the value or values in x to the nearest integers and returns the resulting values.
Mnemonic: ret rsqrt(x) Shader Model: 1.1
Parameters: x can be floating point scalar, vector, or matrix; ret will have the same type as x.
Description: This function calculates the reciprocal of the square root of the value or values in x. That is, 1.0/sqrt(x).
Saturate
Mnemonic: ret saturate(x) Shader Model: 1.0
Parameters: x can be floating point scalar, vector, or matrix; ret will have the same type as x.
Description: This function saturates and returns the value or values in x. To saturate is to clamp between 0.0 and 1.0. Any values less than 0.0 will become 0.0, and any values greater than 1.0 will become 1.0.
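Saturating is a clamp with fixed bounds. This is a minimal C++ sketch for a scalar input, not HLSL; it only illustrates the arithmetic described above.

```cpp
#include <algorithm>

// Clamps a value to the range 0.0 to 1.0.
float saturate(float x)
{
    return std::min(std::max(x, 0.0f), 1.0f);
}
```

For example, saturate(-0.5f) is 0.0f, saturate(0.25f) is 0.25f, and saturate(3.0f) is 1.0f.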
Mnemonic: ret sin(x) Shader Model: 1.1
Parameters: x can be floating point scalar, vector, or matrix; ret will have the same type as x.
Description: This function calculates and returns the sine of the value or values in x, which are read as radians.
Mnemonic: ret sqrt(x) Shader Model: 1.1
Parameters: x can be floating point scalar, vector, or matrix; ret will have the same type as x.
Description: This function calculates and returns the square root of the value or values in x.
Mnemonic: ret tan(x) Shader Model: 1.1
Parameters: x can be floating point scalar, vector, or matrix; ret will have the same type as x.
Description: This function calculates and returns the tangent of the value or values in x, which are read as radians.
Mnemonic: ret trunc(x) Shader Model: 1.0
Parameters: x can be floating scalar, vector, or matrix; ret will have the same type as x.
Description: This function rounds floating point values to integers by truncating, or chopping off, any digits to the right of the radix point; in other words, it behaves like a float to int cast followed by a cast back to float. It rounds numbers towards 0.0.
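The difference between trunc, floor, and ceil only shows up with negative numbers. The following minimal C++ sketch (not HLSL) uses the C++ standard library functions of the same names, which follow the same rounding rules as described in this reference; the Rounded struct and the name roundAll are assumptions made for the illustration.

```cpp
#include <cmath>

// Holds the result of each rounding rule for one input value.
struct Rounded
{
    float truncated; // rounded towards 0.0
    float floored;   // rounded down
    float ceiled;    // rounded up
};

Rounded roundAll(float x)
{
    return { std::trunc(x), std::floor(x), std::ceil(x) };
}
```

For 2.7f the three results are 2, 2, and 3. For -2.7f they are -2, -3, and -2, so trunc agrees with floor for positive values and with ceil for negative values.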