CHAPTER 6
“Should array indices start at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper consideration.”
Stan Kelly-Bootle
In this chapter we are going to see how to implement custom data structures in CSCS. These data structures are added to the Variable class. In particular, we are going to see how to add multidimensional arrays and Dictionaries.
We’ll start by looking at actions. We call an action a special class, also deriving from the ParserFunction class, that adds the functionality of using the assignment, increment, and decrement operators.
All of the functionality of CSCS that we’ve seen so far revolves around functions—all of the features of the language that we develop have been by using classes deriving from the ParserFunction class.
But what about simple statements, like a = 1, a++, or a +=5? It’s not immediately clear how we can use the ParserFunction class here, so we need to do a bit more of work here.
First, we define an abstract ActionFunction class, deriving from the ParserFunction class, in Code Listing 55. Then we create three concrete implementations of this class. Check out the UML diagram in Figure 4.
Code Listing 55: The abstract ActionFunction class
public abstract class ActionFunction : ParserFunction |
We register the new functions with the parser as follows:
public const string ASSIGNMENT = "=";
public const string INCREMENT = "++";
public const string DECREMENT = "--";
ParserFunction.AddAction(Constants.ASSIGNMENT, new AssignFunction());
ParserFunction.AddAction(Constants.INCREMENT,
new IncrementDecrementFunction());
ParserFunction.AddAction(Constants.DECREMENT,
new IncrementDecrementFunction());
Each time we extract a =, ++, or -- token, its corresponding function will be called. For that to work, we need to introduce actions in the ParserFunction class. Code Listing 56 extends the “virtual” constructor of the ParserFunction class that we previously saw in Code Listing 7.
There are two additions to the constructor: when working with extracted token, we first check whether it’s a registered action, and then, whether it’s an array. We’ll be dealing with arrays in the next section.
Code Listing 56: The ParserFunction “virtual” constructor revisited and Actions
public class ParserFunction { // Global actions - function: new Dictionary<string, ActionFunction>(); // A "virtual" constructor char ch, ref string action) { (ch == Constants.START_ARG || !script.StillValid ())) { // New code: m_impl = GetRegisteredAction(item, ref action); m_impl = GetArrayFunction(item, script, action); // As before... } public static void AddAction(string name, ActionFunction action) public static ActionFunction GetAction(string action) public static ParserFunction GetRegisteredAction(string name, ref string action) { if (actionFunction == null) { ActionFunction; } |
The GetRegisteredAction method in Code Listing 56 is just a wrapper over the GetAction method, which checks if the passed token corresponds to an action that was registered before with the parser. In addition, the GetRegisteredAction method creates a new instance of the object that implements the action.
Why would we need a new instance of an object? Because actions can be called recursively one from another, and if we used the same object, then there could be discrepancies with its class member variables. If we used the same instance of the object for the assignment operator, we could have the following problem. For the expression a = (b = 1), the member variable m_name will be first assigned the value of a, but then m_name will be changed to b (when parsing the expression b=1. See Code Listing 57).
I actually spent some time on this issue (using the same object in the assignment operator), so I hope you won’t step on the same rake.
The same problem can occur in general with any class deriving from the ParserFunction class; that’s why the ParserFunction class has a method called NewInstance, which by default returns the same object. So by default, all ParserFunction use the same instance of an object. Generally, it’s not a problem—even if we have an expression like sin(sin(sin(x))), the same SinFunction object instance will be called three times. But SinFunction class has no internal members that can be modified by calling the Evaluate method, so we are safe when using the default version of the NewInstance method. If you are going to use and modify member variables inside of the Evaluate method, then you should also return a new instance of the object in the NewInstance member, analogous to Code Listing 56.
Code Listing 57 shows an implementation of the assignment operator. There are two cases. If an assignment looks like a = b (if the variable b does not have any array indices), we just register (or, possibly, re-register) the variable named a with the parser, matching it with the value of b. The value of b can be any complex expression. We get it in this statement:
Variable varValue = Utils.GetItem(script);
The Utils.GetItem can recursively call the whole Split-and-Merge algorithm.
Another case is when the assignment looks like a = b[i], where we are dealing with an array element on the right-hand side. We are going to see this case in the next section.
Code Listing 57: The implementation of the AssignFunction class
class AssignFunction : ActionFunction Variable varValue = Utils.GetItem(script); new GetVarFunction(varValue)); ExtendArray(array, arrayIndices, 0, varValue); new GetVarFunction(array)); override public ParserFunction NewInstance() } |
Code Listing 57 doesn’t have a definition of the ExtendArray method. We are going to look at it when we study arrays in the next section.
“Programming without arrays is like swimming without trunks. It works, but for most people, it's ugly.”
Unknown
Arrays in CSCS don’t have to be declared—everything will be deduced from the context. Also, when assigning a value to an array element, the previous elements don’t have to exist. Why should they? Code Listing 3 already contains a class member variable, which we called m_tuple:
private List<Variable> m_tuple;
This is how we internally represent an array. But we haven’t seen yet how to use arrays in our language. Here we are going to explore how to define and use arrays with an unlimited number of dimensions.
First of all, here are two definitions from the Constants class, which tell the parser how to define an index of an array:
public const char START_ARRAY = '[';
public const char END_ARRAY = ']';
With these definitions, we define how to access an array element: a[0][1] accesses the first element of the variable a, which must be of type ARRAY, and then the second element of the variable a[0], which also must be of type ARRAY. (We decided to start array indices at 0, not 1, nor Stan Kelly-Bottle’s compromise of 0.5. Besides, CSCS is based on C#, so starting at zero is best).
There are a few places where we need to do some changes. One of them is the GetVarFunction class, which we saw in Code Listing 48. Code Listing 58 shows a more complete version of the GetVarFunction class, now also for the array elements.
Code Listing 58: The implementation of the GetVarFunction class
class GetVarFunction : ParserFunction public List<Variable> Indices { |
Code Listing 58Error! Reference source not found. uses an auxiliary Utils.GetArrayIndices method, which is shown in Code Listing 59. It works as follows: if the argument varName is, for instance, a[0][1], then it will return a list of two variables (one for each index); it will also set argument varName to string a (the name of the array), and the argument end to 6 (the index of the last closing square bracket).
An empty list will be returned if the varName argument isn’t an array element. Note that the Split-and-Merge algorithm can be called recursively when an index of an array is an expression itself.
Code Listing 59: The implementation of the Utils.GetArrayIndices method
public static List<Variable> GetArrayIndices(ref string varName, ref int end) { |
The argument end in the GetArraIndices method will be assigned to the m_delta member variable of the GetVarFunction class. We need delta in order to know the length of the variable, so we can skip that many characters, once the assignment is done.
The ParserFunction virtual constructor in Code Listing 56Error! Reference source not found. also calls the GetArrayFunction method. The implementation of this method is shown in Code Listing 60. This method checks if the passed argument name is an array element, and, if it is, returns the corresponding GetVarFunction object. Remember that the GetVarFunction class is a wrapper over a CSCS variable, and it’s used whenever we register a local or a global variable with the parser. We are going to see more examples of using this class later in this chapter.
Code Listing 60: ParserFunction.GetArrayFunction method
public static ParserFunction GetArrayFunction(string name, ParsingScript script, string action) { List<Variable> arrayIndices = Utils.GetArrayIndices(ref arrayName, ref delta); |
When assigning values to an element of an array in the AssignFunction.Evaluate method (see Code Listing 57), the ExtendArray method is invoked. It’s a crucial method for dealing with multidimensional arrays; check it out in Code Listing 61. This method may call itself recursively and permits creating a new array with any number of dimensions on the fly!
Code Listing 61: The implementation of the AssignFunction.ExtendArray method
public static void ExtendArray(Variable parent, { Variable index = arrayIndices[indexPtr]; static int ExtendArrayHelper(Variable parent, Variable indexVar) int arrayIndex = parent.GetArrayIndex(indexVar); |
For example, if you have an expression in CSCS:
x[3][7][4] = 10;
Note: A three-dimensional array will be created on the fly (even if you never mentioned the variable x before in the CSCS code).
The ExtendArray method would be invoked recursively three times for this example, each time for each of the dimensions. The ExtendArrayHelper method will initialize all the unused array elements to an EmptyVariable (with variable type NONE).
For example, the array element x[1][1][1] will be used as an empty string, if used in a string manipulation, or as a number 0, if used in a numerical calculation. The ExtendArrayHelper method also invokes the Variable.GetArrayIndex and SetHashVariable methods. We’ll show them later on, when learning how to implement dictionaries in CSCS (see Code Listing 66).
Another thing you could do is to use the array indexes as the hash keys to a dictionary—even though the access to an element of a dictionary is a bit slower, you could save quite a bit on memory usage, especially if you do assignments like a[100]=x, and never access elements of the array having indices less than 100. We’ll discuss dictionaries at the end of this chapter.
So one can start using an array even without declaring it. Note that the array elements don’t have to be of the same type. It’s perfectly legal in CSCS to have statements like:
x[3][7][5] = “blah”;
If you think this treatment of arrays is way too bold, you can add your restrictions in the following AssignFunction methods: Evaluate, ExtendArray, and ExtendArrayHelper, shown in Code Listing 57 and Code Listing 61.
The IncrementDecrementFunction.Evaluate method shown in Code Listing 62 implements both the prefix and the postfix operators (each one using either the ++ or the -- form).
In case of a prefix form (when the operator looks like ++a), we have just collected the ++ token, so we need to collect the variable name next. On the other hand, in the postfix form (when the operator looks like a++), we already know the variable name that has been assigned to the m_name member variable. This happened in the ParserFunction.GetRegisteredAction function (which was invoked from the ParserFunction virtual constructor in Code Listing 56).
Code Listing 62: The implementation of the IncrementDecrementFunction.Evaluate method
protected override Variable Evaluate(ParsingScript script) if (prefix) { // If it is a prefix we do not have the variable name yet. // Value to be added to the variable: script.TryCurrent () == Constants.START_ARRAY) { Variable element = Utils.ExtractArrayElement(currentValue, arrayIndices); else { // A normal variable. |
By compound assigning operators, we mean all possible variants of a += b, a -= b, a *= b, and so on. Code Listing 63 implements the OperatorAssignFunction class. The OperatorAssignFunction.Evaluate method will be invoked as soon as we get any of +=, -=, *=, etc. tokens. Therefore, we must register each of these tokens with the parser:
for (int i = 0; i < Constants.OPER_ACTIONS.Length; i++) {
ParserFunction.AddAction(Constants.OPER_ACTIONS[i],
new OperatorAssignFunction());
}
Where the operator actions string array is defined in the Constants class as:
public static string[] OPER_ACTIONS = { "+=", "-=", "*=", "/=",
"%=", "&=", "|=", "^="};
The implementation of all these actions is in the OperatorAssignFunction.NumberOperator method in Code Listing 63. The StringOperator contains just the implementation of the += operator since the rest of the operators did not make much sense to me, but you can add as many fancy operators as you like.
Code Listing 63: The implementation of the OperatorAssignFunction class
class OperatorAssignFunction : ActionFunction if (arrayIndices.Count > 0) { // array element static void StringOperator(Variable valueA, override public ParserFunction NewInstance() |
Let’s see how to implement the for control flow statement. We are going to look at it here, since besides the canonical loop for (initStep; loopCondition; loopUpdate), we’re also going to implement a simpler for loop for arrays, already present in many languages: for (item : array). Code Listing 64 implements both for loops.
Code Listing 64: The implementation of the Internal.ProcessFor method
internal Variable ProcessFor(ParsingScript script) Constants.END_ARG); script.Forward(); void ProcessArrayFor(ParsingScript script, string forString) string varName = forString.Substring (0, index); int cycles = varValue.TotalElements(); void ProcessCanonicalFor(ParsingScript script, string forString) "Expecting: for(init; condition; loopStatement)"); int startForCondition = script.Pointer; Constants.END_STATEMENT); Constants.END_STATEMENT); Constants.END_STATEMENT); while(stillValid) { if (MAX_LOOPS > 0 && ++cycles >= MAX_LOOPS) { script.Pointer = startForCondition; if (result.IsReturn || result.Type == Variable.VarType.BREAK) { loopScript.Execute(); |
The declaration of the for loop and registering it with the parser is very similar to what we did in “Implementing the while ” section in Chapter 3 Basic Control Flow Statements.
Code Listing 65 shows examples of using the for loops in CSCS. It also has a degraded version of a for loop, where all of the arguments are empty.
Code Listing 65: Different for loops, implemented in CSCS
for (i = 10; i >= 0; i--) { i = 10; arr[i] = 2 * i; for (item : arr) { // Output: |
Recall that a dictionary is a data structure that maps a key to a value. A hash table is a particular case of a dictionary, where each key is mapped to a unique value using hash codes. We’ll use the C# dictionary data structure that is implemented as a hash table. We’ll use strings as keys, and delegate the work of converting a string to a hash code to C#. So basically, our dictionary will be a thin wrapper over the C# dictionary. We are going to reuse the same data structure we used for arrays to store the dictionary values. The dictionary will just map the key to the array index where the value is stored. Code Listing 66 shows how to do this.
Code Listing 66: The implementation of the dictionary in the Variable class
private Dictionary<string, int> m_dictionary = new Dictionary<string, int>(); public int SetHashVariable(string hash, Variable var) public int GetArrayIndex(Variable indexVar) if (indexVar.Type == VarType.NUMBER) { if (m_dictionary.TryGetValue(hash, out ptr) && return -1; |
It defines a dictionary m_dictionary with the key being a string and the value being an integer, which is a pointer to the index of the List<Variable> m_tuple data structure for storing the array elements that we defined earlier in Code Listing 3. So everything is stored in the m_tuple list. m_dictionary is just an auxiliary data structure to keep track of where a particular variable is stored in m_tuple.
If indexVar is a number passed to GetArrayIndex, it’s treated as an index in m_tuple, and the corresponding value is returned. When indexVar is a string, we assume it’s a key to m_dictionary and get its corresponding value first. Then we use that value as an index to access m_tuple. The method SetHashValue does the opposite of what GetArrayIndex does. Both methods were already used within the ExtendArrayHelper method in Code Listing 61.
Using this implementation, we can combine string keys and integer indices in the same data structure in CSCS, for example:
a["bla"][1] = 100;
Note that we can use string keys and integer indices interchangeably, even on the same dimension in an array. So, for example, it’s also legal to declare:
b["bla"] = 1; b[5] = 2;
Isn’t that cool?
Dynamic programming is a special software engineering technique that consists in splitting a large problem into smaller pieces, solving each small piece and storing its result somewhere to be reused later on. As an example of applying this technique in CSCS, let’s see the implementation of the Fibonacci numbers. Recall that each Fibonacci number is equal to the sum of the two previous Fibonacci numbers (the first two Fibonacci numbers are defined as 1). The recursive calculation of the Fibonacci numbers in CSCS is shown in Code Listing 67.
Code Listing 67: The implementation of the Fibonacci numbers in CSCS
cache["fib"] = 0; return result; a=fibonacci(10); print(a); |
We use the cache variable as a dictionary on the first dimension, and as an array on the second dimension. It stores intermediate results in memory for later use. Note that without using the cache data structure, the calculation would’ve taken much longer, because for each number n we split calculation in two: fibonacci (n – 2) and fibonacci (n - 1), each of them using same previous Fibonacci numbers. But if we store all of the intermediate results, those values should be already in cache, so we do not have to recursively calculate each of them.
In this chapter we saw how to implement some of the data structures in CSCS. Partularly, we saw the implementation of arrays with an unlimited number of dimensions, and how to combine an array and a dictionary in the same data structure.
In the next chapter we are going to see how to write a CSCS script in any human language.