left-icon

ScriptDOM Succinctly®
by Joseph D. Booth

Previous
Chapter

of
A
A
A

CHAPTER 3

Tokens


The abstract syntax tree returned by the parser call contains a list of statements from the SQL code. In this chapter, we will explore the script and statement objects and the information they contain.

TSqlScript class

The tree object returned from the Parse method is a type TSqlScript. The following properties are defined for the class:

Batches property

If more than one script appears (separated by the GO command), each script will have its own TSqlBatch object. Table 1 shows the properties of the TSqlBatch class.

Table 1: TSqlBatch class

Property

Type

Description

FirstTokenIndex

Int

First element # in the list of tokens found in the script

FragmentLength

Int

Length of the text from the script

LastTokenIndex

Int

The last element # in the list of tokens

ScriptTokenStream

List

Collection of parser tokens

StartColumn

Int

Column position in the source file

StartLine

Int

Line in the source file

StartOffset

Int

Offset (byte #) in the original source file

Statements

List

List of statement types (Create procedure, select, etc.)

The Statements list will contain a variety of different statements. This will be a nested tree of the type of SQL statements it finds. We will explore this in detail in Chapter 4.

Position properties

The StartOffset and FragmentLength integer properties provide the offset and size in the script of the specific element. The StartLine and StartColumn integer properties provide similar information for the line and column number in the script.

Token properties

The FirstTokenIndex and LastTokenIndex integer properties provide the starting and ending element numbers in the ScriptTokenStream collection. The ScriptTokenStream property is a list of TSqlParserToken objects derived from the script.

TSqlParserToken class

The ScriptTokenStream list contains a collection of all parser tokens found in the script. Each element is a TSqlParserToken object, which contains the properties listed in Table 2.

Table 2: TSqlParserToken properties

Property

Type

Description

Column

Int

Column position in the source file

Line

Int

Line in the source file

Offset

Int

Offset (byte #) in the original source file

Text

String

Text content of the token

TokenType

TSql Token Type

Enumerated token type; for example:

·     Single line comment

·     White space, text contains space, CR, LF, etc.

·     Identifier (schema, procedure name, etc.)

·     Variable: a variable or parameter

·     Declare: declare variable statement

·     Various statements and fragments:

o     SELECT, FROM, WHERE, etc.

The list of tokens is arranged as found in the script. There is no implied relationship between the list; it is simply a sequential list of the tokens from the code. You can see the complete list of token types at TSqlTokenType Enum.

Using tokens

Although you could process the token list for individual code snippets, the statements and visitor pattern (see later chapters) are better options for processing code statements. Tokens can be used for aggregate data about the code, such as the number of variables or the percentage of comments.

God procedures

A God procedure (or code) is a routine that is generally doing too much. In programming, it is usually better to write small code modules rather than a single monolithic code component. Smaller modules are easier to debug and much more likely to be reusable. We might want to see a couple of things to check for to determine if our procedure might be too big. How big is it? How many tables?

Listing 12 shows a sample stored procedure to compute payroll. We are going to use the token list to look for certain parameters that might suggest a code review is needed.

Code Listing 12: Compute payroll

        --  Chap 3 Sample procedure

        /*

        Purpose: Compute payroll amount, update employee records,

                 email data file to vendor, process result from vendor

           Date: 02-FEB-2021

        */

        CREATE PROCEDURE dbo.ComputeEntirePayroll(@PayrollDate date)

        AS

        BEGIN

             DECLARE @15DaysAgo Date = DATEADD(day,-15,@payrollDate )

             DECLARE @PayAmount MONEY;

       

             SELECT em.Id,em.HourlyRate*ts.HoursWorked as PayAmount,em.SSN

             INTO #tmpPayroll

             FROM employeeMaster em

             JOIN timesheets  ts on ts.employeeId =  em.id

             WHERE ts.workdate between @15DaysAgo and @payrollDate

       

             -- Flag timesheet as processed

             UPDATE timesheets SET payDate =@payrollDate

             WHERE workdate between @15DaysAgo and @payrollDate

       

             -- Update employee and send payroll data to vendor

            DECLARE payCursor CURSOR

               FOR SELECT * FROM #tmpPayroll;

       

            OPEN payCursor

            -- Code to use cursor...

            CLOSE payCursor

            DEALLOCATE payCursor

        END

You could visually look at the code and probably decide whether to review it or not, but with a large number of procedures, it could be helpful to have some automated ways to suggest which ones to review.

Tip: For a quick test, see if you can determine the bug that causes people to be overpaid. See Chapter 18 if you want the answer.

Analyzing the code

We are going to take our tree (returned from the Parse method) and compute some summary values about the code.

Code Listing 13: Create the token list

        // Ask ScriptDOM to parse it

        TSqlScript tree = (TSqlScript)parser.Parse(rdr, out errors);

Once we have the tree object, we can use LINQ (Language Integrated Query) to collect some statistics about the code. Listing 14 illustrates some queries.

Code Listing 14: LINQ to get info

// Gather information about the code

var sm = tree.ScriptTokenStream;   

int numberComments = sm.Count((cc) =>

                     cc.TokenType ==  TSqlTokenType.SingleLineComment ||

                     cc.TokenType == TSqlTokenType.MultilineComment);

int numberSelect = sm.Count((cc) => cc.TokenType == TSqlTokenType.Select);

bool anyCursors = sm.Any((cc) => cc.TokenType == TSqlTokenType.Cursor);

// Since the table is an identifier (which can be multiple items),

// we simply count FROM and JOIN to get table references)

int numberTables = sm.Count((cc) =>

                   cc.TokenType == TSqlTokenType.From ||

                   cc.TokenType == TSqlTokenType.Join);

// Compute percent of comments over entire script length

double commentsRatio = sm.Where((cc) =>

cc.TokenType == TSqlTokenType.SingleLineComment ||

cc.TokenType == TSqlTokenType.MultilineComment).Sum((cc) => cc.Text.Length) * 1.0 / tree.FragmentLength;

// Number of statements that manipulate data

int numberUpdateStatements = sm.Count((cc) =>

cc.TokenType == TSqlTokenType.Insert ||

                   cc.TokenType == TSqlTokenType.Delete ||

                   cc.TokenType == TSqlTokenType.Update); 

Note: You can download the LINQ Succinctly ebook by Jason Roberts from the Syncfusion website if you want to learn more about LINQ.

With these values, we can now make a quick decision about whether we should review the code. Figure 5 shows sample console output from the analysis of the stored procedure.

Stored proc analysis

Figure 5: Stored proc analysis

You can use the various values to determine if the procedure might be too large or needs a review. In this example, issues like cursors, multiple select statements, and low comment percentage might suggest a code review.

Summary

The Parse method provides both a sequential list of tokens (Script Token Stream) and a list of statements (Statements property in the batches). Tokens in a list collection can be very useful for getting basic information about the code, such as how big it is and how many comments it has.

You can write code to explore the statements collection as part of any script analysis (see Chapter 4), and the visitor pattern (Chapter 5), which makes processing individual statements easier.

Scroll To Top
Disclaimer

DISCLAIMER: Web reader is currently in beta. Please report any issues through our support system. PDF and Kindle format files are also available for download.

Previous

Next



You are one step away from downloading ebooks from the Succinctly® series premier collection!
A confirmation has been sent to your email address. Please check and confirm your email subscription to complete the download.