CHAPTER 5
Although we covered the information schema views back in Chapter 2, SQL Server provides additional views and procedures we can use to view our tables and columns in more depth. In this chapter, we will look at the sys.tables and various sys column views to provide some table and column analysis.
Note: Many of these views are derived from sys.objects. Every “object” in a SQL database (tables, procedures, triggers, etc.) has an object ID associated with it, and sys objects hold the information about the object. The OBJECT_NAME() SQL function returns an object name from the object_id parameter.
We can join the sys.tables view with other views to determine information about our table design. This allows us to perform additional table analysis beyond the basic information schema views. When referencing the sys.tables view, we can use the object_schema_name and object_name SQL functions to create a table name (schema + table name).
This table holds information about tables that have identity columns present.
Code Listing 27: Identity columns
--------------------------------------------- -- Script: Identity_columns.sql -- List all identity columns --------------------------------------------- SELECT OBJECT_SCHEMA_NAME(st.object_id)+'.'+st.name AS [TableName], ic.name AS KeyName, t.name AS dataType, ic.seed_value,ic.increment_value, isNull(ic.last_value,0) AS Last_Value FROM sys.tables st JOIN sys.identity_columns ic ON ic.object_id=st.object_id JOIN sys.types t ON t.system_type_id=ic.system_type_id ORDER BY [TableName] |
This listing shows tables that have an Identity column, along with the seed value and increment for the column. If the table has rows, the Last_value column will report the last seed number.
Tip: You can use the DBCC CHECKIDENT ('[TableName]', RESEED, 0) command to reset an identity column’s value.
This view allows you to get the definitions for any computed columns in the database. The definition will be a SQL expression that is used to provide the column value.
Code Listing 28: Computed columns
--------------------------------------------- -- Script: Computed_Columns.sql -- List all computed columns --------------------------------------------- SELECT OBJECT_SCHEMA_NAME(st.object_id)+'.'+st.name AS [TableName], cc.name AS KeyName, t.name AS dataType, cc.definition AS [Column_Definition],cc.is_persisted FROM sys.tables st JOIN sys.computed_columns cc on cc.object_id=st.object_id JOIN sys.types t on t.system_type_id=cc.system_type_id ORDER BY [TableName] |
Note: If a computed column is persisted, the value will be stored on disk, and can be used for indexing, checking constraints, etc. It will be updated when the data is updated. If it is not persisted, the value will be virtual and will be computed every time the column is referenced.
A database table may have a default value to provide during an INSERT when the corresponding field is NULL. The following script lists all the default constraints and the table and columns they are found in.
Code Listing 29: Default constraints
--------------------------------------------- -- Script: Default_Constraints.sql -- List all default constraints --------------------------------------------- SELECT object_schema_name(dc.parent_object_id)+'.'+ object_name(dc.parent_object_id) as TableName, c.name as ColumnName, dc.definition FROM sys.default_constraints AS dc INNER JOIN sys.columns AS c ON dc.parent_object_id = c.object_id AND dc.parent_column_id = c.column_id ORDER BY tableName,columnName |
Note: If a column has a default of NewID() to generate a new UniqueIdentifier and this column is part of an index, you should consider using NewSequentialId() instead. Because NewID() generates a random unique identifier, it is likely to cause an index split, as it attempts to insert the record into the index. (The NewSequentialID() function generates a unique identifier higher than the prior one.) So while you can improve creation performance with NewSequentialID, it is also a privacy risk to use sequential (hence predictable) identifiers.
This script will identify all the columns that are used as indexes in the various tables. This can be a handy way to determine whether a new search you want to add is already indexed in the table (improving performance).
Code Listing 30: Index columns
--------------------------------------------- -- Script: Index_columns.sql -- List all indexed columns --------------------------------------------- SELECT st.name as TableName ,i.name as IndexName ,COL_NAME(ic.object_id,ic.column_id) AS ColumnName FROM sys.indexes AS i INNER JOIN sys.index_columns AS ic ON i.object_id = ic.object_id AND i.index_id = ic.index_id JOIN sys.tables st on st.object_id=ic.object_id ORDER BY [TableName],ic.index_column_id,ColumnName |
You can add the WHERE expression ic.index_column = 1 to find only those columns that are the first expression in the index. You could also specify all columns you need to retrieve to see whether there is a covering index for your search parameters.
This script is used to identify the columns that are being used as primary keys in a table. It shows the table and index name, as well as the column(s) making up the key.
Code Listing 31: Key constraints
--------------------------------------------- -- Script: Key_constraints.sql -- List all key constraint columns --------------------------------------------- select object_schema_name(tb.object_id)+'.'+tb.name as [TableName], object_name(kc.object_id) as IndexName, sc.name as ColumnName from sys.tables tb join sys.key_constraints kc on kc.parent_object_id=tb.object_id join sys.index_columns ic on ic.object_id=kc.parent_object_id and kc.unique_index_id=ic.index_id join sys.columns sc on sc.object_id=ic.object_id and ic.index_column_id=sc.column_id where tb.type='U' and kc.type='PK' order by tableName,ic.index_column_id |
A check constraint is a SQL expression that is applied to a column to validate the type of data allowed in that column. For example, you might use the following LIKE expression to ensure a zip code field only contains five digits.
zip LIKE '[0-9][0-9][0-9][0-9][0-9]'
Code Listing 32 will list all columns in the database that have check constraints applied to them.
Code Listing 32: Check constraints
--------------------------------------------- -- Script: Check_constraints.sql -- List all check constraints --------------------------------------------- SELECT object_schema_name(dc.parent_object_id)+'.'+ object_name(dc.parent_object_id) as TableName, c.name as ColumnName, dc.definition FROM sys.check_constraints AS dc INNER JOIN sys.columns AS c ON dc.parent_object_id = c.object_id AND dc.parent_column_id = c.column_id ORDER BY tableName,columnName |
In SQL 2016, a new feature called Dynamic Data Masking was added, which provides the ability to apply masked to a column, so users querying the data will not see the actual column contents. To create a masked column, you add the MASKED WITH (expression) to the table column. For example, to make an email column, you could use the following.
Email VARCHAR(150) MASKED WITH (Function = ‘email()’)
When the Email column appears in a query, it will be displayed as [email protected]. It is a handy feature for simple security in a database. You can identify all the masked columns in a database using the following query.
Code Listing 33: Masked columns
--------------------------------------------- -- Script: Masked_Columns.sql -- List all masked columns --------------------------------------------- if SERVERPROPERTY('ProductMajorVersion')>='13' SELECT object_schema_name(tb.object_id)+'.'+tb.name as [TableName], c.name AS column_name, c.masking_function FROM sys.masked_columns AS c JOIN sys.tables AS tb ON c.object_id = tb.object_id WHERE is_masked = 1 ELSE SELECT 'Requires SQL 2016 or higher' |
You can combine the various queries to produce a report of tables and columns in your system, along with column information. Code Listing 34 shows a table/column reporting script.
Code Listing 34: Table/column detail script
--------------------------------------------- -- Script: Column_Report.sql -- List column details --------------------------------------------- SELECT object_schema_name(tb.object_id)+'.'+ object_name(tb.object_id) as TableName, c.name as ColumnName, isNull(pk.PK,'') as IsKey, isNull(ic.IdentityColumn,'') as Identity_Column, isNull(cc.CheckConstraint,'') as Check_Constraint, isNull(dc.DefaultConstraint,'') as Default_Constraint FROM sys.tables as tb JOIN sys.columns AS c on c.object_id=tb.object_id LEFT JOIN ( select ic.object_id,ic.index_column_id,'PRIMARY' as PK from sys.tables tb join sys.key_constraints kc on kc.parent_object_id=tb.object_id join sys.index_columns ic on ic.object_id=kc.parent_object_id and kc.unique_index_id=ic.index_id where tb.type='U' and kc.type='PK' ) pk on pk.object_id=tb.object_id and pk.index_column_id=c.column_id LEFT JOIN ( SELECT ic.object_id,ic.name,ic.name+' identity('+ cast(ic.seed_value as varchar(10))+','+ cast(ic.increment_value as varchar(10))+') ' as IdentityColumn from sys.tables st join sys.identity_columns ic on ic.object_id=st.object_id ) ic ON tb.object_id=ic.object_id and c.name=ic.name LEFT JOIN ( SELECT dc.parent_object_id,dc.parent_column_id, dc.definition as CheckConstraint FROM sys.check_constraints AS dc INNER JOIN sys.columns AS c ON dc.parent_object_id = c.object_id AND dc.parent_column_id = c.column_id ) cc on cc.parent_object_id=tb.object_id and cc.parent_column_id=c.column_id LEFT JOIN ( SELECT dc.parent_object_id,dc.parent_column_id, dc.definition as DefaultConstraint FROM sys.default_constraints AS dc INNER JOIN sys.columns AS c ON dc.parent_object_id = c.object_id AND dc.parent_column_id = c.column_id ) dc on dc.parent_object_id=tb.object_id and dc.parent_column_id=c.column_id ORDER BY tableName,column_id |
When this script is run, it will produce a report of all table names and columns, and indicate which columns are primary keys, identity columns, constraints, and so on.
In early versions of SQL, there were text and image columns called text, nText, and image. These columns (while still supported) were deprecated in SQL Server 2005. The following script allows you to search for deprecated column types and indicates the appropriate replacement column type.
Code Listing 35: Deprecated columns
--------------------------------------------- -- Script: Deprecated_columns.sql -- List all columns with deprecated types --------------------------------------------- select t.name,c.name as ColName, 'Deprecated: '+ CASE WHEN tp.name = 'text' then 'Replace [text] with varchar(max)' WHEN tp.name = 'ntext' then 'Replace [ntext] with nvarchar(max)' WHEN tp.name = 'image' then 'Replace [image] with varbinary(max)' ELSE 'Table contains Text,nText, or Image fields' END as Msg from sys.columns c join sys.tables t on c.object_id=t.object_id join sys.types tp on tp.user_type_id=c.user_type_id where t.is_ms_shipped=0 and tp.name in ('text','ntext','image') |
If you are using any of these column data types, you should plan on changing the data type to keep current with SQL Server.
SQL Server and most database servers perform much better with integer values, rather than a numeric data type. If a decimal or numeric column has a scale of 0 (no decimal place), you should consider replacing that column with the equivalent integer column. Code Listing 36 will search for any numeric columns with a zero scale and suggest the equivalent integer column type.
Code Listing 36: Suggest integer columns
--------------------------------------------- -- Script: Suggest_integers.sql -- Convert numeric columns to integers --------------------------------------------- SELECT tb.table_schema, tb.table_name, tc.column_name as colname, CASE WHEN numeric_precision <= 2 THEN 'Convert '+tc.Data_type+'('+ CAST(numeric_precision as varchar)+',0) to tinyint data type' WHEN numeric_precision <= 4 THEN 'Convert '+tc.Data_type+ '('+CAST(numeric_precision as varchar)+',0) to smallint data type' WHEN numeric_precision <= 9 THEN 'Convert '+tc.Data_type+ '('+CAST(numeric_precision as varchar)+',0) to int data type' WHEN numeric_precision <= 18 THEN 'Convert '+tc.Data_type+ '('+CAST(numeric_precision as varchar)+',0) to bigint data type' ELSE 'Consider using an integer data type' END as Msg FROM INFORMATION_SCHEMA.columns tc JOIN INFORMATION_SCHEMA.tables tb ON tb.table_name = tc.table_name and tb.table_schema = tc.table_schema WHERE tb.Table_Type='BASE TABLE' AND tc.data_type IN('numeric','decimal') AND tc.numeric_scale = 0 AND tc.numeric_precision <= 18 ORDER BY tb.table_schema,tb.table_name,tc.column_name |
The float and real column types in a database are approximations, rather than exact values. Generally, graphic applications use floats for smaller storage requirement, and can accept the loss of precision. So, while a float or real data type might be necessary, you should review your usage to make sure it is necessary.
Code Listing 37 searches for float and real columns in your database tables.
Code Listing 37: Real and float columns
--------------------------------------------- -- Script: SearchFloatColumns.sql -- Identify float and real columns --------------------------------------------- SELECT schema_name(o.schema_id) AS SchemaName, o.name AS TableName,c.name AS columnName, t.Name AS ColumnType FROM sys.all_columns c JOIN sys.objects o ON c.object_id=o.object_id JOIN sys.types t ON t.user_type_id=c.user_type_id WHERE t.name IN ('float','real') AND o.type='U' |
You can read this article to see if floating point arithmetic is necessary for your application.
There are often columns in a database table that hold standard information (such as phone numbers, email addresses, and state codes). Sometimes, these columns have unexpected sizes. (For example, one system used a varchar(max) to store phone numbers).
Code Listing 38 shows all columns using varchar or nvarchar max, even though the column name suggests it is not needed. You should adjust the list of searched column names for more common suggestions, based on your knowledge of your application.
Code Listing 38: Max columns check
--------------------------------------------- -- Script: VarChar_Max_check.sql -- Maximum columns that might not be needed --------------------------------------------- SELECT schema_name(o.schema_id) AS SchemaName, o.name AS TableName,c.name AS columnName, t.Name+'(max)' AS ColumnType FROM sys.all_columns c JOIN sys.objects o on c.object_id=o.object_id JOIN sys.types t on t.user_type_id=c.user_type_id WHERE t.name like '%varchar%' and c.max_length < 0 AND o.schema_id <> 4 AND (c.name LIKE '%phone%' or c.name LIKE '%address%') |
Another scenario seen in databases is columns that are holding dates, but not using a date column type. Code Listing 39 searches for such a column type.
Code Listing 39: Date columns using other data types
------------------------------------------------------- -- Script: Date_columns_check.sql -- Date values possibly stored in nondate columns ------------------------------------------------------- SELECT schema_name(o.schema_id) AS SchemaName, o.name AS TableName,c.name AS columnName, t.Name AS ColumnType FROM sys.all_columns c JOIN sys.objects o on c.object_id=o.object_id JOIN sys.types t on t.user_type_id=c.user_type_id WHERE t.name not like '%date%' AND (c.name LIKE '%date%') AND o.schema_id <> 4 ORDER BY SchemaName,TableName,columnName |
Note that in both these queries, we are filtering out schema_id 4 (the sys schema).
You can use the various views to optimize your columns, hopefully identifying problematic columns and, where possible, simplifying the data types. SQL Server is a powerful tool, and by giving it the best column types and size, you can improve database integrity and performance.