CHAPTER 6
SQL Server is a dynamic system that is constantly running queries, scheduled jobs, and system maintenance. In this chapter, we are going to look at views and functions that allow us a peek into some of the processes and work happening on the server.
The view sys.sysprocesses provides a list of all connections currently open on the server. Table 29 lists some of the columns you can use to query this view.
Table 29: Sys Processes
| Column name | Description |
|---|---|
| spid | SQL Server session ID. |
| kpid | Windows thread ID. |
| blocked | spid of the session blocking this process. |
| waittime | How long the process has been waiting (milliseconds), or 0. |
| lastwaittype | String description of the last wait encountered. |
| dbid | Database ID (use db_name() to see the name). |
| cpu | Cumulative CPU usage time for this process. |
| physical_io | Cumulative disk reads/writes. |
| memusage | Current number of memory pages allocated to the process. |
| login_time | When this process logged in. |
| last_batch | Last time the process ran a statement. |
| open_tran | Current number of open transactions for this process. |
| status | String description of the current status: Running, Background, Runnable, or Sleeping. |
| hostname | Name of the workstation. |
| program_name | Name of the application. |
| cmd | Type of command being executed (SELECT, DELETE, etc.). |
| nt_domain | Windows domain, if using Windows authentication. |
| nt_username | Username, if using Windows authentication or a trusted connection. |
| loginame | User's login name. |
| sql_handle | Handle to the currently executing batch. |
| stmt_start | Starting byte offset of the current statement within the batch. |
| stmt_end | Ending byte offset of the current statement (-1 means to the end of the batch). |
The information in this view lets you determine exactly what the server is doing, and who is doing it. Some example usages appear in the next few queries. Note that dbids 1 through 4 are system databases, so activity in those databases is typically SQL Server's own work. Database ID number 2 is tempdb, which might be worth checking out if you hit performance issues.
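As a quick illustration of the view (a sketch using only columns from Table 29), the following counts current sessions per database:

```sql
--------------------------------------------------------
-- Sessions per database; system databases (dbid 1-4)
-- will typically show SQL Server's own background work.
--------------------------------------------------------
SELECT db_name(dbid) AS DatabaseName, COUNT(*) AS Sessions
FROM sys.sysprocesses
GROUP BY db_name(dbid)
ORDER BY Sessions DESC;
```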
SQL Server Management Studio (SSMS) allows users to run queries, updates, etc., in a database. Typically, developers and database administrators will be using this tool. Any other users might be worth reviewing.
Code Listing 40: Who is using SSMS?
--------------------------------------------------------
-- Script: Find_SSMS.sql
-- Find users running SQL Server Management Studio
--------------------------------------------------------
SELECT loginame, login_time, cmd
FROM sys.sysprocesses
WHERE dbid > 4
  AND program_name LIKE '%SQL Server Man%'
ORDER BY loginame
Note: We once had a user who didn’t know the difference between NULL and “NULL”, and set all of a particular field to the string “NULL”. It took a bit of digging to realize she had SSMS installed and had run the query that invalidated all the records. (She now has read-only access.)
Similarly, you can identify .NET applications by looking for a program name like ‘.Net%’.
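For example, a small variation on Code Listing 40 finds sessions opened by .NET applications:

```sql
-- Find connections whose program name starts with .Net
SELECT loginame, login_time, program_name
FROM sys.sysprocesses
WHERE dbid > 4
  AND program_name LIKE '.Net%'
ORDER BY loginame;
```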
You can see who might be blocking other processes using the code in Code Listing 41.
Code Listing 41: Who is blocking?
--------------------------------------------------------
-- Script: WhoIsBlocking.sql
-- Report users blocking other users
--------------------------------------------------------
SELECT 'Process ' + STR(sp.spid) + ', user ' + sp.loginame +
       ' is being blocked by ' + STR(bl.spid) +
       ', user ' + bl.loginame AS BlockedMsg
FROM sys.sysprocesses sp
JOIN sys.sysprocesses bl ON sp.blocked = bl.spid
WHERE sp.dbid > 4
  AND sp.blocked <> 0
If a user has a transaction open, the tables impacted within that transaction will block other update operations (and possibly select statements, depending on isolation level). You can use the code in Code Listing 42 to identify processes with open transactions.
Code Listing 42: Who has open transactions?
--------------------------------------------------------
-- Script: OpenTransactions.sql
-- Sessions with open transactions
--------------------------------------------------------
SELECT 'Process ' + LTRIM(STR(sp.spid)) + ', user ' + sp.loginame +
       ' has ' + LTRIM(STR(sp.open_tran)) + ' open transactions' AS OpenTranMsg
FROM sys.sysprocesses sp
WHERE sp.dbid > 4
  AND sp.open_tran <> 0
These examples show that the view lets you observe what is happening on the server and diagnose sessions that could be impacting performance.
The sql_handle column in the view provides the ability to see what is being done by the session. You can use the sys.dm_exec_sql_text table-valued function to look at the actual work being done. Code Listing 43 shows a simple example using the function to see what a spid is doing.
Code Listing 43: Look at entire query text
--------------------------------------------------------
-- Script: EntireQueryText.sql
-- Look at the content of a particular session
--------------------------------------------------------
DECLARE @YourSpid int = 55;  -- replace with the session ID to inspect
SELECT sp.spid, sp.loginame, st.text
FROM sys.sysprocesses sp
CROSS APPLY sys.dm_exec_sql_text(sp.sql_handle) st
WHERE sp.dbid > 4
  AND sp.sql_handle <> 0
  AND sp.spid = @YourSpid
This will return the entire code being executed by the session. You can also drill down further, if the statement starting offset is known. Code Listing 44 shows the statement being extracted from the full query text.
Code Listing 44: Extracting statement from query
--------------------------------------------------------
-- Script: ExtractStatement.sql
-- Look at the statement within the session
--------------------------------------------------------
DECLARE @YourSpid int = 55;  -- replace with the session ID to inspect
SELECT sp.spid, sp.loginame,
       CASE WHEN sp.stmt_start >= 0 AND sp.stmt_end > 0
            -- offsets are in bytes; text is nvarchar, so divide by 2
            THEN SUBSTRING(st.text, (sp.stmt_start / 2) + 1,
                           ((sp.stmt_end - sp.stmt_start) / 2) + 1)
            ELSE st.text
       END AS QueryText
FROM sys.sysprocesses sp
CROSS APPLY sys.dm_exec_sql_text(sp.sql_handle) st
WHERE sp.dbid > 4
  AND sp.sql_handle <> 0
  AND sp.spid = @YourSpid
You can use the dm_exec_sql_text function to look at cached plans as well. The view sys.dm_exec_cached_plans holds query plans that SQL has cached. The objtype column indicates the type of code, such as a trigger or ad hoc query. Code Listing 45 shows an example of how to look at the top 10 ad hoc queries based on how often their plans have been reused (usecounts).
Code Listing 45: Top 10 cached plans
--------------------------------------------------------
-- Script: Top10CachedPlans.sql
-- Top 10 cached plans by usage
--------------------------------------------------------
SELECT TOP 10 cp.cacheobjtype, cp.objtype, cp.usecounts,
       st.text AS Query
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE cp.objtype = 'Adhoc'
ORDER BY cp.usecounts DESC
SQL Server keeps query stats in a dynamic management view called dm_exec_query_stats. This view is very handy for looking at queries that might be causing issues on your server. It provides the pointer to the code (plan_handle), as well as counters for key values of the query. Table 30 shows some of the key fields in the view.
Table 30: dm_exec_query_stats
| Field | Description |
|---|---|
| plan_handle | Binary pointer to the query plan. |
| total_worker_time | CPU time used (microseconds). |
| total_physical_reads | Disk reads performed by the query. |
| total_physical_writes | Disk writes performed. |
| total_logical_reads | Logical reads by the query. |
| total_logical_writes | Logical writes by the query. |
| total_CLR_time | Time spent in CLR procedures. |
| total_elapsed_time | Total time (microseconds) the query has taken. |
| total_rows | Number of rows returned by the query. |
While we are looking at the total values, there are corresponding fields for last, min, and max values, as well.
CPU time is measured by the worker_time value, so we can order by total worker time to identify those plans using a lot of CPU time.
Code Listing 46: Worst five plans based on CPU usage
--------------------------------------------------------
-- Script: WorstPlansByCPU.sql
-- Worst 5 plans based on CPU usage
--------------------------------------------------------
SELECT TOP 5 st.text AS SrcCode, qp.query_plan,
       qs.execution_count, qs.last_execution_time
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
ORDER BY qs.total_worker_time DESC
The input/output (I/O) totals indicate how often a query needs to read something from the disk. Ideally, in a query, you should read the minimum amount of data needed. When a query uses SELECT * or a lot of table scans, SQL is bringing back more data than is needed. For example, imagine a personnel table that includes a binary image of the person. If your code does a SELECT * from this table, but only displays the name and phone number, you’ve had SQL bring back extra data (the binary image, among other fields), when all it needed was two fields.
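To make the point concrete, here is a sketch of the two approaches; the table and column names are assumptions:

```sql
-- Brings back every column, including the large binary photo:
SELECT * FROM dbo.Personnel;

-- Reads only what the screen actually displays:
SELECT name, phone FROM dbo.Personnel;
```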
One of the statistics that tracks how much I/O a query uses is called logical reads.
Code Listing 47: Worst 5 by logical I/O
--------------------------------------------------------
-- Script: WorstPlansByIO.sql
-- Worst 5 plans based on I/O
--------------------------------------------------------
SELECT TOP 5 st.text AS SrcCode, qp.query_plan,
       qs.execution_count, qs.last_execution_time
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
ORDER BY qs.total_logical_reads DESC
Note: Logical reads versus physical reads: A physical read means the data was pulled from the disk subsystem, while a logical read means the data was read from the buffer cache (which may first have required a physical read to load the page). The amount of memory SQL Server has determines how much it can cache, so when optimizing a query, focus on logical reads, and let your hardware team make sure the server has plenty of memory.
When exploring the query stats for optimization purposes, keep in mind that many factors make up a good versus bad query. For example, imagine a query with a high number of logical reads that returns very few rows. This suggests the query is bringing in extra fields that are not needed, so you might want to look for SELECT * statements, or tables with large varchar fields.
Also, pay attention to the execution count and last execution time. If you are looking to optimize queries, a query that is frequently and recently run should be more of a focus than a query that is run once every month.
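For instance, this sketch surfaces the most frequently executed plans that have run in the last day; the one-day window is an arbitrary choice:

```sql
-- Most-executed plans with recent activity
SELECT TOP 5 st.text AS SrcCode,
       qs.execution_count, qs.last_execution_time
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) st
WHERE qs.last_execution_time > DATEADD(day, -1, GETDATE())
ORDER BY qs.execution_count DESC;
```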
Sometimes a query usually runs fine, but occasionally slows down. Factors other than the query itself can impact performance; your query does not run in isolation, and many other things can be happening on the server. Check these items before focusing on the query itself.
You can check the sys.dm_os_waiting_tasks view to see whether another session is blocking your target spid. If some other process is blocking your query, focus on the blocker first. Similarly, if another query has open transactions, that might be the culprit slowing your query down.
You can use the SQL global variable @@SPID to get your session ID in SQL.
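For example, the following returns the session ID of the connection running it:

```sql
SELECT @@SPID AS MySessionId;
```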
Code Listing 48: Who might be blocking me?
--------------------------------------------------------
-- Script: WhoMightBeBlocking.sql
-- Who might be blocking me?
--------------------------------------------------------
SELECT wt.wait_type, wt.blocking_session_id,
       p.program_name, p.loginame
FROM sys.dm_os_waiting_tasks wt
JOIN sys.sysprocesses p ON p.spid = wt.blocking_session_id
WHERE wt.session_id = @@SPID
Note: You might occasionally see a CXPACKET wait type, and it looks like you are blocking yourself. This can occur when SQL is using parallelism and has broken your query into pieces for multiple processors to handle. Basically, one piece of your query is waiting for the other pieces to complete. These waits will almost always clear themselves.
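If a query suffers from persistent CXPACKET waits, one option is to limit parallelism for just that statement with a query hint. This is a sketch, not a blanket recommendation, and the table and column names are hypothetical:

```sql
-- Force this query to run on a single thread, avoiding CXPACKET
-- waits at the cost of losing parallelism:
SELECT OrderId, OrderDate
FROM dbo.Orders            -- assumed table
WHERE OrderDate >= '2019-07-01'
OPTION (MAXDOP 1);
```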
TempDB is a shared system database that all databases can use for sorting, temporary tables, and so on. If TempDB is busy, or the data files making up TempDB have grown large, this can impact overall query performance. Code Listing 49 shows a sample query to check out TempDB usage.
Code Listing 49: How busy is TempDB?
--------------------------------------------------------
-- Script: TempDB_Usage.sql
-- Anything slowing tempdb?
--------------------------------------------------------
SELECT 'Blocked processes' AS Msg, COUNT(*) AS Total
FROM sys.sysprocesses
WHERE db_name(dbid) = 'tempdb'
  AND blocked <> 0
UNION
SELECT 'Waiting processes', COUNT(*)
FROM sys.sysprocesses
WHERE db_name(dbid) = 'tempdb'
  AND waittime > 0
There are configuration settings that allow the server to grow or shrink database files automatically and to update statistics. If the server kicks off one of these processes, your query might slow down during that time frame. If the autogrowth increment is too small, for example, the server may frequently pause to grow the database. Code Listing 50 reports these settings.
Code Listing 50: Autogrowth settings
--------------------------------------------------------
-- Script: AutoGrowth.sql
-- Autogrowth settings that might impact performance
--------------------------------------------------------
SELECT name,
       CASE WHEN growth = 0 THEN 'No growth allowed'
            -- status bit 0x100000 means growth is a percentage
            WHEN (status & 0x100000) <> 0
              THEN CAST(growth AS varchar(3)) + '% growth'
            ELSE CAST((growth * 8.0 / 1024) AS varchar) + ' MB'
       END AS Growth,
       CASE WHEN maxsize = 0 THEN 'No growth'
            WHEN maxsize < 0 THEN 'Unlimited'
            -- 268,435,456 pages * 8 KB = 2 TB upper limit
            WHEN maxsize >= 268435456 THEN '2 Terabytes'
            ELSE CAST(ROUND(maxsize * 8.0 / 1024, 0) AS varchar(20)) + ' MB'
       END AS MaxGrowth
FROM sys.sysfiles
Indexes are one of the key ways to increase performance in a SQL Server application. An index allows one or more columns in a database table to be maintained so that queries can use the smaller index to find the larger data row from the table. Developers will often create an index on the most commonly used fields to improve performance in their applications.
While indexes speed up queries, they negatively impact table update operations. Every time a row is inserted, modified, or removed, the index needs to be updated to reflect the change. You need a reasonable balance: enough indexes for query performance, but not so many that they slow down CRUD (create, read, update, delete) operations.
Sometimes, indexes get created that duplicate other indexes on the table. The following script identifies indexes that share the same leading key column. Note that it also checks the sort direction, so two indexes only count as duplicates when both the column name and the ascending/descending order match.
Code Listing 51: Duplicated indexes
--------------------------------------------------------
-- Script: Duplicated_indexes.sql
-- List all indexes that contain same keys and order
--------------------------------------------------------
SELECT OBJECT_SCHEMA_NAME(tb.object_id) + '.' + tb.name AS TableName,
       ix.name AS FirstIndex,
       Dupes.name AS SecondIndex,
       c.name AS ColumnName
FROM sys.tables AS tb
JOIN sys.indexes AS ix
  ON tb.object_id = ix.object_id
JOIN sys.index_columns ic
  ON ic.object_id = ix.object_id
 AND ic.index_id = ix.index_id
 AND ic.index_column_id = 1
JOIN sys.columns AS c
  ON c.object_id = ic.object_id
 AND c.column_id = ic.column_id
CROSS APPLY
( SELECT ind.index_id, ind.name
  FROM sys.indexes AS ind
  JOIN sys.index_columns AS ico
    ON ico.object_id = ind.object_id
   AND ico.index_id = ind.index_id
   AND ico.index_column_id = 1
  WHERE ind.object_id = ix.object_id
    AND ind.index_id > ix.index_id
    AND ico.column_id = ic.column_id
    AND ico.is_descending_key = ic.is_descending_key
) Dupes
ORDER BY TableName, ix.index_id
If the script detects any duplicate indexes, they should be reviewed, and one of the indexes should be removed. You should not remove a clustered index if you have the choice, since the clustered index is generally the fastest indexing option.
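Once you have reviewed a pair and decided which index to keep, removing the duplicate is a single statement. The table and index names here are hypothetical:

```sql
-- Assumes IX_Customer_LastName2 duplicates IX_Customer_LastName
DROP INDEX IX_Customer_LastName2 ON dbo.Customer;
```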
SQL Server has a very useful view, called sys.dm_db_index_usage_stats. This view keeps track of activity performed against an index file, such as when the index was last used (for a seek, scan, or lookup operation). By using this view, we can create a script to remove indexes that are not used.
Code Listing 52: Unused indexes
--------------------------------------------------------
-- Script: Unused_indexes.sql
-- List all indexes that have not been used
--------------------------------------------------------
SELECT OBJECT_SCHEMA_NAME(i.object_id) + '.' +
       OBJECT_NAME(i.object_id) AS TableName,
       i.name AS UnusedIndexName,
       i.type_desc AS index_type
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS usage
  ON usage.object_id = i.object_id
 AND usage.index_id = i.index_id
 AND usage.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsIndexed') = 1
  AND (usage.index_id IS NULL        -- no entry in usage stats
       OR (usage.user_updates > 0    -- updated by DML commands
           AND usage.user_seeks = 0  -- but never used in a query
           AND usage.user_scans = 0
           AND usage.user_lookups = 0))
GROUP BY i.object_id, i.name, i.type_desc
ORDER BY TableName
We are confirming that the table has at least one index (OBJECTPROPERTY). If the index never appears in the usage stats view, or appears (has been updated) but has never been used in a query, we consider the index unused. You can review the indexes reported and consider removing them to increase performance during your DML operations.
Note: Some indexes might be created in anticipation of how a user might be querying the data, and early in a deployment lifecycle, those indexes might not have been used. Be careful removing “unused” indexes, particularly on a new system.
When SQL gets a query to run, it invokes the SQL optimizer to determine the most efficient way to run the query. The optimizer takes a lot of factors into consideration, such as which indexes are available, or the size of the tables. One of the first steps is to make a guess as to how much this query will cost (some internal measurement to the optimizer).
Tip: When we see cost values from the optimizer, we tend to want to associate them with real-world costs (time? dollars?). However, the value has no real-world meaning. Just know that the higher the cost, the longer the query will take, and the more resources it will use.
When the optimizer compiles a query, it also notes whether the query could have been improved with additional indexes. It assembles this information into a series of missing_index views. We can use these views to see if we might be able to improve performance with the addition of an index.
The following query looks at the assembled missing index data, and “suggests” how things can be improved via indexes.
Code Listing 53: Missing indexes
--------------------------------------------------------
-- Script: Missing_indexes.sql
-- List all indexes that are missing
--------------------------------------------------------
SELECT DMID.statement AS TableName,
       DMIGS.avg_user_impact,
       DMIGS.unique_compiles,
       DMID.equality_columns,
       DMID.inequality_columns,
       DMID.included_columns
FROM sys.dm_db_missing_index_groups AS DMIG
INNER JOIN sys.dm_db_missing_index_group_stats AS DMIGS
  ON DMIGS.group_handle = DMIG.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS DMID
  ON DMID.index_handle = DMIG.index_handle
WHERE DMID.database_id = DB_ID()   -- current database only
ORDER BY DMIGS.avg_user_impact DESC, DMIGS.unique_compiles
Understanding the columns is the key to determining whether you should create the recommended index.
The TableName column is the table that was being queried when the missing index was detected. It includes the database name and schema name.
The avg_user_impact is a percentage guess as to how much the query would be improved if this index were added. However, it is a guess made by the optimizer, not a guaranteed improvement.
The unique_compiles is a guess as to the number of queries that would benefit from the addition of the index. Basically, the optimizer says, “I found a number of different queries that I think this index will help.” Table 31 shows some sample results.
Table 31: Sample “missing indexes”
| TableName | avg_user_impact | unique_compiles | equality_columns |
|---|---|---|---|
| [dbo].[Device] | 83.13 | 50 | [AssignUserId] |
This result says that this index would help 50 “queries” and improve them by over 80%. However, keep in mind that these statistics are gathered from the time SQL was last started. If you start SQL every day, 50 queries with an 80%+ improvement is worth considering. If SQL has been up for six months, or you see a small number of compiles or a low improvement percentage, you are probably safe ignoring the recommended index.
The next three columns indicate the fields SQL was using for which it felt an index would be helpful. The columns contain comma-delimited field names to assist in creating the index.
equality_columns: These fields contain the list of columns where SQL was looking for an exact match (for example, UserID = 50).
inequality_columns: These are fields where SQL was looking for anything other than an exact match, such as the following.
LoginDate > '7/1/2019' or IsActive <> 1
included_columns: These are columns that SQL recommends you include in the index, to help improve performance.
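Putting it together, the suggestion from Table 31 could be turned into an index such as the following. The index name is an assumption, and you should review any suggestion before creating it:

```sql
-- Based on the sample row in Table 31 (equality column AssignUserId):
CREATE NONCLUSTERED INDEX IX_Device_AssignUserId  -- hypothetical name
ON [dbo].[Device] ([AssignUserId]);
-- Append an INCLUDE clause if included_columns listed any fields.
```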
Keep in mind that the missing indexes system has limitations; it is only intended to point a developer or administrator to an area to consider, not to fine-tune index configuration.
There are other tuning tools provided, and any experience reading execution plans can really help fine-tune your indexes and query performance. While the missing index tables might help identify which tables/columns to look at, a good developer or DBA is your best bet to optimize the index usage.
The LIKE clause is a powerful SQL feature, allowing you to find “matches” in a table rather than exact values. However, it is possible to create a condition where SQL must use a table scan (slower) rather than an index to resolve the LIKE expression. This can impact performance when applying a LIKE clause to large tables.
If you were creating a system to allow people to search by last name, you might want to use LIKE as shown in the following.
Beginning with => LIKE 'Mc%'
Ending with => LIKE '%son'
The first LIKE clause will use an index (assuming one exists on the last-name column). However, the second clause will require a table scan, which can slow performance by quite a bit. The user of the system, though, may not understand why one search is quick and the other quite slow.
If you need to improve performance for the second scenario, you can take advantage of SQL’s computed and persisted columns.
Add a computed and persisted column that consists of the REVERSE(last_name) column.
When the user wants to search for last names ending with a pattern, simply reverse their input string and add the wildcard to the end.
DECLARE @reversed varchar(32) = REVERSE(@SearchName) + '%'
This is a simple solution to address name searching and performance in large tables.
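Here is a sketch of the whole technique; the table and column names are assumptions:

```sql
-- Add a persisted computed column holding the reversed last name,
-- then index it so "ends with" searches can seek instead of scan.
ALTER TABLE dbo.Person
  ADD last_name_rev AS REVERSE(last_name) PERSISTED;

CREATE INDEX IX_Person_LastNameRev ON dbo.Person (last_name_rev);

-- "Ends with 'son'" becomes an index-friendly "begins with" search:
DECLARE @SearchName varchar(32) = 'son';
SELECT last_name
FROM dbo.Person
WHERE last_name_rev LIKE REVERSE(@SearchName) + '%';
```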
SQL Server provides the ability to review and analyze your table and index structure, to help eliminate potential development gotchas, and to improve performance. In this chapter, we used these views and functions to give you a heads-up on improving your database performance.
The Dynamic Management Views and functions are very thorough and helpful for exploring what is going on inside your server. We just touched upon a few of them here, but hopefully this chapter will encourage you to explore them further to better optimize your SQL server.