
SQL Server for C# Developers Succinctly®
by Sander Rossel


CHAPTER 7

Troubleshooting



Once in a while queries go awry. Unfortunately, there can be many reasons for this. Maybe your database has grown to the point where a query that always worked well is now very slow. Maybe one of your environments is missing an index. Maybe a query is using a cached query plan that is not optimal for the current parameter value. In this chapter we’re going to look at how we can identify troublesome queries and how we can solve these issues.

For a few examples, we're going to need a database with a bit more body. Microsoft has a sample database we can use called Adventure Works. You can download it from CodePlex. Alternatively, you can Google (or Bing) AdventureWorks2014 and see what shows up. Download the Adventure Works 2014 Full Database Backup.zip file and unzip the .bak file. In SSMS, right-click the Databases node in your Object Explorer and then click Restore Database… in the context menu. In the Restore Database window, enable Device, click the browse (…) button, and add the .bak file you just unzipped. Then simply click OK, and the database should now be available.


Figure 20: Restoring Adventure Works
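If you prefer a script over the UI, a RESTORE statement does the same job. Consider the following a sketch: the backup path and the logical file names depend on your machine and on the backup itself, which is why the FILELISTONLY statement comes first.

-- Show the logical file names contained in the backup.
RESTORE FILELISTONLY
FROM DISK = 'C:\Backups\AdventureWorks2014.bak'

-- Restore, moving the data and log files to your own folders.
RESTORE DATABASE AdventureWorks2014
FROM DISK = 'C:\Backups\AdventureWorks2014.bak'
WITH MOVE 'AdventureWorks2014_Data' TO 'C:\Data\AdventureWorks2014.mdf',
     MOVE 'AdventureWorks2014_Log' TO 'C:\Data\AdventureWorks2014_Log.ldf'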

Profiling

First of all, we'll need to know which queries are running slowly. For this (and much more) we can use the profiler. I recommend using the profiler often during development, as it shows you just about everything that happens in a database. For example, you may have a well-performing piece of code, but maybe it only performs well because you're testing locally with little data. The profiler may show you that you're actually sending thousands of queries to your database (maybe because of EF lazy loading, or because you're executing queries in a loop). I've often run into such scenarios that run smoothly on a test environment, but bring the database to a grinding halt in production (and that's never fun).

To start the profiler, start up SSMS, and in the menu, go to Tools and then SQL Server Profiler. You can also run it as a stand-alone application from your start menu (or wherever you have it installed). If you start it from SSMS, it will automatically start a new trace. If you’re starting it from Windows directly, you can find New Trace under the File menu. Next, you will have to log in to the instance you want to trace. Be careful when tracing a production environment, as it will put a load on the system. Connect to your local instance, and you should get a window captioned “Trace Properties.” On the General tab, you can name your trace, save to file, or pick a template. We’re going to ignore this tab. Instead, go to the Events Selection tab. On this tab you can find all kinds of events that are raised by the database. A few are already selected for you, but there are many more. Check the Show all events and Show all columns boxes, and you’ll see what I mean. For now we’re interested in the following events: “Stored Procedures -> RPC:Completed and SP:StmtCompleted” and “TSQL -> SQL:BatchCompleted”. Now click Run to run the trace.


Figure 21: Events Selection tab of Trace

If you now go into SSMS and query a table, or you run the application we created in the previous chapter, you will see the queries in the profiler. And, unfortunately, you may see a whole lot more (I’m also seeing queries from my Report Server and some other stuff I’ve got running). You can pause the trace and put some filters in place.


Figure 22: Trace Pause and Properties

In the Events Selection tab, go to Column Filters… and filter on ApplicationName like "EntityFramework". If you haven't manually changed the Application Name in the connection string of your DbContext, you will now only see queries that are fired from your application. A filter on LoginName and/or NTUserName can also come in handy, especially when you have millions of queries a day and you're only interested in a single user. Another filter I use quite often is Duration Greater than or equal; set it to 5000 (milliseconds) and you'll get all the queries that take at least five seconds. If one query pops up a lot, you may need to have a look at it. Anyway, with your ApplicationName filter in place, run some query from your application and look it up in the profiler. I ran context.People.Where(p => p.GenderId == 1).ToList();.


Figure 23: Query in the profiler

As you can see, EF has some overhead, and only the fifth row is my actual query. By looking at the generated query, I can see 1 = [Extent1].[GenderId] was not parameterized (as I intended in my query). Looking further at the columns, we can see the database needed 18 reads, a single write, and 19 milliseconds to come up with the result. When looking for troublesome queries, you might want to keep an eye on those values. Higher isn’t necessarily bad, but it may be an indication that something can be optimized. Now we can also copy and paste the query and run it in SSMS manually.

The profiler can be used for many things, for example, tracing deadlocks (event "Locks -> Deadlock graph"). Just look at all the events and columns and you'll see that we'd need another book to fully understand just this one tool. For my day-to-day activities, the BatchCompleted and StmtCompleted events, together with the Duration filter, prove to be invaluable.

Sometimes you might just want to run a trace for a while. Luckily, you can set up your trace and then, in the File menu, go to Export, Script Trace Definition, and For SQL Server 2005 - 2014… Save the script somewhere and open it. Be sure to replace InsertFileNameHere (for example, C:\Traces\MyTrace will do, but make sure the folder exists) and run the script. Now it doesn't matter whether the profiler is running or not; everything will be saved to the trace file. Of course, you could do this using the Trace Properties, but I wanted to show you the script as well. By the way, if you're having trouble opening the file, try running the profiler as administrator.
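Once the trace file contains some data, you don't even need the profiler to inspect it; fn_trace_gettable reads a trace file as a regular result set (assuming the path you chose above, with the .trc extension SQL Server appends):

-- Read the trace file back; default means all rollover files are read.
SELECT *
FROM fn_trace_gettable('C:\Traces\MyTrace.trc', default)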

To see which traces are running, you can use the following script.

Code Listing 63: View SQL traces

SELECT * FROM fn_trace_getinfo(default)

To stop, start, and delete a trace, you can use the following.

Code Listing 64: Stop, start, and delete a trace file

SELECT * FROM fn_trace_getinfo(default)

-- The file we created got traceid 3.
-- Yours may be different, check first.
DECLARE @TraceId INT = 3

-- Stop the trace.
EXEC sp_trace_setstatus @TraceId, 0

-- Start the trace.
EXEC sp_trace_setstatus @TraceId, 1

-- To delete the trace, first stop it.
EXEC sp_trace_setstatus @TraceId, 0
EXEC sp_trace_setstatus @TraceId, 2

Query plans

Now that you know how to find the queries that may cause trouble, you want to find out why exactly they cause trouble. The most obvious way to do this is by looking at the query plan. Before a query is executed, SQL Server determines the approximately quickest way to fetch all the data. I say approximately because SQL Server can do a single thing in different ways; for example, a JOIN can be realized internally by a LOOP, MERGE, or HASH JOIN. Take two JOINs in a single query and we already have 3 * 3 = 9 different ways to execute that query. Add a WHERE clause, and things grow exponentially. A single query can have hundreds or even thousands of different plans. SQL Server takes an educated guess as to which plan might be optimal. We've already talked a bit about this in the parameterization chapter, but now we're going to look at some actual plans.

Unfortunately, Syncfusion doesn’t have a Succinctly e-book on query plans (yet?), but you could always check out this free e-book on query plans by SimpleTalk.

For now, open SSMS, browse to the AdventureWorks database, and open a new query window. Right-click on the empty query window, and in the context menu, check Include Actual Execution Plan. For some extra information, run the following statements in your query window.

Code Listing 65: Enabling statistics

SET STATISTICS IO ON
SET STATISTICS TIME ON

These statistics will be shown in your Messages tab after you run a query. You can delete the statements from your window; they will remain active in this window until you close it or until you explicitly turn them off again (SET STATISTICS IO OFF and SET STATISTICS TIME OFF). Now run the following query (in the same query window).

Code Listing 66: A SQL query with JOIN

SELECT p.FirstName + ' ' + p.LastName AS FullName, *
FROM Person.Person p
JOIN Person.EmailAddress e ON
     e.BusinessEntityID = p.BusinessEntityID

You now get the Results, Messages, and an additional tab, the Execution plan.


Figure 24: An execution plan

You should read the execution plan from right to left (the SELECT is executed last). The thicker the arrow between two nodes, the more data was handled by that node. SQL Server starts by doing a scan on the Person table (using the primary key). It then computes the scalar FullName. It does another scan on EmailAddress, using the primary key for that table. It then does a merge join between Person and EmailAddress. Finally, it selects the data. To interpret this, you must have a little knowledge of the inner workings of SQL Server and, quite possibly, algorithms. Knowing the alternatives is a must, so let's dissect this plan and check out the alternatives.

We start with an Index Scan. The alternative is an Index Seek. An Index Scan simply reads through all the rows in its input (in this case an index) and takes out the rows that match the query's criteria. In this case we want all rows, so we can't really get any faster than fetching all rows like we do now. But suppose we expected only one row to be returned because we filter on an ID; surely an entire scan would be a waste of resources. That's when SQL Server uses an Index Seek, which only visits the pages and rows that contain qualifying data. Somewhere between returning all rows and returning zero or one row lies a tipping point where one of the two becomes faster than the other.
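You can see both sides of that tipping point with two simple queries against AdventureWorks; the first filters on the primary key and produces a seek, while the second filters on a column without a useful index and falls back to a scan.

-- At most one row: the plan shows a Clustered Index Seek.
SELECT *
FROM Person.Person
WHERE BusinessEntityID = 1

-- No useful index on Title: the plan falls back to a scan.
SELECT *
FROM Person.Person
WHERE Title = 'Mr.'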

The MERGE JOIN is the fastest join around. It only works when both inputs are sorted on the join key. In this case, both tables are ordered on BusinessEntityID. MERGE JOIN reads a row from one input and compares it with a row from the other input. If the rows don't match, the row with the smaller value is discarded and the next row is read, until all rows have been read. The maximum number of reads necessary for this operation is the number of rows in the first input plus the number of rows in the second input.

The alternatives here are LOOP JOIN and HASH JOIN. LOOP JOIN loops through all rows in the first input, and for each row, it loops through all rows of the second input until a match is found. This is less efficient than a MERGE JOIN, as the maximum number of reads is now the number of rows in input one times the number of rows in input two.

The HASH JOIN can require the most processing behind the scenes, and if you encounter it, you probably should check your design. The HASH JOIN algorithm creates a hash table for the smaller input of the join (and possibly writes this to disk if short on memory), and then compares them to the other values using a hash match function. For small inputs this can be very efficient, but if you have bigger inputs, this might slow down your query considerably.
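If you want to see the alternatives in action, you can force a join algorithm with a query hint and compare the plans and the statistics output. These hints are great for experimenting, but think twice before putting them in production code.

-- Force a LOOP JOIN instead of the MERGE JOIN.
SELECT p.FirstName + ' ' + p.LastName AS FullName, *
FROM Person.Person p
JOIN Person.EmailAddress e ON
     e.BusinessEntityID = p.BusinessEntityID
OPTION (LOOP JOIN)

-- Force a HASH JOIN.
SELECT p.FirstName + ' ' + p.LastName AS FullName, *
FROM Person.Person p
JOIN Person.EmailAddress e ON
     e.BusinessEntityID = p.BusinessEntityID
OPTION (HASH JOIN)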

As we can see, this query is optimal; we have the optimal search algorithm and the optimal JOIN algorithm. The next query is far from optimal, and even though the data set is small enough that it isn't giving us any trouble, we're going to optimize it for the sake of practice.

Code Listing 67: A query with a suboptimal plan

SELECT FirstName, MiddleName, LastName, Title
FROM Person.Person
WHERE MiddleName > 'Z'
OPTION (RECOMPILE)

I have added the OPTION (RECOMPILE) so that we’re sure SQL Server always generates a new plan for this query (so we’re not looking at cached plans). Here is the generated plan.


Figure 25: A suboptimal query plan

Here we see that we're doing an Index Scan, but when we execute the query, we only get one result. So we're scanning the entire index (almost 20,000 rows) for just one row. Also notice the Key Lookup. This happens when an index is used that does not contain all the requested values. In this case, the Index Scan is done on IX_Person_LastName_FirstName_MiddleName, but we're requesting more fields than just those three because we also want Title. The Key Lookup takes the clustered key of each matching row and retrieves the missing fields from the clustered index. Of course, SQL Server then needs to join the results of the Index Scan and the Key Lookup, which costs even more time and resources.

Tip: The problem with slowly executing queries is that sometimes they run very slowly (like maybe an hour or more). In such cases, “Include Actual Execution Plan” is not an option, as it requires you to run the query, but you can’t wait for hours for it to finish, taking up resources and possibly blocking other queries. For such cases, SQL Server has an option in the context menu, “Display Estimated Execution Plan.” This option does not run the query and only creates the estimated plan. It may not give you the actual plan, but more often than not it’s pretty accurate, and you can at least inspect the plan and identify possible bottlenecks.

Indexing

So we have this query that uses an index, but still needs to look up the master row in the primary key index. That happens when an index is not covering. When all requested fields from a table are contained in a single index, this index is called a covering index. In this case, Title is missing from the index. So let’s add Title to the index. We can do two things: add Title to the actual index, or include it in the index.

To understand the difference, we must understand that an index is a virtual sorting of one or more fields. IX_Person_LastName_FirstName_MiddleName is sorted on LastName, then FirstName, and then MiddleName. That means SQL Server can use the Binary Search algorithm to quickly find all rows with a specific LastName, then for all people with that LastName, it can quickly find all people with a specific FirstName. Finally, for people with that LastName and FirstName, it can find all people with a specific MiddleName.
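You can verify that this ordering matters. A filter on the leading column, LastName, can use the index for a seek; a filter on MiddleName alone cannot, which is exactly the problem our query above ran into.

-- LastName is the first sort key: Index Seek.
SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE LastName = 'Smith'

-- MiddleName is the third sort key: back to an Index Scan.
SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE MiddleName > 'Z'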

If we added Title to the index, we could find people with a specific name and specific Title as well. Unfortunately, indexing comes with a price. Every time one of the fields is updated, the index must be updated as well. Adding Title to the index means the index must be updated when any LastName, FirstName, MiddleName, or Title is updated. So inserts, updates, and deletes on this table will now take the performance hit. We’re probably never going to query for such a specific name with a specific title, so we’d rather not do that.

Luckily, we have a second choice: include Title as an included column. It's not part of the index's sort order, but it is stored in the index. An update to Title still has to be written to the index pages, but it can never force the index to be re-sorted, which makes it a lot cheaper. The only cost then is some extra storage, a fair price to pay if it speeds up our query. To include a field in an index, find the index in the Object Explorer, right-click, choose Properties, and add the field on the Included columns tab (and NOT the "Index key columns" tab).
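If you'd rather script it, recreating the index with an INCLUDE clause achieves the same thing; here's a sketch using DROP_EXISTING so the old index is replaced in one statement.

CREATE NONCLUSTERED INDEX IX_Person_LastName_FirstName_MiddleName
ON Person.Person
(
     LastName,
     FirstName,
     MiddleName
)
INCLUDE (Title)
WITH (DROP_EXISTING = ON)

Now let's execute the query again.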

Changing the index worked, since our plan no longer requires a Key Lookup.


Figure 26: Key Lookup eliminated

Unfortunately, we still have an Index Scan instead of an Index Seek. Apparently our current indexes just don’t cut it. You might as well remove the Title from the included columns now, as we’re going to create an index that’s completely optimized for this query. Simply add an index and put MiddleName in the Index key columns, and FirstName, LastName, and Title in the Included columns.

Code Listing 68: Index on MiddleName

CREATE NONCLUSTERED INDEX IX_Person_MiddleName
ON Person.Person
(
     MiddleName
)
INCLUDE (FirstName, LastName, Title)

If we run the query again, we will see that we have eliminated the Index Scan as well, and the query is now optimal.


Figure 27: Optimal query plan

In some cases, SQL Server may suggest indexes for you. For example, run the following query and inspect the query plan.

Code Listing 69: Non-indexed query

SELECT *
FROM Sales.SalesOrderHeader
WHERE OrderDate = '20120101'

Obviously, it would be nice if we had an index on OrderDate. The SalesOrderHeader table has over 30,000 rows, OrderDate currently has no index, and an index scan is performed. A specific order date never returns huge result sets, though, so scanning the entire index is never optimal for this query. Luckily, SQL Server sees that too, and gives you a hint. You can right-click it and select Missing index details, and it will open a new query window with the index script. Be sure to take a good look at it, as it will not always be a desired index! For example, a SELECT * may result in a suggested index that includes every column that isn't part of the key. In such cases, it's probably better not to include any columns at all and just stick to the key columns.


Figure 28: A missing index hint

Now, you may think it's a good idea to optimize every query, but it's not. Imagine a query taking five minutes to complete; surely that must be optimized, right? What if the query runs only once a month? Suddenly those five minutes are not so important anymore. Optimizing it may take hours of work that will only pay for itself after years of running the query. If the five minutes is a daily routine and users are waiting for it, by all means, optimize! Even a query that takes an hour can be fine if it is, for example, an automated job that runs at night. Optimizing it might mean that other processes that run constantly in production incur a performance hit. Finally, as we will see in the next section, an optimization may not always be optimal for all cases.

Parameter sniffing

Sometimes your query runs really slowly when executed from your C# code, while other times it completes in a few milliseconds. Multiple things can cause this. The first, and we'll get to this later, is blocking queries. Another possibility is parameter sniffing. When SQL Server encounters a new query that uses parameters, it creates a query plan that is (approximately) optimal for the current value of the parameter. Remember that query plans are cached and reused if the same query is executed again. However, if the same parameterized query now has a different value for its parameter, the cached plan may be suboptimal. Luckily, we can replicate this issue quite easily (although we won't see the performance hit, as our database is still too small).

First, we need a parameterized query. An ad hoc query in a query window is not parameterized unless you use sp_executesql, a built-in procedure that runs a string as SQL and can optionally take parameters. This is what C# sends to the database when you run a parameterized query. As a sketch, a parameterized query arriving from a client looks roughly like this:
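EXECUTE sp_executesql
    N'SELECT *
      FROM Sales.SalesOrderHeader
      WHERE OrderDate < @OrderDate',
    N'@OrderDate DATETIME',
    @OrderDate = '20120101'

Typing and escaping strings like that isn't much fun, though, so let's go with the alternative: a stored procedure. Create the following procedure in the AdventureWorks database (never mind that this procedure is pretty useless and would return absurd amounts of data after a few million orders).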

Code Listing 70: SP GetOrdersByDate

CREATE PROCEDURE GetOrdersByDate
    @OrderDate DATETIME
AS
BEGIN
    SET NOCOUNT ON;

    SELECT *
    FROM Sales.SalesOrderHeader
    WHERE OrderDate < @OrderDate
END

You have to know one thing about SQL Server when dealing with plans: SQL Server keeps all kinds of statistics internally. When you execute this procedure, SQL Server will inspect @OrderDate and make some assumptions based on those statistics. That makes things a little easier for us, too. At this point, we know there is no index on OrderDate. As a result, with so many rows, SQL Server will probably just scan the table for any value we assign to @OrderDate. We can test this by executing the procedure with a few values and inspecting the plans.

Code Listing 71: Multiple plans for the same query

DBCC FREEPROCCACHE
EXECUTE GetOrdersByDate @OrderDate = '20100101'

DBCC FREEPROCCACHE
EXECUTE GetOrdersByDate @OrderDate = '20120101'

DBCC FREEPROCCACHE
EXECUTE GetOrdersByDate @OrderDate = '20150101'

DBCC FREEPROCCACHE clears all query plans from the cache, so we know SQL Server isn’t reusing any plans. We get the same plan for each parameter.


Figure 29: Equal plans

That isn't helping our example very much. It also seems that a table scan isn't optimal, since @OrderDate = '20100101' returns 0 rows. So let's give SQL Server a way to know this, too: let's put an index on OrderDate.

Code Listing 72: Index on OrderDate

CREATE NONCLUSTERED INDEX IX_SalesOrderHeader_OrderDate
ON Sales.SalesOrderHeader
(
     [OrderDate] ASC
)

With this index in place, run the previous query again, with all the DBCC FREEPROCCACHE statements, and sure enough, the first query plan will look a little different! It now does an Index Seek on IX_SalesOrderHeader_OrderDate, which is a lot better for this particular query.

Unfortunately, we can't go around and execute DBCC FREEPROCCACHE before every query. In fact, never run DBCC FREEPROCCACHE on a production server unless you know what you're doing; it clears every cached plan, so every incoming query has to be compiled again. If you only need to invalidate the plan of a single procedure, sp_recompile is a much more targeted tool; a minimal sketch:
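-- Marks the procedure; a fresh plan is compiled on its next execution.
EXEC sp_recompile N'GetOrdersByDate'

Back to our example. Run the following query, check the query plans, and you will see the effects of parameter sniffing.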

Code Listing 73: Parameter sniffing

DBCC FREEPROCCACHE
EXECUTE GetOrdersByDate @OrderDate = '20100101'
EXECUTE GetOrdersByDate @OrderDate = '20120101'
EXECUTE GetOrdersByDate @OrderDate = '20150101'

The query plans are now equal again for each execution, because the plan compiled for the first value is simply reused. Unfortunately, that plan is far from optimal for order dates 2012 and 2015. At this point you may be wondering, "Why did it use such a suboptimal plan?" or "Why is my query so slow?" When you hover over a node in the query plan, you get to see all kinds of statistics that may help in figuring out what went wrong.


Figure 30: Plan step statistics

The really interesting values here are the Estimated Number of Rows and the Actual Number of Rows. When SQL Server expects no rows to be returned, the estimated number will be 1, and SQL Server will optimize for that. However, the actual number of rows returned was 31,465. That's a huge difference, and it's an indicator that you're probably missing an index, your statistics are out-of-date, or you've fallen victim to parameter sniffing.
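If out-of-date statistics are the suspect, refreshing them is straightforward. A sketch; note that the second statement updates statistics for every table in the database and can take a while on larger databases.

-- Refresh the statistics of a single table.
UPDATE STATISTICS Sales.SalesOrderHeader

-- Or refresh statistics database-wide.
EXEC sp_updatestats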

Parameter sniffing isn't a bad thing by itself, but it has its quirks. There are a few things we can do to fight it. First, when you're pretty sure each plan will be different, you can recompile your queries. Recompilation makes sure that a new plan is created every time the query is executed. You will get an optimal plan for order dates 2010, 2012, and 2015, but every execution will incur a performance hit for the additional plan creation. You can recompile single queries or entire procedures.

Code Listing 74: Recompiling queries and procedures

ALTER PROCEDURE GetOrdersByDate
    @OrderDate DATETIME
AS
BEGIN
    SET NOCOUNT ON;

    SELECT *
    FROM Sales.SalesOrderHeader
    WHERE OrderDate < @OrderDate
    OPTION (RECOMPILE)
END
GO

-- OR

ALTER PROCEDURE GetOrdersByDate
    @OrderDate DATETIME
WITH RECOMPILE
AS
BEGIN
    SET NOCOUNT ON;

    SELECT *
    FROM Sales.SalesOrderHeader
    WHERE OrderDate < @OrderDate
END

Another option is to tell SQL Server to optimize for a specific value. In this case, we’re pretty sure this procedure will be called with an order date that’s pretty close to today, so 99 out of 100 times we expect to fetch thousands of rows.

Code Listing 75: Optimize for value

ALTER PROCEDURE GetOrdersByDate
    @OrderDate DATETIME
AS
BEGIN
    SET NOCOUNT ON;

    SELECT *
    FROM Sales.SalesOrderHeader
    WHERE OrderDate < @OrderDate
    OPTION (OPTIMIZE FOR (@OrderDate = '20150101'))
END

Order date 2010 will get a suboptimal plan, but 2012 and 2015 run fine and don't incur the performance penalty of creating a new plan on every execution. In case you're not sure about an ideal value, you may optimize for unknown. SQL Server will then generate a plan based on a statistical average of your data distribution. The downside, of course, is that this plan is unpredictable and may still result in parameter sniffing issues.

Code Listing 76: Optimize for unknown

OPTION (OPTIMIZE FOR UNKNOWN)

In some cases, SQL Server makes a wrong call about which index to use. Or maybe it's not wrong, just not what you'd like it to be. A while ago, I found myself in such a situation. A coworker had created a rather lengthy stored procedure, and its performance wasn't what it was supposed to be. Sure enough, we were missing an index. We added the index and cleared the query cache, but nothing happened. A quick look at the execution plan revealed that SQL Server wasn't using our index! In this instance, we just wanted it to use our index, which you can force with a table hint.

Code Listing 77: Forcing an index

ALTER PROCEDURE [dbo].[GetOrdersByDate]
    @OrderDate DATETIME
AS
BEGIN
    SET NOCOUNT ON;

    SELECT *
    FROM Sales.SalesOrderHeader
        WITH (INDEX(IX_SalesOrderHeader_CustomerID))
    WHERE OrderDate < @OrderDate
END

Surely that index makes no sense! The fun part is that if you run it now and look at the query plan, SQL Server will give you a hint that you’re missing an index on OrderDate (with all other fields included, which we don’t want).

Yet another option is to copy your input parameter into a local variable. The idea is that SQL Server optimizes queries for a given value: the one that is passed to the stored procedure. However, when you use a local variable, SQL Server cannot know the value of that variable at the plan building stage. As a result, using a local variable has the same effect as OPTION (OPTIMIZE FOR UNKNOWN).

Code Listing 78: Using a local variable

ALTER PROCEDURE [dbo].[GetOrdersByDate]
    @OrderDate DATETIME
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @LocalOrderDate DATETIME = @OrderDate

    SELECT *
    FROM Sales.SalesOrderHeader
    WHERE OrderDate < @LocalOrderDate
END

As you can see, there are several solutions to parameter sniffing when working in SQL Server. You can apply the same methods in C# when working with ADO.NET: you're building the query yourself, so you can include hints like OPTION (RECOMPILE), OPTION (OPTIMIZE FOR …), or WITH (INDEX(…)) directly in the command text. As a sketch, a hinted, parameterized ADO.NET command arrives at the server looking roughly like this:
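EXECUTE sp_executesql
    N'SELECT *
      FROM Sales.SalesOrderHeader
      WHERE OrderDate < @OrderDate
      OPTION (RECOMPILE)',
    N'@OrderDate DATETIME',
    @OrderDate = '20150101'

Unfortunately, when using Entity Framework, you're not building your own queries, but you still have a few options. You can use stored procedures for troublesome queries (in fact, some people always use stored procedures for everything, but that's another discussion). You can also use plain ADO.NET for troublesome queries. If you're stuck with EF, you can use EF interceptors (EF6 and onwards only), as discussed in the next chapter.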
