CHAPTER 3
If we knew all of the requirements upfront and could design the database to take them into account, our lives would be much simpler. In the real world, we don’t know all of the requirements upfront. Even the requirements that we are given upfront will change over time, and eventually, many of the initial requirements may no longer be valid or may be unrecognizable given enough time.
The database is designed in the beginning, using the best information and best practices that we have at the time. Over time, this well-designed database will deteriorate into violating any number of best practices, usually as a result of the changing requirements. No one sets out to create a badly designed database, but anyone who has worked with the same data model for a decent length of time will tell you all sorts of things that they wished were different or that they would do differently if they could start from scratch.
Note: A lot of time can be spent complaining about a bad design or wondering why anyone would come up with the design that you currently have. Such efforts are rarely productive.
We saw this syntax briefly when we looked at auditing with insert triggers. Instead of explicitly giving the input values for an INSERT statement, we can provide a SELECT statement that will provide this data. This will insert one record for every record in the specified SELECT statement.
An example of a SELECT-driven insert is:
Code Listing 3-1: Select-driven insert
INSERT INTO dbo.Department ( DepartmentName , CostCenter , LocationId ) ( SELECT 'Accounting' , ' 6MRTG99RAR3N6BI443CRY0M3' , 0); |
This query will insert a single record because the associated select returns only a single record. Because we did not specify a table name, only the values explicitly given in the SELECT clause will be returned. Our SELECT statement can be as complex as any SELECT statement we will see, meaning that we can do some amazing things here.
This is a very powerful technique. With it you can easily move data from one table to another, as we did with the insert trigger. We can also use it to fill in missing records where constraint was missing so that you can add it back.
We have a couple of rules to follow for the SELECT statement:
Now let’s revisit the auditing insert trigger from the last chapter.
Code Listing 3-2: Insert trigger revisited
CREATE TRIGGER trEmployeeInsert ON dbo.Employee FOR INSERT AS INSERT INTO dbo.EmployeeAudit ( EmployeeId , FirstName , LastName , ReportsTo , DepartmentId , BirthDate , HireDate , TerminationDate , AuditAction , AuditDateTime ) SELECT Inserted.EmployeeId , Inserted.FirstName , Inserted.LastName , Inserted.ReportsTo , Inserted.DepartmentId , Inserted.BirthDate , Inserted.HireDate , Inserted.TerminationDate, 'I' , -- I is for insert GETDATE() FROM Inserted; |
This query builds on the fact that the Inserted table has the same structure as the table associated with the trigger. Additionally, the Inserted table has a record for each record that was inserted. Most of the time, this will be a single record, but if it happened to have two records, or even a thousand records, this query will handle keeping the audit table up-to-date.
As another example, we can look at what might happen if the foreign key constraints between the Employee and Department tables was accidently dropped. By the time we discovered this problem, we may have Employee records referring to DepartmentIds that do not exist. We would like to correct this problem by adding the missing records to the Department table so that we can enable the constraint and keep our data clean.
We can run a query like this to find the missing Department records:
Code Listing 3-3: Select to identify missing records to insert
|
SELECT DISTINCT DepartmentId FROM dbo.Employee WHERE DepartmentId NOT IN ( SELECT DepartmentId FROM dbo.Department ); |
Don’t worry about this SELECT syntax yet—we will cover it shortly. For now, just know that it will produce a list of the DepartmentIds that are not in the Department table.
Before we can insert these records, we will need to update the database to allow us to insert a value into the Identity column. After we insert these records, we want to disable explicitly inserting values into the identity columns. The full code to handle inserting the missing records will be:
Code Listing 3-4: Enabling identity insert to give specific DepartmentId for the insert
SET IDENTITY_INSERT dbo.Department ON; INSERT INTO dbo.Department ( DepartmentId , DepartmentName ) ( SELECT DISTINCT DepartmentId , 'Missing Department Added by Data Cleanup' DepartmentName FROM dbo.Employee WHERE DepartmentId NOT IN ( SELECT DepartmentId FROM dbo.Department )); SET IDENTITY_INSERT dbo.Department OFF; |
Note: We can infer the missing DepartmentId values based on the usage in the Employee table, but this does not give us any guidance on what the DepartmentName should be. We want to supply a standard name that will obviously not be the real name, and that we can easily track and go back to correct as a second step in the data cleanup.
Since we can explicitly set the value of an identity column during the insert, you may wonder about explicitly updating its value. After all, this is also a common scenario. Suppose that after an operational reorganization, you need to update the identity column for every employee hired after a particular date. Turns out this not as simple as issuing an UPDATE statement; there are many steps to go through for what conceptually should be a relatively straightforward task.
Note: In actual practice, never do this. The primary key to a table should never have any meaning outside of uniquely identifying a record. If you need a number that has a business meaning, this should be a separate column with no database meaning. In this case, you probably needed an EmployeeId column and an EmployeeNumber column. The Id column is used to maintain the data in the database, while the Number column may have whatever business meaning is appropriate.
The steps to update the value of an identity column are:
The full code to handle this type of change looks like this:
Code Listing 3-5: Complete process for updating existing identity column values
SELECT EmployeeId + 1000 AS EmployeeId , FirstName , LastName , ReportsTo , DepartmentId , BirthDate , HireDate , TerminationDate INTO EmployeeStaging FROM dbo.Employee WHERE HireDate > DATEADD(MONTH, -3, GETDATE()); DELETE FROM dbo.Employee WHERE HireDate > DATEADD(MONTH, -3, GETDATE()); SET IDENTITY_INSERT dbo.Employee ON; INSERT INTO dbo.Employee ( EmployeeId , FirstName , LastName , ReportsTo , DepartmentId , BirthDate , HireDate , TerminationDate ) ( SELECT EmployeeId , FirstName , LastName , ReportsTo , DepartmentId , BirthDate , HireDate , TerminationDate FROM dbo.EmployeeStaging); SET IDENTITY_INSERT dbo.Employee OFF; |
We touched on this briefly in the last chapter when we talked the various ways to join tables. A cross join will return every possible combination from the tables being joined. If you have two records in one table and four records in another table, the result of the cross join will include eight records.
To see how this could be useful, let’s add a new table. The Ordinal table will look like this:

Figure 3-1: Ordinal table
With this table, we can easily produce a query that will return any number of records, incorporating details from whatever table we want to cross join with it. For example, we could simulate eight employee records for each department using the following query:
Code Listing 3-6: Forcing an eight-way Cartesian product on the Employee table
SELECT 'Employee' FirstName , Name LastName , DepartmentId FROM dbo.Department CROSS JOIN dbo.Ordinal WHERE OrdinalNumber <= 8; |
Which will give us the following results:
FirstName | LastName | DepartmentId |
|---|---|---|
Employee | One | 1 |
Employee | Two | 1 |
Employee | Three | 1 |
Employee | Four | 1 |
Employee | Five | 1 |
Employee | Six | 1 |
Employee | Seven | 1 |
Employee | Eight | 1 |
Employee | One | 2 |
Employee | Two | 2 |
Employee | Three | 2 |
Employee | Four | 2 |
Employee | Five | 2 |
Employee | Six | 2 |
Employee | Seven | 2 |
Employee | Eight | 2 |
Result Set 3-1: Results of the Cartesian query
Wrap this in an INSERT statement, and we will insert eight new employee records for each department. This could be useful for loading dummy data into a database to perform load testing or anytime substantial data needs to be staged.
Tip: Red Gate has a great tool for generating data: the SQL Data Generator. You can use it to easily generate hundreds of thousands of records for any table.
You could also use this technique to create 12 budget records for each department. This would be one for each month of the year, making it easier to verify that every possible record is created from the beginning and subsequent processing can focus on updating existing records.
Tip: This technique can be useful anytime you want to ensure that you have accounted for inserting all possible records.
The updates we saw in the previous chapter all focused on a single table. As a record was updated, the new values were either explicitly given or determined by manipulating other values from the record being updated. This is often all that is needed, but not always—sometimes you need something fancier.
A correlated update allows you to define an UPDATE statement in terms of a SELECT statement and use any of the columns from any of the tables joined in the SELECT statement to determine the update values.
To show how this works, let’s revisit the order-tracking tables from the last chapter.
![]()
Figure 3-2: A classic order-tracking data model
We have an Order table and an OrderDetail table. These are staples for any e-commerce system. The Order table includes a roll up from the OrderDetail table.
Tip: Summarizing the detail data in the Order record is a common performance trick, even though it violates several rules for good database design. Sometimes good design takes a back seat to performance.
We can update these values with a query like this:
Code Listing 3-7: Correlated update
DECLARE @OrderID INT; SET @OrderID = 41867 UPDATE dbo.[Order] SET NumberOfItems = Summary.NumberOfItems , OrderTotal = Summary.OrderTotal FROM dbo.[Order] JOIN ( SELECT SUM(Quantity) NumberOfItems , SUM(Quantity * Price) OrderTotal , OrderId FROM dbo.OrderDetail JOIN dbo.Item ON Item.ItemId = OrderDetail.ItemId GROUP BY OrderId ) Summary ON Summary.OrderId = [Order].OrderId WHERE [Order].OrderId = @OrderID; |
This query is a little bit more complicated than is typical in a correlated query because the query being used cannot directly include a GROUP BY, so we had to move the GROUP BY to a subquery.
In this example, we define the variable @OrderId to constrain the UPDATE to a single Order record. Without the WHERE clause on the UPDATE statement, every Order record would be updated. This may often be what you want, but for the first pass it’s a good idea to update a single record and ensure that you get the expected results before forcing an update on every record. Adding a variable is another good practice because it keeps the query free of literal values, and makes it easier to spot where to change the literal value if you need to change the query to update a different Order record.
Note: Don’t worry about the GROUP BY syntax in the query. We will cover this syntax in detail in the next chapter, along with aggregate functions like SUM.
A subquery is a little bit of SQL magic where we can define a SELECT statement and treat the result set as a temporary table that exists only as long as the containing query is running. This can be very useful, but can lead to queries that are difficult to read, depending on how complex the derived table’s definition is.
Common table expressions (CTEs) are similar to subqueries, but much more powerful. Like subqueries, a CTE is a named query that exists only for the duration of the single statement that comes right after its definition. What makes CTEs more powerful is that they can be referenced multiple times in the same query. A CTE can also reference itself. Such recursive queries open up some very interesting possibilities, which we will explore in Chapter 5.
The basic syntax for a CTE is:
Code Listing 3-8: The basic structure of a CTE
WITH CTE AS (SELECT statement) |
For the SELECT statement, we can do anything that we can do in any SELECT statement. So we can rework Code Listing 3-7 with a CTE like this:
Code Listing 3-9: Basic syntax for common table expressions (CTE)
WITH Summary (NumberOfItems, OrderTotal, OrderID) AS ( SELECT SUM(Quantity) NumberOfItems , SUM(Quantity * Price) OrderTotal , OrderId FROM dbo.OrderDetail JOIN dbo.Item ON Item.ItemId = OrderDetail.ItemId GROUP BY OrderId) UPDATE dbo.[Order] SET NumberOfItems = Summary.NumberOfItems , OrderTotal = Summary.OrderTotal FROM dbo.[Order] JOIN Summary ON Summary.OrderId = [Order].OrderId WHERE [Order].OrderId = @OrderId; |
The WITH clause names the CTE and specifies the names for the columns that will be included in the result set. You don’t have to include the list of columns as long as all of the columns are explicitly named in the query.
Tip: If you specify the list of columns in the WITH clause, the number of columns must match the number of columns in the SELECT statement, but the names do not need to match. The names in the WITH clause will be the ones used outside of the CTE, not the names in the SELECT statement.
We can include multiple CTEs in a query. Each CTE is separated by commas. We could rewrite the UPDATE statement using two CTEs:
Code Listing 3-10: Multiple CTEs in a single query
WITH Units ( NumberOfItems, OrderID ) AS ( SELECT SUM(Quantity) NumberOfItems , OrderId FROM dbo.OrderDetail GROUP BY OrderId), Volume ( OrderTotal, OrderID ) AS ( SELECT SUM(Quantity * Price) OrderTotal , OrderId FROM dbo.OrderDetail JOIN dbo.Item ON Item.ItemId = OrderDetail.ItemId GROUP BY OrderId) UPDATE dbo.[Order] SET NumberOfItems = Units.NumberOfItems , OrderTotal = Volume.OrderTotal FROM dbo.[Order] JOIN Units ON Units.OrderID = [Order].OrderId JOIN Volume ON Volume.OrderID = [Order].OrderId WHERE [Order].OrderId = @OrderID; |
Here we have two CTEs, Units and Volume. In actual practice you would not want to separate these into two CTEs, but this showcases how to include multiple CTEs in a query.
We have only scratched the surface of what is possible with CTEs, and we will explore them more fully in Chapter 5.
With the MERGE statement, you can replace a lot of messy code with something much more maintainable. Unfortunately, the syntax takes some getting used to because it supports several different clauses to support multiple types of data modifications.
The general syntax for the MERGE statement is:
Code Listing 3-11: Basic MERGE statement with an update
MERGE TargetTable USING SourceTable ON TargetTable.TargetId = SourceTable.TargetId WHEN MATCHED THEN UPDATE SET ColumnName = value; |
We can also include a common table expression. The syntax also includes clauses for not matching the source and not matching the target.
We can rewrite the UPDATE statement using the MERGE statement along with a helpful CTE:
Code Listing 3-12: MERGE statement using a CTE
WITH OrderSummary AS ( SELECT OrderId , SUM(Quantity) AS NumberOfItems , SUM(Price) AS OrderTotal FROM dbo.OrderDetail JOIN dbo.Item ON Item.ItemId = OrderDetail.ItemId GROUP BY OrderId) MERGE dbo.[Order] USING OrderSummary ON [Order].OrderId = OrderSummary.OrderId WHEN MATCHED THEN UPDATE SET NumberOfItems = OrderSummary.NumberOfItems , OrderTotal = OrderSummary.OrderTotal ; |
Here the Target table is the Order table. The Source table is the CTE. This may look overly complicated for a single UPDATE statement, but we aren’t limited to a single one. We can incorporate any number of WHEN clauses:
Code Listing 3-13: MERGE statement handling multiple business rules
WITH OrderSummary AS ( SELECT OrderId , SUM(Quantity) AS NumberOfItems , SUM(Price) AS OrderTotal FROM dbo.OrderDetail JOIN dbo.Item ON Item.ItemId = OrderDetail.ItemId GROUP BY OrderId) MERGE dbo.[Order] USING OrderSummary ON [Order].OrderId = OrderSummary.OrderId WHEN MATCHED THEN UPDATE SET NumberOfItems = OrderSummary.NumberOfItems , OrderTotal = OrderSummary.OrderTotal WHEN MATCHED AND CustomerId = 340419 THEN UPDATE SET NumberOfItems = 0 , OrderTotal = 0 WHEN NOT MATCHED BY SOURCE THEN UPDATE SET NumberOfItems = 0 , OrderTotal = 0 ; |
Once you are comfortable with the syntax for the MERGE statement, you see that this gives us a very concise format to express multiple business rules in a single readable statement.
Deletes are rarely fancy, and we don’t want them to be fancy, since they are destructive to existing data. Even though we like boring, simple DELETE statements, we occasionally need to get a bit fancier than the delete by primary key that we saw in the last chapter.
Let’s start by revisiting the problems we had deleting a Department record with associated Employee records because we didn’t want to enable cascade deletes for these two tables. Without cascade deletes, we get the following the following error.
Code Listing 3-14: A reference constraint raised by a DELETE statement
DELETE FROM dbo.Department WHERE DepartmentId = 1 The DELETE statement conflicted with the REFERENCE constraint "FK_Employee_Department". The conflict occurred in database "SQL", table "dbo.Employee", column 'DepartmentId'. |
So how do we solve this problem without cascading deletes?
Tip: If a record could exist without the foreign key value (such as an employee without a department), it is not a good candidate for cascade deletes.
We need to first delete all the records in the related table before deleting the target records in the main table. This is relatively straightforward if we want to delete a single record; it just requires two DELETE statements in the right order.
Code Listing 3-15: Explicitly delete related records before deleting the foreign record
DELETE FROM dbo.Employee WHERE DepartmentId = 1; DELETE FROM dbo.Department WHERE DepartmentId = 1; |
But what if you have a group of Department records to delete?
Code Listing 3-16: Deleting more than one department at a time
DELETE FROM dbo.Department WHERE LocationId = 95 |
Clearly we don’t want to handle these deletes one at a time. We have no idea how many Department records are in our target location. Fortunately, we can write a query to get the list of Department records that we want to delete and feed that to the DELETE statement.
Code Listing 3-17: Identifying the departments to be deleted based on the location
SELECT DepartmentId FROM dbo.Department WHERE LocationId = 95; |
This will give us a list of the departments to be deleted. Now we can feed this list to the DELETE statement.
Code Listing 3-18: Deleting related records and primary records together
DELETE FROM dbo.Employee WHERE DepartmentId IN ( SELECT DepartmentId FROM dbo.Department WHERE LocationId = 95 ); DELETE FROM dbo.Department WHERE LocationId = 95; |
Regardless of how many Department records are going to be deleted, these two queries will handle the deletes.
Note: This is a great example of thinking in terms of sets instead of thinking procedurally.
A classic procedural approach to solving this problem would tackle the departments one at a time iteratively, perhaps even looping through a cursor. This is substantially more work than is necessary.
It is much simpler and more efficient to think in terms of sets. The inner SELECT statement creates a set of the departments that we want to delete. We then delete the contents of the set as a whole (conceptually) instead of one at a time.
We already talked about the MERGE statement as a fancy update; we can also use it to handle inserts and deletes. We can handle these deletes with the following query:
Code Listing 3-19: Deleting the related records as a MERGE statement
MERGE dbo.Employee USING dbo.Department ON Employee.DepartmentId = Department.DepartmentId WHEN MATCHED AND LocationId = 95 THEN DELETE; |
This is much easier than the previous queries to handle the delete.
We have taken the complexity for each of the basic statements up a notch. A common theme that we have seen in each of these scenarios is how a query can be used to drive and control more complex data manipulation scenarios.
We have also introduced a couple of new concepts: the common table expression and the MERGE statement. Both of these are powerful tools to add to our tool belt.
Now we will turn our attention to more complex SELECT statements and explore various ways to slice and dice data.