SQL Queries Succinctly^®
by Nick Harrison

CHAPTER 3

More Advanced CRUD

If we knew all of the requirements upfront and could design the database to take them into account, our lives would be much simpler. In the real world, we don’t know all of the requirements upfront. Even the requirements that we are given upfront will change over time, and eventually, many of the initial requirements may no longer be valid or may be unrecognizable given enough time.

The database is designed in the beginning, using the best information and best practices that we have at the time. Over time, this well-designed database will deteriorate into violating any number of best practices, usually as a result of the changing requirements. No one sets out to create a badly designed database, but anyone who has worked with the same data model for a decent length of time will tell you all sorts of things that they wished were different or that they would do differently if they could start from scratch.

Note: A lot of time can be spent complaining about a bad design or wondering why anyone would come up with the design that you currently have. Such efforts are rarely productive.

Fancy inserts

Select-based inserts

We saw this syntax briefly when we looked at auditing with insert triggers. Instead of explicitly giving the input values for an INSERT statement, we can provide a SELECT statement that will provide this data. This will insert one record for every record in the specified SELECT statement.

An example of a SELECT-driven insert is:

Code Listing 3-1: Select-driven insert

INSERT INTO dbo.Department

( DepartmentName ,

CostCenter ,

LocationId )

( SELECT 'Accounting' ,

' 6MRTG99RAR3N6BI443CRY0M3' ,

0);

This query will insert a single record because the associated select returns only a single record. Because we did not specify a table name, only the values explicitly given in the SELECT clause will be returned. Our SELECT statement can be as complex as any SELECT statement we will see, meaning that we can do some amazing things here.

This is a very powerful technique. With it you can easily move data from one table to another, as we did with the insert trigger. We can also use it to fill in missing records where constraint was missing so that you can add it back.

We have a couple of rules to follow for the SELECT statement:

The columns in the SELECT must match the columns and data types from the INSERT statement.
The SELECT statement cannot include an Order By statement. The database will complain because there is no need to sort the records being inserted.

Now let’s revisit the auditing insert trigger from the last chapter.

Code Listing 3-2: Insert trigger revisited

CREATE TRIGGER trEmployeeInsert ON dbo.Employee

FOR INSERT

INSERT INTO dbo.EmployeeAudit

( EmployeeId ,

FirstName ,

LastName ,

ReportsTo ,

DepartmentId ,

BirthDate ,

HireDate ,

TerminationDate ,

AuditAction ,

AuditDateTime )

SELECT Inserted.EmployeeId ,

Inserted.FirstName ,

Inserted.LastName ,

Inserted.ReportsTo ,

Inserted.DepartmentId ,

Inserted.BirthDate ,

Inserted.HireDate ,

Inserted.TerminationDate,

'I' , -- I is for insert

GETDATE()

FROM Inserted;

This query builds on the fact that the Inserted table has the same structure as the table associated with the trigger. Additionally, the Inserted table has a record for each record that was inserted. Most of the time, this will be a single record, but if it happened to have two records, or even a thousand records, this query will handle keeping the audit table up-to-date.

As another example, we can look at what might happen if the foreign key constraints between the Employee and Department tables was accidently dropped. By the time we discovered this problem, we may have Employee records referring to DepartmentIds that do not exist. We would like to correct this problem by adding the missing records to the Department table so that we can enable the constraint and keep our data clean.

We can run a query like this to find the missing Department records:

Code Listing 3-3: Select to identify missing records to insert

SELECT DISTINCT DepartmentId

FROM dbo.Employee

WHERE DepartmentId NOT IN

( SELECT DepartmentId

FROM dbo.Department );

Don’t worry about this SELECT syntax yet—we will cover it shortly. For now, just know that it will produce a list of the DepartmentIds that are not in the Department table.

Before we can insert these records, we will need to update the database to allow us to insert a value into the Identity column. After we insert these records, we want to disable explicitly inserting values into the identity columns. The full code to handle inserting the missing records will be:

Code Listing 3-4: Enabling identity insert to give specific DepartmentId for the insert

SET IDENTITY_INSERT dbo.Department ON;

INSERT INTO dbo.Department

( DepartmentId ,

DepartmentName )

( SELECT DISTINCT

DepartmentId ,

'Missing Department Added by Data Cleanup' DepartmentName

FROM dbo.Employee

WHERE DepartmentId NOT IN ( SELECT DepartmentId

FROM dbo.Department ));

SET IDENTITY_INSERT dbo.Department OFF;

Note: We can infer the missing DepartmentId values based on the usage in the Employee table, but this does not give us any guidance on what the DepartmentName should be. We want to supply a standard name that will obviously not be the real name, and that we can easily track and go back to correct as a second step in the data cleanup.

Since we can explicitly set the value of an identity column during the insert, you may wonder about explicitly updating its value. After all, this is also a common scenario. Suppose that after an operational reorganization, you need to update the identity column for every employee hired after a particular date. Turns out this not as simple as issuing an UPDATE statement; there are many steps to go through for what conceptually should be a relatively straightforward task.

Note: In actual practice, never do this. The primary key to a table should never have any meaning outside of uniquely identifying a record. If you need a number that has a business meaning, this should be a separate column with no database meaning. In this case, you probably needed an EmployeeId column and an EmployeeNumber column. The Id column is used to maintain the data in the database, while the Number column may have whatever business meaning is appropriate.

The steps to update the value of an identity column are:

Create a new table with the same structure as the one we want to update, without specifying an identity column. We will call this the Staging table.
Insert the records that need to be updated into this new table setting the identity column as you want it updated.
Delete the records from the original table that were just copied.
Allow inserts into the identity column on the original table.
Insert the records from the Staging table back into the original table.
Disable inserts into the identity column.
Drop the Staging table.

The full code to handle this type of change looks like this:

Code Listing 3-5: Complete process for updating existing identity column values

SELECT EmployeeId + 1000 AS EmployeeId ,

FirstName ,

LastName ,

ReportsTo ,

DepartmentId ,

BirthDate ,

HireDate ,

TerminationDate

INTO EmployeeStaging

FROM dbo.Employee

WHERE HireDate > DATEADD(MONTH, -3, GETDATE());

DELETE FROM dbo.Employee

WHERE HireDate > DATEADD(MONTH, -3, GETDATE());

SET IDENTITY_INSERT dbo.Employee ON;

INSERT INTO dbo.Employee

( EmployeeId ,

FirstName ,

LastName ,

ReportsTo ,

DepartmentId ,

BirthDate ,

HireDate ,

TerminationDate )

( SELECT EmployeeId ,

FirstName ,

LastName ,

ReportsTo ,

DepartmentId ,

BirthDate ,

HireDate ,

TerminationDate

FROM dbo.EmployeeStaging);

SET IDENTITY_INSERT dbo.Employee OFF;

Practical uses of Cartesian products

We touched on this briefly in the last chapter when we talked the various ways to join tables. A cross join will return every possible combination from the tables being joined. If you have two records in one table and four records in another table, the result of the cross join will include eight records.

To see how this could be useful, let’s add a new table. The Ordinal table will look like this:

Ordinal table

Figure 3-1: Ordinal table

With this table, we can easily produce a query that will return any number of records, incorporating details from whatever table we want to cross join with it. For example, we could simulate eight employee records for each department using the following query:

Code Listing 3-6: Forcing an eight-way Cartesian product on the Employee table

SELECT 'Employee' FirstName ,

Name LastName ,

DepartmentId

FROM dbo.Department

CROSS JOIN dbo.Ordinal

WHERE OrdinalNumber <= 8;

Which will give us the following results:

FirstName	LastName	DepartmentId
Employee	One	1
Employee	Two	1
Employee	Three	1
Employee	Four	1
Employee	Five	1
Employee	Six	1
Employee	Seven	1
Employee	Eight	1
Employee	One	2
Employee	Two	2
Employee	Three	2
Employee	Four	2
Employee	Five	2
Employee	Six	2
Employee	Seven	2
Employee	Eight	2

Result Set 3-1: Results of the Cartesian query

Wrap this in an INSERT statement, and we will insert eight new employee records for each department. This could be useful for loading dummy data into a database to perform load testing or anytime substantial data needs to be staged.

Tip: Red Gate has a great tool for generating data: the SQL Data Generator. You can use it to easily generate hundreds of thousands of records for any table.

You could also use this technique to create 12 budget records for each department. This would be one for each month of the year, making it easier to verify that every possible record is created from the beginning and subsequent processing can focus on updating existing records.

Tip: This technique can be useful anytime you want to ensure that you have accounted for inserting all possible records.

Fancy updates

The updates we saw in the previous chapter all focused on a single table. As a record was updated, the new values were either explicitly given or determined by manipulating other values from the record being updated. This is often all that is needed, but not always—sometimes you need something fancier.

Correlated updates

A correlated update allows you to define an UPDATE statement in terms of a SELECT statement and use any of the columns from any of the tables joined in the SELECT statement to determine the update values.

To show how this works, let’s revisit the order-tracking tables from the last chapter.

A classic order-tracking data model

Figure 3-2: A classic order-tracking data model

We have an Order table and an OrderDetail table. These are staples for any e-commerce system. The Order table includes a roll up from the OrderDetail table.

Tip: Summarizing the detail data in the Order record is a common performance trick, even though it violates several rules for good database design. Sometimes good design takes a back seat to performance.

We can update these values with a query like this:

Code Listing 3-7: Correlated update

DECLARE @OrderID INT;

SET @OrderID = 41867

UPDATE dbo.[Order]

SET NumberOfItems = Summary.NumberOfItems ,

OrderTotal = Summary.OrderTotal

FROM dbo.[Order]

JOIN ( SELECT SUM(Quantity) NumberOfItems ,

SUM(Quantity * Price) OrderTotal ,

OrderId

FROM dbo.OrderDetail

JOIN dbo.Item ON Item.ItemId = OrderDetail.ItemId

GROUP BY OrderId ) Summary ON Summary.OrderId = [Order].OrderId

WHERE [Order].OrderId = @OrderID;

This query is a little bit more complicated than is typical in a correlated query because the query being used cannot directly include a GROUP BY, so we had to move the GROUP BY to a subquery.

In this example, we define the variable @OrderId to constrain the UPDATE to a single Order record. Without the WHERE clause on the UPDATE statement, every Order record would be updated. This may often be what you want, but for the first pass it’s a good idea to update a single record and ensure that you get the expected results before forcing an update on every record. Adding a variable is another good practice because it keeps the query free of literal values, and makes it easier to spot where to change the literal value if you need to change the query to update a different Order record.

Note: Don’t worry about the GROUP BY syntax in the query. We will cover this syntax in detail in the next chapter, along with aggregate functions like SUM.

A subquery is a little bit of SQL magic where we can define a SELECT statement and treat the result set as a temporary table that exists only as long as the containing query is running. This can be very useful, but can lead to queries that are difficult to read, depending on how complex the derived table’s definition is.

Common table expressions

Common table expressions (CTEs) are similar to subqueries, but much more powerful. Like subqueries, a CTE is a named query that exists only for the duration of the single statement that comes right after its definition. What makes CTEs more powerful is that they can be referenced multiple times in the same query. A CTE can also reference itself. Such recursive queries open up some very interesting possibilities, which we will explore in Chapter 5.

The basic syntax for a CTE is:

Code Listing 3-8: The basic structure of a CTE

WITH CTE AS (SELECT statement)

For the SELECT statement, we can do anything that we can do in any SELECT statement. So we can rework Code Listing 3-7 with a CTE like this:

Code Listing 3-9: Basic syntax for common table expressions (CTE)

WITH Summary (NumberOfItems, OrderTotal, OrderID)

AS ( SELECT SUM(Quantity) NumberOfItems ,

SUM(Quantity * Price) OrderTotal ,

OrderId

FROM dbo.OrderDetail

JOIN dbo.Item ON Item.ItemId = OrderDetail.ItemId

GROUP BY OrderId)

UPDATE dbo.[Order]

SET NumberOfItems = Summary.NumberOfItems ,

OrderTotal = Summary.OrderTotal

FROM dbo.[Order]

JOIN Summary ON Summary.OrderId = [Order].OrderId

WHERE [Order].OrderId = @OrderId;

The WITH clause names the CTE and specifies the names for the columns that will be included in the result set. You don’t have to include the list of columns as long as all of the columns are explicitly named in the query.

Tip: If you specify the list of columns in the WITH clause, the number of columns must match the number of columns in the SELECT statement, but the names do not need to match. The names in the WITH clause will be the ones used outside of the CTE, not the names in the SELECT statement.

We can include multiple CTEs in a query. Each CTE is separated by commas. We could rewrite the UPDATE statement using two CTEs:

Code Listing 3-10: Multiple CTEs in a single query

WITH Units ( NumberOfItems, OrderID )

AS ( SELECT SUM(Quantity) NumberOfItems ,

OrderId

FROM dbo.OrderDetail

GROUP BY OrderId),

Volume ( OrderTotal, OrderID )

AS ( SELECT SUM(Quantity * Price) OrderTotal ,

OrderId

FROM dbo.OrderDetail

JOIN dbo.Item ON Item.ItemId = OrderDetail.ItemId

GROUP BY OrderId)

UPDATE dbo.[Order]

SET NumberOfItems = Units.NumberOfItems ,

OrderTotal = Volume.OrderTotal

FROM dbo.[Order]

JOIN Units ON Units.OrderID = [Order].OrderId

JOIN Volume ON Volume.OrderID = [Order].OrderId

WHERE [Order].OrderId = @OrderID;

Here we have two CTEs, Units and Volume. In actual practice you would not want to separate these into two CTEs, but this showcases how to include multiple CTEs in a query.

We have only scratched the surface of what is possible with CTEs, and we will explore them more fully in Chapter 5.

Merge statement

With the MERGE statement, you can replace a lot of messy code with something much more maintainable. Unfortunately, the syntax takes some getting used to because it supports several different clauses to support multiple types of data modifications.

The general syntax for the MERGE statement is:

Code Listing 3-11: Basic MERGE statement with an update

MERGE TargetTable

USING SourceTable

ON TargetTable.TargetId = SourceTable.TargetId

WHEN MATCHED THEN

UPDATE SET ColumnName = value;

We can also include a common table expression. The syntax also includes clauses for not matching the source and not matching the target.

We can rewrite the UPDATE statement using the MERGE statement along with a helpful CTE:

Code Listing 3-12: MERGE statement using a CTE

WITH OrderSummary

AS ( SELECT OrderId ,

SUM(Quantity) AS NumberOfItems ,

SUM(Price) AS OrderTotal

FROM dbo.OrderDetail

JOIN dbo.Item ON Item.ItemId = OrderDetail.ItemId

GROUP BY OrderId)

MERGE dbo.[Order]

USING OrderSummary

ON [Order].OrderId = OrderSummary.OrderId

WHEN MATCHED THEN

UPDATE SET

NumberOfItems = OrderSummary.NumberOfItems ,

OrderTotal = OrderSummary.OrderTotal ;

Here the Target table is the Order table. The Source table is the CTE. This may look overly complicated for a single UPDATE statement, but we aren’t limited to a single one. We can incorporate any number of WHEN clauses:

Code Listing 3-13: MERGE statement handling multiple business rules

WITH OrderSummary

AS ( SELECT OrderId ,

SUM(Quantity) AS NumberOfItems ,

SUM(Price) AS OrderTotal

FROM dbo.OrderDetail

JOIN dbo.Item ON Item.ItemId = OrderDetail.ItemId

GROUP BY OrderId)

MERGE dbo.[Order]

USING OrderSummary

ON [Order].OrderId = OrderSummary.OrderId

WHEN MATCHED THEN

UPDATE SET

NumberOfItems = OrderSummary.NumberOfItems ,

OrderTotal = OrderSummary.OrderTotal

WHEN MATCHED AND CustomerId = 340419 THEN

UPDATE SET

NumberOfItems = 0 ,

OrderTotal = 0

WHEN NOT MATCHED BY SOURCE THEN

UPDATE SET

NumberOfItems = 0 ,

OrderTotal = 0 ;

Once you are comfortable with the syntax for the MERGE statement, you see that this gives us a very concise format to express multiple business rules in a single readable statement.

Fancy deletes

Deletes are rarely fancy, and we don’t want them to be fancy, since they are destructive to existing data. Even though we like boring, simple DELETE statements, we occasionally need to get a bit fancier than the delete by primary key that we saw in the last chapter.

Select feeding a delete

Let’s start by revisiting the problems we had deleting a Department record with associated Employee records because we didn’t want to enable cascade deletes for these two tables. Without cascade deletes, we get the following the following error.

Code Listing 3-14: A reference constraint raised by a DELETE statement

DELETE FROM dbo.Department

WHERE DepartmentId = 1

The DELETE statement conflicted with the REFERENCE constraint "FK_Employee_Department". The conflict occurred in database "SQL", table "dbo.Employee", column 'DepartmentId'.

So how do we solve this problem without cascading deletes?

Tip: If a record could exist without the foreign key value (such as an employee without a department), it is not a good candidate for cascade deletes.

We need to first delete all the records in the related table before deleting the target records in the main table. This is relatively straightforward if we want to delete a single record; it just requires two DELETE statements in the right order.

Code Listing 3-15: Explicitly delete related records before deleting the foreign record

DELETE FROM dbo.Employee

WHERE DepartmentId = 1;

DELETE FROM dbo.Department

WHERE DepartmentId = 1;

But what if you have a group of Department records to delete?

Code Listing 3-16: Deleting more than one department at a time

DELETE FROM dbo.Department

WHERE LocationId = 95

Clearly we don’t want to handle these deletes one at a time. We have no idea how many Department records are in our target location. Fortunately, we can write a query to get the list of Department records that we want to delete and feed that to the DELETE statement.

Code Listing 3-17: Identifying the departments to be deleted based on the location

SELECT DepartmentId

FROM dbo.Department

WHERE LocationId = 95;

This will give us a list of the departments to be deleted. Now we can feed this list to the DELETE statement.

Code Listing 3-18: Deleting related records and primary records together

DELETE FROM dbo.Employee

WHERE DepartmentId IN ( SELECT DepartmentId

FROM dbo.Department

WHERE LocationId = 95 );

DELETE FROM dbo.Department

WHERE LocationId = 95;

Regardless of how many Department records are going to be deleted, these two queries will handle the deletes.

Note: This is a great example of thinking in terms of sets instead of thinking procedurally.

A classic procedural approach to solving this problem would tackle the departments one at a time iteratively, perhaps even looping through a cursor. This is substantially more work than is necessary.

It is much simpler and more efficient to think in terms of sets. The inner SELECT statement creates a set of the departments that we want to delete. We then delete the contents of the set as a whole (conceptually) instead of one at a time.

Merge statement

We already talked about the MERGE statement as a fancy update; we can also use it to handle inserts and deletes. We can handle these deletes with the following query:

Code Listing 3-19: Deleting the related records as a MERGE statement

MERGE dbo.Employee

USING dbo.Department

ON Employee.DepartmentId = Department.DepartmentId

WHEN MATCHED AND LocationId = 95 THEN

DELETE;

This is much easier than the previous queries to handle the delete.

Summary

We have taken the complexity for each of the basic statements up a notch. A common theme that we have seen in each of these scenarios is how a query can be used to drive and control more complex data manipulation scenarios.

We have also introduced a couple of new concepts: the common table expression and the MERGE statement. Both of these are powerful tools to add to our tool belt.

Now we will turn our attention to more complex SELECT statements and explore various ways to slice and dice data.

Uncover hidden insights in your SQL data

Direct integration with SQL databases for real-time data access
Interactive dashboards for deeper data exploration
Advanced analytics to uncover hidden patterns and trends
Enhanced decision-making with data-driven insights

Get Your Free Trial Now

More Advanced CRUD

Fancy inserts

Select-based inserts

Practical uses of Cartesian products

Fancy updates

Correlated updates

Common table expressions

Merge statement

Fancy deletes

Select feeding a delete

Merge statement

Summary

DISCLAIMER: Web reader is currently in beta. Please report any issues through our support system. PDF and Kindle format files are also available for download.