left-icon

Database Design Succinctly®
by Joseph D. Booth

Previous
Chapter

of
A
A
A

CHAPTER 4

Logical Model

Logical Model


The logical model begins to flesh out the details from the conceptual model. Attributes are added to the entities, and relationships are more clearly defined. We still aren’t concerned with database implementation, but rather providing the details of what information the database must model.

In this chapter, we will take all the entities gathered previously and define a logical diagram to be the basis for the actual database design. However, we want to make sure that our model is both flexible and robust, to make sure the database can be used efficiently by the business.

Attributes and keys

When describing an entity, there are two interrelated items to consider. Attributes are the elements of the information associated with the entity. Each attribute should contain a single fact about the entity. For a customer, this might include name, address, and tax information. For an order, the attributes might be the order number, date placed, or date required.

The keys are either a single attribute or a combination of attributes that allow an entity to be uniquely defined. You could create a key of every attribute, since that should be enough to identify the record. So technically, every attribute is also a key. However, the best key is the smallest attribute or collection of attributes necessary to identify the entity uniquely.

Type of keys

To better understand keys, let’s consider the various keys that you’ll need to work with.

Primary key

Every entity requires a primary key. This is the selected candidate key that will be used. It must be unique and cannot be NULL.

Candidate keys

Candidate keys are attributes that are minimally necessary to identify a record. For example, if we are modeling an employee with the following attributes:

  • Employee ID  
  • Social Security number
  • First and last name

Each of these could be considered a candidate key. The attributes hold just enough information to identify the entity, without any extra information. By identifying the candidate keys, you can select one of them to become the entity’s primary key.

Superkey

A superkey is a collection of attributes that uniquely identify the entity but may contain extra, unneeded information. For example, in our person entity, the first name, last name, and age together form a superkey, even though the age attribute doesn’t help identify the entity. An entity will likely have many superkeys. 

Composite key

If more than one attribute is needed to create the key, then that is a composite key. If the first and last names were separate fields, and you want to use that as a key, it would be a composite key.

Surrogate key

For various reasons, you might not want to use any of the candidate keys. In those cases, you can create an internally generated key that has no actual meaning, but simply is the primary key to identify the record.

For example, if we chose first and last name as our primary key, other referencing entities would need to store those values to find the related entity. If the person’s last name changes for whatever reason, every entity would need to be updated. In such an example, a simple integer value that would never change could offer a better solution to joining the entities together.

Foreign key

A foreign key is a linking key used to create a relationship between a child record and its parent entity. An order entity must have a customer, so the primary key of the customer entity will be stored as an attribute (foreign key) in the order entity. A foreign key must reference a primary key in another table or be NULL.

Note: Most relational database systems allow you to define what happens if the parent record is deleted. You might not allow the deletion if the parent has child rows, or you might automatically delete the child rows, and so on.

Data normalization

One of the first concepts in the logical model is a process called normalization. This is a step-by-step process designed to make your database more flexible and robust, and to eliminate redundant data.

Normalization is a set of rules (called normal forms). Once a rule is applied, the database is in “n” normal form. Most databases will be designed to third normal form, even though additional forms (rules) exist. We will briefly describe the additional forms in this chapter, but for practical purposes, the third normal form should be good enough.

Flexible data

The database design should not impose arbitrary limits on the data. For example, when tracking the number of children a person has, the system should allow no children up to some practical number. A business rule might set the maximum number of children, but the database design itself should not create a limit.

Redundant data

Redundant data (values that appear in multiple tables and fields) not only wastes space, but also makes maintenance in the database more difficult. Imagine in the real world that the finance office keeps track of customer information (including ship-to addresses). The warehouse also tracks customer information (to know where to ship products). However, the people in the warehouse often can’t reach finance, which is why they keep the addresses themselves. 

When the customer informs the company (via the finance office) of the new location, the warehouse might not be informed and could ship the product to the old location. In this type of scenario, the address is kept in two locations, and both must be updated when a change is made. Normalization will remove the redundant information, keeping the ship-to address in one spot.

First normal form

The first normal form (rule) primarily deals with flexibility in the database. The rule has three steps that should be completed:

  1. Identify any repeating groups.
  2. Create a separate table for the items in the repeating group.
  3. Link the separate table to the parent record.

The original table for the customer

Figure 16: The original table for the customer

Repeating groups

Any repeating groups in a table must be removed. For example, our customer table might have the following attributes:

  • Customer Name
  • Billing Address
  • Ship-to Address 1
  • Ship-to Address 2
  • Ship-to Address 3

If every customer had exactly three shipping addresses, this might be feasible (despite other problems it causes), but that is very unlikely to be the case. Some customers might only order electronic products and/or services, so, therefore, they don’t need a ship-to address. Other customers might have a shipping location in 20 different states or provinces, but this design prevents the customer from having more than three locations.

Repeating groups might also be stored in one column that has multiple values. For example, you might have a field called Customer Phone that has multiple phone numbers separated by commas. Table 3 shows an example.

Table 3: Multiple values in a single field

Customer Name

Billing Address

Customer Phone Number

ABC Company

Malvern, PA 

610-555-1234, 215-867-5309

This violates the concept that each attribute should contain a single fact about the entity. In either design, we need to fix the model to eliminate the repeating groups.

Create a new entity

Once you’ve identified the repeating groups, create a separate entity for the items. For our shipping address, our new entity might look like this.

Customer ID (Foreign key to customer table primary key)

Ship-to Name (or number) (This, with the customer ID, is our composite, primary key)

Address information

For our phone numbers, we might have a customer ID and phone type (mobile, work, home, fax, and so on). Customer ID and phone type would be the composite primary key for the customer phone numbers entity.

Link to the primary table

The customer ID is the primary key attribute of the customer entity. The ship-to name (which could be a type or number) is combined with the customer ID to create a composite primary key for the shipping entity.

With this step done for all repeating groups, the table is in first normal form. You can have any number of shipping addresses, phone numbers, and so on. Customers with no shipping location can be represented, as well. Our new logical model for a customer entity is shown in Figure 17.

Revised logical customer

Figure 17: Revised logical customer

The phone number table implies that we only have one phone number of each type. For practical purposes, we might create an integer value for each separate phone number (so the composite key would be the customer ID and integer value). The phone type would become an attribute of the entity, not part of the key.

Second normal form

The second normal form addresses some potential redundancy in the database design. For a database to be in second normal form (2NF), it must already be in first normal form (1NF). After that, we need to apply the following rule.

Split partial dependencies to separate tables

A partial dependency occurs when an attribute (such as Ship-To) depends on part of the key, not the entire key. For example, consider our order line-item table. Each order consists of one or more items, so the initial entity might look like Table 4.

Table 4: Partial dependencies

Order Number

Seq

Qty

Ship-To

Product Code

           100

1

4

Bensalem, PA

TB

           100

2

2

Bensalem, PA

WRIST

214

1

12

  Cherry Hill, NJ

  TB

In this example, the ship-to location is only based on the order number, not the combination of Order Number and Seq (primary, composite key). So, the Ship-To field is redundant and should not be part of the line-item entity.

We can solve this by creating a new entity if needed. Or we might be able to determine the Ship-To location directly from the order. In this example, the Ship-To information is stored with the order, so a new entity is not needed.

Note: If the system allows line items to be shipped to different addresses, then the Ship-To column is not redundant. It is not redundant if both parts of the key are needed to determine its value.

Third normal form

The third normal form (3NF) addresses another potential redundancy in the database design. For a database to be in the third normal form, it must already be in the second normal form (2NF). After that, we need to apply the following rule.

Remove any attribute columns not dependent upon the primary key.

For example, Table 5 shows a potential design for the order line-item table. The primary key for this table is the combination of the Order Number and the Seq number. The quantity and the product code both are dependent upon the combined key.

Table 5: Order line-item table

Order Number

Seq

Qty

Product Code

Product Name

Cost

           100

1

4

TB

Tennis Balls

$2.99

           100

2

2

WRIST

Wrist Bands

$5.00

214

1

12

  TB

Tennis Balls

$2.99

However, the Product Name and Cost are only based on the Product Code, not the combination of the Order Number and Product Code. Both Product Name and Cost are redundant.

We can solve this by creating a new entity, Product, that has the Product Code as the identifying key, and Product Name and Cost as attributes.

Revised logical design

Figure 18: Revised logical design

In this example, we added the shipping weight one time to the product table, as opposed to needing to modify the line-item table and update the shipping weight for all ordered products.

Note: Working with prices can be complicated, as prices can change, and we need to ensure line items for prior orders show an accurate price at the time the order was placed. We can do this by either keeping multiple price points in the product table (and having a composite key of the code and price point) or keeping multiple price points with a change date in a separate table. To show the line-item price, we need to check the order date to see what the price was before any later change.

Once a database design is in third normal form (3NF), most times this is sufficient to start the database implementation. 

Why bother?

What have we solved with these steps? By moving the database to first normal form (1NF), we have addressed two issues. Each attribute should be a single fact about the entity, and the database is not restricting the number of attributes that can be associated with the entity.

When the design reaches the third normal form (3NF), updates to attributes only need to be done in one spot, which reduces the possibility of different values for the same conceptual attribute. If we kept the product name and cost in the line-item table, then updated the cost on the first line item, but not the second, asking the system the cost of tennis balls would result in two different values.

Boyce-Codd normal form (BCNF or 3.5 NF)

This form is an improvement over 3NF if your table has multiple attributes (columns) in the primary key. If the primary key has only a single attribute, the entity is already in BCNF.

The attributes of the table are considered prime attributes if they are part of the key, and nonprime if they are not. If a nonprime attribute is dependent upon a single prime attribute (not the entire key), this violates BCNF and should be corrected.

For example, imagine a tennis club has members who take classes and clinics run by the club pros. Our table might look like Table 6.

Table 6: Tennis lessons

Member

Class

Pro Name

Christy

Women’s 4.0 Mixed

Luis

Christy

Sunrise Tennis

Vince

Rachel

Women’s 4.0 Mixed

Chris

Sondra

Competitive 4.0 Drill

Mike

Mary

Sunrise Tennis

Vince

The primary key for this table is the Member and Class attributes, so these are our prime attributes. The Pro Name is a nonprime attribute. In this example, the Pro Name is dependent upon the class, but not the full key. BCNF is violated when a nonprime attribute (Pro Name) is dependent upon a prime attribute (class), but not the entire key. To put this into BCNF, we create a separate table of Pro Name and Class, as shown in Table 7.

Table 7: Classes

ID

Class

Pro Name

1

Women’s 4.0 Mixed

Luis

2

Sunrise tennis

Vince

3

Women’s 4.0 Mixed

Chris

4

Competitive 4.0 Drill

Mike

Finally, our table for lessons will look like Table 8.

Table 8: Tennis lessons

Member

Class

Christy

1

Christy

2

Rachel

3

Sondra

4

Mary

2

Fourth normal form

The fourth normal form is used to address potential redundancy that could occur with multivalue dependencies. Since this situation is considered to rarely occur, most normalization effort stops before 4NF.

Functionality dependency

The first 3.5 normal forms deal with functional dependencies. A functional dependency is when the attribute is dependent upon the entity key. For our customer table, we can say that the name, address, city, and so on are attributes that are functionally dependent upon the customer ID key. Given a customer ID, we can determine the other attributes.

A case called a multivalue dependency occurs when two or more attributes together are associated with the entity key. It is the combination of attributes that is associated with the key, and not either attribute by itself.

As an example, a college provides a list of courses. Each course has a teacher and a book associated with it. There is no relationship between the teacher and book directly; they are only related to the associated course. Table 9 provides an example course entity, showing the teacher and required book for each course.

Table 9: Course entity

Course

Teacher

Book

Machine Learning

James McCaffrey

Python Machine Learning

Machine Learning

Ed Frietas

Machine Learning using C# Succinctly

Business Processing

Erica Quigley

Death to the Org Chart

Business Processing

Blaire Glacken

Business Process Flow Mapping Succinctly

If we wanted to see the teacher and book for a given course, the result would imply a relationship between teacher and book, where one does not exist. The relationship only exists in the course—there is no direct correlation between teacher and book. 

To solve this implied relationship issue, we need to put these tables into the fourth normal form (4NF). This requires breaking the multivalue attributes into separate entities to prevent the implied relationship that does not exist. Tables 10 and 11 show the course entity placed into the fourth normal form.

Table 10: Course/Teacher entity

Course

Teacher

Machine Learning

James McCaffrey

Machine Learning

Ed Frietas

Business Processing

Erica Quigley

Business Processing

Blaire Glacken

Table 11: Course/Book entity

Course

Book

Machine Learning

Python Machine Learning

Machine Learning

Machine Learning using C# Succinctly

Business Processing

Death to the Org Chart

Business Processing

Business Process Flow Mapping Succinctly

While this design now requires two queries to get the teacher and book associated with the course, it prevents the false implied relationship between the teacher and the book.

Denormalization

Although the normalization process addresses redundancy in your database design, the extra tables that might be required could impact the performance of your application. This might be a factor to consider when designing your logical model.

For example, we record prices from a historical price table for each product. When the line items are added, the cost is not written to the line-item table, but rather extracted from the historical table based on the date the order was place. When we go to compute the total cost of the order, we need to:

  • Get all the items on the order.
  • For each item, look up the product code to get price, based on the date of the order.
  • Sum up all costs.

This process could be slow, particularly for a database with a large historical price table or frequently large number of items ordered. If a monthly or yearly report is requested, showing orders and total cost, it could take a while to recompute the cost for each order.

You might want to denormalize the line-item table and store the actual cost when the order is placed. This is redundant (since cost is a lookup value) but might improve performance for a large report. However, there is a risk that if the historical price is updated, the line-item table (now containing a cost column) will not pick up the updated price.

While denormalizing is a risk, it might be acceptable based on your application. You should always consider the benefit versus risk. If your application is selling high-end items, and a price variation could be in the thousands of dollars, you might not want to take the risk. However, if the price variation is only a few cents, the performance gain would likely outweigh the risk of a price discrepancy of a few cents.

Summary

The process of normalization reduces the likelihood that data is duplicated while maintaining flexible data sets. The first three normal forms are typically the level you should aim your logical model at. The additional forms handle some unusual but possible situations that could occur in your database.

Scroll To Top
Disclaimer
DISCLAIMER: Web reader is currently in beta. Please report any issues through our support system. PDF and Kindle format files are also available for download.

Previous

Next



You are one step away from downloading ebooks from the Succinctly® series premier collection!
A confirmation has been sent to your email address. Please check and confirm your email subscription to complete the download.