left-icon

Database Design Succinctly®
by Joseph D. Booth

Previous
Chapter

of
A
A
A

CHAPTER 6

Data Standards

Data Standards


While modeling a database, you should strive to use standards for common attributes, such as addresses, countries, and phone numbers. Doing so will allow other developers to understand your model better and will likely increase your model’s robustness. You can develop your own standard naming conventions, field lengths, data types, and so on. Or you can take advantage of existing standards to apply. In this chapter, we will review some of the common public standards that might help when building your database.

The ISO (International Organization for Standardization) develops and publishes standards for almost everything, such as countries, languages, and phone numbers. If an ISO standard exists, it would be beneficial for your database to use it. This allows other developers to comfortably work with your data and facilitates easier data exchange.

In addition, many government agencies also define standards. If your application needs to interact with a government agency, it would be beneficial to use some of those standards as well. Even if your application isn’t used in government, using the standards can still be helpful for exchanging data with other organizations.

Internal standards

When designing your database, you should define standard values for certain column types. For example, the United States Social Security number (SSN) might be defined as characters, in groups of three digits, a dash, two digits, a dash, and four digits.

You might want to ensure, for example, that state or province codes are always two characters, and your city fields are always 30 characters long. Fortunately, Microsoft SQL Server supports the concept of user-defined data types. These are named data types that can be used to provide column size, defaults, null option, and so on. By using these types consistently, you can keep your column consistent across the tables.

User defined type

To create a user defined type, use the following syntax.

CREATE TYPE [schema].[type_name]

FROM base_type

NULL | NOT NULL

The base type is a standard SQL data type and optional size. You can also indicate whether the field accepts NULL values.

To create an SSN type, we could use the following example.

CREATE TYPE dbo.SSN

FROM

varchar(11) NOT NULL

We can add a check constraint to the type as well. The check constraint must be defined as a rule within SQL Server, and that rule would be bound to the data type. For example, we want to enforce the format of the Social Security number (according to U.S. usage).

Create a rule

A rule in SQL Server is a general-purpose, conditional expression. You create the rule with the following syntax.

CREATE RULE [schema].[rule_name] AS <conditional_expression>

The rule will have one variable associated with it, and it will contain the INSERTED or UPDATED value when the rule is applied. Our rule for the SSN field might look like the following code snippet.

CREATE RULE dbo.SSNCheck

AS

@val LIKE '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]'

This rule says that the value (@val) of the field must be three digits, a dash, two digits, another dash, and four digits. The rule is valid if the conditional expression returns TRUE.

Once the rule is created, you can bind it to the data type using the SP_bindrule stored procedure.

EXEC SP_bindrule 'SSNCheck', 'SSN'

Create a default

SQL Server also allows you to create a DEFAULT rule, which can be applied to a matching column type. For example, let’s assume you always want to default to the state your company is located in. You might create a simple default expression, such as the following.

CREATE DEFAULT stateCode AS ‘PA’

You can then bind that default value to a user-defined data type, as well.

SQL Server SSMS has a dialog box (as shown in Figure 24) that allows you to create user-defined types, rather than relying on SQL scripts.

Dialog box to create user-defined types

Figure 24: Dialog box to create user-defined types

By creating user-defined data types and documenting them, you can ensure that the tables within your database have consistent type, sizes, nullable status, and so on.

User-defined table types

In addition to column user-defined types, you can also create user-defined table types. A user-defined table type is created using the following syntax.

CREATE TYPE <name> as TABLE
(
  Column Definitions
)

The column definitions are the same column definitions that CREATE TABLE uses. For example, you might want to create a table type to be used for lookup tables. One drawback is that you cannot create a new table based on the type, but rather can create a table variable. This would be useful for stored procedures that need temporary tables, and you’d prefer them to be consistently defined.

Note: There are workarounds to create a table from the user-defined table type; however, these solutions typically don’t pick up constraints, defaults, and so on. The following code snippet shows how to create a table based on a user-defined table type.

DECLARE @table AS dbo.UserDefinedTableName

SLECT * INTO dbo.DesiredTable FROM @table WHERE 1=2

This will create the table with the structure, but any constraints or defaults will not be present in the newly created table.

External standards

While defining user types can provide internal consistency in your table design, you should also strive to rely on existing standards for names, data types or sizes, and contents. For example, if you want to keep track of the type of business for a customer, rather than create your own coding scheme, you might want to look at SIC (Standard Industrial Classification) so if you exchange data with an external source, they will likely understand your data values.

ISO standards

The ISO (International Organization for Standardization) was founded in 1947 and is currently headquartered in Geneva, Switzerland. They create and certify standards that are defined by international experts. The standards that are developed are based on the consensus of the experts in the field, and all comments are considered before the standard is certified.

Country codes

ISO Standard 3166 provides two- and three-letter codes, as well as numeric values to represent countries of the world. A few example values are shown in Table 17.  By using either the code or numeric values, other databases can interpret your database columns as countries.

Table 17: Sample ISO-3166 country information

English name

Alpha-2

Alpha-3

Numeric

Belize

BZ

BLZ

084

Canada

CA

CAN

124

Germany

DE

DEU

276

United States

US

USA

840

You can browse the country codes on the ISO website. Search for 3166 and navigate to the country code list.

Currency codes

ISO Standard 4217 provides three-letter and numeric codes to represent various currencies of the world. A few example values are shown in Table 18. By using either the code or numeric values, other databases can interpret your currency type.

Table 18: Sample ISO-4217 currency information

Currency

Code

Numeric

Canadian Dollar

CAN

124

Swiss Franc

CHF

756

Japanese Yen

JPY

392

United States Dollar

USD

840

You can browse the country code on the ISO website. Search for 4217 and navigate to the currency list.

Language codes

ISO 639 provides codes for languages used throughout the world. Table 19 shows sample language data from ISO 639.

Table 19: Sample ISO-639 language codes

Language name

Native name

Code-2

Code-3

Chinese

中文 (Zhōngwén), 汉语漢語

zh

chi

French

Français

fr

fre

Greek

ελληνικά

el

gre

English

English

en

eng

Note that there are three versions of the three-character code (standards 639-2/T, 639-2/B, and 639-3). Be sure to communicate which version you are using when interfacing with external applications.

The country, language, and currency code standards are freely available from ISO. Most of the other standards are copyright protected. If you plan on using other standards from ISO, be sure to adhere to the appropriate copyright laws.

Wikipedia provides documentation and a good visual representation on these free standards:

You can also download the standards at many different websites. I would suggest searching GitHub for downloadable files.

Phone number standards

ISO does not publish phone number standards. However, the ITU (International Telecommunications Union) has created a standard called E.164. It basically breaks a phone number into two components: a 1–4-digit country code followed by the phone number itself.  Based on this, we might want to break the phone field into two fields (country code and number):

  • CCC varchar(4)
  • PhoneNumber varchar(15)

One approach to deal with phone numbers is to store just the digits in the database field, and then format the number based on the country code. Code Listing 2 shows a sample SQL computed field to format the phone number field based on the ISO country code. 

Code Listing 2: Computed phone number display

create table dbo.SamplePhone

( ISOCode      char(2),

  phoneNumber  varchar(15),

  formattedPhone

  as

  (case

   -- United States

   when ISOCode ='US'

then '+01 ('+substring(phoneNumber,1,3)+') '+

                   substring(phonenumber,4,3)+'-'+

                   substring(phonenumber,7,15)

   -- Mexico

   when ISOCode ='MX'

      then '+52 ('+substring(phoneNumber,1,2)+') '+

                  substring(phoneNumber,3,4)+' '+

                substring(phoneNumber,7,4)+' '

   -- Germany

   when ISOCode ='DE'

      then '+49 '+substring(phoneNumber,1,3)+' '+

                  substring(phonenumber,4,15)              

   else phonenumber

   end

   )

)

Note: Some database purists might rightfully point out that the formatting of the phone number could be done in the front-end. This is one approach; however, for a large collection of applications, it is possible that not all developers will approach formatting the same way. Having the database system format the number at least ensures it will be display consistently across various applications.

Address standards

No official organization publishes address standards since they vary so much by country. It is one of those thorny data processing issues that has many third-party solutions, but no defined standard. There are multiple approaches to handling addresses in the database.

Be lazy

With this approach, you simply add 3-4 lines called Address_Line_X, set them to a reasonable size, and let the application users determine the best way to write out the address. While it can work, it introduces a few problems:

  • There’s no easy way to identify duplicate addresses.
  • A lot of expectation is placed on the application users.
  • Reporting is difficult, and determining a state code consistently would be very hard to do.

Pick a country

If all your addresses are within a single country, it becomes easier. Most country post offices and government agencies define what address formats should look like. For the good news, the United States Postal Service provides documentation on addressing standards.

The bad news is the documentation is over 200 pages long. In general, the USPS format is 4–5 lines:

  • Line 1: Information/attention
  • Line 2: Company name
  • Line 3: Delivery address
  • Line 4: City, state, and zip code
  • Line 5: Country (only if outside U.S.)

The post office has considerable flexibility in handling addresses. The delivery address could be a P.O. Box, a rural route, a number, and street name. Mail can be delivered even without the zip code.

If you need to determine other country address formats, consider BitBoost Systems. They allow you to get formatting information for many countries. You can also consider the Universal Postal Union (a United Nations division). They provide address formatting information for most member nations.

If you know you’ll only mail to one country, determine the format, and create individual fields to hold the elements, such as city, state, and postal code.

Multiple countries

If you must support mailing to multiple countries, the concept is the same as a single country, except you need to make sure you have fields for all possible address components. You will also likely need to loosen validations on the postal code field, since postal codes vary by country.

For sake of example, let’s assume you’ve added the following fields:

  • Contact
  • Number
  • Street
  • City
  • Postal
  • State

I would then suggest creating a table of address templates, such as those shown in Table 20.

Table 20: Sample address templates

Country

Line 1

Line 2

Line 3

Mexico

~Contact~

~Street~ ~Number~

~Postal~ ~City~

France

~Contact~

~Number~ ~Street~

~Postal~ ~City~

United States

~Contact~

~Number~ ~Street~

~City~ ~State~ ~Postal~

You would then join the record with the template table, and either allow a SQL stored procedure to format the address via the template or ask the application developers to write that code.

Government standards

Many government agencies have standard definitions already created, and typically you will be required to use these for any data exchange with a government entity. In this section, we will look at some example government standards across the globe.

United States

The U.S. government must love standards, because it writes so many of them. Here are a few you might need to use.

(FIPS) codes for states and counties

The FIPS (Federal Information Processing System) can be used to identify geographical regions of the country. There are state-level FIPS codes, which are two digits long; and there are county-level FIP codes, which are five digits long (the first two characters are the state FIPS code). If you need this level of addressing detail, you could include the FIPS code for a county, to help locate and group your tables by county.

The FIPS codes state/county list can be found here.

SIC codes

SIC (Standard Industry Classification) codes are used to categorize a business or entity by the type of work it does. There are four-digit numbers assigned by the U.S. government. If you are designing a database where the company business type is needed, the SIC code might a good standard to categorize your companies, particularly if only working with U.S. companies. SIC codes were first established in 1937 and were updated in 1987 as the economy changed. To this day, SIC codes are still being updated. It is still a very popular categorization system for businesses.

Find out more about SIC codes.

NAICS codes

The NAICS (North American Industry Classification System) codes are like SIC codes, but used by Canada, the U.S., and Mexico. NAICS codes are 6 digits long. The government stopped updating SIC codes in 1987 (although some private companies still do) in favor of the NAICS codes. The NAICS codes allow for a more precise business breakdown than the SIC code.

A NAICS code can be broken down to various levels of details. The top level (first two digits) represents the sector, such as information, construction, finance, and health care. Each subsequent digit further breaks down the category. The fifth digit provides the industry, and the sixth digit (final level) provides the U.S. industry. For example, the NAICS code of 511210 is for software publishers.

You can learn more about NAICS codes and the closely related NAPCS (North American Product Classification System) codes here.

If you are designing a new database that needs industry classification, consider using the NAICS codes. However, the SIC codes are still very common, so you might come across them in existing database systems.

European classification codes

NACE (Nomenclature of Economic Activities) is the European version of SIC/NAICS codes.  The code begins with a section letter (B=fishing, D=manufacturing, and so on). The next two digits are a division (such as health), and the numbers after the decimal break it down further into group and class. For example, the code Q86.23 refers to dental practice. 

Most of the countries in Europe and the United Kingdom use NACE codes for business classification. If you are designing databases for the European countries, you will likely need to work with NACE codes.

You can learn more about NACE codes here

ISIC codes

SIC codes were originally developed by the United States, but in 1948, the United Nations created a similar set of codes called ISIC codes (International Standard Industrial Classification). The concept is like the United States SIC codes but used more often by member nations. The NACE code system is based on the ISIC codes.

You can learn about ISIC codes here.

You can convert codes online between the two systems using this website.

Whether you use NAICS codes or ISIC codes for business classification depends on your company list. With the United States only, NAISC codes are commonly recognized. If your market is international, you should consider ISIC or NACE codes instead.

Summary

In this chapter, we discussed internal standards and using user-defined data types to keep the database columns consistent. We also discussed a few examples of external standards.  Depending on your application and industry, you should strive to use standards so that any data exchange with external companies are “speaking the same language.”

Scroll To Top
Disclaimer
DISCLAIMER: Web reader is currently in beta. Please report any issues through our support system. PDF and Kindle format files are also available for download.

Previous

Next



You are one step away from downloading ebooks from the Succinctly® series premier collection!
A confirmation has been sent to your email address. Please check and confirm your email subscription to complete the download.