Regular Expressions Succinctly®
by Joseph D. Booth

CHAPTER 6

Alternation

Alternation is the ability in a regex to match one of a list of choices. For example, if we want to write a regex to match a URL, we might tackle it as shown in Table 10:

Table 10: Alternation Rules

English rule                                       Regex pattern
Begins with www                                    www
Need a period                                      \.
Then one or more word characters                   \w{1,}
Another period                                     \.
One of the TLDs net, com, org, edu, info           (com|net|org|edu|info)

With the proliferation of TLDs (top-level domains), the English rules above are limited, but they illustrate the concept of alternation. Alternation provides a list of alternatives to consider, enclosed in parentheses and separated by the pipe character |. The choices in the list do not have to be the same length.
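As a minimal sketch, the rows of Table 10 can be joined into one pattern and tried out in Python. The names URL_PATTERN and find_tld, and the sample hostnames, are hypothetical and chosen only for illustration.

```python
import re

# Pattern assembled from the rows of Table 10.
URL_PATTERN = re.compile(r"www\.\w{1,}\.(com|net|org|edu|info)")


def find_tld(text):
    """Return the TLD picked by the alternation, or None when nothing matches."""
    match = URL_PATTERN.search(text)
    return match.group(1) if match else None


print(find_tld("Visit www.example.org today"))  # org
print(find_tld("Visit www.example.xyz today"))  # None
```

The second call returns None because xyz is not one of the listed choices.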

Alternation ()

Alternation provides a list of choices, one of which the search text must contain. In its simplest form, the list is simply a list of words to match. If we wanted to expand the URL pattern to include FTP sites as well as websites, we could replace the www in the pattern with (www|ftp).
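The (www|ftp) substitution could be sketched in Python like this. HOST_PATTERN and classify_host are hypothetical names, and the hostnames are made-up samples.

```python
import re

# The URL pattern with the leading www replaced by the alternation (www|ftp).
HOST_PATTERN = re.compile(r"(www|ftp)\.\w{1,}\.(com|net|org|edu|info)")


def classify_host(text):
    """Return (prefix, tld) for the first match in text, or None."""
    match = HOST_PATTERN.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(classify_host("ftp.example.net"))   # ('ftp', 'net')
print(classify_host("mail.example.com"))  # None
```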

In addition, the items between the pipe characters do not simply have to be words; they can be patterns to search for. Imagine we want a regex pattern to find our assigned help desk person, who we know can be John, Sue, or Bill. One approach to find the technician might be to list all of the variations of their names: (John|Jon|Sue|Susan|Bill|Will). While such an approach can work, we can also use patterns to improve the likelihood of a match.

Patterns in Alternation

As a simple example, let’s find John, Jon, Sue, Susie, Suzie, Bill, or Will.

Table 11: Alternation Patterns

English rule        Regex pattern
John or Jon         Joh{0,1}n
OR                  |
Sue                 Sue
OR                  |
Susie or Suzie      Su[sz]ie
OR                  |
Will or Bill        [WB]ill
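Joining the rows of Table 11 gives a single pattern. A minimal Python sketch of that pattern follows; NAME_PATTERN, find_technicians, and the sample sentence are hypothetical.

```python
import re

# The rows of Table 11 joined with the pipe character.
NAME_PATTERN = re.compile(r"Joh{0,1}n|Sue|Su[sz]ie|[BW]ill")


def find_technicians(text):
    """Return every name variant found in text, in order of appearance."""
    return NAME_PATTERN.findall(text)


print(find_technicians("Jon and Suzie are out; ask Will or Sue."))
# ['Jon', 'Suzie', 'Will', 'Sue']
```

The pattern has no word boundaries, so it would also find Bill inside a longer word such as Billboard. Whether that matters depends on the text being searched.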

The regex alternation pattern becomes Joh{0,1}n|Sue|Su[sz]ie|[BW]ill. While the ability to add patterns in addition to literal strings is very powerful, it also can introduce some subtle issues. Looking at the URL pattern we wrote at the beginning of this chapter:

www\.\w{1,}\.(net|com|org|edu)

We could have written it without the parentheses: www\.\w{1,}\.net|com|org|edu. However, the regex engine would interpret this differently. Without the parentheses, the pattern reads as follows:

Find a string that begins with www, followed by a period, one or more word characters, another period, and the word net; OR find the word com; OR the word org; OR the word edu. Everything to the left of the first pipe forms a single alternative, so all four of the following lines produce a match, which is probably not what we were expecting.

www.facebook.com    (match: only the trailing "com")
www.yahoo.net       (match: the whole string)
comic books         (match: "com")
comedy              (match: "com")

By adding the parentheses, the regex www\.\w{1,}\.(net|com|org|edu) returns the expected result, i.e. the two websites.

www.facebook.com    (match)
www.yahoo.net       (match)
comic books         (no match)
comedy              (no match)
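The difference between the two patterns can be checked with a short Python sketch. UNGROUPED, GROUPED, and matched_text are hypothetical names. The four sample strings are the ones used in this section.

```python
import re

SAMPLES = ["www.facebook.com", "www.yahoo.net", "comic books", "comedy"]

# Without parentheses, the pipe splits the whole pattern into four alternatives.
UNGROUPED = re.compile(r"www\.\w{1,}\.net|com|org|edu")
# With parentheses, the alternation applies only to the TLD.
GROUPED = re.compile(r"www\.\w{1,}\.(net|com|org|edu)")


def matched_text(pattern, samples):
    """Map each sample to the text the pattern matched, or None."""
    results = {}
    for sample in samples:
        match = pattern.search(sample)
        results[sample] = match.group(0) if match else None
    return results


print(matched_text(UNGROUPED, SAMPLES))
print(matched_text(GROUPED, SAMPLES))
```

With the ungrouped pattern, every sample matches, but three of them match only the letters com. With the grouped pattern, only the two websites match.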

Note: Using parentheses in an expression creates a group, which we will cover in more detail in a later chapter. Groups come in handy when we want to use a regular expression to break a larger string into components and manipulate those components. We've been focusing on searching with regex in these first few chapters, where the group is less important.

Resolving Alternation

One caveat when working with alternation is to make sure your choices can be reached. Alternation will generally be resolved left to right. If some of the leftmost patterns are always matched, the alternation will stop rather than looking for a better match later in the list. For example, if we wrote the following pattern:

\d*|[A-Fa-f]*\d*|0X\d*

It might look like a valid pattern to find either a numeric value or a hex number. However, the \d* at the beginning means zero or more digits. Any string at all will match zero digits, so this pattern will always resolve to the leftmost item in the list.
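A small Python sketch makes the problem visible, using the pattern written as \d*|[A-Fa-f]*\d*|0X\d*. SPECIFIC_FIRST is a hypothetical reordering added here for comparison, not a pattern from the book and not a complete hex matcher. It assumes the fix is to list the most specific choice first and to require at least one character from every choice.

```python
import re

# The pattern from the text: the first alternative can match nothing at all.
GREEDY_FIRST = re.compile(r"\d*|[A-Fa-f]*\d*|0X\d*")

# Hypothetical reordering: most specific alternative first, and every
# alternative must consume at least one character.
SPECIFIC_FIRST = re.compile(r"0X\d+|[A-Fa-f]+\d*|\d+")


def first_match(pattern, text):
    """Return the text matched at the start of text, or None."""
    match = pattern.match(text)
    return match.group(0) if match else None


print(repr(first_match(GREEDY_FIRST, "0X10")))    # '0'
print(repr(first_match(GREEDY_FIRST, "FF")))      # ''
print(repr(first_match(SPECIFIC_FIRST, "0X10")))  # '0X10'
print(repr(first_match(SPECIFIC_FIRST, "FF")))    # 'FF'
```

With the original ordering, the second and third alternatives are never reached: \d* succeeds first, even when it matches an empty string.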

Summary

Alternation lets you match any one of a delimited list of patterns. The alternation metacharacter is the | (vertical pipe), which separates the patterns in the list. Sometimes you'll need to wrap the alternation in parentheses to make sure the engine interprets the pattern the way you intend.
