Regular Expressions Succinctly®
by Joseph D. Booth

CHAPTER 12

Replacing Text with Regex

The Microsoft .NET Regex class provides a Replace method that updates text based on the matches found by a regular expression. The method has several overloads, which give you a great deal of flexibility in how text is replaced. You can call Replace() on a regex object or as a static method of the Regex class (in which case you pass the regex pattern as a parameter).

Replace()

The simplest variation of Replace takes a source string and replaces every match of the pattern with a replacement string. To remove extra spaces from a string (that is, to replace each run of multiple spaces with a single space), we could create a regex pattern like \s{2,}, which matches any run of two or more white-space characters. We then call the Replace() method with two strings.

Replace(sourceString, ReplaceWith)

This finds all occurrences of the pattern in the source string and replaces each one with the replacement string.

Regex ExtraSpaces = new Regex(@"\s{2,}");

string SourceInput = "Ben  &   Jerry's makes great ice cream...";

string TweakedResult = ExtraSpaces.Replace(SourceInput, " ");

The original string above was: Ben  &   Jerry's makes great ice cream...

And the result string is: Ben & Jerry's makes great ice cream...
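One detail worth keeping in mind: \s matches tabs and line breaks as well as spaces, so the pattern above would also collapse a blank line into a single space. The following is a minimal sketch, assuming you only want to squeeze runs of the literal space character; the class and method names are hypothetical and exist only for this illustration.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Collapses any run of two or more white-space characters
    // (spaces, tabs, line breaks) into a single space.
    public static string CollapseWhitespace(string input)
    {
        return Regex.Replace(input, @"\s{2,}", " ");
    }

    // Collapses only runs of two or more literal space characters,
    // leaving tabs and line breaks untouched.
    public static string CollapseSpaces(string input)
    {
        return Regex.Replace(input, @" {2,}", " ");
    }
}
```

Calling WhitespaceDemo.CollapseSpaces on the Ben & Jerry's string gives the same result as before, while text containing blank lines keeps its line breaks.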

As another example, we can remove all dollar signs and commas from a string so that it can be parsed as a numeric value.

Regex CurrencySymbols = new Regex(@"[$,]");

string SourceInput = "$12,750.00";

string TweakedResult = CurrencySymbols.Replace(SourceInput, "");

// Returns 12750.00

The static method works the same way, except that the pattern is passed as the second parameter, so the currency cleaner using the static method would look like this:

string SourceInput = "$12,750.00";

string TweakedResult = Regex.Replace(SourceInput, @"[$,]", "");

// Returns 12750.00

Using Groups

If your regex pattern creates groups (or sub-patterns), you can reference those groups by either name or number in your replacement text parameter. For example, let’s take a simple phone number regex that only handles phone numbers formatted as “xxx-xxx-xxxx.” We are going to break the area code and the phone number into separate groups. Our regex pattern is as follows:

(?<area>\d{3})-(?<phone>\d{3}-\d{4})

We can then write a replacement string that produces a formatted phone number consisting of parentheses around the area code, a space, and the seven-digit number (with the dash). Because the pattern contains only named groups, they are numbered from left to right, so $1 refers to area and $2 refers to phone.

string SourceInput = "610-867-5309";

string TweakedResult = Regex.Replace(SourceInput,

                        @"(?<area>\d{3})-(?<phone>\d{3}-\d{4})", "($1) $2");

// Returns (610) 867-5309

We can also reference the groups by name instead of by number, using ${group}. This can make the replacement text a bit clearer.

string SourceInput = "610-867-5309";

string TweakedResult = Regex.Replace(SourceInput,

                        @"(?<area>\d{3})-(?<phone>\d{3}-\d{4})",

                        "(${area}) ${phone}");

// Returns (610) 867-5309

This type of replacement comes in very handy when cleaning data to conform to a standard format. We will provide a few examples in our next chapter.

Other Replace() Parameters

In addition to the simple string examples, there are several additional overloads that control the replace operation.

Specifying Regex Options

If you are using the static method to replace strings, you can add a fourth parameter, which specifies the regex options to apply. This allows you to set the options when the replacement is called. When you work with a regex object instead, the options are set in the constructor when you instantiate the object.

Replace(sourceString, Pattern, ReplaceWith, RegexOptions)
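As a rough illustration of the four-parameter static call, the sketch below uses RegexOptions.IgnoreCase to normalize the capitalization of a word. The class name, method name, and sample word are hypothetical and chosen only for this example.

```csharp
using System.Text.RegularExpressions;

public static class OptionsDemo
{
    // Replaces every occurrence of "regex", in any letter case,
    // with the capitalized form "Regex".
    public static string NormalizeCase(string input)
    {
        return Regex.Replace(input, "regex", "Regex", RegexOptions.IgnoreCase);
    }
}
```

Options can be combined with the | operator if more than one is needed, for example RegexOptions.IgnoreCase | RegexOptions.Multiline.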

Specifying Time-out (.NET 4.5 and Above)

If you are using the static method to replace strings, you can add a fifth parameter, which is a TimeSpan object. This sets the maximum time that the regex is allowed to run; if the limit is exceeded, a RegexMatchTimeoutException is thrown. If you plan on allowing your users to create regex patterns, I would recommend adding the time span to prevent a poorly designed pattern from backtracking itself into oblivion.

Replace(sourceString, Pattern, ReplaceWith, RegexOptions, timeSpan)
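One way to use the time-out with a user-supplied pattern is sketched below. The TryReplace helper and its convention of handing back the unchanged input on a time-out are assumptions made for this example, not part of the Regex class; only Regex.Replace, TimeSpan, and RegexMatchTimeoutException come from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TimeoutDemo
{
    // Applies a caller-supplied pattern, giving up once the time limit expires.
    // Returns true and the replaced text on success; returns false and the
    // original input if the regex engine times out.
    public static bool TryReplace(string input, string pattern,
                                  string replacement, TimeSpan limit,
                                  out string result)
    {
        try
        {
            result = Regex.Replace(input, pattern, replacement,
                                   RegexOptions.None, limit);
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            // The pattern ran longer than the allowed time span.
            result = input;
            return false;
        }
    }
}
```

A call such as TimeoutDemo.TryReplace("$12,750.00", @"[$,]", "", TimeSpan.FromSeconds(1), out cleaned) would complete well within the limit. The right limit for your application depends on input size and how much you trust the pattern source.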

Controlling Number of Replacements

When you call Replace() on a regex object, rather than using the static method, you can add a third parameter of type integer. This parameter tells the Replace method the maximum number of replacements it is allowed to make.

Replace(sourceString, ReplaceWith, MaxReplacements)
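A short sketch of limiting the number of replacements follows. The class and method names are hypothetical; the comma pattern is just a convenient sample.

```csharp
using System.Text.RegularExpressions;

public static class CountDemo
{
    // Replaces at most `max` commas with semicolons, working left to right.
    public static string ReplaceFirstCommas(string input, int max)
    {
        Regex comma = new Regex(",");
        return comma.Replace(input, ";", max);
    }
}
```

With the input "a,b,c,d" and a maximum of 2, only the first two commas are replaced and the last one is left alone.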

Controlling Where to Start Searching

When you call Replace() on a regex object, you can also add a fourth parameter of type integer. This parameter is the character position in the input string at which the search begins.

Replace(sourceString, ReplaceWith, MaxReplacements, StartAt)

Note that in this syntax, you can pass -1 as MaxReplacements to indicate no upper limit on the number of replacements allowed.
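To illustrate the starting position together with the -1 convention, here is a minimal sketch that masks every digit from a given position onward. The class name, method name, and sample card-style number are made up for this example.

```csharp
using System.Text.RegularExpressions;

public static class StartAtDemo
{
    // Replaces every digit at or after position `startAt` with an asterisk.
    // Passing -1 as the count means there is no limit on replacements.
    public static string MaskDigitsFrom(string input, int startAt)
    {
        Regex digit = new Regex(@"\d");
        return digit.Replace(input, "*", -1, startAt);
    }
}
```

For the input "4111-1111-1111-1111" and a starting position of 5, the first four digits are left as they are and the rest are masked, giving "4111-****-****-****".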

Summary

The Replace method allows you to create a new string based on the regex pattern and the replacement string. It is very useful for cleaning data and producing standardized results. In the next chapter, we will use regex and Replace() for the common programmer task of cleaning data.
