CHAPTER 12
The .NET Regex class (in the System.Text.RegularExpressions namespace) provides a Replace method to update text based on the results of a regular expression. This method has a number of overloads that provide a great deal of flexibility in replacing text. Replace() can be called on a Regex object or as a static method of the Regex class (which requires passing the regex pattern as a parameter).
The simplest variation of Replace takes a source string and replaces every match of the pattern with a replacement string. If we wanted to remove extra spaces from a string (replace each run of multiple spaces with a single space), we could create a regex pattern like \s{2,}, which matches any run of two or more white-space characters. Note that \s also matches tabs and line breaks, not only the space character. We then call the Replace() method with two strings.
Replace(sourceString, ReplaceWith)
This finds every occurrence of the pattern and replaces each one with the replacement string.
Regex ExtraSpaces = new Regex(@"\s{2,}");
string SourceInput = "Ben  &  Jerry's   makes great    ice cream...";
string TweakedResult = ExtraSpaces.Replace(SourceInput, " ");
The original string above was: Ben  &  Jerry's   makes great    ice cream...
And the result string is: Ben & Jerry's makes great ice cream...
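Because \s matches any white-space character, the pattern above also collapses tabs and line breaks. The following is a minimal sketch of the difference between \s{2,} and a pattern that targets only the space character; the input string and variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: a tab followed by a space, then a run of three spaces.
string sourceInput = "name:\t value   end";

// \s{2,} treats the tab and the space as a single run of white space.
Regex anyWhiteSpaceRun = new Regex(@"\s{2,}");
string anyWhiteSpace = anyWhiteSpaceRun.Replace(sourceInput, " ");

// " {2,}" only collapses runs of the space character, so the tab survives.
Regex spaceRun = new Regex(" {2,}");
string spacesOnly = spaceRun.Replace(sourceInput, " ");

Console.WriteLine(anyWhiteSpace); // name: value end
Console.WriteLine(spacesOnly);    // "name:\t value end" (the tab is still present)
```

Which pattern is appropriate depends on whether tabs and line breaks in the input should be preserved.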
As another example, we can remove all dollar signs and commas from a string so that it can be parsed as a numeric value.
Regex CurrencySymbols = new Regex(@"[$,]");
string SourceInput = "$12,750.00";
string TweakedResult = CurrencySymbols.Replace(SourceInput, ""); // Returns 12750.00
The static method works the same way, except that the pattern is passed as the second argument, between the source string and the replacement string. The currency cleaner using the static method would look like this:
string SourceInput = "$12,750.00";
string TweakedResult = Regex.Replace(SourceInput, @"[$,]", ""); // Returns 12750.00
If your regex pattern creates groups (or sub-patterns), you can reference those groups by either name or number in your replacement text parameter. For example, let’s take a simple phone number regex that only handles phone numbers formatted as “xxx-xxx-xxxx.” We are going to break the area code and the phone number into separate groups. Our regular expression is as follows:
(?<area>\d{3})-(?<phone>\d{3}-\d{4})
We can write a replacement string that produces a formatted phone number consisting of parentheses around the area code, a space, and the seven-digit number (with the dash). Named groups are also assigned numbers from left to right, so here $1 refers to the area group and $2 to the phone group.
string SourceInput = "610-867-5309";
string TweakedResult = Regex.Replace(SourceInput, @"(?<area>\d{3})-(?<phone>\d{3}-\d{4})", "($1) $2"); // Returns (610) 867-5309
We can also reference the groups by name instead of by number, using the ${name} syntax. This can make the replacement text a bit clearer.
string SourceInput = "610-867-5309";
string TweakedResult = Regex.Replace(SourceInput, @"(?<area>\d{3})-(?<phone>\d{3}-\d{4})", "(${area}) ${phone}"); // Returns (610) 867-5309
This type of replacement comes in very handy when cleaning data to conform to a standard format. We will provide a few examples in our next chapter.
In addition to the simple string examples, there are a number of additional overloaded methods to control the replace operation.
If you are using the static method for replacement of strings, you can add a fourth parameter, a RegexOptions value that specifies the options to apply. This allows you to set the options when the replacement is called; with a regex object, the options are instead set in the constructor when you instantiate it.
Replace( sourceString, Pattern, ReplaceWith, RegexOptions )
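As a small illustration of the options parameter, the sketch below compares the three-argument static call with the four-argument one using RegexOptions.IgnoreCase. The input string and variable names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input in which the same word appears with different casing.
string sourceInput = "Colour, COLOUR, and colour";

// Without options the match is case-sensitive, so only the last word changes.
string caseSensitive = Regex.Replace(sourceInput, "colour", "color");

// RegexOptions.IgnoreCase lets the same pattern match all three spellings.
string ignoreCase = Regex.Replace(sourceInput, "colour", "color", RegexOptions.IgnoreCase);

Console.WriteLine(caseSensitive); // Colour, COLOUR, and color
Console.WriteLine(ignoreCase);    // color, color, and color
```

Note that the replacement text is inserted literally, so every match becomes lowercase "color" regardless of how the original was cased.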
If you are using the static method for replacement of strings, you can also add a fifth parameter, which is a TimeSpan object. This sets the maximum time the regex is allowed to run; if the limit is exceeded, a RegexMatchTimeoutException is thrown. If you plan on allowing your users to create regex patterns, I would recommend adding the time span to prevent a poorly designed pattern from backtracking itself into oblivion.
Replace( sourceString, Pattern, ReplaceWith, RegexOptions, timeSpan )
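One way to use the timeout overload is sketched below. The pattern stands in for something a user might type, the 250 ms budget is an arbitrary choice, and falling back to the unmodified input is only one possible way to handle the exception. Whether this particular pattern actually times out may depend on the .NET version in use.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical user-supplied pattern with nested quantifiers, which can
// backtrack heavily on input that almost matches.
string userPattern = @"^(a+)+$";
string sourceInput = new string('a', 40) + "!";

string result;
try
{
    // Allow the engine at most 250 ms (an arbitrary budget for this sketch).
    result = Regex.Replace(sourceInput, userPattern, "", RegexOptions.None,
                           TimeSpan.FromMilliseconds(250));
}
catch (RegexMatchTimeoutException)
{
    // Fall back to the unmodified input when the pattern runs too long.
    result = sourceInput;
}

// The input ends in "!", so the pattern cannot match; the result equals the
// input whether the call completed or timed out.
Console.WriteLine(result == sourceInput);
```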
When calling Replace on a regex object, rather than using the static methods, you can add a third parameter of type integer. This parameter tells the Replace method the maximum number of replacements it is allowed to make.
Replace( sourceString, ReplaceWith, MaxReplacements )
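A brief sketch of the replacement limit, reusing the phone number from the earlier examples (the variable names are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

Regex dashes = new Regex("-");
string sourceInput = "610-867-5309";

// Limit the operation to one replacement: only the first dash is changed.
string firstOnly = dashes.Replace(sourceInput, ".", 1);

Console.WriteLine(firstOnly); // 610.867-5309
```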
When calling Replace on a regex object, you can also add a fourth parameter of type integer. This parameter gives the zero-based character position in the input string at which the search begins.
Replace( sourceString, ReplaceWith, MaxReplacements, StartAt )
Note that in this syntax, you can pass -1 as MaxReplacements to indicate no upper limit on the number of replacements allowed.
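The start position and the -1 limit can be combined, as in the following sketch, which masks the digits of a phone number while leaving the area code readable. The masking scenario and variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

Regex digits = new Regex(@"\d");
string sourceInput = "610-867-5309";

// Start searching at index 4, just past the area code and its dash
// (indexes 0 through 3), and pass -1 so every later digit is replaced.
string masked = digits.Replace(sourceInput, "#", -1, 4);

Console.WriteLine(masked); // 610-###-####
```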
The Replace method allows you to create a new string based on the regex pattern and the replacement string. It is very useful for cleaning data and creating standardized results. In the next chapter, we will use regex and Replace() for the common programmer task of cleaning data.