CHAPTER 16
If you make an error in your regex syntax, the Microsoft engine throws an exception of type System.ArgumentException (the general class for invalid method arguments). Although the exception carries little extra detail, its message text is generally quite descriptive of the regex error.
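As a rough illustration of the behavior described above, the following sketch compiles one malformed and one well-formed pattern and reports the engine's message. The class name SyntaxErrorDemo and the helper TryCompile are hypothetical names introduced for this example only, and the exact message wording depends on the .NET version.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class SyntaxErrorDemo
{
    // Returns true when the pattern compiles; otherwise returns false and
    // copies the engine's descriptive message into 'error'.
    public static bool TryCompile(string pattern, out string error)
    {
        try
        {
            new Regex(pattern);
            error = "";
            return true;
        }
        catch (ArgumentException ex)
        {
            error = ex.Message;
            return false;
        }
    }

    public static void Main()
    {
        // The first pattern is missing its closing parenthesis.
        foreach (string pattern in new[] { @"(\d{3}", @"(\d{3})" })
        {
            if (TryCompile(pattern, out string error))
                Console.WriteLine(pattern + " is valid");
            else
                Console.WriteLine(pattern + " is invalid: " + error);
        }
    }
}
```

Validating a user-supplied pattern this way, before it is used for matching, keeps the syntax error close to the place where the pattern was entered.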
Several types of errors can occur when processing regular expressions. These include:
An ArgumentNullException is thrown when the regular expression pattern is null. It derives from System.ArgumentException, so you can add a separate catch block for it ahead of the more general handler if you want to distinguish between null patterns and invalid syntax errors.
Note: A null pattern will trigger the exception; an empty string will not. An empty string is technically a valid regex pattern, although it only ever produces zero-length (empty) matches.
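The difference between a null pattern and an empty pattern can be checked with a short sketch like the one below. The class NullVersusEmptyDemo and its Describe method are hypothetical names used only for this illustration; the test input "abc" is likewise arbitrary.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class NullVersusEmptyDemo
{
    // Reports how the Regex constructor treats the given pattern.
    public static string Describe(string pattern)
    {
        try
        {
            Regex expr = new Regex(pattern);
            Match m = expr.Match("abc");
            return $"valid; Success={m.Success}, Length={m.Length}";
        }
        catch (ArgumentNullException)
        {
            // Must come before ArgumentException, its base class.
            return "null pattern";
        }
        catch (ArgumentException)
        {
            return "invalid syntax";
        }
    }

    public static void Main()
    {
        Console.WriteLine(Describe(null)); // null pattern
        Console.WriteLine(Describe(""));   // valid; Success=True, Length=0
    }
}
```

The empty pattern compiles and reports a successful match, but the match has length zero, so it carries no text from the input.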
A RegexMatchTimeoutException is thrown when a matching operation exceeds its time-out interval. Starting with .NET 4.5, you can specify a time-out in the Regex constructor; older versions have no time limit.
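A minimal sketch of the time-out mechanism follows, assuming .NET 4.5 or later. The class TimeoutDemo, the method MatchWithTimeout, the 200 ms limit, and the nested-quantifier pattern are all choices made for this illustration. Whether the sample pattern actually exceeds the limit depends on the runtime's regex engine and its optimizations, so the code reports either outcome rather than assuming one.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class TimeoutDemo
{
    // Runs IsMatch with a time-out and reports how the call ended.
    public static string MatchWithTimeout(string pattern, string input, TimeSpan timeout)
    {
        try
        {
            // The three-argument constructor sets the time-out for this Regex.
            Regex expr = new Regex(pattern, RegexOptions.None, timeout);
            return expr.IsMatch(input) ? "matched" : "no match";
        }
        catch (RegexMatchTimeoutException ex)
        {
            return $"timed out after {ex.MatchTimeout.TotalMilliseconds} ms";
        }
    }

    public static void Main()
    {
        // Nested quantifiers can backtrack heavily on input that almost matches.
        string input = new string('a', 40) + "!";
        TimeSpan limit = TimeSpan.FromMilliseconds(200);
        Console.WriteLine(MatchWithTimeout(@"^(a+)+$", input, limit));
    }
}
```

A time-out is most useful when the pattern or the input comes from an untrusted source, since it bounds how long a single match attempt can run.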
You can use a standard try/catch block to capture any errors in regular expression patterns.
    // Requires: using System; using System.Text.RegularExpressions;
    try
    {
        string pattern = @"\b((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}\b";
        string source = "(610) 555-1212";
        Regex theExpr = new Regex(pattern);
        Match theMatch = theExpr.Match(source);
        if (theMatch.Success)
        {
            // Do some processing…
        }
    }
    catch (ArgumentNullException ex)
    {
        // Pattern is null.
    }
    catch (RegexMatchTimeoutException ex)
    {
        // Expression exceeded the time-out limit (.NET 4.5).
    }
    catch (Exception ex)
    {
        // ex.Message contains descriptive text of the syntax error.
    }
If you are working with the match object (or a group object within a match), then one issue to be aware of is how groups are handled compared to how captures are handled.
Imagine we have a simple regular expression to extract the numeric portion of a day from a calendar. The calendar day might be entered as a number, such as 4, or it might be entered with text afterwards, such as 22nd. We are using the following regex pattern:
(?<digit>[0-9]+)(?:st|nd|rd|th)?
We use this regex to extract just the numeric portion. Our code looks like the following:
    Regex theExpr = new Regex(pattern);
    Match theMatch = theExpr.Match(source);
    if (theMatch.Success)
    {
        // Do some processing…
        int theDigit = Convert.ToInt16(theMatch.Groups["digit"].Value);
    }
This code will work as expected, and will attempt to convert the value of the group into an integer. However, the following code will not raise an error, but also won’t give you the expected results.
    string theDay = theMatch.Groups["Digit"].Value;
    string theLetters = theMatch.Groups[2].Value;
When an invalid group name or number is specified, a group object is returned with the Success property set to false and the Value property set to an empty string. In the first case, the group name was “digit,” not “Digit” (case-sensitive), so the group is an empty non-match. In the second case, there is no group number two, so the empty group is returned.
Tip: Remember that group names are case-sensitive, and if the group is not found, an empty group is returned rather than an exception being thrown.
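Because a missing group fails silently, one defensive option is to check the group's Success property before using its value. The sketch below applies that check to the day pattern from this section. The class GroupLookupDemo, the method ExtractDay, and the use of -1 as a "not found" marker are assumptions made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class GroupLookupDemo
{
    private static readonly Regex DayExpr =
        new Regex(@"(?<digit>[0-9]+)(?:st|nd|rd|th)?");

    // Returns the day number, or -1 when the requested group is missing
    // or did not take part in the match.
    public static int ExtractDay(string source, string groupName)
    {
        Match theMatch = DayExpr.Match(source);
        Group theGroup = theMatch.Groups[groupName];
        if (!theMatch.Success || !theGroup.Success)
        {
            return -1;
        }
        return int.TryParse(theGroup.Value, out int day) ? day : -1;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractDay("22nd", "digit")); // 22
        Console.WriteLine(ExtractDay("22nd", "Digit")); // -1 (wrong case)
    }
}
```

Checking Success makes the misspelled group name visible at the call site instead of letting an empty string flow into later conversions.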
The Captures collection behaves like most .NET collections. If you attempt to reference an invalid capture element, you’ll trigger an ArgumentOutOfRangeException. Also, keep in mind that the Captures collection is zero-based, while in the Groups collection index 0 holds the entire match, so the first capturing group is found at index 1.
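The indexing differences can be seen in a short sketch that reuses the day pattern from earlier in this section. The class CaptureIndexDemo and the helper IsValidCaptureIndex are hypothetical names introduced for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class CaptureIndexDemo
{
    public const string Pattern = @"(?<digit>[0-9]+)(?:st|nd|rd|th)?";

    // Returns false when the Captures indexer rejects the index.
    public static bool IsValidCaptureIndex(Group group, int index)
    {
        try
        {
            _ = group.Captures[index];
            return true;
        }
        catch (ArgumentOutOfRangeException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Match theMatch = Regex.Match("22nd", Pattern);
        Group digit = theMatch.Groups["digit"];

        Console.WriteLine(theMatch.Groups[0].Value);       // 22nd (entire match)
        Console.WriteLine(theMatch.Groups[1].Value);       // 22 (first capturing group)
        Console.WriteLine(digit.Captures[0].Value);        // 22 (first capture)
        Console.WriteLine(IsValidCaptureIndex(digit, 5));  // False
        Console.WriteLine(theMatch.Groups[2].Success);     // False, no exception
    }
}
```

The last two lines show the contrast: an out-of-range capture index throws, while an out-of-range group index quietly returns an unsuccessful group.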