C# simplifies regular expressions (very slightly)
I like the Microsoft implementation of regular expressions in C#. I think they learned from Java’s mistake, and built something a tiny bit cleaner and easier. The key differences are:
- No special static “factory method” is needed to create a regular expression
- The IsMatch method of the Regex class handles quick-and-dirty searches
- An array-syntax “indexer” retrieves the groups captured by a match
A brief overview of the regular expression
Let’s say I have a text file that contains key-value pairs, one per line, with quoted values, in the form:
key1 = "value1"
key2 = "value2"
…and I want to capture each key and value, stripping out all of the rest of the text. The key has no leading space.
This is exactly the sort of thing that regular expressions are designed to do, in any language that supports them. Here is the regular expression I have set up for this purpose:
^([^=]+?)\s*=\s*["](.*?)["]\s*?$
| Token | Meaning |
|---|---|
| ^ | start at the beginning of the string |
| ([^=]+?) | grab one or more characters that aren’t an equals sign, as few as possible (and store in group 1) |
| \s*=\s* | followed by the equals sign, which may be surrounded by zero or more whitespace characters |
| ["] | followed by a double-quote character |
| (.*?) | followed by zero or more characters of any kind, as few as possible (and store in group 2) |
| ["] | followed by a double-quote character |
| \s*? | followed by zero or more whitespace characters |
| $ | until you get to the end of the string |
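The lazy +? on the key group is what keeps the spaces before the equals sign out of the key. A greedy [^=]+ would swallow them, because the following \s* is happy to match nothing. The Java sketch below runs both versions against one sample line; the class name LazyKeyDemo and the key helper are illustrative only, not part of any API:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyKeyDemo {
    // The pattern from this article: the key group is lazy ([^=]+?).
    static final Pattern LAZY = Pattern.compile("^([^=]+?)\\s*=\\s*[\"](.*?)[\"]\\s*?$");
    // The same pattern with a greedy key group ([^=]+), for comparison.
    static final Pattern GREEDY = Pattern.compile("^([^=]+)\\s*=\\s*[\"](.*?)[\"]\\s*?$");

    // Hypothetical helper: returns group 1 (the key) if the line matches, else null.
    static String key(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "key1 = \"value1\"";
        System.out.println("[" + key(LAZY, line) + "]");   // prints [key1]
        System.out.println("[" + key(GREEDY, line) + "]"); // prints [key1 ]  (trailing space kept)
    }
}
```

The greedy version succeeds on its first attempt, so the engine never backtracks to hand the trailing space back to \s*; the lazy version grows the key one character at a time and stops as soon as \s*= can match.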
In Java, you use the following objects to create and apply a regular expression:
| Object | Role |
|---|---|
| String | Contains the text of the regular expression. Backslashes and double quotes must be escaped inside a Java string literal. |
| Pattern.compile( String strRegex ) (static method) | Compiles a regular expression, returning a Pattern object to use. |
| Pattern | The Pattern.matcher( String strInput ) method returns a Matcher object. |
| Matcher | Tests for success or failure, and traverses the matches found. |
Here is the example in Java:
private static final String REGEX_EQUALS = "^([^=]+?)\\s*=\\s*[\"](.*?)[\"]\\s*?$";
…
Pattern patternEquals = Pattern.compile( REGEX_EQUALS );
Matcher matcherEquals = patternEquals.matcher( strInput );
if ( matcherEquals.find() ) {
String key = matcherEquals.group(1); // group(0) is the whole match; the first capture group is group(1)
String value = matcherEquals.group(2);
…
}
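For anyone who wants to run the Java version end to end, here is a minimal self-contained sketch. It assumes the file contents have already been read into a String; the class name KeyValueParser and the parse helper are hypothetical. It applies the pattern one line at a time, so ^ and $ anchor to each line, and collects the pairs into a map in file order:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueParser {
    // The article's expression, compiled once and reused.
    private static final Pattern PATTERN_EQUALS =
            Pattern.compile("^([^=]+?)\\s*=\\s*[\"](.*?)[\"]\\s*?$");

    // Parses lines of the form  key = "value"; non-matching lines are skipped.
    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>(); // preserves file order
        Matcher matcher = PATTERN_EQUALS.matcher("");       // one Matcher, reset per line
        for (String line : text.split("\\R")) {              // \R = any line break (Java 8+)
            matcher.reset(line);
            if (matcher.find()) {
                result.put(matcher.group(1), matcher.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "key1 = \"value1\"\nkey2 = \"value2\"\n# not a pair\n";
        System.out.println(parse(text)); // prints {key1=value1, key2=value2}
    }
}
```

Working line by line matters here: [^=] also matches newline characters, so if the same pattern were run over the whole file with Pattern.MULTILINE, a line without an equals sign (such as the comment above) could be swallowed into the next key.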
Here is how this same process is implemented in C#; there are very few differences, but I like the C# implementation a little bit better:
| Object | Role |
|---|---|
| String | Contains the text of the regular expression. Special characters may require escapes (a verbatim @"..." string avoids doubling backslashes). |
| Regex | The regular expression. Its constructor takes the string to use as the expression. |
| Match | Contains the results of a match. Returned by the Match method of a Regex object. |
Here is the example using C#.Net:
private const string REGEX_EQUALS = "^([^=]+?)\\s*=\\s*[\"](.*?)[\"]\\s*?$";
…
Regex regexEquals = new Regex( REGEX_EQUALS );
Match matchEquals = regexEquals.Match(input);
if ( matchEquals.Success ) {
String key = matchEquals.Groups[1].Value;
String value = matchEquals.Groups[2].Value;
…
}
I suppose it isn’t really that big of a change. In Java, you have to get a Pattern from the static factory method Pattern.compile( String strRegex ), while in .NET you just call an ordinary constructor. But I’ll tell you what: whenever it comes time to use regular expressions in Java, I am forever returning to the javadocs to figure out how to instantiate all of the objects to make things fit. Here in .NET I see an API that I will never forget. The Regex class also has a few nifty static methods that I haven’t covered here, such as Regex.IsMatch(input, pattern), which make it a good choice for really simple searches. Java’s closest static method, Pattern.matches(regex, input), succeeds only when the entire input matches.
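For comparison, here is a Java sketch of the closest equivalents to C#’s static Regex.IsMatch(input, pattern), which reports whether the pattern occurs anywhere in the input. The helper names containsMatch and wholeMatch are hypothetical; the point is the difference between Matcher.find() and Pattern.matches:

```java
import java.util.regex.Pattern;

public class QuickSearch {
    // Hypothetical helper: true if the regex occurs anywhere in the input (search semantics).
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Hypothetical helper: Pattern.matches is static too, but requires the WHOLE input to match.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String input = "order 12345 shipped";
        System.out.println(containsMatch("\\d+", input)); // true
        System.out.println(wholeMatch("\\d+", input));    // false
        // Since Java 8, asPredicate() gives a reusable test with find() semantics.
        System.out.println(Pattern.compile("\\d+").asPredicate().test(input)); // true
    }
}
```

So a quick search in Java is still a one-liner, but it goes through compile() and matcher() rather than a single static call.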
Anyhow, kudos to whoever redesigned the Java regular expression API for C#: they did a good job of keeping it simple.

