Saturday, February 3, 2007

Regular Expressions in Java

One of the cool things about Perl, Ruby, and other scripting languages is how easy they make some very complicated things. For example, Ruby has the =~ operator, which lets you easily match a string against a regular expression. It returns the position where the match starts, or nil if there is no match.

"sector 19" =~ /\d/    # Returns 7

Another cool thing you can do in Ruby is String.scan - this iterates through a string, matching against a regular expression, and returns an array with all the matches.

# Returns all the <item> tags
text.scan(/<item>.*?<\/item>/)

If you know the right Java APIs though, these operations are almost as easy. The following Java code does the same thing as the String.scan example above:

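import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Collect every <item>...</item> element found in text
ArrayList<String> tagList = new ArrayList<String>();
Matcher matcher =
    Pattern.compile("<item>.*?</item>").matcher(text);
while (matcher.find()) {
    // start() and end() delimit the most recent match
    String match = text.substring(matcher.start(),
                                  matcher.end());
    tagList.add(match);
}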

A Matcher object is returned by Pattern.matcher - it basically remembers a string, a regular expression to match against, and the last matched position. Matcher.find returns true if there's another match. Matcher.start and Matcher.end return the start index and the end index (one past the last matched character) of the most recent match.
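The same Matcher calls can also stand in for Ruby's =~ operator. Here's a rough sketch, not the only way to do it: indexOfMatch and the FirstMatch class are names I made up for illustration (they aren't part of the JDK), and returning -1 where Ruby would return nil is just an assumption borrowed from String.indexOf.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstMatch {
    // Hypothetical helper (not a JDK method): returns the index where the
    // first match of regex starts in input, or -1 if there is no match.
    // Ruby's =~ returns nil in the no-match case; -1 is an assumed stand-in.
    static int indexOfMatch(String input, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        // find() advances to the first match; start() reports where it begins
        return matcher.find() ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfMatch("sector 19", "\\d")); // prints 7
        System.out.println(indexOfMatch("sector", "\\d"));    // prints -1
    }
}
```

Note that the backslash in \d has to be doubled inside a Java string literal, since Java has no regex literal syntax like Ruby's /.../.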

I'll admit that the code is a bit longer than the Ruby version, but it's still quite easy.

3 comments:

Kevin Chiu said...
This comment has been removed by the author.
Kevin Chiu said...

Hmm, String should be capitalized.

Also - the right bracket on the first item tag should be deleted, in case there are attributes.

Although Ruby is nice, I'm starting to get annoyed by some things, particularly threading.

Andy Hou said...

Oops, I fixed the String thing.

Let's assume there won't be any attributes to keep it simple.