Regular Expressions

My first contact with regular expressions was while programming Perl.

I hated them, which I guess is how everybody feels at first when struggling with the strange syntax. I always had the impression that regexes are as bitchy as I am sometimes 🙂

By now they have grown on me. You can do so much so easily once you get the point and are willing to give them a try.

Here are a couple of examples I wrote recently (using C#):


search string: gna(blub)
gna[\(](.*)[\)]

The ‘[]’ creates a character class, which matches any single character listed inside it. I want to catch a literal ‘(’. Outside a character class it would have to be escaped as ‘\(’; inside one the backslash is optional but harmless, so I write ‘[\(]’.
After that I catch anything, which is declared by ‘.*’. The parentheses around it form a capture group: whatever the regex finds inside them is saved, and I can use that value later to do more stuff with it.
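Here is a minimal sketch of how that capture could be read back in C#. The class name CaptureDemo and the helper Inner are made up for this illustration; only Regex.Match and the Groups collection come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

static class CaptureDemo
{
    // Hypothetical helper: returns the text between the parentheses of "gna(...)".
    // Returns an empty string when the input does not match at all.
    public static string Inner(string input)
    {
        Match m = Regex.Match(input, @"gna[\(](.*)[\)]");
        // Groups[0] is the whole match; Groups[1] is the first pair of parentheses.
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    static void Main()
    {
        Console.WriteLine(Inner("gna(blub)")); // prints "blub"
    }
}
```

The verbatim string (the ‘@’ prefix) keeps the backslashes in the pattern from being read as C# escape sequences.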

search string: gna(tralla, blabla)
gna[\(]([^,]*)[,]([^\)]*)[\)]

Here we’ve got two groups we want to catch, which are separated by a ‘,’.

‘[^,]*’ = anything except a ‘,’ (as the first character inside a character class, ‘^’ means “except”)
‘*’ = any number of those characters, possibly none
‘[^\)]*’ works the same way, but here I want anything except a ‘)’
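A small sketch of reading both groups in C#, assuming the same search string as above. The names TwoGroupsDemo and Args are invented for the example. Note that the second group also captures the space after the comma, so the sketch trims it.

```csharp
using System;
using System.Text.RegularExpressions;

static class TwoGroupsDemo
{
    // Hypothetical helper: splits "gna(a, b)" into its two arguments.
    // Returns an empty array when the input does not match.
    public static string[] Args(string input)
    {
        Match m = Regex.Match(input, @"gna[\(]([^,]*)[,]([^\)]*)[\)]");
        if (!m.Success)
        {
            return new string[0];
        }
        // Group 2 starts right after the comma, so it includes the leading space.
        return new[] { m.Groups[1].Value, m.Groups[2].Value.Trim() };
    }

    static void Main()
    {
        Console.WriteLine(string.Join("|", Args("gna(tralla, blabla)"))); // prints "tralla|blabla"
    }
}
```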

Let’s parse a date in German format dd.mm.yy or dd.mm.yyyy:
^((0?[1-9]|[12][0-9]|3[0-1])\.(0?[1-9]|1[0-2])\.(19|20)?([0-9]{2}))$

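Inside the outer parentheses (which capture the whole date as group 1, so in the match the groups below are numbered 2 to 5) we have four groups: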
1: (0?[1-9]|[12][0-9]|3[0-1])
2: (0?[1-9]|1[0-2])
3: (19|20)?
4: ([0-9]{2})

1:
‘0?’ = an optional leading zero
‘[1-9]’ = a digit from 1 to 9
‘|’ = or
‘[12]’ = 1 or 2
‘[0-9]’ = 0 to 9
‘3[0-1]’= either 30 or 31

The next group should be self-explanatory.

3:
‘(19|20)?’ = either 19 or 20 or none at all

4:
‘[0-9]{2}’ = exactly two digits from 0 to 9, e.g. 13

The leading ‘^’ anchors the pattern at the beginning of the search string, while the trailing ‘$’ means the search string has to end right after the pattern. Together they force the whole string to be a date rather than merely contain one.
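To see the date pattern at work, here is a rough C# sketch. DateDemo and Describe are hypothetical names. Because the outer parentheses form group 1, the day, month, optional century and two-digit year are read from groups 2 to 5.

```csharp
using System;
using System.Text.RegularExpressions;

static class DateDemo
{
    static readonly Regex GermanDate = new Regex(
        @"^((0?[1-9]|[12][0-9]|3[0-1])\.(0?[1-9]|1[0-2])\.(19|20)?([0-9]{2}))$");

    // Hypothetical helper: shows which part of the date landed in which group.
    public static string Describe(string input)
    {
        Match m = GermanDate.Match(input);
        if (!m.Success)
        {
            return "no match";
        }
        // Group 1 = whole date, 2 = day, 3 = month, 4 = optional century, 5 = two-digit year.
        return "day=" + m.Groups[2].Value
             + " month=" + m.Groups[3].Value
             + " century=" + m.Groups[4].Value
             + " year=" + m.Groups[5].Value;
    }

    static void Main()
    {
        Console.WriteLine(Describe("24.12.2013")); // day=24 month=12 century=20 year=13
        Console.WriteLine(Describe("1.2.99"));     // day=1 month=2 century= year=99
        Console.WriteLine(Describe("32.01.2013")); // no match
    }
}
```

The century group is empty for the short dd.mm.yy form, because ‘(19|20)?’ is allowed to match nothing.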

Check for the 30th or 31st of February (the dot is escaped so that it only matches a literal ‘.’):
^(30|31)\.(0?2)$

Check for the 31st of months that don’t have one:
^31\.(0?[469]|11)$
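The two checks could be combined into one helper, sketched below. ImpossibleDateDemo and IsImpossibleDayMonth are invented names, and the input is assumed to be just the day.month part without a year. The sketch escapes the dot as ‘\.’; an unescaped ‘.’ matches any character, so a string like 30x02 would slip through as well.

```csharp
using System;
using System.Text.RegularExpressions;

static class ImpossibleDateDemo
{
    // Hypothetical helper: true for day.month combinations that never exist,
    // e.g. "30.02" or "31.04". Leap years (29.02) are not handled here.
    public static bool IsImpossibleDayMonth(string dayMonth)
    {
        bool februaryOverflow = Regex.IsMatch(dayMonth, @"^(30|31)\.(0?2)$");
        bool shortMonthOverflow = Regex.IsMatch(dayMonth, @"^31\.(0?[469]|11)$");
        return februaryOverflow || shortMonthOverflow;
    }

    static void Main()
    {
        Console.WriteLine(IsImpossibleDayMonth("30.02")); // True
        Console.WriteLine(IsImpossibleDayMonth("31.11")); // True
        Console.WriteLine(IsImpossibleDayMonth("31.12")); // False
    }
}
```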

And a C# specific replace example:
Regex.Replace(searchString, "^('.+)'(.+')$", "$1''$2")

This helps when searchString contains a single quote (') and you want to put the value into a database, e.g. when you build a SQL update string by hand. Databases don’t like a bare single quote inside a string literal. To escape it, you double the single quote.

^('.+)'(.+')$ = from the beginning (^) to the end ($) there is a leading single quote followed by one or more characters, then another single quote, then again one or more characters, finished by a closing single quote

$1''$2 = take the value captured by the first pair of parentheses, add two single quotes, then append the value captured by the second pair
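A runnable sketch of that replacement, assuming the input is an already quoted SQL string literal with one stray single quote inside. QuoteDemo and DoubleInnerQuote are hypothetical names. Because the pattern is anchored at both ends it matches at most once, so only one inner quote gets doubled per call.

```csharp
using System;
using System.Text.RegularExpressions;

static class QuoteDemo
{
    // Hypothetical helper: doubles one single quote inside a quoted literal.
    // Input that does not match the pattern is returned unchanged.
    public static string DoubleInnerQuote(string quoted)
    {
        // $1 and $2 in the replacement refer to the two capture groups.
        return Regex.Replace(quoted, @"^('.+)'(.+')$", "$1''$2");
    }

    static void Main()
    {
        Console.WriteLine(DoubleInnerQuote("'O'Brien'")); // prints 'O''Brien'
        Console.WriteLine(DoubleInnerQuote("'abc'"));     // prints 'abc' (no inner quote, no match)
    }
}
```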

Hmm… most likely, if you’ve never heard of regular expressions before, this must sound to you like I have lost my mind!!!

🙂

3 Comments

  1. I’m very impressed how far you have come, really.

    Gal programmers seldom seem to dig that deep (well, the few ones I met anyway).

    Now that you learned to love ’em, be prepared to learn them again and again, because everyone who ever invented a computer language saw fit to create his/her own version of the regular expression set.

    There is a standard already, you say? Yeah, I read that too. Maybe language designers will follow it sometime. Yeah right, when pigs fly 🙂

  2. Additional comment:

    What I mean by different implementation is mostly the fact that *,?,^,$ don’t always do the same thing.

    Sometimes * and ? are greedy, sometimes they are not. Sometimes you can correct the strange behavior you get when porting regular expressions to different languages by reversing the expression, but sometimes you can’t.

    Sometimes ^ and $ are an implicit part of the expression, but most of the time they are not. In a few languages you can switch that on/off, but sometimes you can’t.
    If they are implicit and you can’t switch them off, you simply can’t do some advanced things – the library is essentially neutered.

  3. much love to you, girl!

    i have fallen in love with functional languages myself, especially xsl[t][fo] (ok, according to michael kay “although XSLT is based on functional programming ideas, it is not as yet a full functional programming language, as it lacks the ability to treat functions as a first-class data type”, but still) it was such a liberation to think outside of the imperative box. try it sometime 🙂
