Positive and negative lookahead are great features in regex.
Nice MSDN article here.
Say I’m trying to pick all the words starting with the letters “un” in the following:
unite one unethical ethics use untie ultimate
I could use a regular expression like:
\b(?=un)\w+\b
What this says is:
- \b – a word boundary. Like the lookahead, it is zero-width: it matches the position between a word character and a non-word character (or the start or end of the string) without consuming anything. It does not match a blank.
- (?=un) – the “zero-width positive lookahead”. It checks whether the next two characters are exactly “un”. If so, matching continues with the \w+ that follows; if not, there is no match at this position. The lookahead only eliminates non-matches. It consumes no characters itself (zero-width), so the “un” it inspected is still there for \w+ to match.
- \w+ – one or more word characters. This is the part that actually matches the word, including its leading “un”.
- \b – a word boundary again, this time at the end of the word.
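
To check the breakdown above, here is a minimal sketch using Python's re module (the choice of Python and the variable names are mine, not from the article; the pattern and sample text are the ones used above). It also tries the pattern without the leading \b on a made-up string, "sun unite", and swaps in the negative lookahead (?!un) to pick the words that do not start with “un”.

```python
import re

text = "unite one unethical ethics use untie ultimate"

# Positive lookahead: at a word boundary, require the next two
# characters to be "un", then let \w+ match the whole word.
starts_with_un = re.findall(r"\b(?=un)\w+\b", text)
print(starts_with_un)  # ['unite', 'unethical', 'untie']

# Without the leading \b the lookahead can also succeed mid-word,
# so the "un" inside "sun" is picked up as well.
mid_word = re.findall(r"(?=un)\w+", "sun unite")
print(mid_word)  # ['un', 'unite']

# Negative lookahead: same shape, but reject positions followed by "un".
not_un = re.findall(r"\b(?!un)\w+\b", text)
print(not_un)  # ['one', 'ethics', 'use', 'ultimate']
```

Note that the matched words include their “un” prefix: the lookahead only inspected those two characters, and \w+ then matched them as part of the word.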