Monthly Archives: January 2018

Positive and negative lookahead are great features in regex.

Nice MSDN article here.

Say I’m trying to pick all the words starting with the letters “un” in the following:

unite one unethical ethics use untie ultimate

I could use a regular expression like:

\b(?=un)\w+\b

What this says is:

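  • \b – MATCH A WORD BOUNDARY. This is the zero-width position between a word character and a non-word character (or the start or end of the text). It does not match or consume a blank.
  • (?=un) – This is the “zero-width positive lookahead”. It says: CHECK IF THE NEXT TWO CHARACTERS ARE EXACTLY “un”. IF SO, THEN CARRY ON AND MATCH \w+. IF NOT, THEN THERE IS NO MATCH AT THIS POSITION. Note that the lookahead is used to eliminate non-matches but it doesn’t match (consume) anything itself, hence “zero-width”. It is an assertion about the current position, so it needs to be followed by an expression that matches SOMETHING. Here that is the \w+ immediately after it, which starts matching at the same position and so consumes the “un” as well.
  • \w+ – match one or more word characters (letters, digits or underscore), i.e. the word itself.
  • \b – match the word boundary at the end of the word.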
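
To see the pattern in action, here is a minimal sketch in Python. The language is my choice for illustration; the MSDN article covers .NET, and I am assuming Python's built-in re module handles \b, \w+ and lookaheads the same way for this simple pattern. The variable names are made up for the example, and the negative lookahead (?!un) at the end is an extra, included to show the counterpart mentioned at the top of the post.

```python
import re

text = "unite one unethical ethics use untie ultimate"

# Positive lookahead: keep only words whose first two characters are "un".
un_words = re.findall(r"\b(?=un)\w+\b", text)

# On its own the lookahead consumes nothing: it matches an empty string,
# so the match starts and ends at position 0.
lookahead_span = re.match(r"(?=un)", "unite").span()

# Negative lookahead (?!un): words that do NOT start with "un".
other_words = re.findall(r"\b(?!un)\w+\b", text)

print(un_words)        # ['unite', 'unethical', 'untie']
print(lookahead_span)  # (0, 0)
print(other_words)     # ['one', 'ethics', 'use', 'ultimate']
```

Note that “unite” comes back whole, “un” included: the lookahead only checks for those two characters, and it is the \w+ that actually consumes them.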