MattHeffron


  • Name: Matt Heffron
  • Favorite Languages: C#, XAML, WPF
  • Website: [not set]
  • Location: Brea, CA
  • About Me: [not set]

Recent Comments

  • Regular Expressions - Grouping
    01/18/2012 - 14:11

    This is a good post, but I think you are shortchanging the power of back references. The real magic is that you can use them WITHIN the matching regular expression.

    Suppose you are trying to clean up text. Duplicated words are a common error that even proofreaders sometimes miss:

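    $text = "Now is the the time for all good people";
    // Find any word that occurs twice or more in a row
    // and delete all occurrences except the first.
    // preg_replace() needs delimiters (here /.../) around the pattern.
    $result = preg_replace('/\b(\w+)(?:\s+\1\b)+/', '\1', $text);
    echo $result; // Now is the time for all good people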

    This will match:
    1. a word boundary followed by
    2. one or more word chars (= capture group 1) followed by
    3. one or more whitespace chars followed by
    4. the same thing as matched capture group 1 followed by
    5. a word boundary
    and will replace the whole match with the capture group 1 match.
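    Here is a minimal runnable sketch that wraps the same pattern in a function and tries it on a few inputs. The function name removeDuplicateWords() is hypothetical; only the regex comes from the example above. The third input shows why step 5 (the trailing word boundary) matters: without it, "the theory" would collapse to "theory".

```php
<?php
// Hypothetical helper; the pattern is the one from the example above.
function removeDuplicateWords(string $text): string
{
    // \b(\w+)       -> steps 1-2: a whole word, captured as group 1
    // (?:\s+\1\b)+  -> steps 3-5: whitespace, the same word again, a word
    //                  boundary; the whole sequence repeated one or more times
    // '\1' in the replacement puts back a single copy of the word.
    // preg_replace() returns null on error, so fall back to the input.
    return preg_replace('/\b(\w+)(?:\s+\1\b)+/', '\1', $text) ?? $text;
}

echo removeDuplicateWords("Now is the the time for all good people"), "\n";
// Now is the time for all good people
echo removeDuplicateWords("the the the end"), "\n";
// the end
echo removeDuplicateWords("the theory holds"), "\n";
// the theory holds (the trailing \b keeps "the" from matching the start of "theory")
```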

    Parts 3-5 are enclosed in (?: ) to form a cluster (non-capturing) group, and the + quantifier on that group matches one or more repetitions of the sequence of parts 3-5. (A cluster group groups like a capture group, except it doesn't actually capture, so it is more efficient when you don't need to reference what the group matched.)
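    To see the difference between a cluster group and a capture group, here is a small sketch that runs the pattern both ways with preg_match() and compares the resulting match arrays. The sample string and variable names are just for illustration.

```php
<?php
$text = "Now is the the time";

// (?: ... ) is a cluster (non-capturing) group: $clustered holds only
// the whole match [0] and capture group 1.
preg_match('/\b(\w+)(?:\s+\1\b)+/', $text, $clustered);

// The same pattern with an ordinary ( ... ) group stores an extra entry [2]
// holding the last repetition of parts 3-5.
preg_match('/\b(\w+)(\s+\1\b)+/', $text, $captured);

echo count($clustered), "\n"; // 2
echo count($captured), "\n";  // 3
var_dump($captured[2]);       // string(4) " the"
```

    Both patterns match the same text ("the the"); the only difference is that the plain group makes PCRE record an extra substring that nothing uses.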

    PHP regular expression backreferences are documented at: http://www.php.net/manual/en/regexp.reference.back-references.php
    and more than you would ever want to know about Perl regular expressions is documented at: http://perldoc.perl.org/perlre.html