libraries/spcre/libpcre/pcre/doc/html/pcrematching.html
author Tom Sutcliffe <thomas.sutcliffe@accenture.com>
Sat, 31 Jul 2010 19:07:57 +0100
changeset 23 092bcc217d9d
parent 0 7f656887cf89
permissions -rw-r--r--
Tidied iocli exports, build macro tweaks. Removed 4 overloads of CCommandBase::RunCommand[L] that are no longer used at all, and changed one more to not be exported as it's only used internally to iocli.dll. fixed builds on platforms that don't support btrace or any form of tracing.
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
0
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     1
<html>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     2
<head>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     3
<title>pcrematching specification</title>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     4
</head>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     5
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     6
<h1>pcrematching man page</h1>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     7
<p>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     8
Return to the <a href="index.html">PCRE index page</a>.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     9
</p>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    10
<p>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    11
This page is part of the PCRE HTML documentation. It was generated automatically
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    12
from the original man page. If there is any nonsense in it, please consult the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    13
man page, in case the conversion went wrong.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    14
<br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    15
<ul>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    16
<li><a name="TOC1" href="#SEC1">PCRE MATCHING ALGORITHMS</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    17
<li><a name="TOC2" href="#SEC2">REGULAR EXPRESSIONS AS TREES</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    18
<li><a name="TOC3" href="#SEC3">THE STANDARD MATCHING ALGORITHM</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    19
<li><a name="TOC4" href="#SEC4">THE ALTERNATIVE MATCHING ALGORITHM</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    20
<li><a name="TOC5" href="#SEC5">ADVANTAGES OF THE ALTERNATIVE ALGORITHM</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    21
<li><a name="TOC6" href="#SEC6">DISADVANTAGES OF THE ALTERNATIVE ALGORITHM</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    22
<li><a name="TOC7" href="#SEC7">AUTHOR</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    23
<li><a name="TOC8" href="#SEC8">REVISION</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    24
</ul>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    25
<br><a name="SEC1" href="#TOC1">PCRE MATCHING ALGORITHMS</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    26
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    27
This document describes the two different algorithms that are available in PCRE
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    28
for matching a compiled regular expression against a given subject string. The
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    29
"standard" algorithm is the one provided by the <b>pcre_exec()</b> function.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    30
This works in the same was as Perl's matching function, and provides a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    31
Perl-compatible matching operation.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    32
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    33
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    34
An alternative algorithm is provided by the <b>pcre_dfa_exec()</b> function;
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    35
this operates in a different way, and is not Perl-compatible. It has advantages
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    36
and disadvantages compared with the standard algorithm, and these are described
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    37
below.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    38
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    39
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    40
When there is only one possible way in which a given subject string can match a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    41
pattern, the two algorithms give the same answer. A difference arises, however,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    42
when there are multiple possibilities. For example, if the pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    43
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    44
  ^&#60;.*&#62;
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    45
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    46
is matched against the string
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    47
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    48
  &#60;something&#62; &#60;something else&#62; &#60;something further&#62;
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    49
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    50
there are three possible answers. The standard algorithm finds only one of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    51
them, whereas the alternative algorithm finds all three.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    52
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    53
<br><a name="SEC2" href="#TOC1">REGULAR EXPRESSIONS AS TREES</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    54
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    55
The set of strings that are matched by a regular expression can be represented
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    56
as a tree structure. An unlimited repetition in the pattern makes the tree of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    57
infinite size, but it is still a tree. Matching the pattern to a given subject
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    58
string (from a given starting point) can be thought of as a search of the tree.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    59
There are two ways to search a tree: depth-first and breadth-first, and these
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    60
correspond to the two matching algorithms provided by PCRE.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    61
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    62
<br><a name="SEC3" href="#TOC1">THE STANDARD MATCHING ALGORITHM</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    63
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    64
In the terminology of Jeffrey Friedl's book "Mastering Regular
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    65
Expressions", the standard algorithm is an "NFA algorithm". It conducts a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    66
depth-first search of the pattern tree. That is, it proceeds along a single
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    67
path through the tree, checking that the subject matches what is required. When
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    68
there is a mismatch, the algorithm tries any alternatives at the current point,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    69
and if they all fail, it backs up to the previous branch point in the tree, and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    70
tries the next alternative branch at that level. This often involves backing up
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    71
(moving to the left) in the subject string as well. The order in which
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    72
repetition branches are tried is controlled by the greedy or ungreedy nature of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    73
the quantifier.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    74
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    75
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    76
If a leaf node is reached, a matching string has been found, and at that point
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    77
the algorithm stops. Thus, if there is more than one possible match, this
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    78
algorithm returns the first one that it finds. Whether this is the shortest,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    79
the longest, or some intermediate length depends on the way the greedy and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    80
ungreedy repetition quantifiers are specified in the pattern.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    81
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    82
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    83
Because it ends up with a single path through the tree, it is relatively
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    84
straightforward for this algorithm to keep track of the substrings that are
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    85
matched by portions of the pattern in parentheses. This provides support for
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    86
capturing parentheses and back references.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    87
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    88
<br><a name="SEC4" href="#TOC1">THE ALTERNATIVE MATCHING ALGORITHM</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    89
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    90
This algorithm conducts a breadth-first search of the tree. Starting from the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    91
first matching point in the subject, it scans the subject string from left to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    92
right, once, character by character, and as it does this, it remembers all the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    93
paths through the tree that represent valid matches. In Friedl's terminology,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    94
this is a kind of "DFA algorithm", though it is not implemented as a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    95
traditional finite state machine (it keeps multiple states active
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    96
simultaneously).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    97
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    98
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    99
The scan continues until either the end of the subject is reached, or there are
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   100
no more unterminated paths. At this point, terminated paths represent the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   101
different matching possibilities (if there are none, the match has failed).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   102
Thus, if there is more than one possible match, this algorithm finds all of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   103
them, and in particular, it finds the longest. In PCRE, there is an option to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   104
stop the algorithm after the first match (which is necessarily the shortest)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   105
has been found.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   106
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   107
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   108
Note that all the matches that are found start at the same point in the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   109
subject. If the pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   110
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   111
  cat(er(pillar)?)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   112
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   113
is matched against the string "the caterpillar catchment", the result will be
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   114
the three strings "cat", "cater", and "caterpillar" that start at the fourth
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   115
character of the subject. The algorithm does not automatically move on to find
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   116
matches that start at later positions.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   117
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   118
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   119
There are a number of features of PCRE regular expressions that are not
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   120
supported by the alternative matching algorithm. They are as follows:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   121
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   122
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   123
1. Because the algorithm finds all possible matches, the greedy or ungreedy
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   124
nature of repetition quantifiers is not relevant. Greedy and ungreedy
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   125
quantifiers are treated in exactly the same way. However, possessive
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   126
quantifiers can make a difference when what follows could also match what is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   127
quantified, for example in a pattern like this:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   128
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   129
  ^a++\w!
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   130
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   131
This pattern matches "aaab!" but not "aaa!", which would be matched by a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   132
non-possessive quantifier. Similarly, if an atomic group is present, it is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   133
matched as if it were a standalone pattern at the current point, and the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   134
longest match is then "locked in" for the rest of the overall pattern.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   135
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   136
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   137
2. When dealing with multiple paths through the tree simultaneously, it is not
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   138
straightforward to keep track of captured substrings for the different matching
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   139
possibilities, and PCRE's implementation of this algorithm does not attempt to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   140
do this. This means that no captured substrings are available.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   141
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   142
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   143
3. Because no substrings are captured, back references within the pattern are
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   144
not supported, and cause errors if encountered.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   145
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   146
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   147
4. For the same reason, conditional expressions that use a backreference as the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   148
condition or test for a specific group recursion are not supported.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   149
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   150
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   151
5. Because many paths through the tree may be active, the \K escape sequence,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   152
which resets the start of the match when encountered (but may be on some paths
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   153
and not on others), is not supported. It causes an error if encountered.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   154
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   155
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   156
6. Callouts are supported, but the value of the <i>capture_top</i> field is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   157
always 1, and the value of the <i>capture_last</i> field is always -1.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   158
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   159
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   160
7. The \C escape sequence, which (in the standard algorithm) matches a single
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   161
byte, even in UTF-8 mode, is not supported because the alternative algorithm
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   162
moves through the subject string one character at a time, for all active paths
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   163
through the tree.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   164
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   165
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   166
8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   167
supported. (*FAIL) is supported, and behaves like a failing negative assertion.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   168
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   169
<br><a name="SEC5" href="#TOC1">ADVANTAGES OF THE ALTERNATIVE ALGORITHM</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   170
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   171
Using the alternative matching algorithm provides the following advantages:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   172
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   173
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   174
1. All possible matches (at a single point in the subject) are automatically
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   175
found, and in particular, the longest match is found. To find more than one
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   176
match using the standard algorithm, you have to do kludgy things with
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   177
callouts.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   178
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   179
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   180
2. There is much better support for partial matching. The restrictions on the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   181
content of the pattern that apply when using the standard algorithm for partial
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   182
matching do not apply to the alternative algorithm. For non-anchored patterns,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   183
the starting position of a partial match is available.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   184
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   185
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   186
3. Because the alternative algorithm scans the subject string just once, and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   187
never needs to backtrack, it is possible to pass very long subject strings to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   188
the matching function in several pieces, checking for partial matching each
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   189
time.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   190
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   191
<br><a name="SEC6" href="#TOC1">DISADVANTAGES OF THE ALTERNATIVE ALGORITHM</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   192
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   193
The alternative algorithm suffers from a number of disadvantages:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   194
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   195
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   196
1. It is substantially slower than the standard algorithm. This is partly
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   197
because it has to search for all possible matches, but is also because it is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   198
less susceptible to optimization.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   199
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   200
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   201
2. Capturing parentheses and back references are not supported.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   202
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   203
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   204
3. Although atomic groups are supported, their use does not provide the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   205
performance advantage that it does for the standard algorithm.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   206
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   207
<br><a name="SEC7" href="#TOC1">AUTHOR</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   208
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   209
Philip Hazel
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   210
<br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   211
University Computing Service
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   212
<br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   213
Cambridge CB2 3QH, England.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   214
<br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   215
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   216
<br><a name="SEC8" href="#TOC1">REVISION</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   217
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   218
Last updated: 19 April 2008
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   219
<br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   220
Copyright &copy; 1997-2008 University of Cambridge.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   221
<br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   222
<p>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   223
Return to the <a href="index.html">PCRE index page</a>.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   224
</p>