libraries/spcre/libpcre/pcre/doc/pcrematching.3
author Tom Sutcliffe <thomas.sutcliffe@accenture.com>
Sat, 31 Jul 2010 19:07:57 +0100
changeset 23 092bcc217d9d
parent 0 7f656887cf89
permissions -rw-r--r--
Tidied iocli exports, build macro tweaks. Removed 4 overloads of CCommandBase::RunCommand[L] that are no longer used at all, and changed one more to not be exported as it's only used internally to iocli.dll. fixed builds on platforms that don't support btrace or any form of tracing.
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
0
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     1
.TH PCREMATCHING 3
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     2
.SH NAME
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     3
PCRE - Perl-compatible regular expressions
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     4
.SH "PCRE MATCHING ALGORITHMS"
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     5
.rs
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     6
.sp
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     7
This document describes the two different algorithms that are available in PCRE
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     8
for matching a compiled regular expression against a given subject string. The
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     9
"standard" algorithm is the one provided by the \fBpcre_exec()\fP function.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    10
This works in the same was as Perl's matching function, and provides a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    11
Perl-compatible matching operation.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    12
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    13
An alternative algorithm is provided by the \fBpcre_dfa_exec()\fP function;
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    14
this operates in a different way, and is not Perl-compatible. It has advantages
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    15
and disadvantages compared with the standard algorithm, and these are described
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    16
below.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    17
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    18
When there is only one possible way in which a given subject string can match a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    19
pattern, the two algorithms give the same answer. A difference arises, however,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    20
when there are multiple possibilities. For example, if the pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    21
.sp
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    22
  ^<.*>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    23
.sp
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    24
is matched against the string
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    25
.sp
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    26
  <something> <something else> <something further>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    27
.sp
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    28
there are three possible answers. The standard algorithm finds only one of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    29
them, whereas the alternative algorithm finds all three.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    30
.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    31
.SH "REGULAR EXPRESSIONS AS TREES"
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    32
.rs
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    33
.sp
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    34
The set of strings that are matched by a regular expression can be represented
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    35
as a tree structure. An unlimited repetition in the pattern makes the tree of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    36
infinite size, but it is still a tree. Matching the pattern to a given subject
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    37
string (from a given starting point) can be thought of as a search of the tree.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    38
There are two ways to search a tree: depth-first and breadth-first, and these
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    39
correspond to the two matching algorithms provided by PCRE.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    40
.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    41
.SH "THE STANDARD MATCHING ALGORITHM"
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    42
.rs
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    43
.sp
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    44
In the terminology of Jeffrey Friedl's book "Mastering Regular
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    45
Expressions", the standard algorithm is an "NFA algorithm". It conducts a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    46
depth-first search of the pattern tree. That is, it proceeds along a single
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    47
path through the tree, checking that the subject matches what is required. When
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    48
there is a mismatch, the algorithm tries any alternatives at the current point,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    49
and if they all fail, it backs up to the previous branch point in the tree, and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    50
tries the next alternative branch at that level. This often involves backing up
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    51
(moving to the left) in the subject string as well. The order in which
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    52
repetition branches are tried is controlled by the greedy or ungreedy nature of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    53
the quantifier.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    54
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    55
If a leaf node is reached, a matching string has been found, and at that point
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    56
the algorithm stops. Thus, if there is more than one possible match, this
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    57
algorithm returns the first one that it finds. Whether this is the shortest,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    58
the longest, or some intermediate length depends on the way the greedy and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    59
ungreedy repetition quantifiers are specified in the pattern.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    60
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    61
Because it ends up with a single path through the tree, it is relatively
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    62
straightforward for this algorithm to keep track of the substrings that are
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    63
matched by portions of the pattern in parentheses. This provides support for
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    64
capturing parentheses and back references.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    65
.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    66
.SH "THE ALTERNATIVE MATCHING ALGORITHM"
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    67
.rs
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    68
.sp
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    69
This algorithm conducts a breadth-first search of the tree. Starting from the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    70
first matching point in the subject, it scans the subject string from left to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    71
right, once, character by character, and as it does this, it remembers all the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    72
paths through the tree that represent valid matches. In Friedl's terminology,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    73
this is a kind of "DFA algorithm", though it is not implemented as a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    74
traditional finite state machine (it keeps multiple states active
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    75
simultaneously).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    76
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    77
The scan continues until either the end of the subject is reached, or there are
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    78
no more unterminated paths. At this point, terminated paths represent the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    79
different matching possibilities (if there are none, the match has failed).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    80
Thus, if there is more than one possible match, this algorithm finds all of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    81
them, and in particular, it finds the longest. In PCRE, there is an option to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    82
stop the algorithm after the first match (which is necessarily the shortest)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    83
has been found.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    84
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    85
Note that all the matches that are found start at the same point in the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    86
subject. If the pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    87
.sp
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    88
  cat(er(pillar)?)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    89
.sp
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    90
is matched against the string "the caterpillar catchment", the result will be
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    91
the three strings "cat", "cater", and "caterpillar" that start at the fourth
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    92
character of the subject. The algorithm does not automatically move on to find
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    93
matches that start at later positions.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    94
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    95
There are a number of features of PCRE regular expressions that are not
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    96
supported by the alternative matching algorithm. They are as follows:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    97
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    98
1. Because the algorithm finds all possible matches, the greedy or ungreedy
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    99
nature of repetition quantifiers is not relevant. Greedy and ungreedy
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   100
quantifiers are treated in exactly the same way. However, possessive
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   101
quantifiers can make a difference when what follows could also match what is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   102
quantified, for example in a pattern like this:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   103
.sp
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   104
  ^a++\ew!
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   105
.sp
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   106
This pattern matches "aaab!" but not "aaa!", which would be matched by a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   107
non-possessive quantifier. Similarly, if an atomic group is present, it is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   108
matched as if it were a standalone pattern at the current point, and the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   109
longest match is then "locked in" for the rest of the overall pattern.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   110
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   111
2. When dealing with multiple paths through the tree simultaneously, it is not
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   112
straightforward to keep track of captured substrings for the different matching
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   113
possibilities, and PCRE's implementation of this algorithm does not attempt to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   114
do this. This means that no captured substrings are available.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   115
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   116
3. Because no substrings are captured, back references within the pattern are
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   117
not supported, and cause errors if encountered.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   118
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   119
4. For the same reason, conditional expressions that use a backreference as the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   120
condition or test for a specific group recursion are not supported.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   121
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   122
5. Because many paths through the tree may be active, the \eK escape sequence,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   123
which resets the start of the match when encountered (but may be on some paths
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   124
and not on others), is not supported. It causes an error if encountered.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   125
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   126
6. Callouts are supported, but the value of the \fIcapture_top\fP field is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   127
always 1, and the value of the \fIcapture_last\fP field is always -1.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   128
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   129
7. The \eC escape sequence, which (in the standard algorithm) matches a single
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   130
byte, even in UTF-8 mode, is not supported because the alternative algorithm
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   131
moves through the subject string one character at a time, for all active paths
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   132
through the tree.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   133
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   134
8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   135
supported. (*FAIL) is supported, and behaves like a failing negative assertion.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   136
.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   137
.SH "ADVANTAGES OF THE ALTERNATIVE ALGORITHM"
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   138
.rs
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   139
.sp
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   140
Using the alternative matching algorithm provides the following advantages:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   141
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   142
1. All possible matches (at a single point in the subject) are automatically
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   143
found, and in particular, the longest match is found. To find more than one
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   144
match using the standard algorithm, you have to do kludgy things with
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   145
callouts.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   146
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   147
2. There is much better support for partial matching. The restrictions on the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   148
content of the pattern that apply when using the standard algorithm for partial
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   149
matching do not apply to the alternative algorithm. For non-anchored patterns,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   150
the starting position of a partial match is available.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   151
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   152
3. Because the alternative algorithm scans the subject string just once, and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   153
never needs to backtrack, it is possible to pass very long subject strings to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   154
the matching function in several pieces, checking for partial matching each
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   155
time.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   156
.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   157
.SH "DISADVANTAGES OF THE ALTERNATIVE ALGORITHM"
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   158
.rs
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   159
.sp
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   160
The alternative algorithm suffers from a number of disadvantages:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   161
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   162
1. It is substantially slower than the standard algorithm. This is partly
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   163
because it has to search for all possible matches, but is also because it is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   164
less susceptible to optimization.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   165
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   166
2. Capturing parentheses and back references are not supported.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   167
.P
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   168
3. Although atomic groups are supported, their use does not provide the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   169
performance advantage that it does for the standard algorithm.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   170
.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   171
.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   172
.SH AUTHOR
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   173
.rs
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   174
.sp
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   175
.nf
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   176
Philip Hazel
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   177
University Computing Service
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   178
Cambridge CB2 3QH, England.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   179
.fi
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   180
.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   181
.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   182
.SH REVISION
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   183
.rs
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   184
.sp
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   185
.nf
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   186
Last updated: 19 April 2008
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   187
Copyright (c) 1997-2008 University of Cambridge.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   188
.fi