libraries/spcre/libpcre/pcre/HACKING
author Tom Sutcliffe <thomas.sutcliffe@accenture.com>
Thu, 26 Aug 2010 00:49:35 +0100
changeset 45 534b01198c2d
parent 0 7f656887cf89
permissions -rw-r--r--
Added ENotifyKeypresses and ECaptureCtrlC flags to CCommandBase. Commands can now get keypresses and handle ctrl-C via callbacks instead of having to implement custom active objects. As part of this extended the CCommandBase extension interface to MCommandExtensionsV2 for the new virtual functions KeyPressed(TUint aKeyCode, TUint aModifiers) and CtrlCPressed(). sudo now cleans up correctly by using ECaptureCtrlC.
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
0
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     1
Technical Notes about PCRE
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     2
--------------------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     3
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     4
These are very rough technical notes that record potentially useful information 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     5
about PCRE internals.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     6
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     7
Historical note 1
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     8
-----------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     9
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    10
Many years ago I implemented some regular expression functions to an algorithm
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    11
suggested by Martin Richards. These were not Unix-like in form, and were quite
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    12
restricted in what they could do by comparison with Perl. The interesting part
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    13
about the algorithm was that the amount of space required to hold the compiled
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    14
form of an expression was known in advance. The code to apply an expression did
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    15
not operate by backtracking, as the original Henry Spencer code and current
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    16
Perl code does, but instead checked all possibilities simultaneously by keeping
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    17
a list of current states and checking all of them as it advanced through the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    18
subject string. In the terminology of Jeffrey Friedl's book, it was a "DFA
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    19
algorithm", though it was not a traditional Finite State Machine (FSM). When
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    20
the pattern was all used up, all remaining states were possible matches, and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    21
the one matching the longest subset of the subject string was chosen. This did
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    22
not necessarily maximize the individual wild portions of the pattern, as is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    23
expected in Unix and Perl-style regular expressions.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    24
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    25
Historical note 2
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    26
-----------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    27
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    28
By contrast, the code originally written by Henry Spencer (which was
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    29
subsequently heavily modified for Perl) compiles the expression twice: once in
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    30
a dummy mode in order to find out how much store will be needed, and then for
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    31
real. (The Perl version probably doesn't do this any more; I'm talking about
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    32
the original library.) The execution function operates by backtracking and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    33
maximizing (or, optionally, minimizing in Perl) the amount of the subject that
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    34
matches individual wild portions of the pattern. This is an "NFA algorithm" in
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    35
Friedl's terminology.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    36
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    37
OK, here's the real stuff
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    38
-------------------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    39
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    40
For the set of functions that form the "basic" PCRE library (which are
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    41
unrelated to those mentioned above), I tried at first to invent an algorithm
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    42
that used an amount of store bounded by a multiple of the number of characters
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    43
in the pattern, to save on compiling time. However, because of the greater
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    44
complexity in Perl regular expressions, I couldn't do this. In any case, a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    45
first pass through the pattern is helpful for other reasons. 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    46
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    47
Computing the memory requirement: how it was
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    48
--------------------------------------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    49
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    50
Up to and including release 6.7, PCRE worked by running a very degenerate first
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    51
pass to calculate a maximum store size, and then a second pass to do the real
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    52
compile - which might use a bit less than the predicted amount of memory. The
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    53
idea was that this would turn out faster than the Henry Spencer code because
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    54
the first pass is degenerate and the second pass can just store stuff straight
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    55
into the vector, which it knows is big enough.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    56
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    57
Computing the memory requirement: how it is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    58
-------------------------------------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    59
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    60
By the time I was working on a potential 6.8 release, the degenerate first pass
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    61
had become very complicated and hard to maintain. Indeed one of the early
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    62
things I did for 6.8 was to fix Yet Another Bug in the memory computation. Then
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    63
I had a flash of inspiration as to how I could run the real compile function in
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    64
a "fake" mode that enables it to compute how much memory it would need, while
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    65
actually only ever using a few hundred bytes of working memory, and without too
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    66
many tests of the mode that might slow it down. So I re-factored the compiling
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    67
functions to work this way. This got rid of about 600 lines of source. It
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    68
should make future maintenance and development easier. As this was such a major 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    69
change, I never released 6.8, instead upping the number to 7.0 (other quite 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    70
major changes are also present in the 7.0 release).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    71
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    72
A side effect of this work is that the previous limit of 200 on the nesting
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    73
depth of parentheses was removed. However, there is a downside: pcre_compile()
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    74
runs more slowly than before (30% or more, depending on the pattern) because it
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    75
is doing a full analysis of the pattern. My hope is that this is not a big
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    76
issue.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    77
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    78
Traditional matching function
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    79
-----------------------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    80
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    81
The "traditional", and original, matching function is called pcre_exec(), and 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    82
it implements an NFA algorithm, similar to the original Henry Spencer algorithm 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    83
and the way that Perl works. Not surprising, since it is intended to be as 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    84
compatible with Perl as possible. This is the function most users of PCRE will 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    85
use most of the time.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    86
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    87
Supplementary matching function
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    88
-------------------------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    89
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    90
From PCRE 6.0, there is also a supplementary matching function called 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    91
pcre_dfa_exec(). This implements a DFA matching algorithm that searches 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    92
simultaneously for all possible matches that start at one point in the subject 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    93
string. (Going back to my roots: see Historical Note 1 above.) This function 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    94
intreprets the same compiled pattern data as pcre_exec(); however, not all the 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    95
facilities are available, and those that are do not always work in quite the 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    96
same way. See the user documentation for details.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    97
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    98
The algorithm that is used for pcre_dfa_exec() is not a traditional FSM, 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    99
because it may have a number of states active at one time. More work would be 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   100
needed at compile time to produce a traditional FSM where only one state is 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   101
ever active at once. I believe some other regex matchers work this way.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   102
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   103
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   104
Format of compiled patterns
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   105
---------------------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   106
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   107
The compiled form of a pattern is a vector of bytes, containing items of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   108
variable length. The first byte in an item is an opcode, and the length of the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   109
item is either implicit in the opcode or contained in the data bytes that
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   110
follow it. 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   111
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   112
In many cases below LINK_SIZE data values are specified for offsets within the 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   113
compiled pattern. The default value for LINK_SIZE is 2, but PCRE can be
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   114
compiled to use 3-byte or 4-byte values for these offsets (impairing the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   115
performance). This is necessary only when patterns whose compiled length is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   116
greater than 64K are going to be processed. In this description, we assume the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   117
"normal" compilation options. Data values that are counts (e.g. for
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   118
quantifiers) are always just two bytes long.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   119
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   120
A list of the opcodes follows:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   121
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   122
Opcodes with no following data
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   123
------------------------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   124
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   125
These items are all just one byte long
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   126
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   127
  OP_END                 end of pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   128
  OP_ANY                 match any one character other than newline
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   129
  OP_ALLANY              match any one character, including newline
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   130
  OP_ANYBYTE             match any single byte, even in UTF-8 mode
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   131
  OP_SOD                 match start of data: \A
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   132
  OP_SOM,                start of match (subject + offset): \G
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   133
  OP_SET_SOM,            set start of match (\K) 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   134
  OP_CIRC                ^ (start of data, or after \n in multiline)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   135
  OP_NOT_WORD_BOUNDARY   \W
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   136
  OP_WORD_BOUNDARY       \w
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   137
  OP_NOT_DIGIT           \D
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   138
  OP_DIGIT               \d
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   139
  OP_NOT_HSPACE          \H
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   140
  OP_HSPACE              \h  
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   141
  OP_NOT_WHITESPACE      \S
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   142
  OP_WHITESPACE          \s
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   143
  OP_NOT_VSPACE          \V
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   144
  OP_VSPACE              \v  
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   145
  OP_NOT_WORDCHAR        \W
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   146
  OP_WORDCHAR            \w
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   147
  OP_EODN                match end of data or \n at end: \Z
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   148
  OP_EOD                 match end of data: \z
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   149
  OP_DOLL                $ (end of data, or before \n in multiline)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   150
  OP_EXTUNI              match an extended Unicode character 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   151
  OP_ANYNL               match any Unicode newline sequence 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   152
  
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   153
  OP_ACCEPT              )
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   154
  OP_COMMIT              ) 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   155
  OP_FAIL                ) These are Perl 5.10's "backtracking     
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   156
  OP_PRUNE               ) control verbs".                         
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   157
  OP_SKIP                )
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   158
  OP_THEN                )
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   159
  
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   160
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   161
Repeating single characters
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   162
---------------------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   163
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   164
The common repeats (*, +, ?) when applied to a single character use the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   165
following opcodes:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   166
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   167
  OP_STAR
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   168
  OP_MINSTAR
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   169
  OP_POSSTAR 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   170
  OP_PLUS
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   171
  OP_MINPLUS
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   172
  OP_POSPLUS 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   173
  OP_QUERY
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   174
  OP_MINQUERY
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   175
  OP_POSQUERY 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   176
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   177
In ASCII mode, these are two-byte items; in UTF-8 mode, the length is variable.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   178
Those with "MIN" in their name are the minimizing versions. Those with "POS" in 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   179
their names are possessive versions. Each is followed by the character that is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   180
to be repeated. Other repeats make use of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   181
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   182
  OP_UPTO
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   183
  OP_MINUPTO
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   184
  OP_POSUPTO 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   185
  OP_EXACT
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   186
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   187
which are followed by a two-byte count (most significant first) and the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   188
repeated character. OP_UPTO matches from 0 to the given number. A repeat with a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   189
non-zero minimum and a fixed maximum is coded as an OP_EXACT followed by an
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   190
OP_UPTO (or OP_MINUPTO or OPT_POSUPTO).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   191
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   192
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   193
Repeating character types
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   194
-------------------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   195
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   196
Repeats of things like \d are done exactly as for single characters, except
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   197
that instead of a character, the opcode for the type is stored in the data
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   198
byte. The opcodes are:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   199
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   200
  OP_TYPESTAR
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   201
  OP_TYPEMINSTAR
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   202
  OP_TYPEPOSSTAR 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   203
  OP_TYPEPLUS
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   204
  OP_TYPEMINPLUS
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   205
  OP_TYPEPOSPLUS 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   206
  OP_TYPEQUERY
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   207
  OP_TYPEMINQUERY
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   208
  OP_TYPEPOSQUERY 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   209
  OP_TYPEUPTO
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   210
  OP_TYPEMINUPTO
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   211
  OP_TYPEPOSUPTO 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   212
  OP_TYPEEXACT
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   213
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   214
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   215
Match by Unicode property
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   216
-------------------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   217
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   218
OP_PROP and OP_NOTPROP are used for positive and negative matches of a 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   219
character by testing its Unicode property (the \p and \P escape sequences).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   220
Each is followed by two bytes that encode the desired property as a type and a 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   221
value.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   222
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   223
Repeats of these items use the OP_TYPESTAR etc. set of opcodes, followed by 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   224
three bytes: OP_PROP or OP_NOTPROP and then the desired property type and 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   225
value.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   226
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   227
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   228
Matching literal characters
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   229
---------------------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   230
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   231
The OP_CHAR opcode is followed by a single character that is to be matched 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   232
casefully. For caseless matching, OP_CHARNC is used. In UTF-8 mode, the 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   233
character may be more than one byte long. (Earlier versions of PCRE used 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   234
multi-character strings, but this was changed to allow some new features to be 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   235
added.)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   236
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   237
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   238
Character classes
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   239
-----------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   240
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   241
If there is only one character, OP_CHAR or OP_CHARNC is used for a positive
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   242
class, and OP_NOT for a negative one (that is, for something like [^a]).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   243
However, in UTF-8 mode, the use of OP_NOT applies only to characters with
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   244
values < 128, because OP_NOT is confined to single bytes.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   245
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   246
Another set of repeating opcodes (OP_NOTSTAR etc.) are used for a repeated,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   247
negated, single-character class. The normal ones (OP_STAR etc.) are used for a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   248
repeated positive single-character class.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   249
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   250
When there's more than one character in a class and all the characters are less
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   251
than 256, OP_CLASS is used for a positive class, and OP_NCLASS for a negative
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   252
one. In either case, the opcode is followed by a 32-byte bit map containing a 1
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   253
bit for every character that is acceptable. The bits are counted from the least
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   254
significant end of each byte.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   255
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   256
The reason for having both OP_CLASS and OP_NCLASS is so that, in UTF-8 mode,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   257
subject characters with values greater than 256 can be handled correctly. For
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   258
OP_CLASS they don't match, whereas for OP_NCLASS they do.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   259
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   260
For classes containing characters with values > 255, OP_XCLASS is used. It
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   261
optionally uses a bit map (if any characters lie within it), followed by a list
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   262
of pairs and single characters. There is a flag character than indicates
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   263
whether it's a positive or a negative class.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   264
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   265
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   266
Back references
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   267
---------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   268
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   269
OP_REF is followed by two bytes containing the reference number.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   270
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   271
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   272
Repeating character classes and back references
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   273
-----------------------------------------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   274
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   275
Single-character classes are handled specially (see above). This section
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   276
applies to OP_CLASS and OP_REF. In both cases, the repeat information follows
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   277
the base item. The matching code looks at the following opcode to see if it is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   278
one of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   279
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   280
  OP_CRSTAR
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   281
  OP_CRMINSTAR
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   282
  OP_CRPLUS
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   283
  OP_CRMINPLUS
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   284
  OP_CRQUERY
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   285
  OP_CRMINQUERY
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   286
  OP_CRRANGE
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   287
  OP_CRMINRANGE
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   288
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   289
All but the last two are just single-byte items. The others are followed by
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   290
four bytes of data, comprising the minimum and maximum repeat counts. There are 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   291
no special possessive opcodes for these repeats; a possessive repeat is 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   292
compiled into an atomic group.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   293
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   294
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   295
Brackets and alternation
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   296
------------------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   297
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   298
A pair of non-capturing (round) brackets is wrapped round each expression at
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   299
compile time, so alternation always happens in the context of brackets.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   300
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   301
[Note for North Americans: "bracket" to some English speakers, including
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   302
myself, can be round, square, curly, or pointy. Hence this usage.]
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   303
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   304
Non-capturing brackets use the opcode OP_BRA. Originally PCRE was limited to 99
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   305
capturing brackets and it used a different opcode for each one. From release
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   306
3.5, the limit was removed by putting the bracket number into the data for
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   307
higher-numbered brackets. From release 7.0 all capturing brackets are handled
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   308
this way, using the single opcode OP_CBRA.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   309
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   310
A bracket opcode is followed by LINK_SIZE bytes which give the offset to the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   311
next alternative OP_ALT or, if there aren't any branches, to the matching
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   312
OP_KET opcode. Each OP_ALT is followed by LINK_SIZE bytes giving the offset to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   313
the next one, or to the OP_KET opcode. For capturing brackets, the bracket 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   314
number immediately follows the offset, always as a 2-byte item.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   315
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   316
OP_KET is used for subpatterns that do not repeat indefinitely, while
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   317
OP_KETRMIN and OP_KETRMAX are used for indefinite repetitions, minimally or
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   318
maximally respectively. All three are followed by LINK_SIZE bytes giving (as a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   319
positive number) the offset back to the matching bracket opcode.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   320
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   321
If a subpattern is quantified such that it is permitted to match zero times, it
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   322
is preceded by one of OP_BRAZERO, OP_BRAMINZERO, or OP_SKIPZERO. These are
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   323
single-byte opcodes that tell the matcher that skipping the following
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   324
subpattern entirely is a valid branch. In the case of the first two, not 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   325
skipping the pattern is also valid (greedy and non-greedy). The third is used 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   326
when a pattern has the quantifier {0,0}. It cannot be entirely discarded, 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   327
because it may be called as a subroutine from elsewhere in the regex.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   328
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   329
A subpattern with an indefinite maximum repetition is replicated in the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   330
compiled data its minimum number of times (or once with OP_BRAZERO if the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   331
minimum is zero), with the final copy terminating with OP_KETRMIN or OP_KETRMAX
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   332
as appropriate.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   333
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   334
A subpattern with a bounded maximum repetition is replicated in a nested
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   335
fashion up to the maximum number of times, with OP_BRAZERO or OP_BRAMINZERO
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   336
before each replication after the minimum, so that, for example, (abc){2,5} is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   337
compiled as (abc)(abc)((abc)((abc)(abc)?)?)?, except that each bracketed group 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   338
has the same number.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   339
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   340
When a repeated subpattern has an unbounded upper limit, it is checked to see 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   341
whether it could match an empty string. If this is the case, the opcode in the 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   342
final replication is changed to OP_SBRA or OP_SCBRA. This tells the matcher
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   343
that it needs to check for matching an empty string when it hits OP_KETRMIN or
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   344
OP_KETRMAX, and if so, to break the loop.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   345
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   346
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   347
Assertions
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   348
----------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   349
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   350
Forward assertions are just like other subpatterns, but starting with one of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   351
the opcodes OP_ASSERT or OP_ASSERT_NOT. Backward assertions use the opcodes
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   352
OP_ASSERTBACK and OP_ASSERTBACK_NOT, and the first opcode inside the assertion
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   353
is OP_REVERSE, followed by a two byte count of the number of characters to move
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   354
back the pointer in the subject string. When operating in UTF-8 mode, the count
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   355
is a character count rather than a byte count. A separate count is present in
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   356
each alternative of a lookbehind assertion, allowing them to have different
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   357
fixed lengths.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   358
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   359
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   360
Once-only (atomic) subpatterns
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   361
------------------------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   362
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   363
These are also just like other subpatterns, but they start with the opcode
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   364
OP_ONCE. The check for matching an empty string in an unbounded repeat is 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   365
handled entirely at runtime, so there is just this one opcode.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   366
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   367
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   368
Conditional subpatterns
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   369
-----------------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   370
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   371
These are like other subpatterns, but they start with the opcode OP_COND, or
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   372
OP_SCOND for one that might match an empty string in an unbounded repeat. If
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   373
the condition is a back reference, this is stored at the start of the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   374
subpattern using the opcode OP_CREF followed by two bytes containing the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   375
reference number. If the condition is "in recursion" (coded as "(?(R)"), or "in
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   376
recursion of group x" (coded as "(?(Rx)"), the group number is stored at the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   377
start of the subpattern using the opcode OP_RREF, and a value of zero for "the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   378
whole pattern". For a DEFINE condition, just the single byte OP_DEF is used (it
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   379
has no associated data). Otherwise, a conditional subpattern always starts with
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   380
one of the assertions.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   381
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   382
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   383
Recursion
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   384
---------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   385
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   386
Recursion either matches the current regex, or some subexpression. The opcode
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   387
OP_RECURSE is followed by an value which is the offset to the starting bracket
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   388
from the start of the whole pattern. From release 6.5, OP_RECURSE is 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   389
automatically wrapped inside OP_ONCE brackets (because otherwise some patterns 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   390
broke it). OP_RECURSE is also used for "subroutine" calls, even though they 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   391
are not strictly a recursion.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   392
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   393
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   394
Callout
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   395
-------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   396
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   397
OP_CALLOUT is followed by one byte of data that holds a callout number in the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   398
range 0 to 254 for manual callouts, or 255 for an automatic callout. In both 
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   399
cases there follows a two-byte value giving the offset in the pattern to the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   400
start of the following item, and another two-byte item giving the length of the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   401
next item.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   402
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   403
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   404
Changing options
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   405
----------------
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   406
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   407
If any of the /i, /m, or /s options are changed within a pattern, an OP_OPT
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   408
opcode is compiled, followed by one byte containing the new settings of these
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   409
flags. If there are several alternatives, there is an occurrence of OP_OPT at
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   410
the start of all those following the first options change, to set appropriate
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   411
options for the start of the alternative. Immediately after the end of the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   412
group there is another such item to reset the flags to their previous values. A
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   413
change of flag right at the very start of the pattern can be handled entirely
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   414
at compile time, and so does not cause anything to be put into the compiled
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   415
data.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   416
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   417
Philip Hazel
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   418
April 2008