libraries/spcre/libpcre/pcre/doc/html/pcrepattern.html
author Tom Sutcliffe <thomas.sutcliffe@accenture.com>
Sat, 31 Jul 2010 19:07:57 +0100
changeset 23 092bcc217d9d
parent 0 7f656887cf89
permissions -rw-r--r--
Tidied iocli exports, build macro tweaks. Removed 4 overloads of CCommandBase::RunCommand[L] that are no longer used at all, and changed one more to not be exported as it's only used internally to iocli.dll. fixed builds on platforms that don't support btrace or any form of tracing.
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
0
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     1
<html>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     2
<head>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     3
<title>pcrepattern specification</title>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     4
</head>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     5
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     6
<h1>pcrepattern man page</h1>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     7
<p>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     8
Return to the <a href="index.html">PCRE index page</a>.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
     9
</p>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    10
<p>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    11
This page is part of the PCRE HTML documentation. It was generated automatically
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    12
from the original man page. If there is any nonsense in it, please consult the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    13
man page, in case the conversion went wrong.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    14
<br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    15
<ul>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    16
<li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION DETAILS</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    17
<li><a name="TOC2" href="#SEC2">NEWLINE CONVENTIONS</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    18
<li><a name="TOC3" href="#SEC3">CHARACTERS AND METACHARACTERS</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    19
<li><a name="TOC4" href="#SEC4">BACKSLASH</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    20
<li><a name="TOC5" href="#SEC5">CIRCUMFLEX AND DOLLAR</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    21
<li><a name="TOC6" href="#SEC6">FULL STOP (PERIOD, DOT)</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    22
<li><a name="TOC7" href="#SEC7">MATCHING A SINGLE BYTE</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    23
<li><a name="TOC8" href="#SEC8">SQUARE BRACKETS AND CHARACTER CLASSES</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    24
<li><a name="TOC9" href="#SEC9">POSIX CHARACTER CLASSES</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    25
<li><a name="TOC10" href="#SEC10">VERTICAL BAR</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    26
<li><a name="TOC11" href="#SEC11">INTERNAL OPTION SETTING</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    27
<li><a name="TOC12" href="#SEC12">SUBPATTERNS</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    28
<li><a name="TOC13" href="#SEC13">DUPLICATE SUBPATTERN NUMBERS</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    29
<li><a name="TOC14" href="#SEC14">NAMED SUBPATTERNS</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    30
<li><a name="TOC15" href="#SEC15">REPETITION</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    31
<li><a name="TOC16" href="#SEC16">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    32
<li><a name="TOC17" href="#SEC17">BACK REFERENCES</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    33
<li><a name="TOC18" href="#SEC18">ASSERTIONS</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    34
<li><a name="TOC19" href="#SEC19">CONDITIONAL SUBPATTERNS</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    35
<li><a name="TOC20" href="#SEC20">COMMENTS</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    36
<li><a name="TOC21" href="#SEC21">RECURSIVE PATTERNS</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    37
<li><a name="TOC22" href="#SEC22">SUBPATTERNS AS SUBROUTINES</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    38
<li><a name="TOC23" href="#SEC23">ONIGURUMA SUBROUTINE SYNTAX</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    39
<li><a name="TOC24" href="#SEC24">CALLOUTS</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    40
<li><a name="TOC25" href="#SEC25">BACKTRACKING CONTROL</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    41
<li><a name="TOC26" href="#SEC26">SEE ALSO</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    42
<li><a name="TOC27" href="#SEC27">AUTHOR</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    43
<li><a name="TOC28" href="#SEC28">REVISION</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    44
</ul>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    45
<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION DETAILS</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    46
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    47
The syntax and semantics of the regular expressions that are supported by PCRE
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    48
are described in detail below. There is a quick-reference syntax summary in the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    49
<a href="pcresyntax.html"><b>pcresyntax</b></a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    50
page. PCRE tries to match Perl syntax and semantics as closely as it can. PCRE
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    51
also supports some alternative regular expression syntax (which does not
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    52
conflict with the Perl syntax) in order to provide some compatibility with
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    53
regular expressions in Python, .NET, and Oniguruma.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    54
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    55
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    56
Perl's regular expressions are described in its own documentation, and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    57
regular expressions in general are covered in a number of books, some of which
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    58
have copious examples. Jeffrey Friedl's "Mastering Regular Expressions",
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    59
published by O'Reilly, covers regular expressions in great detail. This
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    60
description of PCRE's regular expressions is intended as reference material.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    61
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    62
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    63
The original operation of PCRE was on strings of one-byte characters. However,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    64
there is now also support for UTF-8 character strings. To use this, you must
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    65
build PCRE to include UTF-8 support, and then call <b>pcre_compile()</b> with
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    66
the PCRE_UTF8 option. How this affects pattern matching is mentioned in several
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    67
places below. There is also a summary of UTF-8 features in the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    68
<a href="pcre.html#utf8support">section on UTF-8 support</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    69
in the main
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    70
<a href="pcre.html"><b>pcre</b></a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    71
page.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    72
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    73
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    74
The remainder of this document discusses the patterns that are supported by
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    75
PCRE when its main matching function, <b>pcre_exec()</b>, is used.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    76
From release 6.0, PCRE offers a second matching function,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    77
<b>pcre_dfa_exec()</b>, which matches using a different algorithm that is not
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    78
Perl-compatible. Some of the features discussed below are not available when
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    79
<b>pcre_dfa_exec()</b> is used. The advantages and disadvantages of the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    80
alternative function, and how it differs from the normal function, are
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    81
discussed in the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    82
<a href="pcrematching.html"><b>pcrematching</b></a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    83
page.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    84
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    85
<br><a name="SEC2" href="#TOC1">NEWLINE CONVENTIONS</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    86
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    87
PCRE supports five different conventions for indicating line breaks in
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    88
strings: a single CR (carriage return) character, a single LF (linefeed)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    89
character, the two-character sequence CRLF, any of the three preceding, or any
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    90
Unicode newline sequence. The
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    91
<a href="pcreapi.html"><b>pcreapi</b></a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    92
page has
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    93
<a href="pcreapi.html#newlines">further discussion</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    94
about newlines, and shows how to set the newline convention in the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    95
<i>options</i> arguments for the compiling and matching functions.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    96
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    97
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    98
It is also possible to specify a newline convention by starting a pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
    99
string with one of the following five sequences:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   100
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   101
  (*CR)        carriage return
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   102
  (*LF)        linefeed
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   103
  (*CRLF)      carriage return, followed by linefeed
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   104
  (*ANYCRLF)   any of the three above
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   105
  (*ANY)       all Unicode newline sequences
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   106
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   107
These override the default and the options given to <b>pcre_compile()</b>. For
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   108
example, on a Unix system where LF is the default newline sequence, the pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   109
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   110
  (*CR)a.b
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   111
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   112
changes the convention to CR. That pattern matches "a\nb" because LF is no
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   113
longer a newline. Note that these special settings, which are not
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   114
Perl-compatible, are recognized only at the very start of a pattern, and that
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   115
they must be in upper case. If more than one of them is present, the last one
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   116
is used.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   117
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   118
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   119
The newline convention does not affect what the \R escape sequence matches. By
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   120
default, this is any Unicode newline sequence, for Perl compatibility. However,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   121
this can be changed; see the description of \R in the section entitled
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   122
<a href="#newlineseq">"Newline sequences"</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   123
below. A change of \R setting can be combined with a change of newline
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   124
convention.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   125
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   126
<br><a name="SEC3" href="#TOC1">CHARACTERS AND METACHARACTERS</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   127
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   128
A regular expression is a pattern that is matched against a subject string from
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   129
left to right. Most characters stand for themselves in a pattern, and match the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   130
corresponding characters in the subject. As a trivial example, the pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   131
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   132
  The quick brown fox
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   133
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   134
matches a portion of a subject string that is identical to itself. When
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   135
caseless matching is specified (the PCRE_CASELESS option), letters are matched
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   136
independently of case. In UTF-8 mode, PCRE always understands the concept of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   137
case for characters whose values are less than 128, so caseless matching is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   138
always possible. For characters with higher values, the concept of case is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   139
supported if PCRE is compiled with Unicode property support, but not otherwise.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   140
If you want to use caseless matching for characters 128 and above, you must
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   141
ensure that PCRE is compiled with Unicode property support as well as with
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   142
UTF-8 support.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   143
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   144
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   145
The power of regular expressions comes from the ability to include alternatives
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   146
and repetitions in the pattern. These are encoded in the pattern by the use of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   147
<i>metacharacters</i>, which do not stand for themselves but instead are
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   148
interpreted in some special way.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   149
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   150
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   151
There are two different sets of metacharacters: those that are recognized
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   152
anywhere in the pattern except within square brackets, and those that are
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   153
recognized within square brackets. Outside square brackets, the metacharacters
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   154
are as follows:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   155
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   156
  \      general escape character with several uses
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   157
  ^      assert start of string (or line, in multiline mode)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   158
  $      assert end of string (or line, in multiline mode)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   159
  .      match any character except newline (by default)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   160
  [      start character class definition
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   161
  |      start of alternative branch
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   162
  (      start subpattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   163
  )      end subpattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   164
  ?      extends the meaning of (
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   165
         also 0 or 1 quantifier
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   166
         also quantifier minimizer
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   167
  *      0 or more quantifier
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   168
  +      1 or more quantifier
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   169
         also "possessive quantifier"
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   170
  {      start min/max quantifier
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   171
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   172
Part of a pattern that is in square brackets is called a "character class". In
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   173
a character class the only metacharacters are:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   174
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   175
  \      general escape character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   176
  ^      negate the class, but only if the first character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   177
  -      indicates character range
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   178
  [      POSIX character class (only if followed by POSIX syntax)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   179
  ]      terminates the character class
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   180
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   181
The following sections describe the use of each of the metacharacters.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   182
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   183
<br><a name="SEC4" href="#TOC1">BACKSLASH</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   184
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   185
The backslash character has several uses. Firstly, if it is followed by a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   186
non-alphanumeric character, it takes away any special meaning that character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   187
may have. This use of backslash as an escape character applies both inside and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   188
outside character classes.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   189
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   190
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   191
For example, if you want to match a * character, you write \* in the pattern.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   192
This escaping action applies whether or not the following character would
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   193
otherwise be interpreted as a metacharacter, so it is always safe to precede a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   194
non-alphanumeric with backslash to specify that it stands for itself. In
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   195
particular, if you want to match a backslash, you write \\.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   196
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   197
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   198
If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   199
pattern (other than in a character class) and characters between a # outside
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   200
a character class and the next newline are ignored. An escaping backslash can
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   201
be used to include a whitespace or # character as part of the pattern.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   202
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   203
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   204
If you want to remove the special meaning from a sequence of characters, you
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   205
can do so by putting them between \Q and \E. This is different from Perl in
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   206
that $ and @ are handled as literals in \Q...\E sequences in PCRE, whereas in
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   207
Perl, $ and @ cause variable interpolation. Note the following examples:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   208
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   209
  Pattern            PCRE matches   Perl matches
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   210
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   211
  \Qabc$xyz\E        abc$xyz        abc followed by the contents of $xyz
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   212
  \Qabc\$xyz\E       abc\$xyz       abc\$xyz
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   213
  \Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   214
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   215
The \Q...\E sequence is recognized both inside and outside character classes.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   216
<a name="digitsafterbackslash"></a></P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   217
<br><b>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   218
Non-printing characters
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   219
</b><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   220
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   221
A second use of backslash provides a way of encoding non-printing characters
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   222
in patterns in a visible manner. There is no restriction on the appearance of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   223
non-printing characters, apart from the binary zero that terminates a pattern,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   224
but when a pattern is being prepared by text editing, it is usually easier to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   225
use one of the following escape sequences than the binary character it
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   226
represents:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   227
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   228
  \a        alarm, that is, the BEL character (hex 07)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   229
  \cx       "control-x", where x is any character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   230
  \e        escape (hex 1B)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   231
  \f        formfeed (hex 0C)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   232
  \n        linefeed (hex 0A)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   233
  \r        carriage return (hex 0D)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   234
  \t        tab (hex 09)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   235
  \ddd      character with octal code ddd, or backreference
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   236
  \xhh      character with hex code hh
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   237
  \x{hhh..} character with hex code hhh..
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   238
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   239
The precise effect of \cx is as follows: if x is a lower case letter, it
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   240
is converted to upper case. Then bit 6 of the character (hex 40) is inverted.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   241
Thus \cz becomes hex 1A, but \c{ becomes hex 3B, while \c; becomes hex
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   242
7B.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   243
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   244
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   245
After \x, from zero to two hexadecimal digits are read (letters can be in
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   246
upper or lower case). Any number of hexadecimal digits may appear between \x{
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   247
and }, but the value of the character code must be less than 256 in non-UTF-8
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   248
mode, and less than 2**31 in UTF-8 mode. That is, the maximum value in
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   249
hexadecimal is 7FFFFFFF. Note that this is bigger than the largest Unicode code
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   250
point, which is 10FFFF.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   251
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   252
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   253
If characters other than hexadecimal digits appear between \x{ and }, or if
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   254
there is no terminating }, this form of escape is not recognized. Instead, the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   255
initial \x will be interpreted as a basic hexadecimal escape, with no
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   256
following digits, giving a character whose value is zero.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   257
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   258
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   259
Characters whose value is less than 256 can be defined by either of the two
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   260
syntaxes for \x. There is no difference in the way they are handled. For
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   261
example, \xdc is exactly the same as \x{dc}.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   262
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   263
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   264
After \0 up to two further octal digits are read. If there are fewer than two
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   265
digits, just those that are present are used. Thus the sequence \0\x\07
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   266
specifies two binary zeros followed by a BEL character (code value 7). Make
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   267
sure you supply two digits after the initial zero if the pattern character that
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   268
follows is itself an octal digit.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   269
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   270
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   271
The handling of a backslash followed by a digit other than 0 is complicated.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   272
Outside a character class, PCRE reads it and any following digits as a decimal
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   273
number. If the number is less than 10, or if there have been at least that many
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   274
previous capturing left parentheses in the expression, the entire sequence is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   275
taken as a <i>back reference</i>. A description of how this works is given
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   276
<a href="#backreferences">later,</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   277
following the discussion of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   278
<a href="#subpattern">parenthesized subpatterns.</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   279
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   280
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   281
Inside a character class, or if the decimal number is greater than 9 and there
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   282
have not been that many capturing subpatterns, PCRE re-reads up to three octal
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   283
digits following the backslash, and uses them to generate a data character. Any
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   284
subsequent digits stand for themselves. In non-UTF-8 mode, the value of a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   285
character specified in octal must be less than \400. In UTF-8 mode, values up
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   286
to \777 are permitted. For example:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   287
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   288
  \040   is another way of writing a space
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   289
  \40    is the same, provided there are fewer than 40 previous capturing subpatterns
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   290
  \7     is always a back reference
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   291
  \11    might be a back reference, or another way of writing a tab
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   292
  \011   is always a tab
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   293
  \0113  is a tab followed by the character "3"
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   294
  \113   might be a back reference, otherwise the character with octal code 113
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   295
  \377   might be a back reference, otherwise the byte consisting entirely of 1 bits
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   296
  \81    is either a back reference, or a binary zero followed by the two characters "8" and "1"
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   297
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   298
Note that octal values of 100 or greater must not be introduced by a leading
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   299
zero, because no more than three octal digits are ever read.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   300
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   301
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   302
All the sequences that define a single character value can be used both inside
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   303
and outside character classes. In addition, inside a character class, the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   304
sequence \b is interpreted as the backspace character (hex 08), and the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   305
sequences \R and \X are interpreted as the characters "R" and "X",
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   306
respectively. Outside a character class, these sequences have different
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   307
meanings
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   308
<a href="#uniextseq">(see below).</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   309
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   310
<br><b>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   311
Absolute and relative back references
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   312
</b><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   313
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   314
The sequence \g followed by an unsigned or a negative number, optionally
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   315
enclosed in braces, is an absolute or relative back reference. A named back
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   316
reference can be coded as \g{name}. Back references are discussed
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   317
<a href="#backreferences">later,</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   318
following the discussion of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   319
<a href="#subpattern">parenthesized subpatterns.</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   320
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   321
<br><b>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   322
Absolute and relative subroutine calls
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   323
</b><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   324
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   325
For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   326
a number enclosed either in angle brackets or single quotes, is an alternative
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   327
syntax for referencing a subpattern as a "subroutine". Details are discussed
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   328
<a href="#onigurumasubroutines">later.</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   329
Note that \g{...} (Perl syntax) and \g&#60;...&#62; (Oniguruma syntax) are <i>not</i>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   330
synonymous. The former is a back reference; the latter is a subroutine call.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   331
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   332
<br><b>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   333
Generic character types
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   334
</b><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   335
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   336
Another use of backslash is for specifying generic character types. The
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   337
following are always recognized:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   338
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   339
  \d     any decimal digit
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   340
  \D     any character that is not a decimal digit
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   341
  \h     any horizontal whitespace character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   342
  \H     any character that is not a horizontal whitespace character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   343
  \s     any whitespace character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   344
  \S     any character that is not a whitespace character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   345
  \v     any vertical whitespace character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   346
  \V     any character that is not a vertical whitespace character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   347
  \w     any "word" character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   348
  \W     any "non-word" character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   349
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   350
Each pair of escape sequences partitions the complete set of characters into
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   351
two disjoint sets. Any given character matches one, and only one, of each pair.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   352
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   353
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   354
These character type sequences can appear both inside and outside character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   355
classes. They each match one character of the appropriate type. If the current
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   356
matching point is at the end of the subject string, all of them fail, since
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   357
there is no character to match.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   358
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   359
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   360
For compatibility with Perl, \s does not match the VT character (code 11).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   361
This makes it different from the the POSIX "space" class. The \s characters
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   362
are HT (9), LF (10), FF (12), CR (13), and space (32). If "use locale;" is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   363
included in a Perl script, \s may match the VT character. In PCRE, it never
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   364
does.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   365
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   366
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   367
In UTF-8 mode, characters with values greater than 128 never match \d, \s, or
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   368
\w, and always match \D, \S, and \W. This is true even when Unicode
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   369
character property support is available. These sequences retain their original
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   370
meanings from before UTF-8 support was available, mainly for efficiency
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   371
reasons.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   372
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   373
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   374
The sequences \h, \H, \v, and \V are Perl 5.10 features. In contrast to the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   375
other sequences, these do match certain high-valued codepoints in UTF-8 mode.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   376
The horizontal space characters are:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   377
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   378
  U+0009     Horizontal tab
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   379
  U+0020     Space
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   380
  U+00A0     Non-break space
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   381
  U+1680     Ogham space mark
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   382
  U+180E     Mongolian vowel separator
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   383
  U+2000     En quad
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   384
  U+2001     Em quad
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   385
  U+2002     En space
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   386
  U+2003     Em space
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   387
  U+2004     Three-per-em space
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   388
  U+2005     Four-per-em space
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   389
  U+2006     Six-per-em space
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   390
  U+2007     Figure space
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   391
  U+2008     Punctuation space
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   392
  U+2009     Thin space
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   393
  U+200A     Hair space
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   394
  U+202F     Narrow no-break space
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   395
  U+205F     Medium mathematical space
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   396
  U+3000     Ideographic space
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   397
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   398
The vertical space characters are:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   399
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   400
  U+000A     Linefeed
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   401
  U+000B     Vertical tab
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   402
  U+000C     Formfeed
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   403
  U+000D     Carriage return
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   404
  U+0085     Next line
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   405
  U+2028     Line separator
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   406
  U+2029     Paragraph separator
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   407
</PRE>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   408
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   409
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   410
A "word" character is an underscore or any character less than 256 that is a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   411
letter or digit. The definition of letters and digits is controlled by PCRE's
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   412
low-valued character tables, and may vary if locale-specific matching is taking
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   413
place (see
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   414
<a href="pcreapi.html#localesupport">"Locale support"</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   415
in the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   416
<a href="pcreapi.html"><b>pcreapi</b></a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   417
page). For example, in a French locale such as "fr_FR" in Unix-like systems,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   418
or "french" in Windows, some character codes greater than 128 are used for
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   419
accented letters, and these are matched by \w. The use of locales with Unicode
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   420
is discouraged.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   421
<a name="newlineseq"></a></P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   422
<br><b>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   423
Newline sequences
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   424
</b><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   425
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   426
Outside a character class, by default, the escape sequence \R matches any
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   427
Unicode newline sequence. This is a Perl 5.10 feature. In non-UTF-8 mode \R is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   428
equivalent to the following:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   429
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   430
  (?&#62;\r\n|\n|\x0b|\f|\r|\x85)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   431
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   432
This is an example of an "atomic group", details of which are given
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   433
<a href="#atomicgroup">below.</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   434
This particular group matches either the two-character sequence CR followed by
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   435
LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   436
U+000B), FF (formfeed, U+000C), CR (carriage return, U+000D), or NEL (next
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   437
line, U+0085). The two-character sequence is treated as a single unit that
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   438
cannot be split.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   439
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   440
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   441
In UTF-8 mode, two additional characters whose codepoints are greater than 255
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   442
are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   443
Unicode character property support is not needed for these characters to be
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   444
recognized.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   445
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   446
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   447
It is possible to restrict \R to match only CR, LF, or CRLF (instead of the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   448
complete set of Unicode line endings) by setting the option PCRE_BSR_ANYCRLF
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   449
either at compile time or when the pattern is matched. (BSR is an abbrevation
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   450
for "backslash R".) This can be made the default when PCRE is built; if this is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   451
the case, the other behaviour can be requested via the PCRE_BSR_UNICODE option.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   452
It is also possible to specify these settings by starting a pattern string with
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   453
one of the following sequences:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   454
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   455
  (*BSR_ANYCRLF)   CR, LF, or CRLF only
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   456
  (*BSR_UNICODE)   any Unicode newline sequence
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   457
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   458
These override the default and the options given to <b>pcre_compile()</b>, but
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   459
they can be overridden by options given to <b>pcre_exec()</b>. Note that these
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   460
special settings, which are not Perl-compatible, are recognized only at the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   461
very start of a pattern, and that they must be in upper case. If more than one
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   462
of them is present, the last one is used. They can be combined with a change of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   463
newline convention, for example, a pattern can start with:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   464
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   465
  (*ANY)(*BSR_ANYCRLF)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   466
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   467
Inside a character class, \R matches the letter "R".
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   468
<a name="uniextseq"></a></P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   469
<br><b>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   470
Unicode character properties
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   471
</b><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   472
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   473
When PCRE is built with Unicode character property support, three additional
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   474
escape sequences that match characters with specific properties are available.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   475
When not in UTF-8 mode, these sequences are of course limited to testing
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   476
characters whose codepoints are less than 256, but they do work in this mode.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   477
The extra escape sequences are:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   478
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   479
  \p{<i>xx</i>}   a character with the <i>xx</i> property
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   480
  \P{<i>xx</i>}   a character without the <i>xx</i> property
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   481
  \X       an extended Unicode sequence
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   482
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   483
The property names represented by <i>xx</i> above are limited to the Unicode
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   484
script names, the general category properties, and "Any", which matches any
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   485
character (including newline). Other properties such as "InMusicalSymbols" are
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   486
not currently supported by PCRE. Note that \P{Any} does not match any
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   487
characters, so always causes a match failure.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   488
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   489
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   490
Sets of Unicode characters are defined as belonging to certain scripts. A
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   491
character from one of these sets can be matched using a script name. For
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   492
example:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   493
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   494
  \p{Greek}
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   495
  \P{Han}
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   496
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   497
Those that are not part of an identified script are lumped together as
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   498
"Common". The current list of scripts is:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   499
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   500
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   501
Arabic,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   502
Armenian,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   503
Balinese,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   504
Bengali,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   505
Bopomofo,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   506
Braille,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   507
Buginese,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   508
Buhid,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   509
Canadian_Aboriginal,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   510
Cherokee,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   511
Common,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   512
Coptic,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   513
Cuneiform,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   514
Cypriot,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   515
Cyrillic,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   516
Deseret,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   517
Devanagari,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   518
Ethiopic,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   519
Georgian,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   520
Glagolitic,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   521
Gothic,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   522
Greek,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   523
Gujarati,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   524
Gurmukhi,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   525
Han,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   526
Hangul,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   527
Hanunoo,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   528
Hebrew,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   529
Hiragana,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   530
Inherited,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   531
Kannada,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   532
Katakana,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   533
Kharoshthi,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   534
Khmer,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   535
Lao,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   536
Latin,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   537
Limbu,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   538
Linear_B,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   539
Malayalam,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   540
Mongolian,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   541
Myanmar,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   542
New_Tai_Lue,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   543
Nko,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   544
Ogham,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   545
Old_Italic,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   546
Old_Persian,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   547
Oriya,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   548
Osmanya,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   549
Phags_Pa,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   550
Phoenician,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   551
Runic,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   552
Shavian,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   553
Sinhala,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   554
Syloti_Nagri,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   555
Syriac,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   556
Tagalog,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   557
Tagbanwa,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   558
Tai_Le,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   559
Tamil,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   560
Telugu,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   561
Thaana,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   562
Thai,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   563
Tibetan,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   564
Tifinagh,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   565
Ugaritic,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   566
Yi.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   567
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   568
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   569
Each character has exactly one general category property, specified by a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   570
two-letter abbreviation. For compatibility with Perl, negation can be specified
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   571
by including a circumflex between the opening brace and the property name. For
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   572
example, \p{^Lu} is the same as \P{Lu}.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   573
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   574
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   575
If only one letter is specified with \p or \P, it includes all the general
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   576
category properties that start with that letter. In this case, in the absence
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   577
of negation, the curly brackets in the escape sequence are optional; these two
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   578
examples have the same effect:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   579
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   580
  \p{L}
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   581
  \pL
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   582
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   583
The following general category property codes are supported:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   584
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   585
  C     Other
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   586
  Cc    Control
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   587
  Cf    Format
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   588
  Cn    Unassigned
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   589
  Co    Private use
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   590
  Cs    Surrogate
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   591
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   592
  L     Letter
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   593
  Ll    Lower case letter
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   594
  Lm    Modifier letter
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   595
  Lo    Other letter
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   596
  Lt    Title case letter
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   597
  Lu    Upper case letter
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   598
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   599
  M     Mark
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   600
  Mc    Spacing mark
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   601
  Me    Enclosing mark
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   602
  Mn    Non-spacing mark
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   603
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   604
  N     Number
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   605
  Nd    Decimal number
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   606
  Nl    Letter number
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   607
  No    Other number
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   608
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   609
  P     Punctuation
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   610
  Pc    Connector punctuation
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   611
  Pd    Dash punctuation
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   612
  Pe    Close punctuation
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   613
  Pf    Final punctuation
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   614
  Pi    Initial punctuation
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   615
  Po    Other punctuation
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   616
  Ps    Open punctuation
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   617
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   618
  S     Symbol
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   619
  Sc    Currency symbol
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   620
  Sk    Modifier symbol
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   621
  Sm    Mathematical symbol
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   622
  So    Other symbol
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   623
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   624
  Z     Separator
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   625
  Zl    Line separator
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   626
  Zp    Paragraph separator
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   627
  Zs    Space separator
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   628
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   629
The special property L& is also supported: it matches a character that has
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   630
the Lu, Ll, or Lt property, in other words, a letter that is not classified as
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   631
a modifier or "other".
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   632
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   633
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   634
The Cs (Surrogate) property applies only to characters in the range U+D800 to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   635
U+DFFF. Such characters are not valid in UTF-8 strings (see RFC 3629) and so
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   636
cannot be tested by PCRE, unless UTF-8 validity checking has been turned off
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   637
(see the discussion of PCRE_NO_UTF8_CHECK in the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   638
<a href="pcreapi.html"><b>pcreapi</b></a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   639
page).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   640
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   641
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   642
The long synonyms for these properties that Perl supports (such as \p{Letter})
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   643
are not supported by PCRE, nor is it permitted to prefix any of these
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   644
properties with "Is".
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   645
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   646
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   647
No character that is in the Unicode table has the Cn (unassigned) property.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   648
Instead, this property is assumed for any code point that is not in the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   649
Unicode table.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   650
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   651
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   652
Specifying caseless matching does not affect these escape sequences. For
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   653
example, \p{Lu} always matches only upper case letters.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   654
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   655
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   656
The \X escape matches any number of Unicode characters that form an extended
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   657
Unicode sequence. \X is equivalent to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   658
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   659
  (?&#62;\PM\pM*)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   660
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   661
That is, it matches a character without the "mark" property, followed by zero
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   662
or more characters with the "mark" property, and treats the sequence as an
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   663
atomic group
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   664
<a href="#atomicgroup">(see below).</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   665
Characters with the "mark" property are typically accents that affect the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   666
preceding character. None of them have codepoints less than 256, so in
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   667
non-UTF-8 mode \X matches any one character.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   668
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   669
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   670
Matching characters by Unicode property is not fast, because PCRE has to search
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   671
a structure that contains data for over fifteen thousand characters. That is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   672
why the traditional escape sequences such as \d and \w do not use Unicode
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   673
properties in PCRE.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   674
<a name="resetmatchstart"></a></P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   675
<br><b>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   676
Resetting the match start
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   677
</b><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   678
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   679
The escape sequence \K, which is a Perl 5.10 feature, causes any previously
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   680
matched characters not to be included in the final matched sequence. For
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   681
example, the pattern:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   682
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   683
  foo\Kbar
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   684
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   685
matches "foobar", but reports that it has matched "bar". This feature is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   686
similar to a lookbehind assertion
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   687
<a href="#lookbehind">(described below).</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   688
However, in this case, the part of the subject before the real match does not
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   689
have to be of fixed length, as lookbehind assertions do. The use of \K does
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   690
not interfere with the setting of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   691
<a href="#subpattern">captured substrings.</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   692
For example, when the pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   693
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   694
  (foo)\Kbar
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   695
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   696
matches "foobar", the first substring is still set to "foo".
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   697
<a name="smallassertions"></a></P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   698
<br><b>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   699
Simple assertions
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   700
</b><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   701
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   702
The final use of backslash is for certain simple assertions. An assertion
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   703
specifies a condition that has to be met at a particular point in a match,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   704
without consuming any characters from the subject string. The use of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   705
subpatterns for more complicated assertions is described
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   706
<a href="#bigassertions">below.</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   707
The backslashed assertions are:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   708
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   709
  \b     matches at a word boundary
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   710
  \B     matches when not at a word boundary
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   711
  \A     matches at the start of the subject
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   712
  \Z     matches at the end of the subject
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   713
          also matches before a newline at the end of the subject
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   714
  \z     matches only at the end of the subject
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   715
  \G     matches at the first matching position in the subject
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   716
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   717
These assertions may not appear in character classes (but note that \b has a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   718
different meaning, namely the backspace character, inside a character class).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   719
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   720
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   721
A word boundary is a position in the subject string where the current character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   722
and the previous character do not both match \w or \W (i.e. one matches
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   723
\w and the other matches \W), or the start or end of the string if the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   724
first or last character matches \w, respectively.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   725
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   726
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   727
The \A, \Z, and \z assertions differ from the traditional circumflex and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   728
dollar (described in the next section) in that they only ever match at the very
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   729
start and end of the subject string, whatever options are set. Thus, they are
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   730
independent of multiline mode. These three assertions are not affected by the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   731
PCRE_NOTBOL or PCRE_NOTEOL options, which affect only the behaviour of the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   732
circumflex and dollar metacharacters. However, if the <i>startoffset</i>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   733
argument of <b>pcre_exec()</b> is non-zero, indicating that matching is to start
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   734
at a point other than the beginning of the subject, \A can never match. The
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   735
difference between \Z and \z is that \Z matches before a newline at the end
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   736
of the string as well as at the very end, whereas \z matches only at the end.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   737
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   738
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   739
The \G assertion is true only when the current matching position is at the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   740
start point of the match, as specified by the <i>startoffset</i> argument of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   741
<b>pcre_exec()</b>. It differs from \A when the value of <i>startoffset</i> is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   742
non-zero. By calling <b>pcre_exec()</b> multiple times with appropriate
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   743
arguments, you can mimic Perl's /g option, and it is in this kind of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   744
implementation where \G can be useful.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   745
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   746
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   747
Note, however, that PCRE's interpretation of \G, as the start of the current
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   748
match, is subtly different from Perl's, which defines it as the end of the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   749
previous match. In Perl, these can be different when the previously matched
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   750
string was empty. Because PCRE does just one match at a time, it cannot
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   751
reproduce this behaviour.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   752
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   753
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   754
If all the alternatives of a pattern begin with \G, the expression is anchored
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   755
to the starting match position, and the "anchored" flag is set in the compiled
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   756
regular expression.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   757
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   758
<br><a name="SEC5" href="#TOC1">CIRCUMFLEX AND DOLLAR</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   759
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   760
Outside a character class, in the default matching mode, the circumflex
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   761
character is an assertion that is true only if the current matching point is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   762
at the start of the subject string. If the <i>startoffset</i> argument of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   763
<b>pcre_exec()</b> is non-zero, circumflex can never match if the PCRE_MULTILINE
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   764
option is unset. Inside a character class, circumflex has an entirely different
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   765
meaning
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   766
<a href="#characterclass">(see below).</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   767
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   768
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   769
Circumflex need not be the first character of the pattern if a number of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   770
alternatives are involved, but it should be the first thing in each alternative
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   771
in which it appears if the pattern is ever to match that branch. If all
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   772
possible alternatives start with a circumflex, that is, if the pattern is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   773
constrained to match only at the start of the subject, it is said to be an
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   774
"anchored" pattern. (There are also other constructs that can cause a pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   775
to be anchored.)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   776
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   777
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   778
A dollar character is an assertion that is true only if the current matching
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   779
point is at the end of the subject string, or immediately before a newline
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   780
at the end of the string (by default). Dollar need not be the last character of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   781
the pattern if a number of alternatives are involved, but it should be the last
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   782
item in any branch in which it appears. Dollar has no special meaning in a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   783
character class.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   784
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   785
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   786
The meaning of dollar can be changed so that it matches only at the very end of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   787
the string, by setting the PCRE_DOLLAR_ENDONLY option at compile time. This
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   788
does not affect the \Z assertion.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   789
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   790
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   791
The meanings of the circumflex and dollar characters are changed if the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   792
PCRE_MULTILINE option is set. When this is the case, a circumflex matches
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   793
immediately after internal newlines as well as at the start of the subject
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   794
string. It does not match after a newline that ends the string. A dollar
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   795
matches before any newlines in the string, as well as at the very end, when
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   796
PCRE_MULTILINE is set. When newline is specified as the two-character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   797
sequence CRLF, isolated CR and LF characters do not indicate newlines.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   798
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   799
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   800
For example, the pattern /^abc$/ matches the subject string "def\nabc" (where
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   801
\n represents a newline) in multiline mode, but not otherwise. Consequently,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   802
patterns that are anchored in single line mode because all branches start with
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   803
^ are not anchored in multiline mode, and a match for circumflex is possible
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   804
when the <i>startoffset</i> argument of <b>pcre_exec()</b> is non-zero. The
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   805
PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is set.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   806
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   807
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   808
Note that the sequences \A, \Z, and \z can be used to match the start and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   809
end of the subject in both modes, and if all branches of a pattern start with
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   810
\A it is always anchored, whether or not PCRE_MULTILINE is set.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   811
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   812
<br><a name="SEC6" href="#TOC1">FULL STOP (PERIOD, DOT)</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   813
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   814
Outside a character class, a dot in the pattern matches any one character in
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   815
the subject string except (by default) a character that signifies the end of a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   816
line. In UTF-8 mode, the matched character may be more than one byte long.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   817
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   818
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   819
When a line ending is defined as a single character, dot never matches that
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   820
character; when the two-character sequence CRLF is used, dot does not match CR
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   821
if it is immediately followed by LF, but otherwise it matches all characters
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   822
(including isolated CRs and LFs). When any Unicode line endings are being
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   823
recognized, dot does not match CR or LF or any of the other line ending
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   824
characters.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   825
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   826
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   827
The behaviour of dot with regard to newlines can be changed. If the PCRE_DOTALL
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   828
option is set, a dot matches any one character, without exception. If the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   829
two-character sequence CRLF is present in the subject string, it takes two dots
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   830
to match it.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   831
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   832
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   833
The handling of dot is entirely independent of the handling of circumflex and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   834
dollar, the only relationship being that they both involve newlines. Dot has no
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   835
special meaning in a character class.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   836
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   837
<br><a name="SEC7" href="#TOC1">MATCHING A SINGLE BYTE</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   838
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   839
Outside a character class, the escape sequence \C matches any one byte, both
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   840
in and out of UTF-8 mode. Unlike a dot, it always matches any line-ending
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   841
characters. The feature is provided in Perl in order to match individual bytes
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   842
in UTF-8 mode. Because it breaks up UTF-8 characters into individual bytes,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   843
what remains in the string may be a malformed UTF-8 string. For this reason,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   844
the \C escape sequence is best avoided.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   845
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   846
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   847
PCRE does not allow \C to appear in lookbehind assertions
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   848
<a href="#lookbehind">(described below),</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   849
because in UTF-8 mode this would make it impossible to calculate the length of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   850
the lookbehind.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   851
<a name="characterclass"></a></P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   852
<br><a name="SEC8" href="#TOC1">SQUARE BRACKETS AND CHARACTER CLASSES</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   853
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   854
An opening square bracket introduces a character class, terminated by a closing
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   855
square bracket. A closing square bracket on its own is not special. If a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   856
closing square bracket is required as a member of the class, it should be the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   857
first data character in the class (after an initial circumflex, if present) or
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   858
escaped with a backslash.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   859
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   860
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   861
A character class matches a single character in the subject. In UTF-8 mode, the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   862
character may occupy more than one byte. A matched character must be in the set
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   863
of characters defined by the class, unless the first character in the class
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   864
definition is a circumflex, in which case the subject character must not be in
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   865
the set defined by the class. If a circumflex is actually required as a member
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   866
of the class, ensure it is not the first character, or escape it with a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   867
backslash.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   868
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   869
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   870
For example, the character class [aeiou] matches any lower case vowel, while
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   871
[^aeiou] matches any character that is not a lower case vowel. Note that a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   872
circumflex is just a convenient notation for specifying the characters that
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   873
are in the class by enumerating those that are not. A class that starts with a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   874
circumflex is not an assertion: it still consumes a character from the subject
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   875
string, and therefore it fails if the current pointer is at the end of the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   876
string.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   877
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   878
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   879
In UTF-8 mode, characters with values greater than 255 can be included in a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   880
class as a literal string of bytes, or by using the \x{ escaping mechanism.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   881
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   882
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   883
When caseless matching is set, any letters in a class represent both their
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   884
upper case and lower case versions, so for example, a caseless [aeiou] matches
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   885
"A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   886
caseful version would. In UTF-8 mode, PCRE always understands the concept of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   887
case for characters whose values are less than 128, so caseless matching is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   888
always possible. For characters with higher values, the concept of case is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   889
supported if PCRE is compiled with Unicode property support, but not otherwise.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   890
If you want to use caseless matching for characters 128 and above, you must
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   891
ensure that PCRE is compiled with Unicode property support as well as with
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   892
UTF-8 support.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   893
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   894
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   895
Characters that might indicate line breaks are never treated in any special way
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   896
when matching character classes, whatever line-ending sequence is in use, and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   897
whatever setting of the PCRE_DOTALL and PCRE_MULTILINE options is used. A class
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   898
such as [^a] always matches one of these characters.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   899
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   900
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   901
The minus (hyphen) character can be used to specify a range of characters in a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   902
character class. For example, [d-m] matches any letter between d and m,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   903
inclusive. If a minus character is required in a class, it must be escaped with
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   904
a backslash or appear in a position where it cannot be interpreted as
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   905
indicating a range, typically as the first or last character in the class.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   906
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   907
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   908
It is not possible to have the literal character "]" as the end character of a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   909
range. A pattern such as [W-]46] is interpreted as a class of two characters
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   910
("W" and "-") followed by a literal string "46]", so it would match "W46]" or
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   911
"-46]". However, if the "]" is escaped with a backslash it is interpreted as
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   912
the end of range, so [W-\]46] is interpreted as a class containing a range
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   913
followed by two other characters. The octal or hexadecimal representation of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   914
"]" can also be used to end a range.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   915
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   916
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   917
Ranges operate in the collating sequence of character values. They can also be
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   918
used for characters specified numerically, for example [\000-\037]. In UTF-8
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   919
mode, ranges can include characters whose values are greater than 255, for
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   920
example [\x{100}-\x{2ff}].
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   921
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   922
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   923
If a range that includes letters is used when caseless matching is set, it
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   924
matches the letters in either case. For example, [W-c] is equivalent to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   925
[][\\^_`wxyzabc], matched caselessly, and in non-UTF-8 mode, if character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   926
tables for a French locale are in use, [\xc8-\xcb] matches accented E
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   927
characters in both cases. In UTF-8 mode, PCRE supports the concept of case for
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   928
characters with values greater than 128 only when it is compiled with Unicode
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   929
property support.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   930
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   931
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   932
The character types \d, \D, \p, \P, \s, \S, \w, and \W may also appear
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   933
in a character class, and add the characters that they match to the class. For
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   934
example, [\dABCDEF] matches any hexadecimal digit. A circumflex can
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   935
conveniently be used with the upper case character types to specify a more
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   936
restricted set of characters than the matching lower case type. For example,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   937
the class [^\W_] matches any letter or digit, but not underscore.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   938
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   939
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   940
The only metacharacters that are recognized in character classes are backslash,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   941
hyphen (only where it can be interpreted as specifying a range), circumflex
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   942
(only at the start), opening square bracket (only when it can be interpreted as
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   943
introducing a POSIX class name - see the next section), and the terminating
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   944
closing square bracket. However, escaping other non-alphanumeric characters
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   945
does no harm.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   946
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   947
<br><a name="SEC9" href="#TOC1">POSIX CHARACTER CLASSES</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   948
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   949
Perl supports the POSIX notation for character classes. This uses names
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   950
enclosed by [: and :] within the enclosing square brackets. PCRE also supports
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   951
this notation. For example,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   952
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   953
  [01[:alpha:]%]
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   954
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   955
matches "0", "1", any alphabetic character, or "%". The supported class names
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   956
are
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   957
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   958
  alnum    letters and digits
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   959
  alpha    letters
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   960
  ascii    character codes 0 - 127
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   961
  blank    space or tab only
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   962
  cntrl    control characters
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   963
  digit    decimal digits (same as \d)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   964
  graph    printing characters, excluding space
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   965
  lower    lower case letters
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   966
  print    printing characters, including space
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   967
  punct    printing characters, excluding letters and digits
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   968
  space    white space (not quite the same as \s)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   969
  upper    upper case letters
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   970
  word     "word" characters (same as \w)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   971
  xdigit   hexadecimal digits
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   972
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   973
The "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13), and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   974
space (32). Notice that this list includes the VT character (code 11). This
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   975
makes "space" different to \s, which does not include VT (for Perl
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   976
compatibility).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   977
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   978
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   979
The name "word" is a Perl extension, and "blank" is a GNU extension from Perl
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   980
5.8. Another Perl extension is negation, which is indicated by a ^ character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   981
after the colon. For example,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   982
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   983
  [12[:^digit:]]
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   984
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   985
matches "1", "2", or any non-digit. PCRE (and Perl) also recognize the POSIX
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   986
syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   987
supported, and an error is given if they are encountered.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   988
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   989
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   990
In UTF-8 mode, characters with values greater than 128 do not match any of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   991
the POSIX character classes.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   992
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   993
<br><a name="SEC10" href="#TOC1">VERTICAL BAR</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   994
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   995
Vertical bar characters are used to separate alternative patterns. For example,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   996
the pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   997
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   998
  gilbert|sullivan
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
   999
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1000
matches either "gilbert" or "sullivan". Any number of alternatives may appear,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1001
and an empty alternative is permitted (matching the empty string). The matching
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1002
process tries each alternative in turn, from left to right, and the first one
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1003
that succeeds is used. If the alternatives are within a subpattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1004
<a href="#subpattern">(defined below),</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1005
"succeeds" means matching the rest of the main pattern as well as the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1006
alternative in the subpattern.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1007
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1008
<br><a name="SEC11" href="#TOC1">INTERNAL OPTION SETTING</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1009
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1010
The settings of the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1011
PCRE_EXTENDED options (which are Perl-compatible) can be changed from within
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1012
the pattern by a sequence of Perl option letters enclosed between "(?" and ")".
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1013
The option letters are
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1014
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1015
  i  for PCRE_CASELESS
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1016
  m  for PCRE_MULTILINE
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1017
  s  for PCRE_DOTALL
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1018
  x  for PCRE_EXTENDED
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1019
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1020
For example, (?im) sets caseless, multiline matching. It is also possible to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1021
unset these options by preceding the letter with a hyphen, and a combined
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1022
setting and unsetting such as (?im-sx), which sets PCRE_CASELESS and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1023
PCRE_MULTILINE while unsetting PCRE_DOTALL and PCRE_EXTENDED, is also
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1024
permitted. If a letter appears both before and after the hyphen, the option is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1025
unset.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1026
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1027
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1028
The PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and PCRE_EXTRA can be
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1029
changed in the same way as the Perl-compatible options by using the characters
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1030
J, U and X respectively.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1031
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1032
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1033
When an option change occurs at top level (that is, not inside subpattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1034
parentheses), the change applies to the remainder of the pattern that follows.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1035
If the change is placed right at the start of a pattern, PCRE extracts it into
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1036
the global options (and it will therefore show up in data extracted by the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1037
<b>pcre_fullinfo()</b> function).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1038
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1039
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1040
An option change within a subpattern (see below for a description of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1041
subpatterns) affects only that part of the current pattern that follows it, so
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1042
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1043
  (a(?i)b)c
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1044
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1045
matches abc and aBc and no other strings (assuming PCRE_CASELESS is not used).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1046
By this means, options can be made to have different settings in different
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1047
parts of the pattern. Any changes made in one alternative do carry on
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1048
into subsequent branches within the same subpattern. For example,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1049
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1050
  (a(?i)b|c)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1051
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1052
matches "ab", "aB", "c", and "C", even though when matching "C" the first
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1053
branch is abandoned before the option setting. This is because the effects of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1054
option settings happen at compile time. There would be some very weird
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1055
behaviour otherwise.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1056
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1057
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1058
<b>Note:</b> There are other PCRE-specific options that can be set by the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1059
application when the compile or match functions are called. In some cases the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1060
pattern can contain special leading sequences to override what the application
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1061
has set or what has been defaulted. Details are given in the section entitled
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1062
<a href="#newlineseq">"Newline sequences"</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1063
above.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1064
<a name="subpattern"></a></P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1065
<br><a name="SEC12" href="#TOC1">SUBPATTERNS</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1066
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1067
Subpatterns are delimited by parentheses (round brackets), which can be nested.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1068
Turning part of a pattern into a subpattern does two things:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1069
<br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1070
<br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1071
1. It localizes a set of alternatives. For example, the pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1072
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1073
  cat(aract|erpillar|)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1074
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1075
matches one of the words "cat", "cataract", or "caterpillar". Without the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1076
parentheses, it would match "cataract", "erpillar" or an empty string.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1077
<br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1078
<br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1079
2. It sets up the subpattern as a capturing subpattern. This means that, when
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1080
the whole pattern matches, that portion of the subject string that matched the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1081
subpattern is passed back to the caller via the <i>ovector</i> argument of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1082
<b>pcre_exec()</b>. Opening parentheses are counted from left to right (starting
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1083
from 1) to obtain numbers for the capturing subpatterns.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1084
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1085
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1086
For example, if the string "the red king" is matched against the pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1087
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1088
  the ((red|white) (king|queen))
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1089
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1090
the captured substrings are "red king", "red", and "king", and are numbered 1,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1091
2, and 3, respectively.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1092
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1093
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1094
The fact that plain parentheses fulfil two functions is not always helpful.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1095
There are often times when a grouping subpattern is required without a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1096
capturing requirement. If an opening parenthesis is followed by a question mark
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1097
and a colon, the subpattern does not do any capturing, and is not counted when
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1098
computing the number of any subsequent capturing subpatterns. For example, if
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1099
the string "the white queen" is matched against the pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1100
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1101
  the ((?:red|white) (king|queen))
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1102
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1103
the captured substrings are "white queen" and "queen", and are numbered 1 and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1104
2. The maximum number of capturing subpatterns is 65535.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1105
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1106
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1107
As a convenient shorthand, if any option settings are required at the start of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1108
a non-capturing subpattern, the option letters may appear between the "?" and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1109
the ":". Thus the two patterns
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1110
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1111
  (?i:saturday|sunday)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1112
  (?:(?i)saturday|sunday)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1113
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1114
match exactly the same set of strings. Because alternative branches are tried
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1115
from left to right, and options are not reset until the end of the subpattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1116
is reached, an option setting in one branch does affect subsequent branches, so
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1117
the above patterns match "SUNDAY" as well as "Saturday".
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1118
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1119
<br><a name="SEC13" href="#TOC1">DUPLICATE SUBPATTERN NUMBERS</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1120
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1121
Perl 5.10 introduced a feature whereby each alternative in a subpattern uses
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1122
the same numbers for its capturing parentheses. Such a subpattern starts with
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1123
(?| and is itself a non-capturing subpattern. For example, consider this
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1124
pattern:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1125
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1126
  (?|(Sat)ur|(Sun))day
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1127
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1128
Because the two alternatives are inside a (?| group, both sets of capturing
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1129
parentheses are numbered one. Thus, when the pattern matches, you can look
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1130
at captured substring number one, whichever alternative matched. This construct
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1131
is useful when you want to capture part, but not all, of one of a number of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1132
alternatives. Inside a (?| group, parentheses are numbered as usual, but the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1133
number is reset at the start of each branch. The numbers of any capturing
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1134
buffers that follow the subpattern start after the highest number used in any
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1135
branch. The following example is taken from the Perl documentation.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1136
The numbers underneath show in which buffer the captured content will be
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1137
stored.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1138
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1139
  # before  ---------------branch-reset----------- after
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1140
  / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1141
  # 1            2         2  3        2     3     4
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1142
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1143
A backreference or a recursive call to a numbered subpattern always refers to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1144
the first one in the pattern with the given number.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1145
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1146
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1147
An alternative approach to using this "branch reset" feature is to use
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1148
duplicate named subpatterns, as described in the next section.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1149
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1150
<br><a name="SEC14" href="#TOC1">NAMED SUBPATTERNS</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1151
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1152
Identifying capturing parentheses by number is simple, but it can be very hard
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1153
to keep track of the numbers in complicated regular expressions. Furthermore,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1154
if an expression is modified, the numbers may change. To help with this
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1155
difficulty, PCRE supports the naming of subpatterns. This feature was not
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1156
added to Perl until release 5.10. Python had the feature earlier, and PCRE
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1157
introduced it at release 4.0, using the Python syntax. PCRE now supports both
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1158
the Perl and the Python syntax.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1159
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1160
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1161
In PCRE, a subpattern can be named in one of three ways: (?&#60;name&#62;...) or
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1162
(?'name'...) as in Perl, or (?P&#60;name&#62;...) as in Python. References to capturing
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1163
parentheses from other parts of the pattern, such as
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1164
<a href="#backreferences">backreferences,</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1165
<a href="#recursion">recursion,</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1166
and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1167
<a href="#conditions">conditions,</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1168
can be made by name as well as by number.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1169
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1170
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1171
Names consist of up to 32 alphanumeric characters and underscores. Named
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1172
capturing parentheses are still allocated numbers as well as names, exactly as
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1173
if the names were not present. The PCRE API provides function calls for
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1174
extracting the name-to-number translation table from a compiled pattern. There
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1175
is also a convenience function for extracting a captured substring by name.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1176
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1177
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1178
By default, a name must be unique within a pattern, but it is possible to relax
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1179
this constraint by setting the PCRE_DUPNAMES option at compile time. This can
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1180
be useful for patterns where only one instance of the named parentheses can
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1181
match. Suppose you want to match the name of a weekday, either as a 3-letter
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1182
abbreviation or as the full name, and in both cases you want to extract the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1183
abbreviation. This pattern (ignoring the line breaks) does the job:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1184
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1185
  (?&#60;DN&#62;Mon|Fri|Sun)(?:day)?|
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1186
  (?&#60;DN&#62;Tue)(?:sday)?|
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1187
  (?&#60;DN&#62;Wed)(?:nesday)?|
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1188
  (?&#60;DN&#62;Thu)(?:rsday)?|
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1189
  (?&#60;DN&#62;Sat)(?:urday)?
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1190
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1191
There are five capturing substrings, but only one is ever set after a match.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1192
(An alternative way of solving this problem is to use a "branch reset"
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1193
subpattern, as described in the previous section.)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1194
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1195
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1196
The convenience function for extracting the data by name returns the substring
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1197
for the first (and in this example, the only) subpattern of that name that
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1198
matched. This saves searching to find which numbered subpattern it was. If you
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1199
make a reference to a non-unique named subpattern from elsewhere in the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1200
pattern, the one that corresponds to the lowest number is used. For further
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1201
details of the interfaces for handling named subpatterns, see the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1202
<a href="pcreapi.html"><b>pcreapi</b></a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1203
documentation.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1204
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1205
<br><a name="SEC15" href="#TOC1">REPETITION</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1206
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1207
Repetition is specified by quantifiers, which can follow any of the following
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1208
items:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1209
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1210
  a literal data character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1211
  the dot metacharacter
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1212
  the \C escape sequence
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1213
  the \X escape sequence (in UTF-8 mode with Unicode properties)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1214
  the \R escape sequence
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1215
  an escape such as \d that matches a single character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1216
  a character class
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1217
  a back reference (see next section)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1218
  a parenthesized subpattern (unless it is an assertion)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1219
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1220
The general repetition quantifier specifies a minimum and maximum number of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1221
permitted matches, by giving the two numbers in curly brackets (braces),
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1222
separated by a comma. The numbers must be less than 65536, and the first must
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1223
be less than or equal to the second. For example:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1224
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1225
  z{2,4}
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1226
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1227
matches "zz", "zzz", or "zzzz". A closing brace on its own is not a special
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1228
character. If the second number is omitted, but the comma is present, there is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1229
no upper limit; if the second number and the comma are both omitted, the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1230
quantifier specifies an exact number of required matches. Thus
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1231
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1232
  [aeiou]{3,}
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1233
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1234
matches at least 3 successive vowels, but may match many more, while
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1235
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1236
  \d{8}
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1237
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1238
matches exactly 8 digits. An opening curly bracket that appears in a position
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1239
where a quantifier is not allowed, or one that does not match the syntax of a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1240
quantifier, is taken as a literal character. For example, {,6} is not a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1241
quantifier, but a literal string of four characters.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1242
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1243
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1244
In UTF-8 mode, quantifiers apply to UTF-8 characters rather than to individual
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1245
bytes. Thus, for example, \x{100}{2} matches two UTF-8 characters, each of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1246
which is represented by a two-byte sequence. Similarly, when Unicode property
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1247
support is available, \X{3} matches three Unicode extended sequences, each of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1248
which may be several bytes long (and they may be of different lengths).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1249
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1250
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1251
The quantifier {0} is permitted, causing the expression to behave as if the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1252
previous item and the quantifier were not present. This may be useful for
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1253
subpatterns that are referenced as
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1254
<a href="#subpatternsassubroutines">subroutines</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1255
from elsewhere in the pattern. Items other than subpatterns that have a {0}
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1256
quantifier are omitted from the compiled pattern.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1257
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1258
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1259
For convenience, the three most common quantifiers have single-character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1260
abbreviations:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1261
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1262
  *    is equivalent to {0,}
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1263
  +    is equivalent to {1,}
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1264
  ?    is equivalent to {0,1}
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1265
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1266
It is possible to construct infinite loops by following a subpattern that can
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1267
match no characters with a quantifier that has no upper limit, for example:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1268
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1269
  (a?)*
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1270
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1271
Earlier versions of Perl and PCRE used to give an error at compile time for
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1272
such patterns. However, because there are cases where this can be useful, such
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1273
patterns are now accepted, but if any repetition of the subpattern does in fact
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1274
match no characters, the loop is forcibly broken.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1275
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1276
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1277
By default, the quantifiers are "greedy", that is, they match as much as
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1278
possible (up to the maximum number of permitted times), without causing the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1279
rest of the pattern to fail. The classic example of where this gives problems
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1280
is in trying to match comments in C programs. These appear between /* and */
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1281
and within the comment, individual * and / characters may appear. An attempt to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1282
match C comments by applying the pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1283
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1284
  /\*.*\*/
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1285
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1286
to the string
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1287
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1288
  /* first comment */  not comment  /* second comment */
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1289
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1290
fails, because it matches the entire string owing to the greediness of the .*
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1291
item.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1292
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1293
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1294
However, if a quantifier is followed by a question mark, it ceases to be
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1295
greedy, and instead matches the minimum number of times possible, so the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1296
pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1297
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1298
  /\*.*?\*/
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1299
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1300
does the right thing with the C comments. The meaning of the various
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1301
quantifiers is not otherwise changed, just the preferred number of matches.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1302
Do not confuse this use of question mark with its use as a quantifier in its
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1303
own right. Because it has two uses, it can sometimes appear doubled, as in
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1304
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1305
  \d??\d
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1306
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1307
which matches one digit by preference, but can match two if that is the only
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1308
way the rest of the pattern matches.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1309
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1310
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1311
If the PCRE_UNGREEDY option is set (an option that is not available in Perl),
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1312
the quantifiers are not greedy by default, but individual ones can be made
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1313
greedy by following them with a question mark. In other words, it inverts the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1314
default behaviour.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1315
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1316
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1317
When a parenthesized subpattern is quantified with a minimum repeat count that
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1318
is greater than 1 or with a limited maximum, more memory is required for the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1319
compiled pattern, in proportion to the size of the minimum or maximum.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1320
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1321
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1322
If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equivalent
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1323
to Perl's /s) is set, thus allowing the dot to match newlines, the pattern is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1324
implicitly anchored, because whatever follows will be tried against every
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1325
character position in the subject string, so there is no point in retrying the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1326
overall match at any position after the first. PCRE normally treats such a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1327
pattern as though it were preceded by \A.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1328
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1329
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1330
In cases where it is known that the subject string contains no newlines, it is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1331
worth setting PCRE_DOTALL in order to obtain this optimization, or
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1332
alternatively using ^ to indicate anchoring explicitly.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1333
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1334
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1335
However, there is one situation where the optimization cannot be used. When .*
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1336
is inside capturing parentheses that are the subject of a backreference
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1337
elsewhere in the pattern, a match at the start may fail where a later one
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1338
succeeds. Consider, for example:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1339
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1340
  (.*)abc\1
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1341
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1342
If the subject is "xyz123abc123" the match point is the fourth character. For
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1343
this reason, such a pattern is not implicitly anchored.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1344
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1345
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1346
When a capturing subpattern is repeated, the value captured is the substring
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1347
that matched the final iteration. For example, after
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1348
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1349
  (tweedle[dume]{3}\s*)+
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1350
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1351
has matched "tweedledum tweedledee" the value of the captured substring is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1352
"tweedledee". However, if there are nested capturing subpatterns, the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1353
corresponding captured values may have been set in previous iterations. For
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1354
example, after
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1355
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1356
  /(a|(b))+/
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1357
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1358
matches "aba" the value of the second captured substring is "b".
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1359
<a name="atomicgroup"></a></P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1360
<br><a name="SEC16" href="#TOC1">ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1361
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1362
With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy")
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1363
repetition, failure of what follows normally causes the repeated item to be
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1364
re-evaluated to see if a different number of repeats allows the rest of the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1365
pattern to match. Sometimes it is useful to prevent this, either to change the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1366
nature of the match, or to cause it fail earlier than it otherwise might, when
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1367
the author of the pattern knows there is no point in carrying on.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1368
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1369
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1370
Consider, for example, the pattern \d+foo when applied to the subject line
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1371
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1372
  123456bar
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1373
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1374
After matching all 6 digits and then failing to match "foo", the normal
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1375
action of the matcher is to try again with only 5 digits matching the \d+
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1376
item, and then with 4, and so on, before ultimately failing. "Atomic grouping"
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1377
(a term taken from Jeffrey Friedl's book) provides the means for specifying
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1378
that once a subpattern has matched, it is not to be re-evaluated in this way.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1379
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1380
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1381
If we use atomic grouping for the previous example, the matcher gives up
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1382
immediately on failing to match "foo" the first time. The notation is a kind of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1383
special parenthesis, starting with (?&#62; as in this example:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1384
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1385
  (?&#62;\d+)foo
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1386
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1387
This kind of parenthesis "locks up" the  part of the pattern it contains once
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1388
it has matched, and a failure further into the pattern is prevented from
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1389
backtracking into it. Backtracking past it to previous items, however, works as
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1390
normal.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1391
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1392
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1393
An alternative description is that a subpattern of this type matches the string
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1394
of characters that an identical standalone pattern would match, if anchored at
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1395
the current point in the subject string.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1396
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1397
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1398
Atomic grouping subpatterns are not capturing subpatterns. Simple cases such as
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1399
the above example can be thought of as a maximizing repeat that must swallow
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1400
everything it can. So, while both \d+ and \d+? are prepared to adjust the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1401
number of digits they match in order to make the rest of the pattern match,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1402
(?&#62;\d+) can only match an entire sequence of digits.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1403
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1404
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1405
Atomic groups in general can of course contain arbitrarily complicated
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1406
subpatterns, and can be nested. However, when the subpattern for an atomic
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1407
group is just a single repeated item, as in the example above, a simpler
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1408
notation, called a "possessive quantifier" can be used. This consists of an
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1409
additional + character following a quantifier. Using this notation, the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1410
previous example can be rewritten as
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1411
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1412
  \d++foo
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1413
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1414
Note that a possessive quantifier can be used with an entire group, for
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1415
example:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1416
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1417
  (abc|xyz){2,3}+
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1418
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1419
Possessive quantifiers are always greedy; the setting of the PCRE_UNGREEDY
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1420
option is ignored. They are a convenient notation for the simpler forms of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1421
atomic group. However, there is no difference in the meaning of a possessive
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1422
quantifier and the equivalent atomic group, though there may be a performance
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1423
difference; possessive quantifiers should be slightly faster.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1424
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1425
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1426
The possessive quantifier syntax is an extension to the Perl 5.8 syntax.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1427
Jeffrey Friedl originated the idea (and the name) in the first edition of his
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1428
book. Mike McCloskey liked it, so implemented it when he built Sun's Java
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1429
package, and PCRE copied it from there. It ultimately found its way into Perl
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1430
at release 5.10.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1431
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1432
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1433
PCRE has an optimization that automatically "possessifies" certain simple
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1434
pattern constructs. For example, the sequence A+B is treated as A++B because
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1435
there is no point in backtracking into a sequence of A's when B must follow.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1436
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1437
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1438
When a pattern contains an unlimited repeat inside a subpattern that can itself
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1439
be repeated an unlimited number of times, the use of an atomic group is the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1440
only way to avoid some failing matches taking a very long time indeed. The
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1441
pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1442
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1443
  (\D+|&#60;\d+&#62;)*[!?]
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1444
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1445
matches an unlimited number of substrings that either consist of non-digits, or
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1446
digits enclosed in &#60;&#62;, followed by either ! or ?. When it matches, it runs
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1447
quickly. However, if it is applied to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1448
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1449
  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1450
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1451
it takes a long time before reporting failure. This is because the string can
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1452
be divided between the internal \D+ repeat and the external * repeat in a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1453
large number of ways, and all have to be tried. (The example uses [!?] rather
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1454
than a single character at the end, because both PCRE and Perl have an
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1455
optimization that allows for fast failure when a single character is used. They
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1456
remember the last single character that is required for a match, and fail early
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1457
if it is not present in the string.) If the pattern is changed so that it uses
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1458
an atomic group, like this:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1459
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1460
  ((?&#62;\D+)|&#60;\d+&#62;)*[!?]
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1461
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1462
sequences of non-digits cannot be broken, and failure happens quickly.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1463
<a name="backreferences"></a></P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1464
<br><a name="SEC17" href="#TOC1">BACK REFERENCES</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1465
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1466
Outside a character class, a backslash followed by a digit greater than 0 (and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1467
possibly further digits) is a back reference to a capturing subpattern earlier
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1468
(that is, to its left) in the pattern, provided there have been that many
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1469
previous capturing left parentheses.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1470
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1471
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1472
However, if the decimal number following the backslash is less than 10, it is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1473
always taken as a back reference, and causes an error only if there are not
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1474
that many capturing left parentheses in the entire pattern. In other words, the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1475
parentheses that are referenced need not be to the left of the reference for
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1476
numbers less than 10. A "forward back reference" of this type can make sense
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1477
when a repetition is involved and the subpattern to the right has participated
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1478
in an earlier iteration.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1479
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1480
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1481
It is not possible to have a numerical "forward back reference" to a subpattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1482
whose number is 10 or more using this syntax because a sequence such as \50 is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1483
interpreted as a character defined in octal. See the subsection entitled
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1484
"Non-printing characters"
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1485
<a href="#digitsafterbackslash">above</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1486
for further details of the handling of digits following a backslash. There is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1487
no such problem when named parentheses are used. A back reference to any
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1488
subpattern is possible using named parentheses (see below).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1489
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1490
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1491
Another way of avoiding the ambiguity inherent in the use of digits following a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1492
backslash is to use the \g escape sequence, which is a feature introduced in
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1493
Perl 5.10. This escape must be followed by an unsigned number or a negative
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1494
number, optionally enclosed in braces. These examples are all identical:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1495
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1496
  (ring), \1
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1497
  (ring), \g1
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1498
  (ring), \g{1}
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1499
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1500
An unsigned number specifies an absolute reference without the ambiguity that
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1501
is present in the older syntax. It is also useful when literal digits follow
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1502
the reference. A negative number is a relative reference. Consider this
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1503
example:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1504
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1505
  (abc(def)ghi)\g{-1}
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1506
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1507
The sequence \g{-1} is a reference to the most recently started capturing
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1508
subpattern before \g, that is, is it equivalent to \2. Similarly, \g{-2}
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1509
would be equivalent to \1. The use of relative references can be helpful in
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1510
long patterns, and also in patterns that are created by joining together
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1511
fragments that contain references within themselves.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1512
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1513
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1514
A back reference matches whatever actually matched the capturing subpattern in
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1515
the current subject string, rather than anything matching the subpattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1516
itself (see
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1517
<a href="#subpatternsassubroutines">"Subpatterns as subroutines"</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1518
below for a way of doing that). So the pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1519
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1520
  (sens|respons)e and \1ibility
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1521
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1522
matches "sense and sensibility" and "response and responsibility", but not
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1523
"sense and responsibility". If caseful matching is in force at the time of the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1524
back reference, the case of letters is relevant. For example,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1525
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1526
  ((?i)rah)\s+\1
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1527
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1528
matches "rah rah" and "RAH RAH", but not "RAH rah", even though the original
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1529
capturing subpattern is matched caselessly.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1530
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1531
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1532
There are several different ways of writing back references to named
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1533
subpatterns. The .NET syntax \k{name} and the Perl syntax \k&#60;name&#62; or
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1534
\k'name' are supported, as is the Python syntax (?P=name). Perl 5.10's unified
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1535
back reference syntax, in which \g can be used for both numeric and named
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1536
references, is also supported. We could rewrite the above example in any of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1537
the following ways:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1538
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1539
  (?&#60;p1&#62;(?i)rah)\s+\k&#60;p1&#62;
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1540
  (?'p1'(?i)rah)\s+\k{p1}
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1541
  (?P&#60;p1&#62;(?i)rah)\s+(?P=p1)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1542
  (?&#60;p1&#62;(?i)rah)\s+\g{p1}
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1543
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1544
A subpattern that is referenced by name may appear in the pattern before or
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1545
after the reference.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1546
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1547
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1548
There may be more than one back reference to the same subpattern. If a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1549
subpattern has not actually been used in a particular match, any back
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1550
references to it always fail. For example, the pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1551
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1552
  (a|(bc))\2
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1553
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1554
always fails if it starts to match "a" rather than "bc". Because there may be
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1555
many capturing parentheses in a pattern, all digits following the backslash are
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1556
taken as part of a potential back reference number. If the pattern continues
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1557
with a digit character, some delimiter must be used to terminate the back
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1558
reference. If the PCRE_EXTENDED option is set, this can be whitespace.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1559
Otherwise an empty comment (see
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1560
<a href="#comments">"Comments"</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1561
below) can be used.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1562
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1563
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1564
A back reference that occurs inside the parentheses to which it refers fails
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1565
when the subpattern is first used, so, for example, (a\1) never matches.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1566
However, such references can be useful inside repeated subpatterns. For
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1567
example, the pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1568
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1569
  (a|b\1)+
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1570
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1571
matches any number of "a"s and also "aba", "ababbaa" etc. At each iteration of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1572
the subpattern, the back reference matches the character string corresponding
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1573
to the previous iteration. In order for this to work, the pattern must be such
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1574
that the first iteration does not need to match the back reference. This can be
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1575
done using alternation, as in the example above, or by a quantifier with a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1576
minimum of zero.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1577
<a name="bigassertions"></a></P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1578
<br><a name="SEC18" href="#TOC1">ASSERTIONS</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1579
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1580
An assertion is a test on the characters following or preceding the current
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1581
matching point that does not actually consume any characters. The simple
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1582
assertions coded as \b, \B, \A, \G, \Z, \z, ^ and $ are described
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1583
<a href="#smallassertions">above.</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1584
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1585
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1586
More complicated assertions are coded as subpatterns. There are two kinds:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1587
those that look ahead of the current position in the subject string, and those
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1588
that look behind it. An assertion subpattern is matched in the normal way,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1589
except that it does not cause the current matching position to be changed.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1590
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1591
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1592
Assertion subpatterns are not capturing subpatterns, and may not be repeated,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1593
because it makes no sense to assert the same thing several times. If any kind
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1594
of assertion contains capturing subpatterns within it, these are counted for
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1595
the purposes of numbering the capturing subpatterns in the whole pattern.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1596
However, substring capturing is carried out only for positive assertions,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1597
because it does not make sense for negative assertions.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1598
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1599
<br><b>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1600
Lookahead assertions
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1601
</b><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1602
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1603
Lookahead assertions start with (?= for positive assertions and (?! for
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1604
negative assertions. For example,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1605
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1606
  \w+(?=;)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1607
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1608
matches a word followed by a semicolon, but does not include the semicolon in
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1609
the match, and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1610
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1611
  foo(?!bar)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1612
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1613
matches any occurrence of "foo" that is not followed by "bar". Note that the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1614
apparently similar pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1615
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1616
  (?!foo)bar
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1617
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1618
does not find an occurrence of "bar" that is preceded by something other than
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1619
"foo"; it finds any occurrence of "bar" whatsoever, because the assertion
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1620
(?!foo) is always true when the next three characters are "bar". A
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1621
lookbehind assertion is needed to achieve the other effect.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1622
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1623
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1624
If you want to force a matching failure at some point in a pattern, the most
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1625
convenient way to do it is with (?!) because an empty string always matches, so
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1626
an assertion that requires there not to be an empty string must always fail.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1627
<a name="lookbehind"></a></P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1628
<br><b>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1629
Lookbehind assertions
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1630
</b><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1631
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1632
Lookbehind assertions start with (?&#60;= for positive assertions and (?&#60;! for
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1633
negative assertions. For example,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1634
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1635
  (?&#60;!foo)bar
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1636
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1637
does find an occurrence of "bar" that is not preceded by "foo". The contents of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1638
a lookbehind assertion are restricted such that all the strings it matches must
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1639
have a fixed length. However, if there are several top-level alternatives, they
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1640
do not all have to have the same fixed length. Thus
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1641
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1642
  (?&#60;=bullock|donkey)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1643
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1644
is permitted, but
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1645
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1646
  (?&#60;!dogs?|cats?)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1647
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1648
causes an error at compile time. Branches that match different length strings
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1649
are permitted only at the top level of a lookbehind assertion. This is an
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1650
extension compared with Perl (at least for 5.8), which requires all branches to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1651
match the same length of string. An assertion such as
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1652
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1653
  (?&#60;=ab(c|de))
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1654
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1655
is not permitted, because its single top-level branch can match two different
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1656
lengths, but it is acceptable if rewritten to use two top-level branches:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1657
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1658
  (?&#60;=abc|abde)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1659
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1660
In some cases, the Perl 5.10 escape sequence \K
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1661
<a href="#resetmatchstart">(see above)</a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1662
can be used instead of a lookbehind assertion; this is not restricted to a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1663
fixed-length.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1664
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1665
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1666
The implementation of lookbehind assertions is, for each alternative, to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1667
temporarily move the current position back by the fixed length and then try to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1668
match. If there are insufficient characters before the current position, the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1669
assertion fails.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1670
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1671
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1672
PCRE does not allow the \C escape (which matches a single byte in UTF-8 mode)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1673
to appear in lookbehind assertions, because it makes it impossible to calculate
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1674
the length of the lookbehind. The \X and \R escapes, which can match
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1675
different numbers of bytes, are also not permitted.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1676
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1677
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1678
Possessive quantifiers can be used in conjunction with lookbehind assertions to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1679
specify efficient matching at the end of the subject string. Consider a simple
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1680
pattern such as
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1681
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1682
  abcd$
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1683
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1684
when applied to a long string that does not match. Because matching proceeds
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1685
from left to right, PCRE will look for each "a" in the subject and then see if
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1686
what follows matches the rest of the pattern. If the pattern is specified as
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1687
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1688
  ^.*abcd$
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1689
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1690
the initial .* matches the entire string at first, but when this fails (because
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1691
there is no following "a"), it backtracks to match all but the last character,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1692
then all but the last two characters, and so on. Once again the search for "a"
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1693
covers the entire string, from right to left, so we are no better off. However,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1694
if the pattern is written as
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1695
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1696
  ^.*+(?&#60;=abcd)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1697
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1698
there can be no backtracking for the .*+ item; it can match only the entire
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1699
string. The subsequent lookbehind assertion does a single test on the last four
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1700
characters. If it fails, the match fails immediately. For long strings, this
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1701
approach makes a significant difference to the processing time.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1702
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1703
<br><b>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1704
Using multiple assertions
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1705
</b><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1706
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1707
Several assertions (of any sort) may occur in succession. For example,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1708
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1709
  (?&#60;=\d{3})(?&#60;!999)foo
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1710
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1711
matches "foo" preceded by three digits that are not "999". Notice that each of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1712
the assertions is applied independently at the same point in the subject
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1713
string. First there is a check that the previous three characters are all
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1714
digits, and then there is a check that the same three characters are not "999".
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1715
This pattern does <i>not</i> match "foo" preceded by six characters, the first
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1716
of which are digits and the last three of which are not "999". For example, it
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1717
doesn't match "123abcfoo". A pattern to do that is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1718
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1719
  (?&#60;=\d{3}...)(?&#60;!999)foo
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1720
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1721
This time the first assertion looks at the preceding six characters, checking
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1722
that the first three are digits, and then the second assertion checks that the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1723
preceding three characters are not "999".
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1724
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1725
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1726
Assertions can be nested in any combination. For example,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1727
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1728
  (?&#60;=(?&#60;!foo)bar)baz
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1729
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1730
matches an occurrence of "baz" that is preceded by "bar" which in turn is not
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1731
preceded by "foo", while
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1732
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1733
  (?&#60;=\d{3}(?!999)...)foo
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1734
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1735
is another pattern that matches "foo" preceded by three digits and any three
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1736
characters that are not "999".
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1737
<a name="conditions"></a></P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1738
<br><a name="SEC19" href="#TOC1">CONDITIONAL SUBPATTERNS</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1739
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1740
It is possible to cause the matching process to obey a subpattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1741
conditionally or to choose between two alternative subpatterns, depending on
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1742
the result of an assertion, or whether a previous capturing subpattern matched
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1743
or not. The two possible forms of conditional subpattern are
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1744
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1745
  (?(condition)yes-pattern)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1746
  (?(condition)yes-pattern|no-pattern)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1747
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1748
If the condition is satisfied, the yes-pattern is used; otherwise the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1749
no-pattern (if present) is used. If there are more than two alternatives in the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1750
subpattern, a compile-time error occurs.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1751
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1752
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1753
There are four kinds of condition: references to subpatterns, references to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1754
recursion, a pseudo-condition called DEFINE, and assertions.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1755
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1756
<br><b>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1757
Checking for a used subpattern by number
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1758
</b><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1759
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1760
If the text between the parentheses consists of a sequence of digits, the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1761
condition is true if the capturing subpattern of that number has previously
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1762
matched. An alternative notation is to precede the digits with a plus or minus
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1763
sign. In this case, the subpattern number is relative rather than absolute.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1764
The most recently opened parentheses can be referenced by (?(-1), the next most
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1765
recent by (?(-2), and so on. In looping constructs it can also make sense to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1766
refer to subsequent groups with constructs such as (?(+2).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1767
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1768
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1769
Consider the following pattern, which contains non-significant white space to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1770
make it more readable (assume the PCRE_EXTENDED option) and to divide it into
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1771
three parts for ease of discussion:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1772
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1773
  ( \( )?    [^()]+    (?(1) \) )
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1774
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1775
The first part matches an optional opening parenthesis, and if that
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1776
character is present, sets it as the first captured substring. The second part
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1777
matches one or more characters that are not parentheses. The third part is a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1778
conditional subpattern that tests whether the first set of parentheses matched
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1779
or not. If they did, that is, if subject started with an opening parenthesis,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1780
the condition is true, and so the yes-pattern is executed and a closing
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1781
parenthesis is required. Otherwise, since no-pattern is not present, the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1782
subpattern matches nothing. In other words, this pattern matches a sequence of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1783
non-parentheses, optionally enclosed in parentheses.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1784
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1785
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1786
If you were embedding this pattern in a larger one, you could use a relative
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1787
reference:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1788
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1789
  ...other stuff... ( \( )?    [^()]+    (?(-1) \) ) ...
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1790
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1791
This makes the fragment independent of the parentheses in the larger pattern.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1792
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1793
<br><b>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1794
Checking for a used subpattern by name
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1795
</b><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1796
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1797
Perl uses the syntax (?(&#60;name&#62;)...) or (?('name')...) to test for a used
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1798
subpattern by name. For compatibility with earlier versions of PCRE, which had
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1799
this facility before Perl, the syntax (?(name)...) is also recognized. However,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1800
there is a possible ambiguity with this syntax, because subpattern names may
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1801
consist entirely of digits. PCRE looks first for a named subpattern; if it
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1802
cannot find one and the name consists entirely of digits, PCRE looks for a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1803
subpattern of that number, which must be greater than zero. Using subpattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1804
names that consist entirely of digits is not recommended.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1805
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1806
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1807
Rewriting the above example to use a named subpattern gives this:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1808
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1809
  (?&#60;OPEN&#62; \( )?    [^()]+    (?(&#60;OPEN&#62;) \) )
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1810
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1811
</PRE>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1812
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1813
<br><b>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1814
Checking for pattern recursion
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1815
</b><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1816
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1817
If the condition is the string (R), and there is no subpattern with the name R,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1818
the condition is true if a recursive call to the whole pattern or any
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1819
subpattern has been made. If digits or a name preceded by ampersand follow the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1820
letter R, for example:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1821
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1822
  (?(R3)...) or (?(R&name)...)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1823
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1824
the condition is true if the most recent recursion is into the subpattern whose
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1825
number or name is given. This condition does not check the entire recursion
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1826
stack.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1827
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1828
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1829
At "top level", all these recursion test conditions are false. Recursive
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1830
patterns are described below.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1831
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1832
<br><b>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1833
Defining subpatterns for use by reference only
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1834
</b><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1835
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1836
If the condition is the string (DEFINE), and there is no subpattern with the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1837
name DEFINE, the condition is always false. In this case, there may be only one
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1838
alternative in the subpattern. It is always skipped if control reaches this
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1839
point in the pattern; the idea of DEFINE is that it can be used to define
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1840
"subroutines" that can be referenced from elsewhere. (The use of "subroutines"
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1841
is described below.) For example, a pattern to match an IPv4 address could be
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1842
written like this (ignore whitespace and line breaks):
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1843
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1844
  (?(DEFINE) (?&#60;byte&#62; 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1845
  \b (?&byte) (\.(?&byte)){3} \b
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1846
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1847
The first part of the pattern is a DEFINE group inside which a another group
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1848
named "byte" is defined. This matches an individual component of an IPv4
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1849
address (a number less than 256). When matching takes place, this part of the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1850
pattern is skipped because DEFINE acts like a false condition.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1851
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1852
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1853
The rest of the pattern uses references to the named group to match the four
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1854
dot-separated components of an IPv4 address, insisting on a word boundary at
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1855
each end.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1856
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1857
<br><b>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1858
Assertion conditions
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1859
</b><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1860
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1861
If the condition is not in any of the above formats, it must be an assertion.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1862
This may be a positive or negative lookahead or lookbehind assertion. Consider
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1863
this pattern, again containing non-significant white space, and with the two
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1864
alternatives on the second line:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1865
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1866
  (?(?=[^a-z]*[a-z])
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1867
  \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1868
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1869
The condition is a positive lookahead assertion that matches an optional
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1870
sequence of non-letters followed by a letter. In other words, it tests for the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1871
presence of at least one letter in the subject. If a letter is found, the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1872
subject is matched against the first alternative; otherwise it is matched
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1873
against the second. This pattern matches strings in one of the two forms
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1874
dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1875
<a name="comments"></a></P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1876
<br><a name="SEC20" href="#TOC1">COMMENTS</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1877
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1878
The sequence (?# marks the start of a comment that continues up to the next
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1879
closing parenthesis. Nested parentheses are not permitted. The characters
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1880
that make up a comment play no part in the pattern matching at all.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1881
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1882
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1883
If the PCRE_EXTENDED option is set, an unescaped # character outside a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1884
character class introduces a comment that continues to immediately after the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1885
next newline in the pattern.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1886
<a name="recursion"></a></P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1887
<br><a name="SEC21" href="#TOC1">RECURSIVE PATTERNS</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1888
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1889
Consider the problem of matching a string in parentheses, allowing for
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1890
unlimited nested parentheses. Without the use of recursion, the best that can
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1891
be done is to use a pattern that matches up to some fixed depth of nesting. It
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1892
is not possible to handle an arbitrary nesting depth.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1893
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1894
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1895
For some time, Perl has provided a facility that allows regular expressions to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1896
recurse (amongst other things). It does this by interpolating Perl code in the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1897
expression at run time, and the code can refer to the expression itself. A Perl
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1898
pattern using code interpolation to solve the parentheses problem can be
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1899
created like this:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1900
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1901
  $re = qr{\( (?: (?&#62;[^()]+) | (?p{$re}) )* \)}x;
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1902
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1903
The (?p{...}) item interpolates Perl code at run time, and in this case refers
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1904
recursively to the pattern in which it appears.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1905
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1906
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1907
Obviously, PCRE cannot support the interpolation of Perl code. Instead, it
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1908
supports special syntax for recursion of the entire pattern, and also for
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1909
individual subpattern recursion. After its introduction in PCRE and Python,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1910
this kind of recursion was introduced into Perl at release 5.10.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1911
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1912
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1913
A special item that consists of (? followed by a number greater than zero and a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1914
closing parenthesis is a recursive call of the subpattern of the given number,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1915
provided that it occurs inside that subpattern. (If not, it is a "subroutine"
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1916
call, which is described in the next section.) The special item (?R) or (?0) is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1917
a recursive call of the entire regular expression.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1918
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1919
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1920
In PCRE (like Python, but unlike Perl), a recursive subpattern call is always
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1921
treated as an atomic group. That is, once it has matched some of the subject
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1922
string, it is never re-entered, even if it contains untried alternatives and
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1923
there is a subsequent matching failure.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1924
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1925
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1926
This PCRE pattern solves the nested parentheses problem (assume the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1927
PCRE_EXTENDED option is set so that white space is ignored):
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1928
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1929
  \( ( (?&#62;[^()]+) | (?R) )* \)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1930
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1931
First it matches an opening parenthesis. Then it matches any number of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1932
substrings which can either be a sequence of non-parentheses, or a recursive
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1933
match of the pattern itself (that is, a correctly parenthesized substring).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1934
Finally there is a closing parenthesis.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1935
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1936
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1937
If this were part of a larger pattern, you would not want to recurse the entire
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1938
pattern, so instead you could use this:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1939
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1940
  ( \( ( (?&#62;[^()]+) | (?1) )* \) )
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1941
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1942
We have put the pattern into parentheses, and caused the recursion to refer to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1943
them instead of the whole pattern.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1944
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1945
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1946
In a larger pattern, keeping track of parenthesis numbers can be tricky. This
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1947
is made easier by the use of relative references. (A Perl 5.10 feature.)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1948
Instead of (?1) in the pattern above you can write (?-2) to refer to the second
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1949
most recently opened parentheses preceding the recursion. In other words, a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1950
negative number counts capturing parentheses leftwards from the point at which
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1951
it is encountered.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1952
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1953
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1954
It is also possible to refer to subsequently opened parentheses, by writing
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1955
references such as (?+2). However, these cannot be recursive because the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1956
reference is not inside the parentheses that are referenced. They are always
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1957
"subroutine" calls, as described in the next section.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1958
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1959
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1960
An alternative approach is to use named parentheses instead. The Perl syntax
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1961
for this is (?&name); PCRE's earlier syntax (?P&#62;name) is also supported. We
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1962
could rewrite the above example as follows:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1963
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1964
  (?&#60;pn&#62; \( ( (?&#62;[^()]+) | (?&pn) )* \) )
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1965
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1966
If there is more than one subpattern with the same name, the earliest one is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1967
used.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1968
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1969
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1970
This particular example pattern that we have been looking at contains nested
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1971
unlimited repeats, and so the use of atomic grouping for matching strings of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1972
non-parentheses is important when applying the pattern to strings that do not
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1973
match. For example, when this pattern is applied to
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1974
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1975
  (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1976
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1977
it yields "no match" quickly. However, if atomic grouping is not used,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1978
the match runs for a very long time indeed because there are so many different
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1979
ways the + and * repeats can carve up the subject, and all have to be tested
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1980
before failure can be reported.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1981
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1982
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1983
At the end of a match, the values set for any capturing subpatterns are those
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1984
from the outermost level of the recursion at which the subpattern value is set.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1985
If you want to obtain intermediate values, a callout function can be used (see
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1986
below and the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1987
<a href="pcrecallout.html"><b>pcrecallout</b></a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1988
documentation). If the pattern above is matched against
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1989
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1990
  (ab(cd)ef)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1991
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1992
the value for the capturing parentheses is "ef", which is the last value taken
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1993
on at the top level. If additional parentheses are added, giving
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1994
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1995
  \( ( ( (?&#62;[^()]+) | (?R) )* ) \)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1996
     ^                        ^
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1997
     ^                        ^
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1998
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  1999
the string they capture is "ab(cd)ef", the contents of the top level
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2000
parentheses. If there are more than 15 capturing parentheses in a pattern, PCRE
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2001
has to obtain extra memory to store data during a recursion, which it does by
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2002
using <b>pcre_malloc</b>, freeing it via <b>pcre_free</b> afterwards. If no
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2003
memory can be obtained, the match fails with the PCRE_ERROR_NOMEMORY error.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2004
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2005
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2006
Do not confuse the (?R) item with the condition (R), which tests for recursion.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2007
Consider this pattern, which matches text in angle brackets, allowing for
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2008
arbitrary nesting. Only digits are allowed in nested brackets (that is, when
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2009
recursing), whereas any characters are permitted at the outer level.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2010
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2011
  &#60; (?: (?(R) \d++  | [^&#60;&#62;]*+) | (?R)) * &#62;
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2012
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2013
In this pattern, (?(R) is the start of a conditional subpattern, with two
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2014
different alternatives for the recursive and non-recursive cases. The (?R) item
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2015
is the actual recursive call.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2016
<a name="subpatternsassubroutines"></a></P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2017
<br><a name="SEC22" href="#TOC1">SUBPATTERNS AS SUBROUTINES</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2018
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2019
If the syntax for a recursive subpattern reference (either by number or by
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2020
name) is used outside the parentheses to which it refers, it operates like a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2021
subroutine in a programming language. The "called" subpattern may be defined
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2022
before or after the reference. A numbered reference can be absolute or
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2023
relative, as in these examples:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2024
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2025
  (...(absolute)...)...(?2)...
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2026
  (...(relative)...)...(?-1)...
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2027
  (...(?+1)...(relative)...
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2028
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2029
An earlier example pointed out that the pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2030
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2031
  (sens|respons)e and \1ibility
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2032
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2033
matches "sense and sensibility" and "response and responsibility", but not
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2034
"sense and responsibility". If instead the pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2035
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2036
  (sens|respons)e and (?1)ibility
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2037
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2038
is used, it does match "sense and responsibility" as well as the other two
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2039
strings. Another example is given in the discussion of DEFINE above.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2040
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2041
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2042
Like recursive subpatterns, a "subroutine" call is always treated as an atomic
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2043
group. That is, once it has matched some of the subject string, it is never
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2044
re-entered, even if it contains untried alternatives and there is a subsequent
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2045
matching failure.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2046
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2047
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2048
When a subpattern is used as a subroutine, processing options such as
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2049
case-independence are fixed when the subpattern is defined. They cannot be
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2050
changed for different calls. For example, consider this pattern:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2051
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2052
  (abc)(?i:(?-1))
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2053
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2054
It matches "abcabc". It does not match "abcABC" because the change of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2055
processing option does not affect the called subpattern.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2056
<a name="onigurumasubroutines"></a></P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2057
<br><a name="SEC23" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2058
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2059
For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2060
a number enclosed either in angle brackets or single quotes, is an alternative
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2061
syntax for referencing a subpattern as a subroutine, possibly recursively. Here
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2062
are two of the examples used above, rewritten using this syntax:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2063
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2064
  (?&#60;pn&#62; \( ( (?&#62;[^()]+) | \g&#60;pn&#62; )* \) )
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2065
  (sens|respons)e and \g'1'ibility
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2066
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2067
PCRE supports an extension to Oniguruma: if a number is preceded by a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2068
plus or a minus sign it is taken as a relative reference. For example:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2069
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2070
  (abc)(?i:\g&#60;-1&#62;)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2071
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2072
Note that \g{...} (Perl syntax) and \g&#60;...&#62; (Oniguruma syntax) are <i>not</i>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2073
synonymous. The former is a back reference; the latter is a subroutine call.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2074
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2075
<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2076
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2077
Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2078
code to be obeyed in the middle of matching a regular expression. This makes it
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2079
possible, amongst other things, to extract different substrings that match the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2080
same pair of parentheses when there is a repetition.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2081
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2082
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2083
PCRE provides a similar feature, but of course it cannot obey arbitrary Perl
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2084
code. The feature is called "callout". The caller of PCRE provides an external
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2085
function by putting its entry point in the global variable <i>pcre_callout</i>.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2086
By default, this variable contains NULL, which disables all calling out.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2087
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2088
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2089
Within a regular expression, (?C) indicates the points at which the external
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2090
function is to be called. If you want to identify different callout points, you
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2091
can put a number less than 256 after the letter C. The default value is zero.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2092
For example, this pattern has two callout points:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2093
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2094
  (?C1)abc(?C2)def
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2095
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2096
If the PCRE_AUTO_CALLOUT flag is passed to <b>pcre_compile()</b>, callouts are
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2097
automatically installed before each item in the pattern. They are all numbered
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2098
255.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2099
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2100
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2101
During matching, when PCRE reaches a callout point (and <i>pcre_callout</i> is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2102
set), the external function is called. It is provided with the number of the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2103
callout, the position in the pattern, and, optionally, one item of data
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2104
originally supplied by the caller of <b>pcre_exec()</b>. The callout function
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2105
may cause matching to proceed, to backtrack, or to fail altogether. A complete
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2106
description of the interface to the callout function is given in the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2107
<a href="pcrecallout.html"><b>pcrecallout</b></a>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2108
documentation.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2109
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2110
<br><a name="SEC25" href="#TOC1">BACKTRACKING CONTROL</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2111
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2112
Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2113
are described in the Perl documentation as "experimental and subject to change
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2114
or removal in a future version of Perl". It goes on to say: "Their usage in
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2115
production code should be noted to avoid problems during upgrades." The same
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2116
remarks apply to the PCRE features described in this section.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2117
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2118
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2119
Since these verbs are specifically related to backtracking, most of them can be
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2120
used only when the pattern is to be matched using <b>pcre_exec()</b>, which uses
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2121
a backtracking algorithm. With the exception of (*FAIL), which behaves like a
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2122
failing negative assertion, they cause an error if encountered by
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2123
<b>pcre_dfa_exec()</b>.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2124
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2125
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2126
The new verbs make use of what was previously invalid syntax: an opening
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2127
parenthesis followed by an asterisk. In Perl, they are generally of the form
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2128
(*VERB:ARG) but PCRE does not support the use of arguments, so its general
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2129
form is just (*VERB). Any number of these verbs may occur in a pattern. There
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2130
are two kinds:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2131
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2132
<br><b>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2133
Verbs that act immediately
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2134
</b><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2135
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2136
The following verbs act as soon as they are encountered:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2137
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2138
   (*ACCEPT)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2139
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2140
This verb causes the match to end successfully, skipping the remainder of the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2141
pattern. When inside a recursion, only the innermost pattern is ended
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2142
immediately. PCRE differs from Perl in what happens if the (*ACCEPT) is inside
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2143
capturing parentheses. In Perl, the data so far is captured: in PCRE no data is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2144
captured. For example:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2145
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2146
  A(A|B(*ACCEPT)|C)D
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2147
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2148
This matches "AB", "AAD", or "ACD", but when it matches "AB", no data is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2149
captured.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2150
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2151
  (*FAIL) or (*F)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2152
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2153
This verb causes the match to fail, forcing backtracking to occur. It is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2154
equivalent to (?!) but easier to read. The Perl documentation notes that it is
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2155
probably useful only when combined with (?{}) or (??{}). Those are, of course,
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2156
Perl features that are not present in PCRE. The nearest equivalent is the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2157
callout feature, as for example in this pattern:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2158
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2159
  a+(?C)(*FAIL)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2160
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2161
A match with the string "aaaa" always fails, but the callout is taken before
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2162
each backtrack happens (in this example, 10 times).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2163
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2164
<br><b>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2165
Verbs that act after backtracking
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2166
</b><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2167
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2168
The following verbs do nothing when they are encountered. Matching continues
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2169
with what follows, but if there is no subsequent match, a failure is forced.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2170
The verbs differ in exactly what kind of failure occurs.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2171
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2172
  (*COMMIT)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2173
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2174
This verb causes the whole match to fail outright if the rest of the pattern
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2175
does not match. Even if the pattern is unanchored, no further attempts to find
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2176
a match by advancing the start point take place. Once (*COMMIT) has been
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2177
passed, <b>pcre_exec()</b> is committed to finding a match at the current
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2178
starting point, or not at all. For example:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2179
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2180
  a+(*COMMIT)b
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2181
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2182
This matches "xxaab" but not "aacaab". It can be thought of as a kind of
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2183
dynamic anchor, or "I've started, so I must finish."
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2184
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2185
  (*PRUNE)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2186
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2187
This verb causes the match to fail at the current position if the rest of the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2188
pattern does not match. If the pattern is unanchored, the normal "bumpalong"
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2189
advance to the next starting character then happens. Backtracking can occur as
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2190
usual to the left of (*PRUNE), or when matching to the right of (*PRUNE), but
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2191
if there is no match to the right, backtracking cannot cross (*PRUNE).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2192
In simple cases, the use of (*PRUNE) is just an alternative to an atomic
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2193
group or possessive quantifier, but there are some uses of (*PRUNE) that cannot
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2194
be expressed in any other way.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2195
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2196
  (*SKIP)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2197
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2198
This verb is like (*PRUNE), except that if the pattern is unanchored, the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2199
"bumpalong" advance is not to the next character, but to the position in the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2200
subject where (*SKIP) was encountered. (*SKIP) signifies that whatever text
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2201
was matched leading up to it cannot be part of a successful match. Consider:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2202
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2203
  a+(*SKIP)b
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2204
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2205
If the subject is "aaaac...", after the first match attempt fails (starting at
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2206
the first character in the string), the starting point skips on to start the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2207
next attempt at "c". Note that a possessive quantifer does not have the same
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2208
effect in this example; although it would suppress backtracking during the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2209
first match attempt, the second attempt would start at the second character
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2210
instead of skipping on to "c".
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2211
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2212
  (*THEN)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2213
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2214
This verb causes a skip to the next alternation if the rest of the pattern does
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2215
not match. That is, it cancels pending backtracking, but only within the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2216
current alternation. Its name comes from the observation that it can be used
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2217
for a pattern-based if-then-else block:
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2218
<pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2219
  ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2220
</pre>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2221
If the COND1 pattern matches, FOO is tried (and possibly further items after
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2222
the end of the group if FOO succeeds); on failure the matcher skips to the
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2223
second alternative and tries COND2, without backtracking into COND1. If (*THEN)
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2224
is used outside of any alternation, it acts exactly like (*PRUNE).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2225
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2226
<br><a name="SEC26" href="#TOC1">SEE ALSO</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2227
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2228
<b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3), <b>pcre</b>(3).
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2229
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2230
<br><a name="SEC27" href="#TOC1">AUTHOR</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2231
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2232
Philip Hazel
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2233
<br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2234
University Computing Service
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2235
<br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2236
Cambridge CB2 3QH, England.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2237
<br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2238
</P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2239
<br><a name="SEC28" href="#TOC1">REVISION</a><br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2240
<P>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2241
Last updated: 19 April 2008
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2242
<br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2243
Copyright &copy; 1997-2008 University of Cambridge.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2244
<br>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2245
<p>
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2246
Return to the <a href="index.html">PCRE index page</a>.
7f656887cf89 First submission to Symbian Foundation staging server.
Tom Sutcliffe <thomas.sutcliffe@accenture.com>
parents:
diff changeset
  2247
</p>