MCL/sf/os/fshell: comparison libraries/spcre/libpcre/pcre/doc/pcre.txt


--1:000000000000
+:7f656887cf89
+-----------------------------------------------------------------------------
+This file contains a concatenation of the PCRE man pages, converted to plain
+text format for ease of searching with a text editor, or for use on systems
+that do not have a man page processor. The small individual files that give
+synopses of each function in the library have not been included. There are
+separate text files for the pcregrep and pcretest commands.
+-----------------------------------------------------------------------------
+PCRE(3)                                                                PCRE(3)
+NAME
+PCRE - Perl-compatible regular expressions
+INTRODUCTION
+The  PCRE  library is a set of functions that implement regular expres-
+sion pattern matching using the same syntax and semantics as Perl, with
+just  a  few  differences. Certain features that appeared in Python and
+PCRE before they appeared in Perl are also available using  the  Python
+syntax.  There is also some support for certain .NET and Oniguruma syn-
+tax items, and there is an option for  requesting  some  minor  changes
+that give better JavaScript compatibility.
+The  current  implementation of PCRE (release 7.x) corresponds approxi-
+mately with Perl 5.10, including support for UTF-8 encoded strings  and
+Unicode general category properties. However, UTF-8 and Unicode support
+has to be explicitly enabled; it is not the default. The Unicode tables
+correspond to Unicode release 5.0.0.
+In  addition to the Perl-compatible matching function, PCRE contains an
+alternative matching function that matches the same  compiled  patterns
+in  a different way. In certain circumstances, the alternative function
+has some advantages. For a discussion of the two  matching  algorithms,
+see the pcrematching page.
+PCRE  is  written  in C and released as a C library. A number of people
+have written wrappers and interfaces of various kinds.  In  particular,
+Google  Inc.   have  provided  a comprehensive C++ wrapper. This is now
+included as part of the PCRE distribution. The pcrecpp page has details
+of  this  interface.  Other  people's contributions can be found in the
+Contrib directory at the primary FTP site, which is:
+ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre
+Details of exactly which Perl regular expression features are  and  are
+not supported by PCRE are given in separate documents. See the pcrepat-
+tern and pcrecompat pages. There is a syntax summary in the  pcresyntax
+page.
+Some  features  of  PCRE can be included, excluded, or changed when the
+library is built. The pcre_config() function makes it  possible  for  a
+client  to  discover  which  features are available. The features them-
+selves are described in the pcrebuild page. Documentation about  build-
+ing  PCRE for various operating systems can be found in the README file
+in the source distribution.
+The library contains a number of undocumented  internal  functions  and
+data  tables  that  are  used by more than one of the exported external
+functions, but which are not intended  for  use  by  external  callers.
+Their  names  all begin with "_pcre_", which hopefully will not provoke
+any name clashes. In some environments, it is possible to control which
+external  symbols  are  exported when a shared library is built, and in
+these cases the undocumented symbols are not exported.
+USER DOCUMENTATION
+The user documentation for PCRE comprises a number  of  different  sec-
+tions.  In the "man" format, each of these is a separate "man page". In
+the HTML format, each is a separate page, linked from the  index  page.
+In  the  plain text format, all the sections are concatenated, for ease
+of searching. The sections are as follows:
+pcre              this document
+pcre-config       show PCRE installation configuration information
+pcreapi           details of PCRE's native C API
+pcrebuild         options for building PCRE
+pcrecallout       details of the callout feature
+pcrecompat        discussion of Perl compatibility
+pcrecpp           details of the C++ wrapper
+pcregrep          description of the pcregrep command
+pcrematching      discussion of the two matching algorithms
+pcrepartial       details of the partial matching facility
+pcrepattern       syntax and semantics of supported regular expressions
+pcresyntax        quick syntax reference
+pcreperform       discussion of performance issues
+pcreposix         the POSIX-compatible C API
+pcreprecompile    details of saving and re-using precompiled patterns
+pcresample        discussion of the sample program
+pcrestack         discussion of stack usage
+pcretest          description of the pcretest testing command
+In  addition,  in the "man" and HTML formats, there is a short page for
+each C library function, listing its arguments and results.
+LIMITATIONS
+There are some size limitations in PCRE but it is hoped that they  will
+never in practice be relevant.
+The  maximum  length of a compiled pattern is 65539 (sic) bytes if PCRE
+is compiled with the default internal linkage size of 2. If you want to
+process  regular  expressions  that are truly enormous, you can compile
+PCRE with an internal linkage size of 3 or 4 (see the  README  file  in
+the  source  distribution and the pcrebuild documentation for details).
+In these cases the limit is substantially larger.  However,  the  speed
+of execution is slower.
+All values in repeating quantifiers must be less than 65536.
+There is no limit to the number of parenthesized subpatterns, but there
+can be no more than 65535 capturing subpatterns.
+The maximum length of a name for a named subpattern is 32 characters, and
+the maximum number of named subpatterns is 10000.
+The  maximum  length of a subject string is the largest positive number
+that an integer variable can hold. However, when using the  traditional
+matching function, PCRE uses recursion to handle subpatterns and indef-
+inite repetition.  This means that the available stack space may  limit
+the size of a subject string that can be processed by certain patterns.
+For a discussion of stack issues, see the pcrestack documentation.
+UTF-8 AND UNICODE PROPERTY SUPPORT
+From release 3.3, PCRE has  had  some  support  for  character  strings
+encoded  in the UTF-8 format. For release 4.0 this was greatly extended
+to cover most common requirements, and in release 5.0  additional  sup-
+port for Unicode general category properties was added.
+In order to process UTF-8 strings, you must build PCRE to include UTF-8
+support in the code, and, in addition,  you  must  call  pcre_compile()
+with  the PCRE_UTF8 option flag. When you do this, both the pattern and
+any subject strings that are matched against it are  treated  as  UTF-8
+strings instead of just strings of bytes.
+If  you compile PCRE with UTF-8 support, but do not use it at run time,
+the library will be a bit bigger, but the additional run time  overhead
+is limited to testing the PCRE_UTF8 flag occasionally, so should not be
+very big.
+If PCRE is built with Unicode character property support (which implies
+UTF-8  support),  the  escape sequences \p{..}, \P{..}, and \X are sup-
+ported.  The available properties that can be tested are limited to the
+general  category  properties such as Lu for an upper case letter or Nd
+for a decimal number, the Unicode script names such as Arabic  or  Han,
+and  the  derived  properties  Any  and L&. A full list is given in the
+pcrepattern documentation. Only the short names for properties are sup-
+ported. For example, \p{L} matches a letter. Its Perl synonym, \p{Letter},
+is not supported. Furthermore, in Perl, many properties may
+optionally  be  prefixed by "Is", for compatibility with Perl 5.6. PCRE
+does not support this.
+Validity of UTF-8 strings
+When you set the PCRE_UTF8 flag, the strings  passed  as  patterns  and
+subjects are (by default) checked for validity on entry to the relevant
+functions. From release 7.3 of PCRE, the check is according to the rules
+of  RFC  3629, which are themselves derived from the Unicode specifica-
+tion. Earlier releases of PCRE followed the rules of  RFC  2279,  which
+allows  the  full range of 31-bit values (0 to 0x7FFFFFFF). The current
+check allows only values in the range U+0 to U+10FFFF, excluding U+D800
+to U+DFFF.
+The excluded code points are the surrogate area of Unicode (U+D800 to
+U+DBFF are high surrogates and U+DC00 to U+DFFF are low surrogates). Of
+the Low Surrogate Area, the Unicode Standard says this: "The Low Surrogate Area does not
+contain  any  character  assignments,  consequently  no  character code
+charts or namelists are provided for this area. Surrogates are reserved
+for  use  with  UTF-16 and then must be used in pairs." The code points
+that are encoded by UTF-16 pairs  are  available  as  independent  code
+points  in  the  UTF-8  encoding.  (In other words, the whole surrogate
+thing is a fudge for UTF-16 which unfortunately messes up UTF-8.)
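The RFC 3629 rules described above can be sketched as a small C validator. This is a minimal sketch for illustration, not PCRE's internal check. The function name `utf8_is_valid_rfc3629` is hypothetical. It accepts code points U+0000 to U+10FFFF and rejects surrogates, overlong encodings, truncated sequences, and the 5- and 6-byte lead bytes that only RFC 2279 allowed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if s[0..len) is valid UTF-8 under the
   RFC 3629 rules (code points U+0000 to U+10FFFF, no surrogates, no
   overlong forms, no truncated sequences). Not PCRE's internal code. */
bool utf8_is_valid_rfc3629(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;          /* number of continuation bytes expected */
        unsigned long cp, min; /* decoded value and smallest legal value */

        if (c < 0x80) { i++; continue; }                  /* ASCII byte */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return false;     /* stray continuation byte or 5/6-byte lead */

        if (len - i <= extra) return false;               /* truncated */
        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false;  /* not 10xxxxxx */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min) return false;                       /* overlong form */
        if (cp > 0x10FFFF) return false;                  /* beyond Unicode */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   /* surrogate */
        i += 1 + extra;
    }
    return true;
}
```

For example, the three bytes ED A0 80 decode to U+D800 and are rejected, while F4 8F BF BF (U+10FFFF) is the largest sequence accepted.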
+If an  invalid  UTF-8  string  is  passed  to  PCRE,  an  error  return
+(PCRE_ERROR_BADUTF8) is given. In some situations, you may already know
+that your strings are valid, and therefore want to skip these checks in
+order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag at
+compile time or at run time, PCRE assumes that the pattern  or  subject
+it  is  given  (respectively)  contains only valid UTF-8 codes. In this
+case, it does not diagnose an invalid UTF-8 string.
+If you pass an invalid UTF-8 string  when  PCRE_NO_UTF8_CHECK  is  set,
+what  happens  depends on why the string is invalid. If the string con-
+forms to the "old" definition of UTF-8 (RFC 2279), it is processed as a
+string  of  characters  in  the  range 0 to 0x7FFFFFFF. In other words,
+apart from the initial validity test, PCRE (when in UTF-8 mode) handles
+strings  according  to  the more liberal rules of RFC 2279. However, if
+the string does not even conform to RFC 2279, the result is  undefined.
+Your program may crash.
+If  you  want  to  process  strings  of  values  in the full range 0 to
+0x7FFFFFFF, encoded in a UTF-8-like manner as per the old RFC, you  can
+set PCRE_NO_UTF8_CHECK to bypass the more restrictive test. However, in
+this situation, you will have to apply your own validity check.
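The difference between the two definitions is easiest to see in a decoder that follows the older RFC 2279 rules. Those rules allow lead bytes for 5- and 6-byte sequences, so values up to 0x7FFFFFFF can be represented. The sketch below is hypothetical: `utf8_decode_rfc2279` is not a PCRE function, and for brevity it does not check for overlong forms. A program that sets PCRE_NO_UTF8_CHECK for such data would need some check like this of its own.

```c
#include <stddef.h>

/* Hypothetical helper: decodes one sequence under the liberal RFC 2279
   rules (1 to 6 bytes, values 0..0x7FFFFFFF). Returns the number of bytes
   consumed, or 0 if the bytes do not even conform to RFC 2279.
   Overlong forms are not checked in this sketch. */
size_t utf8_decode_rfc2279(const unsigned char *s, size_t len,
                           unsigned long *out)
{
    /* Payload bits kept from the lead byte, indexed by sequence length. */
    static const unsigned char lead_mask[7] =
        { 0, 0x7F, 0x1F, 0x0F, 0x07, 0x03, 0x01 };
    size_t n;
    unsigned long cp;

    if (len == 0) return 0;
    if (s[0] < 0x80) n = 1;
    else if ((s[0] & 0xE0) == 0xC0) n = 2;
    else if ((s[0] & 0xF0) == 0xE0) n = 3;
    else if ((s[0] & 0xF8) == 0xF0) n = 4;
    else if ((s[0] & 0xFC) == 0xF8) n = 5;
    else if ((s[0] & 0xFE) == 0xFC) n = 6;
    else return 0;               /* continuation byte, or 0xFE / 0xFF */

    if (n > len) return 0;       /* truncated sequence */
    cp = s[0] & lead_mask[n];
    for (size_t k = 1; k < n; k++) {
        if ((s[k] & 0xC0) != 0x80) return 0;   /* not 10xxxxxx */
        cp = (cp << 6) | (s[k] & 0x3F);
    }
    *out = cp;
    return n;
}
```

Under these rules the bytes FD BF BF BF BF BF decode to 0x7FFFFFFF, a value that the RFC 3629 check rejects.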
+General comments about UTF-8 mode
+1. An unbraced hexadecimal escape sequence (such  as  \xb3)  matches  a
+two-byte UTF-8 character if the value is greater than 127.
+2.  Octal  numbers  up to \777 are recognized, and match two-byte UTF-8
+characters for values greater than \177.
+3. Repeat quantifiers apply to complete UTF-8 characters, not to  indi-
+vidual bytes, for example: \x{100}{3}.
+4.  The dot metacharacter matches one UTF-8 character instead of a sin-
+gle byte.
+5. The escape sequence \C can be used to match a single byte  in  UTF-8
+mode,  but  its  use can lead to some strange effects. This facility is
+not available in the alternative matching function, pcre_dfa_exec().
+6. The character escapes \b, \B, \d, \D, \s, \S, \w, and  \W  correctly
+test  characters of any code value, but the characters that PCRE recog-
+nizes as digits, spaces, or word characters  remain  the  same  set  as
+before, all with values less than 256. This remains true even when PCRE
+includes Unicode property support, because to do otherwise  would  slow
+down  PCRE in many common cases. If you really want to test for a wider
+sense of, say, "digit", you must use Unicode  property  tests  such  as
+\p{Nd}.
+7.  Similarly,  characters that match the POSIX named character classes
+are all low-valued characters.
+8. However, the Perl 5.10 horizontal and vertical  whitespace  matching
+escapes (\h, \H, \v, and \V) do match all the appropriate Unicode char-
+acters.
+9. Case-insensitive matching applies only to  characters  whose  values
+are  less than 128, unless PCRE is built with Unicode property support.
+Even when Unicode property support is available, PCRE  still  uses  its
+own  character  tables when checking the case of low-valued characters,
+so as not to degrade performance.  The Unicode property information  is
+used only for characters with higher values. Even when Unicode property
+support is available, PCRE supports case-insensitive matching only when
+there  is  a  one-to-one  mapping between a letter's cases. There are a
+small number of many-to-one mappings in Unicode;  these  are  not  sup-
+ported by PCRE.
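Items 1 to 3 above follow from how UTF-8 encodes values above 127. The encoder below is a hypothetical helper, not part of PCRE. It shows why \xb3 (0xB3) becomes the two bytes C2 B3 and why \777 (0x1FF) becomes C7 BF. It also shows that \x{100} is the two bytes C4 80, so \x{100}{3} repeats a two-byte character three times (six bytes), not one byte.

```c
#include <stddef.h>

/* Hypothetical helper: writes the UTF-8 encoding of code point cp into buf
   (which must hold 4 bytes) and returns its length, or 0 if cp is a
   surrogate or above U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char buf[4])
{
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                    /* 3 bytes */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Note that octal \177 (127) is still a single byte, which is why only values above \177 become two-byte characters.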
+AUTHOR
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+Putting  an actual email address here seems to have been a spam magnet,
+so I've taken it away. If you want to email me, use  my  two  initials,
+followed by the two digits 10, at the domain cam.ac.uk.
+REVISION
+Last updated: 12 April 2008
+Copyright (c) 1997-2008 University of Cambridge.
+------------------------------------------------------------------------------
+PCREBUILD(3)                                                      PCREBUILD(3)
+NAME
+PCRE - Perl-compatible regular expressions
+PCRE BUILD-TIME OPTIONS
+This  document  describes  the  optional  features  of PCRE that can be
+selected when the library is compiled. It assumes use of the  configure
+script,  where the optional features are selected or deselected by pro-
+viding options to configure before running the make  command.  However,
+the  same  options  can be selected in both Unix-like and non-Unix-like
+environments using the GUI facility of  CMakeSetup  if  you  are  using
+CMake instead of configure to build PCRE.
+The complete list of options for configure (which includes the standard
+ones such as the  selection  of  the  installation  directory)  can  be
+obtained by running
+./configure --help
+The  following  sections  include  descriptions  of options whose names
+begin with --enable or --disable. These settings specify changes to the
+defaults  for  the configure command. Because of the way that configure
+works, --enable and --disable always come in pairs, so  the  complemen-
+tary  option always exists as well, but as it specifies the default, it
+is not described.
+C++ SUPPORT
+By default, the configure script will search for a C++ compiler and C++
+header files. If it finds them, it automatically builds the C++ wrapper
+library for PCRE. You can disable this by adding
+--disable-cpp
+to the configure command.
+UTF-8 SUPPORT
+To build PCRE with support for UTF-8 character strings, add
+--enable-utf8
+to the configure command. Of itself, this  does  not  make  PCRE  treat
+strings  as UTF-8. As well as compiling PCRE with this option, you also
+have to set the PCRE_UTF8 option when you call the pcre_compile()
+function.
+UNICODE CHARACTER PROPERTY SUPPORT
+UTF-8  support allows PCRE to process character values greater than 255
+in the strings that it handles. On its own, however, it does  not  pro-
+vide any facilities for accessing the properties of such characters. If
+you want to be able to use the pattern escapes \P, \p,  and  \X,  which
+refer to Unicode character properties, you must add
+--enable-unicode-properties
+to  the configure command. This implies UTF-8 support, even if you have
+not explicitly requested it.
+Including Unicode property support adds around 30K  of  tables  to  the
+PCRE library. The supported properties are the general category properties
+such as Lu and Nd, the Unicode script names such as Arabic or Han, and the
+derived properties Any and L&. Details are given in the pcrepattern
+documentation.
+CODE VALUE OF NEWLINE
+By default, PCRE interprets character 10 (linefeed, LF)  as  indicating
+the  end  of  a line. This is the normal newline character on Unix-like
+systems. You can compile PCRE to use character 13 (carriage return, CR)
+instead, by adding
+--enable-newline-is-cr
+to  the  configure  command.  There  is  also  a --enable-newline-is-lf
+option, which explicitly specifies linefeed as the newline character.
+Alternatively, you can specify that line endings are to be indicated by
+the two character sequence CRLF. If you want this, add
+--enable-newline-is-crlf
+to the configure command. There is a fourth option, specified by
+--enable-newline-is-anycrlf
+which  causes  PCRE  to recognize any of the three sequences CR, LF, or
+CRLF as indicating a line ending. Finally, a fifth option, specified by
+--enable-newline-is-any
+causes PCRE to recognize any Unicode newline sequence.
+Whatever  line  ending convention is selected when PCRE is built can be
+overridden when the library functions are called. At build time  it  is
+conventional to use the standard for your operating system.
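The five conventions differ only in which byte sequences count as a line ending. The sketch below uses hypothetical names (`newline_mode`, `newline_len`, `count_line_endings`) rather than PCRE's option constants. For the "any" case it follows the Unicode newline set that PCRE documents: LF, VT, FF, CR, CRLF and NEL, plus LS (U+2028) and PS (U+2029) in UTF-8 mode. Because the sketch works on bytes, it omits the multi-byte LS and PS.

```c
#include <stddef.h>

/* Hypothetical names for the five conventions described above. */
typedef enum { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY } newline_mode;

/* Length (1 or 2) of the line ending that starts at s[i], or 0 if none.
   Byte-oriented sketch: NL_ANY accepts LF, VT, FF, CR, CRLF and NEL (0x85)
   but not the multi-byte UTF-8 forms of LS (U+2028) and PS (U+2029). */
size_t newline_len(const unsigned char *s, size_t len, size_t i,
                   newline_mode mode)
{
    unsigned char c;
    int crlf;
    if (i >= len)
        return 0;
    c = s[i];
    crlf = (c == '\r' && i + 1 < len && s[i + 1] == '\n');
    switch (mode) {
    case NL_CR:      return c == '\r';
    case NL_LF:      return c == '\n';
    case NL_CRLF:    return crlf ? 2 : 0;
    case NL_ANYCRLF: return crlf ? 2 : (c == '\r' || c == '\n');
    case NL_ANY:     return crlf ? 2 : (c == '\n' || c == 0x0B || c == 0x0C ||
                                        c == '\r' || c == 0x85);
    }
    return 0;
}

/* Count line endings in s[0..len), consuming CRLF as one ending when the
   mode treats it as a single sequence. */
size_t count_line_endings(const unsigned char *s, size_t len,
                          newline_mode mode)
{
    size_t n = 0, i = 0;
    while (i < len) {
        size_t k = newline_len(s, len, i, mode);
        if (k > 0) { n++; i += k; }
        else i++;
    }
    return n;
}
```

With the subject "a\rb\nc\r\nd", the CR and LF conventions each see two line endings and CRLF sees one. ANYCRLF and ANY each see three, because the final CRLF counts as a single ending.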
+WHAT \R MATCHES
+By  default,  the  sequence \R in a pattern matches any Unicode newline
+sequence, whatever has been selected as the line  ending  sequence.  If
+you specify
+--enable-bsr-anycrlf
+the  default  is changed so that \R matches only CR, LF, or CRLF. What-
+ever is selected when PCRE is built can be overridden when the  library
+functions are called.
+BUILDING SHARED AND STATIC LIBRARIES
+The  PCRE building process uses libtool to build both shared and static
+Unix libraries by default. You can suppress one of these by adding  one
+of
+--disable-shared
+--disable-static
+to the configure command, as required.
+POSIX MALLOC USAGE
+When PCRE is called through the POSIX interface (see the pcreposix doc-
+umentation), additional working storage is  required  for  holding  the
+pointers  to capturing substrings, because PCRE requires three integers
+per substring, whereas the POSIX interface provides only  two.  If  the
+number of expected substrings is small, the wrapper function uses space
+on the stack, because this is faster than using malloc() for each call.
+The default threshold above which the stack is no longer used is 10; it
+can be changed by adding a setting such as
+--with-posix-malloc-threshold=20
+to the configure command.
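The stack-or-heap decision described above can be sketched as follows. This is a hypothetical illustration, not the pcreposix source. `POSIX_MALLOC_THRESHOLD` here is simply a macro standing in for the configured threshold, and `choose_ovector` is an invented name. The scratch vector needs three ints per requested substring because that is what the native API expects.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the value set by --with-posix-malloc-threshold. */
#define POSIX_MALLOC_THRESHOLD 10

/* Sketch of the stack-or-heap decision: the native API wants 3 ints per
   substring, so nmatch POSIX slots need a scratch vector of 3 * nmatch ints.
   Small requests use the caller's stack buffer; larger ones use malloc().
   *on_heap tells the caller whether free() is needed afterwards. */
int *choose_ovector(size_t nmatch, int small[POSIX_MALLOC_THRESHOLD * 3],
                    int *on_heap)
{
    *on_heap = 0;
    if (nmatch == 0)
        return NULL;                          /* nothing to capture */
    if (nmatch <= POSIX_MALLOC_THRESHOLD)
        return small;                         /* fast path: stack space */
    *on_heap = 1;
    return malloc(sizeof(int) * 3 * nmatch);  /* may be NULL on failure */
}
```

A caller would declare `int small[POSIX_MALLOC_THRESHOLD * 3];` in its own stack frame and call free() on the result only when `*on_heap` is set. Raising the threshold trades stack space for fewer malloc() calls.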
+HANDLING VERY LARGE PATTERNS
+Within a compiled pattern, offset values are used  to  point  from  one
+part  to another (for example, from an opening parenthesis to an alter-
+nation metacharacter). By default, two-byte values are used  for  these
+offsets,  leading  to  a  maximum size for a compiled pattern of around
+64K. This is sufficient to handle all but the most  gigantic  patterns.
+Nevertheless,  some  people do want to process enormous patterns, so it
+is possible to compile PCRE to use three-byte or four-byte  offsets  by
+adding a setting such as
+--with-link-size=3
+to  the  configure  command.  The value given must be 2, 3, or 4. Using
+longer offsets slows down the operation of PCRE because it has to  load
+additional bytes when handling them.
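The size limit follows directly from the width of each stored offset: an n-byte offset can hold values up to 2^(8n) - 1. The helpers below are a hypothetical sketch of fixed-width offsets. The most-significant-byte-first layout and the function names are illustrative assumptions, not a description of PCRE's compiled format.

```c
#include <stddef.h>

/* Store value in `size` bytes (2, 3, or 4), most significant byte first. */
void put_link(unsigned char *p, int size, unsigned long value)
{
    for (int i = size - 1; i >= 0; i--) {
        p[i] = (unsigned char)(value & 0xFF);
        value >>= 8;
    }
}

/* Read back a `size`-byte offset written by put_link(). */
unsigned long get_link(const unsigned char *p, int size)
{
    unsigned long v = 0;
    for (int i = 0; i < size; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Largest offset representable in `size` bytes: 2^(8*size) - 1.
   The 4-byte case is special-cased to avoid shifting a 32-bit
   unsigned long by 32, which is undefined behaviour in C. */
unsigned long max_link(int size)
{
    return size >= 4 ? 0xFFFFFFFFUL : (1UL << (8 * size)) - 1;
}
```

With the default size of 2 the largest offset is 65535, which is where the "around 64K" limit comes from. Each extra byte multiplies the range by 256, but every offset read or written then costs one more byte of work.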
+AVOIDING EXCESSIVE STACK USAGE
+When matching with the pcre_exec() function, PCRE implements backtrack-
+ing by making recursive calls to an internal function  called  match().
+In  environments  where  the size of the stack is limited, this can se-
+verely limit PCRE's operation. (The Unix environment does  not  usually
+suffer from this problem, but it may sometimes be necessary to increase
+the maximum stack size.  There is a discussion in the  pcrestack  docu-
+mentation.)  An alternative approach to recursion that uses memory from
+the heap to remember data, instead of using recursive  function  calls,
+has  been  implemented to work round the problem of limited stack size.
+If you want to build a version of PCRE that works this way, add
+--disable-stack-for-recursion
+to the configure command. With this configuration, PCRE  will  use  the
+pcre_stack_malloc  and pcre_stack_free variables to call memory manage-
+ment functions. By default these point to malloc() and free(), but  you
+can replace the pointers so that your own functions are used.
+Separate  functions  are  provided  rather  than  using pcre_malloc and
+pcre_free because the  usage  is  very  predictable:  the  block  sizes
+requested  are  always  the  same,  and  the blocks are always freed in
+reverse order. A calling program might be able to  implement  optimized
+functions  that  perform  better  than  malloc()  and free(). PCRE runs
+noticeably more slowly when built in this way. This option affects only
+the pcre_exec() function; it is not relevant for the
+pcre_dfa_exec() function.
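Because the blocks are always freed in reverse order, a calling program can replace malloc() and free() with a simple last-in, first-out arena. The sketch below is hypothetical: `arena_malloc`, `arena_free`, `ARENA_BYTES` and `MAX_BLOCKS` are invented names. It is written so that it compiles without pcre.h. The functions have the shapes of the pcre_stack_malloc and pcre_stack_free pointers, `void *(*)(size_t)` and `void (*)(void *)`, and the comment shows how they might be installed.

```c
#include <stddef.h>

/* A LIFO arena sketch. Blocks come from one static buffer and must be
   released in reverse order of allocation, matching the usage pattern
   described above. In a program linked with a PCRE built using
   --disable-stack-for-recursion, one could install them with
       pcre_stack_malloc = arena_malloc;
       pcre_stack_free = arena_free;
   (not done here, so this file compiles without pcre.h). */
#define ARENA_BYTES 65536
#define MAX_BLOCKS  1024

static union {                 /* gives the buffer long double alignment */
    long double ld;
    void *p;
    unsigned char bytes[ARENA_BYTES];
} arena;
static size_t arena_top;                 /* bytes currently in use */
static size_t block_start[MAX_BLOCKS];   /* offset of each live block */
static size_t live_blocks;

void *arena_malloc(size_t size)
{
    size_t rounded;
    if (size > ARENA_BYTES || live_blocks == MAX_BLOCKS)
        return NULL;
    rounded = (size + 15) & ~(size_t)15;  /* keep 16-byte granularity */
    if (rounded > ARENA_BYTES - arena_top)
        return NULL;
    block_start[live_blocks++] = arena_top;
    arena_top += rounded;
    return arena.bytes + block_start[live_blocks - 1];
}

void arena_free(void *block)
{
    /* Only the most recently allocated block can be released; anything
       else (including NULL) is ignored in this sketch. */
    if (block == NULL || live_blocks == 0)
        return;
    if ((unsigned char *)block != arena.bytes + block_start[live_blocks - 1])
        return;
    arena_top = block_start[--live_blocks];
}
```

Allocation and release are each a few arithmetic operations, which is the kind of saving over a general-purpose allocator that the paragraph above has in mind. The trade-off is a fixed upper bound on total memory.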
+LIMITING PCRE RESOURCE USAGE
+Internally, PCRE has a function called match(), which it calls  repeat-
+edly   (sometimes   recursively)  when  matching  a  pattern  with  the
+pcre_exec() function. By controlling the maximum number of  times  this
+function  may be called during a single matching operation, a limit can
+be placed on the resources used by a single call  to  pcre_exec().  The
+limit  can be changed at run time, as described in the pcreapi documen-
+tation. The default is 10 million, but this can be changed by adding  a
+setting such as
+--with-match-limit=500000
+to   the   configure  command.  This  setting  has  no  effect  on  the
+pcre_dfa_exec() matching function.
+In some environments it is desirable to limit the  depth  of  recursive
+calls of match() more strictly than the total number of calls, in order
+to restrict the maximum amount of stack (or heap, if
+--disable-stack-for-recursion is specified) that is used. A second limit controls this;
+it defaults to the value that  is  set  for  --with-match-limit,  which
+imposes  no  additional constraints. However, you can set a lower limit
+by adding, for example,
+--with-match-limit-recursion=10000
+to the configure command. This value can  also  be  overridden  at  run
+time.
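The two limits can be illustrated with a toy backtracking matcher. It counts every call of its internal matching function, like the match limit, and tracks the current nesting depth, like the recursion limit. This is a minimal sketch, not PCRE's match() function. The pattern syntax (literal characters, `.`, and greedy `x*`, anchored at the start of the subject) and the names `toy_match` and `TOY_LIMIT` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical result codes for this toy (not PCRE's error numbers). */
enum { TOY_NOMATCH = 0, TOY_MATCH = 1, TOY_LIMIT = -1 };

struct toy_state {
    unsigned long calls, max_calls;   /* like the match limit */
    unsigned long depth, max_depth;   /* like the recursion limit */
};

static int toy_here(const char *re, const char *text, struct toy_state *st);

/* Greedy "c*": consume as many c as possible, then back off one at a time. */
static int toy_star(char c, const char *re, const char *text,
                    struct toy_state *st)
{
    const char *t = text;
    while (*t != '\0' && (c == '.' || *t == c))
        t++;
    for (;;) {
        int r = toy_here(re, t, st);
        if (r != TOY_NOMATCH || t == text)
            return r;                 /* matched, hit a limit, or exhausted */
        t--;
    }
}

static int toy_here(const char *re, const char *text, struct toy_state *st)
{
    int r;
    if (++st->calls > st->max_calls)
        return TOY_LIMIT;
    if (++st->depth > st->max_depth) {
        st->depth--;
        return TOY_LIMIT;
    }
    if (re[0] == '\0')
        r = TOY_MATCH;
    else if (re[1] == '*')
        r = toy_star(re[0], re + 2, text, st);
    else if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        r = toy_here(re + 1, text + 1, st);
    else
        r = TOY_NOMATCH;
    st->depth--;
    return r;
}

/* Anchored match of re at the start of text, with both limits applied. */
int toy_match(const char *re, const char *text, unsigned long max_calls,
              unsigned long max_depth, unsigned long *calls_used)
{
    struct toy_state st = { 0, max_calls, 0, max_depth };
    int r = toy_here(re, text, &st);
    if (calls_used != NULL)
        *calls_used = st.calls;
    return r;
}
```

A pattern such as `a*a*a*a*b` against twenty `a` characters with no `b` needs over ten thousand calls before it can report failure. A call limit of 1000 stops it early. A literal pattern like `aaaa` needs only five calls, but they are nested five deep, so a depth limit below 5 stops it even though the call count is tiny.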
+CREATING CHARACTER TABLES AT BUILD TIME
+PCRE  uses fixed tables for processing characters whose code values are
+less than 256. By default, PCRE is built with a set of tables that  are
+distributed  in  the  file pcre_chartables.c.dist. These tables are for
+ASCII codes only. If you add
+--enable-rebuild-chartables
+to the configure command, the distributed tables are  no  longer  used.
+Instead,  a  program  called dftables is compiled and run. This outputs
+the source for a new set of tables, created in the default locale of your
+C runtime system. (This method of replacing the tables does not work if
+you are cross compiling, because dftables is run on the local host.  If
+you  need  to  create alternative tables when cross compiling, you will
+have to do so "by hand".)
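The idea behind locale-built tables can be sketched with the standard <ctype.h> functions. The sketch fills 256-entry tables from whatever LC_CTYPE locale is active when it runs. The table layout, the `CT_*` bit values and `build_char_tables` are hypothetical, and they are much simpler than the tables dftables actually generates.

```c
#include <ctype.h>
#include <locale.h>   /* setlocale() chooses whose rules are captured */

/* Hypothetical bit flags; PCRE's real tables use their own layout. */
enum { CT_SPACE = 1, CT_DIGIT = 2, CT_WORD = 4 };

/* Fill a lower-case table and a character-type table for codes 0..255
   using the LC_CTYPE locale that is currently active. */
void build_char_tables(unsigned char lower[256], unsigned char ctypes[256])
{
    for (int c = 0; c < 256; c++) {
        lower[c] = (unsigned char)tolower(c);
        ctypes[c] = 0;
        if (isspace(c)) ctypes[c] |= CT_SPACE;
        if (isdigit(c)) ctypes[c] |= CT_DIGIT;
        if (isalnum(c) || c == '_') ctypes[c] |= CT_WORD;
    }
}
```

In the "C" locale, byte 0xC0 has no lower-case partner. A Latin-1 locale would typically map it to 0xE0, which is the kind of difference that rebuilding the tables captures. Since the tables reflect the machine that runs the builder, this approach breaks down when cross compiling.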
+USING EBCDIC CODE
+PCRE assumes by default that it will run in an  environment  where  the
+character  code  is  ASCII  (or Unicode, which is a superset of ASCII).
+This is the case for most computer operating systems.  PCRE  can,  how-
+ever, be compiled to run in an EBCDIC environment by adding
+--enable-ebcdic
+to the configure command. This setting implies
+--enable-rebuild-chartables. You should only use it if you know that you are in an EBCDIC
+environment (for example, an IBM mainframe operating system).
+PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT
+By default, pcregrep reads all files as plain text. You can build it so
+that it recognizes files whose names end in .gz or .bz2, and reads them
+with libz or libbz2, respectively, by adding one or both of
+--enable-pcregrep-libz
+--enable-pcregrep-libbz2
+to the configure command. These options naturally require that the rel-
+evant libraries are installed on your system. Configuration  will  fail
+if they are not.
+PCRETEST OPTION FOR LIBREADLINE SUPPORT
+If you add
+--enable-pcretest-libreadline
+to  the  configure  command,  pcretest  is  linked with the libreadline
+library, and when its input is from a terminal, it reads it  using  the
+readline() function. This provides line-editing and history facilities.
+Note that libreadline is GPL-licenced, so if you distribute a binary of
+pcretest linked in this way, there may be licensing issues.
+Setting  this  option  causes  the -lreadline option to be added to the
+pcretest build. In many operating environments with a system-installed
+libreadline this is sufficient. However, in some environments (e.g.  if
+an unmodified distribution version of readline is in use),  some  extra
+configuration  may  be necessary. The INSTALL file for libreadline says
+this:
+"Readline uses the termcap functions, but does not link with the
+termcap or curses library itself, allowing applications which link
+with readline the to choose an appropriate library."
+If your environment has not been set up so that an appropriate  library
+is automatically included, you may need to add something like
+LIBS="-lncurses"
+immediately before the configure command.
+SEE ALSO
+pcreapi(3), pcre_config(3).
+AUTHOR
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+REVISION
+Last updated: 13 April 2008
+Copyright (c) 1997-2008 University of Cambridge.
+------------------------------------------------------------------------------
+PCREMATCHING(3)                                                PCREMATCHING(3)
+NAME
+PCRE - Perl-compatible regular expressions
+PCRE MATCHING ALGORITHMS
+This document describes the two different algorithms that are available
+in PCRE for matching a compiled regular expression against a given sub-
+ject  string.  The  "standard"  algorithm  is  the  one provided by the
+pcre_exec() function. This works in the same way as Perl's matching
+function, and provides a Perl-compatible matching operation.
+An  alternative  algorithm is provided by the pcre_dfa_exec() function;
+this operates in a different way, and is not  Perl-compatible.  It  has
+advantages  and disadvantages compared with the standard algorithm, and
+these are described below.
+When there is only one possible way in which a given subject string can
+match  a pattern, the two algorithms give the same answer. A difference
+arises, however, when there are multiple possibilities. For example, if
+the pattern
+^<.*>
+is matched against the string
+<something> <something else> <something further>
+there are three possible answers. The standard algorithm finds only one
+of them, whereas the alternative algorithm finds all three.
+REGULAR EXPRESSIONS AS TREES
+The set of strings that are matched by a regular expression can be rep-
+resented  as  a  tree structure. An unlimited repetition in the pattern
+makes the tree of infinite size, but it is still a tree.  Matching  the
+pattern  to a given subject string (from a given starting point) can be
+thought of as a search of the tree.  There are two  ways  to  search  a
+tree:  depth-first  and  breadth-first, and these correspond to the two
+matching algorithms provided by PCRE.
+THE STANDARD MATCHING ALGORITHM
+In the terminology of Jeffrey Friedl's book "Mastering Regular  Expres-
+sions",  the  standard  algorithm  is an "NFA algorithm". It conducts a
+depth-first search of the pattern tree. That is, it  proceeds  along  a
+single path through the tree, checking that the subject matches what is
+required. When there is a mismatch, the algorithm  tries  any  alterna-
+tives  at  the  current point, and if they all fail, it backs up to the
+previous branch point in the  tree,  and  tries  the  next  alternative
+branch  at  that  level.  This often involves backing up (moving to the
+left) in the subject string as well.  The  order  in  which  repetition
+branches  are  tried  is controlled by the greedy or ungreedy nature of
+the quantifier.
+If a leaf node is reached, a matching string has  been  found,  and  at
+that  point the algorithm stops. Thus, if there is more than one possi-
+ble match, this algorithm returns the first one that it finds.  Whether
+this  is the shortest, the longest, or some intermediate length depends
+on the way the greedy and ungreedy repetition quantifiers are specified
+in the pattern.
+Because  it  ends  up  with a single path through the tree, it is rela-
+tively straightforward for this algorithm to keep  track  of  the  sub-
+strings  that  are  matched  by portions of the pattern in parentheses.
+This provides support for capturing parentheses and back references.
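Depth-first search, and the effect of greedy versus ungreedy repetition on which match is found first, can be seen in a toy backtracking matcher. The supported syntax is literal characters, `.`, greedy `x*` and lazy `x*?`, anchored at the start of the subject. That syntax and the function names are hypothetical choices for this sketch, and the matcher is not PCRE's pcre_exec(). It returns the length of the first match it reaches, that is, the first leaf of the tree.

```c
#include <stddef.h>

/* Depth-first search: returns the length of the first match reached from
   `start`, or -1 if every branch fails. */
static int dfs_match(const char *re, const char *start, const char *text)
{
    if (*re == '\0')
        return (int)(text - start);           /* leaf reached: stop here */
    if (re[1] == '*') {
        int lazy = (re[2] == '?');
        const char *rest = re + (lazy ? 3 : 2);
        const char *far = text;               /* furthest the repeat can go */
        while (*far != '\0' && (re[0] == '.' || *far == re[0]))
            far++;
        if (lazy) {                           /* try fewest repeats first */
            for (const char *p = text; p <= far; p++) {
                int r = dfs_match(rest, start, p);
                if (r >= 0) return r;
            }
        } else {                              /* try most repeats first */
            for (const char *p = far; ; p--) {
                int r = dfs_match(rest, start, p);
                if (r >= 0) return r;
                if (p == text) break;
            }
        }
        return -1;                            /* back up to the caller */
    }
    if (*text != '\0' && (re[0] == '.' || *text == re[0]))
        return dfs_match(re + 1, start, text + 1);
    return -1;
}

/* Anchored at text[0]; returns the first match length found, or -1. */
int first_match_length(const char *re, const char *text)
{
    return dfs_match(re, text, text);
}
```

With the subject "<something> <something else> <something further>" used earlier in this document, the greedy pattern `<.*>` returns 48 (the whole string). The lazy form `<.*?>` returns 11 (just "<something>"). Both are single answers, chosen by the order in which branches are tried.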
+THE ALTERNATIVE MATCHING ALGORITHM
+This algorithm conducts a breadth-first search of  the  tree.  Starting
+from  the  first  matching  point  in the subject, it scans the subject
+string from left to right, once, character by character, and as it does
+this,  it remembers all the paths through the tree that represent valid
+matches. In Friedl's terminology, this is a kind  of  "DFA  algorithm",
+though  it is not implemented as a traditional finite state machine (it
+keeps multiple states active simultaneously).
+The scan continues until either the end of the subject is  reached,  or
+there  are  no more unterminated paths. At this point, terminated paths
+represent the different matching possibilities (if there are none,  the
+match  has  failed).   Thus,  if there is more than one possible match,
+this algorithm finds all of them, and in particular, it finds the long-
+est.  In PCRE, there is an option to stop the algorithm after the first
+match (which is necessarily the shortest) has been found.
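The breadth-first idea can be sketched as a simulation that keeps a set of active pattern positions and scans the subject once, recording every point where an accepting position is active. This is a toy for literal characters, `.` and `x*`, anchored at the start of the subject. The names `all_match_ends` and `MAX_TOKENS` are hypothetical, and the sketch does not reflect how pcre_dfa_exec() is implemented internally.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TOKENS 32

typedef struct { char c; bool star; } token;

/* Split a toy pattern into tokens; returns the count, or -1 if too long. */
static int parse_tokens(const char *re, token *t)
{
    int n = 0;
    while (*re != '\0') {
        if (n == MAX_TOKENS) return -1;
        t[n].c = *re++;
        t[n].star = (*re == '*');
        if (t[n].star) re++;
        n++;
    }
    return n;
}

/* Activate state i plus every state reachable by skipping starred tokens. */
static void add_state(bool *set, const token *t, int n, int i)
{
    for (; i <= n; i++) {
        set[i] = true;
        if (i == n || !t[i].star) break;
    }
}

/* Scan subject once from its first character, writing every match length
   into ends[] (at most max_ends); returns how many were found. */
int all_match_ends(const char *re, const char *subject, int *ends, int max_ends)
{
    token t[MAX_TOKENS];
    bool cur[MAX_TOKENS + 1], next[MAX_TOKENS + 1];
    int n = parse_tokens(re, t), found = 0;
    if (n < 0) return -1;
    for (int i = 0; i <= n; i++) cur[i] = false;
    add_state(cur, t, n, 0);
    for (size_t pos = 0; ; pos++) {
        bool moved = false;
        if (cur[n] && found < max_ends)
            ends[found++] = (int)pos;         /* a path has terminated here */
        if (subject[pos] == '\0') break;      /* end of subject */
        for (int i = 0; i <= n; i++) next[i] = false;
        for (int i = 0; i < n; i++) {
            if (cur[i] && (t[i].c == '.' || t[i].c == subject[pos])) {
                add_state(next, t, n, t[i].star ? i : i + 1);
                moved = true;
            }
        }
        if (!moved) break;                    /* no unterminated paths left */
        for (int i = 0; i <= n; i++) cur[i] = next[i];
    }
    return found;
}
```

For `<.*>` against "<something> <something else> <something further>", this finds all three matches in one pass, with lengths 11, 28 and 48. All three start at the first character, and the last one is the longest.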
+Note that all the matches that are found start at the same point in the
+subject. If the pattern
+cat(er(pillar)?)?
+is  matched  against the string "the caterpillar catchment", the result
+will be the three strings "cat", "cater", and "caterpillar" that start
+at the fifth character of the subject (offset 4). The algorithm does not
+automatically move on to find matches that start at later positions.
+There are a number of features of PCRE regular expressions that are not
+supported by the alternative matching algorithm. They are as follows:
+1.  Because  the  algorithm  finds  all possible matches, the greedy or
+ungreedy nature of repetition quantifiers is not relevant.  Greedy  and
+ungreedy quantifiers are treated in exactly the same way. However, pos-
+sessive quantifiers can make a difference when what follows could  also
+match what is quantified, for example in a pattern like this:
+^a++\w!
+This  pattern matches "aaab!" but not "aaa!", which would be matched by
+a non-possessive quantifier. Similarly, if an atomic group is  present,
+it  is matched as if it were a standalone pattern at the current point,
+and the longest match is then "locked in" for the rest of  the  overall
+pattern.
+2. When dealing with multiple paths through the tree simultaneously, it
+is not straightforward to keep track of  captured  substrings  for  the
+different  matching  possibilities,  and  PCRE's implementation of this
+algorithm does not attempt to do this. This means that no captured sub-
+strings are available.
+3.  Because no substrings are captured, back references within the pat-
+tern are not supported, and cause errors if encountered.
+4. For the same reason, conditional expressions that use  a  backrefer-
+ence  as  the  condition or test for a specific group recursion are not
+supported.
+5. Because many paths through the tree may be active, the \K escape
+sequence, which resets the start of the match when encountered (and may
+be encountered on some paths but not on others), is not supported. It
+causes an error if encountered.
+6.  Callouts  are  supported, but the value of the capture_top field is
+always 1, and the value of the capture_last field is always -1.
+7. The \C escape sequence, which (in the standard algorithm) matches  a
+single  byte, even in UTF-8 mode, is not supported because the alterna-
+tive algorithm moves through the subject  string  one  character  at  a
+time, for all active paths through the tree.
+8.  Except for (*FAIL), the backtracking control verbs such as (*PRUNE)
+are not supported. (*FAIL) is supported, and  behaves  like  a  failing
+negative assertion.
+ADVANTAGES OF THE ALTERNATIVE ALGORITHM
+Using  the alternative matching algorithm provides the following advan-
+tages:
+1. All possible matches (at a single point in the subject) are automat-
+ically  found,  and  in particular, the longest match is found. To find
+more than one match using the standard algorithm, you have to do kludgy
+things with callouts.
+2.  There is much better support for partial matching. The restrictions
+on the content of the pattern that apply when using the standard  algo-
+rithm  for  partial matching do not apply to the alternative algorithm.
+For non-anchored patterns, the starting position of a partial match  is
+available.
+3.  Because  the  alternative  algorithm  scans the subject string just
+once, and never needs to backtrack, it is possible to  pass  very  long
+subject  strings  to  the matching function in several pieces, checking
+for partial matching each time.
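Advantage 1 can be illustrated with a toy function. Here a "pattern" is nothing more than a list of literal alternatives. The function tries every alternative at one starting offset and reports the lengths of all that match, longest first, which is the order in which pcre_dfa_exec() returns its matches. The helper name all_matches_at is hypothetical and this is not PCRE code: it has no tree of paths and no scanning engine. It only shows what "all possible matches at a single point" looks like to a caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model (hypothetical helper, not part of PCRE): the "pattern" is a
   list of literal alternatives. Each alternative is tried at offset start,
   and the lengths of all alternatives that match are stored in lengths[]
   longest first. Returns the number of matches found (at most max). */
size_t all_matches_at(const char *subject, size_t len, size_t start,
                      const char *const *alts, size_t nalts,
                      size_t *lengths, size_t max)
{
    size_t count = 0;
    assert(start <= len);
    for (size_t i = 0; i < nalts && count < max; i++) {
        size_t n = strlen(alts[i]);
        if (start + n <= len && memcmp(subject + start, alts[i], n) == 0) {
            /* insertion sort keeps lengths[] ordered longest first */
            size_t j = count++;
            while (j > 0 && lengths[j - 1] < n) {
                lengths[j] = lengths[j - 1];
                j--;
            }
            lengths[j] = n;
        }
    }
    return count;
}
```

With the subject "abcd" and the alternatives a, abc, ab and x, the call returns 3 and fills lengths[] with 3, 2, 1. The standard algorithm would stop at the first alternative that succeeds.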
+DISADVANTAGES OF THE ALTERNATIVE ALGORITHM
+The alternative algorithm suffers from a number of disadvantages:
+1. It is substantially slower than the standard algorithm. This is
+partly because it has to search for all possible matches, but also
+because it is less susceptible to optimization.
+2. Capturing parentheses and back references are not supported.
+3. Although atomic groups are supported, their use does not provide the
+performance advantage that it does for the standard algorithm.
+AUTHOR
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+REVISION
+Last updated: 19 April 2008
+Copyright (c) 1997-2008 University of Cambridge.
+------------------------------------------------------------------------------
+PCREAPI(3)                                                          PCREAPI(3)
+NAME
+PCRE - Perl-compatible regular expressions
+PCRE NATIVE API
+#include <pcre.h>
+pcre *pcre_compile(const char *pattern, int options,
+const char **errptr, int *erroffset,
+const unsigned char *tableptr);
+pcre *pcre_compile2(const char *pattern, int options,
+int *errorcodeptr,
+const char **errptr, int *erroffset,
+const unsigned char *tableptr);
+pcre_extra *pcre_study(const pcre *code, int options,
+const char **errptr);
+int pcre_exec(const pcre *code, const pcre_extra *extra,
+const char *subject, int length, int startoffset,
+int options, int *ovector, int ovecsize);
+int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
+const char *subject, int length, int startoffset,
+int options, int *ovector, int ovecsize,
+int *workspace, int wscount);
+int pcre_copy_named_substring(const pcre *code,
+const char *subject, int *ovector,
+int stringcount, const char *stringname,
+char *buffer, int buffersize);
+int pcre_copy_substring(const char *subject, int *ovector,
+int stringcount, int stringnumber, char *buffer,
+int buffersize);
+int pcre_get_named_substring(const pcre *code,
+const char *subject, int *ovector,
+int stringcount, const char *stringname,
+const char **stringptr);
+int pcre_get_stringnumber(const pcre *code,
+const char *name);
+int pcre_get_stringtable_entries(const pcre *code,
+const char *name, char **first, char **last);
+int pcre_get_substring(const char *subject, int *ovector,
+int stringcount, int stringnumber,
+const char **stringptr);
+int pcre_get_substring_list(const char *subject,
+int *ovector, int stringcount, const char ***listptr);
+void pcre_free_substring(const char *stringptr);
+void pcre_free_substring_list(const char **stringptr);
+const unsigned char *pcre_maketables(void);
+int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
+int what, void *where);
+int pcre_info(const pcre *code, int *optptr, int *firstcharptr);
+int pcre_refcount(pcre *code, int adjust);
+int pcre_config(int what, void *where);
+char *pcre_version(void);
+void *(*pcre_malloc)(size_t);
+void (*pcre_free)(void *);
+void *(*pcre_stack_malloc)(size_t);
+void (*pcre_stack_free)(void *);
+int (*pcre_callout)(pcre_callout_block *);
+PCRE API OVERVIEW
+PCRE has its own native API, which is described in this document. There
+are also some wrapper functions that correspond to  the  POSIX  regular
+expression  API.  These  are  described in the pcreposix documentation.
+Both of these APIs define a set of C function calls. A C++  wrapper  is
+distributed with PCRE. It is documented in the pcrecpp page.
+The  native  API  C  function prototypes are defined in the header file
+pcre.h, and on Unix systems the library itself is called  libpcre.   It
+can normally be accessed by adding -lpcre to the command for linking an
+application  that  uses  PCRE.  The  header  file  defines  the  macros
+PCRE_MAJOR  and  PCRE_MINOR to contain the major and minor release num-
+bers for the library.  Applications can use these  to  include  support
+for different releases of PCRE.
+The   functions   pcre_compile(),  pcre_compile2(),  pcre_study(),  and
+pcre_exec() are used for compiling and matching regular expressions  in
+a  Perl-compatible  manner. A sample program that demonstrates the sim-
+plest way of using them is provided in the file  called  pcredemo.c  in
+the  source distribution. The pcresample documentation describes how to
+compile and run it.
+A second matching function, pcre_dfa_exec(), which is not Perl-compati-
+ble,  is  also provided. This uses a different algorithm for the match-
+ing. The alternative algorithm finds all possible matches (at  a  given
+point  in  the subject), and scans the subject just once. However, this
+algorithm does not return captured substrings. A description of the two
+matching  algorithms and their advantages and disadvantages is given in
+the pcrematching documentation.
+In addition to the main compiling and  matching  functions,  there  are
+convenience functions for extracting captured substrings from a subject
+string that is matched by pcre_exec(). They are:
+pcre_copy_substring()
+pcre_copy_named_substring()
+pcre_get_substring()
+pcre_get_named_substring()
+pcre_get_substring_list()
+pcre_get_stringnumber()
+pcre_get_stringtable_entries()
+pcre_free_substring() and pcre_free_substring_list() are also provided,
+to free the memory used for extracted strings.
+The  function  pcre_maketables()  is  used  to build a set of character
+tables  in  the  current  locale   for   passing   to   pcre_compile(),
+pcre_exec(),  or  pcre_dfa_exec(). This is an optional facility that is
+provided for specialist use.  Most  commonly,  no  special  tables  are
+passed,  in  which case internal tables that are generated when PCRE is
+built are used.
+The function pcre_fullinfo() is used to find out  information  about  a
+compiled  pattern; pcre_info() is an obsolete version that returns only
+some of the available information, but is retained for  backwards  com-
+patibility.   The function pcre_version() returns a pointer to a string
+containing the version of PCRE and its date of release.
+The function pcre_refcount() maintains a  reference  count  in  a  data
+block  containing  a compiled pattern. This is provided for the benefit
+of object-oriented applications.
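Here is a minimal model of how such a count is typically used. The struct toy_pattern and the function toy_refcount are hypothetical. Clamping the result to the range 0..65535 is an assumption modelled on a 16-bit counter. Like pcre_refcount(), the function only adjusts and returns the count. Freeing the pattern is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiled-pattern block; only the count matters. */
typedef struct {
    unsigned short ref_count;   /* assumed to be 16 bits wide */
} toy_pattern;

/* Adds adjust to the count, clamps the result to 0..65535 (assumption),
   stores it, and returns the new value. Nothing is freed here. */
int toy_refcount(toy_pattern *p, int adjust)
{
    assert(p != NULL);
    long n = (long)p->ref_count + adjust;
    if (n < 0) n = 0;
    if (n > 65535) n = 65535;
    p->ref_count = (unsigned short)n;
    return (int)n;
}
```

An object-oriented wrapper would call this with +1 when a new owner takes the pattern and with -1 when an owner releases it. It would call pcre_free() on the pattern only when the returned count reaches 0.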
+The global variables pcre_malloc and pcre_free  initially  contain  the
+entry  points  of  the  standard malloc() and free() functions, respec-
+tively. PCRE calls the memory management functions via these variables,
+so  a  calling  program  can replace them if it wishes to intercept the
+calls. This should be done before calling any PCRE functions.
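Because the hooks are plain function pointers, intercepting PCRE's allocations needs only two functions with matching signatures. The sketch below counts blocks that have been obtained but not yet freed, which can help with leak checking. The names counting_malloc, counting_free and outstanding_blocks are hypothetical. The assignment to pcre_malloc and pcre_free appears only in a comment, so the code compiles without the library. The counter is not protected against concurrent updates.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of blocks obtained and not yet freed (hypothetical leak counter). */
long outstanding_blocks = 0;

/* Same signature as pcre_malloc: void *(*)(size_t). */
void *counting_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL) outstanding_blocks++;
    return p;
}

/* Same signature as pcre_free: void (*)(void *). */
void counting_free(void *block)
{
    if (block != NULL) {
        assert(outstanding_blocks > 0);
        outstanding_blocks--;
    }
    free(block);
}

/* With the real library, install the hooks before the first PCRE call:
       pcre_malloc = counting_malloc;
       pcre_free = counting_free;                                       */
```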
+The global variables pcre_stack_malloc  and  pcre_stack_free  are  also
+indirections  to  memory  management functions. These special functions
+are used only when PCRE is compiled to use  the  heap  for  remembering
+data, instead of recursive function calls, when running the pcre_exec()
+function. See the pcrebuild documentation for  details  of  how  to  do
+this.  It  is  a non-standard way of building PCRE, for use in environ-
+ments that have limited stacks. Because of the greater  use  of  memory
+management,  it  runs  more  slowly. Separate functions are provided so
+that special-purpose external code can be  used  for  this  case.  When
+used,  these  functions  are always called in a stack-like manner (last
+obtained, first freed), and always for memory blocks of the same  size.
+There  is  a discussion about PCRE's stack usage in the pcrestack docu-
+mentation.
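The two guarantees stated above (blocks are freed last obtained, first freed, and every block has the same size) allow a very simple special-purpose allocator: a fixed arena with a single "top" offset. The sketch below is hypothetical. The function names and the arena size are invented, and the code is not thread safe. With the library, the functions would be installed with pcre_stack_malloc = lifo_stack_malloc and pcre_stack_free = lifo_stack_free.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pcre_stack_malloc/pcre_stack_free replacements. They rely on
   two guarantees: every request is for the same size, and blocks are freed
   in reverse order of allocation. */
#define ARENA_SLOTS 4096                 /* arena size, in max_align_t units */

static max_align_t arena[ARENA_SLOTS];   /* max_align_t keeps blocks aligned */
static size_t arena_top = 0;             /* bytes currently handed out */
static size_t block_size = 0;            /* rounded size, fixed on first call */

void *lifo_stack_malloc(size_t size)
{
    /* round up so that every block starts on a max_align_t boundary */
    size_t rounded = (size + sizeof(max_align_t) - 1)
                     / sizeof(max_align_t) * sizeof(max_align_t);
    if (block_size == 0) block_size = rounded;
    assert(rounded == block_size);       /* same size every time */
    if (arena_top + block_size > sizeof arena) return NULL;   /* arena full */
    void *p = (unsigned char *)arena + arena_top;
    arena_top += block_size;
    return p;
}

void lifo_stack_free(void *block)
{
    assert(block_size > 0 && arena_top >= block_size);
    /* last obtained, first freed: block must be the topmost one */
    assert((unsigned char *)block ==
           (unsigned char *)arena + arena_top - block_size);
    arena_top -= block_size;
}
```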
+The global variable pcre_callout initially contains NULL. It can be set
+by  the  caller  to  a "callout" function, which PCRE will then call at
+specified points during a matching operation. Details are given in  the
+pcrecallout documentation.
+NEWLINES
+PCRE  supports five different conventions for indicating line breaks in
+strings: a single CR (carriage return) character, a  single  LF  (line-
+feed) character, the two-character sequence CRLF, any of the three pre-
+ceding, or any Unicode newline sequence. The Unicode newline  sequences
+are  the  three just mentioned, plus the single characters VT (vertical
+tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS  (line
+separator, U+2028), and PS (paragraph separator, U+2029).
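The byte sequences that count as a newline under each convention can be sketched as a small scanner. The enum names are hypothetical. The code handles only single-byte (non-UTF-8) subjects, so LS and PS, which need UTF-8, are not recognized. It illustrates the rules above and is not PCRE's internal code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the five conventions described above. */
enum newline_convention { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY };

/* Returns the length in bytes of the newline sequence starting at p, or 0
   if there is none. Single-byte subjects only: LS and PS are not handled. */
size_t newline_length(const unsigned char *p, const unsigned char *end,
                      enum newline_convention conv)
{
    assert(p <= end);
    if (p == end) return 0;
    int crlf = (p[0] == '\r' && end - p >= 2 && p[1] == '\n');
    switch (conv) {
    case NL_CR:   return p[0] == '\r';
    case NL_LF:   return p[0] == '\n';
    case NL_CRLF: return crlf ? 2 : 0;
    case NL_ANYCRLF:
        if (crlf) return 2;
        return p[0] == '\r' || p[0] == '\n';
    case NL_ANY:
        if (crlf) return 2;
        /* LF, VT, FF, CR, and NEL (0x85 as a single byte) */
        return p[0] == '\n' || p[0] == 0x0b || p[0] == 0x0c ||
               p[0] == '\r' || p[0] == 0x85;
    }
    return 0;
}
```

Under the CR convention, a CRLF pair yields a one-byte newline (the CR) followed by an ordinary LF data character. That is why the choice of convention matters for the metacharacters discussed below.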
+Each  of  the first three conventions is used by at least one operating
+system as its standard newline sequence. When PCRE is built, a  default
+can  be  specified.  The default default is LF, which is the Unix stan-
+dard. When PCRE is run, the default can be overridden,  either  when  a
+pattern is compiled, or when it is matched.
+At compile time, the newline convention can be specified by the options
+argument of pcre_compile(), or it can be specified by special  text  at
+the start of the pattern itself; this overrides any other settings. See
+the pcrepattern page for details of the special character sequences.
+In the PCRE documentation the word "newline" is used to mean "the char-
+acter  or pair of characters that indicate a line break". The choice of
+newline convention affects the handling of  the  dot,  circumflex,  and
+dollar metacharacters, the handling of #-comments in /x mode, and, when
+CRLF is a recognized line ending sequence, the match position  advance-
+ment for a non-anchored pattern. There is more detail about this in the
+section on pcre_exec() options below.
+The choice of newline convention does not affect the interpretation  of
+the  \n  or  \r  escape  sequences, nor does it affect what \R matches,
+which is controlled in a similar way, but by separate options.
+MULTITHREADING
+The PCRE functions can be used in  multi-threading  applications,  with
+the  proviso  that  the  memory  management  functions  pointed  to  by
+pcre_malloc, pcre_free, pcre_stack_malloc, and pcre_stack_free, and the
+callout function pointed to by pcre_callout, are shared by all threads.
+The compiled form of a regular expression is not altered during  match-
+ing, so the same compiled pattern can safely be used by several threads
+at once.
+SAVING PRECOMPILED PATTERNS FOR LATER USE
+The compiled form of a regular expression can be saved and re-used at a
+later  time,  possibly by a different program, and even on a host other
+than the one on which  it  was  compiled.  Details  are  given  in  the
+pcreprecompile  documentation.  However, compiling a regular expression
+with one version of PCRE for use with a different version is not  guar-
+anteed to work and may cause crashes.
+CHECKING BUILD-TIME OPTIONS
+int pcre_config(int what, void *where);
+The  function pcre_config() makes it possible for a PCRE client to dis-
+cover which optional features have been compiled into the PCRE library.
+The  pcrebuild documentation has more details about these optional fea-
+tures.
+The first argument for pcre_config() is an  integer,  specifying  which
+information is required; the second argument is a pointer to a variable
+into which the information is  placed.  The  following  information  is
+available:
+PCRE_CONFIG_UTF8
+The  output is an integer that is set to one if UTF-8 support is avail-
+able; otherwise it is set to zero.
+PCRE_CONFIG_UNICODE_PROPERTIES
+The output is an integer that is set to  one  if  support  for  Unicode
+character properties is available; otherwise it is set to zero.
+PCRE_CONFIG_NEWLINE
+The  output  is  an integer whose value specifies the default character
+sequence that is recognized as meaning "newline". The five values that
+are supported are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for ANYCRLF,
+and -1 for ANY. The default should normally be  the  standard  sequence
+for your operating system.
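The value 3338 looks arbitrary until it is written in hexadecimal: 0x0D0A, which is CR (13) in the high byte followed by LF (10) in the low byte. The following hypothetical helper turns a PCRE_CONFIG_NEWLINE result into a name. The pcre_config() call is shown only in a comment, so the code compiles without the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Maps a PCRE_CONFIG_NEWLINE result to a name (hypothetical helper).
   With the library, the value would be obtained by:
       int nl;
       pcre_config(PCRE_CONFIG_NEWLINE, &nl);
   3338 == 0x0D0A: CR (13) in the high byte, LF (10) in the low byte. */
const char *newline_config_name(int value)
{
    switch (value) {
    case 10:   return "LF";
    case 13:   return "CR";
    case 3338: return "CRLF";
    case -2:   return "ANYCRLF";
    case -1:   return "ANY";
    default:   return NULL;     /* not one of the documented values */
    }
}
```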
+PCRE_CONFIG_BSR
+The output is an integer whose value indicates what character sequences
+the \R escape sequence matches by default. A value of 0 means  that  \R
+matches  any  Unicode  line ending sequence; a value of 1 means that \R
+matches only CR, LF, or CRLF. The default can be overridden when a pat-
+tern is compiled or matched.
+PCRE_CONFIG_LINK_SIZE
+The  output  is  an  integer that contains the number of bytes used for
+internal linkage in compiled regular expressions. The value is 2, 3, or
+4.  Larger  values  allow larger regular expressions to be compiled, at
+the expense of slower matching. The default value of  2  is  sufficient
+for  all  but  the  most massive patterns, since it allows the compiled
+pattern to be up to 64K in size.
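The 64K figure follows from simple arithmetic: a link field of n bytes can hold 2^(8n) distinct offsets. The helper below only computes that bound. It is a sketch, and the limits actually enforced by a particular build may be somewhat lower than this raw figure.

```c
#include <assert.h>

/* Number of distinct offsets a link field of link_size bytes can hold:
   2^(8 * link_size). Arithmetic only; real build limits may be lower. */
unsigned long long link_offset_range(int link_size)
{
    assert(link_size >= 2 && link_size <= 4);
    return 1ULL << (8 * link_size);
}
```

For a link size of 2 this gives 65536, the 64K quoted above. A link size of 3 raises the bound to 16M.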
+PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
+The output is an integer that contains the threshold  above  which  the
+POSIX  interface  uses malloc() for output vectors. Further details are
+given in the pcreposix documentation.
+PCRE_CONFIG_MATCH_LIMIT
+The output is an integer that gives the default limit for the number of
+internal  matching  function  calls in a pcre_exec() execution. Further
+details are given with pcre_exec() below.
+PCRE_CONFIG_MATCH_LIMIT_RECURSION
+The output is an integer that gives the default limit for the depth  of
+recursion  when calling the internal matching function in a pcre_exec()
+execution. Further details are given with pcre_exec() below.
+PCRE_CONFIG_STACKRECURSE
+The output is an integer that is set to one if internal recursion  when
+running pcre_exec() is implemented by recursive function calls that use
+the stack to remember their state. This is the usual way that  PCRE  is
+compiled. The output is zero if PCRE was compiled to use blocks of data
+on the  heap  instead  of  recursive  function  calls.  In  this  case,
+pcre_stack_malloc  and  pcre_stack_free  are  called  to  manage memory
+blocks on the heap, thus avoiding the use of the stack.
+COMPILING A PATTERN
+pcre *pcre_compile(const char *pattern, int options,
+const char **errptr, int *erroffset,
+const unsigned char *tableptr);
+pcre *pcre_compile2(const char *pattern, int options,
+int *errorcodeptr,
+const char **errptr, int *erroffset,
+const unsigned char *tableptr);
+Either of the functions pcre_compile() or pcre_compile2() can be called
+to compile a pattern into an internal form. The only difference between
+the two interfaces is that pcre_compile2() has an additional  argument,
+errorcodeptr, via which a numerical error code can be returned.
+The pattern is a C string terminated by a binary zero, and is passed in
+the pattern argument. A pointer to a single block  of  memory  that  is
+obtained  via  pcre_malloc is returned. This contains the compiled code
+and related data. The pcre type is defined for the returned block; this
+is a typedef for a structure whose contents are not externally defined.
+It is up to the caller to free the memory (via pcre_free) when it is no
+longer required.
+Although  the compiled code of a PCRE regex is relocatable, that is, it
+does not depend on memory location, the complete pcre data block is not
+fully  relocatable, because it may contain a copy of the tableptr argu-
+ment, which is an address (see below).
+The options argument contains various bit settings that affect the com-
+pilation.  It  should be zero if no options are required. The available
+options are described below. Some of them, in  particular,  those  that
+are  compatible  with  Perl,  can also be set and unset from within the
+pattern (see the detailed description in the pcrepattern documentation).
+For these options, the contents of the options argument specify their
+initial settings at the start of compilation and execution.
+The  PCRE_ANCHORED  and PCRE_NEWLINE_xxx options can be set at the time
+of matching as well as at compile time.
+If errptr is NULL, pcre_compile() returns NULL immediately.  Otherwise,
+if  compilation  of  a  pattern fails, pcre_compile() returns NULL, and
+sets the variable pointed to by errptr to point to a textual error mes-
+sage. This is a static string that is part of the library. You must not
+try to free it. The offset from the start of the pattern to the charac-
+ter where the error was discovered is placed in the variable pointed to
+by erroffset, which must not be NULL. If it is, an immediate  error  is
+given.
+If  pcre_compile2()  is  used instead of pcre_compile(), and the error-
+codeptr argument is not NULL, a non-zero error code number is  returned
+via  this argument in the event of an error. This is in addition to the
+textual error message. Error codes and messages are listed below.
+If the final argument, tableptr, is NULL, PCRE uses a  default  set  of
+character  tables  that  are  built  when  PCRE  is compiled, using the
+default C locale. Otherwise, tableptr must be an address  that  is  the
+result  of  a  call to pcre_maketables(). This value is stored with the
+compiled pattern, and used again by pcre_exec(), unless  another  table
+pointer is passed to it. For more discussion, see the section on locale
+support below.
+This code fragment shows a typical straightforward  call  to  pcre_com-
+pile():
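+pcre *re;
+const char *error;
+int erroffset;
+re = pcre_compile(
+"^A.*Z",          /* the pattern */
+0,                /* default options */
+&error,           /* for error message */
+&erroffset,       /* for error offset */
+NULL);            /* use default character tables */
+if (re == NULL)   /* compilation failed: report why and where */
+printf("PCRE compilation failed at offset %d: %s\n", erroffset, error);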
+The  following  names  for option bits are defined in the pcre.h header
+file:
+PCRE_ANCHORED
+If this bit is set, the pattern is forced to be "anchored", that is, it
+is  constrained to match only at the first matching point in the string
+that is being searched (the "subject string"). This effect can also  be
+achieved  by appropriate constructs in the pattern itself, which is the
+only way to do it in Perl.
+PCRE_AUTO_CALLOUT
+If this bit is set, pcre_compile() automatically inserts callout items,
+all  with  number  255, before each pattern item. For discussion of the
+callout facility, see the pcrecallout documentation.
+PCRE_BSR_ANYCRLF
+PCRE_BSR_UNICODE
+These options (which are mutually exclusive) control what the \R escape
+sequence  matches.  The choice is either to match only CR, LF, or CRLF,
+or to match any Unicode newline sequence. The default is specified when
+PCRE is built. It can be overridden from within the pattern, or by set-
+ting an option when a compiled pattern is matched.
+PCRE_CASELESS
+If this bit is set, letters in the pattern match both upper  and  lower
+case  letters.  It  is  equivalent  to  Perl's /i option, and it can be
+changed within a pattern by a (?i) option setting. In UTF-8 mode,  PCRE
+always  understands the concept of case for characters whose values are
+less than 128, so caseless matching is always possible. For  characters
+with  higher  values,  the concept of case is supported if PCRE is com-
+piled with Unicode property support, but not otherwise. If you want  to
+use  caseless  matching  for  characters 128 and above, you must ensure
+that PCRE is compiled with Unicode property support  as  well  as  with
+UTF-8 support.
+PCRE_DOLLAR_ENDONLY
+If  this bit is set, a dollar metacharacter in the pattern matches only
+at the end of the subject string. Without this option,  a  dollar  also
+matches  immediately before a newline at the end of the string (but not
+before any other newlines). The PCRE_DOLLAR_ENDONLY option  is  ignored
+if  PCRE_MULTILINE  is  set.   There is no equivalent to this option in
+Perl, and no way to set it within a pattern.
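The rules for where a dollar can match are summarized in the small predicate below. It takes PCRE_MULTILINE (described below) into account as well as PCRE_DOLLAR_ENDONLY. The function name is hypothetical, and the sketch assumes that LF is the only newline.

```c
#include <assert.h>
#include <stddef.h>

/* Can $ match at offset pos (0 <= pos <= len) of subject s? Hypothetical
   sketch assuming LF is the only newline. multiline and endonly stand for
   PCRE_MULTILINE and PCRE_DOLLAR_ENDONLY. */
int dollar_matches_at(const char *s, size_t len, size_t pos,
                      int multiline, int endonly)
{
    assert(pos <= len);
    if (pos == len) return 1;                 /* the very end always matches */
    if (multiline) return s[pos] == '\n';     /* before any newline; endonly ignored */
    if (endonly) return 0;
    return pos == len - 1 && s[pos] == '\n';  /* only before a final newline */
}
```

For the subject "ab\n", a dollar matches at offset 2 by default but not when endonly is set. For "a\nb", it matches before the internal newline only in multiline mode.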
+PCRE_DOTALL
+If this bit is set, a dot metacharacter in the pattern matches all
+characters, including those that indicate newline. Without it, a dot does
+not match when the current position is at a  newline.  This  option  is
+equivalent  to Perl's /s option, and it can be changed within a pattern
+by a (?s) option setting. A negative class such as [^a] always  matches
+newline characters, independent of the setting of this option.
+PCRE_DUPNAMES
+If  this  bit is set, names used to identify capturing subpatterns need
+not be unique. This can be helpful for certain types of pattern when it
+is  known  that  only  one instance of the named subpattern can ever be
+matched. There are more details of named subpatterns  below;  see  also
+the pcrepattern documentation.
+PCRE_EXTENDED
+If  this  bit  is  set,  whitespace  data characters in the pattern are
+totally ignored except when escaped or inside a character class. White-
+space does not include the VT character (code 11). In addition, charac-
+ters between an unescaped # outside a character class and the next new-
+line,  inclusive,  are  also  ignored.  This is equivalent to Perl's /x
+option, and it can be changed within a pattern by a  (?x)  option  set-
+ting.
+This  option  makes  it possible to include comments inside complicated
+patterns.  Note, however, that this applies only  to  data  characters.
+Whitespace   characters  may  never  appear  within  special  character
+sequences in a pattern, for  example  within  the  sequence  (?(  which
+introduces a conditional subpattern.
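The effect of PCRE_EXTENDED on data characters can be approximated by a preprocessing pass that drops unescaped whitespace and #-comments outside character classes. This is a hypothetical toy: PCRE does not work by rewriting the pattern, and the sketch ignores \Q...\E quoting and assumes LF newlines. It does, however, follow the rules above: VT is not whitespace, escaped characters are kept, and class contents are untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy /x preprocessor (hypothetical helper, not part of PCRE). Copies p to
   out, dropping unescaped whitespace and #-comments outside character
   classes. Assumes LF newlines; ignores \Q...\E. out must have room for
   strlen(p) + 1 bytes. */
void strip_extended(const char *p, char *out)
{
    int in_class = 0;
    assert(p != NULL && out != NULL);
    while (*p) {
        char c = *p;
        if (c == '\\' && p[1]) {            /* escaped: keep both bytes */
            *out++ = *p++;
            *out++ = *p++;
            continue;
        }
        if (in_class) {                     /* class contents are kept as-is */
            if (c == ']') in_class = 0;
            *out++ = *p++;
            continue;
        }
        if (c == '[') {
            in_class = 1;
            *out++ = *p++;
            if (*p == '^') *out++ = *p++;
            if (*p == ']') *out++ = *p++;   /* a leading ] is a literal */
            continue;
        }
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f') {
            p++;                            /* VT (code 11) is not skipped */
            continue;
        }
        if (c == '#') {                     /* comment up to and including LF */
            while (*p && *p != '\n') p++;
            if (*p) p++;
            continue;
        }
        *out++ = *p++;
    }
    *out = '\0';
}
```

For example, the pattern "a b  # comment\n c" reduces to "abc", while "[ #] x" becomes "[ #]x" because the space and # inside the class are data.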
+PCRE_EXTRA
+This  option  was invented in order to turn on additional functionality
+of PCRE that is incompatible with Perl, but it  is  currently  of  very
+little  use. When set, any backslash in a pattern that is followed by a
+letter that has no special meaning  causes  an  error,  thus  reserving
+these  combinations  for  future  expansion.  By default, as in Perl, a
+backslash followed by a letter with no special meaning is treated as  a
+literal.  (Perl can, however, be persuaded to give a warning for this.)
+There are at present no other features controlled by  this  option.  It
+can also be set by a (?X) option setting within a pattern.
+PCRE_FIRSTLINE
+If  this  option  is  set,  an  unanchored pattern is required to match
+before or at the first  newline  in  the  subject  string,  though  the
+matched text may continue over the newline.
+PCRE_JAVASCRIPT_COMPAT
+If this option is set, PCRE's behaviour is changed in some ways so that
+it is compatible with JavaScript rather than Perl. The changes  are  as
+follows:
+(1)  A  lone  closing square bracket in a pattern causes a compile-time
+error, because this is illegal in JavaScript (by default it is  treated
+as a data character). Thus, the pattern AB]CD becomes illegal when this
+option is set.
+(2) At run time, a back reference to an unset subpattern group  matches
+an  empty  string (by default this causes the current matching alterna-
+tive to fail). A pattern such as (\1)(a) succeeds when this  option  is
+set  (assuming  it can find an "a" in the subject), whereas it fails by
+default, for Perl compatibility.
+PCRE_MULTILINE
+By default, PCRE treats the subject string as consisting  of  a  single
+line  of characters (even if it actually contains newlines). The "start
+of line" metacharacter (^) matches only at the  start  of  the  string,
+while  the  "end  of line" metacharacter ($) matches only at the end of
+the string, or before a terminating newline (unless PCRE_DOLLAR_ENDONLY
+is set). This is the same as Perl.
+When PCRE_MULTILINE is set, the "start of line" and "end of line"
+constructs match immediately following or immediately  before  internal
+newlines  in  the  subject string, respectively, as well as at the very
+start and end. This is equivalent to Perl's /m option, and  it  can  be
+changed within a pattern by a (?m) option setting. If there are no new-
+lines in a subject string, or no occurrences of ^ or $  in  a  pattern,
+setting PCRE_MULTILINE has no effect.
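The circumflex half of this option can be sketched as a predicate. The function name is hypothetical and LF newlines are assumed. Because the text speaks of internal newlines, the sketch assumes that a circumflex does not match after a newline that ends the subject.

```c
#include <assert.h>
#include <stddef.h>

/* Can ^ match at offset pos (0 <= pos <= len)? Hypothetical sketch
   assuming LF newlines; multiline stands for PCRE_MULTILINE. */
int circumflex_matches_at(const char *s, size_t len, size_t pos, int multiline)
{
    assert(pos <= len);
    if (pos == 0) return 1;                  /* start of the subject */
    if (!multiline || pos == len) return 0;  /* assumption: not after a final newline */
    return s[pos - 1] == '\n';               /* just after an internal newline */
}
```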
+PCRE_NEWLINE_CR
+PCRE_NEWLINE_LF
+PCRE_NEWLINE_CRLF
+PCRE_NEWLINE_ANYCRLF
+PCRE_NEWLINE_ANY
+These  options  override the default newline definition that was chosen
+when PCRE was built. Setting the first or the second specifies  that  a
+newline  is  indicated  by a single character (CR or LF, respectively).
+Setting PCRE_NEWLINE_CRLF specifies that a newline is indicated by  the
+two-character  CRLF  sequence.  Setting  PCRE_NEWLINE_ANYCRLF specifies
+that any of the three preceding sequences should be recognized. Setting
+PCRE_NEWLINE_ANY  specifies that any Unicode newline sequence should be
+recognized. The Unicode newline sequences are the three just mentioned,
+plus  the  single  characters  VT (vertical tab, U+000B), FF (formfeed,
+U+000C), NEL (next line, U+0085), LS (line separator, U+2028),  and  PS
+(paragraph  separator,  U+2029).  The  last  two are recognized only in
+UTF-8 mode.
+The newline setting in the  options  word  uses  three  bits  that  are
+treated as a number, giving eight possibilities. Currently only six are
+used (default plus the five values above). This means that if  you  set
+more  than one newline option, the combination may or may not be sensi-
+ble. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to
+PCRE_NEWLINE_CRLF,  but other combinations may yield unused numbers and
+cause an error.
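The three-bit field can be made concrete with bit values. The constants below are an assumption: they are believed to mirror the PCRE 7.x pcre.h definitions, with the field starting at bit 20, but they are given private MY_ names and should be checked against your own header. With these values, CR | LF produces the CRLF value, and LF | ANY produces the unused number 6.

```c
#include <assert.h>

/* Assumed to mirror the PCRE 7.x pcre.h values; check your header.
   The newline setting is the 3-bit field starting at bit 20. */
#define MY_NEWLINE_CR       0x00100000
#define MY_NEWLINE_LF       0x00200000
#define MY_NEWLINE_CRLF     0x00300000
#define MY_NEWLINE_ANY      0x00400000
#define MY_NEWLINE_ANYCRLF  0x00500000
#define MY_NEWLINE_MASK     0x00700000

/* Extracts the 3-bit newline number: 0 means "use the build default",
   1..5 are the five options above, 6 and 7 are unused. */
int newline_field(unsigned long options)
{
    return (int)((options & MY_NEWLINE_MASK) >> 20);
}

/* Nonzero if the combination maps to one of the six used numbers. */
int newline_field_is_valid(unsigned long options)
{
    return newline_field(options) <= 5;
}
```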
+The only time that a line break is specially recognized when  compiling
+a  pattern  is  if  PCRE_EXTENDED  is set, and an unescaped # outside a
+character class is encountered. This indicates  a  comment  that  lasts
+until  after the next line break sequence. In other circumstances, line
+break  sequences  are  treated  as  literal  data,   except   that   in
+PCRE_EXTENDED mode, both CR and LF are treated as whitespace characters
+and are therefore ignored.
+The newline option that is set at compile time becomes the default that
+is  used for pcre_exec() and pcre_dfa_exec(), but it can be overridden.
+PCRE_NO_AUTO_CAPTURE
+If this option is set, it disables the use of numbered capturing paren-
+theses  in the pattern. Any opening parenthesis that is not followed by
+? behaves as if it were followed by ?: but named parentheses can  still
+be  used  for  capturing  (and  they acquire numbers in the usual way).
+There is no equivalent of this option in Perl.
+PCRE_UNGREEDY
+This option inverts the "greediness" of the quantifiers  so  that  they
+are  not greedy by default, but become greedy if followed by "?". It is
+not compatible with Perl. It can also be set by a (?U)  option  setting
+within the pattern.
+PCRE_UTF8
+This  option  causes PCRE to regard both the pattern and the subject as
+strings of UTF-8 characters instead of single-byte  character  strings.
+However,  it is available only when PCRE is built to include UTF-8 sup-
+port. If not, the use of this option provokes an error. Details of  how
+this  option  changes the behaviour of PCRE are given in the section on
+UTF-8 support in the main pcre page.
+PCRE_NO_UTF8_CHECK
+When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
+automatically  checked.  There  is  a  discussion about the validity of
+UTF-8 strings in the main pcre page. If an invalid  UTF-8  sequence  of
+bytes  is  found,  pcre_compile() returns an error. If you already know
+that your pattern is valid, and you want to skip this check for perfor-
+mance  reasons,  you  can set the PCRE_NO_UTF8_CHECK option. When it is
+set, the effect of passing an invalid UTF-8  string  as  a  pattern  is
+undefined.  It  may  cause your program to crash. Note that this option
+can also be passed to pcre_exec() and pcre_dfa_exec(), to suppress  the
+UTF-8 validity checking of subject strings.
+COMPILATION ERROR CODES
+The following table lists the error codes that may be returned by
+pcre_compile2(), along with the error messages that may be returned  by
+both  compiling functions. As PCRE has developed, some error codes have
+fallen out of use. To avoid confusion, they have not been re-used.
+0  no error
+1  \ at end of pattern
+2  \c at end of pattern
+3  unrecognized character follows \
+4  numbers out of order in {} quantifier
+5  number too big in {} quantifier
+6  missing terminating ] for character class
+7  invalid escape sequence in character class
+8  range out of order in character class
+9  nothing to repeat
+10  [this code is not in use]
+11  internal error: unexpected repeat
+12  unrecognized character after (? or (?-
+13  POSIX named classes are supported only within a class
+14  missing )
+15  reference to non-existent subpattern
+16  erroffset passed as NULL
+17  unknown option bit(s) set
+18  missing ) after comment
+19  [this code is not in use]
+20  regular expression is too large
+21  failed to get memory
+22  unmatched parentheses
+23  internal error: code overflow
+24  unrecognized character after (?<
+25  lookbehind assertion is not fixed length
+26  malformed number or name after (?(
+27  conditional group contains more than two branches
+28  assertion expected after (?(
+29  (?R or (?[+-]digits must be followed by )
+30  unknown POSIX class name
+31  POSIX collating elements are not supported
+32  this version of PCRE is not compiled with PCRE_UTF8 support
+33  [this code is not in use]
+34  character value in \x{...} sequence is too large
+35  invalid condition (?(0)
+36  \C not allowed in lookbehind assertion
+37  PCRE does not support \L, \l, \N, \U, or \u
+38  number after (?C is > 255
+39  closing ) for (?C expected
+40  recursive call could loop indefinitely
+41  unrecognized character after (?P
+42  syntax error in subpattern name (missing terminator)
+43  two named subpatterns have the same name
+44  invalid UTF-8 string
+45  support for \P, \p, and \X has not been compiled
+46  malformed \P or \p sequence
+47  unknown property name after \P or \p
+48  subpattern name is too long (maximum 32 characters)
+49  too many named subpatterns (maximum 10000)
+50  [this code is not in use]
+51  octal value is greater than \377 (not in UTF-8 mode)
+52  internal error: overran compiling workspace
+53  internal error: previously-checked referenced subpattern not found
+54  DEFINE group contains more than one branch
+55  repeating a DEFINE group is not allowed
+56  inconsistent NEWLINE options
+57  \g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number
+58  a numbered reference must not be zero
+59  (*VERB) with an argument is not supported
+60  (*VERB) not recognized
+61  number is too big
+62  subpattern name expected
+63  digit expected after (?+
+64  ] is an invalid data character in JavaScript compatibility mode
+The  numbers  32  and 10000 in errors 48 and 49 are defaults; different
+values may be used if the limits were changed when PCRE was built.
+STUDYING A PATTERN
+pcre_extra *pcre_study(const pcre *code, int options,
+const char **errptr);
+If a compiled pattern is going to be used several times,  it  is  worth
+spending more time analyzing it in order to speed up the time taken for
+matching. The function pcre_study() takes a pointer to a compiled  pat-
+tern as its first argument. If studying the pattern produces additional
+information that will help speed up matching,  pcre_study()  returns  a
+pointer  to a pcre_extra block, in which the study_data field points to
+the results of the study.
+The  returned  value  from  pcre_study()  can  be  passed  directly  to
+pcre_exec().  However,  a  pcre_extra  block also contains other fields
+that can be set by the caller before the block  is  passed;  these  are
+described below in the section on matching a pattern.
+If studying the pattern does not produce any additional information,
+pcre_study() returns NULL. In that circumstance, if the calling program
+wants  to  pass  any of the other fields to pcre_exec(), it must set up
+its own pcre_extra block.
+The second argument of pcre_study() contains option bits.  At  present,
+no options are defined, and this argument should always be zero.
+The  third argument for pcre_study() is a pointer for an error message.
+If studying succeeds (even if no data is  returned),  the  variable  it
+points  to  is  set  to NULL. Otherwise it is set to point to a textual
+error message. This is a static string that is part of the library. You
+must  not  try  to  free it. You should test the error pointer for NULL
+after calling pcre_study(), to be sure that it has run successfully.
+This is a typical call to pcre_study():
+  const char *error;  /* receives NULL or a static error message */
+  pcre_extra *pe;
+  pe = pcre_study(
+    re,             /* result of pcre_compile() */
+    0,              /* no options exist */
+    &error);        /* set to NULL or points to a message */
+At present, studying a pattern is useful only for non-anchored patterns
+that  do not have a single fixed starting character. A bitmap of possi-
+ble starting bytes is created.
+LOCALE SUPPORT
+PCRE handles caseless matching, and determines whether  characters  are
+letters,  digits, or whatever, by reference to a set of tables, indexed
+by character value. When running in UTF-8 mode, this  applies  only  to
+characters  with  codes  less than 128. Higher-valued codes never match
+escapes such as \w or \d, but can be tested with \p if  PCRE  is  built
+with  Unicode  character property support. The use of locales with Uni-
+code is discouraged. If you are handling characters with codes  greater
+than  128, you should either use UTF-8 and Unicode, or use locales, but
+not try to mix the two.
+PCRE contains an internal set of tables that are used  when  the  final
+argument  of  pcre_compile()  is  NULL.  These  are sufficient for many
+applications.  Normally, the internal tables recognize only ASCII char-
+acters. However, when PCRE is built, it is possible to cause the inter-
+nal tables to be rebuilt in the default "C" locale of the local system,
+which may cause them to be different.
+The  internal tables can always be overridden by tables supplied by the
+application that calls PCRE. These may be created in a different locale
+from  the  default.  As more and more applications change to using Uni-
+code, the need for this locale support is expected to die away.
+External tables are built by calling  the  pcre_maketables()  function,
+which  has no arguments, in the relevant locale. The result can then be
+passed to pcre_compile() or pcre_exec()  as  often  as  necessary.  For
+example,  to  build  and use tables that are appropriate for the French
+locale (where accented characters with  values  greater  than  128  are
+treated as letters), the following code could be used:
+  setlocale(LC_CTYPE, "fr_FR");
+  tables = pcre_maketables();
+  re = pcre_compile(..., tables);
+The  locale  name "fr_FR" is used on Linux and other Unix-like systems;
+if you are using Windows, the name for the French locale is "french".
+When pcre_maketables() runs, the tables are built  in  memory  that  is
+obtained  via  pcre_malloc. It is the caller's responsibility to ensure
+that the memory containing the tables remains available for as long  as
+it is needed.
+The pointer that is passed to pcre_compile() is saved with the compiled
+pattern, and the same tables are used via this pointer by  pcre_study()
+and normally also by pcre_exec(). Thus, by default, for any single pat-
+tern, compilation, studying and matching all happen in the same locale,
+but different patterns can be compiled in different locales.
+It  is  possible to pass a table pointer or NULL (indicating the use of
+the internal tables) to pcre_exec(). Although  not  intended  for  this
+purpose,  this facility could be used to match a pattern in a different
+locale from the one in which it was compiled. Passing table pointers at
+run time is discussed below in the section on matching a pattern.
+INFORMATION ABOUT A PATTERN
+int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
+int what, void *where);
+The pcre_fullinfo() function returns information about a compiled pattern.
+It replaces the obsolete pcre_info() function, which is nevertheless
+retained for backwards compatibility (and is documented below).
+The  first  argument  for  pcre_fullinfo() is a pointer to the compiled
+pattern. The second argument is the result of pcre_study(), or NULL  if
+the  pattern  was not studied. The third argument specifies which piece
+of information is required, and the fourth argument is a pointer  to  a
+variable  to  receive  the  data. The yield of the function is zero for
+success, or one of the following negative numbers:
+  PCRE_ERROR_NULL       the argument code was NULL
+                        the argument where was NULL
+  PCRE_ERROR_BADMAGIC   the "magic number" was not found
+  PCRE_ERROR_BADOPTION  the value of what was invalid
+The "magic number" is placed at the start of each compiled  pattern  as
+a simple check against passing an arbitrary memory pointer. Here is a
+typical call of pcre_fullinfo(), to obtain the length of  the  compiled
+pattern:
+  int rc;
+  size_t length;
+  rc = pcre_fullinfo(
+    re,               /* result of pcre_compile() */
+    pe,               /* result of pcre_study(), or NULL */
+    PCRE_INFO_SIZE,   /* what is required */
+    &length);         /* where to put the data */
+The  possible  values for the third argument are defined in pcre.h, and
+are as follows:
+PCRE_INFO_BACKREFMAX
+Return the number of the highest back reference  in  the  pattern.  The
+fourth  argument  should  point to an int variable. Zero is returned if
+there are no back references.
+PCRE_INFO_CAPTURECOUNT
+Return the number of capturing subpatterns in the pattern.  The  fourth
+argument should point to an int variable.
+PCRE_INFO_DEFAULT_TABLES
+Return  a pointer to the internal default character tables within PCRE.
+The fourth argument should point to an unsigned char *  variable.  This
+information call is provided for internal use by the pcre_study() func-
+tion. External callers can cause PCRE to use  its  internal  tables  by
+passing a NULL table pointer.
+PCRE_INFO_FIRSTBYTE
+Return  information  about  the first byte of any matched string, for a
+non-anchored pattern. The fourth argument should point to an int  vari-
+able.  (This option used to be called PCRE_INFO_FIRSTCHAR; the old name
+is still recognized for backwards compatibility.)
+If there is a fixed first byte, for example, from  a  pattern  such  as
+(cat|cow|coyote), its value is returned. Otherwise, if either
+(a)  the pattern was compiled with the PCRE_MULTILINE option, and every
+branch starts with "^", or
+(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not
+set (if it were set, the pattern would be anchored),
+-1  is  returned, indicating that the pattern matches only at the start
+of a subject string or after any newline within the  string.  Otherwise
+-2 is returned. For anchored patterns, -2 is returned.
+PCRE_INFO_FIRSTTABLE
+If  the pattern was studied, and this resulted in the construction of a
+256-bit table indicating a fixed set of bytes for the first byte in any
+matching  string, a pointer to the table is returned. Otherwise NULL is
+returned. The fourth argument should point to an unsigned char *  vari-
+able.
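+As an illustration only, the sketch below shows how a caller might use
+such a table to skip subject positions that cannot begin a match. For a
+pattern such as (cat|dog), the set would contain "c" and "d". The sketch
+assumes that bit (c % 8) of byte (c / 8) corresponds to byte value c;
+that layout and the helper names are assumptions, not documented API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: test whether byte value c is in a 256-bit
   (32-byte) start table, assuming bit c%8 of byte c/8. */
static int start_bit_is_set(const unsigned char *table, unsigned char c)
{
    return (table[c / 8] & (1u << (c % 8))) != 0;
}

/* Hypothetical helper: add byte value c to the table. */
static void start_bit_set(unsigned char *table, unsigned char c)
{
    table[c / 8] |= (unsigned char)(1u << (c % 8));
}

/* Return the first offset at or after start whose byte could begin a
   match, or -1 if there is none. A matcher would then attempt a full
   match only at that offset. */
static long next_candidate(const unsigned char *table,
                           const unsigned char *subject, size_t length,
                           size_t start)
{
    assert(table != NULL && subject != NULL);
    for (size_t i = start; i < length; i++)
        if (start_bit_is_set(table, subject[i]))
            return (long)i;
    return -1;
}
```

+The point of the table is that rejecting a position costs one bit test,
+instead of a full match attempt.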
+PCRE_INFO_HASCRORLF
+Return  1  if  the  pattern  contains any explicit matches for CR or LF
+characters, otherwise 0. The fourth argument should  point  to  an  int
+variable.  An explicit match is either a literal CR or LF character, or
+\r or \n.
+PCRE_INFO_JCHANGED
+Return 1 if the (?J) or (?-J) option setting is used  in  the  pattern,
+otherwise  0. The fourth argument should point to an int variable. (?J)
+and (?-J) set and unset the local PCRE_DUPNAMES option, respectively.
+PCRE_INFO_LASTLITERAL
+Return the value of the rightmost literal byte that must exist  in  any
+matched  string,  other  than  at  its  start,  if such a byte has been
+recorded. The fourth argument should point to an int variable. If there
+is  no such byte, -1 is returned. For anchored patterns, a last literal
+byte is recorded only if it follows something of variable  length.  For
+example, for the pattern /^a\d+z\d+/ the returned value is "z", but for
+/^a\dz\d/ the returned value is -1.
+PCRE_INFO_NAMECOUNT
+PCRE_INFO_NAMEENTRYSIZE
+PCRE_INFO_NAMETABLE
+PCRE supports the use of named as well as numbered capturing  parenthe-
+ses.  The names are just an additional way of identifying the parenthe-
+ses, which still acquire numbers. Several convenience functions such as
+pcre_get_named_substring()  are  provided  for extracting captured sub-
+strings by name. It is also possible to extract the data  directly,  by
+first  converting  the  name to a number in order to access the correct
+pointers in the output vector (described with pcre_exec() below). To do
+the  conversion,  you  need  to  use  the  name-to-number map, which is
+described by these three values.
+The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT
+gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size
+of each entry; both of these  return  an  int  value.  The  entry  size
+depends  on the length of the longest name. PCRE_INFO_NAMETABLE returns
+a pointer to the first entry of the table  (a  pointer  to  char).  The
+first two bytes of each entry are the number of the capturing parenthe-
+sis, most significant byte first. The rest of the entry is  the  corre-
+sponding  name,  zero  terminated. The names are in alphabetical order.
+When PCRE_DUPNAMES is set, duplicate names are in order of their paren-
+theses  numbers.  For  example,  consider the following pattern (assume
+PCRE_EXTENDED is  set,  so  white  space  -  including  newlines  -  is
+ignored):
+  (?<date> (?<year>(\d\d)?\d\d) -
+  (?<month>\d\d) - (?<day>\d\d) )
+There are four named subpatterns, so the table has four entries, and
+each entry in the table is eight bytes long. The table is as follows,
+with non-printing bytes shown in hexadecimal, and undefined bytes shown
+as ??:
+  00 01 d  a  t  e  00 ??
+  00 05 d  a  y  00 ?? ??
+  00 04 m  o  n  t  h  00
+  00 02 y  e  a  r  00 ??
+When writing code to extract data  from  named  subpatterns  using  the
+name-to-number  map,  remember that the length of the entries is likely
+to be different for each compiled pattern.
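+The sketch below decodes a name-to-number map laid out as described
+above: each entry is entrysize bytes, the first two bytes hold the group
+number (most significant byte first), and the rest is the zero-
+terminated name. In real code the pointer, count, and entry size would
+come from pcre_fullinfo() with PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT,
+and PCRE_INFO_NAMEENTRYSIZE. Here the table is hard-coded from the date
+example (undefined bytes set to zero), and the function name is
+hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: look up a group name in a name-to-number map.
   Returns the group number, or -1 if the name is not present. The
   names are sorted, so a binary search would also work; a linear scan
   keeps the sketch short. */
static int lookup_group_number(const unsigned char *table, int count,
                               int entrysize, const char *name)
{
    assert(table != NULL && name != NULL && entrysize > 2);
    for (int i = 0; i < count; i++) {
        const unsigned char *entry = table + (size_t)i * entrysize;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];  /* MSB first */
    }
    return -1;
}

/* The map for the date pattern above. */
static const unsigned char example_table[] = {
    0x00, 0x01, 'd', 'a', 't', 'e', 0x00, 0x00,
    0x00, 0x05, 'd', 'a', 'y', 0x00, 0x00, 0x00,
    0x00, 0x04, 'm', 'o', 'n', 't', 'h', 0x00,
    0x00, 0x02, 'y', 'e', 'a', 'r', 0x00, 0x00,
};
```

+Once the number n is known, the captured substring's offsets are in
+ovector[2*n] and ovector[2*n+1].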
+PCRE_INFO_OKPARTIAL
+Return 1 if the pattern can be used for partial matching, otherwise  0.
+The  fourth  argument  should point to an int variable. The pcrepartial
+documentation lists the restrictions that apply to patterns  when  par-
+tial matching is used.
+PCRE_INFO_OPTIONS
+Return  a  copy of the options with which the pattern was compiled. The
+fourth argument should point to an unsigned long  int  variable.  These
+option bits are those specified in the call to pcre_compile(), modified
+by any top-level option settings at the start of the pattern itself. In
+other  words,  they are the options that will be in force when matching
+starts. For example, if the pattern /(?im)abc(?-i)d/ is  compiled  with
+the  PCRE_EXTENDED option, the result is PCRE_CASELESS, PCRE_MULTILINE,
+and PCRE_EXTENDED.
+A pattern is automatically anchored by PCRE if  all  of  its  top-level
+alternatives begin with one of the following:
+  ^     unless PCRE_MULTILINE is set
+  \A    always
+  \G    always
+  .*    if PCRE_DOTALL is set and there are no back
+        references to the subpattern in which .* appears
+For such patterns, the PCRE_ANCHORED bit is set in the options returned
+by pcre_fullinfo().
+PCRE_INFO_SIZE
+Return the size of the compiled pattern, that is, the  value  that  was
+passed as the argument to pcre_malloc() when PCRE was getting memory in
+which to place the compiled data. The fourth argument should point to a
+size_t variable.
+PCRE_INFO_STUDYSIZE
+Return the size of the data block pointed to by the study_data field in
+a pcre_extra block. That is,  it  is  the  value  that  was  passed  to
+pcre_malloc() when PCRE was getting memory into which to place the data
+created by pcre_study(). The fourth argument should point to  a  size_t
+variable.
+OBSOLETE INFO FUNCTION
+int pcre_info(const pcre *code, int *optptr, int *firstcharptr);
+The  pcre_info()  function is now obsolete because its interface is too
+restrictive to return all the available data about a compiled  pattern.
+New   programs   should  use  pcre_fullinfo()  instead.  The  yield  of
+pcre_info() is the number of capturing subpatterns, or one of the  fol-
+lowing negative numbers:
+PCRE_ERROR_NULL       the argument code was NULL
+PCRE_ERROR_BADMAGIC   the "magic number" was not found
+If  the  optptr  argument is not NULL, a copy of the options with which
+the pattern was compiled is placed in the integer  it  points  to  (see
+PCRE_INFO_OPTIONS above).
+If  the  pattern  is  not anchored and the firstcharptr argument is not
+NULL, it is used to pass back information about the first character  of
+any matched string (see PCRE_INFO_FIRSTBYTE above).
+REFERENCE COUNTS
+int pcre_refcount(pcre *code, int adjust);
+The  pcre_refcount()  function is used to maintain a reference count in
+the data block that contains a compiled pattern. It is provided for the
+benefit  of  applications  that  operate  in an object-oriented manner,
+where different parts of the application may be using the same compiled
+pattern, but you want to free the block when they are all done.
+When a pattern is compiled, the reference count field is initialized to
+zero.  It is changed only by calling this function, whose action is  to
+add  the  adjust  value  (which may be positive or negative) to it. The
+yield of the function is the new value. However, the value of the count
+is  constrained to lie between 0 and 65535, inclusive. If the new value
+is outside these limits, it is forced to the appropriate limit value.
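+The arithmetic described above can be modelled as follows. This is a
+sketch of the documented behaviour only, written as a standalone
+function (model_refcount is a hypothetical name). It does not touch a
+real compiled pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the documented pcre_refcount() arithmetic: add adjust to
   the current count and clamp the result to the range 0..65535. */
static int model_refcount(int *count, int adjust)
{
    assert(count != NULL);
    long long next = (long long)*count + adjust;
    if (next < 0)
        next = 0;
    if (next > 65535)
        next = 65535;
    *count = (int)next;
    return *count;  /* the yield is the new value */
}
```

+An application might increment the count each time a component starts
+sharing the pattern, and free the pattern once a decrement yields zero.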
+Except when it is zero, the reference count is not correctly  preserved
+if  a  pattern  is  compiled on one host and then transferred to a host
+whose byte-order is different. (This seems a highly unlikely scenario.)
+MATCHING A PATTERN: THE TRADITIONAL FUNCTION
+int pcre_exec(const pcre *code, const pcre_extra *extra,
+const char *subject, int length, int startoffset,
+int options, int *ovector, int ovecsize);
+The  function pcre_exec() is called to match a subject string against a
+compiled pattern, which is passed in the code argument. If the  pattern
+has been studied, the result of the study should be passed in the extra
+argument. This function is the main matching facility of  the  library,
+and it operates in a Perl-like manner. For specialist use there is also
+an alternative matching function, which is described below in the  sec-
+tion about the pcre_dfa_exec() function.
+In  most applications, the pattern will have been compiled (and option-
+ally studied) in the same process that calls pcre_exec().  However,  it
+is possible to save compiled patterns and study data, and then use them
+later in different processes, possibly even on different hosts.  For  a
+discussion about this, see the pcreprecompile documentation.
+Here is an example of a simple call to pcre_exec():
+  int rc;
+  int ovector[30];
+  rc = pcre_exec(
+    re,             /* result of pcre_compile() */
+    NULL,           /* we didn't study the pattern */
+    "some string",  /* the subject string */
+    11,             /* the length of the subject string */
+    0,              /* start at offset 0 in the subject */
+    0,              /* default options */
+    ovector,        /* vector of integers for substring information */
+    30);            /* number of elements (NOT size in bytes) */
+Extra data for pcre_exec()
+If  the  extra argument is not NULL, it must point to a pcre_extra data
+block. The pcre_study() function returns such a block (when it  doesn't
+return  NULL), but you can also create one for yourself, and pass addi-
+tional information in it. The pcre_extra block contains  the  following
+fields (not necessarily in this order):
+  unsigned long int flags;
+  void *study_data;
+  unsigned long int match_limit;
+  unsigned long int match_limit_recursion;
+  void *callout_data;
+  const unsigned char *tables;
+The  flags  field  is a bitmap that specifies which of the other fields
+are set. The flag bits are:
+PCRE_EXTRA_STUDY_DATA
+PCRE_EXTRA_MATCH_LIMIT
+PCRE_EXTRA_MATCH_LIMIT_RECURSION
+PCRE_EXTRA_CALLOUT_DATA
+PCRE_EXTRA_TABLES
+Other flag bits should be set to zero. The study_data field is  set  in
+the  pcre_extra  block  that is returned by pcre_study(), together with
+the appropriate flag bit. You should not set this yourself, but you may
+add  to  the  block by setting the other fields and their corresponding
+flag bits.
+The match_limit field provides a means of preventing PCRE from using up
+a  vast amount of resources when running patterns that are not going to
+match, but which have a very large number  of  possibilities  in  their
+search  trees.  The  classic  example  is  the  use of nested unlimited
+repeats.
+Internally, PCRE uses a function called match() which it calls  repeat-
+edly  (sometimes  recursively). The limit set by match_limit is imposed
+on the number of times this function is called during  a  match,  which
+has  the  effect  of  limiting the amount of backtracking that can take
+place. For patterns that are not anchored, the count restarts from zero
+for each position in the subject string.
+The  default  value  for  the  limit can be set when PCRE is built; the
+default default is 10 million, which handles all but the  most  extreme
+cases. You can override the default by supplying pcre_exec() with a
+pcre_extra    block    in    which    match_limit    is    set,     and
+PCRE_EXTRA_MATCH_LIMIT  is  set  in  the  flags  field. If the limit is
+exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.
+The match_limit_recursion field is similar to match_limit, but  instead
+of limiting the total number of times that match() is called, it limits
+the depth of recursion. The recursion depth is a  smaller  number  than
+the  total number of calls, because not all calls to match() are recur-
+sive.  This limit is of use only if it is set smaller than match_limit.
+Limiting  the  recursion  depth  limits the amount of stack that can be
+used, or, when PCRE has been compiled to use memory on the heap instead
+of the stack, the amount of heap memory that can be used.
+The  default  value  for  match_limit_recursion can be set when PCRE is
+built; the default default  is  the  same  value  as  the  default  for
+match_limit. You can override the default by supplying pcre_exec() with
+a  pcre_extra  block  in  which  match_limit_recursion  is   set,   and
+PCRE_EXTRA_MATCH_LIMIT_RECURSION  is  set  in  the  flags field. If the
+limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.
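+To see why these limits matter, the toy matcher below naively
+backtracks through the nested repeat (a+)+b, counting every call and
+tracking the recursion depth. It is not PCRE's match() function, just a
+hypothetical stand-in. On a subject of n "a" characters with no "b",
+the number of calls is 2^n, while the depth never exceeds n+1. This
+mirrors the distinction between match_limit and match_limit_recursion.

```c
#include <assert.h>
#include <stddef.h>

/* Counters for the toy matcher; names are hypothetical. */
struct toy_limits {
    long calls;       /* total calls, compared with call_limit */
    long max_depth;   /* deepest recursion seen */
    long call_limit;  /* analogue of match_limit */
};

/* Try to match (a+)+b anchored at position i. Returns 1 on a match,
   0 on no match, and -1 once the call limit is exceeded (analogous to
   PCRE_ERROR_MATCHLIMIT). */
static int toy_match(const char *s, size_t i, long depth,
                     struct toy_limits *lim)
{
    assert(s != NULL && lim != NULL);
    if (++lim->calls > lim->call_limit)
        return -1;
    if (depth > lim->max_depth)
        lim->max_depth = depth;
    /* One iteration of the outer group: consume one or more 'a'
       characters, then either finish with 'b' or recurse for another
       iteration, backtracking over every possible split. */
    for (size_t j = i; s[j] == 'a'; j++) {
        size_t next = j + 1;
        if (s[next] == 'b')
            return 1;
        int rc = toy_match(s, next, depth + 1, lim);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

+With ten "a" characters and no "b", the toy makes 1024 calls but only
+reaches depth 11. With twenty, a call limit of 10000 stops it early.
+The toy is anchored; as described above, PCRE restarts the call count
+at each starting position for unanchored patterns.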
+The callout_data field is used in conjunction with the "callout" feature,
+which is described in the pcrecallout documentation.
+The  tables  field  is  used  to  pass  a  character  tables pointer to
+pcre_exec(); this overrides the value that is stored with the  compiled
+pattern.  A  non-NULL value is stored with the compiled pattern only if
+custom tables were supplied to pcre_compile() via  its  tableptr  argu-
+ment.  If NULL is passed to pcre_exec() using this mechanism, it forces
+PCRE's internal tables to be used. This facility is  helpful  when  re-
+using  patterns  that  have been saved after compiling with an external
+set of tables, because the external tables  might  be  at  a  different
+address  when  pcre_exec() is called. See the pcreprecompile documenta-
+tion for a discussion of saving compiled patterns for later use.
+Option bits for pcre_exec()
+The unused bits of the options argument for pcre_exec() must  be  zero.
+The only bits that may be set are PCRE_ANCHORED, PCRE_BSR_ANYCRLF,
+PCRE_BSR_UNICODE, PCRE_NEWLINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL,
+PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK and PCRE_PARTIAL.
+PCRE_ANCHORED
+The  PCRE_ANCHORED  option  limits pcre_exec() to matching at the first
+matching position. If a pattern was  compiled  with  PCRE_ANCHORED,  or
+turned  out to be anchored by virtue of its contents, it cannot be made
+unanchored at matching time.
+PCRE_BSR_ANYCRLF
+PCRE_BSR_UNICODE
+These options (which are mutually exclusive) control what the \R escape
+sequence  matches.  The choice is either to match only CR, LF, or CRLF,
+or to match any Unicode newline sequence. These  options  override  the
+choice that was made or defaulted when the pattern was compiled.
+PCRE_NEWLINE_CR
+PCRE_NEWLINE_LF
+PCRE_NEWLINE_CRLF
+PCRE_NEWLINE_ANYCRLF
+PCRE_NEWLINE_ANY
+These  options  override  the  newline  definition  that  was chosen or
+defaulted when the pattern was compiled. For details, see the  descrip-
+tion  of  pcre_compile()  above.  During  matching,  the newline choice
+affects the behaviour of the dot, circumflex,  and  dollar  metacharac-
+ters.  It may also alter the way the match position is advanced after a
+match failure for an unanchored pattern.
+When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF,  or  PCRE_NEWLINE_ANY  is
+set,  and a match attempt for an unanchored pattern fails when the cur-
+rent position is at a  CRLF  sequence,  and  the  pattern  contains  no
+explicit  matches  for  CR  or  LF  characters,  the  match position is
+advanced by two characters instead of one, in other words, to after the
+CRLF.
+The above rule is a compromise that makes the most common cases work as
+expected. For example, if the  pattern  is  .+A  (and  the  PCRE_DOTALL
+option is not set), it does not match the string "\r\nA" because, after
+failing at the start, it skips both the CR and the LF before  retrying.
+However,  the  pattern  [\r\n]A does match that string, because it con-
+tains an explicit CR or LF reference, and so advances only by one char-
+acter after the first failure.
+An explicit match for CR or LF is either a literal appearance of one of
+those characters, or one of the \r or  \n  escape  sequences.  Implicit
+matches  such  as [^X] do not count, nor does \s (which includes CR and
+LF in the characters that it matches).
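+The advance rule can be expressed as a small helper. This is a sketch
+under stated assumptions. It works on bytes (non-UTF-8 mode), and the
+caller supplies two flags: whether the newline convention is CRLF,
+ANYCRLF, or ANY, and whether the pattern has explicit CR/LF matches
+(for example, as reported by PCRE_INFO_HASCRORLF). The function name is
+hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: where the next attempt for an unanchored
   pattern starts after a failure at byte offset pos.
   crlf_newline: nonzero for the CRLF, ANYCRLF or ANY conventions.
   has_cr_or_lf: nonzero if the pattern contains an explicit CR or LF
   match (a literal CR or LF, \r or \n). */
static size_t next_start_offset(const char *subject, size_t length,
                                size_t pos, int crlf_newline,
                                int has_cr_or_lf)
{
    assert(subject != NULL && pos < length);
    if (crlf_newline && !has_cr_or_lf &&
        pos + 1 < length &&
        subject[pos] == '\r' && subject[pos + 1] == '\n')
        return pos + 2;  /* skip the whole CRLF */
    return pos + 1;      /* ordinary one-byte advance */
}
```

+For the subject "\r\nA", the .+A case above advances from 0 to 2. The
+[\r\n]A case advances only to 1.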
+Notwithstanding the above, anomalous effects may still occur when  CRLF
+is a valid newline sequence and explicit \r or \n escapes appear in the
+pattern.
+PCRE_NOTBOL
+This option specifies that the first character of the subject string is not
+the  beginning  of  a  line, so the circumflex metacharacter should not
+match before it. Setting this without PCRE_MULTILINE (at compile  time)
+causes  circumflex  never to match. This option affects only the behav-
+iour of the circumflex metacharacter. It does not affect \A.
+PCRE_NOTEOL
+This option specifies that the end of the subject string is not the end
+of  a line, so the dollar metacharacter should not match it nor (except
+in multiline mode) a newline immediately before it. Setting this  with-
+out PCRE_MULTILINE (at compile time) causes dollar never to match. This
+option affects only the behaviour of the dollar metacharacter. It  does
+not affect \Z or \z.
+PCRE_NOTEMPTY
+An empty string is not considered to be a valid match if this option is
+set. If there are alternatives in the pattern, they are tried.  If  all
+the  alternatives  match  the empty string, the entire match fails. For
+example, if the pattern
+a?b?
+is applied to a string not beginning with "a" or "b",  it  matches  the
+empty  string at the start of the subject. With PCRE_NOTEMPTY set, this
+match is not valid, so PCRE searches further into the string for occur-
+rences of "a" or "b".
+Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a spe-
+cial case of a pattern match of the empty  string  within  its  split()
+function,  and  when  using  the /g modifier. It is possible to emulate
+Perl's behaviour after matching a null string by first trying the match
+again at the same offset with PCRE_NOTEMPTY and PCRE_ANCHORED, and then
+if that fails by advancing the starting offset (see below)  and  trying
+an ordinary match again. There is some code that demonstrates how to do
+this in the pcredemo.c sample program.
+PCRE_NO_UTF8_CHECK
+When PCRE_UTF8 is set at compile time, the validity of the subject as a
+UTF-8  string is automatically checked when pcre_exec() is subsequently
+called.  The value of startoffset is also checked  to  ensure  that  it
+points  to  the start of a UTF-8 character. There is a discussion about
+the validity of UTF-8 strings in the section on UTF-8  support  in  the
+main  pcre  page.  If  an  invalid  UTF-8  sequence  of bytes is found,
+pcre_exec() returns the error PCRE_ERROR_BADUTF8. If  startoffset  con-
+tains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned.
+If  you  already  know that your subject is valid, and you want to skip
+these   checks   for   performance   reasons,   you   can    set    the
+PCRE_NO_UTF8_CHECK  option  when calling pcre_exec(). You might want to
+do this for the second and subsequent calls to pcre_exec() if  you  are
+making  repeated  calls  to  find  all  the matches in a single subject
+string. However, you should be  sure  that  the  value  of  startoffset
+points  to  the  start of a UTF-8 character. When PCRE_NO_UTF8_CHECK is
+set, the effect of passing an invalid UTF-8 string as a subject,  or  a
+value  of startoffset that does not point to the start of a UTF-8 char-
+acter, is undefined. Your program may crash.
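+If you do set PCRE_NO_UTF8_CHECK, a cheap precaution is to confirm
+yourself that startoffset is at a character boundary. The sketch below
+is a hypothetical helper, not part of PCRE. It relies only on the UTF-8
+rule that continuation bytes have the form 10xxxxxx, treats the end of
+the subject as a valid offset, and does not validate the rest of the
+string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if offset is a valid place to start
   matching in a UTF-8 subject of the given byte length, that is, it is
   the end of the subject or the byte there is not a continuation byte
   (10xxxxxx). */
static int utf8_offset_is_char_start(const unsigned char *subject,
                                     size_t length, size_t offset)
{
    assert(subject != NULL);
    if (offset > length)
        return 0;
    if (offset == length)
        return 1;
    return (subject[offset] & 0xC0) != 0x80;
}
```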
+PCRE_PARTIAL
+This option turns on the  partial  matching  feature.  If  the  subject
+string  fails to match the pattern, but at some point during the match-
+ing process the end of the subject was reached (that  is,  the  subject
+partially  matches  the  pattern and the failure to match occurred only
+because there were not enough subject characters), pcre_exec()  returns
+PCRE_ERROR_PARTIAL  instead of PCRE_ERROR_NOMATCH. When PCRE_PARTIAL is
+used, there are restrictions on what may appear in the  pattern.  These
+are discussed in the pcrepartial documentation.
+The string to be matched by pcre_exec()
+The  subject string is passed to pcre_exec() as a pointer in subject, a
+length (in bytes) in length, and a starting byte offset in startoffset.
+In UTF-8 mode, the byte offset must point to the start of a UTF-8 char-
+acter. Unlike the pattern string, the subject may contain  binary  zero
+bytes.  When the starting offset is zero, the search for a match starts
+at the beginning of the subject, and this is by  far  the  most  common
+case.
+A  non-zero  starting offset is useful when searching for another match
+in the same subject by calling pcre_exec() again after a previous  suc-
+cess.   Setting  startoffset differs from just passing over a shortened
+string and setting PCRE_NOTBOL in the case of  a  pattern  that  begins
+with any kind of lookbehind. For example, consider the pattern
+\Biss\B
+which  finds  occurrences  of "iss" in the middle of words. (\B matches
+only if the current position in the subject is not  a  word  boundary.)
+When applied to the string "Mississippi", the first call to pcre_exec()
+finds the first occurrence. If pcre_exec() is called again with just
+the remainder of the subject, namely "issippi", it does not match,
+because \B is always false at the start of the subject, which is deemed
+to  be  a  word  boundary. However, if pcre_exec() is passed the entire
+string again, but with startoffset set to 4, it finds the second occur-
+rence  of "iss" because it is able to look behind the starting point to
+discover that it is preceded by a letter.
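+The difference can be reproduced with a simplified model of \B. The
+sketch below assumes that word characters are ASCII letters, digits,
+and underscore, and uses hypothetical helper names. It only shows why
+a lookbehind needs access to the bytes before the start offset.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Word characters, approximated as ASCII letters, digits and "_". */
static int is_word_byte(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Return 1 if position pos in subject (length bytes) is a word
   boundary. Positions outside the subject count as non-word, which is
   why \B always fails at offset 0 of a subject starting with a
   letter. */
static int at_word_boundary(const char *subject, size_t length,
                            size_t pos)
{
    assert(subject != NULL && pos <= length);
    int before = pos > 0 && is_word_byte((unsigned char)subject[pos - 1]);
    int after = pos < length && is_word_byte((unsigned char)subject[pos]);
    return before != after;
}
```

+In "Mississippi", offset 4 is not a boundary, so \B succeeds there. In
+the shortened subject "issippi", offset 0 is a boundary, so \B fails.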
+If a non-zero starting offset is passed when the pattern  is  anchored,
+one attempt to match at the given offset is made. This can only succeed
+if the pattern does not require the match to be at  the  start  of  the
+subject.
+How pcre_exec() returns captured substrings
+In  general, a pattern matches a certain portion of the subject, and in
+addition, further substrings from the subject  may  be  picked  out  by
+parts  of  the  pattern.  Following the usage in Jeffrey Friedl's book,
+this is called "capturing" in what follows, and the  phrase  "capturing
+subpattern"  is  used for a fragment of a pattern that picks out a sub-
+string. PCRE supports several other kinds of  parenthesized  subpattern
+that do not cause substrings to be captured.
+Captured substrings are returned to the caller via a vector of integers
+whose address is passed in ovector. The number of elements in the  vec-
+tor  is  passed in ovecsize, which must be a non-negative number. Note:
+this argument is NOT the size of ovector in bytes.
+The first two-thirds of the vector is used to pass back  captured  sub-
+strings,  each  substring using a pair of integers. The remaining third
+of the vector is used as workspace by pcre_exec() while  matching  cap-
+turing  subpatterns, and is not available for passing back information.
+The number passed in ovecsize should always be a multiple of three.  If
+it is not, it is rounded down.
+When  a  match  is successful, information about captured substrings is
+returned in pairs of integers, starting at the  beginning  of  ovector,
+and  continuing  up  to two-thirds of its length at the most. The first
+element of each pair is set to the byte offset of the  first  character
+in  a  substring, and the second is set to the byte offset of the first
+character after the end of a substring. Note: these values  are  always
+byte offsets, even in UTF-8 mode. They are not character counts.
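+Because the offsets are in bytes, a program that needs a character
+position in UTF-8 mode has to convert them. A minimal sketch with a
+hypothetical function name, assuming the subject is valid UTF-8 so
+that every byte that is not a continuation byte begins a new
+character:

```c
#include <assert.h>
#include <stddef.h>

/* Count the UTF-8 characters in the first byte_offset bytes of a valid
   UTF-8 string: every byte that is not a continuation byte (10xxxxxx)
   starts a character. */
static size_t utf8_char_index(const unsigned char *subject,
                              size_t byte_offset)
{
    assert(subject != NULL);
    size_t chars = 0;
    for (size_t i = 0; i < byte_offset; i++)
        if ((subject[i] & 0xC0) != 0x80)
            chars++;
    return chars;
}
```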
+The  first  pair  of  integers, ovector[0] and ovector[1], identify the
+portion of the subject string matched by the entire pattern.  The  next
+pair  is  used for the first capturing subpattern, and so on. The value
+returned by pcre_exec() is one more than the highest numbered pair that
+has  been  set.  For example, if two substrings have been captured, the
+returned value is 3. If there are no capturing subpatterns, the  return
+value from a successful match is 1, indicating that just the first pair
+of offsets has been set.
+If a capturing subpattern is matched repeatedly, it is the last portion
+of the string that it matched that is returned.
+If  the vector is too small to hold all the captured substring offsets,
+it is used as far as possible (up to two-thirds of its length), and the
+function  returns  a value of zero. If the substring offsets are not of
+interest, pcre_exec() may be called with ovector  passed  as  NULL  and
+ovecsize  as zero. However, if the pattern contains back references and
+the ovector is not big enough to remember the related substrings,  PCRE
+has  to  get additional memory for use during matching. Thus it is usu-
+ally advisable to supply an ovector.
+The pcre_fullinfo() function, called with PCRE_INFO_CAPTURECOUNT, can
+be used to find out how many capturing subpatterns there are in a
+compiled pattern. The smallest size for ovector that will allow for n
+captured substrings, in addition to the offsets of the substring
+matched by the whole pattern, is (n+1)*3.
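+The sizing rule translates directly into code. In the sketch below,
+the capture count is assumed to come from pcre_fullinfo() with
+PCRE_INFO_CAPTURECOUNT. The helper names are hypothetical.

```c
#include <assert.h>

/* Smallest ovector (in ints, not bytes) that holds the whole-match
   offsets plus capture_count captured substrings: each pair needs two
   slots for the offsets and one slot of pcre_exec() workspace. */
static int ovector_size_for(int capture_count)
{
    assert(capture_count >= 0);
    return (capture_count + 1) * 3;
}

/* Number of offset pairs an ovector of ovecsize elements can return;
   ovecsize is effectively rounded down to a multiple of three. */
static int ovector_pair_capacity(int ovecsize)
{
    assert(ovecsize >= 0);
    return ovecsize / 3;
}
```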
+It  is  possible for capturing subpattern number n+1 to match some part
+of the subject when subpattern n has not been used at all. For example,
+if  the  string  "abc"  is  matched against the pattern (a|(z))(bc) the
+return from the function is 4, and subpatterns 1 and 3 are matched, but
+2  is  not.  When  this happens, both values in the offset pairs corre-
+sponding to unused subpatterns are set to -1.
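+The sketch below walks the offset pairs for that example. It uses the
+return value 4 and the offsets that such a match of (a|(z))(bc) against
+"abc" yields: whole match 0..3, group 1 at 0..1, group 2 unset, and
+group 3 at 1..3. The ovector contents are written out by hand rather
+than produced by a real pcre_exec() call. The helper name is
+hypothetical; in real code pcre_copy_substring() does this job.

```c
#include <assert.h>
#include <string.h>

/* Copy capture i (0 = whole match) into buf as a C string. npairs is
   the number of pairs that may be read (normally the positive value
   returned by pcre_exec()). Returns the substring length, or -1 if i
   is out of range, the group is unset (offsets are -1), or buf is too
   small. */
static int copy_capture(const char *subject, const int *ovector,
                        int npairs, int i, char *buf, size_t bufsize)
{
    assert(subject != NULL && ovector != NULL && buf != NULL);
    if (i < 0 || i >= npairs || ovector[2 * i] < 0)
        return -1;
    int len = ovector[2 * i + 1] - ovector[2 * i];
    if ((size_t)len >= bufsize)
        return -1;
    memcpy(buf, subject + ovector[2 * i], (size_t)len);
    buf[len] = '\0';
    return len;
}
```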
+Offset values that correspond to unused subpatterns at the end  of  the
+expression  are  also  set  to  -1. For example, if the string "abc" is
+matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are  not
+matched.  The  return  from the function is 2, because the highest used
+capturing subpattern number is 1. However, you can refer to the offsets
+for  the  second  and third capturing subpatterns if you wish (assuming
+the vector is large enough, of course).
+Some convenience functions are provided  for  extracting  the  captured
+substrings as separate strings. These are described below.
+Error return values from pcre_exec()
+If  pcre_exec()  fails, it returns a negative number. The following are
+defined in the header file:
+PCRE_ERROR_NOMATCH        (-1)
+The subject string did not match the pattern.
+PCRE_ERROR_NULL           (-2)
+Either code or subject was passed as NULL,  or  ovector  was  NULL  and
+ovecsize was not zero.
+PCRE_ERROR_BADOPTION      (-3)
+An unrecognized bit was set in the options argument.
+PCRE_ERROR_BADMAGIC       (-4)
+PCRE  stores a 4-byte "magic number" at the start of the compiled code,
+to catch the case when it is passed a junk pointer and to detect when a
+pattern that was compiled in an environment of one endianness is run in
+an environment with the other endianness. This is the error  that  PCRE
+gives when the magic number is not present.
+PCRE_ERROR_UNKNOWN_OPCODE (-5)
+While running the pattern match, an unknown item was encountered in the
+compiled pattern. This error could be caused by a bug  in  PCRE  or  by
+overwriting of the compiled pattern.
+PCRE_ERROR_NOMEMORY       (-6)
+If  a  pattern contains back references, but the ovector that is passed
+to pcre_exec() is not big enough to remember the referenced substrings,
+PCRE  gets  a  block of memory at the start of matching to use for this
+purpose. If the call via pcre_malloc() fails, this error is given.  The
+memory is automatically freed at the end of matching.
+PCRE_ERROR_NOSUBSTRING    (-7)
+This  error is used by the pcre_copy_substring(), pcre_get_substring(),
+and  pcre_get_substring_list()  functions  (see  below).  It  is  never
+returned by pcre_exec().
+PCRE_ERROR_MATCHLIMIT     (-8)
+The  backtracking  limit,  as  specified  by the match_limit field in a
+pcre_extra structure (or defaulted), was reached. See the description
+above.
+PCRE_ERROR_CALLOUT        (-9)
+This error is never generated by pcre_exec() itself. It is provided for
+use by callout functions that want to yield a distinctive  error  code.
+See the pcrecallout documentation for details.
+PCRE_ERROR_BADUTF8        (-10)
+A  string  that contains an invalid UTF-8 byte sequence was passed as a
+subject.
+PCRE_ERROR_BADUTF8_OFFSET (-11)
+The UTF-8 byte sequence that was passed as a subject was valid, but the
+value  of startoffset did not point to the beginning of a UTF-8 charac-
+ter.
+PCRE_ERROR_PARTIAL        (-12)
+The subject string did not match, but it did match partially.  See  the
+pcrepartial documentation for details of partial matching.
+PCRE_ERROR_BADPARTIAL     (-13)
+The  PCRE_PARTIAL  option  was  used with a compiled pattern containing
+items that are not supported for partial matching. See the  pcrepartial
+documentation for details of partial matching.
+PCRE_ERROR_INTERNAL       (-14)
+An  unexpected  internal error has occurred. This error could be caused
+by a bug in PCRE or by overwriting of the compiled pattern.
+PCRE_ERROR_BADCOUNT       (-15)
+This error is given if the value of the ovecsize argument is  negative.
+PCRE_ERROR_RECURSIONLIMIT (-21)
+The internal recursion limit, as specified by the match_limit_recursion
+field in a pcre_extra structure (or defaulted), was reached. See the
+description above.
+PCRE_ERROR_BADNEWLINE     (-23)
+An invalid combination of PCRE_NEWLINE_xxx options was given.
+Error numbers -16 to -20 and -22 are not used by pcre_exec().
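+When logging failures it helps to turn a negative return code into its
+name. The minimal sketch below hardcodes the numeric values from the list
+above so that it compiles without pcre.h; real code would compare against
+the PCRE_ERROR_xxx macros from the header instead. The function name is
+hypothetical.

```c
#include <assert.h>

/* Map a negative pcre_exec() result to its symbolic name. The numbers are
   copied from the list above; -16 to -20 and -22 are not used by pcre_exec(). */
const char *exec_error_name(int rc)
{
  assert(rc < 0);                                 /* callers pass failures only */
  switch (rc)
  {
    case -1:  return "PCRE_ERROR_NOMATCH";
    case -2:  return "PCRE_ERROR_NULL";
    case -3:  return "PCRE_ERROR_BADOPTION";
    case -4:  return "PCRE_ERROR_BADMAGIC";
    case -5:  return "PCRE_ERROR_UNKNOWN_OPCODE";
    case -6:  return "PCRE_ERROR_NOMEMORY";
    case -7:  return "PCRE_ERROR_NOSUBSTRING";    /* never from pcre_exec() */
    case -8:  return "PCRE_ERROR_MATCHLIMIT";
    case -9:  return "PCRE_ERROR_CALLOUT";        /* only via a callout */
    case -10: return "PCRE_ERROR_BADUTF8";
    case -11: return "PCRE_ERROR_BADUTF8_OFFSET";
    case -12: return "PCRE_ERROR_PARTIAL";
    case -13: return "PCRE_ERROR_BADPARTIAL";
    case -14: return "PCRE_ERROR_INTERNAL";
    case -15: return "PCRE_ERROR_BADCOUNT";
    case -21: return "PCRE_ERROR_RECURSIONLIMIT";
    case -23: return "PCRE_ERROR_BADNEWLINE";
    default:  return "(unknown error)";
  }
}
```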
+EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
+int pcre_copy_substring(const char *subject, int *ovector,
+int stringcount, int stringnumber, char *buffer,
+int buffersize);
+int pcre_get_substring(const char *subject, int *ovector,
+int stringcount, int stringnumber,
+const char **stringptr);
+int pcre_get_substring_list(const char *subject,
+int *ovector, int stringcount, const char ***listptr);
+Captured  substrings  can  be  accessed  directly  by using the offsets
+returned by pcre_exec() in  ovector.  For  convenience,  the  functions
+pcre_copy_substring(),    pcre_get_substring(),    and    pcre_get_sub-
+string_list() are provided for extracting captured substrings  as  new,
+separate,  zero-terminated strings. These functions identify substrings
+by number. The next section describes functions  for  extracting  named
+substrings.
+A  substring that contains a binary zero is correctly extracted and has
+a further zero added on the end, but the result is not, of course, a  C
+string.   However,  you  can  process such a string by referring to the
+length that is  returned  by  pcre_copy_substring()  and  pcre_get_sub-
+string().  Unfortunately, the interface to pcre_get_substring_list() is
+not adequate for handling strings containing binary zeros, because  the
+end of the final string is not independently indicated.
+The  first  three  arguments  are the same for all three of these func-
+tions: subject is the subject string that has  just  been  successfully
+matched, ovector is a pointer to the vector of integer offsets that was
+passed to pcre_exec(), and stringcount is the number of substrings that
+were  captured  by  the match, including the substring that matched the
+entire regular expression. This is the value returned by pcre_exec() if
+it  is greater than zero. If pcre_exec() returned zero, indicating that
+it ran out of space in ovector, the value passed as stringcount  should
+be the number of elements in the vector divided by three.
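+The rule for choosing stringcount can be captured in a small helper. This
+is a sketch with a hypothetical name; it only encodes the cases described
+above, treating a negative return as "nothing to extract".

```c
#include <assert.h>

/* Value to pass as stringcount to pcre_copy_substring() and friends,
   given pcre_exec()'s return code and the ovecsize passed to it. */
int stringcount_for(int rc, int ovecsize)
{
  assert(ovecsize >= 0);
  if (rc > 0)
    return rc;             /* number of offset pairs that were set */
  if (rc == 0)
    return ovecsize / 3;   /* ovector too small: use all of its pairs */
  return 0;                /* no match or an error: nothing to extract */
}
```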
+The  functions pcre_copy_substring() and pcre_get_substring() extract a
+single substring, whose number is given as  stringnumber.  A  value  of
+zero  extracts  the  substring that matched the entire pattern, whereas
+higher values  extract  the  captured  substrings.  For  pcre_copy_sub-
+string(),  the  string  is  placed  in buffer, whose length is given by
+buffersize, while for pcre_get_substring() a new  block  of  memory  is
+obtained  via  pcre_malloc,  and its address is returned via stringptr.
+The yield of the function is the length of the  string,  not  including
+the terminating zero, or one of these error codes:
+PCRE_ERROR_NOMEMORY       (-6)
+The  buffer  was too small for pcre_copy_substring(), or the attempt to
+get memory failed for pcre_get_substring().
+PCRE_ERROR_NOSUBSTRING    (-7)
+There is no substring whose number is stringnumber.
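+To make the contract of pcre_copy_substring() concrete, here is an
+illustrative re-implementation of the behaviour described in this section.
+It is a sketch, not PCRE's own source; the name sketch_copy_substring() is
+hypothetical, and -6 and -7 are the error values listed above.

```c
#include <assert.h>
#include <string.h>

/* Illustrative sketch of the documented contract, not PCRE's source.
   Returns the substring length, -6 (PCRE_ERROR_NOMEMORY) if buffer is too
   small, or -7 (PCRE_ERROR_NOSUBSTRING) if stringnumber is out of range. */
int sketch_copy_substring(const char *subject, const int *ovector,
                          int stringcount, int stringnumber,
                          char *buffer, int buffersize)
{
  assert(subject != NULL && ovector != NULL && buffer != NULL);
  if (stringnumber < 0 || stringnumber >= stringcount)
    return -7;
  int start = ovector[2 * stringnumber];
  /* An unset substring (negative offsets) is returned as an empty string. */
  int len = (start < 0) ? 0 : ovector[2 * stringnumber + 1] - start;
  if (len + 1 > buffersize)        /* room is needed for the final zero */
    return -6;
  if (len > 0)
    memcpy(buffer, subject + start, (size_t)len);
  buffer[len] = '\0';
  return len;
}
```

+Note that the check is len + 1 > buffersize: a three-character match
+needs a buffer of at least four bytes because of the terminating zero.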
+The pcre_get_substring_list()  function  extracts  all  available  sub-
+strings  and  builds  a list of pointers to them. All this is done in a
+single block of memory that is obtained via pcre_malloc. The address of
+the  memory  block  is returned via listptr, which is also the start of
+the list of string pointers. The end of the list is marked  by  a  NULL
+pointer.  The  yield  of  the function is zero if all went well, or the
+error code
+PCRE_ERROR_NOMEMORY       (-6)
+if the attempt to get the memory block failed.
+When any of these functions encounters a substring that is unset, which
+can  happen  when  capturing subpattern number n+1 matches some part of
+the subject, but subpattern n has not been used at all, they return  an
+empty string. This can be distinguished from a genuine zero-length sub-
+string by inspecting the appropriate offset in ovector, which is  nega-
+tive for unset substrings.
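+The check described above can be written as a small classifier. The enum
+and function names are hypothetical; the only rule relied on is that an
+unset substring has a negative start offset in ovector. The empty-but-set
+case assumes a pattern such as a(x*)b matched against "ab", where group 1
+matches the empty string at offset 1.

```c
#include <assert.h>

enum substring_state { SUBSTRING_UNSET, SUBSTRING_EMPTY, SUBSTRING_NONEMPTY };

/* Classify substring n after a successful match with the given stringcount. */
enum substring_state classify_substring(const int *ovector, int stringcount, int n)
{
  assert(n >= 0 && n < stringcount);
  int start = ovector[2 * n];
  int end = ovector[2 * n + 1];
  if (start < 0)
    return SUBSTRING_UNSET;        /* subpattern did not participate */
  return (end == start) ? SUBSTRING_EMPTY : SUBSTRING_NONEMPTY;
}
```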
+The  two convenience functions pcre_free_substring() and pcre_free_sub-
+string_list() can be used to free the memory  returned  by  a  previous
+call  of  pcre_get_substring()  or  pcre_get_substring_list(),  respec-
+tively. They do nothing more than  call  the  function  pointed  to  by
+pcre_free,  which  of course could be called directly from a C program.
+However, PCRE is used in some situations where it is linked via a  spe-
+cial   interface  to  another  programming  language  that  cannot  use
+pcre_free directly; it is for these cases that the functions  are  pro-
+vided.
+EXTRACTING CAPTURED SUBSTRINGS BY NAME
+int pcre_get_stringnumber(const pcre *code,
+const char *name);
+int pcre_copy_named_substring(const pcre *code,
+const char *subject, int *ovector,
+int stringcount, const char *stringname,
+char *buffer, int buffersize);
+int pcre_get_named_substring(const pcre *code,
+const char *subject, int *ovector,
+int stringcount, const char *stringname,
+const char **stringptr);
+To extract a substring by name, you first have to find the associated
+number. For example, for this pattern
+(a+)b(?<xxx>\d+)...
+the number of the subpattern called "xxx" is 2. If the name is known to
+be unique (PCRE_DUPNAMES was not set), you can find the number from the
+name by calling pcre_get_stringnumber(). The first argument is the com-
+piled pattern, and the second is the name. The yield of the function is
+the subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if  there  is  no
+subpattern of that name.
+Given the number, you can extract the substring directly, or use one of
+the functions described in the previous section. For convenience, there
+are also two functions that do the whole job.
+Most    of    the    arguments   of   pcre_copy_named_substring()   and
+pcre_get_named_substring() are the same  as  those  for  the  similarly
+named  functions  that extract by number. As these are described in the
+previous section, they are not re-described here. There  are  just  two
+differences:
+First,  instead  of a substring number, a substring name is given. Sec-
+ond, there is an extra argument, given at the start, which is a pointer
+to  the compiled pattern. This is needed in order to gain access to the
+name-to-number translation table.
+These functions call pcre_get_stringnumber(), and if it succeeds,  they
+then  call  pcre_copy_substring() or pcre_get_substring(), as appropri-
+ate. NOTE: If PCRE_DUPNAMES is set and there are duplicate  names,  the
+behaviour may not be what you want (see the next section).
+DUPLICATE SUBPATTERN NAMES
+int pcre_get_stringtable_entries(const pcre *code,
+const char *name, char **first, char **last);
+When  a  pattern  is  compiled with the PCRE_DUPNAMES option, names for
+subpatterns are not required to  be  unique.  Normally,  patterns  with
+duplicate  names  are such that in any one match, only one of the named
+subpatterns participates. An example is shown in the pcrepattern  docu-
+mentation.
+When    duplicates   are   present,   pcre_copy_named_substring()   and
+pcre_get_named_substring() return the first substring corresponding  to
+the  given  name  that  is set. If none are set, PCRE_ERROR_NOSUBSTRING
+(-7) is returned; no  data  is  returned.  The  pcre_get_stringnumber()
+function  returns one of the numbers that are associated with the name,
+but it is not defined which it is.
+If you want to get full details of all captured substrings for a  given
+name,  you  must  use  the pcre_get_stringtable_entries() function. The
+first argument is the compiled pattern, and the second is the name. The
+third  and  fourth  are  pointers to variables which are updated by the
+function. After it has run, they point to the first and last entries in
+the  name-to-number  table  for  the  given  name.  The function itself
+returns the length of each entry,  or  PCRE_ERROR_NOSUBSTRING  (-7)  if
+there  are none. The format of the table is described above in the sec-
+tion entitled Information about a  pattern.   Given  all  the  relevant
+entries  for the name, you can extract each of their numbers, and hence
+the captured data, if any.
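+Walking the entries returned by pcre_get_stringtable_entries() might look
+like the sketch below. It assumes the name-table layout documented for
+PCRE_INFO_NAMETABLE (each entry starts with the group number as two bytes,
+most significant first, followed by the zero-terminated name, padded to the
+entry size), and it mirrors the "first set substring wins" rule. The
+function names are hypothetical.

```c
#include <assert.h>

/* Group number stored at the start of a name-table entry (big-endian). */
int entry_number(const unsigned char *entry)
{
  return (entry[0] << 8) | entry[1];
}

/* Scan the entries from first to last inclusive (the real function hands
   them back as char pointers; cast them to unsigned char) and return the
   first group whose substring is set, or -7 (PCRE_ERROR_NOSUBSTRING). */
int first_set_group(const unsigned char *first, const unsigned char *last,
                    int entrysize, const int *ovector, int stringcount)
{
  assert(entrysize > 2);
  for (const unsigned char *e = first; e <= last; e += entrysize)
  {
    int n = entry_number(e);
    if (n < stringcount && ovector[2 * n] >= 0)
      return n;
  }
  return -7;
}
```

+For a pattern in which groups 1 and 2 are both named DN, the entry size
+is 5 and the entries are {0,1,'D','N',0} and {0,2,'D','N',0}; if only
+group 2 took part in the match, first_set_group() returns 2.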
+FINDING ALL POSSIBLE MATCHES
+The traditional matching function uses a  similar  algorithm  to  Perl,
+which stops when it finds the first match, starting at a given point in
+the subject. If you want to find all possible matches, or  the  longest
+possible  match,  consider using the alternative matching function (see
+below) instead. If you cannot use the alternative function,  but  still
+need  to  find all possible matches, you can kludge it up by making use
+of the callout facility, which is described in the pcrecallout documen-
+tation.
+What you have to do is to insert a callout right at the end of the pat-
+tern.  When your callout function is called, extract and save the  cur-
+rent  matched  substring.  Then  return  1, which forces pcre_exec() to
+backtrack and try other alternatives. Ultimately, when it runs  out  of
+matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.
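+A sketch of such a callout follows. To stay self-contained it does not
+include pcre.h: struct callout_view is a hypothetical stand-in holding only
+the pcre_callout_block fields used here, and the fixed-size arrays are an
+arbitrary choice. With the real library the function would take a
+pcre_callout_block pointer, be assigned to pcre_callout, and the pattern
+would end in (?C).

```c
#include <assert.h>

#define MAX_SAVED 16

/* Hypothetical stand-in for the pcre_callout_block fields used below;
   subject is kept so that saved offsets can be turned into text later. */
struct callout_view {
  const char *subject;
  int start_match;
  int current_position;
};

/* Matches recorded so far, as start/end offsets into the subject. */
int saved_start[MAX_SAVED], saved_end[MAX_SAVED], saved_count = 0;

/* Callout at the very end of the pattern: save the current match, then
   return 1 so that matching fails here and backtracks into alternatives. */
int record_all_matches(const struct callout_view *cb)
{
  assert(cb->current_position >= cb->start_match);
  if (saved_count < MAX_SAVED)
  {
    saved_start[saved_count] = cb->start_match;
    saved_end[saved_count] = cb->current_position;
    saved_count++;
  }
  return 1;
}
```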
+MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
+int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
+const char *subject, int length, int startoffset,
+int options, int *ovector, int ovecsize,
+int *workspace, int wscount);
+The  function  pcre_dfa_exec()  is  called  to  match  a subject string
+against a compiled pattern, using a matching algorithm that  scans  the
+subject  string  just  once, and does not backtrack. This has different
+characteristics to the normal algorithm, and  is  not  compatible  with
+Perl.  Some  of the features of PCRE patterns are not supported. Never-
+theless, there are times when this kind of matching can be useful.  For
+a discussion of the two matching algorithms, see the pcrematching docu-
+mentation.
+The arguments for the pcre_dfa_exec() function  are  the  same  as  for
+pcre_exec(), plus two extras. The ovector argument is used in a differ-
+ent way, and this is described below. The other  common  arguments  are
+used  in  the  same way as for pcre_exec(), so their description is not
+repeated here.
+The two additional arguments provide workspace for  the  function.  The
+workspace  vector  should  contain at least 20 elements. It is used for
+keeping  track  of  multiple  paths  through  the  pattern  tree.  More
+workspace  will  be  needed for patterns and subjects where there are a
+lot of potential matches.
+Here is an example of a simple call to pcre_dfa_exec():
+int rc;
+int ovector[10];
+int wspace[20];
+rc = pcre_dfa_exec(
+re,             /* result of pcre_compile() */
+NULL,           /* we didn't study the pattern */
+"some string",  /* the subject string */
+11,             /* the length of the subject string */
+0,              /* start at offset 0 in the subject */
+0,              /* default options */
+ovector,        /* vector of integers for substring information */
+10,             /* number of elements (NOT size in bytes) */
+wspace,         /* working space vector */
+20);            /* number of elements (NOT size in bytes) */
+Option bits for pcre_dfa_exec()
+The unused bits of the options argument  for  pcre_dfa_exec()  must  be
+zero.  The  only  bits  that  may  be  set are PCRE_ANCHORED, PCRE_NEW-
+LINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY,  PCRE_NO_UTF8_CHECK,
+PCRE_PARTIAL, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last
+three of these are the same as for pcre_exec(), so their description is
+not repeated here.
+PCRE_PARTIAL
+This  has  the  same general effect as it does for pcre_exec(), but the
+details  are  slightly  different.  When  PCRE_PARTIAL   is   set   for
+pcre_dfa_exec(),  the  return code PCRE_ERROR_NOMATCH is converted into
+PCRE_ERROR_PARTIAL if the end of the subject  is  reached,  there  have
+been no complete matches, but there is still at least one matching pos-
+sibility. The portion of the string that provided the partial match  is
+set as the first matching string.
+PCRE_DFA_SHORTEST
+Setting  the  PCRE_DFA_SHORTEST option causes the matching algorithm to
+stop as soon as it has found one match. Because of the way the alterna-
+tive  algorithm  works, this is necessarily the shortest possible match
+at the first possible matching point in the subject string.
+PCRE_DFA_RESTART
+When pcre_dfa_exec()  is  called  with  the  PCRE_PARTIAL  option,  and
+returns  a  partial  match, it is possible to call it again, with addi-
+tional subject characters, and have it continue with  the  same  match.
+The  PCRE_DFA_RESTART  option requests this action; when it is set, the
+workspace and wscount arguments must reference the same vector as before
+because  data  about  the  match so far is left in them after a partial
+match. There is more discussion of this  facility  in  the  pcrepartial
+documentation.
+Successful returns from pcre_dfa_exec()
+When  pcre_dfa_exec()  succeeds, it may have matched more than one sub-
+string in the subject. Note, however, that all the matches from one run
+of  the  function  start  at the same point in the subject. The shorter
+matches are all initial substrings of the longer matches. For  example,
+if the pattern
+<.*>
+is matched against the string
+This is <something> <something else> <something further> no more
+the three matched strings are
+<something>
+<something> <something else>
+<something> <something else> <something further>
+On  success,  the  yield of the function is a number greater than zero,
+which is the number of matched substrings.  The  substrings  themselves
+are  returned  in  ovector. Each string uses two elements; the first is
+the offset to the start, and the second is the offset to  the  end.  In
+fact,  all  the  strings  have the same start offset. (Space could have
+been saved by giving this only once, but it was decided to retain  some
+compatibility  with  the  way pcre_exec() returns data, even though the
+meaning of the strings is different.)
+The strings are returned in reverse order of length; that is, the long-
+est  matching  string is given first. If there were too many matches to
+fit into ovector, the yield of the function is zero, and the vector  is
+filled with the longest matches.
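+Reading the results back might look like this sketch. The helper names
+are hypothetical, and the offsets for the <.*> example were worked out by
+hand from the subject shown above. The rc == 0 case assumes that a full
+vector holds ovecsize/2 matches, following the two-elements-per-string
+layout described here.

```c
#include <assert.h>
#include <stdio.h>

/* How many matches to read after pcre_dfa_exec() returned rc.
   rc == 0 means the vector overflowed; we assume it is then full,
   holding ovecsize/2 start/end pairs. */
int dfa_result_count(int rc, int ovecsize)
{
  if (rc > 0) return rc;
  if (rc == 0) return ovecsize / 2;
  return 0;                       /* negative rc: an error, no matches */
}

/* Print the matches, which arrive longest first and share one start. */
void dfa_show(const char *subject, const int *ovector, int count)
{
  for (int i = 0; i < count; i++)
  {
    int start = ovector[2 * i], end = ovector[2 * i + 1];
    assert(start == ovector[0]);
    printf("%d: %.*s\n", i, end - start, subject + start);
  }
}

/* The shortest match is the last one returned. */
int dfa_shortest_length(const int *ovector, int count)
{
  assert(count > 0);
  return ovector[2 * count - 1] - ovector[2 * count - 2];
}
```

+For the example subject, the three matches would be the pairs (8,56),
+(8,36) and (8,19), so the shortest, <something>, is 11 characters long.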
+Error returns from pcre_dfa_exec()
+The  pcre_dfa_exec()  function returns a negative number when it fails.
+Many of the errors are the same  as  for  pcre_exec(),  and  these  are
+described  above.   There are in addition the following errors that are
+specific to pcre_dfa_exec():
+PCRE_ERROR_DFA_UITEM      (-16)
+This return is given if pcre_dfa_exec() encounters an item in the  pat-
+tern  that  it  does not support, for instance, the use of \C or a back
+reference.
+PCRE_ERROR_DFA_UCOND      (-17)
+This return is given if pcre_dfa_exec()  encounters  a  condition  item
+that  uses  a back reference for the condition, or a test for recursion
+in a specific group. These are not supported.
+PCRE_ERROR_DFA_UMLIMIT    (-18)
+This return is given if pcre_dfa_exec() is called with an  extra  block
+that contains a setting of the match_limit field. This is not supported
+(it is meaningless).
+PCRE_ERROR_DFA_WSSIZE     (-19)
+This return is given if  pcre_dfa_exec()  runs  out  of  space  in  the
+workspace vector.
+PCRE_ERROR_DFA_RECURSE    (-20)
+When  a  recursive subpattern is processed, the matching function calls
+itself recursively, using private vectors for  ovector  and  workspace.
+This  error  is  given  if  the output vector is not large enough. This
+should be extremely rare, as a vector of size 1000 is used.
+SEE ALSO
+pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematching(3),
+pcrepartial(3), pcreposix(3), pcreprecompile(3), pcresample(3), pcrestack(3).
+AUTHOR
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+REVISION
+Last updated: 24 August 2008
+Copyright (c) 1997-2008 University of Cambridge.
+------------------------------------------------------------------------------
+PCRECALLOUT(3)                                                  PCRECALLOUT(3)
+NAME
+PCRE - Perl-compatible regular expressions
+PCRE CALLOUTS
+int (*pcre_callout)(pcre_callout_block *);
+PCRE provides a feature called "callout", which is a means of temporar-
+ily passing control to the caller of PCRE  in  the  middle  of  pattern
+matching.  The  caller of PCRE provides an external function by putting
+its entry point in the global variable pcre_callout. By  default,  this
+variable contains NULL, which disables all calling out.
+Within  a  regular  expression,  (?C) indicates the points at which the
+external function is to be called.  Different  callout  points  can  be
+identified  by  putting  a number less than 256 after the letter C. The
+default value is zero.  For  example,  this  pattern  has  two  callout
+points:
+(?C1)abc(?C2)def
+If  the  PCRE_AUTO_CALLOUT  option  bit  is  set when pcre_compile() is
+called, PCRE automatically  inserts  callouts,  all  with  number  255,
+before  each  item in the pattern. For example, if PCRE_AUTO_CALLOUT is
+used with the pattern
+A(\d{2}|--)
+it is processed as if it were
+(?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
+Notice that there is a callout before and after  each  parenthesis  and
+alternation  bar.  Automatic  callouts  can  be  used  for tracking the
+progress of pattern matching. The pcretest command has an  option  that
+sets  automatic callouts; when it is used, the output indicates how the
+pattern is matched. This is useful information when you are  trying  to
+optimize the performance of a particular pattern.
+MISSING CALLOUTS
+You  should  be  aware  that,  because of optimizations in the way PCRE
+matches patterns, callouts sometimes do not happen. For example, if the
+pattern is
+ab(?C4)cd
+PCRE knows that any matching string must contain the letter "d". If the
+subject string is "abyz", the lack of "d" means that  matching  doesn't
+ever  start,  and  the  callout is never reached. However, with "abyd",
+though the result is still no match, the callout is obeyed.
+THE CALLOUT INTERFACE
+During matching, when PCRE reaches a callout point, the external  func-
+tion  defined by pcre_callout is called (if it is set). This applies to
+both the pcre_exec() and the pcre_dfa_exec()  matching  functions.  The
+only  argument  to  the callout function is a pointer to a pcre_callout
+block. This structure contains the following fields:
+int          version;
+int          callout_number;
+int         *offset_vector;
+const char  *subject;
+int          subject_length;
+int          start_match;
+int          current_position;
+int          capture_top;
+int          capture_last;
+void        *callout_data;
+int          pattern_position;
+int          next_item_length;
+The version field is an integer containing the version  number  of  the
+block  format. The initial version was 0; the current version is 1. The
+version number will change again in future  if  additional  fields  are
+added, but the intention is never to remove any of the existing fields.
+The callout_number field contains the number of the  callout,  as  com-
+piled  into  the pattern (that is, the number after ?C for manual call-
+outs, and 255 for automatically generated callouts).
+The offset_vector field is a pointer to the vector of offsets that  was
+passed   by   the   caller  to  pcre_exec()  or  pcre_dfa_exec().  When
+pcre_exec() is used, the contents can be inspected in order to  extract
+substrings  that  have  been  matched  so  far,  in the same way as for
+extracting substrings after a match has completed. For  pcre_dfa_exec()
+this field is not useful.
+The subject and subject_length fields contain copies of the values that
+were passed to pcre_exec().
+The start_match field normally contains the offset within  the  subject
+at  which  the  current  match  attempt started. However, if the escape
+sequence \K has been encountered, this value is changed to reflect  the
+modified  starting  point.  If the pattern is not anchored, the callout
+function may be called several times from the same point in the pattern
+for different starting points in the subject.
+The  current_position  field  contains the offset within the subject of
+the current match pointer.
+When the pcre_exec() function is used, the capture_top  field  contains
+one  more than the number of the highest numbered captured substring so
+far. If no substrings have been captured, the value of  capture_top  is
+one.  This  is always the case when pcre_dfa_exec() is used, because it
+does not support captured substrings.
+The capture_last field contains the number of the  most  recently  cap-
+tured  substring. If no substrings have been captured, its value is -1.
+This is always the case when pcre_dfa_exec() is used.
+The callout_data field contains a value that is passed  to  pcre_exec()
+or pcre_dfa_exec() specifically so that it can be passed back in
+callouts. It is passed in the callout_data field of the pcre_extra data
+structure.  If  no such data was passed, the value of callout_data in a
+pcre_callout block is NULL. There is a description  of  the  pcre_extra
+structure in the pcreapi documentation.
+The  pattern_position field is present from version 1 of the pcre_call-
+out structure. It contains the offset to the next item to be matched in
+the pattern string.
+The  next_item_length field is present from version 1 of the pcre_call-
+out structure. It contains the length of the next item to be matched in
+the  pattern  string. When the callout immediately precedes an alterna-
+tion bar, a closing parenthesis, or the end of the pattern, the  length
+is  zero.  When the callout precedes an opening parenthesis, the length
+is that of the entire subpattern.
+The pattern_position and next_item_length fields are intended  to  help
+in  distinguishing between different automatic callouts, which all have
+the same callout number. However, they are set for all callouts.
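+The pattern text itself is not part of the callout block, so a callout
+that wants to show which item comes next needs its own copy of the
+pattern, for example passed through callout_data (an assumption of this
+sketch). Given that, the two fields select the text directly. The helper
+name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Copy into buf the pattern text that the callout is about to match,
   selected by pattern_position and next_item_length. A length of zero
   (before "|", ")" or the end of the pattern) yields an empty string. */
void next_item_text(const char *pattern, int pattern_position,
                    int next_item_length, char *buf, size_t bufsize)
{
  assert(pattern_position >= 0 && next_item_length >= 0);
  assert((size_t)next_item_length < bufsize);
  memcpy(buf, pattern + pattern_position, (size_t)next_item_length);
  buf[next_item_length] = '\0';
}
```

+For A(\d{2}|--), the callout before the opening parenthesis should have
+pattern_position 1 and next_item_length 10 (the whole subpattern), giving
+the text (\d{2}|--).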
+RETURN VALUES
+The external callout function returns an integer to PCRE. If the  value
+is  zero,  matching  proceeds  as  normal. If the value is greater than
+zero, matching fails at the current point, but  the  testing  of  other
+matching possibilities goes ahead, just as if a lookahead assertion had
+failed. If the value is less than zero, the  match  is  abandoned,  and
+pcre_exec() (or pcre_dfa_exec()) returns the negative value.
+Negative   values   should   normally   be   chosen  from  the  set  of
+PCRE_ERROR_xxx values. In particular, PCRE_ERROR_NOMATCH forces a stan-
+dard  "no  match"  failure.   The  error  number  PCRE_ERROR_CALLOUT is
+reserved for use by callout functions; it will never be  used  by  PCRE
+itself.
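+The three kinds of return value can be seen in the following sketch of a
+callout decision function that limits how many callouts one match may
+trigger. The struct and function are hypothetical; in a real callout the
+state would typically arrive via callout_data, and -9 is the value listed
+for PCRE_ERROR_CALLOUT in the pcreapi page (use the macro from pcre.h in
+real code).

```c
#include <assert.h>

#define SKETCH_ERROR_CALLOUT (-9)   /* numeric value of PCRE_ERROR_CALLOUT */

/* Hypothetical per-match state, e.g. reached through callout_data. */
struct callout_budget {
  int calls;    /* callouts seen so far */
  int limit;    /* abandon the match after this many */
  int reject;   /* callout number that should force a local failure */
};

/* 0 continues matching, a positive value fails at this point but lets
   backtracking continue, and a negative value abandons the whole match
   (pcre_exec() then returns that value). */
int budget_decision(struct callout_budget *b, int callout_number)
{
  assert(b != 0);
  if (++b->calls > b->limit)
    return SKETCH_ERROR_CALLOUT;
  if (callout_number == b->reject)
    return 1;
  return 0;
}
```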
+AUTHOR
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+REVISION
+Last updated: 29 May 2007
+Copyright (c) 1997-2007 University of Cambridge.
+------------------------------------------------------------------------------
+PCRECOMPAT(3)                                                    PCRECOMPAT(3)
+NAME
+PCRE - Perl-compatible regular expressions
+DIFFERENCES BETWEEN PCRE AND PERL
+This  document describes the differences in the ways that PCRE and Perl
+handle regular expressions. The differences described here  are  mainly
+with  respect  to  Perl 5.8, though PCRE versions 7.0 and later contain
+some features that are expected to be in the forthcoming Perl 5.10.
+1. PCRE has only a subset of Perl's UTF-8 and Unicode support.  Details
+of  what  it does have are given in the section on UTF-8 support in the
+main pcre page.
+2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl
+permits  them,  but they do not mean what you might think. For example,
+(?!a){3} does not assert that the next three characters are not "a". It
+just asserts that the next character is not "a" three times.
+3.  Capturing  subpatterns  that occur inside negative lookahead asser-
+tions are counted, but their entries in the offsets  vector  are  never
+set.  Perl sets its numerical variables from any such patterns that are
+matched before the assertion fails to match something (thereby succeed-
+ing),  but  only  if the negative lookahead assertion contains just one
+branch.
+4. Though binary zero characters are supported in the  subject  string,
+they are not allowed in a pattern string because it is passed as a nor-
+mal C string, terminated by zero. The escape sequence \0 can be used in
+the pattern to represent a binary zero.
+5.  The  following Perl escape sequences are not supported: \l, \u, \L,
+\U, and \N. In fact these are implemented by Perl's general string-han-
+dling  and are not part of its pattern matching engine. If any of these
+are encountered by PCRE, an error is generated.
+6. The Perl escape sequences \p, \P, and \X are supported only if  PCRE
+is  built  with Unicode character property support. The properties that
+can be tested with \p and \P are limited to the general category  prop-
+erties  such  as  Lu and Nd, script names such as Greek or Han, and the
+derived properties Any and L&.
+7. PCRE does support the \Q...\E escape for quoting substrings. Charac-
+ters  in  between  are  treated as literals. This is slightly different
+from Perl in that $ and @ are  also  handled  as  literals  inside  the
+quotes.  In Perl, they cause variable interpolation (but of course PCRE
+does not have variables). Note the following examples:
+Pattern            PCRE matches      Perl matches
+\Qabc$xyz\E        abc$xyz           abc followed by the
+                                     contents of $xyz
+\Qabc\$xyz\E       abc\$xyz          abc\$xyz
+\Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz
+The \Q...\E sequence is recognized both inside  and  outside  character
+classes.
+8. Fairly obviously, PCRE does not support the (?{code}) and (??{code})
+constructions. However, there is support for recursive  patterns.  This
+is  not available in Perl 5.8, but will be in Perl 5.10. Also, the PCRE
+"callout" feature allows an external function to be called during  pat-
+tern matching. See the pcrecallout documentation for details.
+9.  Subpatterns  that  are  called  recursively or as "subroutines" are
+always treated as atomic groups in  PCRE.  This  is  like  Python,  but
+unlike Perl.
+10.  There are some differences that are concerned with the settings of
+captured strings when part of  a  pattern  is  repeated.  For  example,
+matching  "aba"  against  the  pattern  /^(a(b)?)+$/  in Perl leaves $2
+unset, but in PCRE it is set to "b".
+11.  PCRE  does  support  Perl  5.10's  backtracking  verbs  (*ACCEPT),
+(*FAIL),  (*F),  (*COMMIT), (*PRUNE), (*SKIP), and (*THEN), but only in
+the forms without an  argument.  PCRE  does  not  support  (*MARK).  If
+(*ACCEPT)  is within capturing parentheses, PCRE does not set that cap-
+ture group; this is different to Perl.
+12. PCRE provides some extensions to the Perl regular expression facil-
+ities.   Perl  5.10  will  include new features that are not in earlier
+versions, some of which (such as named parentheses) have been  in  PCRE
+for some time. This list is with respect to Perl 5.10:
+(a)  Although  lookbehind  assertions  must match fixed length strings,
+each alternative branch of a lookbehind assertion can match a different
+length of string. Perl requires them all to have the same length.
+(b)  If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
+meta-character matches only at the very end of the string.
+(c) If PCRE_EXTRA is set, a backslash followed by a letter with no spe-
+cial meaning is faulted. Otherwise, like Perl, the backslash is quietly
+ignored.  (Perl can be made to issue a warning.)
+(d) If PCRE_UNGREEDY is set, the greediness of the  repetition  quanti-
+fiers is inverted, that is, by default they are not greedy, but if fol-
+lowed by a question mark they are.
+(e) PCRE_ANCHORED can be used at matching time to force a pattern to be
+tried only at the first matching position in the subject string.
+(f)  The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAP-
+TURE options for pcre_exec() have no Perl equivalents.
+(g) The \R escape sequence can be restricted to match only CR,  LF,  or
+CRLF by the PCRE_BSR_ANYCRLF option.
+(h) The callout facility is PCRE-specific.
+(i) The partial matching facility is PCRE-specific.
+(j) Patterns compiled by PCRE can be saved and re-used at a later time,
+even on different hosts that have the other endianness.
+(k) The alternative matching function (pcre_dfa_exec())  matches  in  a
+different way and is not Perl-compatible.
+(l)  PCRE  recognizes some special sequences such as (*CR) at the start
+of a pattern that set overall options that cannot be changed within the
+pattern.
+AUTHOR
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+REVISION
+Last updated: 11 September 2007
+Copyright (c) 1997-2007 University of Cambridge.
+------------------------------------------------------------------------------
+PCREPATTERN(3)                                                  PCREPATTERN(3)
+NAME
+PCRE - Perl-compatible regular expressions
+PCRE REGULAR EXPRESSION DETAILS
+The  syntax and semantics of the regular expressions that are supported
+by PCRE are described in detail below. There is a quick-reference  syn-
+tax summary in the pcresyntax page. PCRE tries to match Perl syntax and
+semantics as closely as it can. PCRE  also  supports  some  alternative
+regular  expression  syntax (which does not conflict with the Perl syn-
+tax) in order to provide some compatibility with regular expressions in
+Python, .NET, and Oniguruma.
+Perl's  regular expressions are described in its own documentation, and
+regular expressions in general are covered in a number of  books,  some
+of  which  have  copious  examples. Jeffrey Friedl's "Mastering Regular
+Expressions", published by  O'Reilly,  covers  regular  expressions  in
+great  detail.  This  description  of  PCRE's  regular  expressions  is
+intended as reference material.
+The original operation of PCRE was on strings of  one-byte  characters.
+However,  there is now also support for UTF-8 character strings. To use
+this, you must build PCRE to  include  UTF-8  support,  and  then  call
+pcre_compile()  with  the  PCRE_UTF8  option.  How this affects pattern
+matching is mentioned in several places below. There is also a  summary
+of  UTF-8  features  in  the  section on UTF-8 support in the main pcre
+page.
+The remainder of this document discusses the  patterns  that  are  sup-
+ported  by  PCRE when its main matching function, pcre_exec(), is used.
+From  release  6.0,   PCRE   offers   a   second   matching   function,
+pcre_dfa_exec(),  which matches using a different algorithm that is not
+Perl-compatible. Some of the features discussed below are not available
+when  pcre_dfa_exec()  is used. The advantages and disadvantages of the
+alternative function, and how it differs from the normal function,  are
+discussed in the pcrematching page.
+NEWLINE CONVENTIONS
+PCRE  supports five different conventions for indicating line breaks in
+strings: a single CR (carriage return) character, a  single  LF  (line-
+feed) character, the two-character sequence CRLF, any of the three pre-
+ceding, or any Unicode newline sequence. The pcreapi page  has  further
+discussion  about newlines, and shows how to set the newline convention
+in the options arguments for the compiling and matching functions.
+It is also possible to specify a newline convention by starting a  pat-
+tern string with one of the following five sequences:
+(*CR)        carriage return
+(*LF)        linefeed
+(*CRLF)      carriage return, followed by linefeed
+(*ANYCRLF)   any of the three above
+(*ANY)       all Unicode newline sequences
+These override the default and the options given to pcre_compile(). For
+example, on a Unix system where LF is the default newline sequence, the
+pattern
+(*CR)a.b
+changes the convention to CR. That pattern matches "a\nb" because LF is
+no longer a newline. Note that these special settings,  which  are  not
+Perl-compatible,  are  recognized  only at the very start of a pattern,
+and that they must be in upper case.  If  more  than  one  of  them  is
+present, the last one is used.
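+As an illustration only, the scan for these leading items can be
+sketched in C as below. The enum and the function leading_newline()
+are hypothetical names invented for this sketch. They are not part of
+the PCRE API, which handles these items inside pcre_compile().
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the five conventions (not PCRE constants). */
typedef enum { NL_DEFAULT, NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY } newline_conv;

/* Scans the (*...) newline items at the very start of pat. Returns the
   last one found, or NL_DEFAULT if there are none, and stores in *skip
   the number of bytes they occupy. Comparison is case-sensitive, so
   only upper case spellings are recognized. */
static newline_conv leading_newline(const char *pat, size_t *skip)
{
    static const struct { const char *text; newline_conv conv; } items[] = {
        { "(*CR)", NL_CR },       { "(*LF)", NL_LF },
        { "(*CRLF)", NL_CRLF },   { "(*ANYCRLF)", NL_ANYCRLF },
        { "(*ANY)", NL_ANY }
    };
    newline_conv result = NL_DEFAULT;
    size_t pos = 0;
    int found = 1;
    while (found) {
        found = 0;
        for (size_t i = 0; i < sizeof items / sizeof items[0]; i++) {
            size_t n = strlen(items[i].text);
            if (strncmp(pat + pos, items[i].text, n) == 0) {
                result = items[i].conv;   /* a later item overrides */
                pos += n;
                found = 1;
                break;
            }
        }
    }
    *skip = pos;
    return result;
}
```
+A sequence such as (*CR) in the middle of a pattern is not examined by
+this loop, just as the text above restricts these items to the very
+start of the pattern.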
+The  newline  convention  does  not  affect what the \R escape sequence
+matches. By default, this is any Unicode  newline  sequence,  for  Perl
+compatibility.  However, this can be changed; see the description of \R
+in the section entitled "Newline sequences" below. A change of \R  set-
+ting can be combined with a change of newline convention.
+CHARACTERS AND METACHARACTERS
+A  regular  expression  is  a pattern that is matched against a subject
+string from left to right. Most characters stand for  themselves  in  a
+pattern,  and  match  the corresponding characters in the subject. As a
+trivial example, the pattern
+The quick brown fox
+matches a portion of a subject string that is identical to itself. When
+caseless  matching is specified (the PCRE_CASELESS option), letters are
+matched independently of case. In UTF-8 mode, PCRE  always  understands
+the  concept  of case for characters whose values are less than 128, so
+caseless matching is always possible. For characters with  higher  val-
+ues,  the concept of case is supported if PCRE is compiled with Unicode
+property support, but not otherwise.   If  you  want  to  use  caseless
+matching  for  characters  128  and above, you must ensure that PCRE is
+compiled with Unicode property support as well as with UTF-8 support.
+The power of regular expressions comes  from  the  ability  to  include
+alternatives  and  repetitions in the pattern. These are encoded in the
+pattern by the use of metacharacters, which do not stand for themselves
+but instead are interpreted in some special way.
+There  are  two different sets of metacharacters: those that are recog-
+nized anywhere in the pattern except within square brackets, and  those
+that  are  recognized  within square brackets. Outside square brackets,
+the metacharacters are as follows:
+\      general escape character with several uses
+^      assert start of string (or line, in multiline mode)
+$      assert end of string (or line, in multiline mode)
+.      match any character except newline (by default)
+[      start character class definition
+|      start of alternative branch
+(      start subpattern
+)      end subpattern
+?      extends the meaning of (
+       also 0 or 1 quantifier
+       also quantifier minimizer
+*      0 or more quantifier
++      1 or more quantifier
+       also "possessive quantifier"
+{      start min/max quantifier
+Part of a pattern that is in square brackets  is  called  a  "character
+class". In a character class the only metacharacters are:
+\      general escape character
+^      negate the class, but only if the first character
+-      indicates character range
+[      POSIX character class (only if followed by POSIX
+       syntax)
+]      terminates the character class
+The  following sections describe the use of each of the metacharacters.
+BACKSLASH
+The backslash character has several uses. Firstly, if it is followed by
+a  non-alphanumeric  character,  it takes away any special meaning that
+character may have. This  use  of  backslash  as  an  escape  character
+applies both inside and outside character classes.
+For  example,  if  you want to match a * character, you write \* in the
+pattern.  This escaping action applies whether  or  not  the  following
+character  would  otherwise be interpreted as a metacharacter, so it is
+always safe to precede a non-alphanumeric  with  backslash  to  specify
+that  it stands for itself. In particular, if you want to match a back-
+slash, you write \\.
+If a pattern is compiled with the PCRE_EXTENDED option,  whitespace  in
+the  pattern (other than in a character class) and characters between a
+# outside a character class and the next newline are ignored. An escap-
+ing  backslash  can  be  used to include a whitespace or # character as
+part of the pattern.
+If you want to remove the special meaning from a  sequence  of  charac-
+ters,  you can do so by putting them between \Q and \E. This is differ-
+ent from Perl in that $ and  @  are  handled  as  literals  in  \Q...\E
+sequences  in  PCRE, whereas in Perl, $ and @ cause variable interpola-
+tion. Note the following examples:
+Pattern            PCRE matches   Perl matches
+\Qabc$xyz\E        abc$xyz        abc followed by the
+                                  contents of $xyz
+\Qabc\$xyz\E       abc\$xyz       abc\$xyz
+\Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz
+The \Q...\E sequence is recognized both inside  and  outside  character
+classes.
+Non-printing characters
+A second use of backslash provides a way of encoding non-printing char-
+acters in patterns in a visible manner. There is no restriction on  the
+appearance  of non-printing characters, apart from the binary zero that
+terminates a pattern, but when a pattern  is  being  prepared  by  text
+editing,  it  is  usually  easier  to  use  one of the following escape
+sequences than the binary character it represents:
+\a        alarm, that is, the BEL character (hex 07)
+\cx       "control-x", where x is any character
+\e        escape (hex 1B)
+\f        formfeed (hex 0C)
+\n        linefeed (hex 0A)
+\r        carriage return (hex 0D)
+\t        tab (hex 09)
+\ddd      character with octal code ddd, or backreference
+\xhh      character with hex code hh
+\x{hhh..} character with hex code hhh..
+The precise effect of \cx is as follows: if x is a lower  case  letter,
+it  is converted to upper case. Then bit 6 of the character (hex 40) is
+inverted.  Thus \cz becomes hex 1A, but \c{ becomes hex 3B,  while  \c;
+becomes hex 7B.
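+The rule can be written as a small C function. This is a sketch that
+assumes an ASCII character set; control_escape() is a hypothetical
+name, not a PCRE function.
```c
/* Value of the escape \cx under the rule above (ASCII assumed):
   a lower case letter is first made upper case, then bit 6 (hex 40)
   is inverted. Hypothetical helper, not part of PCRE. */
static unsigned char control_escape(unsigned char x)
{
    if (x >= 'a' && x <= 'z')
        x = (unsigned char)(x - 'a' + 'A');
    return (unsigned char)(x ^ 0x40);
}
```
+With this definition, control_escape('z') gives 0x1A and
+control_escape(';') gives 0x7B, matching the examples above.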
+After  \x, from zero to two hexadecimal digits are read (letters can be
+in upper or lower case). Any number of hexadecimal  digits  may  appear
+between  \x{  and  },  but the value of the character code must be less
+than 256 in non-UTF-8 mode, and less than 2**31 in UTF-8 mode. That is,
+the  maximum value in hexadecimal is 7FFFFFFF. Note that this is bigger
+than the largest Unicode code point, which is 10FFFF.
+If characters other than hexadecimal digits appear between \x{  and  },
+or if there is no terminating }, this form of escape is not recognized.
+Instead, the initial \x will be  interpreted  as  a  basic  hexadecimal
+escape,  with  no  following  digits, giving a character whose value is
+zero.
+Characters whose value is less than 256 can be defined by either of the
+two  syntaxes  for  \x. There is no difference in the way they are han-
+dled. For example, \xdc is exactly the same as \x{dc}.
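+A minimal C sketch of these \x rules follows. The function
+parse_hex_escape() is hypothetical. It receives a pointer to the "x"
+that follows the backslash and reports how many bytes it consumed. It
+returns -1 when the value is too large for the mode; PCRE itself
+reports a compile-time error in that case.
```c
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not one. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper. s points at the 'x' after the backslash. Returns
   the character value, or -1 if it is too large for the mode, and sets
   *used to the number of bytes consumed, including the 'x'. */
static long parse_hex_escape(const char *s, int utf8, size_t *used)
{
    unsigned long long v = 0;
    size_t i;
    if (s[1] == '{') {
        for (i = 2; hex_val(s[i]) >= 0; i++) {
            if (v <= 0x7FFFFFFFULL)        /* stop growing once too big */
                v = v * 16 + (unsigned long long)hex_val(s[i]);
        }
        if (s[i] == '}') {
            *used = i + 1;
            return v < (utf8 ? 0x80000000ULL : 0x100ULL) ? (long)v : -1;
        }
        /* Not a valid \x{...}: treat as \x with no digits, value zero. */
        *used = 1;
        return 0;
    }
    /* Plain \x: zero to two hexadecimal digits. */
    for (i = 1; i < 3 && hex_val(s[i]) >= 0; i++)
        v = v * 16 + (unsigned long long)hex_val(s[i]);
    *used = i;
    return (long)v;
}
```
+Both "xdc" and "x{dc}" yield 0xDC. "x{zz}" consumes only the "x" and
+yields zero, so the "{zz}" that follows is left for ordinary parsing.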
+After \0 up to two further octal digits are read. If  there  are  fewer
+than  two  digits,  just  those  that  are  present  are used. Thus the
+sequence \0\x\07 specifies two binary zeros followed by a BEL character
+(code  value 7). Make sure you supply two digits after the initial zero
+if the pattern character that follows is itself an octal digit.
+The handling of a backslash followed by a digit other than 0 is compli-
+cated.  Outside a character class, PCRE reads it and any following dig-
+its as a decimal number. If the number is less than  10,  or  if  there
+have been at least that many previous capturing left parentheses in the
+expression, the entire  sequence  is  taken  as  a  back  reference.  A
+description  of how this works is given later, following the discussion
+of parenthesized subpatterns.
+Inside a character class, or if the decimal number is  greater  than  9
+and  there have not been that many capturing subpatterns, PCRE re-reads
+up to three octal digits following the backslash, and uses them to gen-
+erate  a data character. Any subsequent digits stand for themselves. In
+non-UTF-8 mode, the value of a character specified  in  octal  must  be
+less  than  \400.  In  UTF-8 mode, values up to \777 are permitted. For
+example:
+\040   is another way of writing a space
+\40    is the same, provided there are fewer than 40
+       previous capturing subpatterns
+\7     is always a back reference
+\11    might be a back reference, or another way of
+       writing a tab
+\011   is always a tab
+\0113  is a tab followed by the character "3"
+\113   might be a back reference, otherwise the
+       character with octal code 113
+\377   might be a back reference, otherwise
+       the byte consisting entirely of 1 bits
+\81    is either a back reference, or a binary zero
+       followed by the two characters "8" and "1"
+Note that octal values of 100 or greater must not be  introduced  by  a
+leading zero, because no more than three octal digits are ever read.
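+The choice between a back reference and an octal character can be
+sketched in C as below. This is a hypothetical model of the rules just
+described, not PCRE source code. classify_digit_escape() receives the
+text after the backslash and the number of capturing subpatterns seen
+so far. As in the \81 example, a leading 8 or 9 that is not a back
+reference yields a binary zero and consumes no digits.
```c
#include <stddef.h>

typedef enum { ESC_BACKREF, ESC_CHAR, ESC_ERROR } digit_escape_kind;

typedef struct {
    digit_escape_kind kind;
    unsigned value;   /* back reference number, or character value */
    size_t used;      /* digits consumed after the backslash */
} digit_escape;

/* Hypothetical model. s points at the first digit after the backslash;
   captures is the number of capturing left parentheses seen so far. */
static digit_escape classify_digit_escape(const char *s, int captures,
                                          int in_class, int utf8)
{
    digit_escape r = { ESC_CHAR, 0, 0 };
    /* Outside a class, a number not starting with 0 may be a back
       reference. */
    if (!in_class && s[0] != '0') {
        unsigned long n = 0;
        size_t i = 0;
        for (; s[i] >= '0' && s[i] <= '9'; i++) {
            if (n < 100000UL)                    /* avoid overflow */
                n = n * 10 + (unsigned long)(s[i] - '0');
        }
        if (n < 10 || (captures >= 0 && n <= (unsigned long)captures)) {
            r.kind = ESC_BACKREF;
            r.value = (unsigned)n;
            r.used = i;
            return r;
        }
    }
    /* A leading 8 or 9 here gives a binary zero; the digits stay literal. */
    if (s[0] == '8' || s[0] == '9')
        return r;
    /* Otherwise read up to three octal digits. */
    while (r.used < 3 && s[r.used] >= '0' && s[r.used] <= '7') {
        r.value = r.value * 8 + (unsigned)(s[r.used] - '0');
        r.used++;
    }
    if (!utf8 && r.value > 0377)
        r.kind = ESC_ERROR;                      /* must be below \400 */
    return r;
}
```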
+All the sequences that define a single character value can be used both
+inside and outside character classes. In addition, inside  a  character
+class,  the  sequence \b is interpreted as the backspace character (hex
+08), and the sequences \R and \X are interpreted as the characters  "R"
+and  "X", respectively. Outside a character class, these sequences have
+different meanings (see below).
+Absolute and relative back references
+The sequence \g followed by an unsigned or a negative  number,  option-
+ally  enclosed  in braces, is an absolute or relative back reference. A
+named back reference can be coded as \g{name}. Back references are dis-
+cussed later, following the discussion of parenthesized subpatterns.
+Absolute and relative subroutine calls
+For  compatibility with Oniguruma, the non-Perl syntax \g followed by a
+name or a number enclosed either in angle brackets or single quotes, is
+an  alternative  syntax for referencing a subpattern as a "subroutine".
+Details are discussed later.   Note  that  \g{...}  (Perl  syntax)  and
+\g<...>  (Oniguruma  syntax)  are  not synonymous. The former is a back
+reference; the latter is a subroutine call.
+Generic character types
+Another use of backslash is for specifying generic character types. The
+following are always recognized:
+\d     any decimal digit
+\D     any character that is not a decimal digit
+\h     any horizontal whitespace character
+\H     any character that is not a horizontal whitespace character
+\s     any whitespace character
+\S     any character that is not a whitespace character
+\v     any vertical whitespace character
+\V     any character that is not a vertical whitespace character
+\w     any "word" character
+\W     any "non-word" character
+Each pair of escape sequences partitions the complete set of characters
+into two disjoint sets. Any given character matches one, and only  one,
+of each pair.
+These character type sequences can appear both inside and outside char-
+acter classes. They each match one character of the  appropriate  type.
+If  the current matching point is at the end of the subject string, all
+of them fail, since there is no character to match.
+For compatibility with Perl, \s does not match the VT  character  (code
+11).   This makes it different from the POSIX "space" class. The \s
+characters are HT (9), LF (10), FF (12), CR (13), and  space  (32).  If
+"use locale;" is included in a Perl script, \s may match the VT charac-
+ter. In PCRE, it never does.
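+The difference can be checked with a short C sketch. The two function
+names are hypothetical. isspace() is the standard C function; in the
+default "C" locale it accepts VT, as the POSIX "space" class does.
```c
#include <ctype.h>

/* Hypothetical helper: the \s set described above (HT, LF, FF, CR, and
   space). VT (11) is deliberately absent. */
static int matches_backslash_s(int c)
{
    return c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
}

/* Nonzero when \s and isspace() disagree about c (0..127). Intended for
   the default "C" locale, where isspace() also accepts VT. */
static int differs_from_isspace(int c)
{
    return matches_backslash_s(c) != (isspace(c) != 0);
}
```
+Over the codes 0 to 127, VT is the only character on which the two
+functions disagree.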
+In UTF-8 mode, characters with values greater than 128 never match  \d,
+\s, or \w, and always match \D, \S, and \W. This is true even when Uni-
+code character property support is available.  These  sequences  retain
+their original meanings from before UTF-8 support was available, mainly
+for efficiency reasons.
+The sequences \h, \H, \v, and \V are Perl 5.10 features. In contrast to
+the  other  sequences, these do match certain high-valued codepoints in
+UTF-8 mode.  The horizontal space characters are:
+U+0009     Horizontal tab
+U+0020     Space
+U+00A0     Non-break space
+U+1680     Ogham space mark
+U+180E     Mongolian vowel separator
+U+2000     En quad
+U+2001     Em quad
+U+2002     En space
+U+2003     Em space
+U+2004     Three-per-em space
+U+2005     Four-per-em space
+U+2006     Six-per-em space
+U+2007     Figure space
+U+2008     Punctuation space
+U+2009     Thin space
+U+200A     Hair space
+U+202F     Narrow no-break space
+U+205F     Medium mathematical space
+U+3000     Ideographic space
+The vertical space characters are:
+U+000A     Linefeed
+U+000B     Vertical tab
+U+000C     Formfeed
+U+000D     Carriage return
+U+0085     Next line
+U+2028     Line separator
+U+2029     Paragraph separator
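+Written as C predicates over Unicode code points, the two lists above
+look like this. is_hspace() and is_vspace() are hypothetical helper
+names used only for illustration.
```c
#include <stdint.h>

/* The horizontal space characters listed above. */
static int is_hspace(uint32_t c)
{
    switch (c) {
    case 0x0009: case 0x0020: case 0x00A0: case 0x1680: case 0x180E:
    case 0x202F: case 0x205F: case 0x3000:
        return 1;
    default:
        return c >= 0x2000 && c <= 0x200A;   /* En quad .. Hair space */
    }
}

/* The vertical space characters listed above. */
static int is_vspace(uint32_t c)
{
    return (c >= 0x000A && c <= 0x000D) || c == 0x0085 ||
           c == 0x2028 || c == 0x2029;
}
```
+Across the whole code point range, is_hspace() accepts 19 characters
+and is_vspace() accepts 7, and no character is in both sets.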
+A "word" character is an underscore or any character less than 256 that
+is  a  letter  or  digit.  The definition of letters and digits is con-
+trolled by PCRE's low-valued character tables, and may vary if  locale-
+specific  matching is taking place (see "Locale support" in the pcreapi
+page). For example, in a French locale such  as  "fr_FR"  in  Unix-like
+systems,  or "french" in Windows, some character codes greater than 128
+are used for accented letters, and these are matched by \w. The use  of
+locales with Unicode is discouraged.
+Newline sequences
+Outside  a  character class, by default, the escape sequence \R matches
+any Unicode newline sequence. This is a Perl 5.10 feature. In non-UTF-8
+mode \R is equivalent to the following:
+(?>\r\n|\n|\x0b|\f|\r|\x85)
+This  is  an  example  of an "atomic group", details of which are given
+below.  This particular group matches either the two-character sequence
+CR  followed  by  LF,  or  one  of  the single characters LF (linefeed,
+U+000A), VT (vertical tab, U+000B), FF (formfeed, U+000C), CR (carriage
+return, U+000D), or NEL (next line, U+0085). The two-character sequence
+is treated as a single unit that cannot be split.
+In UTF-8 mode, two additional characters whose codepoints  are  greater
+than 255 are added: LS (line separator, U+2028) and PS (paragraph sepa-
+rator, U+2029).  Unicode character property support is not  needed  for
+these characters to be recognized.
+It is possible to restrict \R to match only CR, LF, or CRLF (instead of
+the complete set  of  Unicode  line  endings)  by  setting  the  option
+PCRE_BSR_ANYCRLF either at compile time or when the pattern is matched.
+(BSR is an abbreviation for "backslash R".) This can be made the default
+when  PCRE  is  built;  if this is the case, the other behaviour can be
+requested via the PCRE_BSR_UNICODE option.   It  is  also  possible  to
+specify  these  settings  by  starting a pattern string with one of the
+following sequences:
+(*BSR_ANYCRLF)   CR, LF, or CRLF only
+(*BSR_UNICODE)   any Unicode newline sequence
+These override the default and the options given to pcre_compile(), but
+they can be overridden by options given to pcre_exec(). Note that these
+special settings, which are not Perl-compatible, are recognized only at
+the  very  start  of a pattern, and that they must be in upper case. If
+more than one of them is present, the last one is  used.  They  can  be
+combined  with  a  change of newline convention, for example, a pattern
+can start with:
+(*ANY)(*BSR_ANYCRLF)
+Inside a character class, \R matches the letter "R".
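+Outside a character class, \R can be modelled as a C function that
+returns the length of the match at a given position. In this sketch,
+which uses hypothetical names throughout, the subject is an array of
+code points and the anycrlf flag plays the part of PCRE_BSR_ANYCRLF.
+Returning the two-unit length for CRLF, and never one unit, mirrors
+the atomic group shown above.
```c
#include <stddef.h>
#include <stdint.h>

/* Length in code points of the \R match at subject[pos], or 0 if \R
   does not match there. With anycrlf set, only CR, LF, and CRLF are
   recognized. LS and PS are considered only when utf8 is set. */
static size_t match_bsr(const uint32_t *subject, size_t len, size_t pos,
                        int anycrlf, int utf8)
{
    if (pos >= len)
        return 0;
    uint32_t c = subject[pos];
    if (c == 0x0D)   /* CR: a following LF belongs to the same unit */
        return (pos + 1 < len && subject[pos + 1] == 0x0A) ? 2 : 1;
    if (c == 0x0A)
        return 1;
    if (anycrlf)
        return 0;
    if (c == 0x0B || c == 0x0C || c == 0x85)
        return 1;
    if (utf8 && (c == 0x2028 || c == 0x2029))
        return 1;
    return 0;
}
```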
+Unicode character properties
+When PCRE is built with Unicode character property support, three addi-
+tional  escape sequences that match characters with specific properties
+are available.  When not in UTF-8 mode, these sequences are  of  course
+limited  to  testing characters whose codepoints are less than 256, but
+they do work in this mode.  The extra escape sequences are:
+\p{xx}   a character with the xx property
+\P{xx}   a character without the xx property
+\X       an extended Unicode sequence
+The property names represented by xx above are limited to  the  Unicode
+script names, the general category properties, and "Any", which matches
+any character (including newline). Other properties such as "InMusical-
+Symbols"  are  not  currently supported by PCRE. Note that \P{Any} does
+not match any characters, so always causes a match failure.
+Sets of Unicode characters are defined as belonging to certain scripts.
+A  character from one of these sets can be matched using a script name.
+For example:
+\p{Greek}
+\P{Han}
+Those that are not part of an identified script are lumped together  as
+"Common". The current list of scripts is:
+Arabic,  Armenian,  Balinese,  Bengali,  Bopomofo,  Braille,  Buginese,
+Buhid,  Canadian_Aboriginal,  Cherokee,  Common,   Coptic,   Cuneiform,
+Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic,
+Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew,  Hira-
+gana,  Inherited,  Kannada,  Katakana,  Kharoshthi,  Khmer, Lao, Latin,
+Limbu,  Linear_B,  Malayalam,  Mongolian,  Myanmar,  New_Tai_Lue,  Nko,
+Ogham,  Old_Italic,  Old_Persian, Oriya, Osmanya, Phags_Pa, Phoenician,
+Runic,  Shavian,  Sinhala,  Syloti_Nagri,  Syriac,  Tagalog,  Tagbanwa,
+Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Yi.
+Each  character has exactly one general category property, specified by
+a two-letter abbreviation. For compatibility with Perl, negation can be
+specified  by  including a circumflex between the opening brace and the
+property name. For example, \p{^Lu} is the same as \P{Lu}.
+If only one letter is specified with \p or \P, it includes all the gen-
+eral  category properties that start with that letter. In this case, in
+the absence of negation, the curly brackets in the escape sequence  are
+optional; these two examples have the same effect:
+\p{L}
+\pL
+The following general category property codes are supported:
+C     Other
+Cc    Control
+Cf    Format
+Cn    Unassigned
+Co    Private use
+Cs    Surrogate
+L     Letter
+Ll    Lower case letter
+Lm    Modifier letter
+Lo    Other letter
+Lt    Title case letter
+Lu    Upper case letter
+M     Mark
+Mc    Spacing mark
+Me    Enclosing mark
+Mn    Non-spacing mark
+N     Number
+Nd    Decimal number
+Nl    Letter number
+No    Other number
+P     Punctuation
+Pc    Connector punctuation
+Pd    Dash punctuation
+Pe    Close punctuation
+Pf    Final punctuation
+Pi    Initial punctuation
+Po    Other punctuation
+Ps    Open punctuation
+S     Symbol
+Sc    Currency symbol
+Sk    Modifier symbol
+Sm    Mathematical symbol
+So    Other symbol
+Z     Separator
+Zl    Line separator
+Zp    Paragraph separator
+Zs    Space separator
+The  special property L& is also supported: it matches a character that
+has the Lu, Ll, or Lt property, in other words, a letter  that  is  not
+classified as a modifier or "other".
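+How a property name selects general categories can be sketched in C,
+assuming the caller already knows the character's two-letter category
+(PCRE looks this up in its Unicode tables). category_matches() is a
+hypothetical name. Script names and "Any" are left out of the sketch.
```c
#include <string.h>

/* Does a character whose general category is cat (e.g. "Lu") satisfy
   the property name spec as written inside \p{...}? spec may be a
   two-letter code, a single letter, or "L&", optionally preceded by
   '^' for negation. */
static int category_matches(const char *cat, const char *spec)
{
    int negate = 0, hit;
    if (spec[0] == '^') {
        negate = 1;
        spec++;
    }
    if (strcmp(spec, "L&") == 0)          /* Lu, Ll, or Lt */
        hit = strcmp(cat, "Lu") == 0 || strcmp(cat, "Ll") == 0 ||
              strcmp(cat, "Lt") == 0;
    else if (strlen(spec) == 1)           /* e.g. "L" covers Ll, Lu, ... */
        hit = cat[0] == spec[0];
    else
        hit = strcmp(cat, spec) == 0;
    return negate ? !hit : hit;
}
```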
+The  Cs  (Surrogate)  property  applies only to characters in the range
+U+D800 to U+DFFF. Such characters are not valid in UTF-8  strings  (see
+RFC 3629) and so cannot be tested by PCRE, unless UTF-8 validity check-
+ing has been turned off (see the discussion  of  PCRE_NO_UTF8_CHECK  in
+the pcreapi page).
+The  long  synonyms  for  these  properties that Perl supports (such as
+\p{Letter}) are not supported by PCRE, nor is it  permitted  to  prefix
+any of these properties with "Is".
+No character that is in the Unicode table has the Cn (unassigned) prop-
+erty.  Instead, this property is assumed for any code point that is not
+in the Unicode table.
+Specifying  caseless  matching  does not affect these escape sequences.
+For example, \p{Lu} always matches only upper case letters.
+The \X escape matches any number of Unicode  characters  that  form  an
+extended Unicode sequence. \X is equivalent to
+(?>\PM\pM*)
+That  is,  it matches a character without the "mark" property, followed
+by zero or more characters with the "mark"  property,  and  treats  the
+sequence  as  an  atomic group (see below).  Characters with the "mark"
+property are typically accents that  affect  the  preceding  character.
+None  of  them  have  codepoints less than 256, so in non-UTF-8 mode \X
+matches any one character.
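+The effect of (?>\PM\pM*) can be sketched in C over an array of
+general category codes, one per character. extended_sequence_len() is
+a hypothetical helper. A real implementation would obtain the
+categories from the Unicode tables instead of from the caller.
```c
#include <stddef.h>

/* Length of the \X match starting at index pos, where cats[i] is the
   general category of the i-th character ("Mn", "Lu", ...). Returns 0
   if \X fails there. It takes one non-mark character, then every
   following mark, with no backtracking into the run of marks. */
static size_t extended_sequence_len(const char *const *cats, size_t n,
                                    size_t pos)
{
    if (pos >= n || cats[pos][0] == 'M')
        return 0;
    size_t end = pos + 1;
    while (end < n && cats[end][0] == 'M')
        end++;
    return end - pos;
}
```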
+Matching characters by Unicode property is not fast, because  PCRE  has
+to  search  a  structure  that  contains data for over fifteen thousand
+characters. That is why the traditional escape sequences such as \d and
+\w do not use Unicode properties in PCRE.
+Resetting the match start
+The escape sequence \K, which is a Perl 5.10 feature, causes any previ-
+ously matched characters not  to  be  included  in  the  final  matched
+sequence. For example, the pattern:
+foo\Kbar
+matches  "foobar",  but reports that it has matched "bar". This feature
+is similar to a lookbehind assertion (described  below).   However,  in
+this case, the part of the subject before the real match need not be of
+fixed length, as a lookbehind assertion must be. The use of \K does
+not  interfere  with  the setting of captured substrings.  For example,
+when the pattern
+(foo)\Kbar
+matches "foobar", the first substring is still set to "foo".
+Simple assertions
+The final use of backslash is for certain simple assertions. An  asser-
+tion  specifies a condition that has to be met at a particular point in
+a match, without consuming any characters from the subject string.  The
+use  of subpatterns for more complicated assertions is described below.
+The backslashed assertions are:
+\b     matches at a word boundary
+\B     matches when not at a word boundary
+\A     matches at the start of the subject
+\Z     matches at the end of the subject
+       also matches before a newline at the end of the subject
+\z     matches only at the end of the subject
+\G     matches at the first matching position in the subject
+These assertions may not appear in character classes (but note that  \b
+has a different meaning, namely the backspace character, inside a char-
+acter class).
+A word boundary is a position in the subject string where  the  current
+character  and  the previous character do not both match \w or \W (i.e.
+one matches \w and the other matches \W), or the start or  end  of  the
+string if the first or last character matches \w, respectively.
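+The following C sketch implements this definition. It assumes PCRE's
+default character tables, so only ASCII letters, digits, and
+underscore are word characters, and its function names are
+hypothetical. \B is the negation of at_word_boundary().
```c
#include <stddef.h>

/* Word character under the default tables: ASCII letters, digits, and
   underscore. Locale-specific tables may add more. */
static int is_word_char(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_';
}

/* \b at offset pos (0..len): the characters on either side differ in
   "wordness". A position outside the string counts as non-word, which
   covers the start and end cases. */
static int at_word_boundary(const char *s, size_t len, size_t pos)
{
    int before = pos > 0 && is_word_char((unsigned char)s[pos - 1]);
    int after = pos < len && is_word_char((unsigned char)s[pos]);
    return before != after;
}
```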
+The  \A,  \Z,  and \z assertions differ from the traditional circumflex
+and dollar (described in the next section) in that they only ever match
+at  the  very start and end of the subject string, whatever options are
+set. Thus, they are independent of multiline mode. These  three  asser-
+tions are not affected by the PCRE_NOTBOL or PCRE_NOTEOL options, which
+affect only the behaviour of the circumflex and dollar  metacharacters.
+However,  if the startoffset argument of pcre_exec() is non-zero, indi-
+cating that matching is to start at a point other than the beginning of
+the  subject,  \A  can never match. The difference between \Z and \z is
+that \Z matches before a newline at the end of the string as well as at
+the very end, whereas \z matches only at the end.
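+These three assertions reduce to simple position tests. The C sketch
+below assumes LF as the newline convention and uses hypothetical
+function names. None of the functions takes a multiline flag, which
+reflects their independence from that mode.
```c
#include <stddef.h>

/* \A: only at the very start of the subject. */
static int at_A(size_t pos)
{
    return pos == 0;
}

/* \Z: at the very end, or just before a newline that ends the subject
   (LF is assumed to be the newline). */
static int at_Z(const char *s, size_t len, size_t pos)
{
    return pos == len || (pos + 1 == len && s[pos] == '\n');
}

/* \z: only at the very end of the subject. */
static int at_z(size_t len, size_t pos)
{
    return pos == len;
}
```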
+The  \G assertion is true only when the current matching position is at
+the start point of the match, as specified by the startoffset  argument
+of  pcre_exec().  It  differs  from \A when the value of startoffset is
+non-zero. By calling pcre_exec() multiple times with appropriate  argu-
+ments, you can mimic Perl's /g option, and it is in this kind of imple-
+mentation where \G can be useful.
+Note, however, that PCRE's interpretation of \G, as the  start  of  the
+current match, is subtly different from Perl's, which defines it as the
+end of the previous match. In Perl, these can  be  different  when  the
+previously  matched  string was empty. Because PCRE does just one match
+at a time, it cannot reproduce this behaviour.
+If all the alternatives of a pattern begin with \G, the  expression  is
+anchored to the starting match position, and the "anchored" flag is set
+in the compiled regular expression.
+CIRCUMFLEX AND DOLLAR
+Outside a character class, in the default matching mode, the circumflex
+character  is  an  assertion  that is true only if the current matching
+point is at the start of the subject string. If the  startoffset  argu-
+ment  of  pcre_exec()  is  non-zero,  circumflex can never match if the
+PCRE_MULTILINE option is unset. Inside a  character  class,  circumflex
+has an entirely different meaning (see below).
+Circumflex  need  not be the first character of the pattern if a number
+of alternatives are involved, but it should be the first thing in  each
+alternative  in  which  it appears if the pattern is ever to match that
+branch. If all possible alternatives start with a circumflex, that  is,
+if  the  pattern  is constrained to match only at the start of the sub-
+ject, it is said to be an "anchored" pattern.  (There  are  also  other
+constructs that can cause a pattern to be anchored.)
+A  dollar  character  is  an assertion that is true only if the current
+matching point is at the end of  the  subject  string,  or  immediately
+before a newline at the end of the string (by default). Dollar need not
+be the last character of the pattern if a number  of  alternatives  are
+involved,  but  it  should  be  the last item in any branch in which it
+appears. Dollar has no special meaning in a character class.
+The meaning of dollar can be changed so that it  matches  only  at  the
+very  end  of  the string, by setting the PCRE_DOLLAR_ENDONLY option at
+compile time. This does not affect the \Z assertion.
+The meanings of the circumflex and dollar characters are changed if the
+PCRE_MULTILINE  option  is  set.  When  this  is the case, a circumflex
+matches immediately after internal newlines as well as at the start  of
+the  subject  string.  It  does not match after a newline that ends the
+string. A dollar matches before any newlines in the string, as well  as
+at  the very end, when PCRE_MULTILINE is set. When newline is specified
+as the two-character sequence CRLF, isolated CR and  LF  characters  do
+not indicate newlines.
+For  example, the pattern /^abc$/ matches the subject string "def\nabc"
+(where \n represents a newline) in multiline mode, but  not  otherwise.
+Consequently,  patterns  that  are anchored in single line mode because
+all branches start with ^ are not anchored in  multiline  mode,  and  a
+match  for  circumflex  is  possible  when  the startoffset argument of
+pcre_exec() is non-zero. The PCRE_DOLLAR_ENDONLY option is  ignored  if
+PCRE_MULTILINE is set.
+Note  that  the sequences \A, \Z, and \z can be used to match the start
+and end of the subject in both modes, and if all branches of a  pattern
+start  with  \A it is always anchored, whether or not PCRE_MULTILINE is
+set.
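+The circumflex and dollar rules can be sketched in C as follows,
+assuming LF as the newline convention. The option bits OPT_MULTILINE
+and OPT_DOLLAR_ENDONLY are hypothetical stand-ins for PCRE_MULTILINE
+and PCRE_DOLLAR_ENDONLY. The sketch ignores the effects of
+startoffset, PCRE_NOTBOL, and PCRE_NOTEOL.
```c
#include <stddef.h>

/* Hypothetical option bits for this sketch only (not PCRE's values). */
enum { OPT_MULTILINE = 1, OPT_DOLLAR_ENDONLY = 2 };

/* Circumflex at offset pos (0..len), LF newline convention. */
static int circumflex_matches(const char *s, size_t len, size_t pos, int opts)
{
    if (pos == 0)
        return 1;
    /* Multiline: after an internal newline, but not one ending the string. */
    return (opts & OPT_MULTILINE) && s[pos - 1] == '\n' && pos < len;
}

/* Dollar at offset pos (0..len), LF newline convention. */
static int dollar_matches(const char *s, size_t len, size_t pos, int opts)
{
    if (pos == len)
        return 1;
    if (opts & OPT_MULTILINE)         /* DOLLAR_ENDONLY is ignored here */
        return s[pos] == '\n';
    if (opts & OPT_DOLLAR_ENDONLY)
        return 0;
    return pos + 1 == len && s[pos] == '\n';
}
```
+For the subject "def\nabc" used in the example above, a circumflex at
+offset 4 is accepted only when OPT_MULTILINE is set.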
+FULL STOP (PERIOD, DOT)
+Outside a character class, a dot in the pattern matches any one charac-
+ter  in  the subject string except (by default) a character that signi-
+fies the end of a line. In UTF-8 mode, the  matched  character  may  be
+more than one byte long.
+When  a line ending is defined as a single character, dot never matches
+that character; when the two-character sequence CRLF is used, dot  does
+not  match  CR  if  it  is immediately followed by LF, but otherwise it
+matches all characters (including isolated CRs and LFs). When any  Uni-
+code  line endings are being recognized, dot does not match CR or LF or
+any of the other line ending characters.
+The behaviour of dot with regard to newlines can  be  changed.  If  the
+PCRE_DOTALL  option  is  set,  a dot matches any one character, without
+exception. If the two-character sequence CRLF is present in the subject
+string, it takes two dots to match it.
+The  handling of dot is entirely independent of the handling of circum-
+flex and dollar, the only relationship being  that  they  both  involve
+newlines. Dot has no special meaning in a character class.
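+The following C sketch shows when a dot matches one byte in
+non-UTF-8 mode under each newline convention. The enum and the
+function are hypothetical. Following the rule above, in the CRLF case
+only a CR that begins a CRLF pair is refused.
```c
#include <stddef.h>

/* Newline conventions for this sketch (hypothetical names). */
typedef enum { DOT_NL_CR, DOT_NL_LF, DOT_NL_CRLF, DOT_NL_ANYCRLF,
               DOT_NL_ANY } dot_newline;

/* Does a dot match the byte s[pos] (pos < len) in non-UTF-8 mode? */
static int dot_matches(const unsigned char *s, size_t len, size_t pos,
                       dot_newline nl, int dotall)
{
    unsigned char c = s[pos];
    if (dotall)
        return 1;                     /* PCRE_DOTALL: no exceptions */
    switch (nl) {
    case DOT_NL_CR:
        return c != '\r';
    case DOT_NL_LF:
        return c != '\n';
    case DOT_NL_CRLF:                 /* isolated CRs and LFs match */
        return !(c == '\r' && pos + 1 < len && s[pos + 1] == '\n');
    case DOT_NL_ANYCRLF:
        return c != '\r' && c != '\n';
    case DOT_NL_ANY:                  /* LF, VT, FF, CR, NEL */
        return c != '\r' && c != '\n' && c != 0x0B && c != 0x0C &&
               c != 0x85;
    }
    return 1;
}
```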
+MATCHING A SINGLE BYTE
+Outside a character class, the escape sequence \C matches any one byte,
+both in and out of UTF-8 mode. Unlike a  dot,  it  always  matches  any
+line-ending  characters.  The  feature  is provided in Perl in order to
+match individual bytes in UTF-8 mode. Because it breaks up UTF-8  char-
+acters  into individual bytes, what remains in the string may be a mal-
+formed UTF-8 string. For this reason, the \C escape  sequence  is  best
+avoided.
+PCRE  does  not  allow \C to appear in lookbehind assertions (described
+below), because in UTF-8 mode this would make it impossible  to  calcu-
+late the length of the lookbehind.
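+It is easy to see in C why \C can leave a malformed string. In UTF-8,
+every byte after the first in a multi-byte character has the bit
+pattern 10xxxxxx. The hypothetical helpers below report whether a
+single-byte step from a given offset lands inside such a character.
```c
#include <stddef.h>

/* UTF-8 continuation bytes have the bit pattern 10xxxxxx. */
static int is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* After \C consumes exactly one byte at offset pos, does the remainder
   of the subject start in the middle of a multi-byte character? */
static int c_escape_splits_char(const unsigned char *s, size_t len,
                                size_t pos)
{
    return pos + 1 < len && is_utf8_continuation(s[pos + 1]);
}
```
+For the UTF-8 subject "x", 0xC3, 0xA9, "y" (where 0xC3 0xA9 encodes
+U+00E9), a \C at offset 1 leaves the lone byte 0xA9 at the front of
+the remainder.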
+SQUARE BRACKETS AND CHARACTER CLASSES
+An opening square bracket introduces a character class, terminated by a
+closing square bracket. A closing square bracket on its own is not spe-
+cial. If a closing square bracket is required as a member of the class,
+it should be the first data character in the class  (after  an  initial
+circumflex, if present) or escaped with a backslash.
+A  character  class matches a single character in the subject. In UTF-8
+mode, the character may occupy more than one byte. A matched  character
+must be in the set of characters defined by the class, unless the first
+character in the class definition is a circumflex, in  which  case  the
+subject  character  must  not  be in the set defined by the class. If a
+circumflex is actually required as a member of the class, ensure it  is
+not the first character, or escape it with a backslash.
+For  example, the character class [aeiou] matches any lower case vowel,
+while [^aeiou] matches any character that is not a  lower  case  vowel.
+Note that a circumflex is just a convenient notation for specifying the
+characters that are in the class by enumerating those that are  not.  A
+class  that starts with a circumflex is not an assertion: it still con-
+sumes a character from the subject string, and therefore  it  fails  if
+the current pointer is at the end of the string.
+In  UTF-8 mode, characters with values greater than 255 can be included
+in a class as a literal string of bytes, or by using the  \x{  escaping
+mechanism.
+When  caseless  matching  is set, any letters in a class represent both
+their upper case and lower case versions, so for  example,  a  caseless
+[aeiou]  matches  "A"  as well as "a", and a caseless [^aeiou] does not
+match "A", whereas a caseful version would. In UTF-8 mode, PCRE  always
+understands  the  concept  of case for characters whose values are less
+than 128, so caseless matching is always possible. For characters  with
+higher  values,  the  concept  of case is supported if PCRE is compiled
+with Unicode property support, but not otherwise.  If you want  to  use
+caseless  matching  for  characters 128 and above, you must ensure that
+PCRE is compiled with Unicode property support as well  as  with  UTF-8
+support.
+Characters  that  might  indicate  line breaks are never treated in any
+special way  when  matching  character  classes,  whatever  line-ending
+sequence  is  in  use,  and  whatever  setting  of  the PCRE_DOTALL and
+PCRE_MULTILINE options is used. A class such as [^a] always matches one
+of these characters.
+The  minus (hyphen) character can be used to specify a range of charac-
+ters in a character  class.  For  example,  [d-m]  matches  any  letter
+between  d  and  m,  inclusive.  If  a minus character is required in a
+class, it must be escaped with a backslash  or  appear  in  a  position
+where  it cannot be interpreted as indicating a range, typically as the
+first or last character in the class.
+It is not possible to have the literal character "]" as the end charac-
+ter  of a range. A pattern such as [W-]46] is interpreted as a class of
+two characters ("W" and "-") followed by a literal string "46]", so  it
+would  match  "W46]"  or  "-46]". However, if the "]" is escaped with a
+backslash it is interpreted as the end of range, so [W-\]46] is  inter-
+preted  as a class containing a range followed by two other characters.
+The octal or hexadecimal representation of "]" can also be used to  end
+a range.
+Ranges  operate in the collating sequence of character values. They can
+also  be  used  for  characters  specified  numerically,  for   example
+[\000-\037].  In UTF-8 mode, ranges can include characters whose values
+are greater than 255, for example [\x{100}-\x{2ff}].
+If a range that includes letters is used when caseless matching is set,
+it matches the letters in either case. For example, [W-c] is equivalent
+to [][\\^_`wxyzabc], matched caselessly,  and  in  non-UTF-8  mode,  if
+character  tables  for  a French locale are in use, [\xc8-\xcb] matches
+accented E characters in both cases. In UTF-8 mode, PCRE supports the
+concept of case for characters with values of 128 and above only when
+it is compiled with Unicode property support.
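+The caseless expansion of [W-c] can be reproduced with a short sketch.
+It assumes the C library's "C" locale (plain ASCII case rules) rather
+than PCRE's own character tables, so it does not cover locale-specific
+letters such as the accented E characters mentioned above. The function
+name is hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Expand the range [lo-hi] for caseless matching under "C" locale rules:
   every code in the range is a member, and for each letter in the range
   its other case is added as well. set[] receives 256 flags (0 or 1). */
static void caseless_range(unsigned char lo, unsigned char hi,
                           unsigned char set[256])
{
    memset(set, 0, 256);
    for (int c = lo; c <= hi; c++) {
        set[c] = 1;
        if (isalpha(c)) {
            set[tolower(c)] = 1;
            set[toupper(c)] = 1;
        }
    }
}
```

+For [W-c] this yields the 13 codes from "W" to "c" plus the other
+cases of the seven letters among them, 20 characters in all.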
+The character types \d, \D, \p, \P, \s, \S, \w, and \W may also  appear
+in  a  character  class,  and add the characters that they match to the
+class. For example, [\dABCDEF] matches any hexadecimal digit. A circum-
+flex  can  conveniently  be used with the upper case character types to
+specify a more restricted set of characters  than  the  matching  lower
+case  type.  For example, the class [^\W_] matches any letter or digit,
+but not underscore.
+The only metacharacters that are recognized in  character  classes  are
+backslash,  hyphen  (only  where  it can be interpreted as specifying a
+range), circumflex (only at the start), opening  square  bracket  (only
+when  it can be interpreted as introducing a POSIX class name - see the
+next section), and the terminating  closing  square  bracket.  However,
+escaping other non-alphanumeric characters does no harm.
+POSIX CHARACTER CLASSES
+Perl supports the POSIX notation for character classes. This uses names
+enclosed by [: and :] within the enclosing square brackets.  PCRE  also
+supports this notation. For example,
+[01[:alpha:]%]
+matches "0", "1", any alphabetic character, or "%". The supported class
+names are
+alnum    letters and digits
+alpha    letters
+ascii    character codes 0 - 127
+blank    space or tab only
+cntrl    control characters
+digit    decimal digits (same as \d)
+graph    printing characters, excluding space
+lower    lower case letters
+print    printing characters, including space
+punct    printing characters, excluding letters and digits
+space    white space (not quite the same as \s)
+upper    upper case letters
+word     "word" characters (same as \w)
+xdigit   hexadecimal digits
+The "space" characters are HT (9), LF (10), VT (11), FF (12), CR  (13),
+and  space  (32). Notice that this list includes the VT character (code
+11). This makes "space" different from \s, which does not include VT (for
+Perl compatibility).
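+The difference between [:space:] and \s can be expressed as two
+membership tests. The first follows the byte values listed above. The
+second is derived from the statement that \s is the same set without
+VT, assuming the default character tables. Both function names are
+hypothetical.

```c
/* [:space:] as listed above: HT, LF, VT, FF, CR and space. */
static int posix_space(int c)
{
    return c == 9 || c == 10 || c == 11 || c == 12 || c == 13 || c == 32;
}

/* \s as described above: the same set, but without VT (11). */
static int backslash_s(int c)
{
    return posix_space(c) && c != 11;
}
```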
+The  name  "word"  is  a Perl extension, and "blank" is a GNU extension
+from Perl 5.8. Another Perl extension is negation, which  is  indicated
+by a ^ character after the colon. For example,
+[12[:^digit:]]
+matches  "1", "2", or any non-digit. PCRE (and Perl) also recognize the
+POSIX syntax [.ch.] and [=ch=] where "ch" is a "collating element", but
+these are not supported, and an error is given if they are encountered.
+In UTF-8 mode, characters with values greater than 128 do not match any
+of the POSIX character classes.
+VERTICAL BAR
+Vertical  bar characters are used to separate alternative patterns. For
+example, the pattern
+gilbert|sullivan
+matches either "gilbert" or "sullivan". Any number of alternatives  may
+appear,  and  an  empty  alternative  is  permitted (matching the empty
+string). The matching process tries each alternative in turn, from left
+to  right, and the first one that succeeds is used. If the alternatives
+are within a subpattern (defined below), "succeeds" means matching  the
+rest  of the main pattern as well as the alternative in the subpattern.
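+For the simple case of literal alternatives with nothing after them,
+the left-to-right rule can be sketched as follows. This is a
+hypothetical helper, not PCRE code. It shows that the first alternative
+to succeed wins even when a later one would match more: against
+"cataract", the pattern cat|cataract matches only "cat".

```c
#include <stddef.h>
#include <string.h>

/* Try literal alternatives in order at the start of subject. The first
   one that matches wins, even if a later one would match a longer string.
   An empty alternative always matches (the empty string). Returns the
   index of the winning alternative and stores its length in *len, or
   returns -1 if none matches. */
static int first_alternative(const char *subject, const char *const alts[],
                             int n, size_t *len)
{
    for (int i = 0; i < n; i++) {
        size_t l = strlen(alts[i]);
        if (strncmp(subject, alts[i], l) == 0) {
            *len = l;
            return i;
        }
    }
    return -1;
}
```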
+INTERNAL OPTION SETTING
+The settings of the  PCRE_CASELESS,  PCRE_MULTILINE,  PCRE_DOTALL,  and
+PCRE_EXTENDED  options  (which are Perl-compatible) can be changed from
+within the pattern by  a  sequence  of  Perl  option  letters  enclosed
+between "(?" and ")".  The option letters are
+i  for PCRE_CASELESS
+m  for PCRE_MULTILINE
+s  for PCRE_DOTALL
+x  for PCRE_EXTENDED
+For example, (?im) sets caseless, multiline matching. It is also possi-
+ble to unset these options by preceding the letter with a hyphen, and a
+combined  setting and unsetting such as (?im-sx), which sets PCRE_CASE-
+LESS and PCRE_MULTILINE while unsetting PCRE_DOTALL and  PCRE_EXTENDED,
+is  also  permitted.  If  a  letter  appears  both before and after the
+hyphen, the option is unset.
+The PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and  PCRE_EXTRA
+can  be changed in the same way as the Perl-compatible options by using
+the characters J, U and X respectively.
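+The set-then-unset rule for option letters can be modelled as below.
+The OPT_ constants are invented for this sketch and are not the values
+defined in pcre.h. Only the letters and their meanings come from the
+text above. Applying the unsets after the sets makes a letter that
+appears on both sides of the hyphen end up unset.

```c
#include <stdbool.h>

/* Illustrative flag values only; these are NOT the pcre.h constants. */
enum {
    OPT_CASELESS  = 1 << 0,  /* i */
    OPT_MULTILINE = 1 << 1,  /* m */
    OPT_DOTALL    = 1 << 2,  /* s */
    OPT_EXTENDED  = 1 << 3,  /* x */
    OPT_DUPNAMES  = 1 << 4,  /* J */
    OPT_UNGREEDY  = 1 << 5,  /* U */
    OPT_EXTRA     = 1 << 6   /* X */
};

static unsigned letter_to_opt(char c)
{
    switch (c) {
    case 'i': return OPT_CASELESS;
    case 'm': return OPT_MULTILINE;
    case 's': return OPT_DOTALL;
    case 'x': return OPT_EXTENDED;
    case 'J': return OPT_DUPNAMES;
    case 'U': return OPT_UNGREEDY;
    case 'X': return OPT_EXTRA;
    default:  return 0;
    }
}

/* Apply the letters between "(?" and ")", e.g. "im-sx", to *flags.
   Letters before the hyphen set options; letters after it unset them.
   Returns false (leaving *flags alone) on an unknown letter. */
static bool apply_option_letters(const char *letters, unsigned *flags)
{
    unsigned set = 0, unset = 0;
    bool after_hyphen = false;
    for (const char *p = letters; *p; p++) {
        if (*p == '-') { after_hyphen = true; continue; }
        unsigned o = letter_to_opt(*p);
        if (o == 0) return false;
        if (after_hyphen) unset |= o; else set |= o;
    }
    *flags = (*flags | set) & ~unset;
    return true;
}
```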
+When an option change occurs at top level (that is, not inside  subpat-
+tern  parentheses),  the change applies to the remainder of the pattern
+that follows.  If the change is placed right at the start of a pattern,
+PCRE extracts it into the global options (and it will therefore show up
+in data extracted by the pcre_fullinfo() function).
+An option change within a subpattern (see below for  a  description  of
+subpatterns) affects only that part of the current pattern that follows
+it, so
+(a(?i)b)c
+matches "abc" and "aBc" and no other strings (assuming PCRE_CASELESS is not
+used).   By  this means, options can be made to have different settings
+in different parts of the pattern. Any changes made in one  alternative
+do  carry  on  into subsequent branches within the same subpattern. For
+example,
+(a(?i)b|c)
+matches "ab", "aB", "c", and "C", even though  when  matching  "C"  the
+first  branch  is  abandoned before the option setting. This is because
+the effects of option settings happen at compile time. There  would  be
+some very weird behaviour otherwise.
+Note:  There  are  other  PCRE-specific  options that can be set by the
+application when the compile or match functions  are  called.  In  some
+cases  the  pattern  can  contain special leading sequences to override
+what the application has set or what has been  defaulted.  Details  are
+given in the section entitled "Newline sequences" above.
+SUBPATTERNS
+Subpatterns are delimited by parentheses (round brackets), which can be
+nested.  Turning part of a pattern into a subpattern does two things:
+1. It localizes a set of alternatives. For example, the pattern
+cat(aract|erpillar|)
+matches one of the words "cat", "cataract", or  "caterpillar".  Without
+the  parentheses,  it  would  match  "cataract", "erpillar" or an empty
+string.
+2. It sets up the subpattern as  a  capturing  subpattern.  This  means
+that,  when  the  whole  pattern  matches,  that portion of the subject
+string that matched the subpattern is passed back to the caller via the
+ovector  argument  of pcre_exec(). Opening parentheses are counted from
+left to right (starting from 1) to obtain  numbers  for  the  capturing
+subpatterns.
+For  example,  if the string "the red king" is matched against the pat-
+tern
+the ((red|white) (king|queen))
+the captured substrings are "red king", "red", and "king", and are num-
+bered 1, 2, and 3, respectively.
+The  fact  that  plain  parentheses  fulfil two functions is not always
+helpful.  There are often times when a grouping subpattern is  required
+without  a capturing requirement. If an opening parenthesis is followed
+by a question mark and a colon, the subpattern does not do any  captur-
+ing,  and  is  not  counted when computing the number of any subsequent
+capturing subpatterns. For example, if the string "the white queen"  is
+matched against the pattern
+the ((?:red|white) (king|queen))
+the captured substrings are "white queen" and "queen", and are numbered
+1 and 2. The maximum number of capturing subpatterns is 65535.
+As a convenient shorthand, if any option settings are required  at  the
+start  of  a  non-capturing  subpattern,  the option letters may appear
+between the "?" and the ":". Thus the two patterns
+(?i:saturday|sunday)
+(?:(?i)saturday|sunday)
+match exactly the same set of strings. Because alternative branches are
+tried  from  left  to right, and options are not reset until the end of
+the subpattern is reached, an option setting in one branch does  affect
+subsequent  branches,  so  the above patterns match "SUNDAY" as well as
+"Saturday".
+DUPLICATE SUBPATTERN NUMBERS
+Perl 5.10 introduced a feature whereby each alternative in a subpattern
+uses  the same numbers for its capturing parentheses. Such a subpattern
+starts with (?| and is itself a non-capturing subpattern. For  example,
+consider this pattern:
+(?|(Sat)ur|(Sun))day
+Because  the two alternatives are inside a (?| group, both sets of cap-
+turing parentheses are numbered one. Thus, when  the  pattern  matches,
+you  can  look  at captured substring number one, whichever alternative
+matched. This construct is useful when you want to  capture  part,  but
+not all, of one of a number of alternatives. Inside a (?| group, paren-
+theses are numbered as usual, but the number is reset at the  start  of
+each  branch. The numbers of any capturing buffers that follow the sub-
+pattern start after the highest number used in any branch. The  follow-
+ing  example  is taken from the Perl documentation.  The numbers under-
+neath show in which buffer the captured content will be stored.
+# before  ---------------branch-reset----------- after
+/ ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
+# 1            2         2  3        2     3     4
+A backreference or a recursive call to  a  numbered  subpattern  always
+refers to the first one in the pattern with the given number.
+An  alternative approach to using this "branch reset" feature is to use
+duplicate named subpatterns, as described in the next section.
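+The numbering rules for (?| groups can be checked mechanically. The
+hypothetical sketch below is deliberately simplified: it ignores
+character classes and named groups. It numbers capturing parentheses
+in order of their opening parenthesis and resets the count at each
+branch of a (?| group. Run on the Perl example above (without the
+delimiters), it reproduces the numbers 1, 2, 2, 3, 2, 3, 4.

```c
#include <stddef.h>

enum frame_kind { FRAME_CAPTURE, FRAME_PLAIN, FRAME_RESET };

struct frame {
    enum frame_kind kind;
    int base;  /* FRAME_RESET: highest group number when "(?|" opened */
    int max;   /* FRAME_RESET: highest number used by any branch so far */
};

#define MAX_DEPTH 64

/* Number capturing parentheses left to right by their opening
   parenthesis, restarting the count at each top-level "|" inside a
   "(?|" group. Any other "(?" group is treated as non-capturing.
   Writes the numbers, in order of appearance, to nums[] and returns
   how many there are, or -1 on error. */
static int number_groups(const char *pat, int nums[], int max_nums)
{
    struct frame stack[MAX_DEPTH];
    int depth = 0, count = 0, found = 0;

    for (const char *p = pat; *p; p++) {
        if (*p == '\\') {                    /* skip the escaped character */
            if (p[1] == '\0') return -1;
            p++;
        } else if (*p == '(') {
            struct frame f = { FRAME_PLAIN, 0, 0 };
            if (depth == MAX_DEPTH) return -1;
            if (p[1] == '?' && p[2] == '|') {
                f.kind = FRAME_RESET;
                f.base = f.max = count;
                p += 2;                      /* this "|" is not a branch */
            } else if (p[1] != '?') {
                if (found == max_nums) return -1;
                f.kind = FRAME_CAPTURE;
                nums[found++] = ++count;
            }
            stack[depth++] = f;
        } else if (*p == '|' && depth > 0 && stack[depth - 1].kind == FRAME_RESET) {
            struct frame *r = &stack[depth - 1];
            if (count > r->max) r->max = count;
            count = r->base;                 /* next branch reuses the numbers */
        } else if (*p == ')') {
            if (depth == 0) return -1;
            struct frame f = stack[--depth];
            if (f.kind == FRAME_RESET && f.max > count)
                count = f.max;               /* later groups follow the highest */
        }
    }
    return depth == 0 ? found : -1;
}
```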
+NAMED SUBPATTERNS
+Identifying capturing parentheses by number is simple, but  it  can  be
+very  hard  to keep track of the numbers in complicated regular expres-
+sions. Furthermore, if an  expression  is  modified,  the  numbers  may
+change.  To help with this difficulty, PCRE supports the naming of sub-
+patterns. This feature was not added to Perl until release 5.10. Python
+had  the  feature earlier, and PCRE introduced it at release 4.0, using
+the Python syntax. PCRE now supports both the Perl and the Python  syn-
+tax.
+In  PCRE,  a subpattern can be named in one of three ways: (?<name>...)
+or (?'name'...) as in Perl, or (?P<name>...) as in  Python.  References
+to capturing parentheses from other parts of the pattern, such as back-
+references, recursion, and conditions, can be made by name as  well  as
+by number.
+Names  consist  of  up  to  32 alphanumeric characters and underscores.
+Named capturing parentheses are still  allocated  numbers  as  well  as
+names,  exactly as if the names were not present. The PCRE API provides
+function calls for extracting the name-to-number translation table from
+a compiled pattern. There is also a convenience function for extracting
+a captured substring by name.
+By default, a name must be unique within a pattern, but it is  possible
+to relax this constraint by setting the PCRE_DUPNAMES option at compile
+time. This can be useful for patterns where only one  instance  of  the
+named  parentheses  can  match. Suppose you want to match the name of a
+weekday, either as a 3-letter abbreviation or as the full name, and  in
+both cases you want to extract the abbreviation. This pattern (ignoring
+the line breaks) does the job:
+(?<DN>Mon|Fri|Sun)(?:day)?|
+(?<DN>Tue)(?:sday)?|
+(?<DN>Wed)(?:nesday)?|
+(?<DN>Thu)(?:rsday)?|
+(?<DN>Sat)(?:urday)?
+There are five capturing substrings, but only one is ever set  after  a
+match.  (An alternative way of solving this problem is to use a "branch
+reset" subpattern, as described in the previous section.)
+The convenience function for extracting the data by  name  returns  the
+substring  for  the first (and in this example, the only) subpattern of
+that name that matched. This saves searching  to  find  which  numbered
+subpattern  it  was. If you make a reference to a non-unique named sub-
+pattern from elsewhere in the pattern, the one that corresponds to  the
+lowest  number  is used. For further details of the interfaces for han-
+dling named subpatterns, see the pcreapi documentation.
+REPETITION
+Repetition is specified by quantifiers, which can  follow  any  of  the
+following items:
+a literal data character
+the dot metacharacter
+the \C escape sequence
+the \X escape sequence (in UTF-8 mode with Unicode properties)
+the \R escape sequence
+an escape such as \d that matches a single character
+a character class
+a back reference (see next section)
+a parenthesized subpattern (unless it is an assertion)
+The  general repetition quantifier specifies a minimum and maximum num-
+ber of permitted matches, by giving the two numbers in  curly  brackets
+(braces),  separated  by  a comma. The numbers must be less than 65536,
+and the first must be less than or equal to the second. For example:
+z{2,4}
+matches "zz", "zzz", or "zzzz". A closing brace on its  own  is  not  a
+special  character.  If  the second number is omitted, but the comma is
+present, there is no upper limit; if the second number  and  the  comma
+are  both omitted, the quantifier specifies an exact number of required
+matches. Thus
+[aeiou]{3,}
+matches at least 3 successive vowels, but may match many more, while
+\d{8}
+matches exactly 8 digits. An opening curly bracket that  appears  in  a
+position  where a quantifier is not allowed, or one that does not match
+the syntax of a quantifier, is taken as a literal character. For  exam-
+ple, {,6} is not a quantifier, but a literal string of four characters.
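+The quantifier syntax rules above can be sketched as a small parser.
+It returns 0 when the text is not quantifier syntax, so the brace is a
+literal, as with {,6}. Treating numbers that break the limits as an
+error (-1) is an assumption of this sketch, since the text only says
+the numbers must obey those limits. The names are hypothetical.

```c
#include <ctype.h>

#define QUANT_UNLIMITED (-1L)

/* Read decimal digits at *pp, setting *too_big if the value exceeds 65535.
   The value stops growing once it is too big, so it cannot overflow. */
static long read_number(const char **pp, int *too_big)
{
    long v = 0;
    while (isdigit((unsigned char)**pp)) {
        if (v <= 65535) v = v * 10 + (**pp - '0');
        (*pp)++;
    }
    if (v > 65535) *too_big = 1;
    return v;
}

/* Classify the text starting at s:
     1  {n}, {n,} or {n,m}; *min, *max (possibly QUANT_UNLIMITED) and
        *consumed (characters used, including both braces) are set;
     0  not quantifier syntax, so '{' is an ordinary literal;
    -1  quantifier syntax whose numbers are 65536 or more, or out of order. */
static int parse_quantifier(const char *s, long *min, long *max, int *consumed)
{
    const char *p = s;
    int too_big = 0;
    long lo, hi;

    if (*p != '{' || !isdigit((unsigned char)p[1]))
        return 0;                       /* e.g. "{,6}" is literal text */
    p++;
    lo = read_number(&p, &too_big);
    if (*p == '}') {
        hi = lo;                        /* {n}: exactly n */
    } else if (*p == ',') {
        p++;
        if (*p == '}') {
            hi = QUANT_UNLIMITED;       /* {n,}: no upper limit */
        } else if (isdigit((unsigned char)*p)) {
            hi = read_number(&p, &too_big);
            if (*p != '}') return 0;
        } else {
            return 0;
        }
    } else {
        return 0;
    }
    if (too_big || (hi != QUANT_UNLIMITED && lo > hi))
        return -1;
    *min = lo;
    *max = hi;
    *consumed = (int)(p - s) + 1;       /* include the closing brace */
    return 1;
}
```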
+In UTF-8 mode, quantifiers apply to UTF-8  characters  rather  than  to
+individual bytes. Thus, for example, \x{100}{2} matches two UTF-8 char-
+acters, each of which is represented by a two-byte sequence. Similarly,
+when Unicode property support is available, \X{3} matches three Unicode
+extended sequences, each of which may be several bytes long  (and  they
+may be of different lengths).
+The quantifier {0} is permitted, causing the expression to behave as if
+the previous item and the quantifier were not present. This may be use-
+ful  for  subpatterns that are referenced as subroutines from elsewhere
+in the pattern. Items other than subpatterns that have a {0} quantifier
+are omitted from the compiled pattern.
+For  convenience, the three most common quantifiers have single-charac-
+ter abbreviations:
+*    is equivalent to {0,}
++    is equivalent to {1,}
+?    is equivalent to {0,1}
+It is possible to construct infinite loops by  following  a  subpattern
+that can match no characters with a quantifier that has no upper limit,
+for example:
+(a?)*
+Earlier versions of Perl and PCRE used to give an error at compile time
+for  such  patterns. However, because there are cases where this can be
+useful, such patterns are now accepted, but if any  repetition  of  the
+subpattern  does in fact match no characters, the loop is forcibly bro-
+ken.
+By default, the quantifiers are "greedy", that is, they match  as  much
+as  possible  (up  to  the  maximum number of permitted times), without
+causing the rest of the pattern to fail. The classic example  of  where
+this gives problems is in trying to match comments in C programs. These
+appear between /* and */ and within the comment,  individual  *  and  /
+characters  may  appear. An attempt to match C comments by applying the
+pattern
+/\*.*\*/
+to the string
+/* first comment */  not comment  /* second comment */
+fails, because it matches the entire string owing to the greediness  of
+the .*  item.
+However,  if  a quantifier is followed by a question mark, it ceases to
+be greedy, and instead matches the minimum number of times possible, so
+the pattern
+/\*.*?\*/
+does  the  right  thing with the C comments. The meaning of the various
+quantifiers is not otherwise changed,  just  the  preferred  number  of
+matches.   Do  not  confuse this use of question mark with its use as a
+quantifier in its own right. Because it has two uses, it can  sometimes
+appear doubled, as in
+\d??\d
+which matches one digit by preference, but can match two if that is the
+only way the rest of the pattern matches.
+If the PCRE_UNGREEDY option is set (an option that is not available  in
+Perl),  the  quantifiers are not greedy by default, but individual ones
+can be made greedy by following them with a  question  mark.  In  other
+words, it inverts the default behaviour.
+When  a  parenthesized  subpattern  is quantified with a minimum repeat
+count that is greater than 1 or with a limited maximum, more memory  is
+required  for  the  compiled  pattern, in proportion to the size of the
+minimum or maximum.
+If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equiv-
+alent  to  Perl's  /s) is set, thus allowing the dot to match newlines,
+the pattern is implicitly anchored, because whatever  follows  will  be
+tried  against every character position in the subject string, so there
+is no point in retrying the overall match at  any  position  after  the
+first.  PCRE  normally treats such a pattern as though it were preceded
+by \A.
+In cases where it is known that the subject  string  contains  no  new-
+lines,  it  is  worth setting PCRE_DOTALL in order to obtain this opti-
+mization, or alternatively using ^ to indicate anchoring explicitly.
+However, there is one situation where the optimization cannot be  used.
+When  .*   is  inside  capturing  parentheses that are the subject of a
+backreference elsewhere in the pattern, a match at the start  may  fail
+where a later one succeeds. Consider, for example:
+(.*)abc\1
+If  the subject is "xyz123abc123" the match point is the fourth charac-
+ter. For this reason, such a pattern is not implicitly anchored.
+When a capturing subpattern is repeated, the value captured is the sub-
+string that matched the final iteration. For example, after
+(tweedle[dume]{3}\s*)+
+has matched "tweedledum tweedledee" the value of the captured substring
+is "tweedledee". However, if there are  nested  capturing  subpatterns,
+the  corresponding captured values may have been set in previous itera-
+tions. For example, after
+/(a|(b))+/
+matches "aba" the value of the second captured substring is "b".
+ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS
+With both maximizing ("greedy") and minimizing ("ungreedy"  or  "lazy")
+repetition,  failure  of what follows normally causes the repeated item
+to be re-evaluated to see if a different number of repeats  allows  the
+rest  of  the pattern to match. Sometimes it is useful to prevent this,
+either to change the nature of the match, or to cause it to fail earlier
+than  it otherwise might, when the author of the pattern knows there is
+no point in carrying on.
+Consider, for example, the pattern \d+foo when applied to  the  subject
+line
+123456bar
+After matching all 6 digits and then failing to match "foo", the normal
+action of the matcher is to try again with only 5 digits  matching  the
+\d+  item,  and  then  with  4,  and  so on, before ultimately failing.
+"Atomic grouping" (a term taken from Jeffrey  Friedl's  book)  provides
+the  means for specifying that once a subpattern has matched, it is not
+to be re-evaluated in this way.
+If we use atomic grouping for the previous example, the  matcher  gives
+up  immediately  on failing to match "foo" the first time. The notation
+is a kind of special parenthesis, starting with (?> as in this example:
+(?>\d+)foo
+This  kind  of  parenthesis "locks up" the  part of the pattern it con-
+tains once it has matched, and a failure further into  the  pattern  is
+prevented  from  backtracking into it. Backtracking past it to previous
+items, however, works as normal.
+An alternative description is that a subpattern of  this  type  matches
+the  string  of  characters  that an identical standalone pattern would
+match, if anchored at the current point in the subject string.
+Atomic grouping subpatterns are not capturing subpatterns. Simple cases
+such as the above example can be thought of as a maximizing repeat that
+must swallow everything it can. So, while both \d+ and  \d+?  are  pre-
+pared  to  adjust  the number of digits they match in order to make the
+rest of the pattern match, (?>\d+) can only match an entire sequence of
+digits.
+Atomic  groups in general can of course contain arbitrarily complicated
+subpatterns, and can be nested. However, when  the  subpattern  for  an
+atomic group is just a single repeated item, as in the example above, a
+simpler notation, called a "possessive quantifier" can  be  used.  This
+consists  of  an  additional  + character following a quantifier. Using
+this notation, the previous example can be rewritten as
+\d++foo
+Note that a possessive quantifier can be used with an entire group, for
+example:
+(abc|xyz){2,3}+
+Possessive   quantifiers   are   always  greedy;  the  setting  of  the
+PCRE_UNGREEDY option is ignored. They are a convenient notation for the
+simpler  forms  of atomic group. However, there is no difference in the
+meaning of a possessive quantifier and  the  equivalent  atomic  group,
+though  there  may  be a performance difference; possessive quantifiers
+should be slightly faster.
+The possessive quantifier syntax is an extension to the Perl  5.8  syn-
+tax.   Jeffrey  Friedl  originated the idea (and the name) in the first
+edition of his book. Mike McCloskey liked it, so implemented it when he
+built  Sun's Java package, and PCRE copied it from there. It ultimately
+found its way into Perl at release 5.10.
+PCRE has an optimization that automatically "possessifies" certain sim-
+ple  pattern  constructs.  For  example, the sequence A+B is treated as
+A++B because there is no point in backtracking into a sequence  of  A's
+when B must follow.
+When  a  pattern  contains an unlimited repeat inside a subpattern that
+can itself be repeated an unlimited number of  times,  the  use  of  an
+atomic  group  is  the  only way to avoid some failing matches taking a
+very long time indeed. The pattern
+(\D+|<\d+>)*[!?]
+matches an unlimited number of substrings that either consist  of  non-
+digits,  or  digits  enclosed in <>, followed by either ! or ?. When it
+matches, it runs quickly. However, if it is applied to
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+it takes a long time before reporting  failure.  This  is  because  the
+string  can be divided between the internal \D+ repeat and the external
+* repeat in a large number of ways, and all  have  to  be  tried.  (The
+example  uses  [!?]  rather than a single character at the end, because
+both PCRE and Perl have an optimization that allows  for  fast  failure
+when  a single character is used. They remember the last single charac-
+ter that is required for a match, and fail early if it is  not  present
+in  the  string.)  If  the pattern is changed so that it uses an atomic
+group, like this:
+((?>\D+)|<\d+>)*[!?]
+sequences of non-digits cannot be broken, and failure happens  quickly.
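+To get a feel for the "large number of ways" mentioned above, the
+sketch below counts how many ways a run of n non-digits can be divided
+into the non-empty pieces matched by successive iterations of \D+. The
+count doubles with each extra character (it is 2 to the power n-1 for
+n of at least 1), whereas with the atomic group (?>\D+) the whole run
+can only be taken as one piece. The function name is hypothetical.

```c
/* Count the ways to divide a run of n characters into consecutive
   non-empty pieces: ways(0) = 1, and ways(n) is the sum over the length
   k of the first piece of ways(n - k). Limited to n <= 64 so the result
   fits in an unsigned long long; returns 0 beyond that. */
static unsigned long long split_ways(unsigned n)
{
    unsigned long long ways[65] = { 1 };   /* ways[0] = 1: nothing to divide */
    if (n > 64) return 0;
    for (unsigned i = 1; i <= n; i++) {
        ways[i] = 0;
        for (unsigned k = 1; k <= i; k++)
            ways[i] += ways[i - k];
    }
    return ways[n];
}
```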
+BACK REFERENCES
+Outside a character class, a backslash followed by a digit greater than
+0 (and possibly further digits) is a back reference to a capturing sub-
+pattern  earlier  (that is, to its left) in the pattern, provided there
+have been that many previous capturing left parentheses.
+However, if the decimal number following the backslash is less than 10,
+it  is  always  taken  as a back reference, and causes an error only if
+there are not that many capturing left parentheses in the  entire  pat-
+tern.  In  other words, the parentheses that are referenced need not be
+to the left of the reference for numbers less than 10. A "forward  back
+reference"  of  this  type can make sense when a repetition is involved
+and the subpattern to the right has participated in an  earlier  itera-
+tion.
+It  is  not  possible to have a numerical "forward back reference" to a
+subpattern whose number is 10 or  more  using  this  syntax  because  a
+sequence  such  as  \50 is interpreted as a character defined in octal.
+See the subsection entitled "Non-printing characters" above for further
+details  of  the  handling of digits following a backslash. There is no
+such problem when named parentheses are used. A back reference  to  any
+subpattern is possible using named parentheses (see below).
+Another  way  of  avoiding  the ambiguity inherent in the use of digits
+following a backslash is to use the \g escape sequence, which is a fea-
+ture  introduced  in  Perl  5.10.  This  escape  must be followed by an
+unsigned number or a negative number, optionally  enclosed  in  braces.
+These examples are all identical:
+(ring), \1
+(ring), \g1
+(ring), \g{1}
+An  unsigned number specifies an absolute reference without the ambigu-
+ity that is present in the older syntax. It is also useful when literal
+digits follow the reference. A negative number is a relative reference.
+Consider this example:
+(abc(def)ghi)\g{-1}
+The sequence \g{-1} is a reference to the most recently started capturing
+subpattern before \g, that is, it is equivalent to \2. Similarly,
+\g{-2} would be equivalent to \1. The use of relative references can be
+helpful  in  long  patterns,  and  also in patterns that are created by
+joining together fragments that contain references within themselves.
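+Resolving a relative reference amounts to counting the capturing
+parentheses that open before the \g. Below is a simplified sketch. The
+function is hypothetical, treats every "(?" group as non-capturing, and
+ignores character classes, named groups and (?| groups.

```c
#include <stddef.h>

/* Resolve \g{-n} found at offset ref_pos in pat to an absolute group
   number: count the capturing "(" that start before ref_pos, then step
   back n - 1 groups from the most recently started one. Backslash
   escapes are skipped. Returns 0 if the reference would point before
   group 1. */
static int resolve_relative(const char *pat, size_t ref_pos, int n)
{
    int opened = 0;
    for (size_t i = 0; i < ref_pos && pat[i]; i++) {
        if (pat[i] == '\\') { i++; continue; }   /* escaped character */
        if (pat[i] == '(' && pat[i + 1] != '?') opened++;
    }
    int num = opened - n + 1;
    return num >= 1 ? num : 0;
}
```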
+A back reference matches whatever actually matched the  capturing  sub-
+pattern  in  the  current subject string, rather than anything matching
+the subpattern itself (see "Subpatterns as subroutines" below for a way
+of doing that). So the pattern
+(sens|respons)e and \1ibility
+matches  "sense and sensibility" and "response and responsibility", but
+not "sense and responsibility". If caseful matching is in force at  the
+time  of the back reference, the case of letters is relevant. For exam-
+ple,
+((?i)rah)\s+\1
+matches "rah rah" and "RAH RAH", but not "RAH  rah",  even  though  the
+original capturing subpattern is matched caselessly.
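+The case rule can be illustrated with a small comparison routine. In
+this hypothetical, ASCII-only sketch the caseless flag stands for the
+setting in force where the back reference appears. The captured text is
+compared exactly as it appeared in the subject.

```c
#include <ctype.h>
#include <stddef.h>

/* Does the back reference match at subject[pos]? cap/cap_len is the text
   the referenced group actually captured. Comparison is exact unless
   caseless is set, in which case ASCII letters compare without case. */
static int backref_matches(const char *subject, size_t len, size_t pos,
                           const char *cap, size_t cap_len, int caseless)
{
    if (pos > len || len - pos < cap_len) return 0;  /* not enough text */
    for (size_t i = 0; i < cap_len; i++) {
        unsigned char a = (unsigned char)subject[pos + i];
        unsigned char b = (unsigned char)cap[i];
        if (caseless ? tolower(a) != tolower(b) : a != b) return 0;
    }
    return 1;
}
```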
+There  are  several  different ways of writing back references to named
+subpatterns. The .NET syntax \k{name} and the Perl syntax  \k<name>  or
+\k'name'  are supported, as is the Python syntax (?P=name). Perl 5.10's
+unified back reference syntax, in which \g can be used for both numeric
+and  named  references,  is  also supported. We could rewrite the above
+example in any of the following ways:
+(?<p1>(?i)rah)\s+\k<p1>
+(?'p1'(?i)rah)\s+\k{p1}
+(?P<p1>(?i)rah)\s+(?P=p1)
+(?<p1>(?i)rah)\s+\g{p1}
+A subpattern that is referenced by  name  may  appear  in  the  pattern
+before or after the reference.
+There  may be more than one back reference to the same subpattern. If a
+subpattern has not actually been used in a particular match,  any  back
+references to it always fail. For example, the pattern
+(a|(bc))\2
+always  fails if it starts to match "a" rather than "bc". Because there
+may be many capturing parentheses in a pattern,  all  digits  following
+the  backslash  are taken as part of a potential back reference number.
+If the pattern continues with a digit character, some delimiter must be
+used  to  terminate  the back reference. If the PCRE_EXTENDED option is
+set, this can be whitespace.  Otherwise an  empty  comment  (see  "Com-
+ments" below) can be used.
+A  back reference that occurs inside the parentheses to which it refers
+fails when the subpattern is first used, so, for example,  (a\1)  never
+matches.   However,  such references can be useful inside repeated sub-
+patterns. For example, the pattern
+(a|b\1)+
+matches any number of "a"s and also "aba", "ababbaa" etc. At each iter-
+ation  of  the  subpattern,  the  back  reference matches the character
+string corresponding to the previous iteration. In order  for  this  to
+work,  the  pattern must be such that the first iteration does not need
+to match the back reference. This can be done using alternation, as  in
+the example above, or by a quantifier with a minimum of zero.
+ASSERTIONS
+An  assertion  is  a  test on the characters following or preceding the
+current matching point that does not actually consume  any  characters.
+The  simple  assertions  coded  as  \b, \B, \A, \G, \Z, \z, ^ and $ are
+described above.
+More complicated assertions are coded as  subpatterns.  There  are  two
+kinds:  those  that  look  ahead of the current position in the subject
+string, and those that look  behind  it.  An  assertion  subpattern  is
+matched  in  the  normal way, except that it does not cause the current
+matching position to be changed.
+Assertion subpatterns are not capturing subpatterns,  and  may  not  be
+repeated,  because  it  makes no sense to assert the same thing several
+times. If any kind of assertion contains capturing  subpatterns  within
+it,  these are counted for the purposes of numbering the capturing sub-
+patterns in the whole pattern.  However, substring capturing is carried
+out  only  for  positive assertions, because it does not make sense for
+negative assertions.
+Lookahead assertions
+Lookahead assertions start with (?= for positive assertions and (?! for
+negative assertions. For example,
+\w+(?=;)
+matches  a word followed by a semicolon, but does not include the semi-
+colon in the match, and
+foo(?!bar)
+matches any occurrence of "foo" that is not  followed  by  "bar".  Note
+that the apparently similar pattern
+(?!foo)bar
+does  not  find  an  occurrence  of "bar" that is preceded by something
+other than "foo"; it finds any occurrence of "bar" whatsoever,  because
+the assertion (?!foo) is always true when the next three characters are
+"bar". A lookbehind assertion is needed to achieve the other effect.
+If you want to force a matching failure at some point in a pattern, the
+most  convenient  way  to  do  it  is with (?!) because an empty string
+always matches, so an assertion that requires there not to be an  empty
+string must always fail.
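The following is a quick check of these lookahead examples. It uses Python's re module as a stand-in for PCRE, since both accept this syntax with the same meaning:

```python
import re

# \w+(?=;) : a word followed by ";"; the semicolon is not part of the match.
print(re.findall(r"\w+(?=;)", "alpha; beta gamma;"))  # ['alpha', 'gamma']

# foo(?!bar) : "foo" not followed by "bar".
print([x.start() for x in re.finditer(r"foo(?!bar)", "foobar foobaz")])  # [7]

# (?!foo)bar does not look at what precedes "bar", so it matches inside "foobar".
print(re.search(r"(?!foo)bar", "foobar").start())  # 3
# A lookbehind is needed to express "bar not preceded by foo".
print(re.search(r"(?<!foo)bar", "foobar"))  # None
```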
+Lookbehind assertions
+Lookbehind  assertions start with (?<= for positive assertions and (?<!
+for negative assertions. For example,
+(?<!foo)bar
+does find an occurrence of "bar" that is not  preceded  by  "foo".  The
+contents  of  a  lookbehind  assertion are restricted such that all the
+strings it matches must have a fixed length. However, if there are sev-
+eral  top-level  alternatives,  they  do  not all have to have the same
+fixed length. Thus
+(?<=bullock|donkey)
+is permitted, but
+(?<!dogs?|cats?)
+causes an error at compile time. Branches that match  different  length
+strings  are permitted only at the top level of a lookbehind assertion.
+This is an extension compared with  Perl  (at  least  for  5.8),  which
+requires  all branches to match the same length of string. An assertion
+such as
+(?<=ab(c|de))
+is not permitted, because its single top-level  branch  can  match  two
+different  lengths,  but  it is acceptable if rewritten to use two top-
+level branches:
+(?<=abc|abde)
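Python's re module, used here only as a point of comparison, lacks this PCRE extension: it rejects any lookbehind whose alternatives differ in length. A portable spelling puts each alternative in its own assertion. The rewrite below is an illustration, not something taken from the PCRE documentation:

```python
import re

# PCRE accepts this; Python's re requires every branch to have the same length.
try:
    re.compile(r"(?<=bullock|donkey)cart")
    accepted = True
except re.error:
    accepted = False
print(accepted)  # False

# Portable form: one fixed-length lookbehind per alternative.
portable = re.compile(r"(?:(?<=bullock)|(?<=donkey))cart")
print(bool(portable.search("donkeycart")), bool(portable.search("horsecart")))
# True False
```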
+In some cases, the Perl 5.10 escape sequence \K (see above) can be used
+instead of a lookbehind assertion; unlike a lookbehind, \K is not
+restricted to matching a fixed length.
+The implementation of lookbehind assertions is, for  each  alternative,
+to  temporarily  move the current position back by the fixed length and
+then try to match. If there are insufficient characters before the cur-
+rent position, the assertion fails.
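The step-back strategy can be modelled in a few lines for the simple case where every alternative is a literal string. `lookbehind_matches` is a hypothetical helper written for this sketch, not PCRE's implementation:

```python
from typing import Sequence


def lookbehind_matches(subject: str, pos: int, alternatives: Sequence[str]) -> bool:
    """Model (?<=alt1|alt2|...) evaluated at index `pos` of `subject`.

    Each literal alternative has a fixed length, so the check steps back
    by exactly that many characters and compares. Nothing is consumed.
    """
    for alt in alternatives:
        start = pos - len(alt)
        if start < 0:
            # Too few characters before pos: this alternative fails.
            continue
        if subject[start:pos] == alt:
            return True
    return False


print(lookbehind_matches("donkeycart", 6, ["bullock", "donkey"]))  # True
print(lookbehind_matches("cart", 0, ["bullock", "donkey"]))        # False
```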
+PCRE does not allow the \C escape (which matches a single byte in UTF-8
+mode) to appear in lookbehind assertions, because it makes it  impossi-
+ble  to  calculate the length of the lookbehind. The \X and \R escapes,
+which can match different numbers of bytes, are also not permitted.
+Possessive quantifiers can  be  used  in  conjunction  with  lookbehind
+assertions  to  specify  efficient  matching  at the end of the subject
+string. Consider a simple pattern such as
+abcd$
+when applied to a long string that does  not  match.  Because  matching
+proceeds from left to right, PCRE will look for each "a" in the subject
+and then see if what follows matches the rest of the  pattern.  If  the
+pattern is specified as
+^.*abcd$
+the  initial .* matches the entire string at first, but when this fails
+(because there is no following "a"), it backtracks to match all but the
+last  character,  then all but the last two characters, and so on. Once
+again the search for "a" covers the entire string, from right to  left,
+so we are no better off. However, if the pattern is written as
+^.*+(?<=abcd)
+there  can  be  no backtracking for the .*+ item; it can match only the
+entire string. The subsequent lookbehind assertion does a  single  test
+on  the last four characters. If it fails, the match fails immediately.
+For long strings, this approach makes a significant difference  to  the
+processing time.
+Using multiple assertions
+Several assertions (of any sort) may occur in succession. For example,
+(?<=\d{3})(?<!999)foo
+matches  "foo" preceded by three digits that are not "999". Notice that
+each of the assertions is applied independently at the  same  point  in
+the  subject  string.  First  there  is a check that the previous three
+characters are all digits, and then there is  a  check  that  the  same
+three characters are not "999". This pattern does not match "foo"
+preceded by six characters, the first three of which are digits and the
+last three of which are not "999". For example, it doesn't match
+"123abcfoo". A pattern to do that is
+(?<=\d{3}...)(?<!999)foo
+This time the first assertion looks at the  preceding  six  characters,
+checking that the first three are digits, and then the second assertion
+checks that the preceding three characters are not "999".
+Assertions can be nested in any combination. For example,
+(?<=(?<!foo)bar)baz
+matches an occurrence of "baz" that is preceded by "bar" which in  turn
+is not preceded by "foo", while
+(?<=\d{3}(?!999)...)foo
+is  another pattern that matches "foo" preceded by three digits and any
+three characters that are not "999".
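These examples can be checked with Python's re module, which accepts the same fixed-length assertions. Python stands in for PCRE here, and `found` is a hypothetical convenience helper for the sketch:

```python
import re


def found(pattern: str, subject: str) -> bool:
    return re.search(pattern, subject) is not None


# Both assertions are tested at the same point, just before "foo".
print(found(r"(?<=\d{3})(?<!999)foo", "123foo"))        # True
print(found(r"(?<=\d{3})(?<!999)foo", "999foo"))        # False
print(found(r"(?<=\d{3})(?<!999)foo", "123abcfoo"))     # False
# The wider lookbehind covers the six characters before "foo".
print(found(r"(?<=\d{3}...)(?<!999)foo", "123abcfoo"))  # True
# Nested: "baz" preceded by "bar" that is not itself preceded by "foo".
print(found(r"(?<=(?<!foo)bar)baz", "foobarbaz"))       # False
print(found(r"(?<=(?<!foo)bar)baz", "xbarbaz"))         # True
```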
+CONDITIONAL SUBPATTERNS
+It is possible to cause the matching process to obey a subpattern  con-
+ditionally  or to choose between two alternative subpatterns, depending
+on the result of an assertion, or whether a previous capturing  subpat-
+tern  matched  or not. The two possible forms of conditional subpattern
+are
+(?(condition)yes-pattern)
+(?(condition)yes-pattern|no-pattern)
+If the condition is satisfied, the yes-pattern is used;  otherwise  the
+no-pattern  (if  present)  is used. If there are more than two alterna-
+tives in the subpattern, a compile-time error occurs.
+There are four kinds of condition: references  to  subpatterns,  refer-
+ences to recursion, a pseudo-condition called DEFINE, and assertions.
+Checking for a used subpattern by number
+If  the  text between the parentheses consists of a sequence of digits,
+the condition is true if the capturing subpattern of  that  number  has
+previously  matched.  An  alternative notation is to precede the digits
+with a plus or minus sign. In this case, the subpattern number is rela-
+tive rather than absolute.  The most recently opened parentheses can be
+referenced by (?(-1), the next most recent by (?(-2),  and  so  on.  In
+looping constructs it can also make sense to refer to subsequent groups
+with constructs such as (?(+2).
+Consider the following pattern, which  contains  non-significant  white
+space to make it more readable (assume the PCRE_EXTENDED option) and to
+divide it into three parts for ease of discussion:
+( \( )?    [^()]+    (?(1) \) )
+The first part matches an optional opening  parenthesis,  and  if  that
+character is present, sets it as the first captured substring. The sec-
+ond part matches one or more characters that are not  parentheses.  The
+third part is a conditional subpattern that tests whether the first set
+of parentheses matched or not. If they did, that is, if the subject
+started with an opening parenthesis, the condition is true, and so the
+yes-pattern is executed and a closing parenthesis is required. Otherwise,
+since  no-pattern  is  not  present, the subpattern matches nothing. In
+other words,  this  pattern  matches  a  sequence  of  non-parentheses,
+optionally enclosed in parentheses.
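This conditional pattern runs unchanged in Python's re module with re.VERBOSE, which plays roughly the role of PCRE_EXTENDED here. Python stands in for PCRE in this quick check:

```python
import re

# The closing parenthesis is required only if group 1 matched.
paren_num = re.compile(r"( \( )?    [^()]+    (?(1) \) )", re.VERBOSE)

for s in ["(abc)", "abc", "(abc", "abc)"]:
    print(s, paren_num.fullmatch(s) is not None)
# (abc) True
# abc True
# (abc False
# abc) False
```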
+If  you  were  embedding  this pattern in a larger one, you could use a
+relative reference:
+...other stuff... ( \( )?    [^()]+    (?(-1) \) ) ...
+This makes the fragment independent of the parentheses  in  the  larger
+pattern.
+Checking for a used subpattern by name
+Perl  uses  the  syntax  (?(<name>)...) or (?('name')...) to test for a
+used subpattern by name. For compatibility  with  earlier  versions  of
+PCRE,  which  had this facility before Perl, the syntax (?(name)...) is
+also recognized. However, there is a possible ambiguity with this  syn-
+tax,  because  subpattern  names  may  consist entirely of digits. PCRE
+looks first for a named subpattern; if it cannot find one and the  name
+consists  entirely  of digits, PCRE looks for a subpattern of that num-
+ber, which must be greater than zero. Using subpattern names that  con-
+sist entirely of digits is not recommended.
+Rewriting the above example to use a named subpattern gives this:
+(?<OPEN> \( )?    [^()]+    (?(<OPEN>) \) )
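In Python's re module, used here as a stand-in, the named version uses (?P<name>...) for the group. The condition is written (?(name)...), the same form that PCRE also recognizes for compatibility:

```python
import re

paren_named = re.compile(r"(?P<OPEN> \( )?    [^()]+    (?(OPEN) \) )", re.VERBOSE)
print(paren_named.fullmatch("(abc)") is not None)  # True
print(paren_named.fullmatch("(abc") is not None)   # False
```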
+Checking for pattern recursion
+If the condition is the string (R), and there is no subpattern with the
+name R, the condition is true if a recursive call to the whole  pattern
+or any subpattern has been made. If digits or a name preceded by amper-
+sand follow the letter R, for example:
+(?(R3)...) or (?(R&name)...)
+the condition is true if the most recent recursion is into the  subpat-
+tern  whose  number or name is given. This condition does not check the
+entire recursion stack.
+At "top level", all these recursion test conditions are  false.  Recur-
+sive patterns are described below.
+Defining subpatterns for use by reference only
+If  the  condition  is  the string (DEFINE), and there is no subpattern
+with the name DEFINE, the condition is  always  false.  In  this  case,
+there  may  be  only  one  alternative  in the subpattern. It is always
+skipped if control reaches this point  in  the  pattern;  the  idea  of
+DEFINE  is that it can be used to define "subroutines" that can be ref-
+erenced from elsewhere. (The use of "subroutines" is described  below.)
+For  example,  a pattern to match an IPv4 address could be written like
+this (ignore whitespace and line breaks):
+(?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
+\b (?&byte) (\.(?&byte)){3} \b
+The first part of the pattern is a DEFINE group inside which another
+group named "byte" is defined. This matches an individual component of
+an IPv4 address (a number less than 256). When  matching  takes  place,
+this  part  of  the pattern is skipped because DEFINE acts like a false
+condition.
+The rest of the pattern uses references to the named group to match the
+four  dot-separated  components of an IPv4 address, insisting on a word
+boundary at each end.
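Python's re module has neither (?(DEFINE)...) nor (?&name). A rough stand-in is to build the pattern string with the byte subpattern pasted in as a non-capturing group. This sketch rests on one assumption: a PCRE subroutine call is atomic and the pasted copy is not. That difference should not change which addresses match for this particular pattern.

```python
import re

# Stand-in for the DEFINE group: the "byte" subpattern is pasted in twice.
BYTE = r"(?:2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d)"
IPV4 = re.compile(rf"\b{BYTE}(?:\.{BYTE}){{3}}\b")

for s in ["192.168.0.1", "255.255.255.255", "256.1.1.1", "1.2.3"]:
    print(s, IPV4.fullmatch(s) is not None)
# 192.168.0.1 True
# 255.255.255.255 True
# 256.1.1.1 False
# 1.2.3 False
```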
+Assertion conditions
+If the condition is not in any of the above  formats,  it  must  be  an
+assertion.   This may be a positive or negative lookahead or lookbehind
+assertion. Consider  this  pattern,  again  containing  non-significant
+white space, and with the two alternatives on the second line:
+(?(?=[^a-z]*[a-z])
+\d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )
+The  condition  is  a  positive  lookahead  assertion  that  matches an
+optional sequence of non-letters followed by a letter. In other  words,
+it  tests  for the presence of at least one letter in the subject. If a
+letter is found, the subject is matched against the first  alternative;
+otherwise  it  is  matched  against  the  second.  This pattern matches
+strings in one of the two forms dd-aaa-dd or dd-dd-dd,  where  aaa  are
+letters and dd are digits.
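Python's re module accepts only a group number or name as a condition, so it cannot run this pattern as written. An equivalent formulation guards each alternative with the assertion or its negation. It is offered as an illustration, not taken from the PCRE documentation:

```python
import re

dates = re.compile(r"""
    (?=[^a-z]*[a-z]) \d{2}-[a-z]{3}-\d{2}    # a letter ahead: the yes-pattern
  | (?![^a-z]*[a-z]) \d{2}-\d{2}-\d{2}       # no letter ahead: the no-pattern
""", re.VERBOSE)

for s in ["12-abc-34", "12-34-56", "12-ab-34"]:
    print(s, dates.fullmatch(s) is not None)
# 12-abc-34 True
# 12-34-56 True
# 12-ab-34 False
```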
+COMMENTS
+The  sequence (?# marks the start of a comment that continues up to the
+next closing parenthesis. Nested parentheses  are  not  permitted.  The
+characters  that make up a comment play no part in the pattern matching
+at all.
+If the PCRE_EXTENDED option is set, an unescaped # character outside  a
+character  class  introduces  a  comment  that continues to immediately
+after the next newline in the pattern.
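Both comment styles can be tried in Python's re module, where re.VERBOSE plays roughly the role of PCRE_EXTENDED. Python is a stand-in here for illustration, not PCRE itself:

```python
import re

# (?#...) is ignored by the matcher.
inline = re.compile(r"\d+(?# one or more digits)px")

# In verbose mode, white space is ignored and "#" starts a comment
# that runs to the end of the line.
extended = re.compile(r"""
    \d+     # one or more digits
    px      # literal unit
""", re.VERBOSE)

print(bool(inline.fullmatch("12px")), bool(extended.fullmatch("12px")))  # True True
```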
+RECURSIVE PATTERNS
+Consider the problem of matching a string in parentheses, allowing  for
+unlimited  nested  parentheses.  Without the use of recursion, the best
+that can be done is to use a pattern that  matches  up  to  some  fixed
+depth  of  nesting.  It  is not possible to handle an arbitrary nesting
+depth.
+For some time, Perl has provided a facility that allows regular expres-
+sions  to recurse (amongst other things). It does this by interpolating
+Perl code in the expression at run time, and the code can refer to  the
+expression itself. A Perl pattern using code interpolation to solve the
+parentheses problem can be created like this:
+$re = qr{\( (?: (?>[^()]+) | (??{$re}) )* \)}x;
+The (??{...}) item interpolates Perl code at run time, and in this case
+refers recursively to the pattern in which it appears.
+Obviously, PCRE cannot support the interpolation of Perl code. Instead,
+it supports special syntax for recursion of  the  entire  pattern,  and
+also  for  individual  subpattern  recursion. After its introduction in
+PCRE and Python, this kind of recursion was  introduced  into  Perl  at
+release 5.10.
+A  special  item  that consists of (? followed by a number greater than
+zero and a closing parenthesis is a recursive call of the subpattern of
+the  given  number, provided that it occurs inside that subpattern. (If
+not, it is a "subroutine" call, which is described  in  the  next  sec-
+tion.)  The special item (?R) or (?0) is a recursive call of the entire
+regular expression.
+In PCRE (like Python, but unlike Perl), a recursive subpattern call  is
+always treated as an atomic group. That is, once it has matched some of
+the subject string, it is never re-entered, even if it contains untried
+alternatives and there is a subsequent matching failure.
+This  PCRE  pattern  solves  the nested parentheses problem (assume the
+PCRE_EXTENDED option is set so that white space is ignored):
+\( ( (?>[^()]+) | (?R) )* \)
+First it matches an opening parenthesis. Then it matches any number  of
+substrings  which  can  either  be  a sequence of non-parentheses, or a
+recursive match of the pattern itself (that is, a  correctly  parenthe-
+sized substring).  Finally there is a closing parenthesis.
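Python's standard re module has no (?R). To make the effect of the recursion concrete, here is a hand-written recursive-descent recognizer. It is a hypothetical helper, intended to accept the same strings as the pattern when the pattern has to match the whole subject. It is not how PCRE implements recursion:

```python
def match_group(s: str, i: int) -> int:
    """Match one parenthesized group starting at index i.

    Returns the index just past its closing parenthesis, or -1 on failure.
    """
    # The opening \( of the pattern.
    if i >= len(s) or s[i] != "(":
        return -1
    i += 1
    # The repeated group ( (?>[^()]+) | (?R) )*
    while i < len(s):
        if s[i] == ")":
            return i + 1              # the closing \)
        if s[i] == "(":
            i = match_group(s, i)     # the (?R) branch: recurse
            if i < 0:
                return -1
        else:
            # The atomic [^()]+ branch: take every non-parenthesis character.
            while i < len(s) and s[i] not in "()":
                i += 1
    return -1                         # subject ended before the closing ")"


def matches_whole(s: str) -> bool:
    return match_group(s, 0) == len(s)


print(matches_whole("(ab(cd)ef)"))  # True
print(matches_whole("(ab(cd)ef"))   # False
print(matches_whole("(a)(b)"))      # False
```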
+If  this  were  part of a larger pattern, you would not want to recurse
+the entire pattern, so instead you could use this:
+( \( ( (?>[^()]+) | (?1) )* \) )
+We have put the pattern into parentheses, and caused the  recursion  to
+refer to them instead of the whole pattern.
+In  a  larger  pattern,  keeping  track  of  parenthesis numbers can be
+tricky. This is made easier by the use of relative references. (A  Perl
+5.10  feature.)   Instead  of  (?1)  in the pattern above you can write
+(?-2) to refer to the second most recently opened parentheses preceding
+the  recursion.  In  other  words,  a  negative number counts capturing
+parentheses leftwards from the point at which it is encountered.
+It is also possible to refer to  subsequently  opened  parentheses,  by
+writing  references  such  as (?+2). However, these cannot be recursive
+because the reference is not inside the  parentheses  that  are  refer-
+enced.  They  are  always  "subroutine" calls, as described in the next
+section.
+An alternative approach is to use named parentheses instead.  The  Perl
+syntax  for  this  is (?&name); PCRE's earlier syntax (?P>name) is also
+supported. We could rewrite the above example as follows:
+(?<pn> \( ( (?>[^()]+) | (?&pn) )* \) )
+If there is more than one subpattern with the same name,  the  earliest
+one is used.
+This  particular  example pattern that we have been looking at contains
+nested unlimited repeats, and so the use of atomic grouping for  match-
+ing  strings  of non-parentheses is important when applying the pattern
+to strings that do not match. For example, when this pattern is applied
+to
+(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
+it  yields "no match" quickly. However, if atomic grouping is not used,
+the match runs for a very long time indeed because there  are  so  many
+different  ways  the  + and * repeats can carve up the subject, and all
+have to be tested before failure can be reported.
+At the end of a match, the values set for any capturing subpatterns are
+those from the outermost level of the recursion at which the subpattern
+value is set.  If you want to obtain  intermediate  values,  a  callout
+function  can be used (see below and the pcrecallout documentation). If
+the pattern above is matched against
+(ab(cd)ef)
+the value for the capturing parentheses is  "ef",  which  is  the  last
+value  taken  on at the top level. If additional parentheses are added,
+giving
+\( ( ( (?>[^()]+) | (?R) )* ) \)
+   ^                        ^
+   ^                        ^
+the string they capture is "ab(cd)ef", the contents of  the  top  level
+parentheses.  If there are more than 15 capturing parentheses in a pat-
+tern, PCRE has to obtain extra memory to store data during a recursion,
+which  it  does  by  using pcre_malloc, freeing it via pcre_free after-
+wards. If  no  memory  can  be  obtained,  the  match  fails  with  the
+PCRE_ERROR_NOMEMORY error.
+Do  not  confuse  the (?R) item with the condition (R), which tests for
+recursion.  Consider this pattern, which matches text in  angle  brack-
+ets,  allowing for arbitrary nesting. Only digits are allowed in nested
+brackets (that is, when recursing), whereas any characters are  permit-
+ted at the outer level.
+< (?: (?(R) \d++  | [^<>]*+) | (?R)) * >
+In  this  pattern, (?(R) is the start of a conditional subpattern, with
+two different alternatives for the recursive and  non-recursive  cases.
+The (?R) item is the actual recursive call.
+SUBPATTERNS AS SUBROUTINES
+If the syntax for a recursive subpattern reference (either by number or
+by name) is used outside the parentheses to which it refers,  it  oper-
+ates  like a subroutine in a programming language. The "called" subpat-
+tern may be defined before or after the reference. A numbered reference
+can be absolute or relative, as in these examples:
+(...(absolute)...)...(?2)...
+(...(relative)...)...(?-1)...
+(...(?+1)...(relative)...
+An earlier example pointed out that the pattern
+(sens|respons)e and \1ibility
+matches  "sense and sensibility" and "response and responsibility", but
+not "sense and responsibility". If instead the pattern
+(sens|respons)e and (?1)ibility
+is used, it does match "sense and responsibility" as well as the  other
+two  strings.  Another  example  is  given  in the discussion of DEFINE
+above.
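Python's re module has back references but no (?1), so it can only imitate the difference between the two. Pasting the group's pattern in a second time stands in for the call. Unlike a PCRE subroutine call, such a copy is not atomic, and it would pick up any option settings in force where it is pasted:

```python
import re

# Back reference: the second part must repeat the text that group 1 matched.
backref = re.compile(r"(sens|respons)e and \1ibility")
# Stand-in for (?1): the group's pattern is matched again from scratch.
called = re.compile(r"(sens|respons)e and (?:sens|respons)ibility")

s = "sense and responsibility"
print(backref.fullmatch(s) is not None, called.fullmatch(s) is not None)  # False True
```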
+Like recursive subpatterns, a "subroutine" call is always treated as an
+atomic  group. That is, once it has matched some of the subject string,
+it is never re-entered, even if it contains  untried  alternatives  and
+there is a subsequent matching failure.
+When  a  subpattern is used as a subroutine, processing options such as
+case-independence are fixed when the subpattern is defined. They cannot
+be changed for different calls. For example, consider this pattern:
+(abc)(?i:(?-1))
+It  matches  "abcabc". It does not match "abcABC" because the change of
+processing option does not affect the called subpattern.
+ONIGURUMA SUBROUTINE SYNTAX
+For compatibility with Oniguruma, the non-Perl syntax \g followed by  a
+name or a number enclosed either in angle brackets or single quotes, is
+an alternative syntax for referencing a  subpattern  as  a  subroutine,
+possibly  recursively. Here are two of the examples used above, rewrit-
+ten using this syntax:
+(?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )
+(sens|respons)e and \g'1'ibility
+PCRE supports an extension to Oniguruma: if a number is preceded  by  a
+plus or a minus sign it is taken as a relative reference. For example:
+(abc)(?i:\g<-1>)
+Note  that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not
+synonymous. The former is a back reference; the latter is a  subroutine
+call.
+CALLOUTS
+Perl has a feature whereby using the sequence (?{...}) causes arbitrary
+Perl code to be obeyed in the middle of matching a regular  expression.
+This makes it possible, amongst other things, to extract different sub-
+strings that match the same pair of parentheses when there is a repeti-
+tion.
+PCRE provides a similar feature, but of course it cannot obey arbitrary
+Perl code. The feature is called "callout". The caller of PCRE provides
+an  external function by putting its entry point in the global variable
+pcre_callout.  By default, this variable contains NULL, which  disables
+all calling out.
+Within  a  regular  expression,  (?C) indicates the points at which the
+external function is to be called. If you want  to  identify  different
+callout  points, you can put a number less than 256 after the letter C.
+The default value is zero.  For example, this pattern has  two  callout
+points:
+(?C1)abc(?C2)def
+If the PCRE_AUTO_CALLOUT flag is passed to pcre_compile(), callouts are
+automatically installed before each item in the pattern. They  are  all
+numbered 255.
+During matching, when PCRE reaches a callout point (and pcre_callout is
+set), the external function is called. It is provided with  the  number
+of  the callout, the position in the pattern, and, optionally, one item
+of data originally supplied by the caller of pcre_exec().  The  callout
+function  may cause matching to proceed, to backtrack, or to fail alto-
+gether. A complete description of the interface to the callout function
+is given in the pcrecallout documentation.
+BACKTRACKING CONTROL
+Perl  5.10 introduced a number of "Special Backtracking Control Verbs",
+which are described in the Perl documentation as "experimental and sub-
+ject  to  change or removal in a future version of Perl". It goes on to
+say: "Their usage in production code should be noted to avoid  problems
+during upgrades." The same remarks apply to the PCRE features described
+in this section.
+Since these verbs are specifically related  to  backtracking,  most  of
+them  can  be  used  only  when  the  pattern  is  to  be matched using
+pcre_exec(), which uses a backtracking algorithm. With the exception of
+(*FAIL), which behaves like a failing negative assertion, they cause an
+error if encountered by pcre_dfa_exec().
+The new verbs make use of what was previously invalid syntax: an  open-
+ing parenthesis followed by an asterisk. In Perl, they are generally of
+the form (*VERB:ARG) but PCRE does not support the use of arguments, so
+its  general  form is just (*VERB). Any number of these verbs may occur
+in a pattern. There are two kinds:
+Verbs that act immediately
+The following verbs act as soon as they are encountered:
+(*ACCEPT)
+This verb causes the match to end successfully, skipping the  remainder
+of  the pattern. When inside a recursion, only the innermost pattern is
+ended immediately. PCRE differs  from  Perl  in  what  happens  if  the
+(*ACCEPT)  is inside capturing parentheses. In Perl, the data so far is
+captured: in PCRE no data is captured. For example:
+A(A|B(*ACCEPT)|C)D
+This matches "AB", "AAD", or "ACD", but when it matches "AB",  no  data
+is captured.
+(*FAIL) or (*F)
+This  verb  causes the match to fail, forcing backtracking to occur. It
+is equivalent to (?!) but easier to read. The Perl documentation  notes
+that  it  is  probably  useful only when combined with (?{}) or (??{}).
+Those are, of course, Perl features that are not present in  PCRE.  The
+nearest  equivalent is the callout feature, as for example in this pat-
+tern:
+a+(?C)(*FAIL)
+A match with the string "aaaa" always fails, but the callout  is  taken
+before each backtrack happens (in this example, 10 times).
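The figure of 10 can be checked by counting. From each starting position, the greedy a+ first takes every "a" it can and then gives back one at a time. Each of those lengths reaches (?C) before (*FAIL) forces the next backtrack. The model below is a hypothetical helper that assumes no start-of-match optimization skips any work:

```python
def count_callouts(subject: str) -> int:
    """Count how often (?C) is reached for a+(?C)(*FAIL) on `subject`."""
    calls = 0
    for start in range(len(subject)):
        # Length of the run of "a" characters beginning at this start.
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        # Greedy a+ tries lengths run, run-1, ..., 1; each one reaches
        # the callout and is then rejected by (*FAIL).
        calls += run
    return calls


print(count_callouts("aaaa"))  # 10
```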
+Verbs that act after backtracking
+The following verbs do nothing when they are encountered. Matching con-
+tinues with what follows, but if there is no subsequent match, a  fail-
+ure  is  forced.   The  verbs  differ  in  exactly what kind of failure
+occurs.
+(*COMMIT)
+This verb causes the whole match to fail outright if the  rest  of  the
+pattern  does  not match. Even if the pattern is unanchored, no further
+attempts to find a match by advancing the start point take place.  Once
+(*COMMIT)  has been passed, pcre_exec() is committed to finding a match
+at the current starting point, or not at all. For example:
+a+(*COMMIT)b
+This matches "xxaab" but not "aacaab". It can be thought of as  a  kind
+of dynamic anchor, or "I've started, so I must finish."
+(*PRUNE)
+This  verb causes the match to fail at the current position if the rest
+of the pattern does not match. If the pattern is unanchored, the normal
+"bumpalong"  advance to the next starting character then happens. Back-
+tracking can occur as usual to the left of (*PRUNE), or  when  matching
+to  the right of (*PRUNE), but if there is no match to the right, back-
+tracking cannot cross (*PRUNE).  In simple cases, the use  of  (*PRUNE)
+is just an alternative to an atomic group or possessive quantifier, but
+there are some uses of (*PRUNE) that cannot be expressed in  any  other
+way.
+(*SKIP)
+This  verb  is like (*PRUNE), except that if the pattern is unanchored,
+the "bumpalong" advance is not to the next character, but to the  posi-
+tion  in  the  subject where (*SKIP) was encountered. (*SKIP) signifies
+that whatever text was matched leading up to it cannot  be  part  of  a
+successful match. Consider:
+a+(*SKIP)b
+If  the  subject  is  "aaaac...",  after  the first match attempt fails
+(starting at the first character in the  string),  the  starting  point
+skips on to start the next attempt at "c". Note that a possessive
+quantifier does not have the same effect in this example; although it would
+suppress  backtracking  during  the  first  match  attempt,  the second
+attempt would start at the second character instead of skipping  on  to
+"c".
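To make the differences between (*COMMIT), (*PRUNE), and (*SKIP) concrete, the following toy model hard-codes an unanchored search for a+(*VERB)b. It records which starting positions are tried. It is a hypothetical sketch of the behaviour described above, not PCRE's matcher, and it ignores PCRE's start-of-match optimizations:

```python
from typing import List, Optional, Tuple


def search_a_plus_verb_b(subject: str, verb: str) -> Tuple[Optional[Tuple[int, int]], List[int]]:
    """Toy model of an unanchored search for a+(*VERB)b.

    Returns the (start, end) of the match or None, plus the list of
    starting positions that were tried.
    """
    tried = []
    start = 0
    while start < len(subject):
        tried.append(start)
        # Greedy a+ takes the whole run of "a" characters.
        end = start
        while end < len(subject) and subject[end] == "a":
            end += 1
        if end == start:
            # a+ failed before the verb was reached: ordinary bumpalong.
            start += 1
            continue
        # The verb has been passed at position `end`; now try "b".
        if end < len(subject) and subject[end] == "b":
            return (start, end + 1), tried
        # "b" failed, and backtracking into a+ would have to cross the verb.
        if verb == "COMMIT":
            return None, tried          # no further starting points
        if verb == "PRUNE":
            start += 1                  # normal advance to the next character
        elif verb == "SKIP":
            start = end                 # advance to where (*SKIP) was passed
        else:
            raise ValueError("unknown verb: " + verb)
    return None, tried


print(search_a_plus_verb_b("xxaab", "COMMIT"))    # ((2, 5), [0, 1, 2])
print(search_a_plus_verb_b("aacaab", "COMMIT"))   # (None, [0])
print(search_a_plus_verb_b("aaaacaab", "PRUNE"))  # ((5, 8), [0, 1, 2, 3, 4, 5])
print(search_a_plus_verb_b("aaaacaab", "SKIP"))   # ((5, 8), [0, 4, 5])
```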
+(*THEN)
+This verb causes a skip to the next alternative if the rest of the
+pattern does not match. That is, it cancels pending backtracking, but only
+within  the  current  alternation.  Its name comes from the observation
+that it can be used for a pattern-based if-then-else block:
+( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
+If the COND1 pattern matches, FOO is tried (and possibly further  items
+after  the  end  of  the group if FOO succeeds); on failure the matcher
+skips to the second alternative and tries COND2,  without  backtracking
+into  COND1.  If  (*THEN)  is  used outside of any alternation, it acts
+exactly like (*PRUNE).
+SEE ALSO
+pcreapi(3), pcrecallout(3), pcrematching(3), pcre(3).
+AUTHOR
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+REVISION
+Last updated: 19 April 2008
+Copyright (c) 1997-2008 University of Cambridge.
+------------------------------------------------------------------------------
+PCRESYNTAX(3)                                                    PCRESYNTAX(3)
+NAME
+PCRE - Perl-compatible regular expressions
+PCRE REGULAR EXPRESSION SYNTAX SUMMARY
+The  full syntax and semantics of the regular expressions that are sup-
+ported by PCRE are described in  the  pcrepattern  documentation.  This
+document contains just a quick-reference summary of the syntax.
+QUOTING
+\x         where x is non-alphanumeric is a literal x
+\Q...\E    treat enclosed characters as literal
+CHARACTERS
+\a         alarm, that is, the BEL character (hex 07)
+\cx        "control-x", where x is any character
+\e         escape (hex 1B)
+\f         formfeed (hex 0C)
+\n         newline (hex 0A)
+\r         carriage return (hex 0D)
+\t         tab (hex 09)
+\ddd       character with octal code ddd, or backreference
+\xhh       character with hex code hh
+\x{hhh..}  character with hex code hhh..
+CHARACTER TYPES
+.          any character except newline;
+           in dotall mode, any character whatsoever
+\C         one byte, even in UTF-8 mode (best avoided)
+\d         a decimal digit
+\D         a character that is not a decimal digit
+\h         a horizontal whitespace character
+\H         a character that is not a horizontal whitespace character
+\p{xx}     a character with the xx property
+\P{xx}     a character without the xx property
+\R         a newline sequence
+\s         a whitespace character
+\S         a character that is not a whitespace character
+\v         a vertical whitespace character
+\V         a character that is not a vertical whitespace character
+\w         a "word" character
+\W         a "non-word" character
+\X         an extended Unicode sequence
+In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters.
+GENERAL CATEGORY PROPERTY CODES FOR \p and \P
+C          Other
+Cc         Control
+Cf         Format
+Cn         Unassigned
+Co         Private use
+Cs         Surrogate
+L          Letter
+Ll         Lower case letter
+Lm         Modifier letter
+Lo         Other letter
+Lt         Title case letter
+Lu         Upper case letter
+L&         Ll, Lu, or Lt
+M          Mark
+Mc         Spacing mark
+Me         Enclosing mark
+Mn         Non-spacing mark
+N          Number
+Nd         Decimal number
+Nl         Letter number
+No         Other number
+P          Punctuation
+Pc         Connector punctuation
+Pd         Dash punctuation
+Pe         Close punctuation
+Pf         Final punctuation
+Pi         Initial punctuation
+Po         Other punctuation
+Ps         Open punctuation
+S          Symbol
+Sc         Currency symbol
+Sk         Modifier symbol
+Sm         Mathematical symbol
+So         Other symbol
+Z          Separator
+Zl         Line separator
+Zp         Paragraph separator
+Zs         Space separator
+SCRIPT NAMES FOR \p AND \P
+Arabic,  Armenian,  Balinese,  Bengali,  Bopomofo,  Braille,  Buginese,
+Buhid,  Canadian_Aboriginal,  Cherokee,  Common,   Coptic,   Cuneiform,
+Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic,
+Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew,
+Hiragana, Inherited, Kannada, Katakana, Kharoshthi, Khmer, Lao, Latin,
+Limbu,  Linear_B,  Malayalam,  Mongolian,  Myanmar,  New_Tai_Lue,  Nko,
+Ogham,  Old_Italic,  Old_Persian, Oriya, Osmanya, Phags_Pa, Phoenician,
+Runic,  Shavian,  Sinhala,  Syloti_Nagri,  Syriac,  Tagalog,  Tagbanwa,
+Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Yi.
+CHARACTER CLASSES
+[...]       positive character class
+[^...]      negative character class
+[x-y]       range (can be used for hex characters)
+[[:xxx:]]   positive POSIX named set
+[[:^xxx:]]  negative POSIX named set
+alnum       alphanumeric
+alpha       alphabetic
+ascii       0-127
+blank       space or tab
+cntrl       control character
+digit       decimal digit
+graph       printing, excluding space
+lower       lower case letter
+print       printing, including space
+punct       printing, excluding alphanumeric
+space       whitespace
+upper       upper case letter
+word        same as \w
+xdigit      hexadecimal digit
+In PCRE, POSIX character set names recognize only ASCII characters. You
+can use \Q...\E inside a character class.
+QUANTIFIERS
+?           0 or 1, greedy
+?+          0 or 1, possessive
+??          0 or 1, lazy
+*           0 or more, greedy
+*+          0 or more, possessive
+*?          0 or more, lazy
++           1 or more, greedy
+++          1 or more, possessive
++?          1 or more, lazy
+{n}         exactly n
+{n,m}       at least n, no more than m, greedy
+{n,m}+      at least n, no more than m, possessive
+{n,m}?      at least n, no more than m, lazy
+{n,}        n or more, greedy
+{n,}+       n or more, possessive
+{n,}?       n or more, lazy
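The difference between the greedy and lazy forms is easy to see in Python's re module, used here as a stand-in. The possessive forms are omitted because Python's re supports them only from version 3.11:

```python
import re

subject = "<a><b>"
print(re.match(r"<.+>", subject).group())     # <a><b>  greedy: as much as possible
print(re.match(r"<.+?>", subject).group())    # <a>     lazy: as little as possible
print(re.match(r"a{2,3}", "aaaa").group())    # aaa
print(re.match(r"a{2,3}?", "aaaa").group())   # aa
```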
+ANCHORS AND SIMPLE ASSERTIONS
+\b          word boundary
+\B          not a word boundary
+^           start of subject
+            also after internal newline in multiline mode
+\A          start of subject
+$           end of subject
+            also before newline at end of subject
+            also before internal newline in multiline mode
+\Z          end of subject
+            also before newline at end of subject
+\z          end of subject
+\G          first matching position in subject
+MATCH POINT RESET
+\K          reset start of match
+ALTERNATION
+expr|expr|expr...
+CAPTURING
+(...)          capturing group
+(?<name>...)   named capturing group (Perl)
+(?'name'...)   named capturing group (Perl)
+(?P<name>...)  named capturing group (Python)
+(?:...)        non-capturing group
+(?|...)        non-capturing group; reset group numbers for
+               capturing groups in each alternative
+ATOMIC GROUPS
+(?>...)        atomic, non-capturing group
+COMMENT
+(?#....)       comment (not nestable)
+OPTION SETTING
+(?i)           caseless
+(?J)           allow duplicate names
+(?m)           multiline
+(?s)           single line (dotall)
+(?U)           default ungreedy (lazy)
+(?x)           extended (ignore white space)
+(?-...)        unset option(s)
+LOOKAHEAD AND LOOKBEHIND ASSERTIONS
+(?=...)        positive look ahead
+(?!...)        negative look ahead
+(?<=...)       positive look behind
+(?<!...)       negative look behind
+Each top-level branch of a look behind must be of a fixed length.
+BACKREFERENCES
+\n             reference by number (can be ambiguous)
+\gn            reference by number
+\g{n}          reference by number
+\g{-n}         relative reference by number
+\k<name>       reference by name (Perl)
+\k'name'       reference by name (Perl)
+\g{name}       reference by name (Perl)
+\k{name}       reference by name (.NET)
+(?P=name)      reference by name (Python)
+SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)
+(?R)           recurse whole pattern
+(?n)           call subpattern by absolute number
+(?+n)          call subpattern by relative number
+(?-n)          call subpattern by relative number
+(?&name)       call subpattern by name (Perl)
+(?P>name)      call subpattern by name (Python)
+\g<name>       call subpattern by name (Oniguruma)
+\g'name'       call subpattern by name (Oniguruma)
+\g<n>          call subpattern by absolute number (Oniguruma)
+\g'n'          call subpattern by absolute number (Oniguruma)
+\g<+n>         call subpattern by relative number (PCRE extension)
+\g'+n'         call subpattern by relative number (PCRE extension)
+\g<-n>         call subpattern by relative number (PCRE extension)
+\g'-n'         call subpattern by relative number (PCRE extension)
+CONDITIONAL PATTERNS
+(?(condition)yes-pattern)
+(?(condition)yes-pattern|no-pattern)
+(?(n)...       absolute reference condition
+(?(+n)...      relative reference condition
+(?(-n)...      relative reference condition
+(?(<name>)...  named reference condition (Perl)
+(?('name')...  named reference condition (Perl)
+(?(name)...    named reference condition (PCRE)
+(?(R)...       overall recursion condition
+(?(Rn)...      specific group recursion condition
+(?(R&name)...  specific recursion condition
+(?(DEFINE)...  define subpattern for reference
+(?(assert)...  assertion condition
+BACKTRACKING CONTROL
The following act immediately when they are reached:
+(*ACCEPT)      force successful match
+(*FAIL)        force backtrack; synonym (*F)
+The following act only when a subsequent match failure causes  a  back-
+track to reach them. They all force a match failure, but they differ in
+what happens afterwards. Those that advance the start-of-match point do
+so only if the pattern is not anchored.
+(*COMMIT)      overall failure, no advance of starting point
+(*PRUNE)       advance to next starting character
+(*SKIP)        advance start to current matching position
+(*THEN)        local failure, backtrack to next alternation
+NEWLINE CONVENTIONS
+These  are  recognized only at the very start of the pattern or after a
+(*BSR_...) option.
+(*CR)
+(*LF)
+(*CRLF)
+(*ANYCRLF)
+(*ANY)
+WHAT \R MATCHES
+These are recognized only at the very start of the pattern or  after  a
+(*...) option that sets the newline convention.
+(*BSR_ANYCRLF)
+(*BSR_UNICODE)
+CALLOUTS
+(?C)      callout
+(?Cn)     callout with data n
+SEE ALSO
+pcrepattern(3), pcreapi(3), pcrecallout(3), pcrematching(3), pcre(3).
+AUTHOR
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+REVISION
+Last updated: 09 April 2008
+Copyright (c) 1997-2008 University of Cambridge.
+------------------------------------------------------------------------------
+PCREPARTIAL(3)                                                  PCREPARTIAL(3)
+NAME
+PCRE - Perl-compatible regular expressions
+PARTIAL MATCHING IN PCRE
+In  normal  use  of  PCRE,  if  the  subject  string  that is passed to
+pcre_exec() or pcre_dfa_exec() matches as far as it goes,  but  is  too
+short  to  match  the  entire  pattern, PCRE_ERROR_NOMATCH is returned.
+There are circumstances where it might be helpful to  distinguish  this
+case from other cases in which there is no match.
+Consider, for example, an application where a human is required to type
+in data for a field with specific formatting requirements.  An  example
+might be a date in the form ddmmmyy, defined by this pattern:
+^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$
+If the application sees the user's keystrokes one by one, and can check
+that what has been typed so far is potentially valid,  it  is  able  to
+raise  an  error as soon as a mistake is made, possibly beeping and not
+reflecting the character that has been typed. This  immediate  feedback
+is  likely  to  be a better user interface than a check that is delayed
+until the entire string has been entered.
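The kind of incremental check described above can be sketched by hand for this one date pattern. The C below is a hypothetical illustration only. It does not call PCRE, and the names date_check(), match_tail() and enum match_result are invented for the example. It reports a full match, a partial match (the input ran out while it could still become valid), or no match. It classifies the keystroke strings used in the pcretest example below the same way pcretest does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical outcome codes for this sketch (not PCRE's return codes). */
enum match_result { NO_MATCH = 0, PARTIAL_MATCH = 1, FULL_MATCH = 2 };

static const char *const months[12] = {
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec"
};

/* Check the part after the leading digits: a month name, two digits,
   then end of input.  Running out of input part-way is a partial match. */
static enum match_result match_tail(const char *s)
{
    enum match_result best = NO_MATCH;
    for (int m = 0; m < 12; m++) {
        const char *p = s;
        const char *q = months[m];
        while (*q != '\0' && *p == *q) { p++; q++; }
        if (*q != '\0') {                      /* month name not completed */
            if (*p == '\0') best = PARTIAL_MATCH;  /* input ended inside it */
            continue;
        }
        int digits = 0;
        while (digits < 2 && isdigit((unsigned char)*p)) { p++; digits++; }
        if (digits < 2) {
            if (*p == '\0') best = PARTIAL_MATCH;  /* ended before 2nd digit */
            continue;
        }
        if (*p == '\0') return FULL_MATCH;
        /* otherwise extra characters follow: this month gives no match */
    }
    return best;
}

/* Classify s against ^\d?\d(jan|...|dec)\d\d$ by trying one, then two,
   leading digits and keeping the best outcome. */
enum match_result date_check(const char *s)
{
    assert(s != NULL);
    enum match_result best = NO_MATCH;
    for (int k = 1; k <= 2; k++) {
        int i = 0;
        while (i < k && isdigit((unsigned char)s[i])) i++;
        if (i < k) {                           /* fewer than k leading digits */
            if (s[i] == '\0' && best < PARTIAL_MATCH) best = PARTIAL_MATCH;
            continue;
        }
        enum match_result r = match_tail(s + k);
        if (r > best) best = r;
    }
    return best;
}
```

PCRE itself does this generically for any conforming pattern through the PCRE_PARTIAL option, so a hand-written checker like this is only worth considering when PCRE is not available.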
PCRE supports the concept of partial matching by means of the
PCRE_PARTIAL option, which can be set when calling pcre_exec() or
+pcre_dfa_exec(). When this flag is set for pcre_exec(), the return code
+PCRE_ERROR_NOMATCH  is converted into PCRE_ERROR_PARTIAL if at any time
+during the matching process the last part of the subject string matched
+part  of  the  pattern. Unfortunately, for non-anchored matching, it is
+not possible to obtain the position of the start of the partial  match.
+No captured data is set when PCRE_ERROR_PARTIAL is returned.
+When   PCRE_PARTIAL   is  set  for  pcre_dfa_exec(),  the  return  code
+PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the  end  of
+the  subject is reached, there have been no complete matches, but there
+is still at least one matching possibility. The portion of  the  string
+that provided the partial match is set as the first matching string.
+Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers
+the last literal byte in a pattern, and abandons  matching  immediately
+if  such a byte is not present in the subject string. This optimization
+cannot be used for a subject string that might match only partially.
+RESTRICTED PATTERNS FOR PCRE_PARTIAL
+Because of the way certain internal optimizations  are  implemented  in
+the  pcre_exec()  function, the PCRE_PARTIAL option cannot be used with
+all patterns. These restrictions do not apply when  pcre_dfa_exec()  is
+used.  For pcre_exec(), repeated single characters such as
+a{2,4}
+and repeated single metasequences such as
+\d+
+are  not permitted if the maximum number of occurrences is greater than
+one.  Optional items such as \d? (where the maximum is one) are permit-
+ted.   Quantifiers  with any values are permitted after parentheses, so
+the invalid examples above can be coded thus:
+(a){2,4}
+(\d)+
+These constructions run more slowly, but for the kinds  of  application
+that  are  envisaged  for this facility, this is not felt to be a major
+restriction.
If PCRE_PARTIAL is set for a pattern that does not conform to the
restrictions, pcre_exec() returns the error code PCRE_ERROR_BADPARTIAL
(-13). You can call pcre_fullinfo() with the PCRE_INFO_OKPARTIAL option
to find out if a compiled pattern can be used for partial matching.
+EXAMPLE OF PARTIAL MATCHING USING PCRETEST
+If  the  escape  sequence  \P  is  present in a pcretest data line, the
+PCRE_PARTIAL flag is used for the match. Here is a run of pcretest that
+uses the date example quoted above:
+re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
+data> 25jun04\P
+0: 25jun04
+1: jun
+data> 25dec3\P
+Partial match
+data> 3ju\P
+Partial match
+data> 3juj\P
+No match
+data> j\P
+No match
+The  first  data  string  is  matched completely, so pcretest shows the
+matched substrings. The remaining four strings do not  match  the  com-
+plete  pattern,  but  the first two are partial matches. The same test,
+using pcre_dfa_exec() matching (by means of the  \D  escape  sequence),
+produces the following output:
+re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
+data> 25jun04\P\D
+0: 25jun04
+data> 23dec3\P\D
+Partial match: 23dec3
+data> 3ju\P\D
+Partial match: 3ju
+data> 3juj\P\D
+No match
+data> j\P\D
+No match
+Notice  that in this case the portion of the string that was matched is
+made available.
+MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()
+When a partial match has been found using pcre_dfa_exec(), it is possi-
+ble  to  continue  the  match  by providing additional subject data and
+calling pcre_dfa_exec() again with the same  compiled  regular  expres-
+sion, this time setting the PCRE_DFA_RESTART option. You must also pass
+the same working space as before, because this is where details of  the
+previous  partial  match are stored. Here is an example using pcretest,
+using the \R escape sequence to set the PCRE_DFA_RESTART option (\P and
+\D are as above):
+re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
+data> 23ja\P\D
+Partial match: 23ja
+data> n05\R\D
+0: n05
+The  first  call has "23ja" as the subject, and requests partial match-
+ing; the second call  has  "n05"  as  the  subject  for  the  continued
+(restarted)  match.   Notice  that when the match is complete, only the
+last part is shown; PCRE does  not  retain  the  previously  partially-
+matched  string. It is up to the calling program to do that if it needs
+to.
+You can set PCRE_PARTIAL  with  PCRE_DFA_RESTART  to  continue  partial
+matching over multiple segments. This facility can be used to pass very
+long subject strings to pcre_dfa_exec(). However, some care  is  needed
+for certain types of pattern.
+1.  If  the  pattern contains tests for the beginning or end of a line,
+you need to pass the PCRE_NOTBOL or PCRE_NOTEOL options,  as  appropri-
+ate,  when  the subject string for any call does not contain the begin-
+ning or end of a line.
+2. If the pattern contains backward assertions (including  \b  or  \B),
+you  need  to  arrange for some overlap in the subject strings to allow
+for this. For example, you could pass the subject in  chunks  that  are
+500  bytes long, but in a buffer of 700 bytes, with the starting offset
+set to 200 and the previous 200 bytes at the start of the buffer.
+3. Matching a subject string that is split into multiple segments  does
+not  always produce exactly the same result as matching over one single
+long string.  The difference arises when there  are  multiple  matching
+possibilities,  because a partial match result is given only when there
+are no completed matches in a call to pcre_dfa_exec(). This means  that
+as  soon  as  the  shortest match has been found, continuation to a new
+subject segment is no longer possible.  Consider this pcretest example:
+re> /dog(sbody)?/
+data> do\P\D
+Partial match: do
+data> gsb\R\P\D
+0: g
+data> dogsbody\D
+0: dogsbody
+1: dog
+The  pattern matches the words "dog" or "dogsbody". When the subject is
+presented in several parts ("do" and "gsb" being  the  first  two)  the
+match  stops  when "dog" has been found, and it is not possible to con-
+tinue. On the other hand,  if  "dogsbody"  is  presented  as  a  single
+string, both matches are found.
+Because  of  this  phenomenon,  it does not usually make sense to end a
+pattern that is going to be matched in this way with a variable repeat.
+4. Patterns that contain alternatives at the top level which do not all
+start with the same pattern item may not work as expected. For example,
+consider this pattern:
+1234|3789
+If  the  first  part of the subject is "ABC123", a partial match of the
+first alternative is found at offset 3. There is no partial  match  for
+the second alternative, because such a match does not start at the same
+point in the subject string. Attempting to  continue  with  the  string
+"789" does not yield a match because only those alternatives that match
+at one point in the subject are remembered. The problem arises  because
+the  start  of the second alternative matches within the first alterna-
+tive. There is no problem with anchored patterns or patterns such as:
+1234|ABCD
+where no string can be a partial match for both alternatives.
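The overlapping-buffer scheme in point 2 can be sketched as follows. This is a minimal illustration under stated assumptions: load_segment(), CHUNK and OVERLAP are hypothetical names, the sizes are the 500 and 200 bytes from the example, and the pcre_dfa_exec() call itself is left out. After each load, the buffer and *have would be passed as the subject and its length, with the returned value as the starting offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK   500   /* new subject bytes supplied per call */
#define OVERLAP 200   /* previous bytes kept so look-behinds can see them */

/* Hypothetical helper: keep the last OVERLAP bytes already in buf, move
   them to the front, and append the next chunk after them.  buf must
   hold OVERLAP + CHUNK bytes and *have is the number of valid bytes in
   it.  Returns the offset where the new data starts, i.e. the starting
   offset to use for the next matching call. */
static size_t load_segment(char *buf, size_t *have,
                           const char *chunk, size_t len)
{
    assert(len <= CHUNK);
    size_t keep = *have < OVERLAP ? *have : OVERLAP;
    memmove(buf, buf + (*have - keep), keep);   /* regions may overlap */
    memcpy(buf + keep, chunk, len);
    *have = keep + len;
    return keep;
}
```

On the first call there is no earlier data, so the starting offset is zero; on later calls it is OVERLAP, and the bytes before it are available only to backward assertions.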
+AUTHOR
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+REVISION
+Last updated: 04 June 2007
+Copyright (c) 1997-2007 University of Cambridge.
+------------------------------------------------------------------------------
+PCREPRECOMPILE(3)                                            PCREPRECOMPILE(3)
+NAME
+PCRE - Perl-compatible regular expressions
+SAVING AND RE-USING PRECOMPILED PCRE PATTERNS
+If  you  are running an application that uses a large number of regular
+expression patterns, it may be useful to store them  in  a  precompiled
+form  instead  of  having to compile them every time the application is
+run.  If you are not  using  any  private  character  tables  (see  the
+pcre_maketables()  documentation),  this is relatively straightforward.
+If you are using private tables, it is a little bit more complicated.
+If you save compiled patterns to a file, you can copy them to a differ-
+ent  host  and  run them there. This works even if the new host has the
+opposite endianness to the one on which  the  patterns  were  compiled.
+There  may  be a small performance penalty, but it should be insignifi-
+cant. However, compiling regular expressions with one version  of  PCRE
+for  use  with  a  different  version is not guaranteed to work and may
+cause crashes.
+SAVING A COMPILED PATTERN
The value returned by pcre_compile() points to a single block of memory
that holds the compiled pattern and associated data. You can find the
length of this block in bytes by calling pcre_fullinfo() with an
argument of PCRE_INFO_SIZE. You can then save the data in any
appropriate manner. Here is sample code that compiles a pattern and
writes it to a file. It assumes that the variable fd is a FILE * that
refers to a file that is open for output:
const char *error;    /* pcre_compile() takes a const char ** */
int erroroffset, rc;
size_t size;          /* PCRE_INFO_SIZE yields a size_t */
pcre *re;
re = pcre_compile("my pattern", 0, &error, &erroroffset, NULL);
if (re == NULL) { /* handle errors: see error and erroroffset */ }
rc = pcre_fullinfo(re, NULL, PCRE_INFO_SIZE, &size);
if (rc < 0) { /* handle errors */ }
if (fwrite(re, 1, size, fd) != size) { /* handle errors */ }
+In this example, the bytes  that  comprise  the  compiled  pattern  are
+copied  exactly.  Note that this is binary data that may contain any of
+the 256 possible byte  values.  On  systems  that  make  a  distinction
+between binary and non-binary data, be sure that the file is opened for
+binary output.
+If you want to write more than one pattern to a file, you will have  to
+devise  a  way of separating them. For binary data, preceding each pat-
+tern with its length is probably  the  most  straightforward  approach.
+Another  possibility is to write out the data in hexadecimal instead of
+binary, one pattern to a line.
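As a sketch of the length-prefix approach mentioned above, the following C writes each saved pattern as a 4-byte length followed by its bytes, and reads them back one at a time. write_blob() and read_blob() are hypothetical names. In real use the data pointer and size would be the pcre pointer and the PCRE_INFO_SIZE value from the sample code above, and the FILE would be opened in binary mode.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Write one blob (for example, a compiled pattern whose size came from
   PCRE_INFO_SIZE) preceded by its length as 4 little-endian bytes, so the
   length header reads back the same on hosts of either byte order.
   Returns 0 on success, -1 on failure. */
static int write_blob(FILE *fp, const void *data, size_t size)
{
    unsigned char hdr[4];
    assert(fp != NULL);
    if (size > 0xFFFFFFFFUL) return -1;
    for (int i = 0; i < 4; i++)
        hdr[i] = (unsigned char)(size >> (8 * i));
    if (fwrite(hdr, 1, 4, fp) != 4) return -1;
    if (size > 0 && fwrite(data, 1, size, fp) != size) return -1;
    return 0;
}

/* Read the next blob written by write_blob().  Returns a malloc'd buffer
   (the caller frees it) and stores its length in *size, or returns NULL
   at end of file or on error. */
static void *read_blob(FILE *fp, size_t *size)
{
    unsigned char hdr[4];
    size_t n = 0;
    if (fread(hdr, 1, 4, fp) != 4) return NULL;
    for (int i = 0; i < 4; i++)
        n |= (size_t)hdr[i] << (8 * i);
    void *buf = malloc(n > 0 ? n : 1);
    if (buf == NULL) return NULL;
    if (n > 0 && fread(buf, 1, n, fp) != n) { free(buf); return NULL; }
    *size = n;
    return buf;
}
```

A fixed-width header is used rather than a raw size_t so that the file layout does not depend on the host's word size.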
+Saving compiled patterns in a file is only one possible way of  storing
+them  for later use. They could equally well be saved in a database, or
+in the memory of some daemon process that passes them  via  sockets  to
+the processes that want them.
+If  the pattern has been studied, it is also possible to save the study
+data in a similar way to the compiled  pattern  itself.  When  studying
+generates  additional  information, pcre_study() returns a pointer to a
+pcre_extra data block. Its format is defined in the section on matching
+a  pattern in the pcreapi documentation. The study_data field points to
+the binary study data,  and  this  is  what  you  must  save  (not  the
+pcre_extra  block itself). The length of the study data can be obtained
+by calling pcre_fullinfo() with  an  argument  of  PCRE_INFO_STUDYSIZE.
+Remember  to check that pcre_study() did return a non-NULL value before
+trying to save the study data.
+RE-USING A PRECOMPILED PATTERN
+Re-using a precompiled pattern is straightforward. Having  reloaded  it
+into   main   memory,   you   pass   its   pointer  to  pcre_exec()  or
+pcre_dfa_exec() in the usual way. This  should  work  even  on  another
+host,  and  even  if  that  host has the opposite endianness to the one
+where the pattern was compiled.
+However, if you passed a pointer to custom character  tables  when  the
+pattern  was  compiled  (the  tableptr argument of pcre_compile()), you
+must now pass a similar  pointer  to  pcre_exec()  or  pcre_dfa_exec(),
+because  the  value  saved  with the compiled pattern will obviously be
nonsense. A field in a pcre_extra block is used to pass this data, as
described in the section on matching a pattern in the pcreapi
documentation.
+If you did not provide custom character tables  when  the  pattern  was
+compiled,  the  pointer  in  the compiled pattern is NULL, which causes
+pcre_exec() to use PCRE's internal tables. Thus, you  do  not  need  to
+take any special action at run time in this case.
+If  you  saved study data with the compiled pattern, you need to create
+your own pcre_extra data block and set the study_data field to point to
+the  reloaded  study  data. You must also set the PCRE_EXTRA_STUDY_DATA
+bit in the flags field to indicate that study  data  is  present.  Then
+pass  the  pcre_extra  block  to  pcre_exec() or pcre_dfa_exec() in the
+usual way.
+COMPATIBILITY WITH DIFFERENT PCRE RELEASES
+In general, it is safest to  recompile  all  saved  patterns  when  you
+update  to  a new PCRE release, though not all updates actually require
+this. Recompiling is definitely needed for release 7.2.
+AUTHOR
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+REVISION
+Last updated: 13 June 2007
+Copyright (c) 1997-2007 University of Cambridge.
+------------------------------------------------------------------------------
+PCREPERFORM(3)                                                  PCREPERFORM(3)
+NAME
+PCRE - Perl-compatible regular expressions
+PCRE PERFORMANCE
+Two  aspects  of performance are discussed below: memory usage and pro-
+cessing time. The way you express your pattern as a regular  expression
+can affect both of them.
+MEMORY USAGE
+Patterns are compiled by PCRE into a reasonably efficient byte code, so
+that most simple patterns do not use much memory. However, there is one
+case where memory usage can be unexpectedly large. When a parenthesized
+subpattern has a quantifier with a minimum greater than 1 and/or a lim-
+ited  maximum,  the  whole subpattern is repeated in the compiled code.
+For example, the pattern
+(abc|def){2,4}
+is compiled as if it were
+(abc|def)(abc|def)((abc|def)(abc|def)?)?
+(Technical aside: It is done this way so that backtrack  points  within
+each of the repetitions can be independently maintained.)
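The expansion rule shown above can be modelled as text with a short C function. This is only a model of the description in this section: PCRE performs the replication on its internal byte code, not on pattern text. expand_repeat() and append() are hypothetical names, and only a finite maximum is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append s to out (capacity outsz, current length *len). 0 ok, -1 full. */
static int append(char *out, size_t outsz, size_t *len, const char *s)
{
    size_t n = strlen(s);
    if (*len + n + 1 > outsz) return -1;
    memcpy(out + *len, s, n + 1);        /* copies the terminating zero */
    *len += n;
    return 0;
}

/* Write the "as if" form of group{min,max}: min mandatory copies, then
   max - min optional copies nested so each is tried only after the one
   before it has matched.  Returns the length, or -1 if out is too small. */
static int expand_repeat(const char *group, int min, int max,
                         char *out, size_t outsz)
{
    size_t len = 0;
    int extra = max - min;
    assert(min >= 0 && extra >= 0 && outsz > 0);
    out[0] = '\0';
    for (int i = 0; i < min; i++)
        if (append(out, outsz, &len, group) < 0) return -1;
    if (extra > 0) {
        for (int i = 0; i < extra - 1; i++)          /* open nested groups */
            if (append(out, outsz, &len, "(") < 0 ||
                append(out, outsz, &len, group) < 0) return -1;
        if (append(out, outsz, &len, group) < 0 ||   /* innermost copy */
            append(out, outsz, &len, "?") < 0) return -1;
        for (int i = 0; i < extra - 1; i++)          /* close them */
            if (append(out, outsz, &len, ")?") < 0) return -1;
    }
    return (int)len;
}
```

Called as expand_repeat("(abc|def)", 2, 4, buf, sizeof buf), it produces the expanded form given above. The output length grows linearly with max, which is why large repeat counts, especially nested ones, inflate the compiled pattern.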
+For  regular expressions whose quantifiers use only small numbers, this
+is not usually a problem. However, if the numbers are large,  and  par-
+ticularly  if  such repetitions are nested, the memory usage can become
+an embarrassment. For example, the very simple pattern
+((ab){1,1000}c){1,3}
+uses 51K bytes when compiled. When PCRE is compiled  with  its  default
+internal  pointer  size of two bytes, the size limit on a compiled pat-
+tern is 64K, and this is reached with the above pattern  if  the  outer
+repetition is increased from 3 to 4. PCRE can be compiled to use larger
+internal pointers and thus handle larger compiled patterns, but  it  is
+better to try to rewrite your pattern to use less memory if you can.
+One  way  of reducing the memory usage for such patterns is to make use
+of PCRE's "subroutine" facility. Re-writing the above pattern as
+((ab)(?2){0,999}c)(?1){0,2}
+reduces the memory requirements to 18K, and indeed it remains under 20K
+even  with the outer repetition increased to 100. However, this pattern
+is not exactly equivalent, because the "subroutine" calls  are  treated
+as  atomic groups into which there can be no backtracking if there is a
+subsequent matching failure. Therefore, PCRE cannot  do  this  kind  of
+rewriting  automatically.   Furthermore,  there is a noticeable loss of
+speed when executing the modified pattern. Nevertheless, if the  atomic
+grouping  is  not  a  problem and the loss of speed is acceptable, this
+kind of rewriting will allow you to process patterns that  PCRE  cannot
+otherwise handle.
+PROCESSING TIME
+Certain  items  in regular expression patterns are processed more effi-
+ciently than others. It is more efficient to use a character class like
+[aeiou]   than   a   set   of  single-character  alternatives  such  as
+(a|e|i|o|u). In general, the simplest construction  that  provides  the
required behaviour is usually the most efficient. Jeffrey Friedl's book
Mastering Regular Expressions contains a lot of useful general
discussion about optimizing regular
+expressions  for  efficient  performance.  This document contains a few
+observations about PCRE.
+Using Unicode character properties (the \p,  \P,  and  \X  escapes)  is
+slow,  because PCRE has to scan a structure that contains data for over
+fifteen thousand characters whenever it needs a  character's  property.
+If  you  can  find  an  alternative pattern that does not use character
+properties, it will probably be faster.
+When a pattern begins with .* not in  parentheses,  or  in  parentheses
+that are not the subject of a backreference, and the PCRE_DOTALL option
+is set, the pattern is implicitly anchored by PCRE, since it can  match
+only  at  the start of a subject string. However, if PCRE_DOTALL is not
+set, PCRE cannot make this optimization, because  the  .  metacharacter
+does  not then match a newline, and if the subject string contains new-
+lines, the pattern may match from the character  immediately  following
+one of them instead of from the very start. For example, the pattern
+.*second
+matches  the subject "first\nand second" (where \n stands for a newline
+character), with the match starting at the seventh character. In  order
+to do this, PCRE has to retry the match starting after every newline in
+the subject.
+If you are using such a pattern with subject strings that do  not  con-
+tain newlines, the best performance is obtained by setting PCRE_DOTALL,
+or starting the pattern with ^.* or ^.*? to indicate  explicit  anchor-
+ing.  That saves PCRE from having to scan along the subject looking for
+a newline to restart at.
+Beware of patterns that contain nested indefinite  repeats.  These  can
+take  a  long time to run when applied to a string that does not match.
+Consider the pattern fragment
+^(a+)*
+This can match "aaaa" in 16 different ways, and this  number  increases
+very  rapidly  as the string gets longer. (The * repeat can match 0, 1,
+2, 3, or 4 times, and for each of those cases other than 0 or 4, the  +
+repeats  can  match  different numbers of times.) When the remainder of
+the pattern is such that the entire match is going to fail, PCRE has in
+principle  to  try  every  possible  variation,  and  this  can take an
+extremely long time, even for relatively short strings.
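The figure of 16 can be checked by brute force. The C below is a hypothetical illustration with invented function names. For a string of n "a" characters, it counts every way the * repeat and the inner + repeats can divide up a matched prefix. The count works out to 2^n, so each extra character doubles the number of possibilities that a failing match may have to explore.

```c
#include <assert.h>

/* Number of ways a sequence of a+ iterations can consume exactly k
   letters: the first iteration takes 1..k letters and the rest are
   counted recursively (deliberately brute force). */
static unsigned long compositions(int k)
{
    assert(k >= 0);
    if (k == 0) return 1;          /* nothing left: no further iterations */
    unsigned long total = 0;
    for (int first = 1; first <= k; first++)
        total += compositions(k - first);
    return total;
}

/* Ways ^(a+)* can match a prefix of a string of n "a" characters. */
static unsigned long ways_to_match_prefix(int n)
{
    unsigned long total = 0;
    for (int k = 0; k <= n; k++)   /* k = length of the matched prefix */
        total += compositions(k);
    return total;
}
```

For n = 4 this gives 1 + 1 + 2 + 4 + 8 = 16, matching the text.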
An optimization catches some of the simpler cases such as
(a+)*b
where a literal character follows. Before embarking on the standard
matching procedure, PCRE checks that there is a "b" later in the
subject string, and if there is not, it fails the match immediately.
However, when there is no following literal this optimization cannot be
used. You can see the difference by comparing the behaviour of
(a+)*\d
with the pattern above. The pattern above gives a failure almost
instantly when applied to a whole line of "a" characters, whereas
(a+)*\d takes an appreciable time with strings longer than about 20
characters.
+In many cases, the solution to this kind of performance issue is to use
+an atomic group or a possessive quantifier.
+AUTHOR
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+REVISION
+Last updated: 06 March 2007
+Copyright (c) 1997-2007 University of Cambridge.
+------------------------------------------------------------------------------
+PCREPOSIX(3)                                                      PCREPOSIX(3)
+NAME
+PCRE - Perl-compatible regular expressions.
+SYNOPSIS OF POSIX API
+#include <pcreposix.h>
+int regcomp(regex_t *preg, const char *pattern,
+int cflags);
+int regexec(regex_t *preg, const char *string,
+size_t nmatch, regmatch_t pmatch[], int eflags);
+size_t regerror(int errcode, const regex_t *preg,
+char *errbuf, size_t errbuf_size);
+void regfree(regex_t *preg);
+DESCRIPTION
+This  set  of  functions provides a POSIX-style API to the PCRE regular
+expression package. See the pcreapi documentation for a description  of
+PCRE's native API, which contains much additional functionality.
+The functions described here are just wrapper functions that ultimately
+call  the  PCRE  native  API.  Their  prototypes  are  defined  in  the
+pcreposix.h  header  file,  and  on  Unix systems the library itself is
called pcreposix.a, so it can be accessed by adding -lpcreposix to the
+command  for  linking  an application that uses them. Because the POSIX
+functions call the native ones, it is also necessary to add -lpcre.
+I have implemented only those option bits that can be reasonably mapped
+to PCRE native options. In addition, the option REG_EXTENDED is defined
+with the value zero. This has no effect, but since  programs  that  are
+written  to  the  POSIX interface often use it, this makes it easier to
+slot in PCRE as a replacement library. Other POSIX options are not even
+defined.
+When  PCRE  is  called  via these functions, it is only the API that is
+POSIX-like in style. The syntax and semantics of  the  regular  expres-
+sions  themselves  are  still  those of Perl, subject to the setting of
+various PCRE options, as described below. "POSIX-like in  style"  means
+that  the  API  approximates  to  the POSIX definition; it is not fully
+POSIX-compatible, and in multi-byte encoding  domains  it  is  probably
+even less compatible.
The header for these functions is supplied as pcreposix.h to avoid any
potential clash with other POSIX libraries. It can, of course, be
renamed or aliased as regex.h, which is the "correct" name. It provides
two structure types, regex_t for compiled internal forms, and
regmatch_t for returning captured substrings. It also defines some
constants whose names start with "REG_"; these are used for setting
options and identifying error codes.
+COMPILING A PATTERN
+The  function regcomp() is called to compile a pattern into an internal
+form. The pattern is a C string terminated by a  binary  zero,  and  is
+passed  in  the  argument  pattern. The preg argument is a pointer to a
+regex_t structure that is used as a base for storing information  about
+the compiled regular expression.
+The argument cflags is either zero, or contains one or more of the bits
+defined by the following macros:
+REG_DOTALL
+The PCRE_DOTALL option is set when the regular expression is passed for
+compilation to the native function. Note that REG_DOTALL is not part of
+the POSIX standard.
+REG_ICASE
+The PCRE_CASELESS option is set when the regular expression  is  passed
+for compilation to the native function.
+REG_NEWLINE
+The  PCRE_MULTILINE option is set when the regular expression is passed
+for compilation to the native function. Note that this does  not  mimic
+the  defined  POSIX  behaviour  for REG_NEWLINE (see the following sec-
+tion).
+REG_NOSUB
+The PCRE_NO_AUTO_CAPTURE option is set when the regular  expression  is
+passed for compilation to the native function. In addition, when a pat-
+tern that is compiled with this flag is passed to regexec() for  match-
+ing,  the  nmatch  and  pmatch  arguments  are ignored, and no captured
+strings are returned.
+REG_UTF8
+The PCRE_UTF8 option is set when the regular expression is  passed  for
+compilation  to the native function. This causes the pattern itself and
+all data strings used for matching it to be treated as  UTF-8  strings.
+Note that REG_UTF8 is not part of the POSIX standard.
+In  the  absence  of  these  flags, no options are passed to the native
function. This means that the regex is compiled with PCRE default
+semantics.  In particular, the way it handles newline characters in the
+subject string is the Perl way, not the POSIX way.  Note  that  setting
+PCRE_MULTILINE  has only some of the effects specified for REG_NEWLINE.
+It does not affect the way newlines are matched by . (they  aren't)  or
+by a negative class such as [^a] (they are).
+The  yield of regcomp() is zero on success, and non-zero otherwise. The
+preg structure is filled in on success, and one member of the structure
+is  public: re_nsub contains the number of capturing subpatterns in the
+regular expression. Various error codes are defined in the header file.
+MATCHING NEWLINE CHARACTERS
+This area is not simple, because POSIX and Perl take different views of
+things.  It is not possible to get PCRE to obey  POSIX  semantics,  but
+then  PCRE was never intended to be a POSIX engine. The following table
+lists the different possibilities for matching  newline  characters  in
+PCRE:
                           Default   Change with
. matches newline          no        PCRE_DOTALL
newline matches [^a]       yes       not changeable
$ matches \n at end        yes       PCRE_DOLLAR_ENDONLY
$ matches \n in middle     no        PCRE_MULTILINE
^ matches \n in middle     no        PCRE_MULTILINE
+This is the equivalent table for POSIX:
                           Default   Change with
. matches newline          yes       REG_NEWLINE
newline matches [^a]       yes       REG_NEWLINE
$ matches \n at end        no        REG_NEWLINE
$ matches \n in middle     no        REG_NEWLINE
^ matches \n in middle     no        REG_NEWLINE
+PCRE's behaviour is the same as Perl's, except that there is no equiva-
+lent for PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl,  there  is
+no way to stop newline from matching [^a].
+The   default  POSIX  newline  handling  can  be  obtained  by  setting
+PCRE_DOTALL and PCRE_DOLLAR_ENDONLY, but there is no way to  make  PCRE
+behave exactly as for the REG_NEWLINE action.
+MATCHING A PATTERN
+The  function  regexec()  is  called  to  match a compiled pattern preg
+against a given string, which is by default terminated by a  zero  byte
+(but  see  REG_STARTEND below), subject to the options in eflags. These
+can be:
+REG_NOTBOL
+The PCRE_NOTBOL option is set when calling the underlying PCRE matching
+function.
+REG_NOTEOL
+The PCRE_NOTEOL option is set when calling the underlying PCRE matching
+function.
+REG_STARTEND
+The string is considered to start at string +  pmatch[0].rm_so  and  to
+have  a terminating NUL located at string + pmatch[0].rm_eo (there need
+not actually be a NUL at that location), regardless  of  the  value  of
+nmatch.  This  is a BSD extension, compatible with but not specified by
+IEEE Standard 1003.2 (POSIX.2), and should  be  used  with  caution  in
+software intended to be portable to other systems. Note that a non-zero
+rm_so does not imply REG_NOTBOL; REG_STARTEND affects only the location
+of the string, not how it is matched.
+If  the pattern was compiled with the REG_NOSUB flag, no data about any
+matched strings  is  returned.  The  nmatch  and  pmatch  arguments  of
+regexec() are ignored.
Otherwise, the portion of the string that was matched, and also any
captured substrings, are returned via the pmatch argument, which points
to an array of nmatch structures of type regmatch_t, containing the
members rm_so and rm_eo. These contain the offset to the first
character of each substring and the offset to the first character after
the end of each substring, respectively. The 0th element of the array
relates to the entire portion of string that was matched; subsequent
elements relate to the capturing subpatterns of the regular expression.
Unused entries in the array have both structure members set to -1.
+A  successful  match  yields  a  zero  return;  various error codes are
+defined in the header file, of  which  REG_NOMATCH  is  the  "expected"
+failure code.
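A minimal sketch of the compile, match, extract and free sequence described above follows. It is written against the standard POSIX <regex.h> so that it builds without PCRE. Including pcreposix.h and linking with -lpcreposix -lpcre instead should give the same result for this pattern, because REG_EXTENDED is accepted (as zero) and the pattern means the same thing in Perl syntax. split_pair() and copy_sub() are hypothetical names.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Copy the substring described by m out of text into dst (size dstsz),
   truncating if necessary. */
static void copy_sub(char *dst, size_t dstsz, const char *text, regmatch_t m)
{
    assert(dstsz > 0 && m.rm_so >= 0);
    size_t n = (size_t)(m.rm_eo - m.rm_so);
    if (n >= dstsz) n = dstsz - 1;
    memcpy(dst, text + m.rm_so, n);
    dst[n] = '\0';
}

/* Match "name:number" (for example "ruby:1234") and copy the two captured
   substrings out.  Returns 0 on a match, REG_NOMATCH if there is none, or
   the nonzero code from regcomp(). */
static int split_pair(const char *text, char *name, size_t namesz,
                      char *num, size_t numsz)
{
    regex_t re;
    regmatch_t pm[3];            /* [0] whole match, [1] and [2] the groups */
    int rc = regcomp(&re, "^([a-z]+):([0-9]+)$", REG_EXTENDED);
    if (rc != 0) {
        char msg[128];
        regerror(rc, &re, msg, sizeof msg);
        fprintf(stderr, "regcomp failed: %s\n", msg);
        return rc;
    }
    rc = regexec(&re, text, 3, pm, 0);
    if (rc == 0) {
        copy_sub(name, namesz, text, pm[1]);
        copy_sub(num, numsz, text, pm[2]);
    }
    regfree(&re);                /* release memory held by re */
    return rc;
}
```

Compiling once per call keeps the sketch short. A real program would normally compile the pattern once and reuse the regex_t for many regexec() calls.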
+ERROR MESSAGES
The regerror() function maps a non-zero error code from either
regcomp() or regexec() to a printable message. If preg is not NULL, the
error should have arisen from the use of that structure. A message
terminated by a binary zero is placed in errbuf. The length of the
message, including the zero, is limited to errbuf_size. The yield of
the function is the size of buffer needed to hold the whole message.
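Because regerror() returns the buffer size it needs, a caller can ask for the size first and then allocate exactly that much. The sketch below does this against the POSIX <regex.h> interface. error_message() is a hypothetical name. POSIX permits an errbuf_size of zero, in which case errbuf is ignored, and the caller must free the returned string.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/* Return a malloc'd copy of the complete message for error code rc, or
   NULL if memory runs out.  preg is the regex_t that produced the error. */
static char *error_message(int rc, const regex_t *preg)
{
    size_t need = regerror(rc, preg, NULL, 0);   /* size including the zero */
    assert(need > 0);
    char *msg = malloc(need);
    if (msg != NULL)
        regerror(rc, preg, msg, need);           /* now fits without truncation */
    return msg;
}
```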
+MEMORY USAGE
+Compiling a regular expression causes memory to be allocated and  asso-
+ciated  with  the preg structure. The function regfree() frees all such
+memory, after which preg may no longer be used as  a  compiled  expres-
+sion.
+AUTHOR
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+REVISION
+Last updated: 05 April 2008
+Copyright (c) 1997-2008 University of Cambridge.
+------------------------------------------------------------------------------
+PCRECPP(3)                                                          PCRECPP(3)
+NAME
+PCRE - Perl-compatible regular expressions
+SYNOPSIS OF C++ WRAPPER
+#include <pcrecpp.h>
+DESCRIPTION
+The  C++  wrapper  for PCRE was provided by Google Inc. Some additional
+functionality was added by Giuseppe Maxia. This brief man page was con-
+structed  from  the  notes  in the pcrecpp.h file, which should be con-
+sulted for further details.
+MATCHING INTERFACE
+The "FullMatch" operation checks that supplied text matches a  supplied
+pattern  exactly.  If pointer arguments are supplied, it copies matched
+sub-strings that match sub-patterns into them.
+Example: successful match
+pcrecpp::RE re("h.*o");
+re.FullMatch("hello");
+Example: unsuccessful match (requires full match):
+pcrecpp::RE re("e");
+!re.FullMatch("hello");
+Example: creating a temporary RE object:
+pcrecpp::RE("h.*o").FullMatch("hello");
+You can pass in a "const char*" or a "string" for "text". The  examples
+below  tend to use a const char*. You can, as in the different examples
+above, store the RE object explicitly in a variable or use a  temporary
+RE  object.  The  examples below use one mode or the other arbitrarily.
+Either could correctly be used for any of these examples.
+You must supply extra pointer arguments to extract matched subpieces.
+Example: extracts "ruby" into "s" and 1234 into "i"
+int i;
+string s;
+pcrecpp::RE re("(\\w+):(\\d+)");
+re.FullMatch("ruby:1234", &s, &i);
+Example: does not try to extract any extra sub-patterns
+re.FullMatch("ruby:1234", &s);
+Example: does not try to extract into NULL
+re.FullMatch("ruby:1234", NULL, &i);
+Example: integer overflow causes failure
+!re.FullMatch("ruby:1234567891234", NULL, &i);
+Example: fails because there aren't enough sub-patterns:
+!pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);
+Example: fails because string cannot be stored in integer
+!pcrecpp::RE("(.*)").FullMatch("ruby", &i);
+The provided pointer arguments can be pointers to  any  scalar  numeric
+type, or one of:
+string        (matched piece is copied to string)
+StringPiece   (StringPiece is mutated to point to matched piece)
+T             (where "bool T::ParseFrom(const char*, int)" exists)
+NULL          (the corresponding matched sub-pattern is not copied)
+The  function returns true iff all of the following conditions are sat-
+isfied:
+a. "text" matches "pattern" exactly;
+b. The number of matched sub-patterns is >= number of supplied
+pointers;
+c. The "i"th argument has a suitable type for holding the
+string captured as the "i"th sub-pattern. If you pass in
+void * NULL for the "i"th argument, or a non-void * NULL
+of the correct type, or pass fewer arguments than the
+number of sub-patterns, the "i"th captured sub-pattern is
+ignored.
+CAVEAT: An optional sub-pattern that does  not  exist  in  the  matched
+string  is  assigned  the  empty  string. Therefore, the following will
+return false (because the empty string is not a valid number):
+int number;
+pcrecpp::RE("[a-z]+(\\d+)?").FullMatch("abc", &number);
+The matching interface supports at most 16 arguments per call.  If  you
+need    more,    consider    using    the    more   general   interface
+pcrecpp::RE::DoMatch. See pcrecpp.h for the signature for DoMatch.
+QUOTING METACHARACTERS
+You can use the "QuoteMeta" operation to insert backslashes before  all
+potentially  meaningful  characters  in  a string. The returned string,
+used as a regular expression, will exactly match the original string.
+Example:
+string quoted = RE::QuoteMeta(unquoted);
+Note that it's legal to escape a character even if it  has  no  special
+meaning  in  a  regular expression -- so this function does that. (This
+also makes it identical to the perl function  of  the  same  name;  see
+"perldoc    -f    quotemeta".)    For   example,   "1.5-2.0?"   becomes
+"1\.5\-2\.0\?".
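To make the QuoteMeta rule concrete, here is a C sketch of the same idea: every ASCII character that is not a letter, digit or underscore gets a preceding backslash. The function name quote_meta() is hypothetical. Leaving bytes of 0x80 and above unescaped, so that UTF-8 sequences stay intact, is an assumption of this sketch and not a statement about pcrecpp's implementation. Embedded zero bytes cannot appear in a C string and are not handled.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical QuoteMeta analogue: returns a malloc()ed copy of `s` with
   a backslash before every ASCII non-word character (caller frees). */
char *quote_meta(const char *s)
{
    size_t n = strlen(s);
    char *out = malloc(2 * n + 1);     /* worst case: every byte escaped */
    if (out == NULL)
        return NULL;
    char *p = out;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (c < 0x80 && !isalnum(c) && c != '_')
            *p++ = '\\';               /* escaping a harmless char is legal */
        *p++ = (char)c;
    }
    *p = '\0';
    return out;
}
```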
+PARTIAL MATCHES
+You can use the "PartialMatch" operation when you want the  pattern  to
+match any substring of the text.
+Example: simple search for a string:
+pcrecpp::RE("ell").PartialMatch("hello");
+Example: find first number in a string:
+int number;
+pcrecpp::RE re("(\\d+)");
+re.PartialMatch("x*100 + 20", &number);
+assert(number == 100);
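The difference between FullMatch and PartialMatch can be reproduced with the POSIX interface covered earlier in this document. In the C sketch below, partial_match() is an ordinary unanchored regexec() call. full_match() wraps the pattern as ^(...)$ so that the match must cover the whole text. The function names, the system <regex.h> and POSIX extended syntax are all assumptions. This is only an analogy: POSIX uses leftmost-longest semantics rather than Perl's, and the sketch says nothing about how pcrecpp itself anchors patterns.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* 1 if the ERE `pattern` matches anywhere in `text` (cf. PartialMatch),
   0 if it does not, -1 on error. */
int partial_match(const char *pattern, const char *text)
{
    regex_t preg;
    if (regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&preg, text, 0, NULL, 0);   /* no offsets needed */
    regfree(&preg);
    return rc == 0 ? 1 : (rc == REG_NOMATCH ? 0 : -1);
}

/* 1 if the ERE `pattern` matches all of `text` (cf. FullMatch). The extra
   group added by the wrapping is harmless since nothing is extracted. */
int full_match(const char *pattern, const char *text)
{
    size_t n = strlen(pattern);
    char *anchored = malloc(n + 5);              /* "^(" + pat + ")$" + NUL */
    if (anchored == NULL)
        return -1;
    memcpy(anchored, "^(", 2);
    memcpy(anchored + 2, pattern, n);
    memcpy(anchored + 2 + n, ")$", 3);           /* also copies the NUL */
    int rc = partial_match(anchored, text);
    free(anchored);
    return rc;
}
```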
+UTF-8 AND THE MATCHING INTERFACE
+By  default,  pattern  and text are plain text, one byte per character.
+The UTF8 flag, passed to  the  constructor,  causes  both  pattern  and
+string to be treated as UTF-8 text, still a byte stream but potentially
+multiple bytes per character. In practice, the text is likelier  to  be
+UTF-8  than  the pattern, but the match returned may depend on the UTF8
+flag, so always use it when matching UTF8 text. For example,  "."  will
+match one byte normally but with UTF8 set may match up to four bytes
+of a multi-byte character.
+Example:
+pcrecpp::RE_Options options;
+options.set_utf8();
+pcrecpp::RE re(utf8_pattern, options);
+re.FullMatch(utf8_string);
+Example: using the convenience function UTF8():
+pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
+re.FullMatch(utf8_string);
+NOTE: The UTF8 flag is ignored if pcre was not configured with the
+--enable-utf8 flag.
+PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE
+PCRE defines some modifiers to  change  the  behavior  of  the  regular
+expression   engine.  The  C++  wrapper  defines  an  auxiliary  class,
+RE_Options, as a vehicle to pass such modifiers to  a  RE  class.  Cur-
+rently, the following modifiers are supported:
+modifier              description               Perl corresponding
+PCRE_CASELESS         case insensitive match      /i
+PCRE_MULTILINE        multiple lines match        /m
+PCRE_DOTALL           dot matches newlines        /s
+PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
+PCRE_EXTRA            strict escape parsing       N/A
+PCRE_EXTENDED         ignore whitespace           /x
+PCRE_UTF8             handles UTF8 chars          built-in
+PCRE_UNGREEDY         reverses * and *?           N/A
+PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)
+(*) Both Perl and PCRE allow non-capturing parentheses by means of the
+"?:" modifier within the pattern itself. For example, (?:ab|cd) does not
+capture, while (ab|cd) does.
+For a full account of how each modifier works, please check the PCRE
+API reference page.
+For each modifier, there are two member functions whose  name  is  made
+out  of  the  modifier  in  lowercase,  without the "PCRE_" prefix. For
+instance, PCRE_CASELESS is handled by
+bool caseless()
+which returns true if the modifier is set, and
+RE_Options & set_caseless(bool)
+which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can
+be accessed through the set_match_limit() and match_limit() member
+functions. Setting match_limit to a non-zero value will limit the
+execution of pcre to keep it from doing bad things like blowing the stack
+or taking an eternity to return a result. A value of 5000 is good
+enough to stop stack blowup in a 2MB thread stack. Setting match_limit
+to zero means that no per-call limit is passed, so PCRE's built-in
+default applies. Alternatively, you can use set_match_limit_recursion()
+and match_limit_recursion(), which use PCRE_EXTRA_MATCH_LIMIT_RECURSION
+to limit how deeply PCRE recurses. match_limit() limits the total number
+of times PCRE's internal match() function is called during a match;
+match_limit_recursion() limits the depth of internal recursion, and
+therefore the amount of stack that is used.
+Normally, to pass one or more modifiers to a RE class,  you  declare  a
+RE_Options object, set the appropriate options, and pass this object to
+a RE constructor. Example:
+RE_Options opt;
+opt.set_caseless(true);
+if (RE("HELLO", opt).PartialMatch("hello world")) ...
+RE_Options has two constructors. The default constructor takes no
+arguments and creates a set of flags that are off by default. The other
+takes an option_flags argument, to ease the transfer of legacy code
+from C programs. This lets you do
+RE(pattern,
+RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
+However, new code is better off doing
+RE(pattern,
+RE_Options().set_caseless(true).set_multiline(true))
+.PartialMatch(str);
+If you are going to pass one of the most commonly used modifiers, there
+are some convenience functions that return an RE_Options object with the
+appropriate modifier already set: CASELESS(), UTF8(), MULTILINE(),
+DOTALL(), and EXTENDED().
+If you need to set several options at once and don't want to declare a
+named RE_Options object and set each option in a separate statement,
+you can chain set_xxxxx() member functions on a temporary object, since
+each of them returns a reference to the object it was called on. For
+example, to pass PCRE_CASELESS, PCRE_EXTENDED, and PCRE_MULTILINE to an
+RE in one statement, you may write:
+RE(" ^ xyz \\s+ .* blah$",
+RE_Options()
+.set_caseless(true)
+.set_extended(true)
+.set_multiline(true)).PartialMatch(sometext);
+SCANNING TEXT INCREMENTALLY
+The "Consume" operation may be useful if you want to  repeatedly  match
+regular expressions at the front of a string and skip over them as they
+match. This requires use of the "StringPiece" type, which represents  a
+sub-range  of  a  real  string.  Like RE, StringPiece is defined in the
+pcrecpp namespace.
+Example: read lines of the form "var = value" from a string.
+string contents = ...;                 // Fill string somehow
+pcrecpp::StringPiece input(contents);  // Wrap in a StringPiece
+string var;
+int value;
+pcrecpp::RE re("(\\w+) = (\\d+)\n");
+while (re.Consume(&input, &var, &value)) {
+...;
+}
+Each successful call  to  "Consume"  will  set  "var/value",  and  also
+advance "input" so it points past the matched text.
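The essential mechanism of Consume is a pointer that is advanced past each match, with every match anchored at the current position. A rough C analogue using POSIX regexec() is sketched below. The ^ anchor works on each call because *input is treated as the start of a fresh string (no REG_NOTBOL). The function name, the use of [A-Za-z0-9_]+ in place of \w, and storing values as long are all assumptions of this sketch.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical Consume()-style loop: parse consecutive "name = number\n"
   lines from the front of *input, store up to `max` values, and leave
   *input pointing just past the last line consumed. Returns the number
   of lines consumed, or -1 if the pattern fails to compile. */
int consume_assignments(const char **input, long values[], int max)
{
    regex_t preg;
    regmatch_t m[3];                       /* whole match, name, number */
    int n = 0;

    if (regcomp(&preg, "^([A-Za-z0-9_]+) = ([0-9]+)\n", REG_EXTENDED) != 0)
        return -1;
    while (n < max && regexec(&preg, *input, 3, m, 0) == 0) {
        /* group 2 is all digits, so strtol() stops at the newline */
        values[n++] = strtol(*input + m[2].rm_so, NULL, 10);
        *input += m[0].rm_eo;              /* advance past the matched text */
    }
    regfree(&preg);
    return n;
}
```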
+The  "FindAndConsume"  operation  is  similar to "Consume" but does not
+anchor your match at the beginning of  the  string.  For  example,  you
+could extract all words from a string by repeatedly calling
+pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
+PARSING HEX/OCTAL/C-RADIX NUMBERS
+By default, if you pass a pointer to a numeric value, the corresponding
+text is interpreted as a base-10  number.  You  can  instead  wrap  the
+pointer with a call to one of the operators Hex(), Octal(), or CRadix()
+to interpret the text in another base. The CRadix  operator  interprets
+C-style  "0"  (base-8)  and  "0x"  (base-16)  prefixes, but defaults to
+base-10.
+Example:
+int a, b, c, d;
+pcrecpp::RE re("(.*) (.*) (.*) (.*)");
+re.FullMatch("100 40 0100 0x40",
+pcrecpp::Octal(&a), pcrecpp::Hex(&b),
+pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
+will leave 64 in a, b, c, and d.
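The bases that Octal(), Hex() and CRadix() select correspond closely to the base argument of the C library's strtol(): 8, 16, or 0. Base 0 honours the "0" and "0x" prefixes and otherwise uses base 10. The sketch below is a hypothetical C helper, not pcrecpp's parser. It rejects empty text, trailing junk and values outside int range, which mirrors the behaviour described in the CAVEAT above, where an empty optional sub-pattern cannot be stored in a number. Unlike a strict parser, strtol() also accepts leading white space and a sign.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse all of `text` as an int in `base`
   (8 ~ Octal, 16 ~ Hex, 0 ~ CRadix). Returns 1 and sets *out on
   success; 0 if the text is empty, has trailing junk, or overflows. */
int parse_int(const char *text, int base, int *out)
{
    char *end;
    errno = 0;
    long v = strtol(text, &end, base);
    if (end == text || *end != '\0' || errno == ERANGE ||
        v < INT_MIN || v > INT_MAX)
        return 0;
    *out = (int)v;
    return 1;
}
```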
+REPLACING PARTS OF STRINGS
+You can replace the first match of "pattern" in "str"  with  "rewrite".
+Within  "rewrite",  backslash-escaped  digits (\1 to \9) can be used to
+insert text matching corresponding parenthesized group  from  the  pat-
+tern. \0 in "rewrite" refers to the entire matching text. For example:
+string s = "yabba dabba doo";
+pcrecpp::RE("b+").Replace("d", &s);
+will  leave  "s" containing "yada dabba doo". The result is true if the
+pattern matches and a replacement occurs, false otherwise.
+GlobalReplace is like Replace except that it replaces  all  occurrences
+of  the  pattern  in  the string with the rewrite. Replacements are not
+subject to re-matching. For example:
+string s = "yabba dabba doo";
+pcrecpp::RE("b+").GlobalReplace("d", &s);
+will leave "s" containing "yada dada doo". It  returns  the  number  of
+replacements made.
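The "not subject to re-matching" rule means that scanning resumes after each match rather than at the start of the newly inserted text. The hypothetical C sketch below shows that loop with POSIX regexec(). After the first match it passes REG_NOTBOL, so ^ cannot match at a resumed position. It is deliberately simplified: the rewrite is inserted literally (no \1 handling), and after an empty match one character is copied so the loop always makes progress.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical GlobalReplace analogue. Returns a malloc()ed copy of
   `subject` with every match of the ERE `pattern` replaced by the literal
   `rewrite`, storing the number of replacements in *count; NULL on error. */
char *global_replace(const char *pattern, const char *rewrite,
                     const char *subject, int *count)
{
    regex_t preg;
    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return NULL;

    size_t slen = strlen(subject), rlen = strlen(rewrite);
    /* Worst case: a (possibly empty) match at each of slen+1 positions. */
    char *out = malloc(slen + (slen + 1) * rlen + 1);
    if (out == NULL) {
        regfree(&preg);
        return NULL;
    }

    const char *p = subject;
    size_t len = 0;
    int eflags = 0;
    regmatch_t m;
    *count = 0;
    while (regexec(&preg, p, 1, &m, eflags) == 0) {
        size_t so = (size_t)m.rm_so, eo = (size_t)m.rm_eo;
        memcpy(out + len, p, so);          /* text before the match */
        len += so;
        memcpy(out + len, rewrite, rlen);  /* the replacement */
        len += rlen;
        ++*count;
        if (eo == so) {                    /* empty match */
            if (p[eo] == '\0') {
                p += eo;
                break;
            }
            out[len++] = p[eo];            /* copy one char to progress */
            eo++;
        }
        p += eo;                           /* resume after the match */
        eflags = REG_NOTBOL;               /* p is no longer the start */
    }
    memcpy(out + len, p, strlen(p) + 1);   /* tail, including the NUL */
    regfree(&preg);
    return out;
}
```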
+Extract  is like Replace, except that if the pattern matches, "rewrite"
+is copied into "out" (an additional argument) with substitutions.   The
+non-matching  portions  of "text" are ignored. Returns true iff a match
+occurred and the extraction happened successfully;  if no match occurs,
+the string is left unaffected.
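The rewrite-string convention shared by Replace, GlobalReplace and Extract can be sketched in C on top of regexec()'s pmatch array. In that convention, \0 stands for the whole match and \1 to \9 for the numbered groups. The function below is a hypothetical analogue of Extract, not pcrecpp code. Three behaviours are choices made for this sketch only: an unset group inserts nothing, a reference to a group the pattern lacks makes the call fail, and any other backslash is copied literally.

```c
#include <ctype.h>
#include <regex.h>
#include <string.h>

/* Hypothetical Extract analogue: if the ERE `pattern` matches `text`,
   write `rewrite` into `out` (outsize bytes) with \0 replaced by the whole
   match and \1..\9 by the corresponding group. Returns 1 on success and 0
   otherwise; `out` is untouched when there is no match, but may hold
   partial output after other failures. */
int extract(const char *pattern, const char *rewrite, const char *text,
            char *out, size_t outsize)
{
    regex_t preg;
    regmatch_t m[10];                    /* \0 .. \9 */
    if (outsize == 0 || regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return 0;
    int rc = regexec(&preg, text, 10, m, 0);
    size_t ngroups = preg.re_nsub;       /* read before regfree() */
    regfree(&preg);
    if (rc != 0)
        return 0;                        /* no match: out left unaffected */

    size_t len = 0;
    for (const char *r = rewrite; *r != '\0'; r++) {
        const char *piece = r;           /* default: one literal character */
        size_t plen = 1;
        if (r[0] == '\\' && isdigit((unsigned char)r[1])) {
            size_t g = (size_t)(r[1] - '0');
            if (g > ngroups)
                return 0;                /* reference to a missing group */
            r++;                         /* consume the digit as well */
            if (m[g].rm_so == -1)
                continue;                /* unset group: insert nothing */
            piece = text + m[g].rm_so;
            plen = (size_t)(m[g].rm_eo - m[g].rm_so);
        }
        if (len + plen >= outsize)
            return 0;                    /* result would not fit */
        memcpy(out + len, piece, plen);
        len += plen;
    }
    out[len] = '\0';
    return 1;
}
```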
+AUTHOR
+The C++ wrapper was contributed by Google Inc.
+Copyright (c) 2007 Google Inc.
+REVISION
+Last updated: 12 November 2007
+------------------------------------------------------------------------------
+PCRESAMPLE(3)                                                    PCRESAMPLE(3)
+NAME
+PCRE - Perl-compatible regular expressions
+PCRE SAMPLE PROGRAM
+A simple, complete demonstration program, to get you started with using
+PCRE, is supplied in the file pcredemo.c in the PCRE distribution.
+The program compiles the regular expression that is its first argument,
+and  matches  it  against the subject string in its second argument. No
+PCRE options are set, and default character tables are used. If  match-
+ing  succeeds,  the  program  outputs  the  portion of the subject that
+matched, together with the contents of any captured substrings.
+If the -g option is given on the command line, the program then goes on
+to check for further matches of the same regular expression in the same
+subject string. The logic is a little bit tricky because of the  possi-
+bility  of  matching an empty string. Comments in the code explain what
+is going on.
+If PCRE is installed in the standard include  and  library  directories
+for  your  system, you should be able to compile the demonstration pro-
+gram using this command:
+gcc -o pcredemo pcredemo.c -lpcre
+If PCRE is installed elsewhere, you may need to add additional  options
+to  the  command line. For example, on a Unix-like system that has PCRE
+installed in /usr/local, you  can  compile  the  demonstration  program
+using a command like this:
+gcc -o pcredemo -I/usr/local/include pcredemo.c \
+-L/usr/local/lib -lpcre
+Once  you  have  compiled the demonstration program, you can run simple
+tests like this:
+./pcredemo 'cat|dog' 'the cat sat on the mat'
+./pcredemo -g 'cat|dog' 'the dog sat on the cat'
+Note that there is a  much  more  comprehensive  test  program,  called
+pcretest,  which  supports  many  more  facilities  for testing regular
+expressions and the PCRE library. The pcredemo program is provided as a
+simple coding example.
+On some operating systems (e.g. Solaris), when PCRE is not installed in
+the standard library directory, you may get an error like this when you
+try to run pcredemo:
+ld.so.1:  a.out:  fatal:  libpcre.so.0:  open failed: No such file or
+directory
+This is caused by the way shared library support works  on  those  sys-
+tems. You need to add
+-R/usr/local/lib
+(for example) to the compile command to get round this problem.
+AUTHOR
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+REVISION
+Last updated: 23 January 2008
+Copyright (c) 1997-2008 University of Cambridge.
+------------------------------------------------------------------------------
+PCRESTACK(3)                                                      PCRESTACK(3)
+NAME
+PCRE - Perl-compatible regular expressions
+PCRE DISCUSSION OF STACK USAGE
+When  you call pcre_exec(), it makes use of an internal function called
+match(). This calls itself recursively at branch points in the pattern,
+in  order to remember the state of the match so that it can back up and
+try a different alternative if the first one fails.  As  matching  pro-
+ceeds  deeper  and deeper into the tree of possibilities, the recursion
+depth increases.
+Not all calls of match() increase the recursion depth; for an item such
+as  a* it may be called several times at the same level, after matching
+different numbers of a's. Furthermore, in a number of cases  where  the
+result  of  the  recursive call would immediately be passed back as the
+result of the current call (a "tail recursion"), the function  is  just
+restarted instead.
+The pcre_dfa_exec() function operates in an entirely different way, and
+hardly uses recursion at all. The limit on its complexity is the amount
+of  workspace  it  is  given.  The comments that follow do NOT apply to
+pcre_dfa_exec(); they are relevant only for pcre_exec().
+You can set limits on the number of times that match() is called,  both
+in  total  and  recursively. If the limit is exceeded, an error occurs.
+For details, see the section on  extra  data  for  pcre_exec()  in  the
+pcreapi documentation.
+Each  time  that match() is actually called recursively, it uses memory
+from the process stack. For certain kinds of  pattern  and  data,  very
+large  amounts of stack may be needed, despite the recognition of "tail
+recursion".  You can often reduce the amount of recursion,  and  there-
+fore  the  amount of stack used, by modifying the pattern that is being
+matched. Consider, for example, this pattern:
+([^<]|<(?!inet))+
+It matches from wherever it starts until it encounters "<inet"  or  the
+end  of  the  data,  and is the kind of pattern that might be used when
+processing an XML file. Each iteration of the outer parentheses matches
+either  one  character that is not "<" or a "<" that is not followed by
+"inet". However, each time a  parenthesis  is  processed,  a  recursion
+occurs, so this formulation uses a stack frame for each matched charac-
+ter. For a long string, a lot of stack is required. Consider  now  this
+rewritten pattern, which matches exactly the same strings:
+([^<]++|<(?!inet))+
+This  uses very much less stack, because runs of characters that do not
+contain "<" are "swallowed" in one item inside the parentheses.  Recur-
+sion  happens  only when a "<" character that is not followed by "inet"
+is encountered (and we assume this is relatively  rare).  A  possessive
+quantifier  is  used  to stop any backtracking into the runs of non-"<"
+characters, but that is not related to stack usage.
+This example shows that one way of avoiding stack problems when  match-
+ing long subject strings is to write repeated parenthesized subpatterns
+to match more than one character whenever possible.
+Compiling PCRE to use heap instead of stack
+In environments where stack memory is constrained, you  might  want  to
+compile  PCRE to use heap memory instead of stack for remembering back-
+up points. This makes it run a lot more slowly, however. Details of how
+to do this are given in the pcrebuild documentation. When built in this
+way, instead of using the stack, PCRE obtains and frees memory by call-
+ing  the  functions  that  are  pointed to by the pcre_stack_malloc and
+pcre_stack_free variables. By default,  these  point  to  malloc()  and
+free(),  but you can replace the pointers to cause PCRE to use your own
+functions. Since the block sizes are always the same,  and  are  always
+freed in reverse order, it may be possible to implement customized mem-
+ory handlers that are more efficient than the standard functions.
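Because the blocks are freed in reverse order of allocation, a simple bump (LIFO) allocator is enough. The C11 sketch below is hypothetical. The 1 MB arena, the function names and the fallback to malloc() when the arena is full are all assumptions, and the code is not thread-safe. In a PCRE build configured to use the heap, you would point the pcre_stack_malloc and pcre_stack_free variables at these functions before matching. That wiring is left out so the example compiles without PCRE.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ARENA_BYTES (1024 * 1024)   /* assumed size of the private arena */

static _Alignas(max_align_t) unsigned char arena[ARENA_BYTES];
static size_t arena_top = 0;        /* offset of the first free byte */

/* Bump allocator: relies on blocks being freed strictly in reverse
   order of allocation. Not thread-safe. */
void *lifo_malloc(size_t size)
{
    size_t align = _Alignof(max_align_t);
    size_t rounded = (size + align - 1) / align * align;  /* keep alignment */
    if (rounded <= ARENA_BYTES - arena_top) {
        void *block = arena + arena_top;
        arena_top += rounded;
        return block;
    }
    return malloc(size);            /* arena exhausted: fall back to heap */
}

void lifo_free(void *block)
{
    uintptr_t b = (uintptr_t)block, base = (uintptr_t)arena;
    if (block != NULL && b >= base && b < base + ARENA_BYTES)
        arena_top = (size_t)(b - base);  /* pop the newest arena block */
    else
        free(block);                     /* from malloc(), or NULL */
}
```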
+Limiting PCRE's stack usage
+PCRE has an internal counter that can be used to  limit  the  depth  of
+recursion,  and  thus cause pcre_exec() to give an error code before it
+runs out of stack. By default, the limit is very  large,  and  unlikely
+ever  to operate. It can be changed when PCRE is built, and it can also
+be set when pcre_exec() is called. For details of these interfaces, see
+the pcrebuild and pcreapi documentation.
+As a very rough rule of thumb, you should reckon on about 500 bytes per
+recursion. Thus, if you want to limit your  stack  usage  to  8Mb,  you
+should  set  the  limit at 16000 recursions. A 64Mb stack, on the other
+hand, can support around 128000 recursions. The pcretest  test  program
+has a command line option (-S) that can be used to increase the size of
+its stack.
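The arithmetic behind these figures is simply the stack budget divided by the per-recursion cost. The round numbers above correspond to taking a megabyte as 1,000,000 bytes (8,000,000 / 500 = 16,000). A trivial C helper is shown below. Its name, its reserve parameter and the 500-byte constant are taken from the rule of thumb as assumptions, not measured values. To pass the result to pcre_exec(), you would set match_limit_recursion in a pcre_extra block and add PCRE_EXTRA_MATCH_LIMIT_RECURSION to its flags, as the pcreapi documentation describes. That part is not shown here.

```c
/* Rough rule of thumb from the text: about 500 bytes of stack per
   recursive call of match(). */
#define BYTES_PER_RECURSION 500UL

/* Hypothetical helper: suggested recursion limit for a stack budget,
   keeping `reserve_bytes` back for the rest of the program. Returns 0
   if the reserve uses up the whole budget. */
unsigned long recursion_limit_for(unsigned long stack_bytes,
                                  unsigned long reserve_bytes)
{
    if (stack_bytes <= reserve_bytes)
        return 0;
    return (stack_bytes - reserve_bytes) / BYTES_PER_RECURSION;
}
```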
+Changing stack size in Unix-like systems
+In Unix-like environments, there is not often a problem with the  stack
+unless  very  long  strings  are  involved, though the default limit on
+stack size varies from system to system. Values from 8Mb  to  64Mb  are
+common. You can find your default limit by running the command:
+ulimit -s
+Unfortunately,  the  effect  of  running out of stack is often SIGSEGV,
+though sometimes a more explicit error message is given. You  can  nor-
+mally increase the limit on stack size by code such as this:
+#include <sys/resource.h>
+struct rlimit rlim;
+getrlimit(RLIMIT_STACK, &rlim);
+rlim.rlim_cur = 100*1024*1024;
+setrlimit(RLIMIT_STACK, &rlim);
+This  reads  the current limits (soft and hard) using getrlimit(), then
+attempts to increase the soft limit to  100Mb  using  setrlimit().  You
+must do this before calling pcre_exec().
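A slightly more defensive version of that code first checks whether the soft limit is already large enough. It also never asks for more than the hard limit, which an unprivileged process cannot exceed (setrlimit() would fail). The function name and its return convention are hypothetical. On Linux, a raised soft limit generally lets the main thread's stack grow further, but it does not resize the stacks of threads that already exist.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft stack limit towards `wanted` bytes,
   capped at the hard limit. Returns the soft limit in force afterwards
   (possibly RLIM_INFINITY), or 0 if the limits cannot be read. */
rlim_t raise_stack_limit(rlim_t wanted)
{
    struct rlimit rlim;
    if (getrlimit(RLIMIT_STACK, &rlim) != 0)
        return 0;
    if (rlim.rlim_cur == RLIM_INFINITY || rlim.rlim_cur >= wanted)
        return rlim.rlim_cur;              /* already large enough */

    rlim.rlim_cur = wanted;
    if (rlim.rlim_max != RLIM_INFINITY && rlim.rlim_cur > rlim.rlim_max)
        rlim.rlim_cur = rlim.rlim_max;     /* cannot exceed the hard limit */
    (void)setrlimit(RLIMIT_STACK, &rlim);  /* may still fail; re-read below */

    if (getrlimit(RLIMIT_STACK, &rlim) != 0)
        return 0;
    return rlim.rlim_cur;
}
```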
+Changing stack size in Mac OS X
+Using setrlimit(), as described above, should also work on Mac OS X. It
+is also possible to set a stack size when linking a program. There is a
+discussion   about   stack  sizes  in  Mac  OS  X  at  this  web  site:
+http://developer.apple.com/qa/qa2005/qa1419.html.
+AUTHOR
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+REVISION
+Last updated: 09 July 2008
+Copyright (c) 1997-2008 University of Cambridge.
+------------------------------------------------------------------------------