--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/libraries/spcre/libpcre/pcre/doc/html/pcresyntax.html Wed Jun 23 15:52:26 2010 +0100
@@ -0,0 +1,457 @@
+<html>
+<head>
+<title>pcresyntax specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcresyntax man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a>
+<li><a name="TOC2" href="#SEC2">QUOTING</a>
+<li><a name="TOC3" href="#SEC3">CHARACTERS</a>
+<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
+<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a>
+<li><a name="TOC6" href="#SEC6">SCRIPT NAMES FOR \p AND \P</a>
+<li><a name="TOC7" href="#SEC7">CHARACTER CLASSES</a>
+<li><a name="TOC8" href="#SEC8">QUANTIFIERS</a>
+<li><a name="TOC9" href="#SEC9">ANCHORS AND SIMPLE ASSERTIONS</a>
+<li><a name="TOC10" href="#SEC10">MATCH POINT RESET</a>
+<li><a name="TOC11" href="#SEC11">ALTERNATION</a>
+<li><a name="TOC12" href="#SEC12">CAPTURING</a>
+<li><a name="TOC13" href="#SEC13">ATOMIC GROUPS</a>
+<li><a name="TOC14" href="#SEC14">COMMENT</a>
+<li><a name="TOC15" href="#SEC15">OPTION SETTING</a>
+<li><a name="TOC16" href="#SEC16">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
+<li><a name="TOC17" href="#SEC17">BACKREFERENCES</a>
+<li><a name="TOC18" href="#SEC18">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
+<li><a name="TOC19" href="#SEC19">CONDITIONAL PATTERNS</a>
+<li><a name="TOC20" href="#SEC20">BACKTRACKING CONTROL</a>
+<li><a name="TOC21" href="#SEC21">NEWLINE CONVENTIONS</a>
+<li><a name="TOC22" href="#SEC22">WHAT \R MATCHES</a>
+<li><a name="TOC23" href="#SEC23">CALLOUTS</a>
+<li><a name="TOC24" href="#SEC24">SEE ALSO</a>
+<li><a name="TOC25" href="#SEC25">AUTHOR</a>
+<li><a name="TOC26" href="#SEC26">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
+<P>
+The full syntax and semantics of the regular expressions that are supported by
+PCRE are described in the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+documentation. This document contains just a quick-reference summary of the
+syntax.
+</P>
+<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
+<P>
+<pre>
+ \x where x is non-alphanumeric is a literal x
+ \Q...\E treat enclosed characters as literal
+</PRE>
+</P>
+<br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
+<P>
+<pre>
+ \a alarm, that is, the BEL character (hex 07)
+ \cx "control-x", where x is any character
+ \e escape (hex 1B)
+ \f formfeed (hex 0C)
+ \n newline (hex 0A)
+ \r carriage return (hex 0D)
+ \t tab (hex 09)
+ \ddd character with octal code ddd, or backreference
+ \xhh character with hex code hh
+ \x{hhh..} character with hex code hhh..
+</PRE>
+</P>
+<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
+<P>
+<pre>
+ . any character except newline;
+ in dotall mode, any character whatsoever
+ \C one byte, even in UTF-8 mode (best avoided)
+ \d a decimal digit
+ \D a character that is not a decimal digit
+ \h a horizontal whitespace character
+ \H a character that is not a horizontal whitespace character
+ \p{<i>xx</i>} a character with the <i>xx</i> property
+ \P{<i>xx</i>} a character without the <i>xx</i> property
+ \R a newline sequence
+ \s a whitespace character
+ \S a character that is not a whitespace character
+ \v a vertical whitespace character
+ \V a character that is not a vertical whitespace character
+ \w a "word" character
+ \W a "non-word" character
+ \X an extended Unicode sequence
+</pre>
+In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters.
+</P>
+<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a><br>
+<P>
+<pre>
+ C Other
+ Cc Control
+ Cf Format
+ Cn Unassigned
+ Co Private use
+ Cs Surrogate
+
+ L Letter
+ Ll Lower case letter
+ Lm Modifier letter
+ Lo Other letter
+ Lt Title case letter
+ Lu Upper case letter
+ L& Ll, Lu, or Lt
+
+ M Mark
+ Mc Spacing mark
+ Me Enclosing mark
+ Mn Non-spacing mark
+
+ N Number
+ Nd Decimal number
+ Nl Letter number
+ No Other number
+
+ P Punctuation
+ Pc Connector punctuation
+ Pd Dash punctuation
+ Pe Close punctuation
+ Pf Final punctuation
+ Pi Initial punctuation
+ Po Other punctuation
+ Ps Open punctuation
+
+ S Symbol
+ Sc Currency symbol
+ Sk Modifier symbol
+ Sm Mathematical symbol
+ So Other symbol
+
+ Z Separator
+ Zl Line separator
+ Zp Paragraph separator
+ Zs Space separator
+</PRE>
+</P>
+<br><a name="SEC6" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
+<P>
+Arabic,
+Armenian,
+Balinese,
+Bengali,
+Bopomofo,
+Braille,
+Buginese,
+Buhid,
+Canadian_Aboriginal,
+Cherokee,
+Common,
+Coptic,
+Cuneiform,
+Cypriot,
+Cyrillic,
+Deseret,
+Devanagari,
+Ethiopic,
+Georgian,
+Glagolitic,
+Gothic,
+Greek,
+Gujarati,
+Gurmukhi,
+Han,
+Hangul,
+Hanunoo,
+Hebrew,
+Hiragana,
+Inherited,
+Kannada,
+Katakana,
+Kharoshthi,
+Khmer,
+Lao,
+Latin,
+Limbu,
+Linear_B,
+Malayalam,
+Mongolian,
+Myanmar,
+New_Tai_Lue,
+Nko,
+Ogham,
+Old_Italic,
+Old_Persian,
+Oriya,
+Osmanya,
+Phags_Pa,
+Phoenician,
+Runic,
+Shavian,
+Sinhala,
+Syloti_Nagri,
+Syriac,
+Tagalog,
+Tagbanwa,
+Tai_Le,
+Tamil,
+Telugu,
+Thaana,
+Thai,
+Tibetan,
+Tifinagh,
+Ugaritic,
+Yi.
+</P>
+<br><a name="SEC7" href="#TOC1">CHARACTER CLASSES</a><br>
+<P>
+<pre>
+ [...] positive character class
+ [^...] negative character class
+ [x-y] range (can be used for hex characters)
+ [[:xxx:]] positive POSIX named set
+ [[:^xxx:]] negative POSIX named set
+
+ alnum alphanumeric
+ alpha alphabetic
+ ascii 0-127
+ blank space or tab
+ cntrl control character
+ digit decimal digit
+ graph printing, excluding space
+ lower lower case letter
+ print printing, including space
+ punct printing, excluding alphanumeric
+ space whitespace
+ upper upper case letter
+ word same as \w
+ xdigit hexadecimal digit
+</pre>
+In PCRE, POSIX character set names recognize only ASCII characters. You can use
+\Q...\E inside a character class.
+</P>
+<br><a name="SEC8" href="#TOC1">QUANTIFIERS</a><br>
+<P>
+<pre>
+ ? 0 or 1, greedy
+ ?+ 0 or 1, possessive
+ ?? 0 or 1, lazy
+ * 0 or more, greedy
+ *+ 0 or more, possessive
+ *? 0 or more, lazy
+ + 1 or more, greedy
+ ++ 1 or more, possessive
+ +? 1 or more, lazy
+ {n} exactly n
+ {n,m} at least n, no more than m, greedy
+ {n,m}+ at least n, no more than m, possessive
+ {n,m}? at least n, no more than m, lazy
+ {n,} n or more, greedy
+ {n,}+ n or more, possessive
+ {n,}? n or more, lazy
+</PRE>
+</P>
+<br><a name="SEC9" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
+<P>
+<pre>
+ \b word boundary
+ \B not a word boundary
+ ^ start of subject
+ also after internal newline in multiline mode
+ \A start of subject
+ $ end of subject
+ also before newline at end of subject
+ also before internal newline in multiline mode
+ \Z end of subject
+ also before newline at end of subject
+ \z end of subject
+ \G first matching position in subject
+</PRE>
+</P>
+<br><a name="SEC10" href="#TOC1">MATCH POINT RESET</a><br>
+<P>
+<pre>
+ \K reset start of match
+</PRE>
+</P>
+<br><a name="SEC11" href="#TOC1">ALTERNATION</a><br>
+<P>
+<pre>
+ expr|expr|expr...
+</PRE>
+</P>
+<br><a name="SEC12" href="#TOC1">CAPTURING</a><br>
+<P>
+<pre>
+ (...) capturing group
+ (?<name>...) named capturing group (Perl)
+ (?'name'...) named capturing group (Perl)
+ (?P<name>...) named capturing group (Python)
+ (?:...) non-capturing group
+ (?|...) non-capturing group; reset group numbers for
+ capturing groups in each alternative
+</PRE>
+</P>
+<br><a name="SEC13" href="#TOC1">ATOMIC GROUPS</a><br>
+<P>
+<pre>
+ (?>...) atomic, non-capturing group
+</PRE>
+</P>
+<br><a name="SEC14" href="#TOC1">COMMENT</a><br>
+<P>
+<pre>
+ (?#....) comment (not nestable)
+</PRE>
+</P>
+<br><a name="SEC15" href="#TOC1">OPTION SETTING</a><br>
+<P>
+<pre>
+ (?i) caseless
+ (?J) allow duplicate names
+ (?m) multiline
+ (?s) single line (dotall)
+ (?U) default ungreedy (lazy)
+ (?x) extended (ignore white space)
+ (?-...) unset option(s)
+</PRE>
+</P>
+<br><a name="SEC16" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
+<P>
+<pre>
+ (?=...) positive look ahead
+ (?!...) negative look ahead
+ (?<=...) positive look behind
+ (?<!...) negative look behind
+</pre>
+Each top-level branch of a look behind must be of a fixed length.
+</P>
+<br><a name="SEC17" href="#TOC1">BACKREFERENCES</a><br>
+<P>
+<pre>
+ \n reference by number (can be ambiguous)
+ \gn reference by number
+ \g{n} reference by number
+ \g{-n} relative reference by number
+ \k<name> reference by name (Perl)
+ \k'name' reference by name (Perl)
+ \g{name} reference by name (Perl)
+ \k{name} reference by name (.NET)
+ (?P=name) reference by name (Python)
+</PRE>
+</P>
+<br><a name="SEC18" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
+<P>
+<pre>
+ (?R) recurse whole pattern
+ (?n) call subpattern by absolute number
+ (?+n) call subpattern by relative number
+ (?-n) call subpattern by relative number
+ (?&name) call subpattern by name (Perl)
+ (?P>name) call subpattern by name (Python)
+ \g<name> call subpattern by name (Oniguruma)
+ \g'name' call subpattern by name (Oniguruma)
+ \g<n> call subpattern by absolute number (Oniguruma)
+ \g'n' call subpattern by absolute number (Oniguruma)
+ \g<+n> call subpattern by relative number (PCRE extension)
+ \g'+n' call subpattern by relative number (PCRE extension)
+ \g<-n> call subpattern by relative number (PCRE extension)
+ \g'-n' call subpattern by relative number (PCRE extension)
+</PRE>
+</P>
+<br><a name="SEC19" href="#TOC1">CONDITIONAL PATTERNS</a><br>
+<P>
+<pre>
+ (?(condition)yes-pattern)
+ (?(condition)yes-pattern|no-pattern)
+
+ (?(n)... absolute reference condition
+ (?(+n)... relative reference condition
+ (?(-n)... relative reference condition
+ (?(<name>)... named reference condition (Perl)
+ (?('name')... named reference condition (Perl)
+ (?(name)... named reference condition (PCRE)
+ (?(R)... overall recursion condition
+ (?(Rn)... specific group recursion condition
+ (?(R&name)... specific recursion condition
+ (?(DEFINE)... define subpattern for reference
+ (?(assert)... assertion condition
+</PRE>
+</P>
+<br><a name="SEC20" href="#TOC1">BACKTRACKING CONTROL</a><br>
+<P>
+The following act immediately they are reached:
+<pre>
+ (*ACCEPT) force successful match
+ (*FAIL) force backtrack; synonym (*F)
+</pre>
+The following act only when a subsequent match failure causes a backtrack to
+reach them. They all force a match failure, but they differ in what happens
+afterwards. Those that advance the start-of-match point do so only if the
+pattern is not anchored.
+<pre>
+ (*COMMIT) overall failure, no advance of starting point
+ (*PRUNE) advance to next starting character
+ (*SKIP) advance start to current matching position
+ (*THEN) local failure, backtrack to next alternation
+</PRE>
+</P>
+<br><a name="SEC21" href="#TOC1">NEWLINE CONVENTIONS</a><br>
+<P>
+These are recognized only at the very start of the pattern or after a
+(*BSR_...) option.
+<pre>
+ (*CR)
+ (*LF)
+ (*CRLF)
+ (*ANYCRLF)
+ (*ANY)
+</PRE>
+</P>
+<br><a name="SEC22" href="#TOC1">WHAT \R MATCHES</a><br>
+<P>
+These are recognized only at the very start of the pattern or after a
+(*...) option that sets the newline convention.
+<pre>
+ (*BSR_ANYCRLF)
+ (*BSR_UNICODE)
+</PRE>
+</P>
+<br><a name="SEC23" href="#TOC1">CALLOUTS</a><br>
+<P>
+<pre>
+ (?C) callout
+ (?Cn) callout with data n
+</PRE>
+</P>
+<br><a name="SEC24" href="#TOC1">SEE ALSO</a><br>
+<P>
+<b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
+<b>pcrematching</b>(3), <b>pcre</b>(3).
+</P>
+<br><a name="SEC25" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC26" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 09 April 2008
+<br>
+Copyright © 1997-2008 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>