diff -r 000000000000 -r 7f656887cf89 libraries/spcre/libpcre/pcre/doc/pcrecompat.3 --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/libraries/spcre/libpcre/pcre/doc/pcrecompat.3 Wed Jun 23 15:52:26 2010 +0100 @@ -0,0 +1,148 @@ +.TH PCRECOMPAT 3 +.SH NAME +PCRE - Perl-compatible regular expressions +.SH "DIFFERENCES BETWEEN PCRE AND PERL" +.rs +.sp +This document describes the differences in the ways that PCRE and Perl handle +regular expressions. The differences described here are mainly with respect to +Perl 5.8, though PCRE versions 7.0 and later contain some features that are +expected to be in the forthcoming Perl 5.10. +.P +1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details of what +it does have are given in the +.\" HTML +.\" +section on UTF-8 support +.\" +in the main +.\" HREF +\fBpcre\fP +.\" +page. +.P +2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl permits +them, but they do not mean what you might think. For example, (?!a){3} does +not assert that the next three characters are not "a". It just asserts that the +next character is not "a" three times. +.P +3. Capturing subpatterns that occur inside negative lookahead assertions are +counted, but their entries in the offsets vector are never set. Perl sets its +numerical variables from any such patterns that are matched before the +assertion fails to match something (thereby succeeding), but only if the +negative lookahead assertion contains just one branch. +.P +4. Though binary zero characters are supported in the subject string, they are +not allowed in a pattern string because it is passed as a normal C string, +terminated by zero. The escape sequence \e0 can be used in the pattern to +represent a binary zero. +.P +5. The following Perl escape sequences are not supported: \el, \eu, \eL, +\eU, and \eN. In fact these are implemented by Perl's general string-handling +and are not part of its pattern matching engine. If any of these are +encountered by PCRE, an error is generated. +.P +6. The Perl escape sequences \ep, \eP, and \eX are supported only if PCRE is +built with Unicode character property support. The properties that can be +tested with \ep and \eP are limited to the general category properties such as +Lu and Nd, script names such as Greek or Han, and the derived properties Any +and L&. +.P +7. PCRE does support the \eQ...\eE escape for quoting substrings. Characters in +between are treated as literals. This is slightly different from Perl in that $ +and @ are also handled as literals inside the quotes. In Perl, they cause +variable interpolation (but of course PCRE does not have variables). Note the +following examples: +.sp + Pattern PCRE matches Perl matches +.sp +.\" JOIN + \eQabc$xyz\eE abc$xyz abc followed by the + contents of $xyz + \eQabc\e$xyz\eE abc\e$xyz abc\e$xyz + \eQabc\eE\e$\eQxyz\eE abc$xyz abc$xyz +.sp +The \eQ...\eE sequence is recognized both inside and outside character classes. +.P +8. Fairly obviously, PCRE does not support the (?{code}) and (??{code}) +constructions. However, there is support for recursive patterns. This is not +available in Perl 5.8, but will be in Perl 5.10. Also, the PCRE "callout" +feature allows an external function to be called during pattern matching. See +the +.\" HREF +\fBpcrecallout\fP +.\" +documentation for details. +.P +9. Subpatterns that are called recursively or as "subroutines" are always +treated as atomic groups in PCRE. This is like Python, but unlike Perl. +.P +10. There are some differences that are concerned with the settings of captured +strings when part of a pattern is repeated. For example, matching "aba" against +the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b". +.P +11. PCRE does support Perl 5.10's backtracking verbs (*ACCEPT), (*FAIL), (*F), +(*COMMIT), (*PRUNE), (*SKIP), and (*THEN), but only in the forms without an +argument. PCRE does not support (*MARK). If (*ACCEPT) is within capturing +parentheses, PCRE does not set that capture group; this is different to Perl. +.P +12. PCRE provides some extensions to the Perl regular expression facilities. +Perl 5.10 will include new features that are not in earlier versions, some of +which (such as named parentheses) have been in PCRE for some time. This list is +with respect to Perl 5.10: +.sp +(a) Although lookbehind assertions must match fixed length strings, each +alternative branch of a lookbehind assertion can match a different length of +string. Perl requires them all to have the same length. +.sp +(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $ +meta-character matches only at the very end of the string. +.sp +(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special +meaning is faulted. Otherwise, like Perl, the backslash is quietly ignored. +(Perl can be made to issue a warning.) +.sp +(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is +inverted, that is, by default they are not greedy, but if followed by a +question mark they are. +.sp +(e) PCRE_ANCHORED can be used at matching time to force a pattern to be tried +only at the first matching position in the subject string. +.sp +(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAPTURE +options for \fBpcre_exec()\fP have no Perl equivalents. +.sp +(g) The \eR escape sequence can be restricted to match only CR, LF, or CRLF +by the PCRE_BSR_ANYCRLF option. +.sp +(h) The callout facility is PCRE-specific. +.sp +(i) The partial matching facility is PCRE-specific. +.sp +(j) Patterns compiled by PCRE can be saved and re-used at a later time, even on +different hosts that have the other endianness. +.sp +(k) The alternative matching function (\fBpcre_dfa_exec()\fP) matches in a +different way and is not Perl-compatible. +.sp +(l) PCRE recognizes some special sequences such as (*CR) at the start of +a pattern that set overall options that cannot be changed within the pattern. +. +. +.SH AUTHOR +.rs +.sp +.nf +Philip Hazel +University Computing Service +Cambridge CB2 3QH, England. +.fi +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 11 September 2007 +Copyright (c) 1997-2007 University of Cambridge. +.fi