diff -r 000000000000 -r 7f656887cf89 libraries/spcre/libpcre/pcre/doc/html/pcrecompat.html --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/libraries/spcre/libpcre/pcre/doc/html/pcrecompat.html Wed Jun 23 15:52:26 2010 +0100 @@ -0,0 +1,179 @@ + + +pcrecompat specification + + +

pcrecompat man page

+

+Return to the PCRE index page. +

+

+This page is part of the PCRE HTML documentation. It was generated automatically +from the original man page. If there is any nonsense in it, please consult the +man page, in case the conversion went wrong. +
+
+DIFFERENCES BETWEEN PCRE AND PERL +
+

+This document describes the differences in the ways that PCRE and Perl handle +regular expressions. The differences described here are mainly with respect to +Perl 5.8, though PCRE versions 7.0 and later contain some features that are +expected to be in the forthcoming Perl 5.10. +

+

+1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details of what +it does have are given in the +section on UTF-8 support +in the main +pcre +page. +

+

+2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl permits +them, but they do not mean what you might think. For example, (?!a){3} does +not assert that the next three characters are not "a". It just asserts that the +next character is not "a" three times. +

+

+3. Capturing subpatterns that occur inside negative lookahead assertions are +counted, but their entries in the offsets vector are never set. Perl sets its +numerical variables from any such patterns that are matched before the +assertion fails to match something (thereby succeeding), but only if the +negative lookahead assertion contains just one branch. +

+

+4. Though binary zero characters are supported in the subject string, they are +not allowed in a pattern string because it is passed as a normal C string, +terminated by zero. The escape sequence \0 can be used in the pattern to +represent a binary zero. +

+

+5. The following Perl escape sequences are not supported: \l, \u, \L, +\U, and \N. In fact these are implemented by Perl's general string-handling +and are not part of its pattern matching engine. If any of these are +encountered by PCRE, an error is generated. +

+

+6. The Perl escape sequences \p, \P, and \X are supported only if PCRE is +built with Unicode character property support. The properties that can be +tested with \p and \P are limited to the general category properties such as +Lu and Nd, script names such as Greek or Han, and the derived properties Any +and L&. +

+

+7. PCRE does support the \Q...\E escape for quoting substrings. Characters in +between are treated as literals. This is slightly different from Perl in that $ +and @ are also handled as literals inside the quotes. In Perl, they cause +variable interpolation (but of course PCRE does not have variables). Note the +following examples: +

+    Pattern            PCRE matches      Perl matches
+
+    \Qabc$xyz\E        abc$xyz           abc followed by the contents of $xyz
+    \Qabc\$xyz\E       abc\$xyz          abc\$xyz
+    \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz
+
+The \Q...\E sequence is recognized both inside and outside character classes. +

+

+8. Fairly obviously, PCRE does not support the (?{code}) and (??{code}) +constructions. However, there is support for recursive patterns. This is not +available in Perl 5.8, but will be in Perl 5.10. Also, the PCRE "callout" +feature allows an external function to be called during pattern matching. See +the +pcrecallout +documentation for details. +

+

+9. Subpatterns that are called recursively or as "subroutines" are always +treated as atomic groups in PCRE. This is like Python, but unlike Perl. +

+

+10. There are some differences that are concerned with the settings of captured +strings when part of a pattern is repeated. For example, matching "aba" against +the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b". +

+

+11. PCRE does support Perl 5.10's backtracking verbs (*ACCEPT), (*FAIL), (*F), +(*COMMIT), (*PRUNE), (*SKIP), and (*THEN), but only in the forms without an +argument. PCRE does not support (*MARK). If (*ACCEPT) is within capturing +parentheses, PCRE does not set that capture group; this is different to Perl. +

+

+12. PCRE provides some extensions to the Perl regular expression facilities. +Perl 5.10 will include new features that are not in earlier versions, some of +which (such as named parentheses) have been in PCRE for some time. This list is +with respect to Perl 5.10: +
+
+(a) Although lookbehind assertions must match fixed length strings, each +alternative branch of a lookbehind assertion can match a different length of +string. Perl requires them all to have the same length. +
+
+(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $ +meta-character matches only at the very end of the string. +
+
+(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special +meaning is faulted. Otherwise, like Perl, the backslash is quietly ignored. +(Perl can be made to issue a warning.) +
+
+(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is +inverted, that is, by default they are not greedy, but if followed by a +question mark they are. +
+
+(e) PCRE_ANCHORED can be used at matching time to force a pattern to be tried +only at the first matching position in the subject string. +
+
+(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAPTURE +options for pcre_exec() have no Perl equivalents. +
+
+(g) The \R escape sequence can be restricted to match only CR, LF, or CRLF +by the PCRE_BSR_ANYCRLF option. +
+
+(h) The callout facility is PCRE-specific. +
+
+(i) The partial matching facility is PCRE-specific. +
+
+(j) Patterns compiled by PCRE can be saved and re-used at a later time, even on +different hosts that have the other endianness. +
+
+(k) The alternative matching function (pcre_dfa_exec()) matches in a +different way and is not Perl-compatible. +
+
+(l) PCRE recognizes some special sequences such as (*CR) at the start of +a pattern that set overall options that cannot be changed within the pattern. +

+
+AUTHOR +
+

+Philip Hazel +
+University Computing Service +
+Cambridge CB2 3QH, England. +
+

+
+REVISION +
+

+Last updated: 11 September 2007 +
+Copyright © 1997-2007 University of Cambridge. +
+

+Return to the PCRE index page. +