libraries/spcre/libpcre/pcre/doc/html/pcrepartial.html
       
     1 <html>
       
     2 <head>
       
     3 <title>pcrepartial specification</title>
       
     4 </head>
       
     5 <body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
       
     6 <h1>pcrepartial man page</h1>
       
     7 <p>
       
     8 Return to the <a href="index.html">PCRE index page</a>.
       
     9 </p>
       
    10 <p>
       
    11 This page is part of the PCRE HTML documentation. It was generated automatically
       
    12 from the original man page. If there is any nonsense in it, please consult the
       
    13 man page, in case the conversion went wrong.
       
    14 <br>
       
    15 <ul>
       
    16 <li><a name="TOC1" href="#SEC1">PARTIAL MATCHING IN PCRE</a>
       
    17 <li><a name="TOC2" href="#SEC2">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a>
       
    18 <li><a name="TOC3" href="#SEC3">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a>
       
    19 <li><a name="TOC4" href="#SEC4">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a>
       
    20 <li><a name="TOC5" href="#SEC5">AUTHOR</a>
       
    21 <li><a name="TOC6" href="#SEC6">REVISION</a>
       
    22 </ul>
       
    23 <br><a name="SEC1" href="#TOC1">PARTIAL MATCHING IN PCRE</a><br>
       
    24 <P>
       
    25 In normal use of PCRE, if the subject string that is passed to
       
    26 <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> matches as far as it goes, but is
       
    27 too short to match the entire pattern, PCRE_ERROR_NOMATCH is returned. There
       
    28 are circumstances where it might be helpful to distinguish this case from other
       
    29 cases in which there is no match.
       
    30 </P>
       
    31 <P>
       
    32 Consider, for example, an application where a human is required to type in data
       
    33 for a field with specific formatting requirements. An example might be a date
       
    34 in the form <i>ddmmmyy</i>, defined by this pattern:
       
    35 <pre>
       
    36   ^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$
       
    37 </pre>
       
    38 If the application sees the user's keystrokes one by one, and can check that
       
    39 what has been typed so far is potentially valid, it is able to raise an error
       
    40 as soon as a mistake is made, possibly beeping and not reflecting the
       
    41 character that has been typed. This immediate feedback is likely to be a better
       
    42 user interface than a check that is delayed until the entire string has been
       
    43 entered.
       
    44 </P>
       
    45 <P>
       
    46 PCRE supports the concept of partial matching by means of the PCRE_PARTIAL
       
    47 option, which can be set when calling <b>pcre_exec()</b> or
       
    48 <b>pcre_dfa_exec()</b>. When this flag is set for <b>pcre_exec()</b>, the return
       
    49 code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any time
       
    50 during the matching process the last part of the subject string matched part of
       
    51 the pattern. Unfortunately, for non-anchored matching, it is not possible to
       
    52 obtain the position of the start of the partial match. No captured data is set
       
    53 when PCRE_ERROR_PARTIAL is returned.
       
    54 </P>
       
    55 <P>
       
    56 When PCRE_PARTIAL is set for <b>pcre_dfa_exec()</b>, the return code
       
    57 PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of the
       
    58 subject is reached, there have been no complete matches, but there is still at
       
    59 least one matching possibility. The portion of the string that provided the
       
    60 partial match is set as the first matching string.
       
    61 </P>
       
    62 <P>
       
    63 Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers the
       
    64 last literal byte in a pattern, and abandons matching immediately if such a
       
    65 byte is not present in the subject string. This optimization cannot be used
       
    66 for a subject string that might match only partially.
       
    67 </P>
       
    68 <br><a name="SEC2" href="#TOC1">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a><br>
       
    69 <P>
       
    70 Because of the way certain internal optimizations are implemented in the
       
    71 <b>pcre_exec()</b> function, the PCRE_PARTIAL option cannot be used with all
       
    72 patterns. These restrictions do not apply when <b>pcre_dfa_exec()</b> is used.
       
    73 For <b>pcre_exec()</b>, repeated single characters such as
       
    74 <pre>
       
    75   a{2,4}
       
    76 </pre>
       
    77 and repeated single metasequences such as
       
    78 <pre>
       
    79   \d+
       
    80 </pre>
       
    81 are not permitted if the maximum number of occurrences is greater than one.
       
    82 Optional items such as \d? (where the maximum is one) are permitted.
       
    83 Quantifiers with any values are permitted after parentheses, so the invalid
       
    84 examples above can be coded thus:
       
    85 <pre>
       
    86   (a){2,4}
       
    87   (\d)+
       
    88 </pre>
       
     89 These constructions run more slowly, but for the kinds of application
       
     90 envisaged for this facility the extra cost is unlikely to matter.
       
    91 </P>
       
    92 <P>
       
    93 If PCRE_PARTIAL is set for a pattern that does not conform to the restrictions,
       
    94 <b>pcre_exec()</b> returns the error code PCRE_ERROR_BADPARTIAL (-13).
       
    95 You can use the PCRE_INFO_OKPARTIAL call to <b>pcre_fullinfo()</b> to find out
       
    96 if a compiled pattern can be used for partial matching.
       
    97 </P>
       
    98 <br><a name="SEC3" href="#TOC1">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a><br>
       
    99 <P>
       
   100 If the escape sequence \P is present in a <b>pcretest</b> data line, the
       
   101 PCRE_PARTIAL flag is used for the match. Here is a run of <b>pcretest</b> that
       
   102 uses the date example quoted above:
       
   103 <pre>
       
   104     re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
       
   105   data&#62; 25jun04\P
       
   106    0: 25jun04
       
   107    1: jun
       
   108   data&#62; 25dec3\P
       
   109   Partial match
       
   110   data&#62; 3ju\P
       
   111   Partial match
       
   112   data&#62; 3juj\P
       
   113   No match
       
   114   data&#62; j\P
       
   115   No match
       
   116 </pre>
       
   117 The first data string is matched completely, so <b>pcretest</b> shows the
       
   118 matched substrings. The remaining four strings do not match the complete
       
    119 pattern, but the first two are partial matches. Much the same test (the
       
    120 second data line is now 23dec3), using <b>pcre_dfa_exec()</b> matching by
       
    121 means of the \D escape sequence, produces the following output:
       
   122 <pre>
       
   123     re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
       
   124   data&#62; 25jun04\P\D
       
   125    0: 25jun04
       
   126   data&#62; 23dec3\P\D
       
   127   Partial match: 23dec3
       
   128   data&#62; 3ju\P\D
       
   129   Partial match: 3ju
       
   130   data&#62; 3juj\P\D
       
   131   No match
       
   132   data&#62; j\P\D
       
   133   No match
       
   134 </pre>
       
   135 Notice that in this case the portion of the string that was matched is made
       
   136 available.
       
   137 </P>
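<P>
To make the three outcomes in the transcripts above concrete, here is a minimal
C sketch of a hand-written checker for the same <i>ddmmmyy</i> format. It does
not call PCRE at all: the names <b>check_date()</b>, <b>try_layout()</b> and the
DATE_* values are hypothetical, chosen only for this illustration, and treating
an empty input as a partial match is an assumption of the sketch.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical verdicts, loosely mirroring a full match,
   PCRE_ERROR_PARTIAL and PCRE_ERROR_NOMATCH. The order matters:
   check_date() returns the larger of two verdicts. */
enum verdict { DATE_NOMATCH, DATE_PARTIAL, DATE_COMPLETE };

/* Read the input as <ndigits day digits><month name><two year digits>. */
static enum verdict try_layout(const char *s, size_t len, size_t ndigits)
{
    static const char *const months[] = {
        "jan", "feb", "mar", "apr", "may", "jun",
        "jul", "aug", "sep", "oct", "nov", "dec"
    };
    size_t i, m, k, rest, year;

    /* Day digits: running out of input here leaves the entry incomplete. */
    for (i = 0; i < ndigits; i++) {
        if (i == len) return DATE_PARTIAL;
        if (s[i] < '0' || s[i] > '9') return DATE_NOMATCH;
    }
    if (i == len) return DATE_PARTIAL;

    rest = len - i;                        /* characters left for month and year */
    for (m = 0; m < 12; m++) {
        size_t n = rest < 3 ? rest : 3;
        if (memcmp(s + i, months[m], n) != 0) continue;
        if (rest < 3) return DATE_PARTIAL; /* only a prefix of a month name */

        year = rest - 3;                   /* characters after the month name */
        if (year > 2) return DATE_NOMATCH;
        for (k = 0; k < year; k++) {
            char c = s[i + 3 + k];
            if (c < '0' || c > '9') return DATE_NOMATCH;
        }
        return year == 2 ? DATE_COMPLETE : DATE_PARTIAL;
    }
    return DATE_NOMATCH;
}

/* The day may have one or two digits (\d?\d), so try both layouts. */
enum verdict check_date(const char *s, size_t len)
{
    enum verdict one, two;

    assert(s != NULL);
    one = try_layout(s, len, 1);
    two = try_layout(s, len, 2);
    return one > two ? one : two;
}
```

<P>
With this sketch, "25jun04" gives DATE_COMPLETE, "25dec3" and "3ju" give
DATE_PARTIAL, and "3juj" and "j" give DATE_NOMATCH, in line with the
transcripts.
</P>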
       
   138 <br><a name="SEC4" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a><br>
       
   139 <P>
       
   140 When a partial match has been found using <b>pcre_dfa_exec()</b>, it is possible
       
   141 to continue the match by providing additional subject data and calling
       
   142 <b>pcre_dfa_exec()</b> again with the same compiled regular expression, this
       
   143 time setting the PCRE_DFA_RESTART option. You must also pass the same working
       
   144 space as before, because this is where details of the previous partial match
       
   145 are stored. Here is an example using <b>pcretest</b>, using the \R escape
       
   146 sequence to set the PCRE_DFA_RESTART option (\P and \D are as above):
       
   147 <pre>
       
   148     re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
       
   149   data&#62; 23ja\P\D
       
   150   Partial match: 23ja
       
   151   data&#62; n05\R\D
       
   152    0: n05
       
   153 </pre>
       
   154 The first call has "23ja" as the subject, and requests partial matching; the
       
   155 second call has "n05" as the subject for the continued (restarted) match.
       
   156 Notice that when the match is complete, only the last part is shown; PCRE does
       
   157 not retain the previously partially-matched string. It is up to the calling
       
   158 program to do that if it needs to.
       
   159 </P>
       
   160 <P>
       
   161 You can set PCRE_PARTIAL with PCRE_DFA_RESTART to continue partial matching
       
   162 over multiple segments. This facility can be used to pass very long subject
       
   163 strings to <b>pcre_dfa_exec()</b>. However, some care is needed for certain
       
   164 types of pattern.
       
   165 </P>
       
   166 <P>
       
   167 1. If the pattern contains tests for the beginning or end of a line, you need
       
   168 to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropriate, when the
       
   169 subject string for any call does not contain the beginning or end of a line.
       
   170 </P>
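<P>
Point 1 can be sketched as a small helper that chooses the options for each
call. So that it compiles without pcre.h, the sketch uses stand-in OPT_* flags
in place of the real PCRE_* constants. The helper <b>segment_options()</b> and
its parameters are hypothetical, and the caller is assumed to know whether a
segment begins or ends a line.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in flags so that this sketch compiles without pcre.h. Real code
   would include pcre.h and use the PCRE_* constants named alongside. */
enum {
    OPT_NOTBOL  = 1 << 0,   /* stands in for PCRE_NOTBOL      */
    OPT_NOTEOL  = 1 << 1,   /* stands in for PCRE_NOTEOL      */
    OPT_PARTIAL = 1 << 2,   /* stands in for PCRE_PARTIAL     */
    OPT_RESTART = 1 << 3    /* stands in for PCRE_DFA_RESTART */
};

/* Choose the options for segment number `index` out of `count` segments.
   at_line_start / at_line_end say whether the segment's first byte begins
   a line and whether its last byte ends one. The restart flag assumes that
   the previous call returned a partial match. */
int segment_options(size_t index, size_t count,
                    int at_line_start, int at_line_end)
{
    int options = 0;

    assert(index < count);
    if (index > 0)         options |= OPT_RESTART; /* continue the earlier match */
    if (index + 1 < count) options |= OPT_PARTIAL; /* more data is still to come */
    if (!at_line_start)    options |= OPT_NOTBOL;  /* ^ must not match at the start */
    if (!at_line_end)      options |= OPT_NOTEOL;  /* $ must not match at the end */
    return options;
}
```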
       
   171 <P>
       
   172 2. If the pattern contains backward assertions (including \b or \B), you need
       
   173 to arrange for some overlap in the subject strings to allow for this. For
       
   174 example, you could pass the subject in chunks that are 500 bytes long, but in
       
   175 a buffer of 700 bytes, with the starting offset set to 200 and the previous 200
       
   176 bytes at the start of the buffer.
       
   177 </P>
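<P>
The buffer arrangement in point 2 might look like the following C sketch. It
only manages the buffer and makes no PCRE calls. The type <b>seg_buffer</b> and
its functions are hypothetical names, and the sizes are the ones from the
example above.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OVERLAP 200   /* bytes of earlier subject kept for backward assertions */
#define CHUNK   500   /* maximum number of new bytes per call */

struct seg_buffer {
    char   data[OVERLAP + CHUNK]; /* the 700-byte buffer */
    size_t length;                /* bytes currently in data */
    size_t start_offset;          /* where matching should start */
};

void seg_buffer_init(struct seg_buffer *b)
{
    assert(b != NULL);
    b->length = 0;
    b->start_offset = 0;
}

/* Keep the tail of what is already in the buffer (at most OVERLAP bytes),
   move it to the front, and append the new chunk after it. */
void seg_buffer_load(struct seg_buffer *b, const char *chunk, size_t chunk_len)
{
    size_t keep;

    assert(b != NULL && chunk != NULL && chunk_len <= CHUNK);
    keep = b->length < OVERLAP ? b->length : OVERLAP;
    memmove(b->data, b->data + (b->length - keep), keep);
    memcpy(b->data + keep, chunk, chunk_len);
    b->start_offset = keep;          /* 0 for the first chunk, 200 afterwards */
    b->length = keep + chunk_len;
}
```

<P>
After each load, <b>data</b>, <b>length</b> and <b>start_offset</b> would be
passed as the subject, length and starting offset arguments of
<b>pcre_dfa_exec()</b>.
</P>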
       
   178 <P>
       
   179 3. Matching a subject string that is split into multiple segments does not
       
   180 always produce exactly the same result as matching over one single long string.
       
   181 The difference arises when there are multiple matching possibilities, because a
       
   182 partial match result is given only when there are no completed matches in a
       
   183 call to <b>pcre_dfa_exec()</b>. This means that as soon as the shortest match has
       
   184 been found, continuation to a new subject segment is no longer possible.
       
   185 Consider this <b>pcretest</b> example:
       
   186 <pre>
       
   187     re&#62; /dog(sbody)?/
       
   188   data&#62; do\P\D
       
   189   Partial match: do
       
   190   data&#62; gsb\R\P\D
       
   191    0: g
       
   192   data&#62; dogsbody\D
       
   193    0: dogsbody
       
   194    1: dog
       
   195 </pre>
       
   196 The pattern matches the words "dog" or "dogsbody". When the subject is
       
   197 presented in several parts ("do" and "gsb" being the first two) the match stops
       
   198 when "dog" has been found, and it is not possible to continue. On the other
       
   199 hand, if "dogsbody" is presented as a single string, both matches are found.
       
   200 </P>
       
   201 <P>
       
   202 Because of this phenomenon, it does not usually make sense to end a pattern
       
   203 that is going to be matched in this way with a variable repeat.
       
   204 </P>
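<P>
The rule behind point 3 can be illustrated with a toy model in C. This is not
PCRE code: <b>dog_feed()</b> and the FEED_* values are hypothetical, the model
handles only the pattern dog(sbody)?, and it assumes the match starts at the
first character of the first segment.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum feed_result { FEED_NOMATCH, FEED_PARTIAL, FEED_MATCH };

/* Toy model of /dog(sbody)?/ fed one segment at a time.
   *progress counts how many characters of "dogsbody" earlier segments
   have consumed (0 for a fresh match). On FEED_MATCH, *match_end is the
   length of the longest complete match inside this segment. */
enum feed_result dog_feed(size_t *progress, const char *seg, size_t *match_end)
{
    static const char word[] = "dogsbody";
    size_t i, n;
    int complete = 0;

    assert(progress != NULL && seg != NULL && match_end != NULL);
    n = strlen(seg);
    *match_end = 0;

    for (i = 0; i < n; i++) {
        if (*progress >= 8 || seg[i] != word[*progress]) break;
        (*progress)++;
        /* "dog" (3 characters) and "dogsbody" (8) are the complete matches. */
        if (*progress == 3 || *progress == 8) {
            complete = 1;
            *match_end = i + 1;
        }
    }

    if (complete) return FEED_MATCH;           /* a finished match rules out a partial result */
    if (i == n && n > 0) return FEED_PARTIAL;  /* segment used up, match still possible */
    return FEED_NOMATCH;
}
```

<P>
Feeding "do" and then "gsb" gives FEED_PARTIAL followed by FEED_MATCH with a
match length of 1 (just "g"), whereas "dogsbody" in a single call gives a match
length of 8.
</P>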
       
   205 <P>
       
   206 4. Patterns that contain alternatives at the top level which do not all
       
   207 start with the same pattern item may not work as expected. For example,
       
   208 consider this pattern:
       
   209 <pre>
       
   210   1234|3789
       
   211 </pre>
       
   212 If the first part of the subject is "ABC123", a partial match of the first
       
   213 alternative is found at offset 3. There is no partial match for the second
       
   214 alternative, because such a match does not start at the same point in the
       
   215 subject string. Attempting to continue with the string "789" does not yield a
       
   216 match because only those alternatives that match at one point in the subject
       
   217 are remembered. The problem arises because the start of the second alternative
       
   218 matches within the first alternative. There is no problem with anchored
       
   219 patterns or patterns such as:
       
   220 <pre>
       
   221   1234|ABCD
       
   222 </pre>
       
   223 where no string can be a partial match for both alternatives.
       
   224 </P>
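<P>
The behaviour described in point 4 can be imitated by a toy model for patterns
made of literal alternatives. It is a sketch of the rule "only alternatives
that match at one point are remembered", not of PCRE's implementation. All
names in it are hypothetical, and it ignores complete matches inside the first
segment.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One alternative that is still alive after the first segment. */
struct pending {
    const char *alt;      /* the literal alternative */
    size_t      consumed; /* how many of its characters the first segment matched */
};

/* Scan the first segment for the earliest offset at which at least one
   alternative matches all the way to the end of the segment without being
   complete. Only alternatives alive at that one offset are stored in out[],
   which must have room for nalts entries. Returns the number stored and
   sets *start to the offset. */
size_t first_segment(const char *const *alts, size_t nalts,
                     const char *seg, struct pending *out, size_t *start)
{
    size_t len, off, a, found = 0;

    assert(alts != NULL && seg != NULL && out != NULL && start != NULL);
    len = strlen(seg);
    for (off = 0; off < len && found == 0; off++) {
        size_t avail = len - off;
        for (a = 0; a < nalts; a++) {
            if (avail < strlen(alts[a]) &&
                memcmp(seg + off, alts[a], avail) == 0) {
                out[found].alt = alts[a];
                out[found].consumed = avail;
                found++;
                *start = off;
            }
        }
    }
    return found;
}

/* Continue with the next segment: succeed only if it begins with the
   remainder of one of the remembered alternatives. */
int next_segment(const struct pending *p, size_t np, const char *seg)
{
    size_t i;

    assert(p != NULL && seg != NULL);
    for (i = 0; i < np; i++) {
        const char *rest = p[i].alt + p[i].consumed;
        if (strncmp(seg, rest, strlen(rest)) == 0) return 1;
    }
    return 0;
}
```

<P>
For the alternatives "1234" and "3789", the segment "ABC123" leaves only "1234"
pending at offset 3, so continuing with "789" fails, even though the single
string "ABC123789" contains "3789".
</P>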
       
   225 <br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
       
   226 <P>
       
   227 Philip Hazel
       
   228 <br>
       
   229 University Computing Service
       
   230 <br>
       
   231 Cambridge CB2 3QH, England.
       
   232 <br>
       
   233 </P>
       
   234 <br><a name="SEC6" href="#TOC1">REVISION</a><br>
       
   235 <P>
       
   236 Last updated: 04 June 2007
       
   237 <br>
       
   238 Copyright &copy; 1997-2007 University of Cambridge.
       
   239 <br>
       
   240 <p>
       
   241 Return to the <a href="index.html">PCRE index page</a>.
       
   242 </p>