diff -r 000000000000 -r 7f656887cf89 libraries/spcre/libpcre/pcre/doc/pcrecallout.3
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/libraries/spcre/libpcre/pcre/doc/pcrecallout.3	Wed Jun 23 15:52:26 2010 +0100
@@ -0,0 +1,177 @@
+.TH PCRECALLOUT 3
+.SH NAME
+PCRE - Perl-compatible regular expressions
+.SH "PCRE CALLOUTS"
+.rs
+.sp
+.B int (*pcre_callout)(pcre_callout_block *);
+.PP
+PCRE provides a feature called "callout", which is a means of temporarily
+passing control to the caller of PCRE in the middle of pattern matching. The
+caller of PCRE provides an external function by putting its entry point in the
+global variable \fIpcre_callout\fP. By default, this variable contains NULL,
+which disables all calling out.
+.P
+Within a regular expression, (?C) indicates the points at which the external
+function is to be called. Different callout points can be identified by putting
+a number less than 256 after the letter C. The default value is zero.
+For example, this pattern has two callout points:
+.sp
+  (?C1)abc(?C2)def
+.sp
+If the PCRE_AUTO_CALLOUT option bit is set when \fBpcre_compile()\fP is called,
+PCRE automatically inserts callouts, all with number 255, before each item in
+the pattern. For example, if PCRE_AUTO_CALLOUT is used with the pattern
+.sp
+  A(\ed{2}|--)
+.sp
+it is processed as if it were
+.sp
+(?C255)A(?C255)((?C255)\ed{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
+.sp
+Notice that there is a callout before and after each parenthesis and
+alternation bar. Automatic callouts can be used for tracking the progress of
+pattern matching. The
+.\" HREF
+\fBpcretest\fP
+.\"
+command has an option that sets automatic callouts; when it is used, the output
+indicates how the pattern is matched. This is useful information when you are
+trying to optimize the performance of a particular pattern.
+.
+.
+.SH "MISSING CALLOUTS"
+.rs
+.sp
+You should be aware that, because of optimizations in the way PCRE matches
+patterns, callouts sometimes do not happen. For example, if the pattern is
+.sp
+  ab(?C4)cd
+.sp
+PCRE knows that any matching string must contain the letter "d". If the subject
+string is "abyz", the lack of "d" means that matching doesn't ever start, and
+the callout is never reached. However, with "abyd", though the result is still
+no match, the callout is obeyed.
+.
+.
+.SH "THE CALLOUT INTERFACE"
+.rs
+.sp
+During matching, when PCRE reaches a callout point, the external function
+defined by \fIpcre_callout\fP is called (if it is set). This applies to both
+the \fBpcre_exec()\fP and the \fBpcre_dfa_exec()\fP matching functions. The
+only argument to the callout function is a pointer to a \fBpcre_callout\fP
+block. This structure contains the following fields:
+.sp
+  int          \fIversion\fP;
+  int          \fIcallout_number\fP;
+  int         *\fIoffset_vector\fP;
+  const char  *\fIsubject\fP;
+  int          \fIsubject_length\fP;
+  int          \fIstart_match\fP;
+  int          \fIcurrent_position\fP;
+  int          \fIcapture_top\fP;
+  int          \fIcapture_last\fP;
+  void        *\fIcallout_data\fP;
+  int          \fIpattern_position\fP;
+  int          \fInext_item_length\fP;
+.sp
+The \fIversion\fP field is an integer containing the version number of the
+block format. The initial version was 0; the current version is 1. The version
+number will change again in future if additional fields are added, but the
+intention is never to remove any of the existing fields.
+.P
+The \fIcallout_number\fP field contains the number of the callout, as compiled
+into the pattern (that is, the number after ?C for manual callouts, and 255 for
+automatically generated callouts).
+.P
+The \fIoffset_vector\fP field is a pointer to the vector of offsets that was
+passed by the caller to \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP. When
+\fBpcre_exec()\fP is used, the contents can be inspected in order to extract
+substrings that have been matched so far, in the same way as for extracting
+substrings after a match has completed. For \fBpcre_dfa_exec()\fP this field is
+not useful.
+.P
+The \fIsubject\fP and \fIsubject_length\fP fields contain copies of the values
+that were passed to \fBpcre_exec()\fP.
+.P
+The \fIstart_match\fP field normally contains the offset within the subject at
+which the current match attempt started. However, if the escape sequence \eK
+has been encountered, this value is changed to reflect the modified starting
+point. If the pattern is not anchored, the callout function may be called
+several times from the same point in the pattern for different starting points
+in the subject.
+.P
+The \fIcurrent_position\fP field contains the offset within the subject of the
+current match pointer.
+.P
+When the \fBpcre_exec()\fP function is used, the \fIcapture_top\fP field
+contains one more than the number of the highest numbered captured substring so
+far. If no substrings have been captured, the value of \fIcapture_top\fP is
+one. This is always the case when \fBpcre_dfa_exec()\fP is used, because it
+does not support captured substrings.
+.P
+The \fIcapture_last\fP field contains the number of the most recently captured
+substring. If no substrings have been captured, its value is -1. This is always
+the case when \fBpcre_dfa_exec()\fP is used.
+.P
+The \fIcallout_data\fP field contains a value that is passed to
+\fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP specifically so that it can be
+passed back in callouts. It is passed in the \fIcallout_data\fP field of the
+\fBpcre_extra\fP data structure. If no such data was passed, the value of
+\fIcallout_data\fP in a \fBpcre_callout\fP block is NULL. There is a
+description of the \fBpcre_extra\fP structure in the
+.\" HREF
+\fBpcreapi\fP
+.\"
+documentation.
+.P
+The \fIpattern_position\fP field is present from version 1 of the
+\fIpcre_callout\fP structure. It contains the offset to the next item to be
+matched in the pattern string.
+.P
+The \fInext_item_length\fP field is present from version 1 of the
+\fIpcre_callout\fP structure. It contains the length of the next item to be
+matched in the pattern string. When the callout immediately precedes an
+alternation bar, a closing parenthesis, or the end of the pattern, the length
+is zero. When the callout precedes an opening parenthesis, the length is that
+of the entire subpattern.
+.P
+The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to
+help in distinguishing between different automatic callouts, which all have the
+same callout number. However, they are set for all callouts.
+.
+.
+.SH "RETURN VALUES"
+.rs
+.sp
+The external callout function returns an integer to PCRE. If the value is zero,
+matching proceeds as normal. If the value is greater than zero, matching fails
+at the current point, but the testing of other matching possibilities goes
+ahead, just as if a lookahead assertion had failed. If the value is less than
+zero, the match is abandoned, and \fBpcre_exec()\fP (or \fBpcre_dfa_exec()\fP)
+returns the negative value.
+.P
+Negative values should normally be chosen from the set of PCRE_ERROR_xxx
+values. In particular, PCRE_ERROR_NOMATCH forces a standard "no match" failure.
+The error number PCRE_ERROR_CALLOUT is reserved for use by callout functions;
+it will never be used by PCRE itself.
+.
+.
+.SH AUTHOR
+.rs
+.sp
+.nf
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+.fi
+.
+.
+.SH REVISION
+.rs
+.sp
+.nf
+Last updated: 29 May 2007
+Copyright (c) 1997-2007 University of Cambridge.
+.fi