.TH PCREBUILD 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE BUILD-TIME OPTIONS"
.rs
.sp
This document describes the optional features of PCRE that can be selected when
the library is compiled. It assumes use of the \fBconfigure\fP script, where
the optional features are selected or deselected by providing options to
\fBconfigure\fP before running the \fBmake\fP command. However, the same
options can be selected in both Unix-like and non-Unix-like environments using
the GUI facility of \fBCMakeSetup\fP if you are using \fBCMake\fP instead of
\fBconfigure\fP to build PCRE.
.P
The complete list of options for \fBconfigure\fP (which includes the standard
ones such as the selection of the installation directory) can be obtained by
running
.sp
  ./configure --help
.sp
The following sections include descriptions of options whose names begin with
--enable or --disable. These settings specify changes to the defaults for the
\fBconfigure\fP command. Because of the way that \fBconfigure\fP works,
--enable and --disable always come in pairs, so the complementary option always
exists as well, but as it specifies the default, it is not described.
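.P
As a rough illustration, the help text can be filtered down to the feature
options alone. The function name \fBlist_feature_options\fP below is
hypothetical and not part of the PCRE distribution; it is a minimal sketch
that assumes a \fBgrep\fP supporting the -E and -o options.
.sp
.nf
```shell
# Hypothetical helper (not shipped with PCRE): read configure help text on
# standard input and print each --enable-*, --disable-*, or --with-* option
# name once, sorted.
list_feature_options() {
    grep -E -o -e '--(enable|disable|with)-[A-Za-z0-9-]+' | sort -u
}

# Typical use inside an unpacked PCRE source tree:
#   ./configure --help | list_feature_options
```
.fi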
.
.SH "C++ SUPPORT"
.rs
.sp
By default, the \fBconfigure\fP script will search for a C++ compiler and C++
header files. If it finds them, it automatically builds the C++ wrapper library
for PCRE. You can disable this by adding
.sp
  --disable-cpp
.sp
to the \fBconfigure\fP command.
.
.SH "UTF-8 SUPPORT"
.rs
.sp
To build PCRE with support for UTF-8 character strings, add
.sp
  --enable-utf8
.sp
to the \fBconfigure\fP command. Of itself, this does not make PCRE treat
strings as UTF-8. As well as compiling PCRE with this option, you also have
to set the PCRE_UTF8 option when you call the \fBpcre_compile()\fP
function.
.
.SH "UNICODE CHARACTER PROPERTY SUPPORT"
.rs
.sp
UTF-8 support allows PCRE to process character values greater than 255 in the
strings that it handles. On its own, however, it does not provide any
facilities for accessing the properties of such characters. If you want to be
able to use the pattern escapes \eP, \ep, and \eX, which refer to Unicode
character properties, you must add
.sp
  --enable-unicode-properties
.sp
to the \fBconfigure\fP command. This implies UTF-8 support, even if you have
not explicitly requested it.
.P
Including Unicode property support adds around 30K of tables to the PCRE
library. Only the general category properties such as \fILu\fP and \fINd\fP are
supported. Details are given in the
.\" HREF
\fBpcrepattern\fP
.\"
documentation.
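.P
Whether a given build actually includes UTF-8 or Unicode property support can
be checked afterwards with the \fB-C\fP option of \fBpcretest\fP, which reports
the optional features compiled into the library. The sketch below uses a
hypothetical helper, \fBhas_feature\fP, and assumes that the report contains
lines such as "UTF-8 support" or "No UTF-8 support"; that wording is an
assumption and may differ between releases.
.sp
.nf
```shell
# Hypothetical helper: succeed if the build report on standard input has a
# line consisting solely of the given feature text (leading blanks allowed).
# A line such as "No UTF-8 support" does not count as a match.
has_feature() {
    grep -q -x -e "[[:space:]]*$1"
}

# Typical use with an installed pcretest (report wording is assumed):
#   pcretest -C | has_feature "UTF-8 support" && echo "UTF-8 available"
```
.fi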
.
.SH "CODE VALUE OF NEWLINE"
.rs
.sp
By default, PCRE interprets character 10 (linefeed, LF) as indicating the end
of a line. This is the normal newline character on Unix-like systems. You can
compile PCRE to use character 13 (carriage return, CR) instead, by adding
.sp
  --enable-newline-is-cr
.sp
to the \fBconfigure\fP command. There is also a --enable-newline-is-lf option,
which explicitly specifies linefeed as the newline character.
.sp
Alternatively, you can specify that line endings are to be indicated by the two
character sequence CRLF. If you want this, add
.sp
  --enable-newline-is-crlf
.sp
to the \fBconfigure\fP command. There is a fourth option, specified by
.sp
  --enable-newline-is-anycrlf
.sp
which causes PCRE to recognize any of the three sequences CR, LF, or CRLF as
indicating a line ending. Finally, a fifth option, specified by
.sp
  --enable-newline-is-any
.sp
causes PCRE to recognize any Unicode newline sequence.
.P
Whatever line ending convention is selected when PCRE is built can be
overridden when the library functions are called. At build time it is
conventional to use the standard for your operating system.
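.P
The five conventions map directly onto option names. As a minimal sketch, the
hypothetical helper \fBnewline_option\fP below turns a convention name into the
corresponding \fBconfigure\fP option and rejects anything else.
.sp
.nf
```shell
# Hypothetical helper: print the configure option for a newline convention.
# The accepted names mirror the five options described in this section.
newline_option() {
    case "$1" in
        cr|lf|crlf|anycrlf|any)
            echo "--enable-newline-is-$1" ;;
        *)
            echo "unknown newline convention: $1" >&2
            return 1 ;;
    esac
}

# Example: ./configure $(newline_option crlf)
```
.fi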
.
.SH "WHAT \eR MATCHES"
.rs
.sp
By default, the sequence \eR in a pattern matches any Unicode newline sequence,
whatever has been selected as the line ending sequence. If you specify
.sp
  --enable-bsr-anycrlf
.sp
the default is changed so that \eR matches only CR, LF, or CRLF. Whatever is
selected when PCRE is built can be overridden when the library functions are
called.
.
.SH "BUILDING SHARED AND STATIC LIBRARIES"
.rs
.sp
The PCRE building process uses \fBlibtool\fP to build both shared and static
Unix libraries by default. You can suppress one of these by adding one of
.sp
  --disable-shared
  --disable-static
.sp
to the \fBconfigure\fP command, as required.
.
.SH "POSIX MALLOC USAGE"
.rs
.sp
When PCRE is called through the POSIX interface (see the
.\" HREF
\fBpcreposix\fP
.\"
documentation), additional working storage is required for holding the pointers
to capturing substrings, because PCRE requires three integers per substring,
whereas the POSIX interface provides only two. If the number of expected
substrings is small, the wrapper function uses space on the stack, because this
is faster than using \fBmalloc()\fP for each call. The default threshold above
which the stack is no longer used is 10; it can be changed by adding a setting
such as
.sp
  --with-posix-malloc-threshold=20
.sp
to the \fBconfigure\fP command.
.
.SH "HANDLING VERY LARGE PATTERNS"
.rs
.sp
Within a compiled pattern, offset values are used to point from one part to
another (for example, from an opening parenthesis to an alternation
metacharacter). By default, two-byte values are used for these offsets, leading
to a maximum size for a compiled pattern of around 64K. This is sufficient to
handle all but the most gigantic patterns. Nevertheless, some people do want to
process enormous patterns, so it is possible to compile PCRE to use three-byte
or four-byte offsets by adding a setting such as
.sp
  --with-link-size=3
.sp
to the \fBconfigure\fP command. The value given must be 2, 3, or 4. Using
longer offsets slows down the operation of PCRE because it has to load
additional bytes when handling them.
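.P
The size limits follow from the offset width: an offset stored in \fIn\fP bytes
can hold values up to 256^\fIn\fP - 1. The following sketch (the function name
is hypothetical) prints that bound for each permitted link size; it assumes a
shell with at least 64-bit arithmetic.
.sp
.nf
```shell
# Hypothetical helper: largest offset representable in N bytes, 2^(8N) - 1.
max_offset() {
    echo $(( (1 << (8 * $1)) - 1 ))
}

# Print the bound for each link size that configure accepts.
for size in 2 3 4; do
    echo "--with-link-size=$size: offsets up to $(max_offset "$size")"
done
```
.fi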
.
.SH "AVOIDING EXCESSIVE STACK USAGE"
.rs
.sp
When matching with the \fBpcre_exec()\fP function, PCRE implements backtracking
by making recursive calls to an internal function called \fBmatch()\fP. In
environments where the size of the stack is limited, this can severely limit
PCRE's operation. (The Unix environment does not usually suffer from this
problem, but it may sometimes be necessary to increase the maximum stack size.
There is a discussion in the
.\" HREF
\fBpcrestack\fP
.\"
documentation.) An alternative approach to recursion that uses memory from the
heap to remember data, instead of using recursive function calls, has been
implemented to work round the problem of limited stack size. If you want to
build a version of PCRE that works this way, add
.sp
  --disable-stack-for-recursion
.sp
to the \fBconfigure\fP command. With this configuration, PCRE will use the
\fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP variables to call memory
management functions. By default these point to \fBmalloc()\fP and
\fBfree()\fP, but you can replace the pointers so that your own functions are
used.
.P
Separate functions are provided rather than using \fBpcre_malloc\fP and
\fBpcre_free\fP because the usage is very predictable: the block sizes
requested are always the same, and the blocks are always freed in reverse
order. A calling program might be able to implement optimized functions that
perform better than \fBmalloc()\fP and \fBfree()\fP. PCRE runs noticeably more
slowly when built in this way. This option affects only the \fBpcre_exec()\fP
function; it is not relevant for the \fBpcre_dfa_exec()\fP function.
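.P
Before rebuilding with --disable-stack-for-recursion, it may be worth checking
how much stack the environment allows. The following sketch assumes a shell
whose \fBulimit\fP built-in accepts -s (for example \fBbash\fP or \fBdash\fP);
the value is normally reported in kilobytes, or as "unlimited".
.sp
.nf
```shell
# Report the current soft limit on stack size for this shell and its children.
stack_limit=$(ulimit -s)
echo "soft stack limit: $stack_limit"

# To raise it for the current session (it cannot exceed the hard limit):
#   ulimit -s 16384
```
.fi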
.
.SH "LIMITING PCRE RESOURCE USAGE"
.rs
.sp
Internally, PCRE has a function called \fBmatch()\fP, which it calls repeatedly
(sometimes recursively) when matching a pattern with the \fBpcre_exec()\fP
function. By controlling the maximum number of times this function may be
called during a single matching operation, a limit can be placed on the
resources used by a single call to \fBpcre_exec()\fP. The limit can be changed
at run time, as described in the
.\" HREF
\fBpcreapi\fP
.\"
documentation. The default is 10 million, but this can be changed by adding a
setting such as
.sp
  --with-match-limit=500000
.sp
to the \fBconfigure\fP command. This setting has no effect on the
\fBpcre_dfa_exec()\fP matching function.
.P
In some environments it is desirable to limit the depth of recursive calls of
\fBmatch()\fP more strictly than the total number of calls, in order to
restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
is specified) that is used. A second limit controls this; it defaults to the
value that is set for --with-match-limit, which imposes no additional
constraints. However, you can set a lower limit by adding, for example,
.sp
  --with-match-limit-recursion=10000
.sp
to the \fBconfigure\fP command. This value can also be overridden at run time.
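.P
Since a recursion limit above the match limit adds no constraint, a build
script may want to check the pair before passing them on. A minimal sketch,
using a hypothetical helper name:
.sp
.nf
```shell
# Hypothetical helper: print both limit options, refusing a recursion limit
# that exceeds the overall match limit.
limit_options() {
    match_limit=$1
    recursion_limit=$2
    if [ "$recursion_limit" -gt "$match_limit" ]; then
        echo "recursion limit exceeds match limit" >&2
        return 1
    fi
    echo "--with-match-limit=$match_limit --with-match-limit-recursion=$recursion_limit"
}

# Example: ./configure $(limit_options 500000 10000)
```
.fi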
.
.SH "CREATING CHARACTER TABLES AT BUILD TIME"
.rs
.sp
PCRE uses fixed tables for processing characters whose code values are less
than 256. By default, PCRE is built with a set of tables that are distributed
in the file \fIpcre_chartables.c.dist\fP. These tables are for ASCII codes
only. If you add
.sp
  --enable-rebuild-chartables
.sp
to the \fBconfigure\fP command, the distributed tables are no longer used.
Instead, a program called \fBdftables\fP is compiled and run. This outputs the
source for a new set of tables, created in the default locale of your C runtime
system. (This method of replacing the tables does not work if you are cross
compiling, because \fBdftables\fP is run on the local host. If you need to
create alternative tables when cross compiling, you will have to do so "by
hand".)
.
.SH "USING EBCDIC CODE"
.rs
.sp
PCRE assumes by default that it will run in an environment where the character
code is ASCII (or Unicode, which is a superset of ASCII). This is the case for
most computer operating systems. PCRE can, however, be compiled to run in an
EBCDIC environment by adding
.sp
  --enable-ebcdic
.sp
to the \fBconfigure\fP command. This setting implies
--enable-rebuild-chartables. You should only use it if you know that you are in
an EBCDIC environment (for example, an IBM mainframe operating system).
.
.SH "PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT"
.rs
.sp
By default, \fBpcregrep\fP reads all files as plain text. You can build it so
that it recognizes files whose names end in \fB.gz\fP or \fB.bz2\fP, and reads
them with \fBlibz\fP or \fBlibbz2\fP, respectively, by adding one or both of
.sp
  --enable-pcregrep-libz
  --enable-pcregrep-libbz2
.sp
to the \fBconfigure\fP command. These options naturally require that the
relevant libraries are installed on your system. Configuration will fail if
they are not.
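.P
Because configuration fails when a requested library is missing, a build
script can look for the headers first and request only what is present. The
sketch below is illustrative only: the helper name is hypothetical, and it
checks just for \fIzlib.h\fP and \fIbzlib.h\fP in a single include directory
(by default \fI/usr/include\fP), which may not be where your system keeps them.
.sp
.nf
```shell
# Hypothetical helper: print the pcregrep compression options whose headers
# exist in the given include directory (default /usr/include).
compression_options() {
    incdir=${1:-/usr/include}
    opts=""
    if [ -f "$incdir/zlib.h" ]; then
        opts="$opts --enable-pcregrep-libz"
    fi
    if [ -f "$incdir/bzlib.h" ]; then
        opts="$opts --enable-pcregrep-libbz2"
    fi
    # Unquoted on purpose: word splitting drops the leading blank.
    echo $opts
}

# Example: ./configure $(compression_options)
```
.fi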
.
.SH "PCRETEST OPTION FOR LIBREADLINE SUPPORT"
.rs
.sp
If you add
.sp
  --enable-pcretest-libreadline
.sp
to the \fBconfigure\fP command, \fBpcretest\fP is linked with the
\fBlibreadline\fP library, and when its input is from a terminal, it reads it
using the \fBreadline()\fP function. This provides line-editing and history
facilities. Note that \fBlibreadline\fP is GPL-licenced, so if you distribute a
binary of \fBpcretest\fP linked in this way, there may be licensing issues.
.P
Setting this option causes the \fB-lreadline\fP option to be added to the
\fBpcretest\fP build. In many operating environments with a system-installed
\fBlibreadline\fP this is sufficient. However, in some environments (e.g.
if an unmodified distribution version of readline is in use), some extra
configuration may be necessary. The INSTALL file for \fBlibreadline\fP says
this:
.sp
  "Readline uses the termcap functions, but does not link with the
  termcap or curses library itself, allowing applications which link
  with readline the to choose an appropriate library."
.sp
If your environment has not been set up so that an appropriate library is
automatically included, you may need to add something like
.sp
  LIBS="-lncurses"
.sp
immediately before the \fBconfigure\fP command.
.
.
.SH "SEE ALSO"
.rs
.sp
\fBpcreapi\fP(3), \fBpcre_config\fP(3).
.
.
.SH AUTHOR
.rs
.sp
.nf
Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.
.fi
.
.
.SH REVISION
.rs
.sp
.nf
Last updated: 13 April 2008
Copyright (c) 1997-2008 University of Cambridge.
.fi