<html>
<head>
<title>pcrebuild specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrebuild man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE BUILD-TIME OPTIONS</a>
<li><a name="TOC2" href="#SEC2">C++ SUPPORT</a>
<li><a name="TOC3" href="#SEC3">UTF-8 SUPPORT</a>
<li><a name="TOC4" href="#SEC4">UNICODE CHARACTER PROPERTY SUPPORT</a>
<li><a name="TOC5" href="#SEC5">CODE VALUE OF NEWLINE</a>
<li><a name="TOC6" href="#SEC6">WHAT \R MATCHES</a>
<li><a name="TOC7" href="#SEC7">BUILDING SHARED AND STATIC LIBRARIES</a>
<li><a name="TOC8" href="#SEC8">POSIX MALLOC USAGE</a>
<li><a name="TOC9" href="#SEC9">HANDLING VERY LARGE PATTERNS</a>
<li><a name="TOC10" href="#SEC10">AVOIDING EXCESSIVE STACK USAGE</a>
<li><a name="TOC11" href="#SEC11">LIMITING PCRE RESOURCE USAGE</a>
<li><a name="TOC12" href="#SEC12">CREATING CHARACTER TABLES AT BUILD TIME</a>
<li><a name="TOC13" href="#SEC13">USING EBCDIC CODE</a>
<li><a name="TOC14" href="#SEC14">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
<li><a name="TOC15" href="#SEC15">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a>
<li><a name="TOC16" href="#SEC16">SEE ALSO</a>
<li><a name="TOC17" href="#SEC17">AUTHOR</a>
<li><a name="TOC18" href="#SEC18">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
<P>
This document describes the optional features of PCRE that can be selected when
the library is compiled. It assumes use of the <b>configure</b> script, where
the optional features are selected or deselected by providing options to
<b>configure</b> before running the <b>make</b> command. However, the same
options can be selected in both Unix-like and non-Unix-like environments using
the GUI facility of <b>CMakeSetup</b> if you are using <b>CMake</b> instead of
<b>configure</b> to build PCRE.
</P>
<P>
The complete list of options for <b>configure</b> (which includes the standard
ones such as the selection of the installation directory) can be obtained by
running
<pre>
./configure --help
</pre>
The following sections include descriptions of options whose names begin with
--enable or --disable. These settings specify changes to the defaults for the
<b>configure</b> command. Because of the way that <b>configure</b> works,
--enable and --disable always come in pairs, so the complementary option always
exists as well, but as it specifies the default, it is not described.
</P>
<br><a name="SEC2" href="#TOC1">C++ SUPPORT</a><br>
<P>
By default, the <b>configure</b> script will search for a C++ compiler and C++
header files. If it finds them, it automatically builds the C++ wrapper library
for PCRE. You can disable this by adding
<pre>
--disable-cpp
</pre>
to the <b>configure</b> command.
</P>
<br><a name="SEC3" href="#TOC1">UTF-8 SUPPORT</a><br>
<P>
To build PCRE with support for UTF-8 character strings, add
<pre>
--enable-utf8
</pre>
to the <b>configure</b> command. Of itself, this does not make PCRE treat
strings as UTF-8. As well as compiling PCRE with this option, you also
have to set the PCRE_UTF8 option when you call the <b>pcre_compile()</b>
function.
</P>
<br><a name="SEC4" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>
<P>
UTF-8 support allows PCRE to process character values greater than 255 in the
strings that it handles. On its own, however, it does not provide any
facilities for accessing the properties of such characters. If you want to be
able to use the pattern escapes \P, \p, and \X, which refer to Unicode
character properties, you must add
<pre>
--enable-unicode-properties
</pre>
to the <b>configure</b> command. This implies UTF-8 support, even if you have
not explicitly requested it.
</P>
<P>
Including Unicode property support adds around 30K of tables to the PCRE
library. Only the general category properties such as <i>Lu</i> and <i>Nd</i> are
supported. Details are given in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation.
</P>
<br><a name="SEC5" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
<P>
By default, PCRE interprets character 10 (linefeed, LF) as indicating the end
of a line. This is the normal newline character on Unix-like systems. You can
compile PCRE to use character 13 (carriage return, CR) instead, by adding
<pre>
--enable-newline-is-cr
</pre>
to the <b>configure</b> command. There is also a --enable-newline-is-lf option,
which explicitly specifies linefeed as the newline character.
<br>
<br>
Alternatively, you can specify that line endings are to be indicated by the
two-character sequence CRLF. If you want this, add
<pre>
--enable-newline-is-crlf
</pre>
to the <b>configure</b> command. There is a fourth option, specified by
<pre>
--enable-newline-is-anycrlf
</pre>
which causes PCRE to recognize any of the three sequences CR, LF, or CRLF as
indicating a line ending. Finally, a fifth option, specified by
<pre>
--enable-newline-is-any
</pre>
causes PCRE to recognize any Unicode newline sequence.
</P>
<P>
Whatever line ending convention is selected when PCRE is built can be
overridden when the library functions are called. At build time it is
conventional to use the standard for your operating system.
</P>
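<P>
The five conventions above can be compared with a small, self-contained C
sketch. It is only an illustration, not PCRE's own code: the enumeration names
and the function <b>newline_length()</b> are hypothetical, and the "any" case
assumes a UTF-8 subject, in which the Unicode newlines NEL, LS, and PS are
encoded as multi-byte sequences.
</P>

```c
#include <stddef.h>

/* Hypothetical names for the five build-time newline conventions. */
typedef enum { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY } newline_convention;

/* Returns the length in bytes of the newline sequence that starts at s[0],
   or 0 if no newline starts there. len is the number of bytes available.
   NL_ANY assumes that the subject is encoded in UTF-8. */
size_t newline_length(const unsigned char *s, size_t len, newline_convention nl)
{
    if (len == 0) return 0;
    switch (nl) {
    case NL_CR:
        return s[0] == '\r' ? 1 : 0;
    case NL_LF:
        return s[0] == '\n' ? 1 : 0;
    case NL_CRLF:
        return (len >= 2 && s[0] == '\r' && s[1] == '\n') ? 2 : 0;
    case NL_ANYCRLF:
    case NL_ANY:
        /* CR, LF, and CRLF are recognized by both conventions. */
        if (s[0] == '\r') return (len >= 2 && s[1] == '\n') ? 2 : 1;
        if (s[0] == '\n') return 1;
        if (nl == NL_ANYCRLF) return 0;
        /* The remaining Unicode newlines: VT, FF, NEL, LS, PS. */
        if (s[0] == 0x0B || s[0] == 0x0C) return 1;
        if (len >= 2 && s[0] == 0xC2 && s[1] == 0x85) return 2;
        if (len >= 3 && s[0] == 0xE2 && s[1] == 0x80 &&
            (s[2] == 0xA8 || s[2] == 0xA9)) return 3;
        return 0;
    }
    return 0;
}
```

<P>
With this sketch, the two bytes CR LF count as one two-byte newline under the
CRLF and ANYCRLF conventions, as a one-byte newline (the CR alone) under the CR
convention, and as no newline at the first byte under the LF convention.
</P>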
<br><a name="SEC6" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
By default, the sequence \R in a pattern matches any Unicode newline sequence,
whatever has been selected as the line ending sequence. If you specify
<pre>
--enable-bsr-anycrlf
</pre>
the default is changed so that \R matches only CR, LF, or CRLF. Whatever is
selected when PCRE is built can be overridden when the library functions are
called.
</P>
<br><a name="SEC7" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
<P>
The PCRE building process uses <b>libtool</b> to build both shared and static
Unix libraries by default. You can suppress one of these by adding one of
<pre>
--disable-shared
--disable-static
</pre>
to the <b>configure</b> command, as required.
</P>
<br><a name="SEC8" href="#TOC1">POSIX MALLOC USAGE</a><br>
<P>
When PCRE is called through the POSIX interface (see the
<a href="pcreposix.html"><b>pcreposix</b></a>
documentation), additional working storage is required for holding the pointers
to capturing substrings, because PCRE requires three integers per substring,
whereas the POSIX interface provides only two. If the number of expected
substrings is small, the wrapper function uses space on the stack, because this
is faster than using <b>malloc()</b> for each call. The default threshold above
which the stack is no longer used is 10; it can be changed by adding a setting
such as
<pre>
--with-posix-malloc-threshold=20
</pre>
to the <b>configure</b> command.
</P>
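<P>
The stack-versus-heap decision described above can be sketched roughly as
follows. This is an illustration under stated assumptions, not the wrapper's
actual source: the names <b>SKETCH_MALLOC_THRESHOLD</b> and
<b>ovector_storage()</b> are hypothetical, and the matching step is replaced by
a placeholder loop.
</P>

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical constant mirroring the default threshold of 10. */
#define SKETCH_MALLOC_THRESHOLD 10

/* Chooses working storage for nmatch substrings, three integers each.
   Returns 0 if stack space was used, 1 if the heap was used,
   or -1 if the heap allocation failed. */
int ovector_storage(size_t nmatch)
{
    int small_ovector[SKETCH_MALLOC_THRESHOLD * 3];
    int *ovector;
    int used_heap = 0;
    size_t i;

    if (nmatch <= SKETCH_MALLOC_THRESHOLD) {
        ovector = small_ovector;               /* cheap: no malloc() call */
    } else {
        ovector = malloc(sizeof(int) * nmatch * 3);
        if (ovector == NULL) return -1;
        used_heap = 1;
    }

    /* Placeholder for the real matching call: mark every slot as unset. */
    for (i = 0; i < nmatch * 3; i++) ovector[i] = -1;

    if (used_heap) free(ovector);
    return used_heap;
}
```

<P>
In this sketch a request for 10 substrings stays on the stack, while a request
for 11 goes to the heap. Raising the threshold trades a larger stack frame for
fewer <b>malloc()</b> calls.
</P>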
<br><a name="SEC9" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
<P>
Within a compiled pattern, offset values are used to point from one part to
another (for example, from an opening parenthesis to an alternation
metacharacter). By default, two-byte values are used for these offsets, leading
to a maximum size for a compiled pattern of around 64K. This is sufficient to
handle all but the most gigantic patterns. Nevertheless, some people do want to
process enormous patterns, so it is possible to compile PCRE to use three-byte
or four-byte offsets by adding a setting such as
<pre>
--with-link-size=3
</pre>
to the <b>configure</b> command. The value given must be 2, 3, or 4. Using
longer offsets slows down the operation of PCRE because it has to load
additional bytes when handling them.
</P>
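<P>
The relationship between the link size and the largest representable offset is
simple arithmetic, which the following C sketch makes explicit. The function
names are hypothetical, and the most-significant-byte-first layout is an
assumption made for illustration. The practical limit on pattern size in a real
build may differ from these raw values.
</P>

```c
/* Largest offset that fits in link_size bytes, or 0 if link_size is not
   one of the permitted values 2, 3, or 4. */
unsigned long long max_link_offset(int link_size)
{
    if (link_size < 2 || link_size > 4) return 0;
    return (1ULL << (8 * link_size)) - 1;
}

/* Stores value in link_size bytes, most significant byte first (assumed
   layout). Bits that do not fit are silently discarded. */
void put_link(unsigned char *p, int link_size, unsigned long value)
{
    int i;
    for (i = link_size - 1; i >= 0; i--) {
        p[i] = (unsigned char)(value & 0xFF);
        value >>= 8;
    }
}

/* Reads back an offset: one load, shift, and OR per byte, which is why
   longer links cost more to handle. */
unsigned long get_link(const unsigned char *p, int link_size)
{
    unsigned long value = 0;
    int i;
    for (i = 0; i < link_size; i++) value = (value << 8) | p[i];
    return value;
}
```

<P>
Two bytes give a maximum offset of 65535, which matches the limit of around 64K
mentioned above; three bytes give 16777215, and four bytes give 4294967295.
</P>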
<br><a name="SEC10" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
<P>
When matching with the <b>pcre_exec()</b> function, PCRE implements backtracking
by making recursive calls to an internal function called <b>match()</b>. In
environments where the size of the stack is limited, this can severely limit
PCRE's operation. (The Unix environment does not usually suffer from this
problem, but it may sometimes be necessary to increase the maximum stack size.
There is a discussion in the
<a href="pcrestack.html"><b>pcrestack</b></a>
documentation.) An alternative approach to recursion that uses memory from the
heap to remember data, instead of using recursive function calls, has been
implemented to work round the problem of limited stack size. If you want to
build a version of PCRE that works this way, add
<pre>
--disable-stack-for-recursion
</pre>
to the <b>configure</b> command. With this configuration, PCRE will use the
<b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> variables to call memory
management functions. By default these point to <b>malloc()</b> and
<b>free()</b>, but you can replace the pointers so that your own functions are
used.
</P>
<P>
Separate functions are provided rather than using <b>pcre_malloc</b> and
<b>pcre_free</b> because the usage is very predictable: the block sizes
requested are always the same, and the blocks are always freed in reverse
order. A calling program might be able to implement optimized functions that
perform better than <b>malloc()</b> and <b>free()</b>. PCRE runs noticeably more
slowly when built in this way. This option affects only the <b>pcre_exec()</b>
function; it is not relevant for the <b>pcre_dfa_exec()</b> function.
</P>
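<P>
As an illustration of the kind of optimized functions a calling program might
supply, here is a minimal sketch of a last-in, first-out block allocator. It
relies on the two properties stated above (equal block sizes, frees in reverse
order). The names <b>lifo_malloc()</b> and <b>lifo_free()</b>, and the arena
dimensions, are hypothetical; the block size in particular is an assumed upper
bound, not a value taken from PCRE. The static arena is not thread-safe.
</P>

```c
#include <stddef.h>
#include <stdlib.h>

#define ARENA_BLOCKS     64     /* assumed number of preallocated blocks */
#define ARENA_BLOCK_SIZE 1024   /* assumed upper bound on a block's size */

/* The union forces suitable alignment for each block. */
static union arena_block {
    unsigned char bytes[ARENA_BLOCK_SIZE];
    long double   align_ld;
    long long     align_ll;
    void         *align_ptr;
} arena[ARENA_BLOCKS];

static size_t arena_top = 0;    /* number of arena blocks currently in use */

/* Same signature as malloc(): hand out the next arena block, or fall back
   to the heap when the arena is full or the request is too large. */
void *lifo_malloc(size_t size)
{
    if (size <= ARENA_BLOCK_SIZE && arena_top < ARENA_BLOCKS)
        return arena[arena_top++].bytes;
    return malloc(size);
}

/* Same signature as free(). Assumes strict reverse-order frees: a block is
   either the most recent arena block or came from the heap fallback. */
void lifo_free(void *block)
{
    if (arena_top > 0 && block == (void *)arena[arena_top - 1].bytes)
        arena_top--;
    else
        free(block);
}
```

<P>
A program using a library built with --disable-stack-for-recursion could then
assign these functions to <b>pcre_stack_malloc</b> and <b>pcre_stack_free</b>
before calling <b>pcre_exec()</b>. Freeing arena blocks out of order is not
supported by this sketch.
</P>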
<br><a name="SEC11" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
<P>
Internally, PCRE has a function called <b>match()</b>, which it calls repeatedly
(sometimes recursively) when matching a pattern with the <b>pcre_exec()</b>
function. By controlling the maximum number of times this function may be
called during a single matching operation, a limit can be placed on the
resources used by a single call to <b>pcre_exec()</b>. The limit can be changed
at run time, as described in the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation. The default is 10 million, but this can be changed by adding a
setting such as
<pre>
--with-match-limit=500000
</pre>
to the <b>configure</b> command. This setting has no effect on the
<b>pcre_dfa_exec()</b> matching function.
</P>
<P>
In some environments it is desirable to limit the depth of recursive calls of
<b>match()</b> more strictly than the total number of calls, in order to
restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
is specified) that is used. A second limit controls this; it defaults to the
value that is set for --with-match-limit, which imposes no additional
constraints. However, you can set a lower limit by adding, for example,
<pre>
--with-match-limit-recursion=10000
</pre>
to the <b>configure</b> command. This value can also be overridden at run time.
</P>
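<P>
The difference between the two limits can be seen in a toy backtracking
matcher. The sketch below is not PCRE's <b>match()</b> function: it handles
only literal characters, ? (any one character), and * (any run of characters),
and every name and return code in it is hypothetical. It counts the total
number of calls separately from the current recursion depth.
</P>

```c
#define TOY_NOMATCH              0
#define TOY_MATCH                1
#define TOY_ERR_MATCHLIMIT     (-1)  /* too many calls in total */
#define TOY_ERR_RECURSIONLIMIT (-2)  /* recursion went too deep */

typedef struct {
    unsigned long match_limit;      /* cap on the total number of calls */
    unsigned long recursion_limit;  /* cap on the nesting depth */
    unsigned long calls;            /* calls made so far */
} toy_limits;

static int toy_match(const char *p, const char *s, toy_limits *lim,
                     unsigned long depth)
{
    int rc;
    if (++lim->calls > lim->match_limit) return TOY_ERR_MATCHLIMIT;
    if (depth > lim->recursion_limit) return TOY_ERR_RECURSIONLIMIT;

    if (*p == '\0') return *s == '\0' ? TOY_MATCH : TOY_NOMATCH;

    if (*p == '*') {
        /* Try the rest of the pattern at each possible position;
           each failed attempt is one step of backtracking. */
        for (;;) {
            rc = toy_match(p + 1, s, lim, depth + 1);
            if (rc != TOY_NOMATCH) return rc;   /* a match or an error */
            if (*s == '\0') return TOY_NOMATCH;
            s++;
        }
    }

    if (*s != '\0' && (*p == '?' || *p == *s))
        return toy_match(p + 1, s + 1, lim, depth + 1);
    return TOY_NOMATCH;
}

/* Entry point: anchored match of the whole subject against the pattern. */
int toy_exec(const char *pattern, const char *subject,
             unsigned long match_limit, unsigned long recursion_limit)
{
    toy_limits lim;
    lim.match_limit = match_limit;
    lim.recursion_limit = recursion_limit;
    lim.calls = 0;
    return toy_match(pattern, subject, &lim, 1);
}
```

<P>
Because the depth can never exceed the total number of calls, a recursion limit
equal to the match limit adds no constraint, which is why that default is
harmless. A pattern with several * items applied to a long non-matching subject
hits the call limit quickly even though its recursion stays shallow.
</P>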
<br><a name="SEC12" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
<P>
PCRE uses fixed tables for processing characters whose code values are less
than 256. By default, PCRE is built with a set of tables that are distributed
in the file <i>pcre_chartables.c.dist</i>. These tables are for ASCII codes
only. If you add
<pre>
--enable-rebuild-chartables
</pre>
to the <b>configure</b> command, the distributed tables are no longer used.
Instead, a program called <b>dftables</b> is compiled and run. This outputs the
source for a new set of tables, created in the default locale of your C runtime
system. (This method of replacing the tables does not work if you are cross
compiling, because <b>dftables</b> is run on the local host. If you need to
create alternative tables when cross compiling, you will have to do so "by
hand".)
</P>
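<P>
To give a feel for what locale-dependent tables look like, the following sketch
builds two 256-entry tables from the C library's character functions. It is a
simplified illustration and does not reproduce the layout that <b>dftables</b>
actually emits: the structure <b>char_tables</b> and the function
<b>build_tables()</b> are hypothetical names.
</P>

```c
#include <ctype.h>

#define TABLE_SIZE 256

/* Hypothetical layout: a lower-casing table and a case-flipping table. */
typedef struct {
    unsigned char lower[TABLE_SIZE];
    unsigned char flip[TABLE_SIZE];
} char_tables;

/* Fills both tables using the locale that is current when it is called.
   A program that never calls setlocale() runs in the "C" locale. */
void build_tables(char_tables *t)
{
    int i;
    for (i = 0; i < TABLE_SIZE; i++) {
        t->lower[i] = (unsigned char)tolower(i);
        t->flip[i]  = (unsigned char)(islower(i) ? toupper(i) : tolower(i));
    }
}
```

<P>
Because the results depend on the locale of the machine that runs the code,
tables generated on a build host describe that host's locale rather than the
target's, which is the cross-compiling limitation noted above.
</P>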
<br><a name="SEC13" href="#TOC1">USING EBCDIC CODE</a><br>
<P>
PCRE assumes by default that it will run in an environment where the character
code is ASCII (or Unicode, which is a superset of ASCII). This is the case for
most computer operating systems. PCRE can, however, be compiled to run in an
EBCDIC environment by adding
<pre>
--enable-ebcdic
</pre>
to the <b>configure</b> command. This setting implies
--enable-rebuild-chartables. You should only use it if you know that you are in
an EBCDIC environment (for example, an IBM mainframe operating system).
</P>
<br><a name="SEC14" href="#TOC1">PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>
<P>
By default, <b>pcregrep</b> reads all files as plain text. You can build it so
that it recognizes files whose names end in <b>.gz</b> or <b>.bz2</b>, and reads
them with <b>libz</b> or <b>libbz2</b>, respectively, by adding one or both of
<pre>
--enable-pcregrep-libz
--enable-pcregrep-libbz2
</pre>
to the <b>configure</b> command. These options naturally require that the
relevant libraries are installed on your system. Configuration will fail if
they are not.
</P>
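<P>
Recognition by file name suffix, as described above, amounts to a check like
the one in this sketch. The names used here are hypothetical and the code is
not taken from <b>pcregrep</b>; the two flag arguments stand in for whether
each library was enabled at build time.
</P>

```c
#include <string.h>

typedef enum { FILE_PLAIN, FILE_GZIP, FILE_BZIP2 } file_kind;

/* Returns non-zero if name ends with suffix. */
static int has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name);
    size_t m = strlen(suffix);
    return n >= m && strcmp(name + n - m, suffix) == 0;
}

/* Decides how a file should be read. A compressed suffix is honoured only
   if the matching library support was built in; otherwise the file is
   treated as plain text. */
file_kind classify_file(const char *name, int have_libz, int have_libbz2)
{
    if (have_libz && has_suffix(name, ".gz")) return FILE_GZIP;
    if (have_libbz2 && has_suffix(name, ".bz2")) return FILE_BZIP2;
    return FILE_PLAIN;
}
```

<P>
In this sketch, a build without the relevant library falls back to reading
a <b>.gz</b> or <b>.bz2</b> file as plain text, which corresponds to the
default behaviour described above.
</P>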
<br><a name="SEC15" href="#TOC1">PCRETEST OPTION FOR LIBREADLINE SUPPORT</a><br>
<P>
If you add
<pre>
--enable-pcretest-libreadline
</pre>
to the <b>configure</b> command, <b>pcretest</b> is linked with the
<b>libreadline</b> library, and when its input is from a terminal, it reads it
using the <b>readline()</b> function. This provides line-editing and history
facilities. Note that <b>libreadline</b> is GPL-licenced, so if you distribute a
binary of <b>pcretest</b> linked in this way, there may be licensing issues.
</P>
<P>
Setting this option causes the <b>-lreadline</b> option to be added to the
<b>pcretest</b> build. In many operating environments with a system-installed
<b>libreadline</b> this is sufficient. However, in some environments (e.g.
if an unmodified distribution version of readline is in use), some extra
configuration may be necessary. The INSTALL file for <b>libreadline</b> says
this:
<pre>
"Readline uses the termcap functions, but does not link with the
termcap or curses library itself, allowing applications which link
with readline the to choose an appropriate library."
</pre>
If your environment has not been set up so that an appropriate library is
automatically included, you may need to add something like
<pre>
LIBS="-lncurses"
</pre>
immediately before the <b>configure</b> command.
</P>
<br><a name="SEC16" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcreapi</b>(3), <b>pcre_config</b>(3).
</P>
<br><a name="SEC17" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC18" href="#TOC1">REVISION</a><br>
<P>
Last updated: 13 April 2008
<br>
Copyright © 1997-2008 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>