<html>
<head>
<title>pcresyntax specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcresyntax man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a>
<li><a name="TOC2" href="#SEC2">QUOTING</a>
<li><a name="TOC3" href="#SEC3">CHARACTERS</a>
<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a>
<li><a name="TOC6" href="#SEC6">SCRIPT NAMES FOR \p AND \P</a>
<li><a name="TOC7" href="#SEC7">CHARACTER CLASSES</a>
<li><a name="TOC8" href="#SEC8">QUANTIFIERS</a>
<li><a name="TOC9" href="#SEC9">ANCHORS AND SIMPLE ASSERTIONS</a>
<li><a name="TOC10" href="#SEC10">MATCH POINT RESET</a>
<li><a name="TOC11" href="#SEC11">ALTERNATION</a>
<li><a name="TOC12" href="#SEC12">CAPTURING</a>
<li><a name="TOC13" href="#SEC13">ATOMIC GROUPS</a>
<li><a name="TOC14" href="#SEC14">COMMENT</a>
<li><a name="TOC15" href="#SEC15">OPTION SETTING</a>
<li><a name="TOC16" href="#SEC16">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
<li><a name="TOC17" href="#SEC17">BACKREFERENCES</a>
<li><a name="TOC18" href="#SEC18">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
<li><a name="TOC19" href="#SEC19">CONDITIONAL PATTERNS</a>
<li><a name="TOC20" href="#SEC20">BACKTRACKING CONTROL</a>
<li><a name="TOC21" href="#SEC21">NEWLINE CONVENTIONS</a>
<li><a name="TOC22" href="#SEC22">WHAT \R MATCHES</a>
<li><a name="TOC23" href="#SEC23">CALLOUTS</a>
<li><a name="TOC24" href="#SEC24">SEE ALSO</a>
<li><a name="TOC25" href="#SEC25">AUTHOR</a>
<li><a name="TOC26" href="#SEC26">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
<P>
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation. This document contains just a quick-reference summary of the
syntax.
</P>
<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
<P>
<pre>
  \x         where x is non-alphanumeric is a literal x
  \Q...\E    treat enclosed characters as literal
</PRE>
</P>
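<P>
The two quoting forms above can be tried out with a small sketch. It uses
Python's <b>re</b> module as a stand-in for PCRE, which is an assumption of
this example and not part of PCRE itself. Python has no \Q...\E, so
<b>re.escape</b> is used as a rough equivalent; the sample strings are
made up for illustration.
</P>

```python
import re

# A backslash before a non-alphanumeric character makes it literal,
# so "\." matches only a real dot, not any character.
literal_dot = re.compile(r"a\.b")
matches_dot = literal_dot.fullmatch("a.b") is not None
rejects_other = literal_dot.fullmatch("axb") is None

# Python's re has no \Q...\E; re.escape() is a rough stand-in that
# backslash-escapes every metacharacter in the given text.
quoted = re.escape("$5.00 (approx)")
price = re.compile(quoted)
matches_price = price.search("cost: $5.00 (approx)") is not None
```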
<br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
<P>
<pre>
  \a         alarm, that is, the BEL character (hex 07)
  \cx        "control-x", where x is any character
  \e         escape (hex 1B)
  \f         formfeed (hex 0C)
  \n         newline (hex 0A)
  \r         carriage return (hex 0D)
  \t         tab (hex 09)
  \ddd       character with octal code ddd, or backreference
  \xhh       character with hex code hh
  \x{hhh..}  character with hex code hhh..
</PRE>
</P>
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
<P>
<pre>
  .          any character except newline;
               in dotall mode, any character whatsoever
  \C         one byte, even in UTF-8 mode (best avoided)
  \d         a decimal digit
  \D         a character that is not a decimal digit
  \h         a horizontal whitespace character
  \H         a character that is not a horizontal whitespace character
  \p{<i>xx</i>}     a character with the <i>xx</i> property
  \P{<i>xx</i>}     a character without the <i>xx</i> property
  \R         a newline sequence
  \s         a whitespace character
  \S         a character that is not a whitespace character
  \v         a vertical whitespace character
  \V         a character that is not a vertical whitespace character
  \w         a "word" character
  \W         a "non-word" character
  \X         an extended Unicode sequence
</pre>
In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters.
</P>
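<P>
The ASCII-only behaviour of \d and \w noted above can be illustrated with a
short sketch. It assumes Python's <b>re</b> module with the <b>re.ASCII</b>
flag as an approximation of PCRE; the sample text is hypothetical. \h, \R
and \X are not available in Python's <b>re</b> and are not shown.
</P>

```python
import re

# re.ASCII restricts \d and \w to ASCII characters, which approximates
# the PCRE behaviour described in the text.
digits = re.compile(r"\d+", re.ASCII)
words = re.compile(r"\w+", re.ASCII)

# "\u0664\u0662" are Arabic-Indic digits; they are not matched by \d here.
ascii_digits = digits.findall("room 42, \u0664\u0662")

# Accented letters split the words because they are not ASCII word characters.
ascii_words = words.findall("na\u00efve caf\u00e9")
```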
<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a><br>
<P>
<pre>
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate

  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  L&amp;         Ll, Lu, or Lt

  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark

  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number

  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation

  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol

  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
</PRE>
</P>
<br><a name="SEC6" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
<P>
Arabic,
Armenian,
Balinese,
Bengali,
Bopomofo,
Braille,
Buginese,
Buhid,
Canadian_Aboriginal,
Cherokee,
Common,
Coptic,
Cuneiform,
Cypriot,
Cyrillic,
Deseret,
Devanagari,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Greek,
Gujarati,
Gurmukhi,
Han,
Hangul,
Hanunoo,
Hebrew,
Hiragana,
Inherited,
Kannada,
Katakana,
Kharoshthi,
Khmer,
Lao,
Latin,
Limbu,
Linear_B,
Malayalam,
Mongolian,
Myanmar,
New_Tai_Lue,
Nko,
Ogham,
Old_Italic,
Old_Persian,
Oriya,
Osmanya,
Phags_Pa,
Phoenician,
Runic,
Shavian,
Sinhala,
Syloti_Nagri,
Syriac,
Tagalog,
Tagbanwa,
Tai_Le,
Tamil,
Telugu,
Thaana,
Thai,
Tibetan,
Tifinagh,
Ugaritic,
Yi.
</P>
<br><a name="SEC7" href="#TOC1">CHARACTER CLASSES</a><br>
<P>
<pre>
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set

  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       whitespace
  upper       upper case letter
  word        same as \w
  xdigit      hexadecimal digit
</pre>
In PCRE, POSIX character set names recognize only ASCII characters. You can use
\Q...\E inside a character class.
</P>
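<P>
A minimal sketch of positive classes, negative classes and a range written
with hex escapes, again assuming Python's <b>re</b> module as a stand-in for
PCRE. POSIX named sets such as [[:alpha:]] are not supported by Python's
<b>re</b>, so they are left out; the sample strings are illustrative only.
</P>

```python
import re

vowels = re.compile(r"[aeiou]")           # positive character class
non_vowels = re.compile(r"[^aeiou\s]")    # negative class, also excluding whitespace
hex_range = re.compile(r"[\x30-\x39]+")   # range with hex endpoints, same as [0-9]+

vowel_hits = vowels.findall("regex rules")
consonant_hits = non_vowels.findall("regex")
digit_run = hex_range.search("id=2008;").group()
```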
<br><a name="SEC8" href="#TOC1">QUANTIFIERS</a><br>
<P>
<pre>
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
</PRE>
</P>
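<P>
The difference between greedy and lazy quantifiers is easiest to see on one
subject string. The sketch below assumes Python's <b>re</b> module as a
stand-in for PCRE and uses made-up sample text. Possessive quantifiers are
not shown because older Python versions do not support them.
</P>

```python
import re

text = '"one" and "two"'

# Greedy: .* takes as much as it can, so the match runs to the last quote.
greedy = re.search(r'".*"', text).group()

# Lazy: .*? takes as little as it can, so the match stops at the first closing quote.
lazy = re.search(r'".*?"', text).group()

# {n,m}: at least 2 and at most 3 digits per match.
bounded = re.findall(r"\d{2,3}", "1 22 333 4444")
```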
<br><a name="SEC9" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
<P>
<pre>
  \b          word boundary
  \B          not a word boundary
  ^           start of subject
               also after internal newline in multiline mode
  \A          start of subject
  $           end of subject
               also before newline at end of subject
               also before internal newline in multiline mode
  \Z          end of subject
               also before newline at end of subject
  \z          end of subject
  \G          first matching position in subject
</PRE>
</P>
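<P>
The behaviour of ^, $ and \b listed above can be checked with a short sketch.
It assumes Python's <b>re</b> module as a stand-in for PCRE. Note one
difference: Python's \Z matches only at the very end of the subject, like
PCRE's \z, so \Z and \z are not shown here.
</P>

```python
import re

subject = "first\nsecond\n"

# Without multiline mode, ^ matches only at the very start of the subject.
starts_default = re.findall(r"^\w+", subject)

# In multiline mode, ^ also matches after each internal newline.
starts_multiline = re.findall(r"^\w+", subject, re.MULTILINE)

# $ matches at the very end and also just before a newline at the end.
dollar_positions = [m.start() for m in re.finditer(r"$", subject)]

# \b matches only where a word character meets a non-word character.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")
```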
<br><a name="SEC10" href="#TOC1">MATCH POINT RESET</a><br>
<P>
<pre>
  \K          reset start of match
</PRE>
</P>
<br><a name="SEC11" href="#TOC1">ALTERNATION</a><br>
<P>
<pre>
  expr|expr|expr...
</PRE>
</P>
<br><a name="SEC12" href="#TOC1">CAPTURING</a><br>
<P>
<pre>
  (...)           capturing group
  (?&lt;name&gt;...)    named capturing group (Perl)
  (?'name'...)    named capturing group (Perl)
  (?P&lt;name&gt;...)   named capturing group (Python)
  (?:...)         non-capturing group
  (?|...)         non-capturing group; reset group numbers for
                    capturing groups in each alternative
</PRE>
</P>
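<P>
A brief sketch of numbered, named and non-capturing groups, assuming Python's
<b>re</b> module as a stand-in for PCRE. Only the Python-style named group
syntax is used, because the Perl-style forms and (?|...) are not accepted by
Python's <b>re</b>. The date pattern and group names are illustrative.
</P>

```python
import re

# Two named groups, then an optional non-capturing group that wraps
# one ordinary (numbered) capturing group for the day.
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})(?:-(\d{2}))?")

m = date.fullmatch("2008-04-09")
year = m.group("year")     # access by name
month = m.group(2)         # named groups are also numbered
day = m.group(3)           # (?:...) itself does not take a number
group_count = date.groups  # number of capturing groups in the pattern

# When the optional part is absent, its inner group is unset (None).
no_day = date.fullmatch("2008-04").group(3)
```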
<br><a name="SEC13" href="#TOC1">ATOMIC GROUPS</a><br>
<P>
<pre>
  (?&gt;...)         atomic, non-capturing group
</PRE>
</P>
<br><a name="SEC14" href="#TOC1">COMMENT</a><br>
<P>
<pre>
  (?#....)        comment (not nestable)
</PRE>
</P>
<br><a name="SEC15" href="#TOC1">OPTION SETTING</a><br>
<P>
<pre>
  (?i)            caseless
  (?J)            allow duplicate names
  (?m)            multiline
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            extended (ignore white space)
  (?-...)         unset option(s)
</PRE>
</P>
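<P>
Inline options at the start of a pattern can be sketched as follows, assuming
Python's <b>re</b> module as a stand-in for PCRE. Only (?i), (?s) and (?x)
are shown; (?J) and (?U) are not available in Python's <b>re</b>. The sample
subjects are made up.
</P>

```python
import re

# (?i): caseless matching.
caseless = re.search(r"(?i)pcre", "Using PCRE").group()

# (?s): dotall mode, so "." also matches a newline.
dotall_matches = re.search(r"(?s)a.b", "a\nb") is not None
default_rejects = re.search(r"a.b", "a\nb") is None

# (?x): extended mode; white space in the pattern is ignored and
# "#" starts a comment that runs to the end of the line.
extended = re.fullmatch(r"(?x) \d{4} - \d{2}  # year, then month", "2008-04")
extended_matches = extended is not None
```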
<br><a name="SEC16" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
<P>
<pre>
  (?=...)         positive look ahead
  (?!...)         negative look ahead
  (?&lt;=...)        positive look behind
  (?&lt;!...)        negative look behind
</pre>
Each top-level branch of a look behind must be of a fixed length.
</P>
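<P>
The four assertions, and the fixed-length rule for look behind, can be tried
with a small sketch. It assumes Python's <b>re</b> module as a stand-in for
PCRE; the error raised for a variable-length look behind is Python's own and
may be worded differently from PCRE's. Sample strings are illustrative.
</P>

```python
import re

# Positive look ahead: a word only if a colon follows (the colon is not consumed).
keys = re.findall(r"\w+(?=:)", "key: value, other: x")

# Negative look ahead: a whole number not followed by a percent sign.
plain_numbers = re.findall(r"\b\d+\b(?!%)", "10% 20 30%")

# Positive look behind: digits preceded by a dollar sign.
dollar_amounts = re.findall(r"(?<=\$)\d+", "cost $15, tax 3")

# Negative look behind: a number not preceded by a dollar sign.
other_amounts = re.findall(r"(?<!\$)\b\d+", "cost $15, tax 3")

# A look behind of variable length is rejected at compile time.
try:
    re.compile(r"(?<=a+)b")
    variable_lookbehind_rejected = False
except re.error:
    variable_lookbehind_rejected = True
```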
<br><a name="SEC17" href="#TOC1">BACKREFERENCES</a><br>
<P>
<pre>
  \n            reference by number (can be ambiguous)
  \gn           reference by number
  \g{n}         reference by number
  \g{-n}        relative reference by number
  \k&lt;name&gt;      reference by name (Perl)
  \k'name'      reference by name (Perl)
  \g{name}      reference by name (Perl)
  \k{name}      reference by name (.NET)
  (?P=name)     reference by name (Python)
</PRE>
</P>
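<P>
A short sketch of a numbered and a named backreference, assuming Python's
<b>re</b> module as a stand-in for PCRE. Only \n and (?P=name) are used,
since the \g and \k forms are not accepted inside patterns by Python's
<b>re</b>. The group name <b>q</b> and the sample text are hypothetical.
</P>

```python
import re

# \1 must match the same text that group 1 captured: a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it was the the best").group(1)

# (?P=q) refers back to the named group q, so the closing quote
# has to be the same character as the opening quote.
quoted = re.search(r"(?P<q>['\"]).*?(?P=q)", "say 'hi' now").group()

# Mismatched quotes do not satisfy the backreference.
mismatched = re.search(r"(?P<q>['\"]).*?(?P=q)", "say 'hi\" now")
```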
<br><a name="SEC18" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
<P>
<pre>
  (?R)          recurse whole pattern
  (?n)          call subpattern by absolute number
  (?+n)         call subpattern by relative number
  (?-n)         call subpattern by relative number
  (?&amp;name)      call subpattern by name (Perl)
  (?P&gt;name)     call subpattern by name (Python)
  \g&lt;name&gt;      call subpattern by name (Oniguruma)
  \g'name'      call subpattern by name (Oniguruma)
  \g&lt;n&gt;         call subpattern by absolute number (Oniguruma)
  \g'n'         call subpattern by absolute number (Oniguruma)
  \g&lt;+n&gt;        call subpattern by relative number (PCRE extension)
  \g'+n'        call subpattern by relative number (PCRE extension)
  \g&lt;-n&gt;        call subpattern by relative number (PCRE extension)
  \g'-n'        call subpattern by relative number (PCRE extension)
</PRE>
</P>
<br><a name="SEC19" href="#TOC1">CONDITIONAL PATTERNS</a><br>
<P>
<pre>
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)

  (?(n)...          absolute reference condition
  (?(+n)...         relative reference condition
  (?(-n)...         relative reference condition
  (?(&lt;name&gt;)...     named reference condition (Perl)
  (?('name')...     named reference condition (Perl)
  (?(name)...       named reference condition (PCRE)
  (?(R)...          overall recursion condition
  (?(Rn)...         specific group recursion condition
  (?(R&amp;name)...     specific recursion condition
  (?(DEFINE)...     define subpattern for reference
  (?(assert)...     assertion condition
</PRE>
</P>
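<P>
The absolute reference condition (?(n)yes-pattern|no-pattern) can be sketched
as below, assuming Python's <b>re</b> module as a stand-in for PCRE. Python
supports only the numbered and plain-name conditions; the recursion, DEFINE
and assertion conditions are not shown. The address pattern is a simplified
illustration, not a real address validator.
</P>

```python
import re

# Group 1 optionally captures "<". The condition (?(1)>|$) then requires
# a closing ">" if group 1 matched, and end of subject otherwise.
addr = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")

bracketed = addr.match("<user@host.com>") is not None
bare = addr.match("user@host.com") is not None
missing_close = addr.match("<user@host.com") is None
stray_close = addr.match("user@host.com>") is None
```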
<br><a name="SEC20" href="#TOC1">BACKTRACKING CONTROL</a><br>
<P>
The following act immediately when they are reached:
<pre>
  (*ACCEPT)       force successful match
  (*FAIL)         force backtrack; synonym (*F)
</pre>
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
<pre>
  (*COMMIT)       overall failure, no advance of starting point
  (*PRUNE)        advance to next starting character
  (*SKIP)         advance start to current matching position
  (*THEN)         local failure, backtrack to next alternation
</PRE>
</P>
<br><a name="SEC21" href="#TOC1">NEWLINE CONVENTIONS</a><br>
<P>
These are recognized only at the very start of the pattern or after a
(*BSR_...) option.
<pre>
  (*CR)
  (*LF)
  (*CRLF)
  (*ANYCRLF)
  (*ANY)
</PRE>
</P>
<br><a name="SEC22" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
These are recognized only at the very start of the pattern or after a
(*...) option that sets the newline convention.
<pre>
  (*BSR_ANYCRLF)
  (*BSR_UNICODE)
</PRE>
</P>
<br><a name="SEC23" href="#TOC1">CALLOUTS</a><br>
<P>
<pre>
  (?C)        callout
  (?Cn)       callout with data n
</PRE>
</P>
<br><a name="SEC24" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
<b>pcrematching</b>(3), <b>pcre</b>(3).
</P>
<br><a name="SEC25" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC26" href="#TOC1">REVISION</a><br>
<P>
Last updated: 09 April 2008
<br>
Copyright &copy; 1997-2008 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>