PCREGREP(1)                                                        PCREGREP(1)


NAME
       pcregrep - a grep with Perl-compatible regular expressions.


SYNOPSIS
       pcregrep [options] [long options] [pattern] [path1 path2 ...]


DESCRIPTION

       pcregrep searches files for character patterns, in the same way as
       other grep commands do, but it uses the PCRE regular expression library
       to support patterns that are compatible with the regular expressions of
       Perl 5. See pcrepattern(3) for a full description of the syntax and
       semantics of the regular expressions that PCRE supports.

       Patterns, whether supplied on the command line or in a separate file,
       are given without delimiters. For example:

         pcregrep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a pattern
       with slashes, as is common in Perl scripts), they are interpreted as
       part of the pattern. Quotes can of course be used to delimit patterns
       on the command line because they are interpreted by the shell, and
       indeed they are required if a pattern contains white space or shell
       metacharacters.

       The first argument that follows any option settings is treated as the
       single pattern to be matched when neither -e nor -f is present.
       Conversely, when one or both of these options are used to specify
       patterns, all arguments are treated as path names. At least one of -e,
       -f, or an argument pattern must be provided.

       If no files are specified, pcregrep reads the standard input. The
       standard input can also be referenced by a name consisting of a single
       hyphen. For example:

         pcregrep some-pattern /file1 - /file3

       By default, each line that matches a pattern is copied to the standard
       output, and if there is more than one file, the file name is output at
       the start of each line, followed by a colon. However, there are options
       that can change how pcregrep behaves. In particular, the -M option
       makes it possible to search for patterns that span line boundaries.
       What defines a line boundary is controlled by the -N (--newline)
       option.

       Patterns are limited to 8K or BUFSIZ characters, whichever is the
       greater. BUFSIZ is defined in <stdio.h>. When there is more than one
       pattern (specified by the use of -e and/or -f), each pattern is applied
       to each line in the order in which they are defined, except that all
       the -e patterns are tried before the -f patterns. As soon as one
       pattern matches (or fails to match when -v is used), no further
       patterns are considered.

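       As a rough illustration of this rule, the Python sketch below models
       line selection with several patterns. It is not pcregrep's code: the
       function select_line is hypothetical, Python's re module stands in for
       PCRE, and -v is modelled as described under that option (a line is
       selected only when no pattern matches it).

```python
import re


def select_line(line, e_patterns=(), f_patterns=(), invert=False):
    """Hypothetical model of how several -e/-f patterns select a line.

    Returns (selected, winner), where winner is the first pattern that
    matched, or None if no pattern matched.
    """
    winner = None
    # All -e patterns are tried before any -f pattern, whatever the option order.
    for pattern in [*e_patterns, *f_patterns]:
        if re.search(pattern, line):
            winner = pattern
            break  # first match wins; later patterns are not considered
    # With -v (invert=True) only lines that no pattern matches are selected.
    return (winner is not None) != invert, winner
```

       For example, select_line("error 42", ["warn"], [r"\d+", "error"])
       returns (True, r"\d+"): the -f pattern "error" is never tried, because
       the earlier pattern \d+ has already matched.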
       When --only-matching, --file-offsets, or --line-offsets is used, the
       output is the part of the line that matched (either shown literally, or
       as an offset). In this case, scanning resumes immediately following the
       match, so that further matches on the same line can be found. If there
       are multiple patterns, they are all tried on the remainder of the line.
       However, patterns that follow the one that matched are not tried on the
       earlier part of the line.

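       The rescanning behaviour can be sketched in Python. This is a model
       under stated assumptions, not pcregrep's implementation: Python's re
       stands in for PCRE, only_matching is a hypothetical name, and stepping
       one character past an empty match is a choice made here so that the
       loop always advances.

```python
import re


def only_matching(line, patterns):
    """Hypothetical model of -o output when several patterns are given."""
    compiled = [re.compile(p) for p in patterns]
    found = []
    pos = 0
    while pos <= len(line):
        # Try the patterns in order on the rest of the line; the first one
        # that matches anywhere in the remainder wins.
        for rx in compiled:
            m = rx.search(line, pos)
            if m:
                break
        else:
            break  # no pattern matches the remainder: stop scanning
        found.append(m.group())
        # Resume just after the match (one character further for an empty
        # match, an assumption made here so the loop always advances).
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return found
```

       With dog and cat given as separate patterns,
       only_matching("cat dog cat", ["dog", "cat"]) returns ["dog", "cat"]:
       the leading "cat" is never reported, because dog matched first and
       scanning resumed after it. The single alternation ["dog|cat"] returns
       all three words, which is the difference described under -e.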
       If the LC_ALL or LC_CTYPE environment variable is set, pcregrep uses
       the value to set a locale when calling the PCRE library. The --locale
       option can be used to override this.


SUPPORT FOR COMPRESSED FILES

       It is possible to compile pcregrep so that it uses libz or libbz2 to
       read files whose names end in .gz or .bz2, respectively. You can find
       out whether your binary has support for one or both of these file types
       by running it with the --help option. If the appropriate support is not
       present, files are treated as plain text. The standard input is always
       so treated.


OPTIONS

       --     This terminates the list of options. It is useful if the next
              item on the command line starts with a hyphen but is not an
              option. This allows for the processing of patterns and
              filenames that start with hyphens.

       -A number, --after-context=number
              Output number lines of context after each matching line. If
              filenames and/or line numbers are being output, a hyphen
              separator is used instead of a colon for the context lines. A
              line containing "--" is output between each group of lines,
              unless they are in fact contiguous in the input file. The
              value of number is expected to be relatively small. However,
              pcregrep guarantees to have up to 8K of following text
              available for context output.

       -B number, --before-context=number
              Output number lines of context before each matching line. If
              filenames and/or line numbers are being output, a hyphen
              separator is used instead of a colon for the context lines. A
              line containing "--" is output between each group of lines,
              unless they are in fact contiguous in the input file. The
              value of number is expected to be relatively small. However,
              pcregrep guarantees to have up to 8K of preceding text
              available for context output.

       -C number, --context=number
              Output number lines of context both before and after each
              matching line. This is equivalent to setting both -A and -B
              to the same value.

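              The way -A, -B, and -C group their output can be sketched in
              Python. The model covers only which lines are printed and
              where the "--" separator appears; filename and line-number
              prefixes and the 8K buffering limit are left out, and
              context_output is a hypothetical name.

```python
import re


def context_output(lines, pattern, before=0, after=0):
    """Hypothetical model of -B/-A context selection for one file."""
    rx = re.compile(pattern)
    hits = [i for i, text in enumerate(lines) if rx.search(text)]
    wanted = set()
    for i in hits:
        # -B lines before and -A lines after each match, clipped to the file
        wanted.update(range(max(0, i - before), min(len(lines), i + after + 1)))
    out, previous = [], None
    for i in sorted(wanted):
        if previous is not None and i != previous + 1:
            out.append("--")  # separator between non-contiguous groups
        out.append(lines[i])
        previous = i
    return out
```

              Passing the same value for before and after corresponds to
              -C; when the groups around two matches touch or overlap, no
              "--" line is produced.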
       -c, --count
              Do not output individual lines; instead just output a count
              of the number of lines that would otherwise have been output.
              If several files are given, a count is output for each of
              them. In this mode, the -A, -B, and -C options are ignored.

       --colour, --color
              If this option is given without any data, it is equivalent to
              "--colour=auto". If data is required, it must be given in
              the same shell item, separated by an equals sign.

       --colour=value, --color=value
              This option specifies under what circumstances the part of a
              line that matched a pattern should be coloured in the output.
              The value may be "never" (the default), "always", or "auto".
              In the latter case, colouring happens only if the standard
              output is connected to a terminal. The colour can be
              specified by setting the environment variable PCREGREP_COLOUR
              or PCREGREP_COLOR. The value of this variable should be a
              string of two numbers, separated by a semicolon. They are
              copied directly into the control string for setting colour on
              a terminal, so it is your responsibility to ensure that they
              make sense. If neither of the environment variables is set,
              the default is "1;31", which gives red.

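              A Python sketch of what "copied directly into the control
              string" means in practice. Several details are assumptions,
              not taken from pcregrep's source: the ESC [ ... m form of the
              control string, the ESC [ 00 m reset, PCREGREP_COLOUR being
              consulted before PCREGREP_COLOR, and only the first match on
              the line being coloured. colourize is a hypothetical name.

```python
import os
import re
import sys


def colourize(line, pattern, mode="never", stream=None, env=None):
    """Hypothetical sketch of --colour=never|always|auto for one line."""
    stream = sys.stdout if stream is None else stream
    env = os.environ if env is None else env
    # "auto" colours only when the output stream is a terminal.
    if mode == "never" or (mode == "auto" and not stream.isatty()):
        return line
    m = re.search(pattern, line)
    if m is None:
        return line
    # Assumed lookup order: PCREGREP_COLOUR, then PCREGREP_COLOR, then red.
    colour = env.get("PCREGREP_COLOUR", env.get("PCREGREP_COLOR", "1;31"))
    # ESC [ <colour> m switches colour on; ESC [ 00 m resets it (assumed form).
    return (line[:m.start()] + f"\x1b[{colour}m" + m.group()
            + "\x1b[00m" + line[m.end():])
```

              Because the variable's value is pasted into the escape
              sequence unchecked, a value such as "1;32" yields bold green
              on most ANSI terminals, while a malformed value simply
              produces a malformed control string.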
       -D action, --devices=action
              If an input path is not a regular file or a directory,
              "action" specifies how it is to be processed. Valid values
              are "read" (the default) or "skip" (silently skip the path).

       -d action, --directories=action
              If an input path is a directory, "action" specifies how it is
              to be processed. Valid values are "read" (the default),
              "recurse" (equivalent to the -r option), or "skip" (silently
              skip the path). In the default case, directories are read as
              if they were ordinary files. In some operating systems the
              effect of reading a directory like this is an immediate
              end-of-file.

       -e pattern, --regex=pattern, --regexp=pattern
              Specify a pattern to be matched. This option can be used
              multiple times in order to specify several patterns. It can
              also be used as a way of specifying a single pattern that
              starts with a hyphen. When -e is used, no argument pattern is
              taken from the command line; all arguments are treated as
              file names. There is an overall maximum of 100 patterns. They
              are applied to each line in the order in which they are
              defined until one matches (or fails to match if -v is used).
              If -f is used with -e, the command line patterns are matched
              first, followed by the patterns from the file, independent of
              the order in which these options are specified. Note that
              multiple use of -e is not the same as a single pattern with
              alternatives. For example, X|Y finds the first character in a
              line that is X or Y, whereas if the two patterns are given
              separately, pcregrep finds X if it is present, even if it
              follows Y in the line. It finds Y only if there is no X in
              the line. This really matters only if you are using -o to
              show the part(s) of the line that matched.

       --exclude=pattern
              When pcregrep is searching the files in a directory as a
              consequence of the -r (recursive search) option, any regular
              files whose names match the pattern are excluded.
              Subdirectories are not excluded by this option; they are
              searched recursively, subject to the --exclude_dir and
              --include_dir options. The pattern is a PCRE regular
              expression, and is matched against the final component of the
              file name (not the entire path). If a file name matches both
              --include and --exclude, it is excluded. There is no short
              form for this option.

       --exclude_dir=pattern
              When pcregrep is searching the contents of a directory as a
              consequence of the -r (recursive search) option, any
              subdirectories whose names match the pattern are excluded.
              (Note that the --exclude option does not affect
              subdirectories.) The pattern is a PCRE regular expression, and
              is matched against the final component of the name (not the
              entire path). If a subdirectory name matches both
              --include_dir and --exclude_dir, it is excluded. There is no
              short form for this option.

       -F, --fixed-strings
              Interpret each pattern as a list of fixed strings, separated
              by newlines, instead of as a regular expression. The -w
              (match as a word) and -x (match whole line) options can be
              used with -F. They apply to each of the fixed strings. A line
              is selected if any of the fixed strings are found in it
              (subject to -w or -x, if present).

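              In regular-expression terms, -F behaves like escaping every
              newline-separated piece and joining the pieces as
              alternatives. The Python sketch below shows that idea with
              re.escape(); it is a model, not how pcregrep is implemented,
              and fixed_strings is a hypothetical name. Applying -w or -x on
              top of it would need the whole alternation grouped first, for
              example \b(?:...)\b, so that the boundary applies to every
              piece.

```python
import re


def fixed_strings(pattern_text):
    """Hypothetical model of -F for one pattern argument."""
    # Each newline-separated piece is a literal; re.escape() neutralises
    # metacharacters such as . + * ( ) so they match themselves.
    pieces = [re.escape(piece) for piece in pattern_text.split("\n")]
    # A line is selected if any one of the pieces occurs in it.
    return re.compile("|".join(pieces))
```

              With the pattern text "a.b" followed by a newline and "c+d",
              the dot and plus are literal: "xx a.b yy" is selected, but
              "axb" and "ccd" are not.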
       -f filename, --file=filename
              Read a number of patterns from the file, one per line, and
              match them against each line of input. A data line is output
              if any of the patterns match it. The filename can be given as
              "-" to refer to the standard input. When -f is used, patterns
              specified on the command line using -e may also be present;
              they are tested before the file's patterns. However, no other
              pattern is taken from the command line; all arguments are
              treated as file names. There is an overall maximum of 100
              patterns. Trailing white space is removed from each line, and
              blank lines are ignored. An empty file contains no patterns
              and therefore matches nothing. See also the comments about
              multiple patterns versus a single pattern with alternatives
              in the description of -e above.

       --file-offsets
              Instead of showing lines or parts of lines that match, show
              each match as an offset from the start of the file and a
              length, separated by a comma. In this mode, no context is
              shown. That is, the -A, -B, and -C options are ignored. If
              there is more than one match in a line, each of them is shown
              separately. This option is mutually exclusive with
              --line-offsets and --only-matching.

       -H, --with-filename
              Force the inclusion of the filename at the start of output
              lines when searching a single file. By default, the filename
              is not shown in this case. For matching lines, the filename
              is followed by a colon and a space; for context lines, a
              hyphen separator is used. If a line number is also being
              output, it follows the file name without a space.

       -h, --no-filename
              Suppress the output of filenames when searching multiple
              files. By default, filenames are shown when multiple files
              are searched. For matching lines, the filename is followed by
              a colon and a space; for context lines, a hyphen separator is
              used. If a line number is also being output, it follows the
              file name without a space.

       --help Output a help message, giving brief details of the command
              options and file type support, and then exit.

       -i, --ignore-case
              Ignore upper/lower case distinctions during comparisons.

       --include=pattern
              When pcregrep is searching the files in a directory as a
              consequence of the -r (recursive search) option, only those
              regular files whose names match the pattern are included.
              Subdirectories are always included and searched recursively,
              subject to the --include_dir and --exclude_dir options. The
              pattern is a PCRE regular expression, and is matched against
              the final component of the file name (not the entire path). If
              a file name matches both --include and --exclude, it is
              excluded. There is no short form for this option.

       --include_dir=pattern
              When pcregrep is searching the contents of a directory as a
              consequence of the -r (recursive search) option, only those
              subdirectories whose names match the pattern are included.
              (Note that the --include option does not affect
              subdirectories.) The pattern is a PCRE regular expression, and
              is matched against the final component of the name (not the
              entire path). If a subdirectory name matches both
              --include_dir and --exclude_dir, it is excluded. There is no
              short form for this option.

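              The interplay of --include, --exclude, --include_dir, and
              --exclude_dir during -r can be modelled with os.walk(). This
              is a sketch, not pcregrep's traversal code: name_selected and
              walk_selected are hypothetical names, Python's re stands in
              for PCRE, and matching is assumed to be an unanchored search
              of the final name component.

```python
import os
import re


def name_selected(name, include=None, exclude=None):
    """Hypothetical model of --include/--exclude (or the _dir variants)."""
    # Unanchored search against the final path component only.
    if exclude is not None and re.search(exclude, name):
        return False  # a name matching both patterns is excluded
    return include is None or re.search(include, name) is not None


def walk_selected(root, include=None, exclude=None,
                  include_dir=None, exclude_dir=None):
    """Yield the files a recursive search of root would look at (model)."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into rejected
        # subdirectories; --include/--exclude never affect directories.
        dirnames[:] = [d for d in dirnames
                       if name_selected(d, include_dir, exclude_dir)]
        for name in filenames:
            if name_selected(name, include, exclude):
                yield os.path.join(dirpath, name)
```

              If matching is unanchored, as assumed here, an include
              pattern of \.c would also accept x.c.orig; anchoring it as
              \.c$ restricts it to names that end in .c.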
       -L, --files-without-match
              Instead of outputting lines from the files, just output the
              names of the files that do not contain any lines that would
              have been output. Each file name is output once, on a
              separate line.

       -l, --files-with-matches
              Instead of outputting lines from the files, just output the
              names of the files containing lines that would have been
              output. Each file name is output once, on a separate line.
              Searching stops as soon as a matching line is found in a
              file.

       --label=name
              This option supplies a name to be used for the standard input
              when file names are being output. If not supplied,
              "(standard input)" is used. There is no short form for this
              option.

       --line-offsets
              Instead of showing lines or parts of lines that match, show
              each match as a line number, the offset from the start of the
              line, and a length. The line number is terminated by a colon
              (as usual; see the -n option), and the offset and length are
              separated by a comma. In this mode, no context is shown.
              That is, the -A, -B, and -C options are ignored. If there is
              more than one match in a line, each of them is shown
              separately. This option is mutually exclusive with
              --file-offsets and --only-matching.

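              The relationship between the --file-offsets and --line-offsets
              forms can be sketched in Python for a single pattern. The
              model assumes LF line endings, omits any filename prefix, and
              counts characters, which equals bytes only for ASCII input;
              offsets is a hypothetical name.

```python
import re


def offsets(text, pattern):
    """Hypothetical model of --file-offsets and --line-offsets output.

    Assumes LF line endings and a single pattern.
    """
    rx = re.compile(pattern)
    file_offsets, line_offsets = [], []
    line_start = 0  # offset of the current line from the start of the file
    for number, line in enumerate(text.split("\n"), start=1):
        for m in rx.finditer(line):
            length = m.end() - m.start()
            file_offsets.append(f"{line_start + m.start()},{length}")
            line_offsets.append(f"{number}:{m.start()},{length}")
        line_start += len(line) + 1  # +1 for the LF removed by split()
    return file_offsets, line_offsets
```

              For the text "abc", newline, "xxabc abc" and the pattern abc,
              the file offsets are 0,3 6,3 10,3 and the line offsets are
              1:0,3 2:2,3 2:6,3; each file offset is the line's starting
              offset plus the offset within the line.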
       --locale=locale-name
              This option specifies a locale to be used for pattern
              matching. It overrides the value in the LC_ALL or LC_CTYPE
              environment variables. If no locale is specified, the PCRE
              library's default (usually the "C" locale) is used. There is
              no short form for this option.

       -M, --multiline
              Allow patterns to match more than one line. When this option
              is given, patterns may usefully contain literal newline
              characters and internal occurrences of ^ and $ characters.
              The output for any one match may consist of more than one
              line. When this option is set, the PCRE library is called in
              "multiline" mode. There is a limit to the number of lines
              that can be matched, imposed by the way that pcregrep buffers
              the input file as it scans it. However, pcregrep ensures that
              at least 8K characters or the rest of the document (whichever
              is the shorter) are available for forward matching, and
              similarly the previous 8K characters (or all the previous
              characters, if fewer than 8K) are guaranteed to be available
              for lookbehind assertions.

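              The effect of -M can be approximated in Python with the
              re.MULTILINE flag, under which ^ and $ also match at internal
              line boundaries. This sketch matches against the whole text at
              once (ignoring pcregrep's 8K buffering), assumes LF line
              endings, and prints every complete line a match touches;
              multiline_matches is a hypothetical name.

```python
import re


def multiline_matches(text, pattern):
    """Hypothetical model of -M: return the full lines spanned by each match."""
    rx = re.compile(pattern, re.MULTILINE)
    out = []
    for m in rx.finditer(text):
        start = text.rfind("\n", 0, m.start()) + 1  # start of the first line
        end = text.find("\n", m.end())              # end of the last line
        if end == -1:
            end = len(text)
        out.append(text[start:end])
    return out
```

              For example, the pattern end$\nbar applied to the lines
              "alpha", "foo end", "bar start", "omega" yields one match
              whose output is the two lines "foo end" and "bar start".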
       -N newline-type, --newline=newline-type
              The PCRE library supports five different conventions for
              indicating the ends of lines. They are the single-character
              sequences CR (carriage return) and LF (linefeed), the
              two-character sequence CRLF, an "anycrlf" convention, which
              recognizes any of the preceding three types, and an "any"
              convention, in which any Unicode line ending sequence is
              assumed to end a line. The Unicode sequences are the three
              just mentioned, plus VT (vertical tab, U+000B), FF (formfeed,
              U+000C), NEL (next line, U+0085), LS (line separator,
              U+2028), and PS (paragraph separator, U+2029).

              When the PCRE library is built, a default line-ending
              sequence is specified. This is normally the standard
              sequence for the operating system. Unless otherwise specified
              by this option, pcregrep uses the library's default. The
              possible values for this option are CR, LF, CRLF, ANYCRLF, or
              ANY. This makes it possible to use pcregrep on files that
              have come from other environments without having to modify
              their line endings. If the data that is being scanned does
              not agree with the convention set by this option, pcregrep
              may behave in strange ways.

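              The five conventions can be written down as regular
              expressions, which makes it easy to see how the same bytes
              split differently under each -N value. The table and
              split_lines below are a hypothetical Python model, not part of
              pcregrep or PCRE; CRLF is listed before a lone CR or LF so
              that a CR LF pair counts as one line ending.

```python
import re

# Hypothetical regex for each -N value; CRLF is tried before lone CR/LF
# so that a CR LF pair is treated as a single line ending.
NEWLINE_RX = {
    "CR": "\r",
    "LF": "\n",
    "CRLF": "\r\n",
    "ANYCRLF": "\r\n|\r|\n",
    "ANY": "\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]",
}


def split_lines(data, newline="LF"):
    """Split a str into lines according to one -N convention (model only)."""
    return re.split(NEWLINE_RX[newline], data)
```

              str.splitlines() is deliberately not used: it also breaks on
              \x1c, \x1d, and \x1e, which are not in the list above. Note
              how the text "a", CR LF, "b" splits under LF into "a" plus a
              stray CR and "b", which is one of the "strange ways" a
              mismatched setting shows up.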
       -n, --line-number
              Precede each output line by its line number in the file,
              followed by a colon and a space for matching lines or a hyphen
              and a space for context lines. If the filename is also being
              output, it precedes the line number. This option is forced if
              --line-offsets is used.

       -o, --only-matching
              Show only the part of the line that matched a pattern. In
              this mode, no context is shown. That is, the -A, -B, and -C
              options are ignored. If there is more than one match in a
              line, each of them is shown separately. If -o is combined
              with -v (invert the sense of the match to find non-matching
              lines), no output is generated, but the return code is set
              appropriately. This option is mutually exclusive with
              --file-offsets and --line-offsets.

       -q, --quiet
              Work quietly, that is, display nothing except error messages.
              The exit status indicates whether or not any matches were
              found.

       -r, --recursive
              If any given path is a directory, recursively scan the files
              it contains, taking note of any --include and --exclude
              settings. By default, a directory is read as a normal file; in
              some operating systems this gives an immediate end-of-file.
              This option is a shorthand for setting the -d option to
              "recurse".

       -s, --no-messages
              Suppress error messages about non-existent or unreadable
              files. Such files are quietly skipped. However, the return
              code is still 2, even if matches were found in other files.

       -u, --utf-8
              Operate in UTF-8 mode. This option is available only if PCRE
              has been compiled with UTF-8 support. Both patterns and
              subject lines must be valid strings of UTF-8 characters.

       -V, --version
              Write the version numbers of pcregrep and the PCRE library
              that is being used to the standard error stream.

       -v, --invert-match
              Invert the sense of the match, so that lines which do not
              match any of the patterns are the ones that are found.

       -w, --word-regex, --word-regexp
              Force the patterns to match only whole words. This is
              equivalent to having \b at the start and end of the pattern.

       -x, --line-regex, --line-regexp
              Force the patterns to be anchored (each must start matching
              at the beginning of a line) and in addition, require them to
              match entire lines. This is equivalent to having ^ and $
              characters at the start and end of each alternative branch in
              every pattern.

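              Why "each alternative branch" matters can be shown with
              Python's re module standing in for PCRE. Gluing ^ and $ (or
              \b) directly onto a pattern such as cat|dog attaches them only
              to the first or last branch; wrapping the pattern in a
              non-capturing group applies them to every branch. The helper
              word_or_line is hypothetical, and no claim is made that
              pcregrep builds its patterns in exactly this way.

```python
import re


def word_or_line(pattern, word=False, line=False):
    """Hypothetical sketch of -w / -x wrapping that covers every branch."""
    if line:
        return re.compile(f"^(?:{pattern})$")     # -x: anchor every branch
    if word:
        return re.compile(rf"\b(?:{pattern})\b")  # -w: boundary on every branch
    return re.compile(pattern)
```

              Without the group, ^cat|dog$ accepts "catfish" and
              \bcat|dog\b accepts "underdog"; the grouped forms reject both.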

ENVIRONMENT VARIABLES

       The environment variables LC_ALL and LC_CTYPE are examined, in that
       order, for a locale. The first one that is set is used. This can be
       overridden by the --locale option. If no locale is set, the PCRE
       library's default (usually the "C" locale) is used.


NEWLINES

       The -N (--newline) option allows pcregrep to scan files with different
       newline conventions from the default. However, the setting of this
       option does not affect the way in which pcregrep writes information to
       the standard error and output streams. It uses the string "\n" in C
       printf() calls to indicate newlines, relying on the C I/O library to
       convert this to an appropriate sequence if the output is sent to a
       file.


OPTIONS COMPATIBILITY

       The majority of short and long forms of pcregrep's options are the same
       as in the GNU grep program. Any long option of the form --xxx-regexp
       (GNU terminology) is also available as --xxx-regex (PCRE terminology).
       However, the --locale, -M, --multiline, -u, and --utf-8 options are
       specific to pcregrep.


OPTIONS WITH DATA

       There are four different ways in which an option with data can be
       specified. If a short form option is used, the data may follow
       immediately, or in the next command line item. For example:

         -f/some/file
         -f /some/file

       If a long form option is used, the data may appear in the same command
       line item, separated by an equals character, or (with one exception) it
       may appear in the next command line item. For example:

         --file=/some/file
         --file /some/file

       Note, however, that if you want to supply a file name beginning with ~
       as data in a shell command, and have the shell expand ~ to a home
       directory, you must separate the file name from the option, because the
       shell does not treat ~ specially unless it is at the start of an item.

       The exception to the above is the --colour (or --color) option, for
       which the data is optional. If this option does have data, it must be
       given in the first form, using an equals character. Otherwise it will
       be assumed that it has no data.

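       The four forms, and the --colour exception, can be captured in a small
       Python model of an argument scanner. take_data is a hypothetical
       function, not pcregrep's parser; it assumes that every option handed
       to it takes data (options without data are not handled) and it knows
       nothing about shell expansion such as ~.

```python
def take_data(args, i):
    """Hypothetical model: return (option, data, index of next item)
    for the data-taking option at args[i]."""
    arg = args[i]
    if arg.startswith("--"):
        name, eq, value = arg.partition("=")
        if eq:
            return name, value, i + 1     # --file=/some/file
        if name in ("--colour", "--color"):
            return name, None, i + 1      # data only via "=", otherwise none
        return name, args[i + 1], i + 2   # --file /some/file
    if len(arg) > 2:
        return arg[:2], arg[2:], i + 1    # -f/some/file
    return arg, args[i + 1], i + 2        # -f /some/file
```

       The returned index shows whether the next command line item was
       consumed: "--colour always" leaves "always" to be read as the next
       argument (here, a pattern), whereas "--colour=always" does not.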

MATCHING ERRORS

       It is possible to supply a regular expression that takes a very long
       time to fail to match certain lines. Such patterns normally involve
       nested indefinite repeats, for example: (a+)*\d when matched against a
       line of a's with no final digit. The PCRE matching function has a
       resource limit that causes it to abort in these circumstances. If this
       happens, pcregrep outputs an error message and the line that caused the
       problem to the standard error stream. If there are more than 20 such
       errors, pcregrep gives up.

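       A quick count suggests why such patterns are slow. For a simple
       backtracking matcher, each way of dividing a run of n a's among the
       iterations of (a+)* is a separate path that must fail at \d before the
       attempt at that starting position is abandoned. The number of such
       divisions (the compositions of n) is 2^(n-1). The Python sketch below
       counts them; ways is a hypothetical name, and the count is a lower
       bound for a naive matcher, not a description of PCRE's internals.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ways(n):
    """Number of ways (a+)* can divide a run of n a's among its iterations."""
    if n == 0:
        return 1  # zero iterations consume nothing
    # The first iteration of (a+) takes k a's; the rest is divided recursively.
    return sum(ways(n - k) for k in range(1, n + 1))
```

       A line of 30 a's already gives ways(30) == 536870912 divisions, which
       is why a resource limit, rather than patience, ends the search.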

DIAGNOSTICS

       Exit status is 0 if any matches were found, 1 if no matches were found,
       and 2 for syntax errors and non-existent or inaccessible files (even if
       matches were found in other files) or too many matching errors. Using
       the -s option to suppress error messages about inaccessible files does
       not affect the return code.

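       The rule for combining per-file results can be summarised in a short
       Python function. It is a model, not pcregrep code: overall_status is
       a hypothetical name, each file's outcome is given as True (matched),
       False (no match), or None (missing or unreadable), and the syntax
       error and "too many matching errors" cases, which also give 2, are
       not modelled.

```python
from typing import Iterable, Optional


def overall_status(outcomes: Iterable[Optional[bool]]) -> int:
    """Hypothetical model of pcregrep's exit status.

    Each outcome is True (the file matched), False (no match) or None
    (the file was missing or unreadable, whether or not -s hid the message).
    """
    results = list(outcomes)
    if any(r is None for r in results):
        return 2  # an inaccessible file gives 2 even if other files matched
    return 0 if any(results) else 1
```

       A script that runs pcregrep over several files therefore cannot treat
       a status of 2 as "no match": a match and an unreadable file together
       still produce 2.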

SEE ALSO

       pcrepattern(3), pcretest(1).


AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.


REVISION

       Last updated: 08 March 2008
       Copyright (c) 1997-2008 University of Cambridge.