|
1 # |
|
2 # Module Parse::Yapp.pm. |
|
3 # |
|
4 # Copyright (c) 1998-2001, Francois Desarmenien, all right reserved. |
|
5 # |
|
6 # See the Copyright section at the end of the Parse/Yapp.pm pod section |
|
7 # for usage and distribution rights. |
|
8 # |
|
9 # |
|
10 package Parse::Yapp; |
|
11 |
|
12 use strict; |
|
13 use vars qw($VERSION @ISA); |
|
14 @ISA = qw(Parse::Yapp::Output); |
|
15 |
|
16 use Parse::Yapp::Output; |
|
17 |
|
18 # $VERSION is in Parse/Yapp/Driver.pm |
|
19 |
|
20 |
|
21 1; |
|
22 |
|
23 __END__ |
|
24 |
|
25 =head1 NAME |
|
26 |
|
27 Parse::Yapp - Perl extension for generating and using LALR parsers. |
|
28 |
|
29 =head1 SYNOPSIS |
|
30 |
|
31 yapp -m MyParser grammar_file.yp |
|
32 |
|
33 ... |
|
34 |
|
35 use MyParser; |
|
36 |
|
37 $parser=new MyParser(); |
|
38 $value=$parser->YYParse(yylex => \&lexer_sub, yyerror => \&error_sub); |
|
39 |
|
40 $nberr=$parser->YYNberr(); |
|
41 |
|
42 $parser->YYData->{DATA}= [ 'Anything', 'You Want' ]; |
|
43 |
|
44 $data=$parser->YYData->{DATA}[0]; |
|
45 |
|
46 =head1 DESCRIPTION |
|
47 |
|
48 Parse::Yapp (Yet Another Perl Parser compiler) is a collection of modules |
|
49 that let you generate and use yacc-like, thread-safe (reentrant) parsers with |
|
50 a Perl object-oriented interface. |
|
51 |
|
52 The script yapp is a front-end to the Parse::Yapp module and lets you |
|
53 easily create a Perl OO parser from an input grammar file. |
|
54 |
|
55 =head2 The Grammar file |
|
56 |
|
57 =over 4 |
|
58 |
|
59 =item C<Comments> |
|
60 |
|
61 Throughout your files, comments are either Perl style, introduced by I<#> |
|
62 up to the end of line, or C style, enclosed between I</*> and I<*/>. |
|
63 |
|
64 |
|
65 =item C<Tokens and string literals> |
|
66 |
|
67 |
|
68 Throughout the grammar files, two kinds of symbols may appear: |
|
69 I<Non-terminal> symbols, also called I<left-hand-side> symbols, |
|
70 which are the names of your rules, and I<Terminal> symbols, also |
|
71 called I<Tokens>. |
|
72 |
|
73 Tokens are the symbols your lexer function will feed your parser with |
|
74 (see below). They are of two flavours: symbolic tokens and string |
|
75 literals. |
|
76 |
|
77 Non-terminals and symbolic tokens share the same identifier syntax: |
|
78 |
|
79 [A-Za-z][A-Za-z0-9_]* |
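
As a quick illustration of the identifier syntax above, here is a minimal Perl sketch that checks candidate names against that pattern. The candidate names are made up for the example, and the check is only an approximation of what yapp itself does when it reads a grammar.

```perl
use strict;
use warnings;

# The identifier pattern quoted above, anchored to match the whole name.
my $identifier = qr/\A[A-Za-z][A-Za-z0-9_]*\z/;

# Candidate names are hypothetical, chosen only for illustration.
my %is_valid;
for my $name (qw(exp NUM line_2 _tmp 2nd)) {
    $is_valid{$name} = $name =~ $identifier ? 1 : 0;
    printf "%-7s %s\n", $name, $is_valid{$name} ? 'valid' : 'invalid';
}
```

Names starting with an underscore or a digit are rejected, since the first character must be a letter.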
|
80 |
|
81 String literals are enclosed in single quotes and can contain almost |
|
82 anything. They will be output to your parser file double-quoted, so |
|
83 special characters are interpreted as such. '"', '$' and '@' will be |
|
84 automatically quoted with '\', making them more natural to write. On the |
|
85 other hand, if you need a single quote inside your literal, just quote it with '\'. |
|
86 |
|
87 You cannot have a literal I<'error'> in your grammar as it would |
|
88 confuse the driver with the I<error> token. Use a symbolic token instead. |
|
89 In case you inadvertently use it, this will produce a warning telling you |
|
90 you should have written it I<error> and will treat it as if it were the |
|
91 I<error> token, which is certainly NOT what you meant. |
|
92 |
|
93 |
|
94 =item C<Grammar file syntax> |
|
95 |
|
96 It is very close to yacc syntax (in fact, I<Parse::Yapp> should compile |
|
97 a clean I<yacc> grammar without any modification, whereas the opposite |
|
98 is not true). |
|
99 |
|
100 This file is divided into three sections, separated by C<%%>: |
|
101 |
|
102 header section |
|
103 %% |
|
104 rules section |
|
105 %% |
|
106 footer section |
|
107 |
|
108 =over 4 |
|
109 |
|
110 =item B<The Header Section> may optionally contain: |
|
111 |
|
112 =item * |
|
113 |
|
114 One or more code blocks enclosed inside C<%{> and C<%}> just like in |
|
115 yacc. They may contain any valid Perl code and will be copied verbatim |
|
116 at the very beginning of the parser module. They are not as useful as |
|
117 they are in yacc, but you can use them, for example, for global variable |
|
118 declarations, though you will notice later that such global variables can |
|
119 be avoided to make a reentrant parser module. |
|
120 |
|
121 =item * |
|
122 |
|
123 Precedence declarations, introduced by C<%left>, C<%right> and C<%nonassoc> |
|
124 specifying associativity, followed by the list of tokens or literals |
|
125 having the same precedence and associativity. |
|
126 Precedence increases with declaration order: the last one declared has the highest level |
|
127 (see the yacc or bison manuals for a full explanation of how they work, |
|
128 as they are implemented exactly the same way in Parse::Yapp). |
|
129 |
|
130 =item * |
|
131 |
|
132 C<%start> followed by a rule's left hand side, declaring this rule to |
|
133 be the starting rule of your grammar. The default, when C<%start> is not |
|
134 used, is the first rule in your grammar section. |
|
135 |
|
136 =item * |
|
137 |
|
138 C<%token> followed by a list of symbols, forcing them to be recognized |
|
139 as tokens, generating a syntax error if used in the left hand side of |
|
140 a rule declaration. |
|
141 Note that in Parse::Yapp, you I<don't> need to declare tokens as in yacc: any |
|
142 symbol not appearing as a left hand side of a rule is considered to be |
|
143 a token. |
|
144 Other yacc declarations or constructs such as C<%type> and C<%union> are |
|
145 parsed but (almost) ignored. |
|
146 |
|
147 =item * |
|
148 |
|
149 C<%expect> followed by a number, suppressing warnings about the number of shift/reduce |
|
150 conflicts when both numbers match, a la bison. |
|
151 |
|
152 |
|
153 =item B<The Rule Section> contains your grammar rules: |
|
154 |
|
155 A rule is made of a left-hand-side symbol, followed by a C<':'> and one |
|
156 or more right-hand-sides separated by C<'|'> and terminated by a C<';'>: |
|
157 |
|
158 exp: exp '+' exp |
|
159 | exp '-' exp |
|
160 ; |
|
161 |
|
162 A right hand side may be empty: |
|
163 |
|
164 input: #empty |
|
165 | input line |
|
166 ; |
|
167 |
|
168 (if you have more than one empty rhs, Parse::Yapp will issue a warning, |
|
169 as this is usually a mistake, and you will certainly have a reduce/reduce |
|
170 conflict) |
|
171 |
|
172 |
|
173 A rhs may be followed by an optional C<%prec> directive, followed |
|
174 by a token, giving the rule an explicit precedence (see yacc manuals |
|
175 for its precise meaning), and by an optional semantic action code block (see |
|
176 below). |
|
177 |
|
178 exp: '-' exp %prec NEG { -$_[2] } |
|
179 | exp '+' exp { $_[1] + $_[3] } |
|
180 | NUM |
|
181 ; |
|
182 |
|
183 Note that in Parse::Yapp, a lhs I<cannot> appear more than once as |
|
184 a rule name (This differs from yacc). |
|
185 |
|
186 |
|
187 =item C<The footer section> |
|
188 |
|
189 may contain any valid Perl code and will be appended at the very end |
|
190 of your parser module. Here you can write your lexer, error report |
|
191 subs and anything relevant to your parser. |
|
192 |
|
193 =item C<Semantic actions> |
|
194 |
|
195 Semantic actions are run every time a I<reduction> occurs in the |
|
196 parsing flow and they must return a semantic value. |
|
197 |
|
198 They are (usually, but see below C<In rule actions>) written at |
|
199 the very end of the rhs, enclosed with C<{ }>, and are copied verbatim |
|
200 to your parser file, inside of the rules table. |
|
201 |
|
202 Be aware that matching braces in Perl is much more difficult than |
|
203 in C: inside strings they don't need to match. While in C it is |
|
204 very easy to detect the beginning of a string construct, or a |
|
205 single character, it is much more difficult in Perl, as there |
|
206 are so many ways of writing such literals. So there is no check |
|
207 for that today. If you need a brace in a double-quoted string, just |
|
208 quote it (C<\{> or C<\}>). For single-quoted strings, you will need |
|
209 to make a comment matching it I<in the right order>. |
|
210 Sorry for the inconvenience. |
|
211 |
|
212 { |
|
213 "{ My string block }". |
|
214 "\{ My other string block \}". |
|
215 qq/ My unmatched brace \} /. |
|
216 # Force the match: { |
|
217 q/ for my closing brace } / |
|
218 q/ My opening brace { / |
|
219 # must be closed: } |
|
220 } |
|
221 |
|
222 All of these constructs should work. |
|
223 |
|
224 |
|
225 In Parse::Yapp, semantic actions are called like normal Perl sub calls, |
|
226 with their arguments passed in C<@_>, and their semantic values are |
|
227 their return values. |
|
228 |
|
229 $_[1] to $_[n] are the parameters just as $1 to $n in yacc, while |
|
230 $_[0] is the parser object itself. |
|
231 |
|
232 Having $_[0] being the parser object itself allows you to call |
|
233 parser methods. That's how the yacc macros are implemented: |
|
234 |
|
235 yyerrok is done by calling $_[0]->YYErrok |
|
236 YYERROR is done by calling $_[0]->YYError |
|
237 YYACCEPT is done by calling $_[0]->YYAccept |
|
238 YYABORT is done by calling $_[0]->YYAbort |
|
239 |
|
240 All those methods explicitly return I<undef>, for convenience. |
|
241 |
|
242 YYRECOVERING is done by calling $_[0]->YYRecovering |
|
243 |
|
244 Four methods that are useful in an error recovery sub, |
|
245 |
|
246 $_[0]->YYCurtok |
|
247 $_[0]->YYCurval |
|
248 $_[0]->YYExpect |
|
249 $_[0]->YYLexer |
|
250 |
|
251 return, respectively, the current input token that made the parse fail, |
|
252 its semantic value (both can be used to modify their values too, but |
|
253 I<know what you are doing>! See the I<Error reporting routine> section for |
|
254 an example), a list which contains the tokens the parser expected when |
|
255 the failure occurred, and a reference to the lexer routine. |
|
256 |
|
257 Note that if C<$_[0]-E<gt>YYCurtok> is declared as a C<%nonassoc> token, |
|
258 it can be included in the C<$_[0]-E<gt>YYExpect> list whenever the input |
|
259 tries to use it in an associative way. This is not a bug: the token |
|
260 IS expected to report an error if encountered. |
|
261 |
|
262 To detect such a thing in your error reporting sub, the following |
|
263 example should do the trick: |
|
264 |
|
265 grep { $_[0]->YYCurtok eq $_ } $_[0]->YYExpect |
|
266 and do { |
|
267 #Non-associative token used in an associative expression |
|
268 }; |
|
269 |
|
270 Accessing semantic values on the left of your reducing rule is done |
|
271 through the method |
|
272 |
|
273 $_[0]->YYSemval( index ) |
|
274 |
|
275 where index is an integer. Values I<1 .. n> return the same values |
|
276 as I<$_[1] .. $_[n]>, while I<-n .. 0> return values on the left of the rule |
|
277 being reduced (this is related to I<$-n .. $0 .. $n> in yacc, but you |
|
278 cannot use I<$_[0]> or I<$_[-n]> constructs in Parse::Yapp, since I<$_[0]> is the parser object and negative indices count from the end of C<@_>). |
|
279 |
|
280 |
|
281 There is also a provision for a user data area in the parser object, |
|
282 accessed by the method: |
|
283 |
|
284 $_[0]->YYData |
|
285 |
|
286 which returns a reference to an anonymous hash, which lets you have |
|
287 all of your parsing data held inside the object (see the Calc.yp |
|
288 or ParseYapp.yp files in the distribution for some examples). |
|
289 That's how you can make your parser module reentrant: all of your |
|
290 module states and variables are held inside the parser object. |
|
291 |
|
292 Note: unfortunately, method calls in Perl have a lot of overhead, |
|
293 and when YYData is used, it may be called a huge number |
|
294 of times. If you are not a *real* purist and efficiency |
|
295 is your concern, you may directly access the user space |
|
296 in the object, $parser->{USER}, which is a reference to an |
|
297 anonymous hash, and then benchmark. |
|
298 |
|
299 If no action is specified for a rule, the equivalent of a default |
|
300 action is run, which returns the first parameter: |
|
301 |
|
302 { $_[1] } |
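
To make the calling convention concrete, here is a minimal sketch that mimics how actions receive their arguments. The anonymous subs stand in for action blocks from a grammar, and the empty hash is only a placeholder for the parser object; a real parser calls the actions itself during reductions.

```perl
use strict;
use warnings;

# Placeholder for the parser object passed as $_[0] (an assumption for this sketch).
my $parser = {};

# Stand-ins for action blocks, written as they would appear in a grammar:
my $add     = sub { $_[1] + $_[3] };   # exp: exp '+' exp { $_[1] + $_[3] }
my $negate  = sub { -$_[2] };          # exp: '-' exp     { -$_[2] }
my $default = sub { $_[1] };           # what runs when no action is given

# Each call passes the parser first, then one value per right-hand-side symbol.
my $sum  = $add->( $parser, 2, '+', 3 );
my $neg  = $negate->( $parser, '-', 7 );
my $same = $default->( $parser, 'first', 'second' );

print "$sum $neg $same\n";
```

Literal tokens such as C<'+'> occupy a slot in C<@_> just like any other symbol, which is why the operands of the addition are C<$_[1]> and C<$_[3]>.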
|
303 |
|
304 =item C<In rule actions> |
|
305 |
|
306 It is also possible to embed semantic actions inside of a rule: |
|
307 |
|
308 typedef: TYPE { $type = $_[1] } identlist { ... } ; |
|
309 |
|
310 When Parse::Yapp's parser encounters such an embedded action, it modifies |
|
311 the grammar as if you wrote (although @x-1 is not a legal lhs value): |
|
312 |
|
313 @x-1: /* empty */ { $type = $_[1] }; |
|
314 typedef: TYPE @x-1 identlist { ... } ; |
|
315 |
|
316 where I<x> is a sequential number incremented for each "in rule" action, |
|
317 and I<-1> represents the "dot position" in the rule where the action arises. |
|
318 |
|
319 In such actions, you can use I<$_[1]..$_[n]> variables, which are the |
|
320 semantic values on the left of your action. |
|
321 |
|
322 Be aware that the way Parse::Yapp modifies your grammar because of |
|
323 I<in rule actions> can produce, in some cases, spurious conflicts |
|
324 that wouldn't happen otherwise. |
|
325 |
|
326 =item C<Generating the Parser Module> |
|
327 |
|
328 Now that your grammar file is written, you can use yapp on it |
|
329 to generate your parser module: |
|
330 |
|
331 yapp -v Calc.yp |
|
332 |
|
333 will create two files: F<Calc.pm>, your parser module, and F<Calc.output>, |
|
334 a verbose output of your parser rules, conflicts, warnings, states |
|
335 and summary. |
|
336 |
|
337 What you are missing now is a lexer routine. |
|
338 |
|
339 =item C<The Lexer sub> |
|
340 |
|
341 is called each time the parser needs to read the next token. |
|
342 |
|
343 It is called with only one argument that is the parser object itself, |
|
344 so you can access its methods, especially the |
|
345 |
|
346 $_[0]->YYData |
|
347 |
|
348 data area. |
|
349 |
|
350 It is its duty to return the next token and value to the parser. |
|
351 They I<must> be returned as a list of two values: the first one |
|
352 is the token known by the parser (symbolic or literal), the second |
|
353 one being anything you want (usually the content of the token, or the |
|
354 literal value), from a simple scalar value to any complex reference, |
|
355 as the parsing driver never uses it except to pass it to semantic actions: |
|
356 |
|
357 ( 'NUMBER', $num ) |
|
358 or |
|
359 ( '>=', '>=' ) |
|
360 or |
|
361 ( 'ARRAY', [ @values ] ) |
|
362 |
|
363 When the lexer reaches the end of input, it must return the C<''> |
|
364 empty token with an undef value: |
|
365 |
|
366 ( '', undef ) |
|
367 |
|
368 Note that your lexer should I<never> return C<'error'> as token |
|
369 value: for the driver, this is the error token used for error |
|
370 recovery and would lead to odd reactions. |
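
The rules above can be sketched as a small lexer sub. This is only an illustration under stated assumptions: C<DemoParser> is a stand-in written for this example so the code runs without a generated parser, and the C<INPUT> key and the C<NUM> token name are hypothetical choices, not something Parse::Yapp requires.

```perl
use strict;
use warnings;

# Stand-in for a generated parser, only so this sketch runs on its own.
# A real parser built by yapp provides YYData itself.
package DemoParser;
sub new    { my $class = shift; return bless { USER => {} }, $class }
sub YYData { return $_[0]{USER} }

package main;

# Lexer sub: receives the parser object, returns a (token, value) list.
sub lexer {
    my $parser = shift;
    my $data   = $parser->YYData;

    # Skip leading blanks in the remaining input.
    $data->{INPUT} =~ s/^\s+//;

    # End of input: the empty token with an undef value.
    return ( '', undef ) if $data->{INPUT} eq '';

    # A symbolic token carrying the matched number as its value.
    return ( 'NUM', $1 ) if $data->{INPUT} =~ s/^(\d+)//;

    # Anything else: a one-character literal token.
    $data->{INPUT} =~ s/^(.)//s;
    return ( $1, $1 );
}

my $parser = DemoParser->new;
$parser->YYData->{INPUT} = '12 + 3';

my @tokens;
while (1) {
    my ( $tok, $val ) = lexer($parser);
    last if $tok eq '';
    push @tokens, "$tok=$val";
}
print join( ' ', @tokens ), "\n";
```

Keeping the remaining input in the C<YYData> hash rather than in a global variable is what lets several parser objects run side by side.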
|
371 |
|
372 Now that you have your lexer written, maybe you will need to output |
|
373 meaningful error messages, instead of the default which is to print |
|
374 'Parse error.' on STDERR. |
|
375 |
|
376 So you will need an Error reporting sub. |
|
377 |
|
378 =item C<Error reporting routine> |
|
379 |
|
380 If you want one, write it knowing that it is passed as parameter |
|
381 the parser object. So you can share information with the lexer |
|
382 routine quite easily. |
|
383 |
|
384 You can also use the C<$_[0]-E<gt>YYErrok> method in it, which will |
|
385 resume parsing as if no error occurred. Of course, since the invalid |
|
386 token is still invalid, you're supposed to fix the problem by |
|
387 yourself. |
|
388 |
|
389 The method C<$_[0]-E<gt>YYLexer> may help you, as it returns a reference |
|
390 to the lexer routine, and can be called as |
|
391 |
|
392 ($tok,$val)=&{$_[0]->YYLexer} |
|
393 |
|
394 to get the next token and semantic value from the input stream. To |
|
395 make them current for the parser, use: |
|
396 |
|
397 ($_[0]->YYCurtok, $_[0]->YYCurval) = ($tok, $val) |
|
398 |
|
399 and know what you're doing... |
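
A minimal sketch of an error reporting sub built on the methods described above might look as follows. C<DemoParser> is a stand-in with canned return values, written only so the example runs without a generated parser, and the C<LINE> key is a hypothetical counter that your lexer would have to maintain in the user data area.

```perl
use strict;
use warnings;

# Stand-in for a generated parser; the canned values are made up.
package DemoParser;
sub new      { my $class = shift; return bless { USER => { LINE => 3 } }, $class }
sub YYData   { return $_[0]{USER} }
sub YYCurtok { return 'NUM' }
sub YYCurval { return 42 }
sub YYExpect { return ( '-', '+' ) }

package main;

# Build the message separately so it can be checked or logged elsewhere.
sub error_message {
    my $parser   = shift;
    my $val      = $parser->YYCurval;
    my @expected = sort $parser->YYExpect;

    # An undef value is what the lexer returns at end of input.
    my $found = defined $val ? "'$val'" : 'end of input';

    return sprintf 'Syntax error near line %d: found %s, expected one of: %s',
        $parser->YYData->{LINE}, $found,
        join( ', ', map { "'$_'" } @expected );
}

# The sub to pass as yyerror: it receives the parser object as its argument.
sub error_report {
    print STDERR error_message( $_[0] ), "\n";
}

my $message = error_message( DemoParser->new );
print "$message\n";
```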
|
400 |
|
401 =item C<Parsing> |
|
402 |
|
403 Now you've got everything to do the parsing. |
|
404 |
|
405 First, use the parser module: |
|
406 |
|
407 use Calc; |
|
408 |
|
409 Then create the parser object: |
|
410 |
|
411 $parser=new Calc; |
|
412 |
|
413 Now, call the YYParse method, telling it where to find the lexer |
|
414 and error report subs: |
|
415 |
|
416 $result=$parser->YYParse(yylex => \&Lexer, |
|
417 yyerror => \&ErrorReport); |
|
418 |
|
419 (assuming Lexer and ErrorReport subs have been written in your current |
|
420 package) |
|
421 |
|
422 The order in which parameters appear is unimportant. |
|
423 |
|
424 Et voila. |
|
425 |
|
426 The YYParse method will do the parse, then return the last semantic |
|
427 value returned, or undef if error recovery cannot recover. |
|
428 |
|
429 If you need to be sure the parse has been successful (in case your |
|
430 last returned semantic value I<is> undef) make a call to: |
|
431 |
|
432 $parser->YYNberr() |
|
433 |
|
434 which returns the total number of times the error reporting sub has been called. |
|
435 |
|
436 =item C<Error Recovery> |
|
437 |
|
438 in Parse::Yapp is implemented the same way it is in yacc. |
|
439 |
|
440 =item C<Debugging Parser> |
|
441 |
|
442 To debug your parser, you can call the YYParse method with a debug parameter: |
|
443 |
|
444 $parser->YYParse( ... , yydebug => value, ... ) |
|
445 |
|
446 where value is a bitfield, each bit representing a specific debug output: |
|
447 |
|
448 Bit Value Outputs |
|
449 0x01 Token reading (useful for Lexer debugging) |
|
450 0x02 States information |
|
451 0x04 Driver actions (shifts, reduces, accept...) |
|
452 0x08 Parse Stack dump |
|
453 0x10 Error Recovery tracing |
|
454 |
|
455 To get full debugging output, use |
|
456 |
|
457 yydebug => 0x1F |
|
458 |
|
459 Debugging output is sent to STDERR, and be aware that it can produce |
|
460 C<huge> outputs. |
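
Since the value is a bitfield, individual outputs can be combined with a bitwise or. The sketch below names the bits from the table above; the constant names are hypothetical and chosen for this example only, as the table documents numeric values, not names.

```perl
use strict;
use warnings;

# Hypothetical names for the documented bit values.
use constant {
    DEBUG_TOKENS   => 0x01,    # token reading
    DEBUG_STATES   => 0x02,    # states information
    DEBUG_ACTIONS  => 0x04,    # driver actions
    DEBUG_STACK    => 0x08,    # parse stack dump
    DEBUG_RECOVERY => 0x10,    # error recovery tracing
};

# Trace only what the lexer returns and what the driver does with it.
my $lexer_and_actions = DEBUG_TOKENS | DEBUG_ACTIONS;

# Every documented output at once.
my $everything = DEBUG_TOKENS | DEBUG_STATES | DEBUG_ACTIONS
               | DEBUG_STACK  | DEBUG_RECOVERY;

printf "0x%02X 0x%02X\n", $lexer_and_actions, $everything;
```

Either value can then be passed as the C<yydebug> parameter of YYParse.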
|
461 |
|
462 =item C<Standalone Parsers> |
|
463 |
|
464 By default, the parser modules generated will need the Parse::Yapp |
|
465 module installed on the system to run. They use the Parse::Yapp::Driver |
|
466 which can be safely shared between parsers in the same script. |
|
467 |
|
468 In the case you'd prefer to have a standalone module generated, use |
|
469 the C<-s> switch with yapp: this will automagically copy the driver |
|
470 code into your module so you can use/distribute it without the need |
|
471 of the Parse::Yapp module, making it really a C<Standalone Parser>. |
|
472 |
|
473 If you do so, please remember to include Parse::Yapp's copyright notice |
|
474 in your main module copyright, so others can know about the Parse::Yapp module. |
|
475 |
|
476 =item C<Source file line numbers> |
|
477 |
|
478 by default will be included in the generated parser module, which will help |
|
479 to find the guilty line in your source file in case of a syntax error. |
|
480 You can disable this feature by compiling your grammar with yapp using |
|
481 the C<-n> switch. |
|
482 |
|
483 =back |
|
484 |
|
485 =head1 BUGS AND SUGGESTIONS |
|
486 |
|
487 If you find bugs, think of anything that could improve Parse::Yapp |
|
488 or have any questions related to it, feel free to contact the author. |
|
489 |
|
490 =head1 AUTHOR |
|
491 |
|
492 Francois Desarmenien <francois@fdesar.net> |
|
493 |
|
494 =head1 SEE ALSO |
|
495 |
|
496 yapp(1) perl(1) yacc(1) bison(1). |
|
497 |
|
498 =head1 COPYRIGHT |
|
499 |
|
500 The Parse::Yapp module and its related modules and shell scripts are copyright |
|
501 (c) 1998-2001 Francois Desarmenien, France. All rights reserved. |
|
502 |
|
503 You may use and distribute them under the terms of either |
|
504 the GNU General Public License or the Artistic License, |
|
505 as specified in the Perl README file. |
|
506 |
|
507 If you use the "standalone parser" option so people don't need to install |
|
508 Parse::Yapp on their systems in order to run your software, this copyright |
|
509 notice should be included in your software copyright too, and the copyright |
|
510 notice in the embedded driver should be left untouched. |
|
511 |
|
512 =cut |