deprecated/buildtools/buildsystemtools/lib/Parse/Yapp.pm
changeset 662 60be34e1b006
parent 655 3f65fd25dfd4
654:7c11c3d8d025 662:60be34e1b006
       
     1 #
       
     2 # Module Parse::Yapp.pm.
       
     3 #
       
     4 # Copyright (c) 1998-2001, Francois Desarmenien, all right reserved.
       
     5 #
       
     6 # See the Copyright section at the end of the Parse/Yapp.pm pod section
       
     7 # for usage and distribution rights.
       
     8 #
       
     9 #
       
    10 package Parse::Yapp;
       
    11 
       
    12 use strict;
       
    13 use vars qw($VERSION @ISA);
       
    14 @ISA = qw(Parse::Yapp::Output);
       
    15 
       
    16 use Parse::Yapp::Output;
       
    17 
       
    18 # $VERSION is in Parse/Yapp/Driver.pm
       
    19 
       
    20 
       
    21 1;
       
    22 
       
    23 __END__
       
    24 
       
    25 =head1 NAME
       
    26 
       
    27 Parse::Yapp - Perl extension for generating and using LALR parsers. 
       
    28 
       
    29 =head1 SYNOPSIS
       
    30 
       
    31   yapp -m MyParser grammar_file.yp
       
    32 
       
    33   ...
       
    34 
       
    35   use MyParser;
       
    36 
       
    37   $parser=new MyParser();
       
    38   $value=$parser->YYParse(yylex => \&lexer_sub, yyerror => \&error_sub);
       
    39 
       
    40   $nberr=$parser->YYNberr();
       
    41 
       
    42   $parser->YYData->{DATA}= [ 'Anything', 'You Want' ];
       
    43 
       
    44   $data=$parser->YYData->{DATA}[0];
       
    45 
       
    46 =head1 DESCRIPTION
       
    47 
       
    48 Parse::Yapp (Yet Another Perl Parser compiler) is a collection of modules
       
    49 that let you generate and use yacc-like, thread-safe (reentrant) parsers with
       
    50 a Perl object-oriented interface.
       
    51 
       
    52 The script yapp is a front-end to the Parse::Yapp module and lets you
       
    53 easily create a Perl OO parser from an input grammar file.
       
    54 
       
    55 =head2 The Grammar file
       
    56 
       
    57 =over 4
       
    58 
       
    59 =item C<Comments>
       
    60 
       
    61 Through all your files, comments are either Perl style, introduced by I<#>
       
    62 up to the end of line, or C style, enclosed between  I</*> and I<*/>.
       
    63 
       
    64 
       
    65 =item C<Tokens and string literals>
       
    66 
       
    67 
       
    68 Throughout the grammar files, two kinds of symbols may appear:
       
    69 I<Non-terminal> symbols, called also I<left-hand-side> symbols,
       
    70 which are the names of your rules, and I<Terminal> symbols, called
       
    71 also I<Tokens>.
       
    72 
       
    73 Tokens are the symbols your lexer function will feed your parser with
       
    74 (see below). They are of two flavours: symbolic tokens and string
       
    75 literals.
       
    76 
       
    77 Non-terminals and symbolic tokens share the same identifier syntax:
       
    78 
       
    79 		[A-Za-z][A-Za-z0-9_]*
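
As a quick illustration, the identifier pattern above can be checked with an anchored regular expression. The helper name C<is_identifier> is hypothetical and not part of Parse::Yapp; this is only a sketch of the syntax rule, not of how yapp itself tokenizes a grammar.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Parse::Yapp): true when $name matches
# the identifier syntax shared by non-terminals and symbolic tokens.
sub is_identifier {
    my ($name) = @_;
    return $name =~ /\A[A-Za-z][A-Za-z0-9_]*\z/ ? 1 : 0;
}

# A leading digit or underscore is rejected by the pattern.
for my $name (qw(NUM exp line_2 2fast _hidden)) {
    my $verdict = is_identifier($name) ? 'ok' : 'not an identifier';
    print "$name: $verdict\n";
}
```
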
       
    80 
       
    81 String literals are enclosed in single quotes and can contain almost
       
    82 anything. They will be output to your parser file double-quoted, so any
       
    83 special character is interpreted as such. '"', '$' and '@' will be automatically
       
    84 quoted with '\', making their writing more natural. On the other hand,
       
    85 if you need a single quote inside your literal, just quote it with '\'.
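
The escaping described above can be pictured with a small sketch. It assumes only what the paragraph states (the three characters '"', '$' and '@' get a backslash, and the result is double-quoted); the helper name C<quote_literal> is hypothetical, and the code is not taken from the Parse::Yapp sources.

```perl
use strict;
use warnings;

# Hypothetical helper: render the body of a grammar literal as a
# double-quoted Perl string, escaping only '"', '$' and '@'.
# Backslashes already present in the body are left untouched.
sub quote_literal {
    my ($body) = @_;
    (my $escaped = $body) =~ s/(["\$\@])/\\$1/g;
    return qq{"$escaped"};
}

print quote_literal('$var'), "\n";
print quote_literal('user@host'), "\n";
print quote_literal('>='), "\n";
```
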
       
    86 
       
    87 You cannot have a literal I<'error'> in your grammar as it would
       
    88 confuse the driver with the I<error> token. Use a symbolic token instead.
       
    89 In case you inadvertently use it, this will produce a warning telling you
       
    90 you should have written it I<error> and will treat it as if it were the
       
    91 I<error> token, which is certainly NOT what you meant.
       
    92 
       
    93 
       
    94 =item C<Grammar file syntax>
       
    95 
       
    96 It is very close to yacc syntax (in fact, I<Parse::Yapp> should compile
       
    97 a clean I<yacc> grammar without any modification, whereas the opposite
       
    98 is not true).
       
    99 
       
   100 This file is divided in three sections, separated by C<%%>:
       
   101 
       
   102 	header section
       
   103 	%%
       
   104 	rules section
       
   105 	%%
       
   106 	footer section
       
   107 
       
   108 =over 4
       
   109 
       
   110 =item B<The Header Section> may optionally contain:
       
   111 
       
   112 =item *
       
   113 
       
   114 One or more code blocks enclosed inside C<%{> and C<%}> just like in
       
   115 yacc. They may contain any valid Perl code and will be copied verbatim
       
   116 at the very beginning of the parser module. They are not as useful as
       
   117 they are in yacc, but you can use them, for example, for global variable
       
   118 declarations, though you will notice later that such global variables can
       
   119 be avoided to make a reentrant parser module.
       
   120 
       
   121 =item *
       
   122 
       
   123 Precedence declarations, introduced by C<%left>, C<%right> and C<%nonassoc>
       
   124 specifying associativity, followed by the list of tokens or literals
       
   125 having the same precedence and associativity.
       
   126 The precedence declared last has the highest level.
       
   127 (see the yacc or bison manuals for a full explanation of how they work,
       
   128 as they are implemented exactly the same way in Parse::Yapp)
       
   129 
       
   130 =item *
       
   131 
       
   132 C<%start> followed by a rule's left hand side, declaring this rule to
       
   133 be the starting rule of your grammar. The default, when C<%start> is not
       
   134 used, is the first rule in your grammar section.
       
   135 
       
   136 =item *
       
   137 
       
   138 C<%token> followed by a list of symbols, forcing them to be recognized
       
   139 as tokens, generating a syntax error if used in the left hand side of
       
   140 a rule declaration.
       
   141 Note that in Parse::Yapp, you I<don't> need to declare tokens as in yacc: any
       
   142 symbol not appearing as a left hand side of a rule is considered to be
       
   143 a token.
       
   144 Other yacc declarations or constructs such as C<%type> and C<%union> are
       
   145 parsed but (almost) ignored.
       
   146 
       
   147 =item *
       
   148 
       
   149 C<%expect> followed by a number, which suppresses warnings about the number of shift/reduce
       
   150 conflicts when both numbers match, a la bison.
       
   151 
       
   152 
       
   153 =item B<The Rule Section> contains your grammar rules:
       
   154 
       
   155 A rule is made of a left-hand-side symbol, followed by a C<':'> and one
       
   156 or more right-hand-sides separated by C<'|'> and terminated by a C<';'>:
       
   157 
       
   158     exp:    exp '+' exp
       
   159         |   exp '-' exp
       
   160         ;
       
   161 
       
   162 A right hand side may be empty:
       
   163 
       
   164     input:  #empty
       
   165         |   input line
       
   166         ;
       
   167 
       
   168 (if you have more than one empty rhs, Parse::Yapp will issue a warning,
       
   169 as this is usually a mistake, and you will certainly have a reduce/reduce
       
   170 conflict)
       
   171 
       
   172 
       
   173 A rhs may be followed by an optional C<%prec> directive, followed
       
   174 by a token, giving the rule an explicit precedence (see yacc manuals
       
   175 for its precise meaning) and an optional semantic action code block (see
       
   176 below).
       
   177 
       
   178     exp:   '-' exp %prec NEG { -$_[2] }
       
   179         |  exp '+' exp       { $_[1] + $_[3] }
       
   180         |  NUM
       
   181         ;
       
   182 
       
   183 Note that in Parse::Yapp, a lhs I<cannot> appear more than once as
       
   184 a rule name (This differs from yacc).
       
   185 
       
   186 
       
   187 =item C<The footer section>
       
   188 
       
   189 may contain any valid Perl code and will be appended at the very end
       
   190 of your parser module. Here you can write your lexer, error report
       
   191 subs and anything relevant to your parser.
       
   192 
       
   193 =item C<Semantic actions>
       
   194 
       
   195 Semantic actions are run every time a I<reduction> occurs in the
       
   196 parsing flow and they must return a semantic value.
       
   197 
       
   198 They are (usually, but see below C<In rule actions>) written at
       
   199 the very end of the rhs, enclosed with C<{ }>, and are copied verbatim
       
   200 to your parser file, inside of the rules table.
       
   201 
       
   202 Be aware that matching braces in Perl is much more difficult than
       
   203 in C: inside strings they don't need to match. While in C it is
       
   204 very easy to detect the beginning of a string construct, or a
       
   205 single character, it is much more difficult in Perl, as there
       
   206 are so many ways of writing such literals. So there is no check
       
   207 for that today. If you need a brace in a double-quoted string, just
       
   208 quote it (C<\{> or C<\}>). For single-quoted strings, you will need
       
   209 to make a comment matching it I<in the right order>.
       
   210 Sorry for the inconvenience.
       
   211 
       
   212     {
       
   213         "{ My string block }".
       
   214         "\{ My other string block \}".
       
   215         qq/ My unmatched brace \} /.
       
   216         # Force the match: {
       
   217         q/ for my closing brace } /
       
   218         q/ My opening brace { /
       
   219         # must be closed: }
       
   220     }
       
   221 
       
   222 All of these constructs should work.
       
   223 
       
   224 
       
   225 In Parse::Yapp, semantic actions are called like normal Perl sub calls,
       
   226 with their arguments passed in C<@_>, and their semantic values are
       
   227 their return values.
       
   228 
       
   229 $_[1] to $_[n] are the parameters just as $1 to $n in yacc, while
       
   230 $_[0] is the parser object itself.
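
To make the calling convention concrete, here is a minimal sketch that treats an action as an ordinary anonymous sub and calls it by hand. The empty hash stands in for the parser object and the argument values are invented for the illustration; in a generated parser the driver makes this call for you.

```perl
use strict;
use warnings;

# Stand-in for the parser object; a generated parser passes itself here.
my $parser = {};

# The action written as { $_[1] + $_[3] } in a grammar, seen as a plain sub.
my $add = sub { $_[1] + $_[3] };

# Reducing  exp '+' exp  with semantic values 2, '+' and 3:
# $_[0] is the parser, $_[1] is 2, $_[2] is '+', $_[3] is 3.
my $value = $add->($parser, 2, '+', 3);

print "semantic value of the reduction: $value\n";
```
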
       
   231 
       
   232 Having $_[0] be the parser object itself allows you to call
       
   233 parser methods. That's how the yacc macros are implemented:
       
   234 
       
   235 	yyerrok is done by calling $_[0]->YYErrok
       
   236 	YYERROR is done by calling $_[0]->YYError
       
   237 	YYACCEPT is done by calling $_[0]->YYAccept
       
   238 	YYABORT is done by calling $_[0]->YYAbort
       
   239 
       
   240 All those methods explicitly return I<undef>, for convenience.
       
   241 
       
   242     YYRECOVERING is done by calling $_[0]->YYRecovering
       
   243 
       
   244 Four useful methods in error recovery sub
       
   245 
       
   246     $_[0]->YYCurtok
       
   247     $_[0]->YYCurval
       
   248     $_[0]->YYExpect
       
   249     $_[0]->YYLexer
       
   250 
       
   251 return respectively the current input token that made the parse fail,
       
   252 its semantic value (both can be used to modify their values too, but
       
   253 I<know what you are doing> ! See I<Error reporting routine> section for
       
   254 an example), a list which contains the tokens the parser expected when
       
   255 the failure occurred and a reference to the lexer routine.
       
   256 
       
   257 Note that if C<$_[0]-E<gt>YYCurtok> is declared as a C<%nonassoc> token,
       
   258 it can be included in C<$_[0]-E<gt>YYExpect> list whenever the input
       
   259 tries to use it in an associative way. This is not a bug: the token
       
   260 IS expected to report an error if encountered.
       
   261 
       
   262 To detect such a thing in your error reporting sub, the following
       
   263 example should do the trick:
       
   264 
       
   265         grep { $_[0]->YYCurtok eq $_ } $_[0]->YYExpect
       
   266     and do {
       
   267         #Non-associative token used in an associative expression
       
   268     };
       
   269 
       
   270 Accessing semantic values on the left of your reducing rule is done
       
   271 through the method
       
   272 
       
   273     $_[0]->YYSemval( index )
       
   274 
       
   275 where index is an integer. An index in I<1 .. n> returns the same values
       
   276 as I<$_[1] .. $_[n]>, but I<-n .. 0> returns values on the left of the rule
       
   277 being reduced (it is related to I<$-n .. $0 .. $n> in yacc, but you
       
   278 cannot use I<$_[0]> or I<$_[-n]> constructs in Parse::Yapp for obvious reasons)
       
   279 
       
   280 
       
   281 There is also a provision for a user data area in the parser object,
       
   282 accessed by the method:
       
   283 
       
   284     $_[0]->YYData
       
   285 
       
   286 which returns a reference to an anonymous hash, which lets you have
       
   287 all of your parsing data held inside the object (see the Calc.yp
       
   288 or ParseYapp.yp files in the distribution for some examples).
       
   289 That's how you can make your parser module reentrant: all of your
       
   290 module states and variables are held inside the parser object.
       
   291 
       
   292 Note: unfortunately, method calls in Perl have a lot of overhead,
       
   293       and when YYData is used, it may be called a huge number
       
   294       of times. If you are not a *real* purist and efficiency
       
   295       is your concern, you may access directly the user-space
       
   296       in the object: $parser->{USER} which is a reference to an
       
   297       anonymous hash array, and then benchmark.
       
   298 
       
   299 If no action is specified for a rule, the equivalent of a default
       
   300 action is run, which returns the first parameter:
       
   301 
       
   302    { $_[1] }
       
   303 
       
   304 =item C<In rule actions>
       
   305 
       
   306 It is also possible to embed semantic actions inside of a rule:
       
   307 
       
   308     typedef:    TYPE { $type = $_[1] } identlist { ... } ;
       
   309 
       
   310 When Parse::Yapp's parser encounters such an embedded action, it modifies
       
   311 the grammar as if you wrote (although @x-1 is not a legal lhs value):
       
   312 
       
   313     @x-1:   /* empty */ { $type = $_[1] };
       
   314     typedef:    TYPE @x-1 identlist { ... } ;
       
   315 
       
   316 where I<x> is a sequential number incremented for each "in rule" action,
       
   317 and I<-1> represents the "dot position" in the rule where the action arises.
       
   318 
       
   319 In such actions, you can use I<$_[1]..$_[n]> variables, which are the
       
   320 semantic values on the left of your action.
       
   321 
       
   322 Be aware that the way Parse::Yapp modifies your grammar because of
       
   323 I<in rule actions> can produce, in some cases, spurious conflicts
       
   324 that wouldn't happen otherwise.  
       
   325 
       
   326 =item C<Generating the Parser Module>
       
   327 
       
   328 Now that your grammar file is written, you can use yapp on it
       
   329 to generate your parser module:
       
   330 
       
   331     yapp -v Calc.yp
       
   332 
       
   333 will create two files F<Calc.pm>, your parser module, and F<Calc.output>
       
   334 a verbose output of your parser rules, conflicts, warnings, states
       
   335 and summary.
       
   336 
       
   337 What you are missing now is a lexer routine.
       
   338 
       
   339 =item C<The Lexer sub>
       
   340 
       
   341 is called each time the parser needs to read the next token.
       
   342 
       
   343 It is called with only one argument that is the parser object itself,
       
   344 so you can access its methods, especially the
       
   345 
       
   346     $_[0]->YYData
       
   347 
       
   348 data area.
       
   349 
       
   350 It is its duty to return the next token and value to the parser.
       
   351 They C<must> be returned as a list of two variables, the first one
       
   352 is the token known by the parser (symbolic or literal), the second
       
   353 one being anything you want (usually the content of the token, or the
       
   354 literal value) from a simple scalar value to any complex reference,
       
   355 as the parsing driver never uses it but to call semantic actions:
       
   356 
       
   357     ( 'NUMBER', $num )
       
   358 or
       
   359     ( '>=', '>=' )
       
   360 or
       
   361     ( 'ARRAY', [ @values ] )
       
   362 
       
   363 When the lexer reaches the end of input, it must return the C<''>
       
   364 empty token with an undef value:
       
   365 
       
   366      ( '', undef )
       
   367 
       
   368 Note that your lexer should I<never> return C<'error'> as token
       
   369 value: for the driver, this is the error token used for error
       
   370 recovery and would lead to odd reactions.
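
Putting these rules together, a lexer might look like the sketch below. It is only an illustration under stated assumptions: C<MockParser> is a hypothetical stand-in that provides nothing but C<YYData>, the C<INPUT> key and the C<VAR> token name are invented for the example, and a real lexer would be passed to C<YYParse> rather than called in a loop by hand.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a generated parser: only YYData is provided,
# returning the per-object user hash.
package MockParser;
sub new    { return bless { USER => {} }, shift }
sub YYData { return $_[0]{USER} }

package main;

# Lexer sub: receives the parser object, returns a (token, value) pair.
sub lexer {
    my ($parser) = @_;
    my $data = $parser->YYData;

    for ($data->{INPUT}) {               # alias $_ to the remaining input
        return ('', undef) unless defined $_;
        s/^[ \t]+//;                     # skip leading blanks
        return ('', undef) unless length $_;   # end of input
        s/^([0-9]+(?:\.[0-9]+)?)// and return ('NUM', $1);
        s/^([A-Za-z][A-Za-z0-9_]*)// and return ('VAR', $1);
        s/^(.)//s and return ($1, $1);   # any other character is a literal
    }
}

my $parser = MockParser->new;
$parser->YYData->{INPUT} = 'x = 3 + 42';

# Drain the input the way the driver would, one token at a time.
my @tokens;
while (1) {
    my ($tok, $val) = lexer($parser);
    last if $tok eq '';
    push @tokens, "$tok($val)";
}
print join(' ', @tokens), "\n";
```

Keeping the remaining input in the C<YYData> hash rather than in a global variable means that two parser objects can lex different inputs at the same time.
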
       
   371 
       
   372 Now that you have your lexer written, maybe you will need to output
       
   373 meaningful error messages, instead of the default which is to print
       
   374 'Parse error.' on STDERR.
       
   375 
       
   376 So you will need an Error reporting sub.
       
   377 
       
   378 =item C<Error reporting routine>
       
   379 
       
   380 If you want one, write it knowing that it is passed as parameter
       
   381 the parser object. So you can share information with the lexer
       
   382 routine quite easily.
       
   383 
       
   384 You can also use the C<$_[0]-E<gt>YYErrok> method in it, which will
       
   385 resume parsing as if no error occurred. Of course, since the invalid
       
   386 token is still invalid, you're supposed to fix the problem by
       
   387 yourself.
       
   388 
       
   389 The method C<$_[0]-E<gt>YYLexer> may help you, as it returns a reference
       
   390 to the lexer routine, and can be called as
       
   391 
       
   392     ($tok,$val)=&{$_[0]->YYLexer}
       
   393 
       
   394 to get the next token and semantic value from the input stream. To
       
   395 make them current for the parser, use:
       
   396 
       
   397     ($_[0]->YYCurtok, $_[0]->YYCurval) = ($tok, $val)
       
   398 
       
   399 and know what you're doing...
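
As an illustration, an error reporting sub that only reads the parser state could be sketched as follows. C<MockParser> is a hypothetical stand-in exposing just the three accessors used here, and the message format is an arbitrary choice; in a real program the sub would be handed to C<YYParse> through C<yyerror>.

```perl
use strict;
use warnings;

# Hypothetical stand-in exposing the accessors used below; a real
# generated parser provides them itself.
package MockParser;
sub new {
    my ($class, %state) = @_;
    return bless { %state }, $class;
}
sub YYCurtok { return $_[0]{tok} }
sub YYCurval { return $_[0]{val} }
sub YYExpect { return @{ $_[0]{expect} } }

package main;

# Error reporting sub: receives the parser object as its only argument.
sub error_report {
    my ($parser) = @_;
    my $tok    = $parser->YYCurtok;
    my $val    = $parser->YYCurval;
    my @expect = $parser->YYExpect;

    # The empty token means the lexer reported end of input.
    my $found = $tok eq ''   ? 'end of input'
              : defined $val ? "'$val'"
              :                $tok;
    my $expected = join ' or ', map { "'$_'" } sort @expect;

    my $message = "Syntax error: found $found, expected $expected";
    print STDERR "$message\n";
    return $message;
}

my $parser  = MockParser->new(tok => 'NUM', val => 42, expect => [ ';', '+' ]);
my $message = error_report($parser);
```
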
       
   400 
       
   401 =item C<Parsing>
       
   402 
       
   403 Now you've got everything to do the parsing.
       
   404 
       
   405 First, use the parser module:
       
   406 
       
   407     use Calc;
       
   408 
       
   409 Then create the parser object:
       
   410 
       
   411     $parser=new Calc;
       
   412 
       
   413 Now, call the YYParse method, telling it where to find the lexer
       
   414 and error report subs:
       
   415 
       
   416     $result=$parser->YYParse(yylex => \&Lexer,
       
   417                            yyerror => \&ErrorReport);
       
   418 
       
   419 (assuming Lexer and ErrorReport subs have been written in your current
       
   420 package)
       
   421 
       
   422 The order in which parameters appear is unimportant.
       
   423 
       
   424 Et voila.
       
   425 
       
   426 The YYParse method will do the parse, then return the last semantic
       
   427 value returned, or undef if error recovery cannot recover.
       
   428 
       
   429 If you need to be sure the parse has been successful (in case your
       
   430 last returned semantic value I<is> undef) make a call to:
       
   431 
       
   432     $parser->YYNberr()
       
   433 
       
   434 which returns the total number of times the error reporting sub has been called.
       
   435 
       
   436 =item C<Error Recovery>
       
   437 
       
   438 in Parse::Yapp is implemented the same way it is in yacc.
       
   439 
       
   440 =item C<Debugging Parser>
       
   441 
       
   442 To debug your parser, you can call the YYParse method with a debug parameter:
       
   443 
       
   444     $parser->YYParse( ... , yydebug => value, ... )
       
   445 
       
   446 where value is a bitfield, each bit representing a specific debug output:
       
   447 
       
   448     Bit Value    Outputs
       
   449     0x01         Token reading (useful for Lexer debugging)
       
   450     0x02         States information
       
   451     0x04         Driver actions (shifts, reduces, accept...)
       
   452     0x08         Parse Stack dump
       
   453     0x10         Error Recovery tracing
       
   454 
       
   455 To get full debugging output, use
       
   456 
       
   457     yydebug => 0x1F
       
   458 
       
   459 Debugging output is sent to STDERR, and be aware that it can produce
       
   460 C<huge> outputs.
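
Since the value is a bitfield, individual outputs can be combined with a bitwise or. The sketch below gives the bits readable names; the constant names are invented for the example, as Parse::Yapp only documents the numeric values.

```perl
use strict;
use warnings;

# Illustrative names for the documented bit values.
use constant {
    DBG_TOKENS   => 0x01,    # token reading
    DBG_STATES   => 0x02,    # states information
    DBG_ACTIONS  => 0x04,    # driver actions
    DBG_STACK    => 0x08,    # parse stack dump
    DBG_RECOVERY => 0x10,    # error recovery tracing
};

# Trace only what the lexer returns and what the driver does with it.
my $lexer_and_actions = DBG_TOKENS | DBG_ACTIONS;

# All five bits together give the full-output value.
my $everything = DBG_TOKENS | DBG_STATES | DBG_ACTIONS
               | DBG_STACK  | DBG_RECOVERY;

printf "lexer and actions: 0x%02X\n", $lexer_and_actions;
printf "everything:        0x%02X\n", $everything;
```

The chosen value is then passed to YYParse as the C<yydebug> parameter.
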
       
   461 
       
   462 =item C<Standalone Parsers>
       
   463 
       
   464 By default, the parser modules generated will need the Parse::Yapp
       
   465 module installed on the system to run. They use the Parse::Yapp::Driver
       
   466 which can be safely shared between parsers in the same script.
       
   467 
       
   468 In the case you'd prefer to have a standalone module generated, use
       
   469 the C<-s> switch with yapp: this will automagically copy the driver
       
   470 code into your module so you can use/distribute it without the need
       
   471 of the Parse::Yapp module, making it really a C<Standalone Parser>.
       
   472 
       
   473 If you do so, please remember to include Parse::Yapp's copyright notice
       
   474 in your main module copyright, so others can know about Parse::Yapp module.
       
   475 
       
   476 =item C<Source file line numbers>
       
   477 
       
   478 by default will be included in the generated parser module, which will help
       
   479 to find the guilty line in your source file in case of a syntax error.
       
   480 You can disable this feature by compiling your grammar with yapp using
       
   481 the C<-n> switch.
       
   482 
       
   483 =back
       
   484 
       
   485 =head1 BUGS AND SUGGESTIONS
       
   486 
       
   487 If you find bugs, think of anything that could improve Parse::Yapp
       
   488 or have any questions related to it, feel free to contact the author.
       
   489 
       
   490 =head1 AUTHOR
       
   491 
       
   492 Francois Desarmenien  <francois@fdesar.net>
       
   493 
       
   494 =head1 SEE ALSO
       
   495 
       
   496 yapp(1) perl(1) yacc(1) bison(1).
       
   497 
       
   498 =head1 COPYRIGHT
       
   499 
       
   500 The Parse::Yapp module and its related modules and shell scripts are copyright
       
   501 (c) 1998-2001 Francois Desarmenien, France. All rights reserved.
       
   502 
       
   503 You may use and distribute them under the terms of either
       
   504 the GNU General Public License or the Artistic License,
       
   505 as specified in the Perl README file.
       
   506 
       
   507 If you use the "standalone parser" option so people don't need to install
       
   508 Parse::Yapp on their systems in order to run your software, this copyright
       
   509 notice should be included in your software copyright too, and the copyright
       
   510 notice in the embedded driver should be left untouched.
       
   511 
       
   512 =cut