deprecated/buildtools/buildsystemtools/lib/Parse/Yapp.pm
changeset 662 60be34e1b006
parent 655 3f65fd25dfd4
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/deprecated/buildtools/buildsystemtools/lib/Parse/Yapp.pm	Wed Oct 27 16:03:51 2010 +0800
@@ -0,0 +1,512 @@
+#
+# Module Parse::Yapp.pm.
+#
+# Copyright (c) 1998-2001, Francois Desarmenien, all right reserved.
+#
+# See the Copyright section at the end of the Parse/Yapp.pm pod section
+# for usage and distribution rights.
+#
+#
+package Parse::Yapp;
+
+use strict;
+use vars qw($VERSION @ISA);
+@ISA = qw(Parse::Yapp::Output);
+
+use Parse::Yapp::Output;
+
+# $VERSION is in Parse/Yapp/Driver.pm
+
+
+1;
+
+__END__
+
+=head1 NAME
+
+Parse::Yapp - Perl extension for generating and using LALR parsers. 
+
+=head1 SYNOPSIS
+
+  yapp -m MyParser grammar_file.yp
+
+  ...
+
+  use MyParser;
+
+  $parser=new MyParser();
+  $value=$parser->YYParse(yylex => \&lexer_sub, yyerror => \&error_sub);
+
+  $nberr=$parser->YYNberr();
+
+  $parser->YYData->{DATA}= [ 'Anything', 'You Want' ];
+
+  $data=$parser->YYData->{DATA}[0];
+
+=head1 DESCRIPTION
+
+Parse::Yapp (Yet Another Perl Parser compiler) is a collection of modules
+that let you generate and use yacc like thread safe (reentrant) parsers with
+perl object oriented interface.
+
+The script yapp is a front-end to the Parse::Yapp module and let you
+easily create a Perl OO parser from an input grammar file.
+
+=head2 The Grammar file
+
+=over 4
+
+=item C<Comments>
+
+Through all your files, comments are either Perl style, introduced by I<#>
+up to the end of line, or C style, enclosed between  I</*> and I<*/>.
+
+
+=item C<Tokens and string literals>
+
+
+Through all the grammar files, two kind of symbols may appear:
+I<Non-terminal> symbols, called also I<left-hand-side> symbols,
+which are the names of your rules, and I<Terminal> symbols, called
+also I<Tokens>.
+
+Tokens are the symbols your lexer function will feed your parser with
+(see below). They are of two flavours: symbolic tokens and string
+literals.
+
+Non-terminals and symbolic tokens share the same identifier syntax:
+
+		[A-Za-z][A-Za-z0-9_]*
+
+String literals are enclosed in single quotes and can contain almost
+anything. They will be output to your parser file double-quoted, making
+any special character as such. '"', '$' and '@' will be automatically
+quoted with '\', making their writing more natural. On the other hand,
+if you need a single quote inside your literal, just quote it with '\'.
+
+You cannot have a literal I<'error'> in your grammar as it would
+confuse the driver with the I<error> token. Use a symbolic token instead.
+In case you inadvertently use it, this will produce a warning telling you
+you should have written it I<error> and will treat it as if it were the
+I<error> token, which is certainly NOT what you meant.
+
+
+=item C<Grammar file syntax>
+
+It is very close to yacc syntax (in fact, I<Parse::Yapp> should compile
+a clean I<yacc> grammar without any modification, whereas the opposite
+is not true).
+
+This file is divided in three sections, separated by C<%%>:
+
+	header section
+	%%
+	rules section
+	%%
+	footer section
+
+=over 4
+
+=item B<The Header Section> section may optionally contain:
+
+=item *
+
+One or more code blocks enclosed inside C<%{> and C<%}> just like in
+yacc. They may contain any valid Perl code and will be copied verbatim
+at the very beginning of the parser module. They are not as useful as
+they are in yacc, but you can use them, for example, for global variable
+declarations, though you will notice later that such global variables can
+be avoided to make a reentrant parser module.
+
+=item *
+
+Precedence declarations, introduced by C<%left>, C<%right> and C<%nonassoc>
+specifying associativity, followed by the list of tokens or litterals
+having the same precedence and associativity.
+The precedence beeing the latter declared will be having the highest level.
+(see the yacc or bison manuals for a full explanation of how they work,
+as they are implemented exactly the same way in Parse::Yapp)
+
+=item *
+
+C<%start> followed by a rule's left hand side, declaring this rule to
+be the starting rule of your grammar. The default, when C<%start> is not
+used, is the first rule in your grammar section.
+
+=item *
+
+C<%token> followed by a list of symbols, forcing them to be recognized
+as tokens, generating a syntax error if used in the left hand side of
+a rule declaration.
+Note that in Parse::Yapp, you I<don't> need to declare tokens as in yacc: any
+symbol not appearing as a left hand side of a rule is considered to be
+a token.
+Other yacc declarations or constructs such as C<%type> and C<%union> are
+parsed but (almost) ignored.
+
+=item *
+
+C<%expect> followed by a number, suppress warnings about number of Shift/Reduce
+conflicts when both numbers match, a la bison.
+
+
+=item B<The Rule Section> contains your grammar rules:
+
+A rule is made of a left-hand-side symbol, followed by a C<':'> and one
+or more right-hand-sides separated by C<'|'> and terminated by a C<';'>:
+
+    exp:    exp '+' exp
+        |   exp '-' exp
+        ;
+
+A right hand side may be empty:
+
+    input:  #empty
+        |   input line
+        ;
+
+(if you have more than one empty rhs, Parse::Yapp will issue a warning,
+as this is usually a mistake, and you will certainly have a reduce/reduce
+conflict)
+
+
+A rhs may be followed by an optional C<%prec> directive, followed
+by a token, giving the rule an explicit precedence (see yacc manuals
+for its precise meaning) and optionnal semantic action code block (see
+below).
+
+    exp:   '-' exp %prec NEG { -$_[1] }
+        |  exp '+' exp       { $_[1] + $_[3] }
+        |  NUM
+        ;
+
+Note that in Parse::Yapp, a lhs I<cannot> appear more than once as
+a rule name (This differs from yacc).
+
+
+=item C<The footer section>
+
+may contain any valid Perl code and will be appended at the very end
+of your parser module. Here you can write your lexer, error report
+subs and anything relevant to you parser.
+
+=item C<Semantic actions>
+
+Semantic actions are run every time a I<reduction> occurs in the
+parsing flow and they must return a semantic value.
+
+They are (usually, but see below C<In rule actions>) written at
+the very end of the rhs, enclosed with C<{ }>, and are copied verbatim
+to your parser file, inside of the rules table.
+
+Be aware that matching braces in Perl is much more difficult than
+in C: inside strings they don't need to match. While in C it is
+very easy to detect the beginning of a string construct, or a
+single character, it is much more difficult in Perl, as there
+are so many ways of writing such literals. So there is no check
+for that today. If you need a brace in a double-quoted string, just
+quote it (C<\{> or C<\}>). For single-quoted strings, you will need
+to make a comment matching it I<in th right order>.
+Sorry for the inconvenience.
+
+    {
+        "{ My string block }".
+        "\{ My other string block \}".
+        qq/ My unmatched brace \} /.
+        # Force the match: {
+        q/ for my closing brace } /
+        q/ My opening brace { /
+        # must be closed: }
+    }
+
+All of these constructs should work.
+
+
+In Parse::Yapp, semantic actions are called like normal Perl sub calls,
+with their arguments passed in C<@_>, and their semantic value are
+their return values.
+
+$_[1] to $_[n] are the parameters just as $1 to $n in yacc, while
+$_[0] is the parser object itself.
+
+Having $_[0] beeing the parser object itself allows you to call
+parser methods. Thats how the yacc macros are implemented:
+
+	yyerrok is done by calling $_[0]->YYErrok
+	YYERROR is done by calling $_[0]->YYError
+	YYACCEPT is done by calling $_[0]->YYAccept
+	YYABORT is done by calling $_[0]->YYAbort
+
+All those methods explicitly return I<undef>, for convenience.
+
+    YYRECOVERING is done by calling $_[0]->YYRecovering
+
+Four useful methods in error recovery sub
+
+    $_[0]->YYCurtok
+    $_[0]->YYCurval
+    $_[0]->YYExpect
+    $_[0]->YYLexer
+
+return respectivly the current input token that made the parse fail,
+its semantic value (both can be used to modify their values too, but
+I<know what you are doing> ! See I<Error reporting routine> section for
+an example), a list which contains the tokens the parser expected when
+the failure occured and a reference to the lexer routine.
+
+Note that if C<$_[0]-E<gt>YYCurtok> is declared as a C<%nonassoc> token,
+it can be included in C<$_[0]-E<gt>YYExpect> list whenever the input
+try to use it in an associative way. This is not a bug: the token
+IS expected to report an error if encountered.
+
+To detect such a thing in your error reporting sub, the following
+example should do the trick:
+
+        grep { $_[0]->YYCurtok eq $_ } $_[0]->YYExpect
+    and do {
+        #Non-associative token used in an associative expression
+    };
+
+Accessing semantics values on the left of your reducing rule is done
+through the method
+
+    $_[0]->YYSemval( index )
+
+where index is an integer. Its value being I<1 .. n> returns the same values
+than I<$_[1] .. $_[n]>, but I<-n .. 0> returns values on the left of the rule
+beeing reduced (It is related to I<$-n .. $0 .. $n> in yacc, but you
+cannot use I<$_[0]> or I<$_[-n]> constructs in Parse::Yapp for obvious reasons)
+
+
+There is also a provision for a user data area in the parser object,
+accessed by the method:
+
+    $_[0]->YYData
+
+which returns a reference to an anonymous hash, which let you have
+all of your parsing data held inside the object (see the Calc.yp
+or ParseYapp.yp files in the distribution for some examples).
+That's how you can make you parser module reentrant: all of your
+module states and variables are held inside the parser object.
+
+Note: unfortunatly, method calls in Perl have a lot of overhead,
+      and when YYData is used, it may be called a huge number
+      of times. If your are not a *real* purist and efficiency
+      is your concern, you may access directly the user-space
+      in the object: $parser->{USER} wich is a reference to an
+      anonymous hash array, and then benchmark.
+
+If no action is specified for a rule, the equivalant of a default
+action is run, which returns the first parameter:
+
+   { $_[1] }
+
+=item C<In rule actions>
+
+It is also possible to embed semantic actions inside of a rule:
+
+    typedef:    TYPE { $type = $_[1] } identlist { ... } ;
+
+When the Parse::Yapp's parser encounter such an embedded action, it modifies
+the grammar as if you wrote (although @x-1 is not a legal lhs value):
+
+    @x-1:   /* empty */ { $type = $_[1] };
+    typedef:    TYPE @x-1 identlist { ... } ;
+
+where I<x> is a sequential number incremented for each "in rule" action,
+and I<-1> represents the "dot position" in the rule where the action arises.
+
+In such actions, you can use I<$_[1]..$_[n]> variables, which are the
+semantic values on the left of your action.
+
+Be aware that the way Parse::Yapp modifies your grammar because of
+I<in rule actions> can produce, in some cases, spurious conflicts
+that wouldn't happen otherwise.  
+
+=item C<Generating the Parser Module>
+
+Now that you grammar file is written, you can use yapp on it
+to generate your parser module:
+
+    yapp -v Calc.yp
+
+will create two files F<Calc.pm>, your parser module, and F<Calc.output>
+a verbose output of your parser rules, conflicts, warnings, states
+and summary.
+
+What your are missing now is a lexer routine.
+
+=item C<The Lexer sub>
+
+is called each time the parser need to read the next token.
+
+It is called with only one argument that is the parser object itself,
+so you can access its methods, specially the
+
+    $_[0]->YYData
+
+data area.
+
+It is its duty to return the next token and value to the parser.
+They C<must> be returned as a list of two variables, the first one
+is the token known by the parser (symbolic or literal), the second
+one beeing anything you want (usualy the content of the token, or the
+literal value) from a simple scalar value to any complex reference,
+as the parsing driver never use it but to call semantic actions:
+
+    ( 'NUMBER', $num )
+or
+    ( '>=', '>=' )
+or
+    ( 'ARRAY', [ @values ] )
+
+When the lexer reach the end of input, it must return the C<''>
+empty token with an undef value:
+
+     ( '', undef )
+
+Note that your lexer should I<never> return C<'error'> as token
+value: for the driver, this is the error token used for error
+recovery and would lead to odd reactions.
+
+Now that you have your lexer written, maybe you will need to output
+meaningful error messages, instead of the default which is to print
+'Parse error.' on STDERR.
+
+So you will need an Error reporting sub.
+
+item C<Error reporting routine>
+
+If you want one, write it knowing that it is passed as parameter
+the parser object. So you can share information whith the lexer
+routine quite easily.
+
+You can also use the C<$_[0]-E<gt>YYErrok> method in it, which will
+resume parsing as if no error occured. Of course, since the invalid
+token is still invalid, you're supposed to fix the problem by
+yourself.
+
+The method C<$_[0]-E<gt>YYLexer> may help you, as it returns a reference
+to the lexer routine, and can be called as
+
+    ($tok,$val)=&{$_[0]->Lexer}
+
+to get the next token and semantic value from the input stream. To
+make them current for the parser, use:
+
+    ($_[0]->YYCurtok, $_[0]->YYCurval) = ($tok, $val)
+
+and know what you're doing...
+
+=item C<Parsing>
+
+Now you've got everything to do the parsing.
+
+First, use the parser module:
+
+    use Calc;
+
+Then create the parser object:
+
+    $parser=new Calc;
+
+Now, call the YYParse method, telling it where to find the lexer
+and error report subs:
+
+    $result=$parser->YYParse(yylex => \&Lexer,
+                           yyerror => \&ErrorReport);
+
+(assuming Lexer and ErrorReport subs have been written in your current
+package)
+
+The order in which parameters appear is unimportant.
+
+Et voila.
+
+The YYParse method will do the parse, then return the last semantic
+value returned, or undef if error recovery cannot recover.
+
+If you need to be sure the parse has been successful (in case your
+last returned semantic value I<is> undef) make a call to:
+
+    $parser->YYNberr()
+
+which returns the total number of time the error reporting sub has been called.
+
+=item C<Error Recovery>
+
+in Parse::Yapp is implemented the same way it is in yacc.
+
+=item C<Debugging Parser>
+
+To debug your parser, you can call the YYParse method with a debug parameter:
+
+    $parser->YYParse( ... , yydebug => value, ... )
+
+where value is a bitfield, each bit representing a specific debug output:
+
+    Bit Value    Outputs
+    0x01         Token reading (useful for Lexer debugging)
+    0x02         States information
+    0x04         Driver actions (shifts, reduces, accept...)
+    0x08         Parse Stack dump
+    0x10         Error Recovery tracing
+
+To have a full debugging ouput, use
+
+    debug => 0x1F
+
+Debugging output is sent to STDERR, and be aware that it can produce
+C<huge> outputs.
+
+=item C<Standalone Parsers>
+
+By default, the parser modules generated will need the Parse::Yapp
+module installed on the system to run. They use the Parse::Yapp::Driver
+which can be safely shared between parsers in the same script.
+
+In the case you'd prefer to have a standalone module generated, use
+the C<-s> switch with yapp: this will automagically copy the driver
+code into your module so you can use/distribute it without the need
+of the Parse::Yapp module, making it really a C<Standalone Parser>.
+
+If you do so, please remember to include Parse::Yapp's copyright notice
+in your main module copyright, so others can know about Parse::Yapp module.
+
+=item C<Source file line numbers>
+
+by default will be included in the generated parser module, which will help
+to find the guilty line in your source file in case of a syntax error.
+You can disable this feature by compiling your grammar with yapp using
+the C<-n> switch.
+
+=back
+
+=head1 BUGS AND SUGGESTIONS
+
+If you find bugs, think of anything that could improve Parse::Yapp
+or have any questions related to it, feel free to contact the author.
+
+=head1 AUTHOR
+
+Francois Desarmenien  <francois@fdesar.net>
+
+=head1 SEE ALSO
+
+yapp(1) perl(1) yacc(1) bison(1).
+
+=head1 COPYRIGHT
+
+The Parse::Yapp module and its related modules and shell scripts are copyright
+(c) 1998-2001 Francois Desarmenien, France. All rights reserved.
+
+You may use and distribute them under the terms of either
+the GNU General Public License or the Artistic License,
+as specified in the Perl README file.
+
+If you use the "standalone parser" option so people don't need to install
+Parse::Yapp on their systems in order to run you software, this copyright
+noticed should be included in your software copyright too, and the copyright
+notice in the embedded driver should be left untouched.
+
+=cut