diff -r 8b87ea768cb8 -r 60053dab7e2a dummy_foundation/lib/XML/Parser/Expat.pod --- a/dummy_foundation/lib/XML/Parser/Expat.pod Wed Jun 03 18:33:51 2009 +0100 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,511 +0,0 @@ -=head1 WARNING - -This manual page was copied from the XML::Parser distribution (version 2.27) -written by Clark Cooper. You can find newer versions at CPAN. - -=head1 NAME - -XML::Parser::Expat - Lowlevel access to James Clark's expat XML parser - -=head1 SYNOPSIS - - use XML::Parser::Expat; - - $parser = new XML::Parser::Expat; - $parser->setHandlers('Start' => \&sh, - 'End' => \&eh, - 'Char' => \&ch); - open(FOO, 'info.xml') or die "Couldn't open"; - $parser->parse(*FOO); - close(FOO); - # $parser->parse(' here we go '); - - sub sh - { - my ($p, $el, %atts) = @_; - $p->setHandlers('Char' => \&spec) - if ($el eq 'special'); - ... - } - - sub eh - { - my ($p, $el) = @_; - $p->setHandlers('Char' => \&ch) # Special elements won't contain - if ($el eq 'special'); # other special elements - ... - } - -=head1 DESCRIPTION - -This module provides an interface to James Clark's XML parser, expat. As in -expat, a single instance of the parser can only parse one document. Calls -to parsestring after the first for a given instance will die. - -Expat (and XML::Parser::Expat) are event based. As the parser recognizes -parts of the document (say the start or end of an XML element), then any -handlers registered for that type of an event are called with suitable -parameters. - -=head1 METHODS - -=over 4 - -=item new - -This is a class method, the constructor for XML::Parser::Expat. Options are -passed as keyword value pairs. The recognized options are: - -=over 4 - -=item * ProtocolEncoding - -The protocol encoding name. The default is none. The expat built-in -encodings are: C, C, C, and C. -Other encodings may be used if they have encoding maps in one of the -directories in the @Encoding_Path list. Setting the protocol encoding -overrides any encoding in the XML declaration. - -=item * Namespaces - -When this option is given with a true value, then the parser does namespace -processing. By default, namespace processing is turned off. When it is -turned on, the parser consumes I attributes and strips off prefixes -from element and attributes names where those prefixes have a defined -namespace. A name's namespace can be found using the L<"namespace"> method -and two names can be checked for absolute equality with the L<"eq_name"> -method. - -=item * NoExpand - -Normally, the parser will try to expand references to entities defined in -the internal subset. If this option is set to a true value, and a default -handler is also set, then the default handler will be called when an -entity reference is seen in text. This has no effect if a default handler -has not been registered, and it has no effect on the expansion of entity -references inside attribute values. - -=item * Stream_Delimiter - -This option takes a string value. When this string is found alone on a line -while parsing from a stream, then the parse is ended as if it saw an end of -file. The intended use is with a stream of xml documents in a MIME multipart -format. The string should not contain a trailing newline. - -=item * ErrorContext - -When this option is defined, errors are reported in context. The value -of ErrorContext should be the number of lines to show on either side of -the line in which the error occurred. - -=item * ParseParamEnt - -Unless standalone is set to "yes" in the XML declaration, setting this to -a true value allows the external DTD to be read, and parameter entities -to be parsed and expanded. - -=item * Base - -The base to use for relative pathnames or URLs. This can also be done by -using the base method. - -=back - -=item setHandlers(TYPE, HANDLER [, TYPE, HANDLER [...]]) - -This method registers handlers for the various events. If no handlers are -registered, then a call to parsestring or parsefile will only determine if -the corresponding XML document is well formed (by returning without error.) -This may be called from within a handler, after the parse has started. - -Setting a handler to something that evaluates to false unsets that -handler. - -This method returns a list of type, handler pairs corresponding to the -input. The handlers returned are the ones that were in effect before the -call to setHandlers. - -The recognized events and the parameters passed to the corresponding -handlers are: - -=over 4 - -=item * Start (Parser, Element [, Attr, Val [,...]]) - -This event is generated when an XML start tag is recognized. Parser is -an XML::Parser::Expat instance. Element is the name of the XML element that -is opened with the start tag. The Attr & Val pairs are generated for each -attribute in the start tag. - -=item * End (Parser, Element) - -This event is generated when an XML end tag is recognized. Note that -an XML empty tag () generates both a start and an end event. - -There is always a lower level start and end handler installed that wrap -the corresponding callbacks. This is to handle the context mechanism. -A consequence of this is that the default handler (see below) will not -see a start tag or end tag unless the default_current method is called. - -=item * Char (Parser, String) - -This event is generated when non-markup is recognized. The non-markup -sequence of characters is in String. A single non-markup sequence of -characters may generate multiple calls to this handler. Whatever the -encoding of the string in the original document, this is given to the -handler in UTF-8. - -=item * Proc (Parser, Target, Data) - -This event is generated when a processing instruction is recognized. - -=item * Comment (Parser, String) - -This event is generated when a comment is recognized. - -=item * CdataStart (Parser) - -This is called at the start of a CDATA section. - -=item * CdataEnd (Parser) - -This is called at the end of a CDATA section. - -=item * Default (Parser, String) - -This is called for any characters that don't have a registered handler. -This includes both characters that are part of markup for which no -events are generated (markup declarations) and characters that -could generate events, but for which no handler has been registered. - -Whatever the encoding in the original document, the string is returned to -the handler in UTF-8. - -=item * Unparsed (Parser, Entity, Base, Sysid, Pubid, Notation) - -This is called for a declaration of an unparsed entity. Entity is the name -of the entity. Base is the base to be used for resolving a relative URI. -Sysid is the system id. Pubid is the public id. Notation is the notation -name. Base and Pubid may be undefined. - -=item * Notation (Parser, Notation, Base, Sysid, Pubid) - -This is called for a declaration of notation. Notation is the notation name. -Base is the base to be used for resolving a relative URI. Sysid is the system -id. Pubid is the public id. Base, Sysid, and Pubid may all be undefined. - -=item * ExternEnt (Parser, Base, Sysid, Pubid) - -This is called when an external entity is referenced. Base is the base to be -used for resolving a relative URI. Sysid is the system id. Pubid is the public -id. Base, and Pubid may be undefined. - -This handler should either return a string, which represents the contents of -the external entity, or return an open filehandle that can be read to obtain -the contents of the external entity, or return undef, which indicates the -external entity couldn't be found and will generate a parse error. - -If an open filehandle is returned, it must be returned as either a glob -(*FOO) or as a reference to a glob (e.g. an instance of IO::Handle). The -parser will close the filehandle after using it. - -=item * Entity (Parser, Name, Val, Sysid, Pubid, Ndata) - -This is called when an entity is declared. For internal entities, the Val -parameter will contain the value and the remaining three parameters will -be undefined. For external entities, the Val parameter -will be undefined, the Sysid parameter will have the system id, the Pubid -parameter will have the public id if it was provided (it will be undefined -otherwise), the Ndata parameter will contain the notation for unparsed -entities. If this is a parameter entity declaration, then a '%' will be -prefixed to the name. - -Note that this handler and the Unparsed handler above overlap. If both are -set, then this handler will not be called for unparsed entities. - -=item * Element (Parser, Name, Model) - -The element handler is called when an element declaration is found. Name is -the element name, and Model is the content model as a string. - -=item * Attlist (Parser, Elname, Attname, Type, Default, Fixed) - -This handler is called for each attribute in an ATTLIST declaration. -So an ATTLIST declaration that has multiple attributes -will generate multiple calls to this handler. The Elname parameter is the -name of the element with which the attribute is being associated. The Attname -parameter is the name of the attribute. Type is the attribute type, given as -a string. Default is the default value, which will either be "#REQUIRED", -"#IMPLIED" or a quoted string (i.e. the returned string will begin and end -with a quote character). If Fixed is true, then this is a fixed attribute. - -=item * Doctype (Parser, Name, Sysid, Pubid, Internal) - -This handler is called for DOCTYPE declarations. Name is the document type -name. Sysid is the system id of the document type, if it was provided, -otherwise it's undefined. Pubid is the public id of the document type, -which will be undefined if no public id was given. Internal is the internal -subset, given as a string. If there was no internal subset, it will be -undefined. Internal will contain all whitespace, comments, processing -instructions, and declarations seen in the internal subset. The declarations -will be there whether or not they have been processed by another handler -(except for unparsed entities processed by the Unparsed handler). However, -comments and processing instructions will not appear if they've been processed -by their respective handlers. - -=item * XMLDecl (Parser, Version, Encoding, Standalone) - -This handler is called for xml declarations. Version is a string containg -the version. Encoding is either undefined or contains an encoding string. -Standalone will be either true, false, or undefined if the standalone attribute -is yes, no, or not made respectively. - -=back - -=item namespace(name) - -Return the URI of the namespace that the name belongs to. If the name doesn't -belong to any namespace, an undef is returned. This is only valid on names -received through the Start or End handlers from a single document, or through -a call to the generate_ns_name method. In other words, don't use names -generated from one instance of XML::Parser::Expat with other instances. - -=item eq_name(name1, name2) - -Return true if name1 and name2 are identical (i.e. same name and from -the same namespace.) This is only meaningful if both names were obtained -through the Start or End handlers from a single document, or through -a call to the generate_ns_name method. - -=item generate_ns_name(name, namespace) - -Return a name, associated with a given namespace, good for using with the -above 2 methods. The namespace argument should be the namespace URI, not -a prefix. - -=item new_ns_prefixes - -When called from a start tag handler, returns namespace prefixes declared -with this start tag. If called elsewere (or if there were no namespace -prefixes declared), it returns an empty list. Setting of the default -namespace is indicated with '#default' as a prefix. - -=item expand_ns_prefix(prefix) - -Return the uri to which the given prefix is currently bound. Returns -undef if the prefix isn't currently bound. Use '#default' to find the -current binding of the default namespace (if any). - -=item current_ns_prefixes - -Return a list of currently bound namespace prefixes. The order of the -the prefixes in the list has no meaning. If the default namespace is -currently bound, '#default' appears in the list. - -=item recognized_string - -Returns the string from the document that was recognized in order to call -the current handler. For instance, when called from a start handler, it -will give us the the start-tag string. The string is encoded in UTF-8. - -=item original_string - -Returns the verbatim string from the document that was recognized in -order to call the current handler. The string is in the original document -encoding. - -=item default_current - -When called from a handler, causes the sequence of characters that generated -the corresponding event to be sent to the default handler (if one is -registered). Use of this method is deprecated in favor the recognized_string -method, which you can use without installing a default handler. - -=item xpcroak(message) - -Concatenate onto the given message the current line number within the -XML document plus the message implied by ErrorContext. Then croak with -the formed message. - -=item xpcarp(message) - -Concatenate onto the given message the current line number within the -XML document plus the message implied by ErrorContext. Then carp with -the formed message. - -=item current_line - -Returns the line number of the current position of the parse. - -=item current_column - -Returns the column number of the current position of the parse. - -=item current_byte - -Returns the current position of the parse. - -=item base([NEWBASE]); - -Returns the current value of the base for resolving relative URIs. If -NEWBASE is supplied, changes the base to that value. - -=item context - -Returns a list of element names that represent open elements, with the -last one being the innermost. Inside start and end tag handlers, this -will be the tag of the parent element. - -=item current_element - -Returns the name of the innermost currently opened element. Inside -start or end handlers, returns the parent of the element associated -with those tags. - -=item in_element(NAME) - -Returns true if NAME is equal to the name of the innermost currently opened -element. If namespace processing is being used and you want to check -against a name that may be in a namespace, then use the generate_ns_name -method to create the NAME argument. - -=item within_element(NAME) - -Returns the number of times the given name appears in the context list. -If namespace processing is being used and you want to check -against a name that may be in a namespace, then use the generate_ns_name -method to create the NAME argument. - -=item depth - -Returns the size of the context list. - -=item element_index - -Returns an integer that is the depth-first visit order of the current -element. This will be zero outside of the root element. For example, -this will return 1 when called from the start handler for the root element -start tag. - -=item skip_until(INDEX) - -INDEX is an integer that represents an element index. When this method -is called, all handlers are suspended until the start tag for an element -that has an index number equal to INDEX is seen. If a start handler has -been set, then this is the first tag that the start handler will see -after skip_until has been called. - - -=item position_in_context(LINES) - -Returns a string that shows the current parse position. LINES should be -an integer >= 0 that represents the number of lines on either side of the -current parse line to place into the returned string. - -=item xml_escape(TEXT [, CHAR [, CHAR ...]]) - -Returns TEXT with markup characters turned into character entities. Any -additional characters provided as arguments are also turned into character -references where found in TEXT. - -=item parse (SOURCE) - -The SOURCE parameter should either be a string containing the whole XML -document, or it should be an open IO::Handle. Only a single document -may be parsed for a given instance of XML::Parser::Expat, so this will croak -if it's been called previously for this instance. - -=item parsestring(XML_DOC_STRING) - -Parses the given string as an XML document. Only a single document may be -parsed for a given instance of XML::Parser::Expat, so this will die if either -parsestring or parsefile has been called for this instance previously. - -This method is deprecated in favor of the parse method. - -=item parsefile(FILENAME) - -Parses the XML document in the given file. Will die if parsestring or -parsefile has been called previously for this instance. - -=item is_defaulted(ATTNAME) - -NO LONGER WORKS. To find out if an attribute is defaulted please use -the specified_attr method. - -=item specified_attr - -When the start handler receives lists of attributes and values, the -non-defaulted (i.e. explicitly specified) attributes occur in the list -first. This method returns the number of specified items in the list. -So if this number is equal to the length of the list, there were no -defaulted values. Otherwise the number points to the index of the -first defaulted attribute name. - -=item finish - -Unsets all handlers (including internal ones that set context), but expat -continues parsing to the end of the document or until it finds an error. -It should finish up a lot faster than with the handlers set. - -=item release - -There are data structures used by XML::Parser::Expat that have circular -references. This means that these structures will never be garbage -collected unless these references are explicitly broken. Calling this -method breaks those references (and makes the instance unusable.) - -Normally, higher level calls handle this for you, but if you are using -XML::Parser::Expat directly, then it's your responsibility to call it. - -=back - -=head2 XML::Parser::ExpatNB Methods - -The class XML::Parser::ExpatNB is a subclass of XML::Parser::Expat used -for non-blocking access to the expat library. It does not support the parse, -parsestring, or parsefile methods, but it does have these additional methods: - -=over 4 - -=item parse_more(DATA) - -Feed expat more text to munch on. - -=item parse_done - -Tell expat that it's gotten the whole document. - -=back - -=head1 FUNCTIONS - -=over 4 - -=item XML::Parser::Expat::load_encoding(ENCODING) - -Load an external encoding. ENCODING is either the name of an encoding or -the name of a file. The basename is converted to lowercase and a '.enc' -extension is appended unless there's one already there. Then, unless -it's an absolute pathname (i.e. begins with '/'), the first file by that -name discovered in the @Encoding_Path path list is used. - -The encoding in the file is loaded and kept in the %Encoding_Table -table. Earlier encodings of the same name are replaced. - -This function is automaticly called by expat when it encounters an encoding -it doesn't know about. Expat shouldn't call this twice for the same -encoding name. The only reason users should use this function is to -explicitly load an encoding not contained in the @Encoding_Path list. - -=back - -=head1 AUTHORS - -Larry Wall > wrote version 1.0. - -Clark Cooper > picked up support, changed the API -for this version (2.x), provided documentation, and added some standard -package features. - -=cut