doc/src/xml-processing/xml-processing.qdoc
author Eckhart Koeppen <eckhart.koppen@nokia.com>
Wed, 28 Apr 2010 13:15:16 +0300
branchRCL_3
changeset 15 b25b6dc3ff8b
parent 7 3f74d0d4af4c
permissions -rw-r--r--
2010-17 8b4fb6db9a24f58dafbd4734d9c4a87a72f9ad8f

/****************************************************************************
**
** Copyright (C) 2010 Nokia Corporation and/or its subsidiary(-ies).
** All rights reserved.
** Contact: Nokia Corporation (qt-info@nokia.com)
**
** This file is part of the documentation of the Qt Toolkit.
**
** $QT_BEGIN_LICENSE:LGPL$
** No Commercial Usage
** This file contains pre-release code and may not be distributed.
** You may use this file in accordance with the terms and conditions
** contained in the Technology Preview License Agreement accompanying
** this package.
**
** GNU Lesser General Public License Usage
** Alternatively, this file may be used under the terms of the GNU Lesser
** General Public License version 2.1 as published by the Free Software
** Foundation and appearing in the file LICENSE.LGPL included in the
** packaging of this file.  Please review the following information to
** ensure the GNU Lesser General Public License version 2.1 requirements
** will be met: http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html.
**
** In addition, as a special exception, Nokia gives you certain additional
** rights.  These rights are described in the Nokia Qt LGPL Exception
** version 1.1, included in the file LGPL_EXCEPTION.txt in this package.
**
** If you have questions regarding the use of this file, please contact
** Nokia at qt-info@nokia.com.
**
**
**
**
**
**
**
**
** $QT_END_LICENSE$
**
****************************************************************************/

/*!
    \group xml-tools
    \title XML Classes

    \brief Classes that support XML, via, for example DOM and SAX.

    These classes are relevant to XML users.
    
    \generatelist{related}
*/

/*!
    \page xml-processing.html
    \title XML Processing
    \brief An Overview of the XML processing facilities in Qt.

    In addition to core XML support, classes for higher level querying
    and manipulation of XML data are provided by the QtXmlPatterns
    module. In the QtSvg module, the QSvgRenderer and QSvgGenerator
    classes can read and write a subset of SVG, an XML-based file
    format. Qt also provides helper functions that may be useful to
    those working with XML and XHTML: see Qt::escape() and
    Qt::convertFromPlainText().

    \section1 Topics:

    \list
    \o \l {Classes for XML Processing}
    \o \l {An Introduction to Namespaces}
    \o \l {XML Streaming}
    \o \l {The SAX Interface}
    \o \l {Working with the DOM Tree}
    \o \l {Using XML Technologies}{XQuery/XPath and XML Schema}
        \list
        \o \l{A Short Path to XQuery}
        \endlist
    \endlist
    
    \section1 Classes for XML Processing

    These classes are relevant to XML users.

    \annotatedlist xml-tools
*/

/*!
    \page xml-namespaces.html
    \title An Introduction to Namespaces
    \target namespaces

    \contentspage XML Processing
    \nextpage XML Streaming

    Parts of the Qt XML module documentation assume that you are familiar
    with XML namespaces. Here we present a brief introduction; skip to
    \link #namespacesConventions Qt XML documentation conventions \endlink
    if you already know this material.

    Namespaces are a concept introduced into XML to allow a more modular
    design. With their help data processing software can easily resolve
    naming conflicts in XML documents.

    Consider the following example:

    \snippet doc/src/snippets/code/doc_src_qtxml.qdoc 6

    Here we find three different uses of the name \e title. If you wish to
    process this document you will encounter problems because each of the
    \e titles should be displayed in a different manner -- even though
    they have the same name.

    The solution would be to have some means of identifying the first
    occurrence of \e title as the title of a book, i.e. to use the \e
    title element of a book namespace to distinguish it from, for example,
    the chapter title, e.g.:
    \snippet doc/src/snippets/code/doc_src_qtxml.qdoc 7

    \e book in this case is a \e prefix denoting the namespace.

    Before we can apply a namespace to element or attribute names we must
    declare it.

    Namespaces are URIs like \e http://www.example.com/fnord/book/. This
    does not mean that data must be available at this address; the URI is
    simply used to provide a unique name.

    We declare namespaces in the same way as attributes; strictly speaking
    they \e are attributes. To make for example \e
    http://www.example.com/fnord/ the document's default XML namespace \e
    xmlns we write

    \snippet doc/src/snippets/code/doc_src_qtxml.qdoc 8

    To distinguish the \e http://www.example.com/fnord/book/ namespace from
    the default, we must supply it with a prefix:

    \snippet doc/src/snippets/code/doc_src_qtxml.qdoc 9

    A namespace that is declared like this can be applied to element and
    attribute names by prepending the appropriate prefix and a ":"
    delimiter. We have already seen this with the \e book:title element.

    Element names without a prefix belong to the default namespace. This
    rule does not apply to attributes: an attribute without a prefix does
    not belong to any of the declared XML namespaces at all. Attributes
    always belong to the "traditional" namespace of the element in which
    they appear. A "traditional" namespace is not an XML namespace, it
    simply means that all attribute names belonging to one element must be
    different. Later we will see how to assign an XML namespace to an
    attribute.

    Due to the fact that attributes without prefixes are not in any XML
    namespace there is no collision between the attribute \e title (that
    belongs to the \e author element) and for example the \e title element
    within a \e chapter.

    Let's clarify this with an example:
    \snippet doc/src/snippets/code/doc_src_qtxml.qdoc 10

    Within the \e document element we have two namespaces declared. The
    default namespace \e http://www.example.com/fnord/ applies to the \e
    book element, the \e chapter element, the appropriate \e title element
    and of course to \e document itself.

    The \e book:author and \e book:title elements belong to the namespace
    with the URI \e http://www.example.com/fnord/book/.

    The two \e book:author attributes \e title and \e name have no XML
    namespace assigned. They are only members of the "traditional"
    namespace of the element \e book:author, meaning that for example two
    \e title attributes in \e book:author are forbidden.

    In the above example we circumvent the last rule by adding a \e title
    attribute from the \e http://www.example.com/fnord/ namespace to \e
    book:author: the \e fnord:title comes from the namespace with the
    prefix \e fnord that is declared in the \e book:author element.

    Clearly the \e fnord namespace has the same namespace URI as the
    default namespace. So why didn't we simply use the default namespace
    we'd already declared? The answer is quite complex:
    \list
    \o attributes without a prefix don't belong to any XML namespace at
    all, not even to the default namespace;
    \o additionally omitting the prefix would lead to a \e title-title clash;
    \o writing it as \e xmlns:title would declare a new namespace with the
    prefix \e title instead of applying the default \e xmlns namespace.
    \endlist

    With the Qt XML classes elements and attributes can be accessed in two
    ways: either by refering to their qualified names consisting of the
    namespace prefix and the "real" name (or \e local name) or by the
    combination of local name and namespace URI.

    More information on XML namespaces can be found at
    \l http://www.w3.org/TR/REC-xml-names/.

    \target namespacesConventions
    \section1 Conventions Used in the Qt XML Documentation

    The following terms are used to distinguish the parts of names within
    the context of namespaces:
    \list
    \o  The \e {qualified name}
        is the name as it appears in the document. (In the above example \e
        book:title is a qualified name.)
    \o  A \e {namespace prefix} in a qualified name
        is the part to the left of the ":". (\e book is the namespace prefix in
        \e book:title.)
    \o  The \e {local part} of a name (also refered to as the \e {local
        name}) appears to the right of the ":". (Thus \e title is the
        local part of \e book:title.)
    \o  The \e {namespace URI} ("Uniform Resource Identifier") is a unique
        identifier for a namespace. It looks like a URL
        (e.g. \e http://www.example.com/fnord/ ) but does not require
        data to be accessible by the given protocol at the named address.
    \endlist

    Elements without a ":" (like \e chapter in the example) do not have a
    namespace prefix. In this case the local part and the qualified name
    are identical (i.e. \e chapter).

    \sa {DOM Bookmarks Example}, {SAX Bookmarks Example}
*/

/*!
    \page xml-streaming.html
    \title XML Streaming

    \previouspage An Introduction to Namespaces
    \contentspage XML Processing
    \nextpage The SAX Interface

    Since version 4.3, Qt provides two new classes for reading and
    writing XML: QXmlStreamReader and QXmlStreamWriter.

    The QXmlStreamReader and QXmlStreamWriter are two new classes provided
    in Qt 4.3 and later. A stream reader reports an XML document as a stream
    of tokens. This differs from SAX as SAX applications provide handlers to
    receive XML events from the parser whereas the QXmlStreamReader drives the
    loop, pulling tokens from the reader when they are needed.
    This pulling approach makes it possible to build recursive descent parsers,
    allowing XML parsing code to be split into different methods or classes.

    QXmlStreamReader is a well-formed XML 1.0 parser that excludes external
    parsed entities. Hence, data provided by the stream reader adheres to the
    W3C's criteria for well-formed XML, as long as no error occurs. Otherwise,
    functions such as \l{QXmlStreamReader::atEnd()}{atEnd()},
    \l{QXmlStreamReader::error()}{error()} and \l{QXmlStreamReader::hasError()}
    {hasError()} can be used to check and view the errors.

    An example of QXmlStreamReader implementation would be the \c XbelReader in
    \l{QXmlStream Bookmarks Example}, which is a subclass of QXmlStreamReader.
    The constructor takes \a treeWidget as a parameter and the class has Xbel
    specific functions:

    \snippet examples/xml/streambookmarks/xbelreader.h 1

    \dots
    \snippet examples/xml/streambookmarks/xbelreader.h 2
    \dots

    The \c read() function accepts a QIODevice and sets it with
    \l{QXmlStreamReader::setDevice()}{setDevice()}. The
    \l{QXmlStreamReader::raiseError()}{raiseError()} function is used to
    display a custom error message, inidicating that the file's version
    is incorrect.

    \snippet examples/xml/streambookmarks/xbelreader.cpp 1

    The pendent to QXmlStreamReader is QXmlStreamWriter, which provides an XML
    writer with a simple streaming API. QXmlStreamWriter operates on a
    QIODevice and has specialised functions for all XML tokens or events you
    want to write, such as \l{QXmlStreamWriter::writeDTD()}{writeDTD()},
    \l{QXmlStreamWriter::writeCharacters()}{writeCharacters()},
    \l{QXmlStreamWriter::writeComment()}{writeComment()} and so on.

    To write XML document with QXmlStreamWriter, you start a document with the
    \l{QXmlStreamWriter::writeStartDocument()}{writeStartDocument()} function
    and end it with \l{QXmlStreamWriter::writeEndDocument()}
    {writeEndDocument()}, which implicitly closes all remaining open tags.
    Element tags are opened with \l{QXmlStreamWriter::writeStartDocument()}
    {writeStartDocument()} and followed by 
    \l{QXmlStreamWriter::writeAttribute()}{writeAttribute()} or
    \l{QXmlStreamWriter::writeAttributes()}{writeAttributes()},
    element content, and then \l{QXmlStreamWriter::writeEndDocument()}
    {writeEndDocument()}. Also, \l{QXmlStreamWriter::writeEmptyElement()}
    {writeEmptyElement()} can be used to write empty elements.

    Element content comprises characters, entity references or nested elements.
    Content can be written with \l{QXmlStreamWriter::writeCharacters()}
    {writeCharacters()}, a function that also takes care of escaping all
    forbidden characters and character sequences,
    \l{QXmlStreamWriter::writeEntityReference()}{writeEntityReference()},
    or subsequent calls to \l{QXmlStreamWriter::writeStartElement()}
    {writeStartElement()}.

    The \c XbelWriter class from \l{QXmlStream Bookmarks Example} is a subclass
    of QXmlStreamWriter. Its \c writeFile() function illustrates the core
    functions of QXmlStreamWriter mentioned above:

    \snippet examples/xml/streambookmarks/xbelwriter.cpp 1
*/

/*!
    \page xml-sax.html
    \title The SAX interface

    \previouspage XML Streaming
    \contentspage XML Processing
    \nextpage Working with the DOM Tree

    SAX is an event-based standard interface for XML parsers.
    The Qt interface follows the design of the SAX2 Java implementation.
    Its naming scheme was adapted to fit the Qt naming conventions.
    Details on SAX2 can be found at \l{http://www.saxproject.org}.

    Support for SAX2 filters and the reader factory are under
    development. The Qt implementation does not include the SAX1
    compatibility classes present in the Java interface.

    \section1 Introduction to SAX2

    The SAX2 interface is an event-driven mechanism to provide the user with
    document information. An "event" in this context means something
    reported by the parser, for example, it has encountered a start tag,
    or an end tag, etc.

    To make it less abstract consider the following example:
    \snippet doc/src/snippets/code/doc_src_qtxml.qdoc 3

    Whilst reading (a SAX2 parser is usually referred to as "reader") 
    the above document three events would be triggered:
    \list 1
    \o A start tag occurs (\c{<quote>}).
    \o Character data (i.e. text) is found, "A quotation.".
    \o An end tag is parsed (\c{</quote>}).
    \endlist

    Each time such an event occurs the parser reports it; you can set up
    event handlers to respond to these events.

    Whilst this is a fast and simple approach to read XML documents,
    manipulation is difficult because data is not stored, simply handled
    and discarded serially. The \l{Working with the DOM Tree}{DOM interface}
    reads in and stores the whole document in a tree structure;
    this takes more memory, but makes it easier to manipulate the
    document's structure.

    The Qt XML module provides an abstract class, \l QXmlReader, that
    defines the interface for potential SAX2 readers. Qt includes a reader
    implementation, \l QXmlSimpleReader, that is easy to adapt through
    subclassing.

    The reader reports parsing events through special handler classes:
    \table
    \header \o Handler class \o Description
    \row \o \l QXmlContentHandler
         \o Reports events related to the content of a document (e.g. the start tag
            or characters).
    \row \o \l QXmlDTDHandler
         \o Reports events related to the DTD (e.g. notation declarations).
    \row \o \l QXmlErrorHandler
         \o Reports errors or warnings that occurred during parsing.
    \row \o \l QXmlEntityResolver
         \o Reports external entities during parsing and allows users to resolve
            external entities themselves instead of leaving it to the reader.
    \row \o \l QXmlDeclHandler
         \o Reports further DTD related events (e.g. attribute declarations).
    \row \o \l QXmlLexicalHandler
         \o Reports events related to the lexical structure of the
            document (the beginning of the DTD, comments etc.).
    \endtable

    These classes are abstract classes describing the interface. The \l
    QXmlDefaultHandler class provides a "do nothing" default
    implementation for all of them. Therefore users only need to overload
    the QXmlDefaultHandler functions they are interested in. 

    To read input XML data a special class \l QXmlInputSource is used.

    Apart from those already mentioned, the following SAX2 support classes 
    provide additional useful functionality:
    \table
    \header \o Class \o Description
    \row \o \l QXmlAttributes
         \o Used to pass attributes in a start element event.
    \row \o \l QXmlLocator
         \o Used to obtain the actual parsing position of an event.
    \row \o \l QXmlNamespaceSupport
         \o Used to implement namespace support for a reader. Note that
            namespaces do not change the parsing behavior. They are only
            reported through the handler.
    \endtable

    The \l{SAX Bookmarks example} illustrates how to subclass
    QXmlDefaultHandler to read an XML bookmark file (XBEL) and
    how to generate XML by hand.

    \section1 SAX2 Features

    The behavior of an XML reader depends on its support for certain
    optional features. For example, a reader may have the feature "report
    attributes used for namespace declarations and prefixes along with
    the local name of a tag". Like every other feature this has a unique
    name represented by a URI: it is called
    \e http://xml.org/sax/features/namespace-prefixes.

    The Qt SAX2 implementation can report whether the reader has
    particular functionality using the QXmlReader::hasFeature()
    function. Available features can be tested with QXmlReader::feature(),
    and switched on or off using QXmlReader::setFeature(). 

    Consider the example 
    \snippet doc/src/snippets/code/doc_src_qtxml.qdoc 4
    A reader that does not support the \e
    http://xml.org/sax/features/namespace-prefixes feature would report
    the element name \e document but not its attributes \e xmlns:book and
    \e xmlns with their values. A reader with the feature \e
    http://xml.org/sax/features/namespace-prefixes reports the namespace
    attributes if the \link QXmlReader::feature() feature\endlink is
    switched on.

    Other features include \e http://xml.org/sax/features/namespace
    (namespace processing, implies \e
    http://xml.org/sax/features/namespace-prefixes) and \e
    http://xml.org/sax/features/validation (the ability to report
    validation errors). 

    Whilst SAX2 leaves it to the user to define and implement whatever
    features are required, support for \e
    http://xml.org/sax/features/namespace (and thus \e
    http://xml.org/sax/features/namespace-prefixes) is mandantory.
    The \l QXmlSimpleReader implementation of \l QXmlReader,
    supports them, and can do namespace processing.

    \l QXmlSimpleReader is not validating, so it
    does not support \e http://xml.org/sax/features/validation.

    \section1 Namespace Support via Features

    As we have seen in the previous section, we can configure the
    behavior of the reader when it comes to namespace
    processing. This is done by setting and unsetting the 
    \e http://xml.org/sax/features/namespaces and
    \e http://xml.org/sax/features/namespace-prefixes features.

    They influence the reporting behavior in the following way:
    \list 1
    \o Namespace prefixes and local parts of elements and attributes can
    be reported.
    \o The qualified names of elements and attributes are reported.
    \o \l QXmlContentHandler::startPrefixMapping() and \l
    QXmlContentHandler::endPrefixMapping() are called by the reader.
    \o Attributes that declare namespaces (i.e. the attribute \e xmlns and
    attributes starting with \e{xmlns:}) are reported.
    \endlist

    Consider the following element:
    \snippet doc/src/snippets/code/doc_src_qtxml.qdoc 5
    With \e http://xml.org/sax/features/namespace-prefixes set to true 
    the reader will report four attributes; but with the \e
    namespace-prefixes feature set to false only three, with the \e
    xmlns:fnord attribute defining a namespace being "invisible" to the
    reader.

    The \e http://xml.org/sax/features/namespaces feature is responsible
    for reporting local names, namespace prefixes and URIs. With \e
    http://xml.org/sax/features/namespaces set to true the parser will
    report \e title as the local name of the \e fnord:title attribute, \e
    fnord being the namespace prefix and \e http://example.com/fnord/ as
    the namespace URI. When \e http://xml.org/sax/features/namespaces is
    false none of them are reported.

    In the current implementation the Qt XML classes follow the definition
    that the prefix \e xmlns itself isn't associated with any namespace at all
    (see \link http://www.w3.org/TR/1999/REC-xml-names-19990114/#ns-using 
    http://www.w3.org/TR/1999/REC-xml-names-19990114/#ns-using \endlink).
    Therefore even with \e http://xml.org/sax/features/namespaces and
    \e http://xml.org/sax/features/namespace-prefixes both set to true
    the reader won't return either a local name, a namespace prefix or
    a namespace URI for \e xmlns:fnord.

    This might be changed in the future following the W3C suggestion
    \link http://www.w3.org/2000/xmlns/ http://www.w3.org/2000/xmlns/ \endlink
    to associate \e xmlns with the namespace \e http://www.w3.org/2000/xmlns. 

    As the SAX2 standard suggests, \l QXmlSimpleReader defaults to having
    \e http://xml.org/sax/features/namespaces set to true and
    \e http://xml.org/sax/features/namespace-prefixes set to false.
    When changing this behavior using \l QXmlSimpleReader::setFeature() 
    note that the combination of both features set to
    false is illegal.

    \section2 Summary

    \l QXmlSimpleReader implements the following behavior:

    \table
    \header \o (namespaces, namespace-prefixes)
            \o Namespace prefix and local part
            \o Qualified names
            \o Prefix mapping
            \o xmlns attributes
    \row \o (true, false) \o Yes \o Yes* \o Yes \o No
    \row \o (true, true) \o Yes \o Yes \o Yes \o Yes
    \row \o (false, true) \o No* \o Yes \o No* \o Yes
    \row \o (false, false) \i41 Illegal
    \endtable

    The behavior of the entries marked with an asterisk (*) is not specified by SAX.

    \section1 Properties

    Properties are a more general concept. They have a unique name,
    represented as an URI, but their value is \c void*. Thus nearly
    anything can be used as a property value. This concept involves some
    danger, though: there is no means of ensuring type-safety; the user
    must take care that they pass the right type. Properties are
    useful if a reader supports special handler classes.

    The URIs used for features and properties often look like URLs, e.g.
    \c http://xml.org/sax/features/namespace. This does not mean that the
    data required is at this address. It is simply a way of defining
    unique names.

    Anyone can define and use new SAX2 properties for their readers.
    Property support is not mandatory.

    To set or query properties the following functions are provided: \l
    QXmlReader::setProperty(), \l QXmlReader::property() and \l
    QXmlReader::hasProperty().                     
*/

/*!
    \page xml-dom.tml
    \title Working with the DOM Tree
    \target dom

    \previouspage The SAX Interface
    \contentspage XML Processing
    \nextpage {Using XML Technologies}{XQuery/XPath and XML Schema}

    DOM Level 2 is a W3C Recommendation for XML interfaces that maps the
    constituents of an XML document to a tree structure. The specification
    of DOM Level 2 can be found at \l{http://www.w3.org/DOM/}.

    \target domIntro
    \section1 Introduction to DOM

    DOM provides an interface to access and change the content and
    structure of an XML file. It makes a hierarchical view of the document
    (a tree view). Thus -- in contrast to the SAX2 interface -- an object
    model of the document is resident in memory after parsing which makes
    manipulation easy. 

    All DOM nodes in the document tree are subclasses of \l QDomNode. The
    document itself is represented as a \l QDomDocument object.

    Here are the available node classes and their potential child classes:

    \list
    \o \l QDomDocument: Possible children are
            \list
            \o \l QDomElement (at most one)
            \o \l QDomProcessingInstruction
            \o \l QDomComment
            \o \l QDomDocumentType
            \endlist
    \o \l QDomDocumentFragment: Possible children are
            \list
            \o \l QDomElement
            \o \l QDomProcessingInstruction
            \o \l QDomComment
            \o \l QDomText
            \o \l QDomCDATASection
            \o \l QDomEntityReference
            \endlist
    \o \l QDomDocumentType: No children
    \o \l QDomEntityReference: Possible children are
            \list
            \o \l QDomElement
            \o \l QDomProcessingInstruction
            \o \l QDomComment
            \o \l QDomText
            \o \l QDomCDATASection
            \o \l QDomEntityReference
            \endlist
    \o \l QDomElement: Possible children are
            \list
            \o \l QDomElement
            \o \l QDomText
            \o \l QDomComment
            \o \l QDomProcessingInstruction
            \o \l QDomCDATASection
            \o \l QDomEntityReference
            \endlist
    \o \l QDomAttr: Possible children are
            \list
            \o \l QDomText
            \o \l QDomEntityReference
            \endlist
    \o \l QDomProcessingInstruction: No children
    \o \l QDomComment: No children
    \o \l QDomText: No children
    \o \l QDomCDATASection: No children
    \o \l QDomEntity: Possible children are
            \list
            \o \l QDomElement
            \o \l QDomProcessingInstruction
            \o \l QDomComment
            \o \l QDomText
            \o \l QDomCDATASection
            \o \l QDomEntityReference
            \endlist
    \o \l QDomNotation: No children
    \endlist

    With \l QDomNodeList and \l QDomNamedNodeMap two collection classes 
    are provided: \l QDomNodeList is a list of nodes,
    and \l QDomNamedNodeMap is used to handle unordered sets of nodes
    (often used for attributes).

    The \l QDomImplementation class allows the user to query features of the
    DOM implementation.

    To get started please refer to the \l QDomDocument documentation.
    You might also want to take a look at the \l{DOM Bookmarks example},
    which illustrates how to read and write an XML bookmark file (XBEL)
    using DOM.
*/