src/xmlpatterns/qtokenautomaton/README
author Dremov Kirill (Nokia-D-MSW/Tampere) <kirill.dremov@nokia.com>
Fri, 19 Feb 2010 23:40:16 +0200
branchRCL_3
changeset 4 3b1da2848fc7
parent 0 1918ee327afb
permissions -rw-r--r--
Revision: 201003 Kit: 201007
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
0
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     1
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     2
qtokenautomaton is a token generator, that generates a simple, Unicode aware
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     3
tokenizer for C++ that uses the Qt API.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     4
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     5
Introduction
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     6
=====================
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     7
QTokenAutomaton generates a C++ class that essentially has this interface:
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     8
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     9
    class YourTokenizer
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    10
    {
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    11
    protected:
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    12
        enum Token
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    13
        {
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    14
            A,
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    15
            B,
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    16
            C,
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    17
            NoKeyword
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    18
        };
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    19
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    20
        static inline Token toToken(const QString &string);
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    21
        static inline Token toToken(const QStringRef &string);
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    22
        static Token toToken(const QChar *data, int length);
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    23
        static QString toString(Token token);
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    24
    };
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    25
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    26
When calling toToken(), the tokenizer returns the enum value corresponding to
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    27
the string. This is done with O(N) time complexity, where N is the length of
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    28
the string. The returned value can then subsequently be efficiently switched
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    29
over. The alternatives, either a long chain of if statements comparing one
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    30
QString to several other QStrings; or inserting all strings first into a hash,
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    31
are less efficient.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    32
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    33
For instance, the latter case of using a hash would involve when excluding the
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    34
initial populating of the hash, O(N) + O(1) where 0(1) is assumed to be a
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    35
non-conflicting hash lookup.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    36
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    37
toString(), which returns the string for the token that an enum value
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    38
represents, is implemented to store the strings in an efficient manner.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    39
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    40
A typical usage scenario is in combination with QXmlStreamReader. When parsing
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    41
a certain format, for instance XHTML, each element name, body, span, table and
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    42
so forth, typically needs special treatment. QTokenAutomaton conceptually cuts
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    43
the string comparisons down to one.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    44
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    45
Beyond efficiency, QTokenAutomaton also increases type safety, since C++
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    46
identifiers are used instead of string literals.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    47
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    48
Usage
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    49
=====================
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    50
Using it is approached as follows:
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    51
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    52
1. Create a token file. Use exampleFile.xml as a template.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    53
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    54
2. Make sure it is valid by validating against qtokenautomaton.xsd. On
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    55
   Linux, this can be achieved by running `xmllint --noout
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    56
   --schema qtokenautomaton.xsd yourFile.xml`
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    57
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    58
3. Produce the C++ files by invoking the stylesheet with an XSL-T 2.0
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    59
   processor[1]. For instance, with the implementation Saxon, this would be:
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    60
   `java net.sf.saxon.Transform -xsl:qautomaton2cpp.xsl yourFile.xml`
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    61
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    62
4. Include the produced C++ files with your build system.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    63
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    64
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    65
1.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    66
In Qt there is as of 4.4 no support for XSL-T.