src/xmlpatterns/qtokenautomaton/README
changeset 0 1918ee327afb
-1:000000000000 0:1918ee327afb
       
     1 
       
     2 qtokenautomaton is a generator that produces a simple, Unicode-aware
       
     3 tokenizer for C++ that uses the Qt API.
       
     4 
       
     5 Introduction
       
     6 =====================
       
     7 QTokenAutomaton generates a C++ class that essentially has this interface:
       
     8 
       
     9     class YourTokenizer
       
    10     {
       
    11     protected:
       
    12         enum Token
       
    13         {
       
    14             A,
       
    15             B,
       
    16             C,
       
    17             NoKeyword
       
    18         };
       
    19 
       
    20         static inline Token toToken(const QString &string);
       
    21         static inline Token toToken(const QStringRef &string);
       
    22         static Token toToken(const QChar *data, int length);
       
    23         static QString toString(Token token);
       
    24     };
       
    25 
       
    26 When calling toToken(), the tokenizer returns the enum value corresponding to
       
    27 the string. This is done with O(N) time complexity, where N is the length of
       
    28 the string. The returned value can then subsequently be efficiently switched
       
    29 over. The alternatives, either a long chain of if statements comparing one
       
    30 QString to several other QStrings, or inserting all strings first into a hash,
       
    31 are less efficient.
       
    32 
       
    33 For instance, the latter case of using a hash would involve, excluding the
       
    34 initial population of the hash, O(N) + O(1), where O(1) is assumed to be a
       
    35 non-conflicting hash lookup.
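
To make the O(N) lookup concrete, here is a minimal hand-written sketch of the
kind of dispatch such a tokenizer could perform. It is not the code that
qtokenautomaton emits: the class name XhtmlTokenizer, the keyword set and the
helper matchesFrom() are hypothetical, the members are public rather than
protected so the sketch can be called directly, and std::u16string_view stands
in for QString/QChar so that it compiles without Qt.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for a generated tokenizer with the keywords
// "body", "span" and "table". Not the generator's real output.
class XhtmlTokenizer
{
public:
    enum Token { Body, Span, Table, NoKeyword };

    // Each character of the input is inspected at most once, so the
    // lookup is O(N) in the length of the string.
    static Token toToken(std::u16string_view s)
    {
        // Dispatch on the length first, then on the characters.
        switch (s.size()) {
        case 4:
            if (s[0] == u'b')
                return matchesFrom(s, u"body", 1) ? Body : NoKeyword;
            if (s[0] == u's')
                return matchesFrom(s, u"span", 1) ? Span : NoKeyword;
            return NoKeyword;
        case 5:
            return matchesFrom(s, u"table", 0) ? Table : NoKeyword;
        default:
            return NoKeyword;
        }
    }

private:
    // Compares s and keyword from index `from` onwards.
    // Callers guarantee that both have the same length.
    static bool matchesFrom(std::u16string_view s,
                            std::u16string_view keyword,
                            std::size_t from)
    {
        return s.substr(from) == keyword.substr(from);
    }
};

// The returned token can be switched over like any other enum.
inline const char *describe(std::u16string_view elementName)
{
    switch (XhtmlTokenizer::toToken(elementName)) {
    case XhtmlTokenizer::Body:      return "document body";
    case XhtmlTokenizer::Span:      return "inline container";
    case XhtmlTokenizer::Table:     return "table";
    case XhtmlTokenizer::NoKeyword: break;
    }
    return "unknown element";
}
```

Compared with a chain of string comparisons, the input is walked once and the
result is an enum, so the caller's switch compiles down to an integer
dispatch.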
       
    36 
       
    37 toString(), which returns the string for the token that an enum value
       
    38 represents, is implemented to store the strings in an efficient manner.
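
This README does not spell out how the strings are stored. One plausible
scheme, shown purely as an assumption, keeps each keyword in a static UTF-16
array and returns a non-owning view of it, so no allocation or copy happens
per call. With Qt, QString::fromRawData() could wrap such an array in a
similar way. The Token enum and keywords below are hypothetical, and
std::u16string_view again stands in for QString.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical token set; a generated class would define its own.
enum Token { Body, Span, Table, NoKeyword };

// Returns a view of statically stored UTF-16 data for the given token.
// The arrays live for the whole program, so the view never dangles.
inline std::u16string_view toString(Token token)
{
    switch (token) {
    case Body: {
        static const char16_t text[] = { u'b', u'o', u'd', u'y' };
        return std::u16string_view(text, sizeof text / sizeof text[0]);
    }
    case Span: {
        static const char16_t text[] = { u's', u'p', u'a', u'n' };
        return std::u16string_view(text, sizeof text / sizeof text[0]);
    }
    case Table: {
        static const char16_t text[] = { u't', u'a', u'b', u'l', u'e' };
        return std::u16string_view(text, sizeof text / sizeof text[0]);
    }
    case NoKeyword:
        break;
    }
    // NoKeyword has no string representation in this sketch.
    return std::u16string_view();
}
```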
       
    39 
       
    40 A typical usage scenario is in combination with QXmlStreamReader. When parsing
       
    41 a certain format, for instance XHTML, each element name, body, span, table and
       
    42 so forth, typically needs special treatment. QTokenAutomaton conceptually cuts
       
    43 the string comparisons down to one.
       
    44 
       
    45 Beyond efficiency, QTokenAutomaton also increases type safety, since C++
       
    46 identifiers are used instead of string literals.
       
    47 
       
    48 Usage
       
    49 =====================
       
    50 To use it, proceed as follows:
       
    51 
       
    52 1. Create a token file. Use exampleFile.xml as a template.
       
    53 
       
    54 2. Make sure it is valid by validating against qtokenautomaton.xsd. On
       
    55    Linux, this can be achieved by running `xmllint --noout
       
    56    --schema qtokenautomaton.xsd yourFile.xml`
       
    57 
       
    58 3. Produce the C++ files by invoking the stylesheet with an XSL-T 2.0
       
    59    processor[1]. For instance, with the implementation Saxon, this would be:
       
    60    `java net.sf.saxon.Transform -xsl:qautomaton2cpp.xsl yourFile.xml`
       
    61 
       
    62 4. Include the produced C++ files with your build system.
       
    63 
       
    64 
       
    65 [1]
       
    66 As of version 4.4, Qt has no support for XSL-T.