qtokenautomaton is a token generator that generates a simple, Unicode-aware
tokenizer for C++ that uses the Qt API.

Introduction
=====================
QTokenAutomaton generates a C++ class that essentially has this interface:

    class YourTokenizer
    {
    protected:
        enum Token
        {
            A,
            B,
            C,
            NoKeyword
        };

        static inline Token toToken(const QString &string);
        static inline Token toToken(const QStringRef &string);
        static Token toToken(const QChar *data, int length);
        static QString toString(Token token);
    };

toToken() returns the enum value corresponding to the string. This is done
in O(N) time, where N is the length of the string. The returned value can
then be switched over efficiently. The alternatives, either a long chain of
if statements comparing one QString against several other QStrings, or first
inserting all strings into a hash, are less efficient.

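To illustrate why the lookup is O(N), here is a minimal hand-written sketch of
the kind of automaton such a tokenizer could use. It is not the code that
QTokenAutomaton actually generates. It branches on the length first and then
inspects each character once. The token names (Body, Span, Table), the use of
std::u16string_view in place of QString, and checkExamples() are assumptions
made so that the sketch compiles without Qt (C++17).

```cpp
#include <cassert>
#include <string_view>

// Hypothetical token set; a real one is defined by your token file.
enum class Token { Body, Span, Table, NoKeyword };

// Hand-written sketch of a length-then-character automaton.
// std::u16string_view stands in for QString (both hold UTF-16 code units).
inline Token toToken(std::u16string_view s)
{
    switch (s.size()) {
    case 4:
        switch (s[0]) {
        case u'b':
            if (s[1] == u'o' && s[2] == u'd' && s[3] == u'y')
                return Token::Body;
            break;
        case u's':
            if (s[1] == u'p' && s[2] == u'a' && s[3] == u'n')
                return Token::Span;
            break;
        }
        break;
    case 5:
        if (s[0] == u't' && s[1] == u'a' && s[2] == u'b'
            && s[3] == u'l' && s[4] == u'e')
            return Token::Table;
        break;
    }
    return Token::NoKeyword; // nothing matched
}

// Small self-check showing the mapping this sketch implements.
inline void checkExamples()
{
    assert(toToken(u"body") == Token::Body);
    assert(toToken(u"span") == Token::Span);
    assert(toToken(u"table") == Token::Table);
    assert(toToken(u"bold") == Token::NoKeyword);
}
```

No string is compared as a whole against each candidate: a mismatch in length
or in any single character rejects the input immediately.
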
For instance, using a hash would cost, excluding the initial population of
the hash, O(N) + O(1), where the O(N) term is the cost of hashing the string
and O(1) is assumed to be a non-conflicting hash lookup.

toString() returns the string that a given enum value represents. It is
implemented to store the strings in an efficient manner.

A typical usage scenario is in combination with QXmlStreamReader. When parsing
a certain format, for instance XHTML, each element name (body, span, table and
so forth) typically needs special treatment. QTokenAutomaton conceptually cuts
the string comparisons down to one.

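The element-name dispatch described above might be sketched as follows. To
stay compilable without Qt, a plain array of names stands in for the element
names a QXmlStreamReader would report, and XhtmlTokenizer is a hypothetical
stand-in for a generated class: its toToken() is a placeholder, not the
generated automaton. ElementCounter and example() are likewise illustrative
names (C++17).

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for a generated tokenizer class.
class XhtmlTokenizer
{
protected:
    enum Token { Body, Span, Table, NoKeyword };

    // Placeholder lookup so the sketch compiles on its own; generated code
    // would use the automaton instead of these comparisons.
    static Token toToken(std::u16string_view name)
    {
        if (name == u"body")  return Body;
        if (name == u"span")  return Span;
        if (name == u"table") return Table;
        return NoKeyword;
    }
};

// Inherits from the tokenizer to reach its protected members.
class ElementCounter : private XhtmlTokenizer
{
public:
    int body = 0, span = 0, table = 0, other = 0;

    // One tokenization per element name, then an ordinary switch.
    void handle(std::u16string_view name)
    {
        switch (toToken(name)) {
        case Body:      ++body;  break;
        case Span:      ++span;  break;
        case Table:     ++table; break;
        case NoKeyword: ++other; break;
        }
    }
};

// Example: four element names, one of them unknown to the tokenizer.
inline void example()
{
    const std::u16string_view names[] = {u"body", u"span", u"span", u"div"};
    ElementCounter counter;
    for (std::u16string_view name : names)
        counter.handle(name);
    assert(counter.body == 1 && counter.span == 2);
    assert(counter.table == 0 && counter.other == 1);
}
```

Because the cases are enum values rather than string literals, a misspelled
case label is a compile error instead of a silent mismatch at run time.
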
Beyond efficiency, QTokenAutomaton also increases type safety, since C++
identifiers are used instead of string literals.

Usage
=====================
To use it:

1. Create a token file. Use exampleFile.xml as a template.

2. Make sure it is valid by validating it against qtokenautomaton.xsd. On
   Linux, this can be done by running `xmllint --noout
   --schema qtokenautomaton.xsd yourFile.xml`

3. Produce the C++ files by invoking the stylesheet with an XSL-T 2.0
   processor[1]. For instance, with the Saxon implementation, this would be:
   `java net.sf.saxon.Transform -xsl:qautomaton2cpp.xsl yourFile.xml`

4. Add the produced C++ files to your build system.


[1] As of version 4.4, Qt has no support for XSL-T.