author | eckhart.koppen@nokia.com |
Wed, 31 Mar 2010 11:06:36 +0300 | |
changeset 7 | f7bc934e204c |
permissions | -rw-r--r-- |
7
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
1 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
2 |
qtokenautomaton is a token generator, that generates a simple, Unicode aware |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
3 |
tokenizer for C++ that uses the Qt API. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
4 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
5 |
Introduction |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
6 |
===================== |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
7 |
QTokenAutomaton generates a C++ class that essentially has this interface: |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
8 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
9 |
class YourTokenizer |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
10 |
{ |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
11 |
protected: |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
12 |
enum Token |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
13 |
{ |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
14 |
A, |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
15 |
B, |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
16 |
C, |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
17 |
NoKeyword |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
18 |
}; |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
19 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
20 |
static inline Token toToken(const QString &string); |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
21 |
static inline Token toToken(const QStringRef &string); |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
22 |
static Token toToken(const QChar *data, int length); |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
23 |
static QString toString(Token token); |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
24 |
}; |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
25 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
26 |
When calling toToken(), the tokenizer returns the enum value corresponding to |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
27 |
the string. This is done with O(N) time complexity, where N is the length of |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
28 |
the string. The returned value can then subsequently be efficiently switched |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
29 |
over. The alternatives, either a long chain of if statements comparing one |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
30 |
QString to several other QStrings; or inserting all strings first into a hash, |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
31 |
are less efficient. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
32 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
33 |
For instance, the latter case of using a hash would involve when excluding the |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
34 |
initial populating of the hash, O(N) + O(1) where 0(1) is assumed to be a |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
35 |
non-conflicting hash lookup. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
36 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
37 |
toString(), which returns the string for the token that an enum value |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
38 |
represents, is implemented to store the strings in an efficient manner. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
39 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
40 |
A typical usage scenario is in combination with QXmlStreamReader. When parsing |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
41 |
a certain format, for instance XHTML, each element name, body, span, table and |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
42 |
so forth, typically needs special treatment. QTokenAutomaton conceptually cuts |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
43 |
the string comparisons down to one. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
44 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
45 |
Beyond efficiency, QTokenAutomaton also increases type safety, since C++ |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
46 |
identifiers are used instead of string literals. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
47 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
48 |
Usage |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
49 |
===================== |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
50 |
Using it is approached as follows: |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
51 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
52 |
1. Create a token file. Use exampleFile.xml as a template. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
53 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
54 |
2. Make sure it is valid by validating against qtokenautomaton.xsd. On |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
55 |
Linux, this can be achieved by running `xmllint --noout |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
56 |
--schema qtokenautomaton.xsd yourFile.xml` |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
57 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
58 |
3. Produce the C++ files by invoking the stylesheet with an XSL-T 2.0 |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
59 |
processor[1]. For instance, with the implementation Saxon, this would be: |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
60 |
`java net.sf.saxon.Transform -xsl:qautomaton2cpp.xsl yourFile.xml` |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
61 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
62 |
4. Include the produced C++ files with your build system. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
63 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
64 |
|
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
65 |
1. |
f7bc934e204c
5cabc75a39ca2f064f70b40f72ed93c74c4dc19b
eckhart.koppen@nokia.com
parents:
diff
changeset
|
66 |
In Qt there is as of 4.4 no support for XSL-T. |