|
1 /**************************************************************************** |
|
2 ** |
|
3 ** Copyright (C) 2009 Nokia Corporation and/or its subsidiary(-ies). |
|
4 ** All rights reserved. |
|
5 ** Contact: Nokia Corporation (qt-info@nokia.com) |
|
6 ** |
|
7 ** This file is part of the documentation of the Qt Toolkit. |
|
8 ** |
|
9 ** $QT_BEGIN_LICENSE:LGPL$ |
|
10 ** No Commercial Usage |
|
11 ** This file contains pre-release code and may not be distributed. |
|
12 ** You may use this file in accordance with the terms and conditions |
|
13 ** contained in the Technology Preview License Agreement accompanying |
|
14 ** this package. |
|
15 ** |
|
16 ** GNU Lesser General Public License Usage |
|
17 ** Alternatively, this file may be used under the terms of the GNU Lesser |
|
18 ** General Public License version 2.1 as published by the Free Software |
|
19 ** Foundation and appearing in the file LICENSE.LGPL included in the |
|
20 ** packaging of this file. Please review the following information to |
|
21 ** ensure the GNU Lesser General Public License version 2.1 requirements |
|
22 ** will be met: http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html. |
|
23 ** |
|
24 ** In addition, as a special exception, Nokia gives you certain additional |
|
25 ** rights. These rights are described in the Nokia Qt LGPL Exception |
|
26 ** version 1.1, included in the file LGPL_EXCEPTION.txt in this package. |
|
27 ** |
|
28 ** If you have questions regarding the use of this file, please contact |
|
29 ** Nokia at qt-info@nokia.com. |
|
30 ** |
|
31 ** |
|
32 ** |
|
33 ** |
|
34 ** |
|
35 ** |
|
36 ** |
|
37 ** |
|
38 ** $QT_END_LICENSE$ |
|
39 ** |
|
40 ****************************************************************************/ |
|
41 |
|
42 /*! |
|
43 \page xmlprocessing.html |
|
44 \title Using XML Technologies |
|
45 |
|
46 \previouspage Working with the DOM Tree |
|
47 \contentspage XML Processing |
|
48 |
|
49 \keyword Patternist |
|
50 |
|
51 \brief An overview of Qt's support for using XML technologies in |
|
52 Qt programs. |
|
53 |
|
54 \tableofcontents |
|
55 |
|
56 \section1 Introduction |
|
57 |
|
58 XQuery is a language for traversing XML documents to select and |
|
59 aggregate items of interest and to transform them for output as |
|
60 XML or some other format. XPath is the \e{element selection} part |
|
61 of XQuery. |
|
62 |
|
63 The QtXmlPatterns module supports using |
|
64 \l{http://www.w3.org/TR/xquery} {XQuery 1.0} and |
|
65 \l{http://www.w3.org/TR/xpath20} {XPath 2.0} in Qt applications, |
|
66 for querying XML data \e{and} for querying |
|
67 \l{QAbstractXmlNodeModel} {non-XML data that can be modeled to |
|
68 look like XML}. The QtXmlPatterns module is included in the \l{Qt |
|
69 Full Framework Edition}, and the \l{Open Source Versions of Qt}. |
|
70 Readers who are not familiar with the XQuery/XPath language can read |
|
71 \l {A Short Path to XQuery} for a brief introduction. |
|
72 |
|
73 \section1 Advantages of using QtXmlPatterns and XQuery |
|
74 |
|
75 The XQuery/XPath language simplifies data searching and |
|
76 transformation tasks by eliminating the need for doing a lot of |
|
77 C++ or Java procedural programming for each new query task. Here |
|
78 is an XQuery that constructs a bibliography of the contents of a |
|
79 library: |
|
80 |
|
81 \target qtxmlpatterns_example_query |
|
82 \quotefile snippets/patternist/introductionExample.xq |
|
83 |
|
84 First, the query opens a \c{<bibliography>} element in the |
|
85 output. The |
|
86 \l{xquery-introduction.html#using-path-expressions-to-match-select-items} |
|
87 {embedded path expression} then loads the XML document describing |
|
88 the contents of the library (\c{library.xml}) and begins the |
|
89 search. For each \c{<book>} element it finds, where the publisher |
|
90 was Addison-Wesley and the publication year was after 1991, it |
|
91 creates a new \c{<book>} element in the output as a child of the |
|
92 open \c{<bibliography>} element. Each new \c{<book>} element gets |
|
93 the book's title as its contents and the book's publication year |
|
94 as an attribute. Finally, the \c{<bibliography>} element is |
|
95 closed. |
|
96 |
|
97 The advantages of using QtXmlPatterns and XQuery in your Qt |
|
98 programs are summarized as follows: |
|
99 |
|
100 \list |
|
101 |
|
102 \o \bold{Ease of development}: All the C++ programming required to |
|
103 perform data query tasks can be replaced by a simple XQuery |
|
104 like the example above. |
|
105 |
|
106 \o \bold{Comprehensive functionality}: The |
|
107 \l{http://www.w3.org/TR/xquery/#id-expressions} {expression |
|
108 syntax} and rich set of |
|
109 \l{http://www.w3.org/TR/xpath-functions} {functions and |
|
110 operators} provided by XQuery are sufficient for performing any |
|
111 data searching, selecting, and sorting tasks. |
|
112 |
|
113 \o \bold{Conformance to standards}: Conformance to all applicable |
|
114 XML and XQuery standards ensures that QtXmlPatterns can always |
|
115 process XML documents generated by other conformant |
|
116 applications, and that XML documents created with QtXmlPatterns |
|
117 can be processed by other conformant applications. |
|
118 |
|
119 \o \bold{Maximal flexibility}: The QtXmlPatterns module can be used |
|
120 to query XML data \e{and} non-XML data that can be |
|
121 \l{QAbstractXmlNodeModel} {modeled to look like XML}. |
|
122 |
|
123 \endlist |
|
124 |
|
125 \section1 Using the QtXmlPatterns module |
|
126 |
|
127 There are two ways QtXmlPatterns can be used to evaluate queries. |
|
128 You can run the query engine in your Qt application using the |
|
129 QtXmlPatterns C++ API, or you can run the query engine from the |
|
130 command line using Qt's \c{xmlpatterns} command line utility. |
|
131 |
|
132 \section2 Running the query engine from your Qt application |
|
133 |
|
134 If we save the example XQuery shown above in a text file (e.g. |
|
135 \c{myquery.xq}), we can run it from a Qt application using a |
|
136 standard QtXmlPatterns code sequence: |
|
137 |
|
138 \snippet doc/src/snippets/code/src_xmlpatterns_api_qxmlquery.cpp 3 |
|
139 |
|
140 First construct a QFile for the text file containing the XQuery |
|
141 (\c{myquery.xq}). Then create an instance of QXmlQuery and call |
|
142 its \l{QXmlQuery::}{setQuery()} function to load and parse the |
|
143 XQuery file. Then create an \l{QXmlSerializer} {XML serializer} to |
|
144 output the query's result set as unformatted XML. Finally, call |
|
145 the \l{QXmlQuery::}{evaluateTo()} function to evaluate the query |
|
146 and serialize the results as XML. |
|
147 |
|
148 \note If you compile Qt yourself, the QtXmlPatterns module will |
|
149 \e{not} be built if exceptions are disabled, or if you compile Qt |
|
150 with a compiler that doesn't support member templates, e.g., MSVC |
|
151 6. |
|
152 |
|
153 See the QXmlQuery documentation for more information about the |
|
154 QtXmlPatterns C++ API. |
|
155 |
|
156 \section2 Running the query engine from the command line utility |
|
157 |
|
158 \e xmlpatterns is a command line utility for running XQueries. It |
|
159 expects the name of a file containing the XQuery text. |
|
160 |
|
161 \snippet doc/src/snippets/code/doc_src_qtxmlpatterns.qdoc 2 |
|
162 |
|
163 The XQuery in \c{myQuery.xq} will be evaluated and its output |
|
164 written to \c stdout. Pass the \c -help switch to get the list of |
|
165 input flags and their meanings. |
|
166 |
|
167 xmlpatterns can be used in scripting. However, the descriptions |
|
168 and messages it outputs were not meant to be parsed and may be |
|
169 changed in future releases of Qt. |
|
170 |
|
171 \target QtXDM |
|
172 \section1 The XQuery Data Model |
|
173 |
|
174 XQuery represents data items as \e{atomic values} or \e{nodes}. An |
|
175 atomic value is a value in the domain of one of the |
|
176 \l{http://www.w3.org/TR/xmlschema-2/#built-in-datatypes} {built-in |
|
177 datatypes} defined in \l{http://www.w3.org/TR/xmlschema-2} {Part |
|
178 2} of the W3C XML Schema. A node is normally an XML element or |
|
179 attribute, but when non-XML data is \l{QAbstractXmlNodeModel} |
|
180 {modeled to look like XML}, a node can also represent a non-XML |
|
181 data item. |
|
182 |
|
183 When you run an XQuery using the C++ API in a Qt application, you |
|
184 will often want to bind program variables to $variables in the |
|
185 XQuery. After the query is evaluated, you will want to interpret |
|
186 the sequence of data items in the result set. |
|
187 |
|
188 \section2 Binding program variables to XQuery variables |
|
189 |
|
190 When you want to run a parameterized XQuery from your Qt |
|
191 application, you will need to \l{QXmlQuery::bindVariable()} {bind |
|
192 variables} in your program to $name variables in your XQuery. |
|
193 |
|
194 Suppose you want to parameterize the bibliography XQuery in the |
|
195 example above. You could define variables for the catalog that |
|
196 contains the library (\c{$file}), the publisher name |
|
197 (\c{$publisher}), and the year of publication (\c{$year}): |
|
198 |
|
199 \target qtxmlpatterns_example_query2 |
|
200 \quotefile snippets/patternist/introExample2.xq |
|
201 |
|
202 Modify the QtXmlPatterns code to use one of the \l{QXmlQuery::} |
|
203 {bindVariable()} functions to bind a program variable to each |
|
204 XQuery $variable: |
|
205 |
|
206 \snippet doc/src/snippets/code/src_xmlpatterns_api_qxmlquery.cpp 4 |
|
207 |
|
208 Each program variable is passed to QtXmlPatterns as a QVariant of |
|
209 the type of the C++ variable or constant from which it is |
|
210 constructed. Note that QtXmlPatterns assumes that the type of the |
|
211 QVariant in the bindVariable() call is the correct type, so the |
|
212 $variable it is bound to must be used in the XQuery accordingly. |
|
213 The following table shows how QVariant types are mapped to XQuery |
|
214 $variable types: |
|
215 |
|
216 \table |
|
217 |
|
218 \header |
|
219 \o QVariant type |
|
220 \o XQuery $variable type |
|
221 |
|
222 \row |
|
223 \o QVariant::LongLong |
|
224 \o \c xs:integer |
|
225 |
|
226 \row |
|
227 \o QVariant::Int |
|
228 \o \c xs:integer |
|
229 |
|
230 \row |
|
231 \o QVariant::UInt |
|
232 \o \c xs:nonNegativeInteger |
|
233 |
|
234 \row |
|
235 \o QVariant::ULongLong |
|
236 \o \c xs:unsignedLong |
|
237 |
|
238 \row |
|
239 \o QVariant::String |
|
240 \o \c xs:string |
|
241 |
|
242 \row |
|
243 \o QVariant::Double |
|
244 \o \c xs:double |
|
245 |
|
246 \row |
|
247 \o QVariant::Bool |
|
248 \o \c xs:boolean |
|
249 |
|
250 \row |
|
251 \o QVariant::Double |
|
252 \o \c xs:decimal |
|
253 |
|
254 \row |
|
255 \o QVariant::ByteArray |
|
256 \o \c xs:base64Binary |
|
257 |
|
258 \row |
|
259 \o QVariant::StringList |
|
260 \o \c xs:string* |
|
261 |
|
262 \row |
|
263 \o QVariant::Url |
|
264 \o \c xs:string |
|
265 |
|
266 \row |
|
267 \o QVariant::Date |
|
268 \o \c xs:date |
|
269 |
|
270 \row |
|
271 \o QVariant::DateTime |
|
272 \o \c xs:dateTime |
|
273 |
|
274 \row |
|
275 \o QVariant::Time |
|
276 \o \c xs:time (see \l{Binding To Time}{Binding To |
|
277 QVariant::Time} below) |
|
278 |
|
279 \row |
|
280 \o QVariantList |
|
281 \o (see \l{Binding To QVariantList}{Binding To QVariantList} |
|
282 below) |
|
283 |
|
284 \endtable |
|
285 |
|
286 A type not shown in the table is not supported and will cause |
|
287 undefined XQuery behavior or a $variable binding error, depending |
|
288 on the context in the XQuery where the variable is used. |
|
289 |
|
290 \target Binding To Time |
|
291 \section3 Binding To QVariant::Time |
|
292 |
|
293 Because the instance of QTime used in QVariant::Time does not |
|
294 include a zone offset, an instance of QVariant::Time should not be |
|
295 bound to an XQuery variable of type \c xs:time, unless the QTime is |
|
296 UTC. When binding a non-UTC QTime to an XQuery variable, it should |
|
297 first be passed as a string, or converted to a QDateTime with an arbitrary |
|
298 date, and then bound to an XQuery variable of type \c xs:dateTime. |
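
The string route described above can be sketched in plain C++. This is a minimal sketch, not part of the Qt API: \c toXsTimeString() is a hypothetical helper that formats a time of day plus a zone offset (in minutes) as an \c xs:time lexical value. The resulting string could then be bound as an \c xs:string and cast inside the query.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of Qt): formats a time of day plus a zone
// offset in minutes as an xs:time lexical value such as "13:20:00+02:00".
// An offset of zero is written as "Z" (UTC).
std::string toXsTimeString(int hour, int minute, int second, int offsetMinutes)
{
    assert(hour >= 0 && hour < 24);
    assert(minute >= 0 && minute < 60);
    assert(second >= 0 && second < 60);
    // XML Schema limits zone offsets to the range -14:00 .. +14:00.
    assert(offsetMinutes >= -14 * 60 && offsetMinutes <= 14 * 60);

    char buffer[32];
    if (offsetMinutes == 0) {
        std::snprintf(buffer, sizeof buffer, "%02d:%02d:%02dZ",
                      hour, minute, second);
    } else {
        const int magnitude = std::abs(offsetMinutes);
        std::snprintf(buffer, sizeof buffer, "%02d:%02d:%02d%c%02d:%02d",
                      hour, minute, second,
                      offsetMinutes < 0 ? '-' : '+',
                      magnitude / 60, magnitude % 60);
    }
    return std::string(buffer);
}
```

Keeping the offset in the string preserves the zone information that a bare QTime cannot carry.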
|
299 |
|
300 \target Binding To QVariantList |
|
301 \section3 Binding To QVariantList |
|
302 |
|
303 A QVariantList can be bound to an XQuery $variable. All the |
|
304 \l{QVariant}s in the list must be of the same atomic type, and the |
|
305 $variable the variant list is bound to must be of that same atomic |
|
306 type. If the QVariants in the list are not all of the same atomic |
|
307 type, the XQuery behavior is undefined. |
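
Because mixed lists lead to undefined behavior, it can be worth checking a list before binding it. The following is a minimal sketch in plain C++, not Qt code: \c AtomicType, \c AtomicValue and \c allSameAtomicType() are hypothetical stand-ins for the QVariant type tag and the list of variants.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the type tag carried by a QVariant.
enum class AtomicType { Integer, Double, Boolean, String };

// Hypothetical stand-in for one list entry: a type tag plus its value text.
struct AtomicValue {
    AtomicType type;
    std::string lexical;
};

// Returns true when every entry has the same atomic type as the first one.
// An empty list is treated as homogeneous.
bool allSameAtomicType(const std::vector<AtomicValue> &values)
{
    if (values.empty())
        return true;
    const AtomicType expected = values.front().type;
    return std::all_of(values.begin(), values.end(),
                       [expected](const AtomicValue &v) { return v.type == expected; });
}
```

In a Qt program the same test would compare the type reported by each QVariant in the list before the list is passed to the query.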
|
308 |
|
309 \section2 Interpreting XQuery results |
|
310 |
|
311 When the results of an XQuery are returned in a sequence of \l |
|
312 {QXmlResultItems} {result items}, atomic values in the sequence |
|
313 are treated as instances of QVariant. Suppose that instead of |
|
314 serializing the results of the XQuery as XML, we process the |
|
315 results programmatically. Modify the standard QtXmlPatterns code |
|
316 sequence to call the overload of QXmlQuery::evaluateTo() that |
|
317 populates a sequence of \l {QXmlResultItems} {result items} with |
|
318 the XQuery results: |
|
319 |
|
320 \snippet doc/src/snippets/code/src_xmlpatterns_api_qxmlquery.cpp 5 |
|
321 |
|
322 Iterate through the \l {QXmlResultItems} {result items} and test |
|
323 each QXmlItem to see if it is an atomic value or a node. If it is |
|
324 an atomic value, convert it to a QVariant with \l {QXmlItem::} |
|
325 {toAtomicValue()} and switch on its \l {QVariant::type()} {variant |
|
326 type} to handle all the atomic values your XQuery might return. |
|
327 The following table shows the QVariant type to expect for each |
|
328 atomic value type (or QXmlName): |
|
329 |
|
330 \table |
|
331 |
|
332 \header |
|
333 \o XQuery result item type |
|
334 \o QVariant type returned |
|
335 |
|
336 \row |
|
337 \o \c xs:QName |
|
338 \o QXmlName (see \l{Handling QXmlNames}{Handling QXmlNames} |
|
339 below) |
|
340 |
|
341 \row |
|
342 \o \c xs:integer |
|
343 \o QVariant::LongLong |
|
344 |
|
345 \row |
|
346 \o \c xs:string |
|
347 \o QVariant::String |
|
348 |
|
349 \row |
|
350 \o \c xs:string* |
|
351 \o QVariant::StringList |
|
352 |
|
353 \row |
|
354 \o \c xs:double |
|
355 \o QVariant::Double |
|
356 |
|
357 \row |
|
358 \o \c xs:float |
|
359 \o QVariant::Double |
|
360 |
|
361 \row |
|
362 \o \c xs:boolean |
|
363 \o QVariant::Bool |
|
364 |
|
365 \row |
|
366 \o \c xs:decimal |
|
367 \o QVariant::Double |
|
368 |
|
369 \row |
|
370 \o \c xs:hexBinary |
|
371 \o QVariant::ByteArray |
|
372 |
|
373 \row |
|
374 \o \c xs:base64Binary |
|
375 \o QVariant::ByteArray |
|
376 |
|
377 \row |
|
378 \o \c xs:gYear |
|
379 \o QVariant::DateTime |
|
380 |
|
381 \row |
|
382 \o \c xs:gYearMonth |
|
383 \o QVariant::DateTime |
|
384 |
|
385 \row |
|
386 \o \c xs:gMonthDay |
|
387 \o QVariant::DateTime |
|
388 |
|
389 \row |
|
390 \o \c xs:gDay |
|
391 \o QVariant::DateTime |
|
392 |
|
393 \row |
|
394 \o \c xs:gMonth |
|
395 \o QVariant::DateTime |
|
396 |
|
397 \row |
|
398 \o \c xs:anyURI |
|
399 \o QVariant::Url |
|
400 |
|
401 \row |
|
402 \o \c xs:untypedAtomic |
|
403 \o QVariant::String |
|
404 |
|
405 \row |
|
406 \o \c xs:ENTITY |
|
407 \o QVariant::String |
|
408 |
|
409 \row |
|
410 \o \c xs:date |
|
411 \o QVariant::DateTime |
|
412 |
|
413 \row |
|
414 \o \c xs:dateTime |
|
415 \o QVariant::DateTime |
|
416 |
|
417 \row |
|
418 \o \c xs:time |
|
419 \o (see \l{xstime-not-mapped}{No mapping for xs:time} below) |
|
420 |
|
421 \endtable |
|
422 |
|
423 \target Handling QXmlNames |
|
424 \section3 Handling QXmlNames |
|
425 |
|
426 If your XQuery can return atomic value items of type \c{xs:QName}, |
|
427 they will appear in your QXmlResultItems as instances of QXmlName. |
|
428 Since the QVariant class does not support the QXmlName class |
|
429 directly, extracting them from QXmlResultItems requires a bit of |
|
430 sleight-of-hand using the \l{QMetaType} {Qt metatype system}. We |
|
431 must modify our example to use a couple of template functions, a |
|
432 friend of QMetaType (qMetaTypeId<T>()) and a friend of QVariant |
|
433 (qVariantValue<T>()): |
|
434 |
|
435 \snippet doc/src/snippets/code/src_xmlpatterns_api_qxmlquery.cpp 6 |
|
436 |
|
437 To access the strings in a QXmlName returned by an |
|
438 \l{QXmlQuery::evaluateTo()} {XQuery evaluation}, the QXmlName must |
|
439 be accessed with the \l{QXmlNamePool} {name pool} from the |
|
440 instance of QXmlQuery that was used for the evaluation. |
|
441 |
|
442 \target xstime-not-mapped |
|
443 \section3 No mapping for xs:time |
|
444 |
|
445 An instance of \c xs:time can't be represented correctly as an |
|
446 instance of QVariant::Time, unless the \c xs:time is a UTC time. |
|
447 This is because xs:time has a zone offset (0 for UTC) in addition |
|
448 to the time value, which the QTime in QVariant::Time does not |
|
449 have. This means that if an XQuery tries to return an atomic value |
|
450 of type \c xs:time, an invalid QVariant will be returned. A query |
|
451 can still return the value of an \c xs:time by converting it either |
|
452 to an \c xs:dateTime with an arbitrary date or to an \c xs:string. |
|
453 |
|
454 \section1 Using XQuery with Non-XML Data |
|
455 |
|
456 Although the XQuery language was designed for querying XML, with |
|
457 QtXmlPatterns one can use XQuery for querying any data that can |
|
458 be modeled to look like XML. Non-XML data is modeled to look like |
|
459 XML by loading it into a custom subclass of QAbstractXmlNodeModel, |
|
460 where it is then presented to the QtXmlPatterns XQuery engine via |
|
461 the same API the XQuery engine uses for querying XML. |
|
462 |
|
463 When QtXmlPatterns loads and queries XML files and produces XML |
|
464 output, it can always load the XML data into its default XML node |
|
465 model, where it can be traversed efficiently. The XQuery below |
|
466 traverses the product orders found in the XML file \e myOrders.xml |
|
467 to find all the skin care product orders and output them ordered |
|
468 by shipping date. |
|
469 |
|
470 \quotefile snippets/patternist/introAcneRemover.xq |
|
471 |
|
472 QtXmlPatterns can be used out of the box to perform this |
|
473 query, provided \e myOrders.xml actually contains well-formed XML. It |
|
474 can be loaded directly into the default XML node model and |
|
475 traversed. But suppose we want QtXmlPatterns to perform queries on |
|
476 the hierarchical structure of the local file system. The default |
|
477 XML node model in QtXmlPatterns is not suitable for navigating the |
|
478 file system, because there is no XML file to load that contains a |
|
479 description of it. Such an XML file, if it existed, might look |
|
480 something like this: |
|
481 |
|
482 \quotefile snippets/patternist/introFileHierarchy.xml |
|
483 |
|
484 The \l{File System Example}{File System Example} does exactly this. |
|
485 |
|
486 There is no such file to load into the default XML node model, but |
|
487 one can write a subclass of QAbstractXmlNodeModel to represent the |
|
488 file system. This custom XML node model, once populated with all |
|
489 the directory and file descriptors obtained directly from the |
|
490 system, presents the complete file system hierarchy to the query |
|
491 engine via the same API used by the default XML node model to |
|
492 present the contents of an XML file. In other words, once the |
|
493 custom XML node model is populated, it presents the file system to |
|
494 the query engine as if a description of it had been loaded into |
|
495 the default XML node model from an XML file like the one shown |
|
496 above. |
|
497 |
|
498 Now we can write an XQuery to find all the XML files and parse |
|
499 them to find the ones that don't contain well-formed XML. |
|
500 |
|
501 \quotefromfile snippets/patternist/introNavigateFS.xq |
|
502 \skipto <html> |
|
503 \printuntil |
|
504 |
|
505 Without QtXmlPatterns, there is no simple way to solve this kind |
|
506 of problem. You might do it by writing a C++ program to traverse |
|
507 the file system, sniff out all the XML files, and submit each one |
|
508 to an XML parser to test that it contains valid XML. The C++ code |
|
509 required to write that program will probably be more complex than |
|
510 the C++ code required to subclass QAbstractXmlNodeModel, but even |
|
511 if the two are comparable, your custom C++ program can be used |
|
512 only for that one task, while your custom XML node model can be |
|
513 used by any XQuery that must navigate the file system. |
|
514 |
|
515 The general approach to using XQuery to perform queries on non-XML |
|
516 data has been a three step process. In the first step, the data is |
|
517 loaded into a non-XML data model. In the second step, the non-XML |
|
518 data model is serialized as XML and output to XML (text) files. In |
|
519 the final step, an XML tool loads the XML files into a second, XML |
|
520 data model, where the XQueries can be performed. The development |
|
521 cost of implementing this process is often high, and the three |
|
522 step system that results is inefficient because the two data |
|
523 models must be built and maintained separately. |
|
524 |
|
525 With QtXmlPatterns, subclassing QAbstractXmlNodeModel eliminates |
|
526 the transformation required to convert the non-XML data model to |
|
527 the XML data model, because there is only ever one data model |
|
528 required. The non-XML data model presents the non-XML data to the |
|
529 query engine via the XML data model API. Also, since the query |
|
530 engine uses the API to access the QAbstractXmlNodeModel, the data |
|
531 model subclass can construct the elements, attributes and other |
|
532 data on demand, responding to the query's specific requests. This |
|
533 can greatly improve efficiency, because it means the entire model |
|
534 might not have to be built. For example, in the file system model |
|
535 above, it is not necessary to build a complete in-memory |
|
536 representation of the whole file system. Instead, nodes are |
|
537 created on demand, and a typical query is likely to touch only a |
|
538 small subset of the file system. |
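
The on-demand idea can be illustrated without any Qt classes. The following is a minimal sketch under stated assumptions: \c LazyNode and its \c ChildLoader callback are hypothetical names, and the class is not the QAbstractXmlNodeModel interface. It only shows the caching pattern in which a node's children are produced the first time something asks for them.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical lazily expanded tree node; not the QAbstractXmlNodeModel API.
class LazyNode
{
public:
    // Callback that lists the child paths of a given path, for example by
    // reading a directory. It is only invoked when children are requested.
    using ChildLoader = std::function<std::vector<std::string>(const std::string &)>;

    LazyNode(std::string path, ChildLoader loader)
        : m_path(std::move(path)), m_loader(std::move(loader))
    {
        assert(static_cast<bool>(m_loader));
    }

    const std::string &path() const { return m_path; }

    // True once children() has been called on this node.
    bool isExpanded() const { return m_loaded; }

    // Builds the child nodes on first access and caches them afterwards.
    const std::vector<std::unique_ptr<LazyNode>> &children()
    {
        if (!m_loaded) {
            for (const std::string &childPath : m_loader(m_path))
                m_children.push_back(
                    std::unique_ptr<LazyNode>(new LazyNode(childPath, m_loader)));
            m_loaded = true;
        }
        return m_children;
    }

private:
    std::string m_path;
    ChildLoader m_loader;
    bool m_loaded = false;
    std::vector<std::unique_ptr<LazyNode>> m_children;
};
```

A query that only visits one directory would expand that branch and leave every other node untouched, which is where the saving comes from.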
|
539 |
|
540 Examples of other places where XQuery could be used in |
|
541 QtXmlPatterns to query non-XML data: |
|
542 |
|
543 \list |
|
544 |
|
545 \o The internal representation for word processor documents |
|
546 |
|
547 \o The set of dependencies for a software build system |
|
548 |
|
549 \o The hierarchy (or graph) that links a set of HTML documents |
|
550 from a web crawler |
|
551 |
|
552 \o The images and meta-data in an image collection |
|
553 |
|
554 \o The set of D-Bus interfaces available in a system |
|
555 |
|
556 \o A QObject hierarchy, as seen in the \l{QObject XML Model |
|
557 Example} {QObject XML Model example}. |
|
558 |
|
559 \endlist |
|
560 |
|
561 See the QAbstractXmlNodeModel documentation for information about |
|
562 how to implement custom XML node models. |
|
563 |
|
564 \section1 More on using QtXmlPatterns with non-XML Data |
|
565 |
|
566 Subclassing QAbstractXmlNodeModel to let the query engine access |
|
567 non-XML data by the same API it uses for XML is the feature that |
|
568 enables QtXmlPatterns to query non-XML data with XQuery. It allows |
|
569 XQuery to be used as a mapping layer between different non-XML |
|
570 node models or between a non-XML node model and the built-in XML |
|
571 node model. Once the subclass(es) of QAbstractXmlNodeModel have |
|
572 been written, XQuery can be used to select a set of elements from |
|
573 one node model, transform the selected elements, and then write |
|
574 them out, either as XML using QXmlQuery::evaluateTo() and QXmlSerializer, |
|
575 or as some other format using a subclass of QAbstractXmlReceiver. |
|
576 |
|
577 Consider a word processor application that must import and export |
|
578 data in several different formats. Rather than writing a lot of |
|
579 C++ code to convert each input format to an intermediate form, and |
|
580 more C++ code to convert the intermediate form back to each |
|
581 output format, one can implement a solution based on QtXmlPatterns |
|
582 that uses simple XQueries to transform each XML or non-XML format |
|
583 (e.g. MathFormula.xml below) to the intermediate form (e.g. the |
|
584 DocumentRepresentation node model class below), and more simple |
|
585 XQueries to transform the intermediate form back to each XML or |
|
586 non-XML format. |
|
587 |
|
588 \image patternist-wordProcessor.png |
|
589 |
|
590 Because CSV files are not XML, a subclass of QAbstractXmlNodeModel |
|
591 is used to present the CSV data to the XQuery engine as if it were |
|
592 XML. What are not shown are the subclasses of QAbstractXmlReceiver |
|
593 that would then send the selected elements into the |
|
594 DocumentRepresentation node model, and the subclasses of |
|
595 QAbstractXmlNodeModel that would ultimately write the output files |
|
596 in each format. |
|
597 |
|
598 \section1 Security Considerations |
|
599 |
|
600 \section2 Code Injection |
|
601 |
|
602 XQuery is vulnerable to |
|
603 \l{http://en.wikipedia.org/wiki/Code_injection} {code injection |
|
604 attacks} in the same way as the SQL language. If an XQuery is |
|
605 constructed by concatenating strings, and the strings come from |
|
606 user input, the constructed XQuery could be malevolent. The best |
|
607 way to prevent code injection attacks is to not construct XQueries |
|
608 from user-written strings, but only accept user data input using |
|
609 QVariant and variable bindings. See QXmlQuery::bindVariable(). |
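
The difference between the two approaches can be seen with plain strings. This is a minimal sketch, not Qt code: both function names are hypothetical, and the \c{library/book/publisher} layout of \c{library.xml} is assumed for illustration only. The first function splices user input into the query text; the second keeps the query text fixed and leaves the value to be supplied through a variable binding.

```cpp
#include <string>

// Unsafe: the user's text becomes part of the query source, so a quote
// character in the input can change the structure of the predicate.
std::string buildQueryByConcatenation(const std::string &publisher)
{
    return "doc('library.xml')/library/book[publisher = \"" + publisher + "\"]";
}

// Safer: the query text never changes. The value for $publisher is passed
// separately (in Qt, through QXmlQuery::bindVariable()), so it is always
// treated as data and never parsed as query syntax.
std::string parameterizedQuery()
{
    return "declare variable $publisher external; "
           "doc('library.xml')/library/book[publisher = $publisher]";
}
```

Passing the input \c{x" or "1" = "1} to the first function yields a predicate that is true for every book, which is the classic injection pattern.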
|
610 |
|
611 The articles |
|
612 \l{http://www.ibm.com/developerworks/xml/library/x-xpathinjection.html} |
|
613 {Avoid the dangers of XPath injection}, by Robi Sen and |
|
614 \l{http://www.packetstormsecurity.org/papers/bypass/Blind_XPath_Injection_20040518.pdf} |
|
615 {Blind XPath Injection}, by Amit Klein, discuss the XQuery code |
|
616 injection problem in more detail. |
|
617 |
|
618 \section2 Denial of Service Attacks |
|
619 |
|
620 Applications using QtXmlPatterns are subject to the same |
|
621 limitations of software as other systems. Generally, these cannot |
|
622 be checked. This means QtXmlPatterns does not prevent rogue |
|
623 queries from consuming too many resources. For example, a query |
|
624 could take too much time to execute or try to transfer too much |
|
625 data. A query could also do too much recursion, which could crash |
|
626 the system. XQueries can do these things accidentally, but they |
|
627 can also be done as deliberate denial of service attacks. |
|
628 |
|
629 \section1 Features and Conformance |
|
630 |
|
631 \section2 XQuery 1.0 |
|
632 |
|
633 QtXmlPatterns aims at being a |
|
634 \l{http://www.w3.org/TR/xquery/#id-xquery-conformance} {conformant |
|
635 XQuery processor}. It adheres to |
|
636 \l{http://www.w3.org/TR/xquery/#id-minimal-conformance} {Minimal |
|
637 Conformance} and supports the |
|
638 \l{http://www.w3.org/TR/xquery/#id-serialization-feature} |
|
639 {Serialization Feature} and the |
|
640 \l{http://www.w3.org/TR/xquery/#id-full-axis-feature} {Full Axis |
|
641 Feature}. QtXmlPatterns currently passes 97% of the tests in the |
|
642 \l{http://www.w3.org/XML/Query/test-suite} {XML Query Test Suite}. |
|
643 Areas where conformance may be questionable and where behavior may |
|
644 be changed in future releases include: |
|
645 |
|
646 \list |
|
647 |
|
648 \o Some corner cases involving namespaces and element constructors |
|
649 are incorrect. |
|
650 |
|
651 \o XPath is a subset of XQuery and the implementation of |
|
652 QtXmlPatterns uses XPath 2.0 with XQuery 1.0. |
|
653 |
|
654 \endlist |
|
655 |
|
656 The specification discusses conformance further: |
|
657 \l{http://www.w3.org/TR/xquery/}{XQuery 1.0: An XML Query |
|
658 Language}. W3C's XQuery testing effort can be of interest as |
|
659 well, \l{http://www.w3.org/XML/Query/test-suite/}{XML Query Test |
|
660 Suite}. |
|
661 |
|
662 Currently \c fn:collection() does not access any data set, and |
|
663 there is no API for providing data through the collection. As a |
|
664 result, evaluating \c fn:collection() returns the empty |
|
665 sequence. We intend to provide functionality for this in a future |
|
666 release of Qt. |
|
667 |
|
668 Only queries encoded in UTF-8 are supported. |
|
669 |
|
670 \section2 XSLT 2.0 |
|
671 |
|
672 Partial support for XSLT was introduced in Qt 4.5. Future |
|
673 releases of QtXmlPatterns will aim to support these XSLT |
|
674 features: |
|
675 |
|
676 \list |
|
677 \o Basic XSLT 2.0 processor |
|
678 \o Serialization feature |
|
679 \o Backwards Compatibility feature |
|
680 \endlist |
|
681 |
|
682 For details, see \l{http://www.w3.org/TR/xslt20/#conformance}{XSL |
|
683 Transformations (XSLT) Version 2.0, 21 Conformance}. |
|
684 |
|
685 \note In this release, XSLT support is considered experimental. |
|
686 |
|
687 Unsupported or partially supported XSLT features are documented |
|
688 in the following table. The implementation of XSLT in Qt 4.5 can |
|
689 be seen as XSLT 1.0 but with the data model of XPath 2.0 and |
|
690 XSLT 2.0, and using the functionality of XPath 2.0 and |
|
691 its accompanying function library. When QtXmlPatterns encounters |
|
692 an unsupported or partially supported feature, it will either report |
|
693 a syntax error or silently continue, unless otherwise noted in the |
|
694 table. |
|
695 |
|
696 The implementation currently passes 42% of W3C's XSLT test suite, |
|
697 which focuses on features introduced in XSLT 2.0. |
|
698 |
|
699 \table |
|
700 \header |
|
701 \o XSL Feature |
|
702 \o Support Status |
|
703 \row |
|
704 \o \c xsl:key and \c fn:key() |
|
705 \o not supported |
|
706 \row |
|
707 \o \c xsl:include |
|
708 \o not supported |
|
709 \row |
|
710 \o \c xsl:import |
|
711 \o not supported |
|
712 \row |
|
713 \o \c xsl:copy |
|
714 |
|
715 \o The \c copy-namespaces and \c inherit-namespaces attributes |
|
716 have no effect. For copied comments, attributes and |
|
717 processing instructions, the copy has the same node |
|
718 identity as the original. |
|
719 |
|
720 \row |
|
721 \o \c xsl:copy-of |
|
722 \o The \c copy-namespaces attribute has no effect. |
|
723 \row |
|
724 \o \c fn:format-number() |
|
725 \o not supported |
|
726 \row |
|
727 \o \c xsl:message |
|
728 \o not supported |
|
729 \row |
|
730 \o \c xsl:use-when |
|
731 \o not supported |
|
732 \row |
|
733 \o \c Tunnel Parameters |
|
734 \o not supported |
|
735 \row |
|
736 \o \c xsl:attribute-set |
|
737 \o not supported |
|
738 \row |
|
739 \o \c xsl:decimal-format |
|
740 \o not supported |
|
741 \row |
|
742 \o \c xsl:fallback |
|
743 \o not supported |
|
744 \row |
|
745 \o \c xsl:apply-imports |
|
746 \o not supported |
|
747 \row |
|
748 \o \c xsl:character-map |
|
749 \o not supported |
|
750 \row |
|
751 \o \c xsl:number |
|
752 \o not supported |
|
753 \row |
|
754 \o \c xsl:namespace-alias |
|
755 \o not supported |
|
756 \row |
|
757 \o \c xsl:output |
|
758 \o not supported |
|
759 \row |
|
760 \o \c xsl:output-character |
|
761 \o not supported |
|
762 \row |
|
763 \o \c xsl:preserve-space |
|
764 \o not supported |
|
765 \row |
|
766 \o \c xsl:result-document |
|
767 \o not supported |
|
768 \row |
|
769 \o Patterns |
|
770 \o Complex patterns or patterns with predicates have issues. |
|
771 \row |
|
772 \o \c 2.0 Compatibility Mode |
|
773 |
|
774 \o Stylesheets are interpreted as XSLT 2.0 stylesheets, even |
|
775 if the \c version attribute in the XSLT source is |
|
776 1.0. In other words, the version attribute is ignored. |
|
777 |
|
778 \row |
|
779 \o Grouping |
|
780 |
|
781 \o \c fn:current-group(), \c fn:grouping-key() and \c |
|
782 xsl:for-each-group. |
|
783 |
|
784 \row |
|
785 \o Regexp elements |
|
786 \o \c xsl:analyze-string, \c xsl:matching-substring, |
|
787 \c xsl:non-matching-substring, and \c fn:regex-group() |
|
788 \row |
|
789 \o Date & Time formatting |
|
790 \o \c fn:format-dateTime(), \c fn:format-date() and \c fn:format-time().
|
791 |
|
792 \row |
|
793 \o XPath Conformance |
|
794 \o Since XPath is a subset of XSLT, its issues apply here too.
|
795 \endtable |
|
796 |
|
797 The QtXmlPatterns implementation of the XPath Data Model does not |
|
798 include entities (due to QXmlStreamReader not reporting them). |
|
799 This means that the functions \c unparsed-entity-uri() and \c
|
800 unparsed-entity-public-id() always behave as if no such entity exists.
|
801 |
|
802 \section2 XPath 2.0 |
|
803 |
|
804 Since XPath 2.0 is a subset of XQuery 1.0, XPath 2.0 is |
|
805 supported. Areas where conformance may be questionable and, |
|
806 consequently, where behavior may be changed in future releases |
|
807 include: |
|
808 |
|
809 \list |
|
810 \o Regular expression support is currently not conformant |
|
811 but follows Qt's QRegExp standard syntax. |
|
812 |
|
813 \o Operators for \c xs:time, \c xs:date, and \c xs:dateTime |
|
814 are incomplete. |
|
815 |
|
816 \o Formatting of very large or very small \c xs:double, \c |
|
817 xs:float, and \c xs:decimal values may be incorrect. |
|
818 \endlist |
|
819 |
|
820 \section2 xml:id |
|
821 |
|
822 Processing of XML files supports \c xml:id. This allows elements |
|
823 that have an attribute named \c xml:id to be looked up efficiently |
|
824 with the \c fn:id() function. See |
|
825 \l{http://www.w3.org/TR/xml-id/}{xml:id Version 1.0} for details. |
|
826 |
|
827 \section2 XML Schema 1.0 |
|
828 |
|
829 There are two ways QtXmlPatterns can be used to validate schemas: |
|
830 You can use the C++ API in your Qt application using the classes |
|
831 QXmlSchema and QXmlSchemaValidator, or you can use the command line |
|
832 utility named xmlpatternsvalidator (located in the "bin" directory |
|
833 of your Qt build). |
|
834 |
|
835 The QtXmlPatterns implementation of XML Schema validation supports
|
836 most of the XML Schema 1.0 specification. Known problems of the
|
837 implementation and areas where conformance may be questionable
|
838 are:
|
839 |
|
840 \list |
|
841 \o Large \c minOccurs or \c maxOccurs values or deeply nested ones |
|
842 require a huge amount of memory, which might cause the system to freeze.
|
843 Such a schema should be rewritten to use \c unbounded as the value instead
|
844 of large numbers. This restriction will hopefully be fixed in a later release. |
|
845 \o Comparison of very small or very large floating-point values might lead to
|
846 wrong results in some cases. However, such numbers should not be relevant
|
847 for day-to-day usage. |
|
848 \o Regular expression support is currently not conformant but follows |
|
849 Qt's QRegExp standard syntax. |
|
850 \o Identity constraint checks cannot use the values of default or fixed
|
851 attribute definitions. |
|
852 \endlist |
|
853 |
|
854 \section2 Resource Loading |
|
855 |
|
856 When QtXmlPatterns loads an XML resource, e.g., using the |
|
857 \c fn:doc() function, the following schemes are supported: |
|
858 |
|
859 \table |
|
860 \header |
|
861 \o Scheme Name |
|
862 \o Description |
|
863 \row |
|
864 \o \c file |
|
865 \o Local files. |
|
866 \row |
|
867 \o \c data |
|
868 |
|
869 \o The bytes are encoded in the URI itself. For example, \c
|
870 data:application/xml,%3Ce%2F%3E is \c <e/>. |
|
871 |
|
872 \row |
|
873 \o \c ftp |
|
874 \o Resources retrieved via FTP. |
|
875 \row |
|
876 \o \c http |
|
877 \o Resources retrieved via HTTP. |
|
878 \row |
|
879 \o \c https |
|
880 \o Resources retrieved via HTTPS. This will succeed if no SSL |
|
881 errors are encountered. |
|
882 \row |
|
883 \o \c qrc |
|
884 \o Qt Resource files. Expressing it as an empty scheme, :/..., |
|
885 is not supported. |
|
886 |
|
887 \endtable |
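
The \c data scheme row above says that the bytes are carried in the URI itself. As a rough illustration of what that means, the following sketch percent-decodes the payload of a non-base64 \c data URI using only the C++ standard library. The helper names \c hexDigitValue() and \c decodeDataUriPayload() are hypothetical and are not part of Qt; this is not how QtXmlPatterns itself resolves such URIs.

```cpp
#include <string>

// Value of one hexadecimal digit, or -1 if c is not a hex digit.
int hexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper (not a Qt API): extracts and percent-decodes the
// payload of a "data:" URI. Returns false if the URI does not use the data
// scheme, has no comma separator, is base64 encoded (not handled by this
// sketch), or contains a malformed percent escape.
bool decodeDataUriPayload(const std::string &uri, std::string &payload)
{
    const std::string scheme = "data:";
    if (uri.compare(0, scheme.size(), scheme) != 0)
        return false;

    // Everything between "data:" and the first comma is the media type.
    const std::string::size_type comma = uri.find(',', scheme.size());
    if (comma == std::string::npos)
        return false;

    const std::string header = uri.substr(scheme.size(), comma - scheme.size());
    const std::string base64 = ";base64";
    if (header.size() >= base64.size()
        && header.compare(header.size() - base64.size(), base64.size(), base64) == 0)
        return false;

    std::string decoded;
    for (std::string::size_type i = comma + 1; i < uri.size(); ++i) {
        if (uri[i] != '%') {
            decoded += uri[i];
            continue;
        }
        // A percent sign must be followed by exactly two hex digits.
        if (i + 2 >= uri.size())
            return false;
        const int high = hexDigitValue(uri[i + 1]);
        const int low = hexDigitValue(uri[i + 2]);
        if (high < 0 || low < 0)
            return false;
        decoded += static_cast<char>(high * 16 + low);
        i += 2;
    }

    payload = decoded;
    return true;
}
```

Passing the URI from the table, \c data:application/xml,%3Ce%2F%3E, to this helper yields the four bytes of \c <e/>.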
|
888 |
|
889 \section2 XML |
|
890 |
|
891 XML 1.0 and XML Namespaces 1.0 are supported, as opposed to the |
|
892 1.1 versions. When a string is passed to a query as a QString,
|
893 the characters must be XML 1.0 characters. Otherwise, the behavior |
|
894 is undefined. This is not checked. |
|
895 |
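Because the XML 1.0 character requirement is not checked, callers may want to validate strings themselves before passing them to a query. The sketch below tests code points against the \c Char production of the XML 1.0 specification, using only the C++ standard library. The helper names \c isXml10Char() and \c isXml10String() are hypothetical and not part of Qt, and the sketch assumes the text has already been converted to UTF-32 code points (for a QString, QString::toUcs4() would be one way to do that).

```cpp
#include <string>

// True if the code point matches the Char production of XML 1.0:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
bool isXml10Char(char32_t codePoint)
{
    return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
        || (codePoint >= 0x20 && codePoint <= 0xD7FF)
        || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
        || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
}

// Hypothetical helper (not a Qt API): true if every code point in the
// UTF-32 string is a valid XML 1.0 character. An empty string is valid.
bool isXml10String(const std::u32string &text)
{
    for (char32_t codePoint : text) {
        if (!isXml10Char(codePoint))
            return false;
    }
    return true;
}
```

A string containing, say, U+0000 or U+000B would be rejected by this check, while tab, line feed and carriage return are accepted.
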
|
896 URIs are first passed to QAbstractUriResolver. Check |
|
897 QXmlQuery::setUriResolver() for possible rewrites. |
|
898 */ |
|
899 |
|
900 /*! |
|
901 \namespace QPatternist |
|
902 \brief The QPatternist namespace contains classes and functions required by the QtXmlPatterns module. |
|
903 \internal |
|
904 */ |