author | Eckhart Koeppen <eckhart.koppen@nokia.com> |
Thu, 08 Apr 2010 14:19:33 +0300 | |
branch | RCL_3 |
changeset 7 | 3f74d0d4af4c |
parent 0 | 1918ee327afb |
permissions | -rw-r--r-- |
7
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
1 |
<HTML> |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
2 |
<TITLE>Canonical XML</TITLE> |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
3 |
<BODY> |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
4 |
<H1>Canonical XML</H1> |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
5 |
<P> |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
6 |
This document defines a subset of XML called canonical XML. |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
7 |
The intended use of canonical XML is in testing XML processors, |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
8 |
as a representation of the result of parsing an XML document. |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
9 |
<P> |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
10 |
Every well-formed XML document has a unique structurally equivalent |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
11 |
canonical XML document. Two structurally equivalent XML |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
12 |
documents have a byte-for-byte identical canonical XML document. |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
13 |
Canonicalizing an XML document requires only information that an XML |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
14 |
processor is required to make available to an application. |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
15 |
<P> |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
16 |
A canonical XML document conforms to the following grammar: |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
17 |
<PRE> |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
18 |
CanonXML ::= Pi* element Pi* |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
19 |
element ::= Stag (Datachar | Pi | element)* Etag |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
20 |
Stag ::= '<' Name Atts '>' |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
21 |
Etag ::= '</' Name '>' |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
22 |
Pi ::= '<?' Name ' ' (((Char - S) Char*)? - (Char* '?>' Char*)) '?>' |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
23 |
Atts ::= (' ' Name '=' '"' Datachar* '"')* |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
24 |
Datachar ::= '&amp;' | '&lt;' | '&gt;' | '&quot;' |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
25 |
| '&#9;'| '&#10;'| '&#13;' |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
26 |
| (Char - ('&' | '<' | '>' | '"' | #x9 | #xA | #xD)) |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
27 |
Name ::= (see XML spec) |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
28 |
Char ::= (see XML spec) |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
29 |
S ::= (see XML spec) |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
30 |
</PRE> |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
31 |
<P> |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
32 |
Attributes are in lexicographical order (in Unicode bit order). |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
33 |
<P> |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
34 |
A canonical XML document is encoded in UTF-8. |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
35 |
<P> |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
36 |
Ignorable white space is considered significant and is treated equivalently |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
37 |
to data. |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
38 |
<P> |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
39 |
<ADDRESS> |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
40 |
<A HREF="mailto:jjc@jclark.com">James Clark</A> |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
41 |
</ADDRESS> |
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
42 |
|
3f74d0d4af4c
qt:70947f0f93d948bc89b3b43d00da758a51f1ef84
Eckhart Koeppen <eckhart.koppen@nokia.com>
parents:
0
diff
changeset
|
43 |
</BODY> |
0 | 44 |
</HTML> |