.\" This manpage has been automatically generated by docbook2man
.\" from a DocBook document. This tool can be found at:
.\" <http://shell.ipoline.com/~elmert/comp/docbook2X/>
.\" Please send any bug reports, improvements, comments, patches,
.\" etc. to Steve Cheng <steve@ggi-project.org>.
.TH "XMLWF" "1" "22 April 2002" "" ""
.SH NAME
xmlwf \- Determines if an XML document is well-formed
.SH SYNOPSIS

\fBxmlwf\fR [ \fB-s\fR] [ \fB-n\fR] [ \fB-p\fR] [ \fB-x\fR] [ \fB-e \fIencoding\fB\fR] [ \fB-w\fR] [ \fB-d \fIoutput-dir\fB\fR] [ \fB-c\fR] [ \fB-m\fR] [ \fB-r\fR] [ \fB-t\fR] [ \fB-v\fR] [ \fBfile ...\fR]

.SH "DESCRIPTION"
.PP
\fBxmlwf\fR uses the Expat library to determine
if an XML document is well-formed. It is non-validating.
.PP
If you do not specify any files on the command line,
and you have a recent version of xmlwf, the input
is read from standard input.
.SH "WELL-FORMED DOCUMENTS"
.PP
A well-formed document must adhere to the
following rules:
.TP 0.2i
\(bu
The file begins with an XML declaration, for instance
<?xml version="1.0" standalone="yes"?>.
(XML 1.0 recommends the declaration but does not require it;
XML 1.1 requires it.)
\fBNOTE:\fR xmlwf does not currently
check for a valid XML declaration.
.TP 0.2i
\(bu
Every start tag has a corresponding end tag, unless the element
is written as an empty-element tag (<tag/>).
.TP 0.2i
\(bu
There is exactly one root element. This element must contain
all other elements in the document. Only comments, white
space, and processing instructions may come after the close
of the root element.
.TP 0.2i
\(bu
All elements nest properly.
.TP 0.2i
\(bu
All attribute values are enclosed in quotes (either single
or double).
.PP
If the document has a DTD and strictly complies with that
DTD, then the document is also considered \fBvalid\fR.
xmlwf is a non-validating parser: it does not check the document
against the DTD.
However, it does support external entities (see the -x option).
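.PP
The rules above can be tried out without xmlwf itself. The following is a
minimal sketch, assuming Python's standard xml.parsers.expat module (a binding
to the same Expat library). The function name check_well_formed and the sample
documents are hypothetical and only serve the illustration; this is not how
xmlwf is implemented.
.PP
.nf
```python
import xml.parsers.expat


def check_well_formed(data):
    """Return None if data parses cleanly, else a one-line error string."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as err:
        # err.lineno and err.offset locate the first violation found.
        return "%d:%d: %s" % (err.lineno, err.offset,
                              xml.parsers.expat.ErrorString(err.code))
    return None


if __name__ == "__main__":
    # Hypothetical samples: one good document and one per broken rule.
    samples = {
        "good": b"<doc><item a='1'/></doc>",
        "missing end tag": b"<doc><item></doc>",
        "two root elements": b"<a/><b/>",
        "unquoted attribute": b"<doc a=1/>",
    }
    for name, data in samples.items():
        print(name, "->", check_well_formed(data) or "well-formed")
```
.fi
.PP
Like xmlwf, this sketch only reports the first problem Expat finds, and it
accepts a document that has no XML declaration.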
.SH "OPTIONS"
.PP
When an option takes an argument, you may give the argument either
as a separate word ("-d output") or joined to the option ("-doutput").
xmlwf supports both.
.TP
\fB-c\fR
If the input file is well-formed and xmlwf doesn't
encounter any errors, the input file is simply copied to
the output directory unchanged.
This implies no namespaces (turns off -n) and
requires -d to specify an output directory.
.TP
\fB-d output-dir\fR
Specifies a directory to contain transformed
representations of the input files.
By default, -d outputs a canonical representation
(described below).
You can select different output formats using -c and -m.

The output filenames will
be exactly the same as the input filenames, or "STDIN" if the input is
coming from standard input. Therefore, you must make sure that the
output file does not go into the same directory as the input
file. Otherwise, xmlwf will delete the input file before
it generates the output file (just like running
cat < file > file in most shells).

Two structurally equivalent XML documents have a byte-for-byte
identical canonical XML representation.
Note that ignorable white space is considered significant and
is treated equivalently to data.
More on canonical XML can be found at
http://www.jclark.com/xml/canonxml.html .
.TP
\fB-e encoding\fR
Specifies the character encoding for the document, overriding
any document encoding declaration. xmlwf
has four built-in encodings:
US-ASCII,
UTF-8,
UTF-16, and
ISO-8859-1.
Also see the -w option.
.TP
\fB-m\fR
Outputs an XML file that completely
describes the input file, including character positions.
Requires -d to specify an output directory.
.TP
\fB-n\fR
Turns on namespace processing.
With namespace processing on, Expat resolves namespace prefixes and
reports element and attribute names together with their namespace URIs.
-c disables namespaces.
.TP
\fB-p\fR
Tells xmlwf to process external DTDs and parameter
entities.

Normally xmlwf never parses parameter entities.
-p tells it to always parse them.
-p implies -x.
.TP
\fB-r\fR
Normally xmlwf memory-maps the XML file before parsing.
-r turns off memory-mapping and uses normal file I/O calls instead.
Memory-mapping is automatically turned off
when reading from standard input.
.TP
\fB-s\fR
Prints an error if the document is not standalone.
A document is standalone if it has no external subset and no
references to parameter entities.
.TP
\fB-t\fR
Turns on timings. This tells Expat to parse the entire file,
but not perform any processing.
This gives a fairly accurate idea of the raw speed of Expat itself
without client overhead.
-t turns off most of the output options (-d, -m, -c, ...).
.TP
\fB-v\fR
Prints the version of the Expat library being used, and then exits.
.TP
\fB-w\fR
Enables Windows code pages.
Normally, xmlwf reports an error if it runs across
an encoding that it is not equipped to handle itself. With
-w, xmlwf will try to use a Windows code page. See
also -e.
.TP
\fB-x\fR
Turns on parsing of external entities.

Non-validating parsers are not required to resolve external
entities, or even expand entities at all.
Expat expands internal entities without being asked,
but external entity parsing must be enabled explicitly.

External entities are simply entities that obtain their
data from outside the XML file currently being parsed.

This is an example of an internal entity:

.nf
<!ENTITY vers '1.0.2'>
.fi

And here are some examples of external entities:

.nf
<!ENTITY header SYSTEM "header-&vers;.xml">   (parsed)
<!ENTITY logo SYSTEM "logo.png" NDATA PNG>    (unparsed)
.fi
.TP
\fB--\fR
xmlwf ignores "--"
anywhere it appears on the command line.
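.PP
The difference between internal and external entities described under -x can
be observed directly in Expat. The following is a rough sketch, assuming
Python's standard xml.parsers.expat binding. The document, the file name
header.xml and the function parse are hypothetical, and the handler only
records the reference instead of loading the file as xmlwf -x would.
.PP
.nf
```python
import xml.parsers.expat

# Hypothetical document: "vers" is internal, "header" is external.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ENTITY vers '1.0.2'>
  <!ENTITY header SYSTEM "header.xml">
]>
<doc>v=&vers; &header;</doc>"""


def parse(doc, report_external):
    """Return (character data, system ids of external entities seen)."""
    text = []
    external = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = text.append

    def on_external(context, base, system_id, public_id):
        # A real handler would open system_id and parse it with
        # parser.ExternalEntityParserCreate(context); here it is only noted.
        external.append(system_id)
        return 1  # nonzero tells Expat to carry on

    if report_external:
        parser.ExternalEntityRefHandler = on_external
    parser.Parse(doc, True)
    return "".join(text), external


if __name__ == "__main__":
    print(parse(DOC, report_external=False))
    print(parse(DOC, report_external=True))
```
.fi
.PP
In both runs the internal entity is expanded into the character data. The
external entity is reported only when a handler has been installed, which is
the step that has to be requested explicitly.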
.PP
Older versions of xmlwf do not support reading from standard input.
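.PP
The effect of namespace processing (the -n option) is easiest to see by
comparing the element names Expat reports with and without it. The following
is a small sketch, assuming Python's standard xml.parsers.expat binding. The
document, the urn:example namespace and the function element_names are
hypothetical, and the space used as separator is a choice made for this
example, not necessarily what xmlwf uses.
.PP
.nf
```python
import xml.parsers.expat

DOC = b'<x:doc xmlns:x="urn:example"><x:item/></x:doc>'


def element_names(doc, namespaces):
    """Collect element names as reported by Expat."""
    names = []
    # A namespace separator switches namespace processing on.
    separator = " " if namespaces else None
    parser = xml.parsers.expat.ParserCreate(namespace_separator=separator)
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(doc, True)
    return names


if __name__ == "__main__":
    print(element_names(DOC, namespaces=False))
    print(element_names(DOC, namespaces=True))
```
.fi
.PP
Without namespace processing the names keep their prefixes (x:doc). With it,
each name is reported as the namespace URI, the separator, and the local name.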
.SH "OUTPUT"
.PP
If an input file is not well-formed, xmlwf outputs
a single line describing the problem to standard output.
If a file is well-formed, xmlwf outputs nothing.
Note that the exit status is \fBnot\fR set to reflect the result.
.SH "BUGS"
.PP
xmlwf accepts a file that has no XML declaration at the beginning.
The W3C XML 1.0 recommendation says a document should begin with a
declaration but does not require one; XML 1.1 does require it.
.PP
xmlwf returns an exit status of 0 (no error) even if the file is
not well-formed. There is no good way for a program to use
xmlwf to quickly check a file: it must parse xmlwf's standard output.
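.PP
Until the exit status is meaningful, a calling program has to look at what
xmlwf prints. The following is a minimal sketch of such a wrapper in Python,
assuming only the behavior described in this page: empty output means
well-formed, anything else is the error line. The function names
interpret_output and run_xmlwf are hypothetical.
.PP
.nf
```python
import subprocess


def interpret_output(stdout_text):
    """Map xmlwf's standard output to (ok, message)."""
    message = stdout_text.strip()
    return (message == "", message)


def run_xmlwf(path, executable="xmlwf"):
    """Run xmlwf on one file; the exit status is deliberately ignored."""
    result = subprocess.run([executable, path], capture_output=True,
                            text=True, check=False)
    return interpret_output(result.stdout)


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        ok, message = run_xmlwf(name)
        print(name, "well-formed" if ok else "NOT well-formed: " + message)
```
.fi
.PP
Keeping the interpretation in its own function means the rule can be changed
in one place if a later xmlwf starts reporting through its exit status.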
.PP
The errors should go to standard error, not standard output.
.PP
There should be a way to get -d to send its output to standard output
rather than forcing the user to send it to a file.
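.PP
One possible workaround is to let -d write into a temporary directory and
copy the result to standard output. The following Python sketch illustrates
the idea. It assumes that the output file is named after the base name of the
input file, which should be verified against the xmlwf version in use, and the
function names output_path and canonical_to_stdout are hypothetical.
.PP
.nf
```python
import os
import subprocess
import sys
import tempfile


def output_path(input_path, output_dir):
    """Where xmlwf -d is assumed to write: output_dir plus the base name."""
    input_dir = os.path.dirname(input_path) or "."
    if os.path.realpath(output_dir) == os.path.realpath(input_dir):
        # Writing next to the input would destroy the input file.
        raise ValueError("output directory must differ from input directory")
    return os.path.join(output_dir, os.path.basename(input_path))


def canonical_to_stdout(input_path, executable="xmlwf"):
    """Run xmlwf -d into a temporary directory and copy the result out."""
    with tempfile.TemporaryDirectory() as tmp:
        target = output_path(input_path, tmp)
        subprocess.run([executable, "-d", tmp, input_path], check=False)
        if os.path.exists(target):
            with open(target, "rb") as handle:
                sys.stdout.buffer.write(handle.read())


if __name__ == "__main__":
    for name in sys.argv[1:]:
        canonical_to_stdout(name)
```
.fi
.PP
The directory check also guards against the overwrite problem described
under the -d option.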
.PP
I have no idea why anyone would want to use the -d, -c
and -m options. If someone could explain it to me, I'd
like to add this information to this manpage.
.SH "ALTERNATIVES"
.PP
Here are some XML validators on the web:

.nf
http://www.hcrc.ed.ac.uk/~richard/xml-check.html
http://www.stg.brown.edu/service/xmlvalid/
http://www.scripting.com/frontier5/xml/code/xmlValidator.html
http://www.xml.com/pub/a/tools/ruwf/check.html
203 http://www.xml.com/pub/a/tools/ruwf/check.html |