<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<!-- Copyright 1999,2000 Clark Cooper <coopercc@netheaven.com>
     All rights reserved.
     This is free software. You may distribute or modify according to
     the terms of the MIT/X License -->
  <title>Expat XML Parser</title>
  <meta name="author" content="Clark Cooper, coopercc@netheaven.com" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <link href="style.css" rel="stylesheet" type="text/css" />
</head>
<body>
<h1>Expat XML Parser</h1>

<p>Expat is a library, written in C, for parsing XML documents. It's
the underlying XML parser for the open source Mozilla project, Perl's
<code>XML::Parser</code>, Python's <code>xml.parsers.expat</code>, and
other open-source XML parsers.</p>

<p>This library is the creation of James Clark, who's also given us
groff (an nroff look-alike), Jade (an implementation of ISO's DSSSL
stylesheet language for SGML), XP (a Java XML parser package), and XT
(a Java XSL engine). James was also the technical lead on the XML
Working Group at W3C that produced the XML specification.</p>

<p>This is free software, licensed under the <a
href="../COPYING">MIT/X Consortium license</a>. You may download it
from <a href="http://www.libexpat.org/">the Expat home page</a>.
</p>

<p>The bulk of this document was originally commissioned as an article by
<a href="http://www.xml.com/">XML.com</a>. They graciously allowed
Clark Cooper to retain copyright and to distribute it with Expat.</p>

<hr />
<h2>Table of Contents</h2>
<ul>
  <li><a href="#overview">Overview</a></li>
  <li><a href="#building">Building and Installing</a></li>
  <li><a href="#using">Using Expat</a></li>
  <li><a href="#reference">Reference</a>
    <ul>
      <li><a href="#creation">Parser Creation Functions</a>
        <ul>
          <li><a href="#XML_ParserCreate">XML_ParserCreate</a></li>
          <li><a href="#XML_ParserCreateNS">XML_ParserCreateNS</a></li>
          <li><a href="#XML_ParserCreate_MM">XML_ParserCreate_MM</a></li>
          <li><a href="#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></li>
          <li><a href="#XML_ParserFree">XML_ParserFree</a></li>
          <li><a href="#XML_ParserReset">XML_ParserReset</a></li>
        </ul>
      </li>
      <li><a href="#parsing">Parsing Functions</a>
        <ul>
          <li><a href="#XML_Parse">XML_Parse</a></li>
          <li><a href="#XML_ParseBuffer">XML_ParseBuffer</a></li>
          <li><a href="#XML_GetBuffer">XML_GetBuffer</a></li>
        </ul>
      </li>
      <li><a href="#setting">Handler Setting Functions</a>
        <ul>
          <li><a href="#XML_SetStartElementHandler">XML_SetStartElementHandler</a></li>
          <li><a href="#XML_SetEndElementHandler">XML_SetEndElementHandler</a></li>
          <li><a href="#XML_SetElementHandler">XML_SetElementHandler</a></li>
          <li><a href="#XML_SetCharacterDataHandler">XML_SetCharacterDataHandler</a></li>
          <li><a href="#XML_SetProcessingInstructionHandler">XML_SetProcessingInstructionHandler</a></li>
          <li><a href="#XML_SetCommentHandler">XML_SetCommentHandler</a></li>
          <li><a href="#XML_SetStartCdataSectionHandler">XML_SetStartCdataSectionHandler</a></li>
          <li><a href="#XML_SetEndCdataSectionHandler">XML_SetEndCdataSectionHandler</a></li>
          <li><a href="#XML_SetCdataSectionHandler">XML_SetCdataSectionHandler</a></li>
          <li><a href="#XML_SetDefaultHandler">XML_SetDefaultHandler</a></li>
          <li><a href="#XML_SetDefaultHandlerExpand">XML_SetDefaultHandlerExpand</a></li>
          <li><a href="#XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</a></li>
          <li><a href="#XML_SetSkippedEntityHandler">XML_SetSkippedEntityHandler</a></li>
          <li><a href="#XML_SetUnknownEncodingHandler">XML_SetUnknownEncodingHandler</a></li>
          <li><a href="#XML_SetStartNamespaceDeclHandler">XML_SetStartNamespaceDeclHandler</a></li>
          <li><a href="#XML_SetEndNamespaceDeclHandler">XML_SetEndNamespaceDeclHandler</a></li>
          <li><a href="#XML_SetNamespaceDeclHandler">XML_SetNamespaceDeclHandler</a></li>
          <li><a href="#XML_SetXmlDeclHandler">XML_SetXmlDeclHandler</a></li>
          <li><a href="#XML_SetStartDoctypeDeclHandler">XML_SetStartDoctypeDeclHandler</a></li>
          <li><a href="#XML_SetEndDoctypeDeclHandler">XML_SetEndDoctypeDeclHandler</a></li>
          <li><a href="#XML_SetDoctypeDeclHandler">XML_SetDoctypeDeclHandler</a></li>
          <li><a href="#XML_SetElementDeclHandler">XML_SetElementDeclHandler</a></li>
          <li><a href="#XML_SetAttlistDeclHandler">XML_SetAttlistDeclHandler</a></li>
          <li><a href="#XML_SetEntityDeclHandler">XML_SetEntityDeclHandler</a></li>
          <li><a href="#XML_SetUnparsedEntityDeclHandler">XML_SetUnparsedEntityDeclHandler</a></li>
          <li><a href="#XML_SetNotationDeclHandler">XML_SetNotationDeclHandler</a></li>
          <li><a href="#XML_SetNotStandaloneHandler">XML_SetNotStandaloneHandler</a></li>
        </ul>
      </li>
      <li><a href="#position">Parse Position and Error Reporting Functions</a>
        <ul>
          <li><a href="#XML_GetErrorCode">XML_GetErrorCode</a></li>
          <li><a href="#XML_ErrorString">XML_ErrorString</a></li>
          <li><a href="#XML_GetCurrentByteIndex">XML_GetCurrentByteIndex</a></li>
          <li><a href="#XML_GetCurrentLineNumber">XML_GetCurrentLineNumber</a></li>
          <li><a href="#XML_GetCurrentColumnNumber">XML_GetCurrentColumnNumber</a></li>
          <li><a href="#XML_GetCurrentByteCount">XML_GetCurrentByteCount</a></li>
          <li><a href="#XML_GetInputContext">XML_GetInputContext</a></li>
        </ul>
      </li>
      <li><a href="#miscellaneous">Miscellaneous Functions</a>
        <ul>
          <li><a href="#XML_SetUserData">XML_SetUserData</a></li>
          <li><a href="#XML_GetUserData">XML_GetUserData</a></li>
          <li><a href="#XML_UseParserAsHandlerArg">XML_UseParserAsHandlerArg</a></li>
          <li><a href="#XML_SetBase">XML_SetBase</a></li>
          <li><a href="#XML_GetBase">XML_GetBase</a></li>
          <li><a href="#XML_GetSpecifiedAttributeCount">XML_GetSpecifiedAttributeCount</a></li>
          <li><a href="#XML_GetIdAttributeIndex">XML_GetIdAttributeIndex</a></li>
          <li><a href="#XML_SetEncoding">XML_SetEncoding</a></li>
          <li><a href="#XML_SetParamEntityParsing">XML_SetParamEntityParsing</a></li>
          <li><a href="#XML_UseForeignDTD">XML_UseForeignDTD</a></li>
          <li><a href="#XML_SetReturnNSTriplet">XML_SetReturnNSTriplet</a></li>
          <li><a href="#XML_DefaultCurrent">XML_DefaultCurrent</a></li>
          <li><a href="#XML_ExpatVersion">XML_ExpatVersion</a></li>
          <li><a href="#XML_ExpatVersionInfo">XML_ExpatVersionInfo</a></li>
          <li><a href="#XML_GetFeatureList">XML_GetFeatureList</a></li>
        </ul>
      </li>
    </ul>
  </li>
</ul>

<hr />
<h2><a name="overview">Overview</a></h2>

<p>Expat is a stream-oriented parser. You register callback (or
handler) functions with the parser and then start feeding it the
document. As the parser recognizes parts of the document, it will
call the appropriate handler for that part (if you've registered one).
The document is fed to the parser in pieces, so you can start parsing
before you have the entire document. This also allows you to parse really
huge documents that won't fit into memory.</p>

<p>Expat can be intimidating due to the many kinds of handlers and
options you can set. But you only need to learn four functions in
order to do 90% of what you'll want to do with it:</p>

<dl>

<dt><code><a href= "#XML_ParserCreate"
>XML_ParserCreate</a></code></dt>
<dd>Create a new parser object.</dd>

<dt><code><a href= "#XML_SetElementHandler"
>XML_SetElementHandler</a></code></dt>
<dd>Set handlers for start and end tags.</dd>

<dt><code><a href= "#XML_SetCharacterDataHandler"
>XML_SetCharacterDataHandler</a></code></dt>
<dd>Set handler for text.</dd>

<dt><code><a href= "#XML_Parse"
>XML_Parse</a></code></dt>
<dd>Pass a buffer containing all or part of the document to the parser.</dd>
</dl>

<p>These functions and others are described in the <a
href="#reference">reference</a> part of this document. The reference
section also describes in detail the parameters passed to the
different types of handlers.</p>

<p>Let's look at a very simple example program that uses only three of
the above functions (it doesn't need to set a character data handler).
The program <a href="../examples/outline.c">outline.c</a> prints an
element outline, indenting child elements to distinguish them from the
parent element that contains them. The start handler does all the
work. It prints two indenting spaces for every level of ancestor
elements, then it prints the element and attribute
information. Finally it increments the global <code>Depth</code>
variable.</p>

<pre class="eg">
int Depth;

void
start(void *data, const char *el, const char **attr) {
  int i;

  for (i = 0; i &lt; Depth; i++)
    printf("  ");

  printf("%s", el);

  for (i = 0; attr[i]; i += 2) {
    printf(" %s='%s'", attr[i], attr[i + 1]);
  }

  printf("\n");
  Depth++;
}  /* End of start handler */
</pre>

<p>The end handler simply does the bookkeeping work of decrementing
<code>Depth</code>.</p>
<pre class="eg">
void
end(void *data, const char *el) {
  Depth--;
}  /* End of end handler */
</pre>

<p>After creating the parser, the main program just has the job of
shoveling the document to the parser so that it can do its work.</p>

<hr />
<h2><a name="building">Building and Installing Expat</a></h2>

<p>The Expat distribution comes as a compressed (with GNU gzip) tar
file. You may download the latest version from <a href=
"http://sourceforge.net/projects/expat/" >SourceForge</a>. After
unpacking this, cd into the directory. Then follow either the Win32
directions or Unix directions below.</p>

<h3>Building under Win32</h3>

<p>If you're using the GNU compiler under Cygwin, follow the Unix
directions in the next section. Otherwise if you have Microsoft's
Developer Studio installed, then from Windows Explorer double-click on
"expat.dsp" in the lib directory and build and install in the usual
manner.</p>

<p>Alternatively, you may download the Win32 binary package that
contains the "expat.h" include file and a pre-built DLL.</p>

<h3>Building under Unix (or GNU)</h3>

<p>First you'll need to run the configure shell script in order to
configure the Makefiles and headers for your system.</p>

<p>If you're happy with all the defaults that configure picks for you,
and you have permission on your system to install into /usr/local, you
can install Expat with this sequence of commands:</p>

<pre class="eg">
./configure
make
make install
</pre>

<p>There are some options that you can provide to this script, but the
only one we'll mention here is the <code>--prefix</code> option. You
can find out all the options available by running configure with just
the <code>--help</code> option.</p>

<p>By default, the configure script sets things up so that the library
gets installed in <code>/usr/local/lib</code> and the associated
header file in <code>/usr/local/include</code>. But if you were to
give the option, <code>--prefix=/home/me/mystuff</code>, then the
library and header would get installed in
<code>/home/me/mystuff/lib</code> and
<code>/home/me/mystuff/include</code> respectively.</p>

<hr />
<h2><a name="using">Using Expat</a></h2>

<h3>Compiling and Linking Against Expat</h3>

<p>Unless you installed Expat in a location not expected by your
compiler and linker, all you have to do to use Expat in your programs
is to include the Expat header (<code>#include &lt;expat.h&gt;</code>)
in your files that make calls to it and to tell the linker that it
needs to link against the Expat library. On Unix systems, this would
usually be done with the <code>-lexpat</code> argument. Otherwise,
you'll need to tell the compiler where to look for the Expat header
and the linker where to find the Expat library. You may also need to
take steps to tell the operating system where to find this library at
run time.</p>

<p>On a Unix-based system, here's what a Makefile might look like when
Expat is installed in a standard location:</p>

<pre class="eg">
CC=cc
LDFLAGS=
LIBS= -lexpat
xmlapp: xmlapp.o
	$(CC) $(LDFLAGS) -o xmlapp xmlapp.o $(LIBS)
</pre>

<p>If you installed Expat in, say, <code>/home/me/mystuff</code>, then
the Makefile would look like this:</p>

<pre class="eg">
CC=cc
CFLAGS= -I/home/me/mystuff/include
LDFLAGS=
LIBS= -L/home/me/mystuff/lib -lexpat
xmlapp: xmlapp.o
	$(CC) $(LDFLAGS) -o xmlapp xmlapp.o $(LIBS)
</pre>

<p>You'd also have to set the environment variable
<code>LD_LIBRARY_PATH</code> to <code>/home/me/mystuff/lib</code> (or
to <code>${LD_LIBRARY_PATH}:/home/me/mystuff/lib</code> if
LD_LIBRARY_PATH already has some directories in it) in order to run
your application.</p>

<h3>Expat Basics</h3>

<p>As we saw in the example in the overview, the first step in parsing
an XML document with Expat is to create a parser object. There are <a
href="#creation">four functions</a> in the Expat API for creating a
parser object. However, only three of these (<code><a href=
"#XML_ParserCreate" >XML_ParserCreate</a></code>, <code><a href=
"#XML_ParserCreateNS" >XML_ParserCreateNS</a></code>, and <code><a href=
"#XML_ParserCreate_MM" >XML_ParserCreate_MM</a></code>) can be used for
constructing a parser for a top-level document; the fourth, <code><a
href="#XML_ExternalEntityParserCreate"
>XML_ExternalEntityParserCreate</a></code>, creates a parser for an
external entity. The object returned by these functions is an opaque
pointer (of type <code>XML_Parser</code>) to data with further internal
structure. In order to free the memory associated with this object you
must call <code><a href= "#XML_ParserFree" >XML_ParserFree</a></code>.
Note that if you have provided any <a href="#userdata">user data</a>
that gets stored in the parser, then your application is responsible
for freeing it prior to calling <code>XML_ParserFree</code>.</p>

<p>The objects returned by the parser creation functions are good for
parsing only one XML document or external parsed entity, unless they
are cleaned up for reuse with <code><a href="#XML_ParserReset"
>XML_ParserReset</a></code>. If your application needs to parse many
XML documents, then it needs a parser object for each one (or a reset
between documents). The best way to deal with this is to create a
higher-level object that contains all the default initialization you
want for your parser objects.</p>

<p>Walking through a document hierarchy with a stream-oriented parser
requires a good stack mechanism in order to keep track of the current
context. For instance, answering the simple question "What element
does this text belong to?" requires a stack, since the parser may have
descended into other elements that are children of the current one and
encountered this text on the way out.</p>

<p>The things you're likely to want to keep on a stack are the
currently opened element and its attributes. You push this
information onto the stack in the start handler and you pop it off in
the end handler.</p>
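<p>One way to keep such a stack, sketched under the assumption of a fixed maximum depth: the handlers below push a copy of each element name on start and pop it on end, so the innermost open element (the owner of any text seen at that moment) is always on top. The names are copied because the strings passed to a handler should be treated as valid only for the duration of the call. <code>ElementStack</code>, <code>stack_start</code>, <code>stack_end</code>, <code>stack_current</code>, and <code>MAX_DEPTH</code> are hypothetical; like the outline example, the handlers use plain <code>char</code> strings.</p>

```c
#include <stdlib.h>
#include <string.h>

#define MAX_DEPTH 64  /* illustrative limit; a real stack might grow */

typedef struct {
  char *names[MAX_DEPTH];  /* copies of open element names, root first */
  int top;                 /* current depth (number of open elements) */
} ElementStack;

/* Start handler: push a copy of the element name. */
static void
stack_start(void *data, const char *el, const char **attr)
{
  ElementStack *st = data;
  size_t n = strlen(el) + 1;
  (void)attr;
  if (st->top < MAX_DEPTH) {
    st->names[st->top] = malloc(n);
    if (st->names[st->top] != NULL)
      memcpy(st->names[st->top], el, n);
  }
  st->top++;  /* count depth even past the limit so pushes and pops pair up */
}

/* End handler: pop the element that is closing. */
static void
stack_end(void *data, const char *el)
{
  ElementStack *st = data;
  (void)el;
  st->top--;
  if (st->top < MAX_DEPTH) {
    free(st->names[st->top]);
    st->names[st->top] = NULL;
  }
}

/* Innermost open element, i.e. the element any current text belongs to.
   Returns NULL outside the root or beyond the tracked depth. */
static const char *
stack_current(const ElementStack *st)
{
  if (st->top == 0 || st->top > MAX_DEPTH)
    return NULL;
  return st->names[st->top - 1];
}
```

<p>To use it with a parser, pass a zero-initialized <code>ElementStack</code> to <code>XML_SetUserData</code> and register the two handlers with <code>XML_SetElementHandler</code> (with the default <code>char</code>-based <code>XML_Char</code>); a character data handler can then call <code>stack_current</code> to find the enclosing element.</p>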

<p>For some tasks, it is sufficient to just keep information on what
the depth of the stack is (or would be, if you had one). The outline
program shown above presents one example. Another such task would be
skipping over a complete element. When you see the start tag for the
element you want to skip, you set a skip flag and record the depth at
which the element started. When the end tag handler encounters the
same depth, the skipped element has ended and the flag may be
cleared. If you follow the convention that the root element starts at
depth 1, then you can use the same variable for both the skip flag and
the skip depth, since a value of 0 can then mean "not skipping".</p>

<pre class="eg">
typedef struct {
  int skip;   /* depth of the element being skipped; 0 = not skipping */
  int depth;  /* depth of the next element to start; the root is 1 */
  /* Other application state here */
} Parseinfo;

/* Supplied by the application: */
int  should_skip(Parseinfo *info, const char *el, const char **attr);
void start(void *data, const char *el, const char **attr);
void end(void *data, const char *el);

void
init_info(Parseinfo *info) {
  info->skip = 0;
  info->depth = 1;
  /* Other initializations here */
}  /* End of init_info */

void
rawstart(void *data, const char *el, const char **attr) {
  Parseinfo *inf = (Parseinfo *) data;

  if (! inf->skip) {
    if (should_skip(inf, el, attr)) {
      inf->skip = inf->depth;
    }
    else
      start(inf, el, attr);     /* This does rest of start handling */
  }

  inf->depth++;
}  /* End of rawstart */

void
rawend(void *data, const char *el) {
  Parseinfo *inf = (Parseinfo *) data;

  inf->depth--;

  if (! inf->skip)
    end(inf, el);              /* This does rest of end handling */

  if (inf->skip == inf->depth)
    inf->skip = 0;
}  /* End rawend */
</pre>

<p>Notice in the above example the difference in how depth is
manipulated in the start and end handlers. The end tag handler should
be the mirror image of the start tag handler. This is necessary to
properly model containment. Since the start tag handler increments
depth <em>after</em> the main body of start tag code, the end handler
must decrement it <em>before</em> its main body. If we'd decided to
increment it first thing in the start handler, then we'd have had to
decrement it last thing in the end handler.</p>

<h3 id="userdata">Communicating between handlers</h3>

<p>In order to be able to pass information between different handlers
without using globals, you'll need to define a data structure to hold
the shared variables. You can then tell Expat (with the <code><a href=
"#XML_SetUserData" >XML_SetUserData</a></code> function) to pass a
pointer to this structure to the handlers. This pointer is the first
argument received by most handlers.</p>

<h3>XML Version</h3>

<p>Expat is an XML 1.0 parser, and as such never complains based on
the value of the <code>version</code> pseudo-attribute in the XML
declaration, if present.</p>

<p>If an application needs to check the version number (to support
alternate processing), it should use the <code><a href=
"#XML_SetXmlDeclHandler" >XML_SetXmlDeclHandler</a></code> function to
set a handler that uses the information in the XML declaration to
determine what to do. This example shows how to check that only a
version number of <code>"1.0"</code> is accepted:</p>

<pre class="eg">
static int wrong_version;
static XML_Parser parser;

static void
xmldecl_handler(void            *userData,
                const XML_Char  *version,
                const XML_Char  *encoding,
                int              standalone)
{
  static const XML_Char Version_1_0[] = {'1', '.', '0', 0};

  size_t i;

  if (version == NULL)   /* a text declaration carries no version */
    return;

  for (i = 0; i &lt; (sizeof(Version_1_0) / sizeof(Version_1_0[0])); ++i) {
    if (version[i] != Version_1_0[i]) {
      wrong_version = 1;
      /* also clear all other handlers: */
      XML_SetCharacterDataHandler(parser, NULL);
      /* ... */
      return;
    }
  }
  /* ... */
}
</pre>

<h3>Namespace Processing</h3>

<p>When the parser is created using the <code><a href=
"#XML_ParserCreateNS" >XML_ParserCreateNS</a></code> function, Expat
performs namespace processing. Under namespace processing, Expat
consumes <code>xmlns</code> and <code>xmlns:...</code> attributes,
which declare namespaces for the scope of the element in which they
occur. This means that your start handler will not see these
attributes. Your application can still be informed of these
declarations by setting namespace declaration handlers with <a href=
"#XML_SetNamespaceDeclHandler"
><code>XML_SetNamespaceDeclHandler</code></a>.</p>

<p>Element type and attribute names that belong to a given namespace
are passed to the appropriate handler in expanded form. By default
this expanded form is a concatenation of the namespace URI, the
separator character (which is the 2nd argument to <code><a href=
"#XML_ParserCreateNS" >XML_ParserCreateNS</a></code>), and the local
name (i.e. the part after the colon). Names with undeclared prefixes
are passed through to the handlers unchanged, with the prefix and
colon still attached. Unprefixed attribute names are never expanded,
and unprefixed element names are only expanded when they are in the
scope of a default namespace.</p>

<p>However, if <code><a href= "#XML_SetReturnNSTriplet"
>XML_SetReturnNSTriplet</a></code> has been called with a non-zero
<code>do_nst</code> parameter, then the expanded form for names with
an explicit prefix is a concatenation of: URI, separator, local name,
separator, prefix.</p>
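<p>Since the expanded name arrives as a single string, a handler that needs its parts has to split it at the separator. The helper below is a hypothetical sketch assuming a <code>char</code>-based <code>XML_Char</code> and a separator that never occurs in a namespace URI. Because handlers receive <code>const</code> strings, it is meant to be applied to a copy, which it modifies in place. A name with no separator (for example an unprefixed attribute) was not expanded and is returned whole as the local part.</p>

```c
#include <string.h>

/* Pieces of an expanded name; each points into the caller's buffer. */
typedef struct {
  const char *uri;     /* NULL if the name was not expanded */
  const char *local;
  const char *prefix;  /* NULL unless triplets were requested */
} NameParts;

/* Split a writable copy of an expanded name in place, overwriting each
   separator with '\0'. `sep` must be the separator given to the parser. */
static NameParts
split_expanded_name(char *name, char sep)
{
  NameParts parts = {NULL, name, NULL};
  char *first = strchr(name, sep);
  char *second;

  if (first == NULL)
    return parts;           /* not expanded: the whole string is the name */
  *first = '\0';
  parts.uri = name;
  parts.local = first + 1;

  second = strchr(parts.local, sep);
  if (second != NULL) {     /* triplet form: URI, sep, local, sep, prefix */
    *second = '\0';
    parts.prefix = second + 1;
  }
  return parts;
}
```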

<p>You can set handlers for the start of a namespace declaration and
for the end of a scope of a declaration with the <code><a href=
"#XML_SetNamespaceDeclHandler" >XML_SetNamespaceDeclHandler</a></code>
function. The StartNamespaceDeclHandler is called prior to the start
tag handler, and the EndNamespaceDeclHandler is called after the end
tag handler for the element that ends the namespace's scope. The
namespace start handler gets passed the prefix and URI for the
namespace. For a default namespace declaration (xmlns='...'), the
prefix will be null. The URI will be null for the case where the
default namespace is being unset. The namespace end handler just gets
the prefix for the closing scope.</p>

<p>These handlers are called for each declaration. So if, for
instance, a start tag had three namespace declarations, then the
StartNamespaceDeclHandler would be called three times before the start
tag handler is called, once for each declaration.</p>
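<p>Because each end-of-scope call names the prefix whose declaration is going out of scope, a pair of handlers can maintain the set of bindings currently in effect. The following is a sketch with hypothetical names (<code>NsScope</code>, <code>ns_start</code>, <code>ns_end</code>, <code>ns_lookup</code>) and a fixed capacity; the parameters follow the start (user data, prefix, URI) and end (user data, prefix) handler arguments described above, assuming a <code>char</code>-based <code>XML_Char</code>. A null prefix stands for the default namespace.</p>

```c
#include <stdlib.h>
#include <string.h>

#define MAX_BINDINGS 32  /* assumes no more declarations in scope at once */

typedef struct {
  char *prefix[MAX_BINDINGS];  /* NULL entry = default namespace */
  char *uri[MAX_BINDINGS];     /* NULL entry = default namespace unset */
  int count;
} NsScope;

static char *
dup_or_null(const char *s)
{
  char *d;
  if (s == NULL)
    return NULL;
  d = malloc(strlen(s) + 1);
  if (d != NULL)
    strcpy(d, s);
  return d;
}

/* Prefixes match if both are NULL (default) or both are equal strings. */
static int
same_prefix(const char *a, const char *b)
{
  return (a == NULL || b == NULL) ? a == b : strcmp(a, b) == 0;
}

/* Start-of-scope handler: record the new binding. */
static void
ns_start(void *data, const char *prefix, const char *uri)
{
  NsScope *s = data;
  if (s->count < MAX_BINDINGS) {
    s->prefix[s->count] = dup_or_null(prefix);
    s->uri[s->count] = dup_or_null(uri);
    s->count++;
  }
}

/* End-of-scope handler: drop the most recent binding for this prefix. */
static void
ns_end(void *data, const char *prefix)
{
  NsScope *s = data;
  int i;
  for (i = s->count - 1; i >= 0; i--) {
    if (same_prefix(s->prefix[i], prefix)) {
      size_t tail = (size_t)(s->count - i - 1);
      free(s->prefix[i]);
      free(s->uri[i]);
      memmove(&s->prefix[i], &s->prefix[i + 1], tail * sizeof(char *));
      memmove(&s->uri[i], &s->uri[i + 1], tail * sizeof(char *));
      s->count--;
      return;
    }
  }
}

/* URI currently bound to `prefix`, or NULL if none is in scope. */
static const char *
ns_lookup(const NsScope *s, const char *prefix)
{
  int i;
  for (i = s->count - 1; i >= 0; i--)
    if (same_prefix(s->prefix[i], prefix))
      return s->uri[i];
  return NULL;
}
```

<p>The two handlers would be registered with <code>XML_SetNamespaceDeclHandler</code>, with an <code>NsScope</code> passed via <code>XML_SetUserData</code>. A production version would grow its arrays rather than silently ignoring declarations beyond <code>MAX_BINDINGS</code>.</p>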

<h3>Character Encodings</h3>

<p>While XML is based on Unicode, and every XML processor is required
to recognize UTF-8 and UTF-16 (the 8-bit and 16-bit encoding forms of
Unicode), other encodings may be declared in XML documents or entities.
For the main document, an XML declaration may contain an encoding
declaration:</p>
<pre>
&lt;?xml version="1.0" encoding="ISO-8859-2"?&gt;
</pre>

<p>External parsed entities may begin with a text declaration, which
looks like an XML declaration with just an encoding declaration:</p>
<pre>
&lt;?xml encoding="Big5"?&gt;
</pre>

<p>With Expat, you may also specify an encoding at the time of
creating a parser. This is useful when the encoding information may
come from a source outside the document itself (such as a higher-level
protocol).</p>

<p><a name="builtin_encodings"></a>There are four built-in encodings
in Expat:</p>
<ul>
<li>UTF-8</li>
<li>UTF-16</li>
<li>ISO-8859-1</li>
<li>US-ASCII</li>
</ul>

<p>Anything else discovered in an encoding declaration or in the
protocol encoding specified in the parser constructor triggers a call
to the <code>UnknownEncodingHandler</code>. This handler gets passed
the encoding name and a pointer to an <code>XML_Encoding</code> data
structure. Your handler must fill in this structure and return 1 if it
knows how to deal with the encoding. Otherwise the handler should
return 0. The handler also gets passed a pointer to an optional
application data structure that you may indicate when you set the
handler.</p>

<p>Expat places the following restrictions on the character encodings
that it can support by filling in the <code>XML_Encoding</code>
structure:</p>
<ol>
<li>Every ASCII character that can appear in a well-formed XML document,
other than the characters <code>$@\^`{}~</code>, must be represented by
a single byte, and that byte must correspond to its ASCII encoding.</li>
<li>Characters must be encoded in 4 bytes or less.</li>
<li>All characters encoded must have Unicode scalar values less than or
equal to 65535 (0xFFFF). <em>This does not apply to the built-in support
for UTF-16 and UTF-8.</em></li>
<li>No character may be encoded by more than one distinct sequence of
bytes.</li>
</ol>

<p><code>XML_Encoding</code> contains an array of integers that
correspond to the first byte of an encoding sequence. If the value in
the array for a byte is zero or positive, then the byte is a single
byte encoding that encodes the Unicode scalar value contained in the
array. A -1 in this array indicates a malformed byte. If the value is
-2, -3, or -4, then the byte is the beginning of a 2, 3, or 4 byte
sequence respectively. Multi-byte sequences are sent to the convert
function pointed at in the <code>XML_Encoding</code> structure. This
function should return the Unicode scalar value for the sequence or -1
if the sequence is malformed.</p>

<p>One pitfall that novice Expat users are likely to fall into is that
although Expat may accept input in various encodings, the strings that
it passes to the handlers are always encoded in UTF-8 or UTF-16
(depending on how Expat was compiled). Your application is responsible
for any translation of these strings into other encodings.</p>
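<p>As an illustration of such a translation, the sketch below converts text received by a handler into ISO-8859-1, assuming Expat was built for UTF-8 output (the default). It relies on the fact that code points U+0000 to U+00FF are exactly the one-byte UTF-8 sequences plus the two-byte sequences starting with 0xC2 or 0xC3; any other lead byte means a character that Latin-1 cannot represent. The function name is hypothetical, and it takes an explicit length because character data passed to a handler is not NUL-terminated.</p>

```c
#include <stddef.h>

/* Convert `len` bytes of well-formed UTF-8 (as handed to an Expat handler)
   to ISO-8859-1 in `out`, which must hold at least `len` bytes.
   Returns the number of bytes written, or -1 if some character is above
   U+00FF (or a sequence is cut off at the end of the input). */
static int
utf8_to_latin1(const char *s, int len, char *out)
{
  const unsigned char *in = (const unsigned char *)s;
  int i = 0, n = 0;

  while (i < len) {
    unsigned char b = in[i];
    if (b < 0x80) {                           /* U+0000..U+007F: 1 byte */
      out[n++] = (char)b;
      i += 1;
    } else if ((b == 0xC2 || b == 0xC3) && i + 1 < len) {
      /* U+0080..U+00FF: 110000xx 10xxxxxx */
      out[n++] = (char)(((b & 0x1F) << 6) | (in[i + 1] & 0x3F));
      i += 2;
    } else {
      return -1;                              /* not representable */
    }
  }
  return n;
}
```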

<h3>Handling External Entity References</h3>

<p>Expat does not read or parse external entities directly. Note that
any external DTD is a special case of an external entity. If you've
set no <code>ExternalEntityRefHandler</code>, then external entity
references are silently ignored. Otherwise, Expat calls your handler
with the information needed to read and parse the external entity.</p>

<p>Your handler isn't actually responsible for parsing the entity, but
it is responsible for creating a subsidiary parser with <code><a href=
"#XML_ExternalEntityParserCreate"
>XML_ExternalEntityParserCreate</a></code> that will do the job. This
returns an instance of <code>XML_Parser</code> that has handlers and
other data structures initialized from the parent parser. You may then
use <code><a href= "#XML_Parse" >XML_Parse</a></code> or <code><a
href= "#XML_ParseBuffer">XML_ParseBuffer</a></code> calls against this
parser. Since external entities may refer to other external entities,
your handler should be prepared to be called recursively.</p>

<h3>Parsing DTDs</h3>

<p>In order to parse parameter entities, before starting the parse,
you must call <code><a href= "#XML_SetParamEntityParsing"
>XML_SetParamEntityParsing</a></code> with one of the following
arguments:</p>
<dl>
<dt><code>XML_PARAM_ENTITY_PARSING_NEVER</code></dt>
<dd>Don't parse parameter entities or the external subset.</dd>
<dt><code>XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE</code></dt>
<dd>Parse parameter entities and the external subset unless
<code>standalone</code> was set to "yes" in the XML declaration.</dd>
<dt><code>XML_PARAM_ENTITY_PARSING_ALWAYS</code></dt>
<dd>Always parse parameter entities and the external subset.</dd>
</dl>

<p>In order to read an external DTD, you also have to set an external
entity reference handler as described above.</p>

<hr />
<!-- ================================================================ -->

<h2><a name="reference">Expat Reference</a></h2>

<h3><a name="creation">Parser Creation</a></h3>

<pre class="fcndec" id="XML_ParserCreate">
XML_Parser
XML_ParserCreate(const XML_Char *encoding);
</pre>
<div class="fcndef">
Construct a new parser. If encoding is non-null, it specifies a
character encoding to use for the document. This overrides the document
encoding declaration. There are four built-in encodings:
<ul>
<li>US-ASCII</li>
<li>UTF-8</li>
<li>UTF-16</li>
<li>ISO-8859-1</li>
</ul>
Any other value will invoke a call to the UnknownEncodingHandler.
</div>

<pre class="fcndec" id="XML_ParserCreateNS">
XML_Parser
XML_ParserCreateNS(const XML_Char *encoding,
                   XML_Char sep);
</pre>
<div class="fcndef">
Constructs a new parser that has namespace processing in effect. Namespace
expanded element names and attribute names are returned as a concatenation
of the namespace URI, <em>sep</em>, and the local part of the name. This
means that you should pick a character for <em>sep</em> that can't be
part of a legal URI.</div>

<pre class="fcndec" id="XML_ParserCreate_MM">
XML_Parser
XML_ParserCreate_MM(const XML_Char *encoding,
                    const XML_Memory_Handling_Suite *ms,
                    const XML_Char *sep);
</pre>
<pre class="signature">
typedef struct {
  void *(*malloc_fcn)(size_t size);
  void *(*realloc_fcn)(void *ptr, size_t size);
  void (*free_fcn)(void *ptr);
} XML_Memory_Handling_Suite;
</pre>
<div class="fcndef">
<p>Construct a new parser using the suite of memory handling functions
specified in <code>ms</code>. If <code>ms</code> is NULL, then use the
standard set of memory management functions. If <code>sep</code> is
non-NULL, then namespace processing is enabled in the created parser
and the character pointed at by sep is used as the separator between
the namespace URI and the local part of the name.</p>
</div>
|
<pre class="fcndec" id="XML_ExternalEntityParserCreate">
XML_Parser
XML_ExternalEntityParserCreate(XML_Parser p,
                               const XML_Char *context,
                               const XML_Char *encoding);
</pre>
<div class="fcndef">
Construct a new <code>XML_Parser</code> object for parsing an external
general entity. <code>context</code> is the context argument passed in a
call to an <a href="#XML_SetExternalEntityRefHandler"
>ExternalEntityRefHandler</a>. Other state information, such as handlers,
user data, and namespace processing, is inherited from the parser passed
as the first argument, so you shouldn't need to call any of the
behavior-changing functions on this parser (unless you want it to act
differently than the parent parser).
</div>
|
<pre class="fcndec" id="XML_ParserFree">
void
XML_ParserFree(XML_Parser p);
</pre>
<div class="fcndef">
Free memory used by the parser. Your application is responsible for
freeing any memory associated with <a href="#userdata">user data</a>.
</div>
|
<pre class="fcndec" id="XML_ParserReset">
XML_Bool
XML_ParserReset(XML_Parser p);
</pre>
<div class="fcndef">
Clean up the memory structures maintained by the parser so that it may
be used again. After this has been called, <code>p</code> is ready to
start parsing a new document. This function may not be used on a parser
created using <code><a href= "#XML_ExternalEntityParserCreate"
>XML_ExternalEntityParserCreate</a></code>; it will return
<code>XML_FALSE</code> in that case. Returns <code>XML_TRUE</code> on
success. Your application is responsible for dealing with any memory
associated with <a href="#userdata">user data</a>.
</div>
|
<h3><a name="parsing">Parsing</a></h3>

<p>To state the obvious: the three parsing functions <code><a href=
"#XML_Parse" >XML_Parse</a></code>, <code><a href= "#XML_ParseBuffer"
>XML_ParseBuffer</a></code> and <code><a href= "#XML_GetBuffer"
>XML_GetBuffer</a></code> must not be called from within a handler
unless they operate on a separate parser instance, that is, one that did
not call the handler. For example, it is OK to call the parsing functions
from within an <code>XML_ExternalEntityRefHandler</code> if they apply to
the parser created by <code><a href= "#XML_ExternalEntityParserCreate"
>XML_ExternalEntityParserCreate</a></code>.</p>
|
<pre class="fcndec" id="XML_Parse">
enum XML_Status
XML_Parse(XML_Parser p,
          const char *s,
          int len,
          int isFinal);
</pre>
<pre class="signature">
enum XML_Status {
  XML_STATUS_ERROR = 0,
  XML_STATUS_OK = 1
};
</pre>
<div class="fcndef">
Parse some more of the document. The string <code>s</code> is a buffer
containing part (or perhaps all) of the document. The number of bytes of
<code>s</code> that are part of the document is indicated by
<code>len</code>. This means that <code>s</code> doesn't have to be null
terminated. It also means that if <code>len</code> is larger than the
number of bytes in the block of memory that <code>s</code> points at,
then a memory fault is likely. The <code>isFinal</code> parameter informs
the parser that this is the last piece of the document. Frequently, the
last piece is empty (i.e. <code>len</code> is zero).
If a parse error occurred, it returns <code>XML_STATUS_ERROR</code>.
Otherwise it returns <code>XML_STATUS_OK</code>.
</div>
|
<pre class="fcndec" id="XML_ParseBuffer">
enum XML_Status
XML_ParseBuffer(XML_Parser p,
                int len,
                int isFinal);
</pre>
<div class="fcndef">
This is just like <code><a href= "#XML_Parse" >XML_Parse</a></code>,
except in this case Expat provides the buffer. By obtaining the
buffer from Expat with the <code><a href= "#XML_GetBuffer"
>XML_GetBuffer</a></code> function, the application can avoid double
copying of the input. Here <code>len</code> is the number of bytes the
application actually placed in that buffer.
</div>
|
<pre class="fcndec" id="XML_GetBuffer">
void *
XML_GetBuffer(XML_Parser p,
              int len);
</pre>
<div class="fcndef">
Obtain a buffer of size <code>len</code> to read a piece of the document
into. A NULL value is returned if Expat can't allocate enough memory for
this buffer. This has to be called prior to every call to
<code><a href= "#XML_ParseBuffer" >XML_ParseBuffer</a></code>. A
typical use would look like this, where <code>docfd</code> is an open
file descriptor for the document and <code>BUFF_SIZE</code> is a buffer
size of your choosing:

<pre class="eg">
for (;;) {
  int bytes_read;
  void *buff = XML_GetBuffer(p, BUFF_SIZE);
  if (buff == NULL) {
    /* handle error: out of memory */
  }

  bytes_read = read(docfd, buff, BUFF_SIZE);
  if (bytes_read &lt; 0) {
    /* handle read error */
  }

  /* A zero-byte read means end of input, so it is also the final piece. */
  if (XML_ParseBuffer(p, bytes_read, bytes_read == 0) == XML_STATUS_ERROR) {
    /* handle parse error */
  }

  if (bytes_read == 0)
    break;
}
</pre>
</div>
|
<h3><a name="setting">Handler Setting</a></h3>

<p>Although handlers are typically set prior to parsing and left alone, an
application may choose to set or change the handler for a parsing event
while the parse is in progress. For instance, your application may choose
to ignore all text not descended from a <code>para</code> element. One
way it could do this is to set the character handler when a
<code>para</code> start tag is seen, and unset it for the corresponding
end tag.</p>
|
<p>A handler may be <em>unset</em> by providing a NULL pointer to the
appropriate handler setter. None of the handler setting functions have
a return value.</p>

<p>Your handlers will receive strings in arrays of type
<code>XML_Char</code>. In a default build this type is defined in
expat.h as <code>char</code>, so each string is a <code>char *</code>
containing UTF-8 encoded bytes. Note that you'll receive them in this
form independent of the original encoding of the document. (If Expat is
compiled with <code>XML_UNICODE</code> defined, <code>XML_Char</code> is
a wide character type and strings are UTF-16 instead.)</p>
|
<div class="handler">
<pre class="setter" id="XML_SetStartElementHandler">
XML_SetStartElementHandler(XML_Parser p,
                           XML_StartElementHandler start);
</pre>
<pre class="signature">
typedef void
(*XML_StartElementHandler)(void *userData,
                           const XML_Char *name,
                           const XML_Char **atts);
</pre>
<p>Set handler for start (and empty) tags. Attributes are passed to the start
handler as a pointer to a vector of <code>XML_Char</code> pointers. Each
attribute seen in a start (or empty) tag occupies 2 consecutive places in
this vector: the attribute name followed by the attribute value. These
pairs are terminated by a null pointer.</p>
<p>Note that an empty tag generates a call to both start and end handlers
(in that order).</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetEndElementHandler">
XML_SetEndElementHandler(XML_Parser p,
                         XML_EndElementHandler end);
</pre>
<pre class="signature">
typedef void
(*XML_EndElementHandler)(void *userData,
                         const XML_Char *name);
</pre>
<p>Set handler for end (and empty) tags. As noted above, an empty tag
generates a call to both start and end handlers.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetElementHandler">
XML_SetElementHandler(XML_Parser p,
                      XML_StartElementHandler start,
                      XML_EndElementHandler end);
</pre>
<p>Set handlers for start and end tags with one call.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetCharacterDataHandler">
XML_SetCharacterDataHandler(XML_Parser p,
                            XML_CharacterDataHandler charhndl);
</pre>
<pre class="signature">
typedef void
(*XML_CharacterDataHandler)(void *userData,
                            const XML_Char *s,
                            int len);
</pre>
<p>Set a text handler. The string your handler receives
is <em>NOT nul-terminated</em>. You have to use the length argument
to deal with the end of the string. A single block of contiguous text
free of markup may still result in a sequence of calls to this handler.
In other words, if you're searching for a pattern in the text, it may
be split across calls to this handler.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetProcessingInstructionHandler">
XML_SetProcessingInstructionHandler(XML_Parser p,
                                    XML_ProcessingInstructionHandler proc);
</pre>
<pre class="signature">
typedef void
(*XML_ProcessingInstructionHandler)(void *userData,
                                    const XML_Char *target,
                                    const XML_Char *data);
</pre>
<p>Set a handler for processing instructions. The target is the first word
in the processing instruction. The data is the rest of the characters in
it after skipping all whitespace after the initial word.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetCommentHandler">
XML_SetCommentHandler(XML_Parser p,
                      XML_CommentHandler cmnt);
</pre>
<pre class="signature">
typedef void
(*XML_CommentHandler)(void *userData,
                      const XML_Char *data);
</pre>
<p>Set a handler for comments. The data is all text inside the comment
delimiters.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetStartCdataSectionHandler">
XML_SetStartCdataSectionHandler(XML_Parser p,
                                XML_StartCdataSectionHandler start);
</pre>
<pre class="signature">
typedef void
(*XML_StartCdataSectionHandler)(void *userData);
</pre>
<p>Set a handler that gets called at the beginning of a CDATA section.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetEndCdataSectionHandler">
XML_SetEndCdataSectionHandler(XML_Parser p,
                              XML_EndCdataSectionHandler end);
</pre>
<pre class="signature">
typedef void
(*XML_EndCdataSectionHandler)(void *userData);
</pre>
<p>Set a handler that gets called at the end of a CDATA section.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetCdataSectionHandler">
XML_SetCdataSectionHandler(XML_Parser p,
                           XML_StartCdataSectionHandler start,
                           XML_EndCdataSectionHandler end);
</pre>
<p>Sets both CDATA section handlers with one call.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetDefaultHandler">
XML_SetDefaultHandler(XML_Parser p,
                      XML_DefaultHandler hndl);
</pre>
<pre class="signature">
typedef void
(*XML_DefaultHandler)(void *userData,
                      const XML_Char *s,
                      int len);
</pre>

<p>Sets a handler for any characters in the document which wouldn't
otherwise be handled. This includes both data for which no handlers
can be set (like some kinds of DTD declarations) and data which could
be reported but which currently has no handler set. The characters
are passed exactly as they were present in the XML document except
that they will be encoded in UTF-8 or UTF-16, matching the build's
<code>XML_Char</code> type. As with character data, the string is not
nul-terminated, so use <code>len</code>. Line boundaries are not
normalized. Note that a byte order mark character is not passed to the
default handler. There are no guarantees about how characters are
divided between calls to the default handler: for example, a comment
might be split between multiple calls. Setting the handler with
this call has the side effect of turning off expansion of references
to internally defined general entities. Instead these references are
passed to the default handler.</p>

<p>See also <code><a
href="#XML_DefaultCurrent">XML_DefaultCurrent</a></code>.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetDefaultHandlerExpand">
XML_SetDefaultHandlerExpand(XML_Parser p,
                            XML_DefaultHandler hndl);
</pre>
<pre class="signature">
typedef void
(*XML_DefaultHandler)(void *userData,
                      const XML_Char *s,
                      int len);
</pre>
<p>This sets a default handler, but doesn't inhibit the expansion of
internal entity references. The entity reference will not be passed
to the default handler.</p>

<p>See also <code><a
href="#XML_DefaultCurrent">XML_DefaultCurrent</a></code>.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetExternalEntityRefHandler">
XML_SetExternalEntityRefHandler(XML_Parser p,
                                XML_ExternalEntityRefHandler hndl);
</pre>
<pre class="signature">
typedef int
(*XML_ExternalEntityRefHandler)(XML_Parser p,
                                const XML_Char *context,
                                const XML_Char *base,
                                const XML_Char *systemId,
                                const XML_Char *publicId);
</pre>
<p>Set an external entity reference handler. This handler is also
called for processing an external DTD subset if parameter entity parsing
is in effect. (See <a href="#XML_SetParamEntityParsing">
<code>XML_SetParamEntityParsing</code></a>.)</p>

<p>The <code>base</code> parameter is the base to use for relative
system identifiers. It is set by <a href="#XML_SetBase">XML_SetBase</a>
and may be NULL. The <code>publicId</code> parameter is the public id
given in the entity declaration and may be NULL. The
<code>systemId</code> parameter is the system identifier specified in
the entity declaration and is never NULL.</p>

<p>There are a couple of ways in which this handler differs from others.
First, this handler returns an integer. A non-zero value should be
returned for successful handling of the external entity reference.
Returning zero indicates failure, and causes the calling parser to
return an <code>XML_ERROR_EXTERNAL_ENTITY_HANDLING</code> error.</p>

<p>Second, instead of having <code>userData</code> as its first
argument, it receives the parser that encountered the entity reference.
This, along with the <code>context</code> parameter, may be used as
arguments to a call to <a href="#XML_ExternalEntityParserCreate"
>XML_ExternalEntityParserCreate</a>. Using the returned parser, the body
of the external entity can be recursively parsed.</p>

<p>Since this handler may be called recursively, it should not be saving
information into global or static variables.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetSkippedEntityHandler">
XML_SetSkippedEntityHandler(XML_Parser p,
                            XML_SkippedEntityHandler handler);
</pre>
<pre class="signature">
typedef void
(*XML_SkippedEntityHandler)(void *userData,
                            const XML_Char *entityName,
                            int is_parameter_entity);
</pre>
<p>Set a skipped entity handler. This is called in two situations:</p>
<ol>
<li>An entity reference is encountered for which no declaration
has been read <em>and</em> this is not an error.</li>
<li>An internal entity reference is read, but not expanded, because
<a href="#XML_SetDefaultHandler"><code>XML_SetDefaultHandler</code></a>
has been called.</li>
</ol>
<p>The <code>is_parameter_entity</code> argument will be non-zero for
a parameter entity and zero for a general entity.</p>
<p>Note: skipped parameter entities in declarations and skipped general
entities in attribute values cannot be reported, because the event would
be out of sync with the reporting of the declarations or attribute
values.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetUnknownEncodingHandler">
XML_SetUnknownEncodingHandler(XML_Parser p,
                              XML_UnknownEncodingHandler enchandler,
                              void *encodingHandlerData);
</pre>
<pre class="signature">
typedef int
(*XML_UnknownEncodingHandler)(void *encodingHandlerData,
                              const XML_Char *name,
                              XML_Encoding *info);

typedef struct {
  int map[256];
  void *data;
  int (*convert)(void *data, const char *s);
  void (*release)(void *data);
} XML_Encoding;
</pre>
<p>Set a handler to deal with encodings other than the
<a href="#builtin_encodings">built in set</a>. This should be done before
<code><a href= "#XML_Parse" >XML_Parse</a></code> or <code><a href=
"#XML_ParseBuffer" >XML_ParseBuffer</a></code> have been called on the
given parser.</p>
<p>If the handler knows how to deal with an encoding with the given
name, it should fill in the <code>info</code> data structure and return
1. Otherwise it should return 0. The handler will be called at most
once per parsed (external) entity. The optional application data
pointer <code>encodingHandlerData</code> will be passed back to the
handler.</p>

<p>The <code>map</code> array contains information for every possible
leading byte in a byte sequence. If the corresponding value is &gt;= 0,
then it's a single byte sequence and the byte encodes that Unicode
value. If the value is -1, then that byte is invalid as the initial byte
in a sequence. If the value is -<em>n</em>, where <em>n</em> is an
integer &gt; 1, then <em>n</em> is the number of bytes in the sequence
and the actual conversion is accomplished by a call to the function
pointed at by <code>convert</code>. This function returns the Unicode
value encoded by the sequence, or -1 if the sequence itself is invalid.
The <code>convert</code> pointer may be NULL if there are only single
byte codes. The <code>data</code> parameter passed to the
<code>convert</code> function is the <code>data</code> pointer from
<code>XML_Encoding</code>. The string <code>s</code> is <em>NOT</em>
nul-terminated and points at the sequence of bytes to be converted.</p>

<p>The function pointed at by <code>release</code> is called by the
parser when it is finished with the encoding. It may be NULL.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetStartNamespaceDeclHandler">
XML_SetStartNamespaceDeclHandler(XML_Parser p,
                                 XML_StartNamespaceDeclHandler start);
</pre>
<pre class="signature">
typedef void
(*XML_StartNamespaceDeclHandler)(void *userData,
                                 const XML_Char *prefix,
                                 const XML_Char *uri);
</pre>
<p>Set a handler to be called when a namespace is declared. Namespace
declarations occur inside start tags, but the namespace declaration
start handler is called, once for each namespace declared in a start
tag, before the start tag handler for that tag. For a default namespace
declaration (<code>xmlns="..."</code>), <code>prefix</code> is NULL.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetEndNamespaceDeclHandler">
XML_SetEndNamespaceDeclHandler(XML_Parser p,
                               XML_EndNamespaceDeclHandler end);
</pre>
<pre class="signature">
typedef void
(*XML_EndNamespaceDeclHandler)(void *userData,
                               const XML_Char *prefix);
</pre>
<p>Set a handler to be called when leaving the scope of a namespace
declaration. This will be called, for each namespace declaration,
after the handler for the end tag of the element in which the
namespace was declared.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetNamespaceDeclHandler">
XML_SetNamespaceDeclHandler(XML_Parser p,
                            XML_StartNamespaceDeclHandler start,
                            XML_EndNamespaceDeclHandler end);
</pre>
<p>Sets both namespace declaration handlers with a single call.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetXmlDeclHandler">
XML_SetXmlDeclHandler(XML_Parser p,
                      XML_XmlDeclHandler xmldecl);
</pre>
<pre class="signature">
typedef void
(*XML_XmlDeclHandler)(void *userData,
                      const XML_Char *version,
                      const XML_Char *encoding,
                      int standalone);
</pre>
<p>Sets a handler that is called for XML declarations and also for
text declarations discovered in external entities. The two can be told
apart by the <code>version</code> parameter, which will be NULL for
text declarations. The <code>encoding</code> parameter may be NULL for
an XML declaration. The <code>standalone</code> argument will contain
-1, 0, or 1, indicating respectively that there was no standalone
parameter in the declaration, that it was given as no, or that it was
given as yes.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetStartDoctypeDeclHandler">
XML_SetStartDoctypeDeclHandler(XML_Parser p,
                               XML_StartDoctypeDeclHandler start);
</pre>
<pre class="signature">
typedef void
(*XML_StartDoctypeDeclHandler)(void *userData,
                               const XML_Char *doctypeName,
                               const XML_Char *sysid,
                               const XML_Char *pubid,
                               int has_internal_subset);
</pre>
<p>Set a handler that is called at the start of a DOCTYPE declaration,
before any external or internal subset is parsed. Both <code>sysid</code>
and <code>pubid</code> may be NULL. The <code>has_internal_subset</code>
argument will be non-zero if the DOCTYPE declaration has an internal
subset.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetEndDoctypeDeclHandler">
XML_SetEndDoctypeDeclHandler(XML_Parser p,
                             XML_EndDoctypeDeclHandler end);
</pre>
<pre class="signature">
typedef void
(*XML_EndDoctypeDeclHandler)(void *userData);
</pre>
<p>Set a handler that is called at the end of a DOCTYPE declaration,
after parsing any external subset.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetDoctypeDeclHandler">
XML_SetDoctypeDeclHandler(XML_Parser p,
                          XML_StartDoctypeDeclHandler start,
                          XML_EndDoctypeDeclHandler end);
</pre>
<p>Set both doctype handlers with one call.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetElementDeclHandler">
XML_SetElementDeclHandler(XML_Parser p,
                          XML_ElementDeclHandler eldecl);
</pre>
<pre class="signature">
typedef void
(*XML_ElementDeclHandler)(void *userData,
                          const XML_Char *name,
                          XML_Content *model);
</pre>
<pre class="signature">
enum XML_Content_Type {
  XML_CTYPE_EMPTY = 1,
  XML_CTYPE_ANY,
  XML_CTYPE_MIXED,
  XML_CTYPE_NAME,
  XML_CTYPE_CHOICE,
  XML_CTYPE_SEQ
};

enum XML_Content_Quant {
  XML_CQUANT_NONE,
  XML_CQUANT_OPT,
  XML_CQUANT_REP,
  XML_CQUANT_PLUS
};

typedef struct XML_cp XML_Content;

struct XML_cp {
  enum XML_Content_Type  type;
  enum XML_Content_Quant quant;
  const XML_Char *       name;
  unsigned int           numchildren;
  XML_Content *          children;
};
</pre>
<p>Sets a handler for element declarations in a DTD. The handler gets
called with the name of the element in the declaration and a pointer
to a structure that contains the element model. It is the
application's responsibility to free this data structure.</p>

<p>The <code>model</code> argument is the root of a tree of
<code>XML_Content</code> nodes. If <code>type</code> equals
<code>XML_CTYPE_EMPTY</code> or <code>XML_CTYPE_ANY</code>, then
<code>quant</code> will be <code>XML_CQUANT_NONE</code>, and the other
fields will be zero or NULL. If <code>type</code> is
<code>XML_CTYPE_MIXED</code>, then <code>quant</code> will be
<code>XML_CQUANT_NONE</code> or <code>XML_CQUANT_REP</code>,
<code>numchildren</code> will contain the number of elements that are
allowed to be mixed in, and <code>children</code> points to an array of
<code>XML_Content</code> structures that will all have type
<code>XML_CTYPE_NAME</code> with no quantification. Only the root node
can be of type <code>XML_CTYPE_EMPTY</code>, <code>XML_CTYPE_ANY</code>,
or <code>XML_CTYPE_MIXED</code>.</p>

<p>For type <code>XML_CTYPE_NAME</code>, the <code>name</code> field
points to the name and the <code>numchildren</code> and
<code>children</code> fields will be zero and NULL. The
<code>quant</code> field will indicate any quantifiers placed on the
name.</p>

<p>Types <code>XML_CTYPE_CHOICE</code> and <code>XML_CTYPE_SEQ</code>
indicate a choice or sequence respectively. The
<code>numchildren</code> field indicates how many nodes are in the
choice or sequence, and <code>children</code> points to the nodes.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetAttlistDeclHandler">
XML_SetAttlistDeclHandler(XML_Parser p,
                          XML_AttlistDeclHandler attdecl);
</pre>
<pre class="signature">
typedef void
(*XML_AttlistDeclHandler)(void *userData,
                          const XML_Char *elname,
                          const XML_Char *attname,
                          const XML_Char *att_type,
                          const XML_Char *dflt,
                          int isrequired);
</pre>
<p>Set a handler for attlist declarations in the DTD. This handler is
called for <em>each</em> attribute, so a single attlist declaration
with multiple attributes declared will generate multiple calls to this
handler. The <code>elname</code> parameter returns the name of the
element for which the attribute is being declared. The attribute name
is in the <code>attname</code> parameter. The attribute type is in the
<code>att_type</code> parameter. It is the string representing the
type in the declaration with whitespace removed.</p>

<p>The <code>dflt</code> parameter holds the default value. It will be
NULL in the case of "#IMPLIED" or "#REQUIRED" attributes. You can
distinguish these two cases by checking the <code>isrequired</code>
parameter, which will be true in the case of "#REQUIRED" attributes.
Attributes which are "#FIXED" will also have a true
<code>isrequired</code>, but they will have the non-NULL fixed value
in the <code>dflt</code> parameter.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetEntityDeclHandler">
XML_SetEntityDeclHandler(XML_Parser p,
                         XML_EntityDeclHandler handler);
</pre>
<pre class="signature">
typedef void
(*XML_EntityDeclHandler)(void *userData,
                         const XML_Char *entityName,
                         int is_parameter_entity,
                         const XML_Char *value,
                         int value_length,
                         const XML_Char *base,
                         const XML_Char *systemId,
                         const XML_Char *publicId,
                         const XML_Char *notationName);
</pre>
<p>Sets a handler that will be called for all entity declarations.
The <code>is_parameter_entity</code> argument will be non-zero in the
case of parameter entities and zero otherwise.</p>

<p>For internal entities (<code>&lt;!ENTITY foo "bar"&gt;</code>),
<code>value</code> will be non-NULL and <code>systemId</code>,
<code>publicId</code>, and <code>notationName</code> will all be NULL.
The value string is <em>not</em> nul-terminated; the length is
provided in the <code>value_length</code> parameter. Do not use
<code>value_length</code> to test for internal entities, since it is
legal to have zero-length values. Instead check whether or not
<code>value</code> is NULL.</p>

<p>The <code>notationName</code> argument will have a non-NULL value
only for unparsed entity declarations.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetUnparsedEntityDeclHandler">
XML_SetUnparsedEntityDeclHandler(XML_Parser p,
                                 XML_UnparsedEntityDeclHandler h);
</pre>
<pre class="signature">
typedef void
(*XML_UnparsedEntityDeclHandler)(void *userData,
                                 const XML_Char *entityName,
                                 const XML_Char *base,
                                 const XML_Char *systemId,
                                 const XML_Char *publicId,
                                 const XML_Char *notationName);
</pre>
<p>Set a handler that receives declarations of unparsed entities. These
are entity declarations that have a notation (NDATA) field:</p>

<div id="eg"><pre>
&lt;!ENTITY logo SYSTEM "images/logo.gif" NDATA gif&gt;
</pre></div>
<p>This handler is obsolete and is provided for backwards
compatibility. Use <a href= "#XML_SetEntityDeclHandler"
>XML_SetEntityDeclHandler</a> instead.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetNotationDeclHandler">
XML_SetNotationDeclHandler(XML_Parser p,
                           XML_NotationDeclHandler h);
</pre>
<pre class="signature">
typedef void
(*XML_NotationDeclHandler)(void *userData,
                           const XML_Char *notationName,
                           const XML_Char *base,
                           const XML_Char *systemId,
                           const XML_Char *publicId);
</pre>
<p>Set a handler that receives notation declarations.</p>
</div>
|
<div class="handler">
<pre class="setter" id="XML_SetNotStandaloneHandler">
XML_SetNotStandaloneHandler(XML_Parser p,
                            XML_NotStandaloneHandler h);
</pre>
<pre class="signature">
typedef int
(*XML_NotStandaloneHandler)(void *userData);
</pre>
<p>Set a handler that is called if the document is not "standalone".
This happens when the document has an external subset or a reference
to a parameter entity, but does not have standalone set to "yes" in an
XML declaration. If this handler returns 0, then the parser will report
an <code>XML_ERROR_NOT_STANDALONE</code> error.</p>
</div>
|
<h3><a name="position">Parse position and error reporting functions</a></h3>

<p>These are the functions you'll want to call when the parse
functions return <code>XML_STATUS_ERROR</code> (i.e. a parse error has
occurred), although the position reporting functions are useful outside
of errors. The position reported is the byte position (in the original
document or entity encoding) of the first of the sequence of characters
that generated the current event (or the error that caused the parse
functions to return <code>XML_STATUS_ERROR</code>).</p>

<p>The position reporting functions are accurate only outside of the
DTD. In other words, they usually return bogus information when
called from within a DTD declaration handler.</p>
|
1410 |
|
1411 <pre class="fcndec" id="XML_GetErrorCode"> |
|
1412 enum XML_Error |
|
1413 XML_GetErrorCode(XML_Parser p); |
|
1414 </pre> |
|
1415 <div class="fcndef"> |
|
1416 Return what type of error has occurred. |
|
1417 </div> |
|
1418 |
|
1419 <pre class="fcndec" id="XML_ErrorString"> |
|
1420 const XML_LChar * |
|
1421 XML_ErrorString(int code); |
|
1422 </pre> |
|
1423 <div class="fcndef"> |
|
1424 Return a string describing the error corresponding to code. |
|
1425 The code should be one of the enums that can be returned from |
|
1426 <code><a href= "#XML_GetErrorCode" >XML_GetErrorCode</a></code>. |
|
1427 </div> |
|
1428 |
|
1429 <pre class="fcndec" id="XML_GetCurrentByteIndex"> |
|
1430 long |
|
1431 XML_GetCurrentByteIndex(XML_Parser p); |
|
1432 </pre> |
|
1433 <div class="fcndef"> |
|
1434 Return the byte offset of the position. |
|
1435 </div> |
|
1436 |
|
1437 <pre class="fcndec" id="XML_GetCurrentLineNumber"> |
|
1438 int |
|
1439 XML_GetCurrentLineNumber(XML_Parser p); |
|
1440 </pre> |
|
1441 <div class="fcndef"> |
|
1442 Return the line number of the position. |
|
1443 </div> |
|
1444 |
|
1445 <pre class="fcndec" id="XML_GetCurrentColumnNumber"> |
|
1446 int |
|
1447 XML_GetCurrentColumnNumber(XML_Parser p); |
|
1448 </pre> |
|
1449 <div class="fcndef"> |
|
1450 Return the offset, from the beginning of the current line, of |
|
1451 the position. |
|
1452 </div> |
|
<pre class="fcndec" id="XML_GetCurrentByteCount">
int
XML_GetCurrentByteCount(XML_Parser p);
</pre>
<div class="fcndef">
Return the number of bytes in the current event. Returns
<code>0</code> if the event is inside a reference to an internal
entity and for the end-tag event for empty element tags (the latter can
be used to distinguish empty-element tags from empty elements using
separate start and end tags).
</div>
|
<pre class="fcndec" id="XML_GetInputContext">
const char *
XML_GetInputContext(XML_Parser p,
                    int *offset,
                    int *size);
</pre>
<div class="fcndef">

<p>Returns the parser's input buffer, sets the integer pointed at by
<code>offset</code> to the offset within this buffer of the current
parse position, and sets the integer pointed at by <code>size</code> to
the size of the returned buffer.</p>

<p>This should only be called from within a handler during an active
parse, and the returned buffer should only be referred to from within
the handler that made the call. This input buffer contains the
untranslated bytes of the input.</p>

<p>Only a limited amount of context is kept, so if the event
triggering a call spans over a very large amount of input, the actual
parse position may be before the beginning of the buffer.</p>
</div>
|
<h3><a name="miscellaneous">Miscellaneous functions</a></h3>

<p>The functions in this section either obtain state information from
the parser or can be used to dynamically set parser options.</p>

<pre class="fcndec" id="XML_SetUserData">
void
XML_SetUserData(XML_Parser p,
                void *userData);
</pre>
<div class="fcndef">
This sets the user data pointer that gets passed to handlers. It
overwrites any previous value for this pointer. Note that the
application is responsible for freeing the memory associated with
<code>userData</code> when it is finished with the parser. So if you
call this when there's already a pointer there, and you haven't freed
the memory associated with it, then you've probably just leaked
memory.
</div>

<pre class="fcndec" id="XML_GetUserData">
void *
XML_GetUserData(XML_Parser p);
</pre>
<div class="fcndef">
This returns the user data pointer that gets passed to handlers.
It is actually implemented as a macro.
</div>
|
<pre class="fcndec" id="XML_UseParserAsHandlerArg">
void
XML_UseParserAsHandlerArg(XML_Parser p);
</pre>
<div class="fcndef">
After this is called, handlers receive the parser in the userData
argument. The userData information can still be obtained using the
<code><a href= "#XML_GetUserData" >XML_GetUserData</a></code>
function.
</div>
|
<pre class="fcndec" id="XML_SetBase">
int
XML_SetBase(XML_Parser p,
            const XML_Char *base);
</pre>
<div class="fcndef">
Set the base to be used for resolving relative URIs in system
identifiers. The return value is 0 if there's no memory to store
base, otherwise it's non-zero.
</div>

<pre class="fcndec" id="XML_GetBase">
const XML_Char *
XML_GetBase(XML_Parser p);
</pre>
<div class="fcndef">
Return the base for resolving relative URIs.
</div>
|
<pre class="fcndec" id="XML_GetSpecifiedAttributeCount">
int
XML_GetSpecifiedAttributeCount(XML_Parser p);
</pre>
<div class="fcndef">
When attributes are reported to the start handler in the atts vector,
attributes that were explicitly set in the element occur before any
attributes that receive their value from default information in an
ATTLIST declaration. This function returns the number of attributes
that were explicitly set times two, thus giving the offset in the
<code>atts</code> array passed to the start tag handler of the first
attribute set due to defaults. It supplies information for the last
call to a start handler. If called inside a start handler, then that
means the current call.
</div>
|
<pre class="fcndec" id="XML_GetIdAttributeIndex">
int
XML_GetIdAttributeIndex(XML_Parser p);
</pre>
<div class="fcndef">
Returns the index of the ID attribute passed in the atts array in the
last call to <code><a href= "#XML_StartElementHandler"
>XML_StartElementHandler</a></code>, or -1 if there is no ID
attribute. If called inside a start handler, then that means the
current call.
</div>
|
<pre class="fcndec" id="XML_SetEncoding">
int
XML_SetEncoding(XML_Parser p,
                const XML_Char *encoding);
</pre>
<div class="fcndef">
Set the encoding to be used by the parser. It is equivalent to
passing a non-null encoding argument to the parser creation functions.
It must not be called after <code><a href= "#XML_Parse"
>XML_Parse</a></code> or <code><a href= "#XML_ParseBuffer"
>XML_ParseBuffer</a></code> have been called on the given parser.
</div>
|
<pre class="fcndec" id="XML_SetParamEntityParsing">
int
XML_SetParamEntityParsing(XML_Parser p,
                          enum XML_ParamEntityParsing code);
</pre>
<div class="fcndef">
This enables parsing of parameter entities, including the external
parameter entity that is the external DTD subset, according to
<code>code</code>.
The choices for <code>code</code> are:
<ul>
<li><code>XML_PARAM_ENTITY_PARSING_NEVER</code></li>
<li><code>XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE</code></li>
<li><code>XML_PARAM_ENTITY_PARSING_ALWAYS</code></li>
</ul>
</div>
|
<pre class="fcndec" id="XML_UseForeignDTD">
enum XML_Error
XML_UseForeignDTD(XML_Parser parser, XML_Bool useDTD);
</pre>
<div class="fcndef">
<p>This function allows an application to provide an external subset
for the document type declaration for documents which do not specify
an external subset of their own. For documents which specify an
external subset in their DOCTYPE declaration, the application-provided
subset will be ignored. If the document does not contain a DOCTYPE
declaration at all and <code>useDTD</code> is true, the
application-provided subset will be parsed, but the
<code>startDoctypeDeclHandler</code> and
<code>endDoctypeDeclHandler</code> functions, if set, will not be
called. The setting of parameter entity parsing, controlled using
<code><a href= "#XML_SetParamEntityParsing"
>XML_SetParamEntityParsing</a></code>, will be honored.</p>

<p>The application-provided external subset is read by calling the
external entity reference handler set via <code><a href=
"#XML_SetExternalEntityRefHandler"
>XML_SetExternalEntityRefHandler</a></code> with both
<code>publicId</code> and <code>systemId</code> set to NULL.</p>

<p>If this function is called after parsing has begun, it returns
<code>XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING</code> and ignores
<code>useDTD</code>. If called when Expat has been compiled without
DTD support, it returns
<code>XML_ERROR_FEATURE_REQUIRES_XML_DTD</code>. Otherwise, it
returns <code>XML_ERROR_NONE</code>.</p>
</div>
|
<pre class="fcndec" id="XML_SetReturnNSTriplet">
void
XML_SetReturnNSTriplet(XML_Parser parser,
                       int do_nst);
</pre>
<div class="fcndef">
<p>
This function only has an effect when using a parser created with
<code><a href= "#XML_ParserCreateNS" >XML_ParserCreateNS</a></code>,
i.e. when namespace processing is in effect. The <code>do_nst</code>
argument sets whether or not prefixes are returned with names qualified
with a namespace prefix. If this function is called with
<code>do_nst</code> non-zero, then afterwards namespace-qualified names
(that is, names qualified with a prefix, as opposed to names belonging
to a default namespace) are returned as a triplet with the three parts
separated by the namespace separator specified when the parser was
created. The order of returned parts is URI, local name, and
prefix.</p>

<p>If <code>do_nst</code> is zero, then namespaces are reported in the
default manner, URI then local name separated by the namespace
separator.</p>
</div>
|
<pre class="fcndec" id="XML_DefaultCurrent">
void
XML_DefaultCurrent(XML_Parser parser);
</pre>
<div class="fcndef">
This can be called within a handler for a start element, end element,
processing instruction or character data. It causes the corresponding
markup to be passed to the default handler set by <code><a
href="#XML_SetDefaultHandler" >XML_SetDefaultHandler</a></code> or
<code><a href="#XML_SetDefaultHandlerExpand"
>XML_SetDefaultHandlerExpand</a></code>. It does nothing if there is
not a default handler.
</div>
|
<pre class="fcndec" id="XML_ExpatVersion">
const XML_LChar *
XML_ExpatVersion(void);
</pre>
<div class="fcndef">
Return the library version as a string (e.g. <code>"expat_1.95.1"</code>).
</div>

<pre class="fcndec" id="XML_ExpatVersionInfo">
XML_Expat_Version
XML_ExpatVersionInfo(void);
</pre>
<pre class="signature">
typedef struct {
  int major;
  int minor;
  int micro;
} XML_Expat_Version;
</pre>
<div class="fcndef">
Return the library version information as a structure.
Some macros are also defined that support compile-time tests of the
library version:
<ul>
<li><code>XML_MAJOR_VERSION</code></li>
<li><code>XML_MINOR_VERSION</code></li>
<li><code>XML_MICRO_VERSION</code></li>
</ul>
Testing these constants is currently the best way to determine if
particular parts of the Expat API are available.
</div>
|
<pre class="fcndec" id="XML_GetFeatureList">
const XML_Feature *
XML_GetFeatureList(void);
</pre>
<pre class="signature">
enum XML_FeatureEnum {
  XML_FEATURE_END = 0,
  XML_FEATURE_UNICODE,
  XML_FEATURE_UNICODE_WCHAR_T,
  XML_FEATURE_DTD,
  XML_FEATURE_CONTEXT_BYTES,
  XML_FEATURE_MIN_SIZE,
  XML_FEATURE_SIZEOF_XML_CHAR,
  XML_FEATURE_SIZEOF_XML_LCHAR
};

typedef struct {
  enum XML_FeatureEnum  feature;
  const XML_LChar      *name;
  long int              value;
} XML_Feature;
</pre>
<div class="fcndef">
<p>Returns a list of "feature" records, providing details on how
Expat was configured at compile time. Most applications should not
need to worry about this, but this information is otherwise not
available from Expat. This function allows code that does need to
check these features to do so at runtime.</p>

<p>The return value is an array of <code>XML_Feature</code>,
terminated by a record with a <code>feature</code> of
<code>XML_FEATURE_END</code> and <code>name</code> of NULL,
identifying the feature-test macros Expat was compiled with. Since an
application that requires this kind of information needs to determine
the type of character the <code>name</code> points to, records for the
<code>XML_FEATURE_SIZEOF_XML_CHAR</code> and
<code>XML_FEATURE_SIZEOF_XML_LCHAR</code> will be located at the
beginning of the list, followed by <code>XML_FEATURE_UNICODE</code>
and <code>XML_FEATURE_UNICODE_WCHAR_T</code>, if they are present at
all.</p>

<p>Some features have an associated value. If there isn't an
associated value, the <code>value</code> field is set to 0. At this
time, the following features have been defined to have values:</p>

<dl>
<dt><code>XML_FEATURE_SIZEOF_XML_CHAR</code></dt>
<dd>The number of bytes occupied by one <code>XML_Char</code>
character.</dd>
<dt><code>XML_FEATURE_SIZEOF_XML_LCHAR</code></dt>
<dd>The number of bytes occupied by one <code>XML_LChar</code>
character.</dd>
<dt><code>XML_FEATURE_CONTEXT_BYTES</code></dt>
<dd>The maximum number of bytes of context which can be
reported by <code><a href= "#XML_GetInputContext"
>XML_GetInputContext</a></code>.</dd>
</dl>
</div>
|
<hr />
<p><a href="http://validator.w3.org/check/referer"><img
src="valid-xhtml10.png" alt="Valid XHTML 1.0!"
height="31" width="88" class="noborder" /></a></p>
</body>
</html>