/******************************************************************************
 *
 *
 *
 * Copyright (C) 1997-2008 by Dimitri van Heesch.
 *
 * Permission to use, copy, modify, and distribute this software and its
 * documentation under the terms of the GNU General Public License is hereby
 * granted. No representations are made about the suitability of this software
 * for any purpose. It is provided "as is" without express or implied warranty.
 * See the GNU General Public License for more details.
 *
 * Documents produced by Doxygen are derivative works derived from the
 * input used in their production; they are not affected by this license.
 *
 */
/*! \page arch Doxygen's Internals

<h3>Doxygen's internals</h3>

<B>Note that this section is still under construction!</B>

The following picture shows how source files are processed by doxygen.

\image html archoverview.gif "Data flow overview"
\image latex archoverview.eps "Data flow overview" width=14cm

The following sections explain the steps above in more detail.

<h3>Config parser</h3>

The configuration file that controls the settings of a project is parsed
and the settings are stored in the singleton class \c Config
in <code>src/config.h</code>. The parser itself is written using \c flex
and can be found in <code>src/config.l</code>. This parser is also used
directly by \c doxywizard, so it is put in a separate library.

Each configuration option has one of five possible types: \c String,
\c List, \c Enum, \c Int, or \c Bool. The values of these options are
available through the global functions \c Config_getXXX(), where \c XXX is the
type of the option. The argument of these functions is a string naming
the option as it appears in the configuration file. For instance,
\c Config_getBool("GENERATE_TESTLIST") returns a reference to a boolean
value that is \c TRUE if the test list was enabled in the config file.

The function \c readConfiguration() in \c src/doxygen.cpp
reads the command line options and then calls the configuration parser.

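As a rough illustration of the name-based, typed lookup described above, the
following sketch models a registry that hands out references to option values.
The names \c ConfigSketch, \c sketchGetBool() and \c sketchGetString() are
hypothetical and are not part of doxygen; the real \c Config class and its
accessors in <code>src/config.h</code> may be organized quite differently.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a singleton option registry. This is NOT
// doxygen's real Config class, only an illustration of name-based lookup.
class ConfigSketch
{
  public:
    // Single shared instance, created on first use.
    static ConfigSketch &instance()
    {
      static ConfigSketch theInstance;
      return theInstance;
    }
    // Returns a reference to the named boolean option; an option that
    // was never set is created with the value false.
    bool &getBool(const std::string &name) { return m_bools[name]; }
    // Returns a reference to the named string option (empty if unset).
    std::string &getString(const std::string &name) { return m_strings[name]; }
  private:
    ConfigSketch() {}
    std::map<std::string,bool>        m_bools;
    std::map<std::string,std::string> m_strings;
};

// Free-function wrappers in the spirit of the Config_getXXX() accessors.
inline bool &sketchGetBool(const char *name)
{
  assert(name!=0);
  return ConfigSketch::instance().getBool(name);
}

inline std::string &sketchGetString(const char *name)
{
  assert(name!=0);
  return ConfigSketch::instance().getString(name);
}
```

Because the accessors return references, a caller of this sketch can both read
an option and assign to it, for example
<code>sketchGetBool("GENERATE_TESTLIST") = true;</code>.
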
<h3>C Preprocessor</h3>

The input files mentioned in the config file are (by default) fed to the
C Preprocessor (after being piped through a user-defined filter if available).

The way the preprocessor works differs somewhat from a standard C Preprocessor.
By default it does not do macro expansion, although it can be configured to
expand all macros. Typical usage is to expand only a user-specified set
of macros. This allows macro names to appear in the types of
function parameters, for instance.

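The idea of expanding only a chosen set of macros can be sketched as follows.
This is a deliberately simplified model and not the code in \c src/pre.l: the
function \c expandSelected() and its helper are hypothetical, only object-like
macros are handled, the replacement text is not rescanned, and string literals
and comments are not treated specially.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// True for characters that can be part of an identifier or number.
static bool isWordChar(char c)
{
  unsigned char u = static_cast<unsigned char>(c);
  return std::isalnum(u)!=0 || c=='_';
}

// Replaces identifiers listed in 'macros' by their replacement text and
// copies everything else unchanged (hypothetical helper, see lead-in).
std::string expandSelected(const std::string &input,
                           const std::map<std::string,std::string> &macros)
{
  std::string output;
  std::string::size_type i = 0;
  while (i<input.size())
  {
    if (isWordChar(input[i]))
    {
      // collect one maximal run of word characters
      std::string::size_type start = i;
      while (i<input.size() && isWordChar(input[i])) i++;
      assert(i>start);
      std::string word = input.substr(start,i-start);
      // a run starting with a digit is a number, never a macro name
      bool startsWithDigit = std::isdigit(static_cast<unsigned char>(word[0]))!=0;
      std::map<std::string,std::string>::const_iterator it = macros.find(word);
      if (!startsWithDigit && it!=macros.end()) output += it->second;
      else output += word;
    }
    else
    {
      output += input[i];
      i++;
    }
  }
  return output;
}
```

With a table that maps a hypothetical export macro to the empty string, a
declaration decorated with that macro comes out as a plain declaration, while
macros that are not in the table are left untouched.
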
Another difference is that the preprocessor parses, but does not actually
include, code when it encounters a \#include (with the exception of \#include
found inside { ... } blocks). The reason behind this deviation from
the standard is to prevent feeding multiple definitions of the
same functions/classes to doxygen's parser. If all source files
included a common header file, for instance, the class and type
definitions (and their documentation) would be present in each
translation unit.

The preprocessor is written using \c flex and can be found in
\c src/pre.l. For condition blocks (\#if), evaluation of constant expressions
is needed. For this a \c yacc based parser is used, which can be found
in \c src/constexp.y and \c src/constexp.l.

The preprocessor is invoked for each file using the \c preprocessFile()
function declared in \c src/pre.h, and will append the preprocessed result
to a character buffer. The format of the character buffer is

\verbatim
0x06 file name 1
0x06 preprocessed contents of file 1
...
0x06 file name n
0x06 preprocessed contents of file n
\endverbatim

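A minimal sketch of how a buffer with this layout could be produced and split
again is given below. It assumes that every field is introduced by a single
0x06 byte, that file names and contents strictly alternate, and that the byte
0x06 never occurs inside a field. The helper names \c appendFile() and
\c splitBuffer() are made up for this example and do not exist in doxygen.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

const char kMarker = 0x06; // separator byte shown in the layout above

// Appends one (file name, contents) pair to the shared buffer.
void appendFile(std::string &buffer,const std::string &name,
                const std::string &contents)
{
  // Assumption of this sketch: the marker never occurs inside a field.
  assert(name.find(kMarker)==std::string::npos);
  assert(contents.find(kMarker)==std::string::npos);
  buffer += kMarker;
  buffer += name;
  buffer += kMarker;
  buffer += contents;
}

// Splits the buffer back into (file name, contents) pairs.
std::vector< std::pair<std::string,std::string> >
splitBuffer(const std::string &buffer)
{
  // First cut the buffer into the fields that follow each marker.
  std::vector<std::string> fields;
  std::string::size_type pos = 0;
  while (pos<buffer.size() && buffer[pos]==kMarker)
  {
    std::string::size_type next = buffer.find(kMarker,pos+1);
    if (next==std::string::npos) next = buffer.size();
    fields.push_back(buffer.substr(pos+1,next-pos-1));
    pos = next;
  }
  // Then pair them up: even fields are names, odd fields are contents.
  std::vector< std::pair<std::string,std::string> > files;
  for (std::string::size_type i=0;i+1<fields.size();i+=2)
  {
    files.push_back(std::make_pair(fields[i],fields[i+1]));
  }
  return files;
}
```

Keeping all files in one buffer lets a single scanner run over the complete
input, while the markers still allow it to tell where one file ends and the
next begins.
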
<h3>Language parser</h3>

The preprocessed input buffer is fed to the language parser, which is
implemented as a big state machine using \c flex. It can be found
in the file \c src/scanner.l. There is one parser for all
languages (C/C++/Java/IDL). The state variables \c insideIDL
and \c insideJava are used in some places for language-specific choices.

The task of the parser is to convert the input buffer into a tree of entries
(basically an abstract syntax tree). An entry is defined in \c src/entry.h
and is a blob of loosely structured information. The most important field
is \c section, which specifies the kind of information contained in the entry.

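To make the notion of an entry tree more concrete, here is a small sketch of a
tree whose nodes carry a \c section tag. The type \c EntrySketch, the
enumeration \c SectionSketch and the function \c collectBySection() are
hypothetical simplifications; the real \c Entry class in \c src/entry.h has
many more fields and its section values are defined differently.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical section kinds; the real values live in src/entry.h.
enum SectionSketch { ROOT_SEC, NAMESPACE_SEC, CLASS_SEC, FUNCTION_SEC, VARIABLE_SEC };

// Simplified stand-in for an entry: a kind tag, a name, and child entries.
struct EntrySketch
{
  EntrySketch(SectionSketch s,const std::string &n) : section(s), name(n) {}
  // Adds a child and returns a pointer to it so that calls can be nested.
  EntrySketch *addChild(SectionSketch s,const std::string &n)
  {
    children.push_back(std::unique_ptr<EntrySketch>(new EntrySketch(s,n)));
    return children.back().get();
  }
  SectionSketch section;
  std::string name;
  std::vector< std::unique_ptr<EntrySketch> > children;
};

// Depth-first walk that collects the names of all entries of one kind.
void collectBySection(const EntrySketch &entry,SectionSketch wanted,
                      std::vector<std::string> &result)
{
  if (entry.section==wanted) result.push_back(entry.name);
  for (const auto &child : entry.children)
  {
    assert(child!=nullptr); // children are never null in this sketch
    collectBySection(*child,wanted,result);
  }
}
```

A later processing step can then walk the tree once per kind of entity, picking
out only the entries whose \c section matches what that step is interested in.
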
Possible improvements for future versions:
 - Use one scanner/parser per language instead of one big scanner.
 - Move the first pass parsing of documentation blocks to a separate module.
 - Parse defines (these are currently gathered by the preprocessor, and
   ignored by the language parser).

<h3>Data organizer</h3>

This step consists of many smaller steps that build
dictionaries of the extracted classes, files, namespaces,
variables, functions, packages, pages, and groups. Besides building
dictionaries, this step also computes relations (such as inheritance
relations) between the extracted entities.

Each step has a function defined in \c src/doxygen.cpp, which operates
on the tree of entries built during language parsing. Look at the
"Gathering information" part of \c parseInput() for details.

The result of this step is a number of dictionaries, which can be
found in the Doxygen "namespace" defined in \c src/doxygen.h. Most
elements of these dictionaries are derived from the class \c Definition;
the class \c MemberDef, for instance, holds all information for a member.
An instance of such a class can be part of a file ( class \c FileDef ),
a class ( class \c ClassDef ), a namespace ( class \c NamespaceDef ),
a group ( class \c GroupDef ), or a Java package ( class \c PackageDef ).

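The two kinds of work described in this section, building dictionaries and then
computing relations between their elements, can be sketched as follows. The
types \c ClassRecord and \c ClassDict and the functions \c addClass() and
\c computeInheritance() are hypothetical; doxygen's own dictionaries and the
classes derived from \c Definition are considerably richer.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical record for one extracted class.
struct ClassRecord
{
  std::string name;
  std::vector<std::string> baseNames;    // names as written in the source
  std::vector<const ClassRecord*> bases; // filled in by computeInheritance()
};

typedef std::map<std::string,ClassRecord> ClassDict;

// Step 1: enter a class into the dictionary under its name.
void addClass(ClassDict &dict,const std::string &name,
              const std::vector<std::string> &baseNames)
{
  assert(!name.empty());
  ClassRecord &rec = dict[name];
  rec.name = name;
  rec.baseNames = baseNames;
}

// Step 2: resolve base class names against the dictionary.
// Returns the number of base class names that could not be resolved.
int computeInheritance(ClassDict &dict)
{
  int unresolved = 0;
  for (ClassDict::iterator it=dict.begin();it!=dict.end();++it)
  {
    ClassRecord &rec = it->second;
    rec.bases.clear();
    for (const std::string &baseName : rec.baseNames)
    {
      ClassDict::const_iterator found = dict.find(baseName);
      if (found!=dict.end()) rec.bases.push_back(&found->second);
      else unresolved++;
    }
  }
  return unresolved;
}
```

Splitting the work into two passes means that a derived class may appear in the
input before its base class: all names are known by the time the relations are
computed.
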
<h3>Tag file parser</h3>

If tag files are specified in the configuration file, these are parsed
by a SAX-based XML parser, which can be found in \c src/tagreader.cpp.
The result of parsing a tag file is the insertion of \c Entry objects in the
entry tree. The field \c Entry::tagInfo is used to mark the entry as
external, and holds information about the tag file.

<h3>Documentation parser</h3>

Special comment blocks are stored as strings in the entities that they
document. There is a string for the brief description and a string
for the detailed description. The documentation parser reads these
strings and executes the commands it finds in them (this is the second pass
in parsing the documentation). It writes the result directly to the output
generators.

The parser is written in C++ and can be found in \c src/docparser.cpp. The
tokens that are consumed by the parser come from \c src/doctokenizer.l.
Code fragments found in the comment blocks are passed on to the source parser.

The main entry point for the documentation parser is \c validatingParseDoc()
declared in \c src/docparser.h. For simple texts with special
commands \c validatingParseText() is used.

<h3>Source parser</h3>

If source browsing is enabled or if code fragments are encountered in the
documentation, the source parser is invoked.

The code parser tries to cross-reference the source code it parses with
documented entities. It also does syntax highlighting of the sources. The
output is directly written to the output generators.

The main entry point for the code parser is \c parseCode()
declared in \c src/code.h.

<h3>Output generators</h3>

After data is gathered and cross-referenced, doxygen generates
output in various formats. For this it uses the methods provided by
the abstract class \c OutputGenerator. In order to generate output
for multiple formats at once, the methods of \c OutputList are called
instead. This class maintains a list of concrete output generators,
where each method called is delegated to all generators in the list.

To allow small deviations in what is written to the output for each
concrete output generator, it is possible to temporarily disable certain
generators. The \c OutputList class contains various \c disable() and \c enable()
methods for this. The methods \c OutputList::pushGeneratorState() and
\c OutputList::popGeneratorState() are used to temporarily save the
set of enabled/disabled output generators on a stack.

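The delegation and the save/restore of generator state can be sketched as
follows. The classes \c GeneratorSketch, \c RecordingGenerator and
\c OutputListSketch are hypothetical and much smaller than the real
\c OutputGenerator and \c OutputList; in particular, the signatures of
\c enable() and \c disable() used here are an assumption of this example.

```cpp
#include <cassert>
#include <stack>
#include <string>
#include <vector>

// Hypothetical abstract generator with a single output method.
class GeneratorSketch
{
  public:
    GeneratorSketch() : m_enabled(true) {}
    virtual ~GeneratorSketch() {}
    virtual void writeText(const std::string &text) = 0;
    bool isEnabled() const { return m_enabled; }
    void setEnabled(bool e) { m_enabled = e; }
  private:
    bool m_enabled;
};

// Toy concrete generator that records text with a format-specific prefix.
class RecordingGenerator : public GeneratorSketch
{
  public:
    explicit RecordingGenerator(const std::string &prefix) : m_prefix(prefix) {}
    void writeText(const std::string &text) { m_log += m_prefix + text + "\n"; }
    const std::string &log() const { return m_log; }
  private:
    std::string m_prefix;
    std::string m_log;
};

// Forwards each call to all enabled generators (it does not own them).
class OutputListSketch
{
  public:
    void add(GeneratorSketch *g) { assert(g!=0); m_generators.push_back(g); }
    void writeText(const std::string &text)
    {
      for (GeneratorSketch *g : m_generators)
      {
        if (g->isEnabled()) g->writeText(text);
      }
    }
    void disable(GeneratorSketch *g) { g->setEnabled(false); }
    void enable(GeneratorSketch *g)  { g->setEnabled(true); }
    // Saves the enabled/disabled flag of every generator on a stack.
    void pushGeneratorState()
    {
      std::vector<bool> state;
      for (GeneratorSketch *g : m_generators) state.push_back(g->isEnabled());
      m_states.push(state);
    }
    // Restores the flags saved by the matching pushGeneratorState() call.
    void popGeneratorState()
    {
      assert(!m_states.empty());
      const std::vector<bool> state = m_states.top();
      m_states.pop();
      for (std::vector<bool>::size_type i=0;
           i<state.size() && i<m_generators.size();i++)
      {
        m_generators[i]->setEnabled(state[i]);
      }
    }
  private:
    std::vector<GeneratorSketch*>   m_generators;
    std::stack< std::vector<bool> > m_states;
};
```

A typical pattern with such a list is to push the state, disable one generator,
write a fragment that should not reach that generator, and pop the state again;
the caller does not need to remember which generators were enabled before.
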
The XML output is generated directly from the gathered data structures. In the
future XML will be used as an intermediate language (IL). The output
generators will then use this IL as a starting point to generate the
specific output formats. The advantage of having an IL is that various
independently developed tools, written in various languages,
could extract information from the XML output. Possible tools could be:
- an interactive source browser
- a class diagram generator
- a tool that computes code metrics.

<h3>Debugging</h3>

Since doxygen uses a lot of \c flex code, it is important to understand
how \c flex works (for this one should read the man page)
and to understand what it is doing when \c flex is parsing some input.
Fortunately, when \c flex is used with the -d option it outputs which rules
matched. This makes it quite easy to follow what is going on for a
particular input fragment.

To make it easier to toggle debug information for a given flex file, I
wrote the following perl script, which automatically adds or removes -d
from the correct line in the Makefile:

\verbatim
#!/usr/local/bin/perl

# name of the flex file whose debug flag should be toggled
$file = shift @ARGV;
die "Usage: $0 <flex file>\n" unless defined $file;
print "Toggle debugging mode for $file\n";

# add or remove the -d flex flag in the makefile
unless (rename "Makefile.libdoxygen","Makefile.libdoxygen.old") {
  print STDERR "Error: cannot rename Makefile.libdoxygen!\n";
  exit 1;
}
if (open(F,"<Makefile.libdoxygen.old")) {
  unless (open(G,">Makefile.libdoxygen")) {
    print STDERR "Error: opening file Makefile.libdoxygen for writing\n";
    exit 1;
  }
  print "Processing Makefile.libdoxygen...\n";
  while (<F>) {
    # \Q...\E matches the file name literally; ${1} is the captured prefix
    if ( s/\(LEX\) -P([a-zA-Z]+)YY -t \Q$file\E/(LEX) -d -P${1}YY -t $file/g ) {
      print "Enabling debug info for $file\n";
    }
    elsif ( s/\(LEX\) -d -P([a-zA-Z]+)YY -t \Q$file\E/(LEX) -P${1}YY -t $file/g ) {
      print "Disabling debug info for $file\n";
    }
    print G "$_";
  }
  close F;
  close G;
  unlink "Makefile.libdoxygen.old";
}
else {
  print STDERR "Warning: file Makefile.libdoxygen.old does not exist!\n";
}

# touch the file so that make regenerates the scanner
$now = time;
utime $now, $now, $file;
\endverbatim

*/