Orb/Doxygen/doc/arch.doc
changeset 0 42188c7ea2d9
       
     1 /******************************************************************************
       
     2  *
       
     3  * 
       
     4  *
       
     5  * Copyright (C) 1997-2008 by Dimitri van Heesch.
       
     6  *
       
     7  * Permission to use, copy, modify, and distribute this software and its
       
     8  * documentation under the terms of the GNU General Public License is hereby 
       
     9  * granted. No representations are made about the suitability of this software 
       
    10  * for any purpose. It is provided "as is" without express or implied warranty.
       
    11  * See the GNU General Public License for more details.
       
    12  *
       
    13  * Documents produced by Doxygen are derivative works derived from the
       
    14  * input used in their production; they are not affected by this license.
       
    15  *
       
    16  */
       
    17 /*! \page arch Doxygen's Internals
       
    18 
       
    19 <h3>Doxygen's internals</h3>
       
    20 
       
    21 <B>Note that this section is still under construction!</B>
       
    22 
       
    23 The following picture shows how source files are processed by doxygen.
       
    24 
       
    25 \image html archoverview.gif "Data flow overview"
       
    26 \image latex archoverview.eps "Data flow overview" width=14cm
       
    27 
       
    28 The following sections explain the steps above in more detail.
       
    29 
       
    30 <h3>Config parser</h3>
       
    31 
       
    32 The configuration file that controls the settings of a project is parsed
       
    33 and the settings are stored in the singleton class \c Config 
       
    34 in <code>src/config.h</code>. The parser itself is written using \c flex 
       
    35 and can be found in <code>src/config.l</code>. This parser is also used 
       
    36 directly by \c doxywizard, so it is put in a separate library.
       
    37 
       
    38 Each configuration option has one of 5 possible types: \c String, 
       
    39 \c List, \c Enum, \c Int, or \c Bool. The values of these options are
       
    40 available through the global functions \c Config_getXXX(), where \c XXX is the
       
     41 type of the option. The argument of these functions is a string naming
       
    42 the option as it appears in the configuration file. For instance: 
       
    43 \c Config_getBool("GENERATE_TESTLIST") returns a reference to a boolean
       
    44 value that is \c TRUE if the test list was enabled in the config file. 
       
    45 
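The access pattern described above can be sketched roughly as follows. This is
only an illustration: \c ConfigSketch, \c Sketch_getBool() and
\c Sketch_getString() are hypothetical names, not the declarations found in
<code>src/config.h</code>, and only two of the five option types are modelled.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the Config singleton; the real class is more
// elaborate and knows which options exist.
class ConfigSketch {
public:
    // Single shared instance, created on first use.
    static ConfigSketch& instance() {
        static ConfigSketch theInstance;
        return theInstance;
    }
    // Return references, so callers can both read and update a value.
    bool& getBool(const std::string& name) { return bools_[name]; }
    std::string& getString(const std::string& name) { return strings_[name]; }

private:
    ConfigSketch() = default;
    std::map<std::string, bool> bools_;
    std::map<std::string, std::string> strings_;
};

// Global accessors in the spirit of Config_getXXX().
inline bool& Sketch_getBool(const std::string& name) {
    return ConfigSketch::instance().getBool(name);
}
inline std::string& Sketch_getString(const std::string& name) {
    return ConfigSketch::instance().getString(name);
}
```

In this sketch an option that was never set simply reads as \c false or as an
empty string; how the real parser treats unknown option names is not covered
here.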
       
    46 The function \c readConfiguration() in \c src/doxygen.cpp 
       
    47 reads the command line options and then calls the configuration parser.
       
    48 
       
    49 <h3>C Preprocessor</h3>
       
    50 
       
    51 The input files mentioned in the config file are (by default) fed to the
       
     52 C Preprocessor (after being piped through a user-defined filter if available).
       
    53 
       
    54 The way the preprocessor works differs somewhat from a standard C Preprocessor.
       
    55 By default it does not do macro expansion, although it can be configured to
       
     56 expand all macros. Typical usage is to only expand a user-specified set
       
    57 of macros. This is to allow macro names to appear in the type of 
       
    58 function parameters for instance.
       
    59 
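To make the idea of expanding only a user-specified set of macros concrete,
here is a minimal sketch. It is not the code from \c src/pre.l: the function
\c expandSelected() is hypothetical, it handles only object-like macros, and it
ignores string literals and comments.

```cpp
#include <cctype>
#include <map>
#include <string>

// Expands only the macros listed in `selected`; every other identifier is
// copied through unchanged.
std::string expandSelected(const std::string& text,
                           const std::map<std::string, std::string>& selected) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isalpha(c) || c == '_') {
            // Collect a whole identifier before deciding what to do with it.
            std::size_t start = i;
            while (i < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[i])) ||
                    text[i] == '_')) {
                ++i;
            }
            std::string ident = text.substr(start, i - start);
            auto it = selected.find(ident);
            out += (it != selected.end()) ? it->second : ident;
        } else {
            out += text[i++];
        }
    }
    return out;
}
```

With a set that maps a (made up) export macro to the empty string, a
declaration such as <code>EXPORT_API void f(OTHER y);</code> loses the export
macro while \c OTHER is left in place as part of the parameter type.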
       
     60 Another difference is that the preprocessor parses, but does not actually 
       
     61 include, code when it encounters a \#include (with the exception of \#include 
       
     62 found inside { ... } blocks). The reason behind this deviation from 
       
     63 the standard is to prevent feeding multiple definitions of the 
       
     64 same functions/classes to doxygen's parser. If all source files 
       
     65 included a common header file, for instance, the class and type 
       
     66 definitions (and their documentation) would be present in each 
       
     67 translation unit. 
       
    68 
       
    69 The preprocessor is written using \c flex and can be found in
       
     70 \c src/pre.l. Conditional blocks (\#if) require the evaluation of constant 
       
     71 expressions. For this a \c yacc-based parser is used, which can be found 
       
     72 in \c src/constexp.y and \c src/constexp.l.
       
    73 
       
    74 The preprocessor is invoked for each file using the \c preprocessFile() 
       
    75 function declared in \c src/pre.h, and will append the preprocessed result 
       
     76 to a character buffer. The format of the character buffer is:
       
    77 
       
    78 \verbatim
       
    79 0x06 file name 1 
       
    80 0x06 preprocessed contents of file 1
       
    81 ...
       
    82 0x06 file name n
       
    83 0x06 preprocessed contents of file n
       
    84 \endverbatim
       
    85 
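The layout above can be illustrated with a small sketch that writes and reads
such a buffer. The helper names \c appendFile() and \c splitBuffer() are
hypothetical, and the sketch assumes that the 0x06 byte is the only separator
and never occurs inside a file name or inside the preprocessed text.

```cpp
#include <string>
#include <utility>
#include <vector>

const char kMarker = '\x06';  // separator byte from the layout above

// Appends one record: marker, file name, marker, preprocessed contents.
void appendFile(std::string& buffer, const std::string& name,
                const std::string& contents) {
    buffer += kMarker;
    buffer += name;
    buffer += kMarker;
    buffer += contents;
}

// Splits the buffer back into (file name, contents) pairs.
std::vector<std::pair<std::string, std::string>> splitBuffer(
        const std::string& buffer) {
    std::vector<std::string> fields;
    std::size_t pos = 0;
    while (pos < buffer.size() && buffer[pos] == kMarker) {
        std::size_t next = buffer.find(kMarker, pos + 1);
        if (next == std::string::npos) next = buffer.size();
        fields.push_back(buffer.substr(pos + 1, next - pos - 1));
        pos = next;
    }
    // Fields alternate between file names and contents.
    std::vector<std::pair<std::string, std::string>> files;
    for (std::size_t i = 0; i + 1 < fields.size(); i += 2) {
        files.emplace_back(fields[i], fields[i + 1]);
    }
    return files;
}
```

Keeping everything in one buffer means the language parser can be handed a
single input and still tell where each file starts.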
       
    86 <h3>Language parser</h3>
       
    87 
       
    88 The preprocessed input buffer is fed to the language parser, which is 
       
    89 implemented as a big state machine using \c flex. It can be found 
       
    90 in the file \c src/scanner.l. There is one parser for all 
       
    91 languages (C/C++/Java/IDL). The state variables \c insideIDL 
       
     92 and \c insideJava are used in some places for language-specific choices. 
       
    93 
       
    94 The task of the parser is to convert the input buffer into a tree of entries 
       
    95 (basically an abstract syntax tree). An entry is defined in \c src/entry.h 
       
    96 and is a blob of loosely structured information. The most important field 
       
    97 is \c section which specifies the kind of information contained in the entry.
       
    98  
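A tree of entries in which each node carries a \c section field could look
roughly like the sketch below. \c EntrySketch and the \c Section values are
hypothetical and far smaller than the real \c Entry in \c src/entry.h; the
point is only to show how later steps can select entries by kind.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical section kinds, for illustration only.
enum class Section { File, Class, Function, Variable };

struct EntrySketch {
    Section section;  // the kind of information stored in this entry
    std::string name;
    std::vector<std::unique_ptr<EntrySketch>> children;

    EntrySketch(Section s, std::string n) : section(s), name(std::move(n)) {}

    // Adds a child entry and returns it, so nested entries can be attached.
    EntrySketch* add(Section s, const std::string& n) {
        children.push_back(std::unique_ptr<EntrySketch>(new EntrySketch(s, n)));
        return children.back().get();
    }
};

// Walks the tree depth first and collects the names of all entries whose
// section matches `wanted`.
void collect(const EntrySketch& e, Section wanted,
             std::vector<std::string>& out) {
    if (e.section == wanted) out.push_back(e.name);
    for (const auto& child : e.children) collect(*child, wanted, out);
}
```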
       
    99 Possible improvements for future versions:
       
   100  - Use one scanner/parser per language instead of one big scanner.
       
   101  - Move the first pass parsing of documentation blocks to a separate module.
       
   102  - Parse defines (these are currently gathered by the preprocessor, and
       
   103    ignored by the language parser).
       
   104 
       
   105 <h3>Data organizer</h3>
       
   106 
       
    107 This step consists of many smaller steps that build 
       
    108 dictionaries of the extracted classes, files, namespaces, 
       
    109 variables, functions, packages, pages, and groups. Besides building
       
    110 dictionaries, this step also computes the relations (such as inheritance
       
    111 relations) between the extracted entities.
       
   112 
       
   113 Each step has a function defined in \c src/doxygen.cpp, which operates
       
   114 on the tree of entries, built during language parsing. Look at the
       
   115 "Gathering information" part of \c parseInput() for details.
       
   116 
       
   117 The result of this step is a number of dictionaries, which can be
       
   118 found in the Doxygen "namespace" defined in \c src/doxygen.h. Most
       
    119 elements of these dictionaries are derived from the class \c Definition.
       
   120 The class \c MemberDef, for instance, holds all information for a member. 
       
   121 An instance of such a class can be part of a file ( class \c FileDef ), 
       
   122 a class ( class \c ClassDef ), a namespace ( class \c NamespaceDef ), 
       
   123 a group ( class \c GroupDef ), or a Java package ( class \c PackageDef ).
       
   124 
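The two halves of this step, building a dictionary and then computing
relations between its elements, can be sketched as follows. The types
\c DefinitionSketch and \c ClassDefSketch and both functions are hypothetical
simplifications, not the classes declared in doxygen's sources.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, much reduced counterparts of Definition and ClassDef.
struct DefinitionSketch {
    std::string name;
    virtual ~DefinitionSketch() = default;
};

struct ClassDefSketch : DefinitionSketch {
    std::vector<std::string> baseNames;       // names as written in the source
    std::vector<ClassDefSketch*> bases;       // filled in by computeInheritance
    std::vector<ClassDefSketch*> subClasses;  // filled in by computeInheritance
};

using ClassDict = std::map<std::string, ClassDefSketch>;

// First half: put every extracted class into the dictionary.
ClassDefSketch& addClass(ClassDict& dict, const std::string& name,
                         const std::vector<std::string>& baseNames) {
    ClassDefSketch& cd = dict[name];
    cd.name = name;
    cd.baseNames = baseNames;
    return cd;
}

// Second half: resolve base class names against the dictionary. Bases that
// are not in the dictionary are skipped in this sketch.
void computeInheritance(ClassDict& dict) {
    for (auto& item : dict) {
        for (const std::string& baseName : item.second.baseNames) {
            auto it = dict.find(baseName);
            if (it == dict.end()) continue;
            item.second.bases.push_back(&it->second);
            it->second.subClasses.push_back(&item.second);
        }
    }
}
```

Resolving names only after the whole dictionary is built means the order in
which classes were encountered in the input does not matter.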
       
   125 <h3>Tag file parser</h3>
       
   126 
       
   127 If tag files are specified in the configuration file, these are parsed
       
    128 by a SAX-based XML parser, which can be found in \c src/tagreader.cpp. 
       
   129 The result of parsing a tag file is the insertion of \c Entry objects in the
       
   130 entry tree. The field \c Entry::tagInfo is used to mark the entry as
       
   131 external, and holds information about the tag file.
       
   132 
       
   133 <h3>Documentation parser</h3>
       
   134 
       
   135 Special comment blocks are stored as strings in the entities that they
       
   136 document. There is a string for the brief description and a string
       
   137 for the detailed description. The documentation parser reads these
       
    138 strings and executes the commands it finds in them (this is the second pass
       
   139 in parsing the documentation). It writes the result directly to the output 
       
   140 generators.
       
   141 
       
    142 The parser is written in C++ and can be found in \c src/docparser.cpp. The
       
    143 tokens that are consumed by the parser come from \c src/doctokenizer.l.
       
   144 Code fragments found in the comment blocks are passed on to the source parser.
       
   145 
       
   146 The main entry point for the documentation parser is \c validatingParseDoc()
       
   147 declared in \c src/docparser.h.  For simple texts with special 
       
   148 commands \c validatingParseText() is used.
       
   149 
       
   150 <h3>Source parser</h3>
       
   151 
       
   152 If source browsing is enabled or if code fragments are encountered in the
       
   153 documentation, the source parser is invoked.
       
   154 
       
    155 The code parser tries to cross-reference the source code it parses with
       
   156 documented entities. It also does syntax highlighting of the sources. The
       
   157 output is directly written to the output generators.
       
   158 
       
   159 The main entry point for the code parser is \c parseCode() 
       
   160 declared in \c src/code.h.
       
   161 
       
   162 <h3>Output generators</h3>
       
   163 
       
   164 After data is gathered and cross-referenced, doxygen generates 
       
   165 output in various formats. For this it uses the methods provided by 
       
   166 the abstract class \c OutputGenerator. In order to generate output
       
   167 for multiple formats at once, the methods of \c OutputList are called
       
   168 instead. This class maintains a list of concrete output generators,
       
   169 where each method called is delegated to all generators in the list.
       
   170 
       
   171 To allow small deviations in what is written to the output for each
       
   172 concrete output generator, it is possible to temporarily disable certain
       
   173 generators. The OutputList class contains various \c disable() and \c enable()
       
   174 methods for this. The methods \c OutputList::pushGeneratorState() and 
       
   175 \c OutputList::popGeneratorState() are used to temporarily save the
       
   176 set of enabled/disabled output generators on a stack. 
       
   177 
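The delegation and the save/restore of generator state can be sketched as
follows. All class names ending in \c Sketch, the \c RecordingGenerator class
and the single \c writeText() method are hypothetical; the real
\c OutputGenerator interface has many more methods and its \c enable() and
\c disable() variants may take different arguments.

```cpp
#include <string>
#include <vector>

// Hypothetical, minimal version of the abstract generator interface.
class GeneratorSketch {
public:
    virtual ~GeneratorSketch() = default;
    virtual void writeText(const std::string& text) = 0;
    bool enabled = true;
};

// Records what it is asked to write; stands in for a concrete backend.
class RecordingGenerator : public GeneratorSketch {
public:
    void writeText(const std::string& text) override { output += text; }
    std::string output;
};

class OutputListSketch {
public:
    void add(GeneratorSketch* g) { gens_.push_back(g); }

    // Each call is forwarded to every generator that is currently enabled.
    void writeText(const std::string& text) {
        for (GeneratorSketch* g : gens_) {
            if (g->enabled) g->writeText(text);
        }
    }
    void disable(GeneratorSketch* g) { g->enabled = false; }
    void enable(GeneratorSketch* g) { g->enabled = true; }

    // Saves the enabled/disabled flags of all generators on a stack.
    void pushGeneratorState() {
        std::vector<bool> state;
        for (GeneratorSketch* g : gens_) state.push_back(g->enabled);
        saved_.push_back(state);
    }
    // Restores the most recently saved flags, if any.
    void popGeneratorState() {
        if (saved_.empty()) return;
        const std::vector<bool>& state = saved_.back();
        for (std::size_t i = 0; i < gens_.size() && i < state.size(); ++i) {
            gens_[i]->enabled = state[i];
        }
        saved_.pop_back();
    }

private:
    std::vector<GeneratorSketch*> gens_;  // not owned by the list
    std::vector<std::vector<bool>> saved_;
};
```

A typical use is to push the state, disable one generator, write a fragment
that only the remaining generators should see, and pop the state again, so the
caller does not need to remember which generators were enabled before.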
       
   178 The XML is generated directly from the gathered data structures. In the
       
   179 future XML will be used as an intermediate language (IL). The output
       
   180 generators will then use this IL as a starting point to generate the
       
   181 specific output formats. The advantage of having an IL is that various
       
    182 independently developed tools, written in various languages, 
       
   183 could extract information from the XML output. Possible tools could be:
       
   184 - an interactive source browser
       
   185 - a class diagram generator
       
    186 - a tool for computing code metrics.
       
   187 
       
   188 <h3>Debugging</h3>
       
   189 
       
   190 Since doxygen uses a lot of \c flex code it is important to understand
       
   191 how \c flex works (for this one should read the man page) 
       
    192 and to understand what \c flex is doing when it is parsing some input. 
       
    193 Fortunately, when \c flex is used with the \c -d option it outputs which rules
       
   194 matched. This makes it quite easy to follow what is going on for a 
       
   195 particular input fragment. 
       
   196 
       
   197 To make it easier to toggle debug information for a given flex file I
       
    198 wrote the following Perl script, which automatically adds or removes \c -d
       
   199 from the correct line in the Makefile:
       
   200 
       
   201 \verbatim
       
   202 #!/usr/local/bin/perl 
       
   203 
       
   204 $file = shift @ARGV;
       
   205 print "Toggle debugging mode for $file\n";
       
   206 
       
   207 # add or remove the -d flex flag in the makefile
       
   208 unless (rename "Makefile.libdoxygen","Makefile.libdoxygen.old") {
       
   209   print STDERR "Error: cannot rename Makefile.libdoxygen!\n";
       
   210   exit 1;
       
   211 }
       
   212 if (open(F,"<Makefile.libdoxygen.old")) {
       
   213   unless (open(G,">Makefile.libdoxygen")) {
       
   214     print STDERR "Error: opening file Makefile.libdoxygen for writing\n";
       
   215     exit 1; 
       
   216   }
       
   217   print "Processing Makefile.libdoxygen...\n";
       
   218   while (<F>) {
       
    219     if ( s/\(LEX\) -P([a-zA-Z]+)YY -t \Q$file\E/(LEX) -d -P${1}YY -t $file/g ) {
       
    220       print "Enabling debug info for $file\n";
       
    221     }
       
    222     elsif ( s/\(LEX\) -d -P([a-zA-Z]+)YY -t \Q$file\E/(LEX) -P${1}YY -t $file/g ) {
       
   223       print "Disabling debug info for $file\n";
       
   224     }
       
   225     print G "$_";
       
   226   }
       
   227   close F;
       
   228   unlink "Makefile.libdoxygen.old";
       
   229 }
       
   230 else {
       
    231   print STDERR "Warning: cannot open Makefile.libdoxygen.old for reading!\n"; 
       
   232 }
       
   233 
       
   234 # touch the file
       
   235 $now = time;
       
    236 utime $now, $now, $file;
       
   237 \endverbatim
       
   238 
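For readers more at home in C++ than in Perl, the substitution at the heart of
the script can be sketched with \c std::regex. The functions \c escapeRegex()
and \c toggleDebug() are hypothetical helpers that work on a single Makefile
line; reading and rewriting the Makefile itself is left out, and file names
containing a \c $ character are not handled.

```cpp
#include <regex>
#include <string>

// Escapes regex metacharacters so that a file name such as "scanner.l"
// is matched literally.
std::string escapeRegex(const std::string& s) {
    static const std::regex special(R"([.^$|()\[\]{}*+?\\])");
    return std::regex_replace(s, special, R"(\$&)");
}

// Adds "-d" to the flex rule for `file` if it is missing and removes it if it
// is present. Lines that do not contain the rule are returned unchanged.
std::string toggleDebug(const std::string& line, const std::string& file) {
    const std::string f = escapeRegex(file);
    const std::regex off(R"(\(LEX\) -P([a-zA-Z]+)YY -t )" + f);
    const std::regex on(R"(\(LEX\) -d -P([a-zA-Z]+)YY -t )" + f);
    if (std::regex_search(line, off)) {
        return std::regex_replace(line, off, "(LEX) -d -P$1YY -t " + file);
    }
    if (std::regex_search(line, on)) {
        return std::regex_replace(line, on, "(LEX) -P$1YY -t " + file);
    }
    return line;
}
```

Calling \c toggleDebug() twice on the same line gives back the original line,
which mirrors the add-or-remove behaviour of the script.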
       
   239 */
       
   240 
       
   241