src/3rdparty/ptmalloc/README
changeset 0 1918ee327afb
equal deleted inserted replaced
-1:000000000000 0:1918ee327afb
       
     1 ptmalloc3 - a multi-thread malloc implementation
       
     2 ================================================
       
     3 
       
     4 Wolfram Gloger (wg@malloc.de)
       
     5 
       
     6 Jan 2006
       
     7 
       
     8 
       
     9 Thanks
       
    10 ======
       
    11 
       
    12 This release was partly funded by Pixar Animation Studios.  I would
       
    13 like to thank David Baraff of Pixar for his support and Doug Lea
       
    14 (dl@cs.oswego.edu) for the great original malloc implementation.
       
    15 
       
    16 
       
    17 Introduction
       
    18 ============
       
    19 
       
    20 This package is a modified version of Doug Lea's malloc-2.8.3
       
    21 implementation (available seperately from ftp://g.oswego.edu/pub/misc)
       
    22 that I adapted for multiple threads, while trying to avoid lock
       
    23 contention as much as possible.
       
    24 
       
    25 As part of the GNU C library, the source files may be available under
       
    26 the GNU Library General Public License (see the comments in the
       
    27 files).  But as part of this stand-alone package, the code is also
       
    28 available under the (probably less restrictive) conditions described
       
    29 in the file 'COPYRIGHT'.  In any case, there is no warranty whatsoever
       
    30 for this package.
       
    31 
       
    32 The current distribution should be available from:
       
    33 
       
    34 http://www.malloc.de/malloc/ptmalloc3.tar.gz
       
    35 
       
    36 
       
    37 Compilation
       
    38 ===========
       
    39 
       
    40 It should be possible to build ptmalloc3 on any UN*X-like system that
       
    41 implements the sbrk(), mmap(), munmap() and mprotect() calls.  Since
       
    42 there are now several source files, a library (libptmalloc3.a) is
       
    43 generated.  See the Makefile for examples of the compile-time options.
       
    44 
       
    45 Note that support for non-ANSI compilers is no longer there.
       
    46 
       
    47 Several example targets are provided in the Makefile:
       
    48 
       
    49  o Posix threads (pthreads), compile with "make posix"
       
    50 
       
    51  o Posix threads with explicit initialization, compile with
       
    52    "make posix-explicit" (known to be required on HPUX)
       
    53 
       
    54  o Posix threads without "tsd data hack" (see below), compile with
       
    55    "make posix-with-tsd"
       
    56 
       
    57  o Solaris threads, compile with "make solaris"
       
    58 
       
    59  o SGI sproc() threads, compile with "make sproc"
       
    60 
       
    61  o no threads, compile with "make nothreads" (currently out of order?)
       
    62 
       
    63 For Linux:
       
    64 
       
    65  o make "linux-pthread" (almost the same as "make posix") or
       
    66    make "linux-shared"
       
    67 
       
    68 Note that some compilers need special flags for multi-threaded code,
       
    69 e.g. with Solaris cc with Posix threads, one should use:
       
    70 
       
    71 % make posix SYS_FLAGS='-mt'
       
    72 
       
    73 Some additional targets, ending in `-libc', are also provided in the
       
    74 Makefile, to compare performance of the test programs to the case when
       
    75 linking with the standard malloc implementation in libc.
       
    76 
       
    77 A potential problem remains: If any of the system-specific functions
       
    78 for getting/setting thread-specific data or for locking a mutex call
       
    79 one of the malloc-related functions internally, the implementation
       
    80 cannot work at all due to infinite recursion.  One example seems to be
       
    81 Solaris 2.4.  I would like to hear if this problem occurs on other
       
    82 systems, and whether similar workarounds could be applied.
       
    83 
       
    84 For Posix threads, too, an optional hack like that has been integrated
       
    85 (activated when defining USE_TSD_DATA_HACK) which depends on
       
    86 `pthread_t' being convertible to an integral type (which is of course
       
    87 not generally guaranteed).  USE_TSD_DATA_HACK is now the default
       
    88 because I haven't yet found a non-glibc pthreads system where this
       
    89 hack is _not_ needed.
       
    90 
       
    91 *NEW* and _important_: In (currently) one place in the ptmalloc3
       
    92 source, a write memory barrier is needed, named
       
    93 atomic_write_barrier().  This macro needs to be defined at the end of
       
    94 malloc-machine.h.  For gcc, a fallback in the form of a full memory
       
    95 barrier is already defined, but you may need to add another definition
       
    96 if you don't use gcc.
       
    97 
       
    98 Usage
       
    99 =====
       
   100 
       
   101 Just link libptmalloc3 into your application.
       
   102 
       
   103 Some wicked systems (e.g. HPUX apparently) won't let malloc call _any_
       
   104 thread-related functions before main().  On these systems,
       
   105 USE_STARTER=2 must be defined during compilation (see "make
       
   106 posix-explicit" above) and the global initialization function
       
   107 ptmalloc_init() must be called explicitly, preferably at the start of
       
   108 main().
       
   109 
       
   110 Otherwise, when using ptmalloc3, no special precautions are necessary.
       
   111 
       
   112 Link order is important
       
   113 =======================
       
   114 
       
   115 On some systems, when overriding malloc and linking against shared
       
   116 libraries, the link order becomes very important.  E.g., when linking
       
   117 C++ programs on Solaris with Solaris threads [this is probably now
       
   118 obsolete], don't rely on libC being included by default, but instead
       
   119 put `-lthread' behind `-lC' on the command line:
       
   120 
       
   121   CC ... libptmalloc3.a -lC -lthread
       
   122 
       
   123 This is because there are global constructors in libC that need
       
   124 malloc/ptmalloc, which in turn needs to have the thread library to be
       
   125 already initialized.
       
   126 
       
   127 Debugging hooks
       
   128 ===============
       
   129 
       
   130 All calls to malloc(), realloc(), free() and memalign() are routed
       
   131 through the global function pointers __malloc_hook, __realloc_hook,
       
   132 __free_hook and __memalign_hook if they are not NULL (see the malloc.h
       
   133 header file for declarations of these pointers).  Therefore the malloc
       
   134 implementation can be changed at runtime, if care is taken not to call
       
   135 free() or realloc() on pointers obtained with a different
       
   136 implementation than the one currently in effect.  (The easiest way to
       
   137 guarantee this is to set up the hooks before any malloc call, e.g.
       
   138 with a function pointed to by the global variable
       
   139 __malloc_initialize_hook).
       
   140 
       
   141 You can now also tune other malloc parameters (normally adjused via
       
   142 mallopt() calls from the application) with environment variables:
       
   143 
       
   144     MALLOC_TRIM_THRESHOLD_    for deciding to shrink the heap (in bytes)
       
   145 
       
   146     MALLOC_GRANULARITY_       The unit for allocating and deallocating
       
   147     MALLOC_TOP_PAD_           memory from the system.  The default
       
   148                               is 64k and this parameter _must_ be a
       
   149                               power of 2.
       
   150 
       
   151     MALLOC_MMAP_THRESHOLD_    min. size for chunks allocated via
       
   152                               mmap() (in bytes)
       
   153 
       
   154 Tests
       
   155 =====
       
   156 
       
   157 Two testing applications, t-test1 and t-test2, are included in this
       
   158 source distribution.  Both perform pseudo-random sequences of
       
   159 allocations/frees, and can be given numeric arguments (all arguments
       
   160 are optional):
       
   161 
       
   162 % t-test[12] <n-total> <n-parallel> <n-allocs> <size-max> <bins>
       
   163 
       
   164     n-total = total number of threads executed (default 10)
       
   165     n-parallel = number of threads running in parallel (2)
       
   166     n-allocs = number of malloc()'s / free()'s per thread (10000)
       
   167     size-max = max. size requested with malloc() in bytes (10000)
       
   168     bins = number of bins to maintain
       
   169 
       
   170 The first test `t-test1' maintains a completely seperate pool of
       
   171 allocated bins for each thread, and should therefore show full
       
   172 parallelism.  On the other hand, `t-test2' creates only a single pool
       
   173 of bins, and each thread randomly allocates/frees any bin.  Some lock
       
   174 contention is to be expected in this case, as the threads frequently
       
   175 cross each others arena.
       
   176 
       
   177 Performance results from t-test1 should be quite repeatable, while the
       
   178 behaviour of t-test2 depends on scheduling variations.
       
   179 
       
   180 Conclusion
       
   181 ==========
       
   182 
       
   183 I'm always interested in performance data and feedback, just send mail
       
   184 to ptmalloc@malloc.de.
       
   185 
       
   186 Good luck!