ptmalloc3 - a multi-thread malloc implementation+ −
================================================+ −
+ −
Wolfram Gloger (wg@malloc.de)+ −
+ −
Jan 2006+ −
+ −
+ −
Thanks+ −
======+ −
+ −
This release was partly funded by Pixar Animation Studios. I would+ −
like to thank David Baraff of Pixar for his support and Doug Lea+ −
(dl@cs.oswego.edu) for the great original malloc implementation.+ −
+ −
+ −
Introduction+ −
============+ −
+ −
This package is a modified version of Doug Lea's malloc-2.8.3+ −
implementation (available seperately from ftp://g.oswego.edu/pub/misc)+ −
that I adapted for multiple threads, while trying to avoid lock+ −
contention as much as possible.+ −
+ −
As part of the GNU C library, the source files may be available under+ −
the GNU Library General Public License (see the comments in the+ −
files). But as part of this stand-alone package, the code is also+ −
available under the (probably less restrictive) conditions described+ −
in the file 'COPYRIGHT'. In any case, there is no warranty whatsoever+ −
for this package.+ −
+ −
The current distribution should be available from:+ −
+ −
http://www.malloc.de/malloc/ptmalloc3.tar.gz+ −
+ −
+ −
Compilation+ −
===========+ −
+ −
It should be possible to build ptmalloc3 on any UN*X-like system that+ −
implements the sbrk(), mmap(), munmap() and mprotect() calls. Since+ −
there are now several source files, a library (libptmalloc3.a) is+ −
generated. See the Makefile for examples of the compile-time options.+ −
+ −
Note that support for non-ANSI compilers is no longer there.+ −
+ −
Several example targets are provided in the Makefile:+ −
+ −
o Posix threads (pthreads), compile with "make posix"+ −
+ −
o Posix threads with explicit initialization, compile with+ −
"make posix-explicit" (known to be required on HPUX)+ −
+ −
o Posix threads without "tsd data hack" (see below), compile with+ −
"make posix-with-tsd"+ −
+ −
o Solaris threads, compile with "make solaris"+ −
+ −
o SGI sproc() threads, compile with "make sproc"+ −
+ −
o no threads, compile with "make nothreads" (currently out of order?)+ −
+ −
For Linux:+ −
+ −
o make "linux-pthread" (almost the same as "make posix") or+ −
make "linux-shared"+ −
+ −
Note that some compilers need special flags for multi-threaded code,+ −
e.g. with Solaris cc with Posix threads, one should use:+ −
+ −
% make posix SYS_FLAGS='-mt'+ −
+ −
Some additional targets, ending in `-libc', are also provided in the+ −
Makefile, to compare performance of the test programs to the case when+ −
linking with the standard malloc implementation in libc.+ −
+ −
A potential problem remains: If any of the system-specific functions+ −
for getting/setting thread-specific data or for locking a mutex call+ −
one of the malloc-related functions internally, the implementation+ −
cannot work at all due to infinite recursion. One example seems to be+ −
Solaris 2.4. I would like to hear if this problem occurs on other+ −
systems, and whether similar workarounds could be applied.+ −
+ −
For Posix threads, too, an optional hack like that has been integrated+ −
(activated when defining USE_TSD_DATA_HACK) which depends on+ −
`pthread_t' being convertible to an integral type (which is of course+ −
not generally guaranteed). USE_TSD_DATA_HACK is now the default+ −
because I haven't yet found a non-glibc pthreads system where this+ −
hack is _not_ needed.+ −
+ −
*NEW* and _important_: In (currently) one place in the ptmalloc3+ −
source, a write memory barrier is needed, named+ −
atomic_write_barrier(). This macro needs to be defined at the end of+ −
malloc-machine.h. For gcc, a fallback in the form of a full memory+ −
barrier is already defined, but you may need to add another definition+ −
if you don't use gcc.+ −
+ −
Usage+ −
=====+ −
+ −
Just link libptmalloc3 into your application.+ −
+ −
Some wicked systems (e.g. HPUX apparently) won't let malloc call _any_+ −
thread-related functions before main(). On these systems,+ −
USE_STARTER=2 must be defined during compilation (see "make+ −
posix-explicit" above) and the global initialization function+ −
ptmalloc_init() must be called explicitly, preferably at the start of+ −
main().+ −
+ −
Otherwise, when using ptmalloc3, no special precautions are necessary.+ −
+ −
Link order is important+ −
=======================+ −
+ −
On some systems, when overriding malloc and linking against shared+ −
libraries, the link order becomes very important. E.g., when linking+ −
C++ programs on Solaris with Solaris threads [this is probably now+ −
obsolete], don't rely on libC being included by default, but instead+ −
put `-lthread' behind `-lC' on the command line:+ −
+ −
CC ... libptmalloc3.a -lC -lthread+ −
+ −
This is because there are global constructors in libC that need+ −
malloc/ptmalloc, which in turn needs to have the thread library to be+ −
already initialized.+ −
+ −
Debugging hooks+ −
===============+ −
+ −
All calls to malloc(), realloc(), free() and memalign() are routed+ −
through the global function pointers __malloc_hook, __realloc_hook,+ −
__free_hook and __memalign_hook if they are not NULL (see the malloc.h+ −
header file for declarations of these pointers). Therefore the malloc+ −
implementation can be changed at runtime, if care is taken not to call+ −
free() or realloc() on pointers obtained with a different+ −
implementation than the one currently in effect. (The easiest way to+ −
guarantee this is to set up the hooks before any malloc call, e.g.+ −
with a function pointed to by the global variable+ −
__malloc_initialize_hook).+ −
+ −
You can now also tune other malloc parameters (normally adjused via+ −
mallopt() calls from the application) with environment variables:+ −
+ −
MALLOC_TRIM_THRESHOLD_ for deciding to shrink the heap (in bytes)+ −
+ −
MALLOC_GRANULARITY_ The unit for allocating and deallocating+ −
MALLOC_TOP_PAD_ memory from the system. The default+ −
is 64k and this parameter _must_ be a+ −
power of 2.+ −
+ −
MALLOC_MMAP_THRESHOLD_ min. size for chunks allocated via+ −
mmap() (in bytes)+ −
+ −
Tests+ −
=====+ −
+ −
Two testing applications, t-test1 and t-test2, are included in this+ −
source distribution. Both perform pseudo-random sequences of+ −
allocations/frees, and can be given numeric arguments (all arguments+ −
are optional):+ −
+ −
% t-test[12] <n-total> <n-parallel> <n-allocs> <size-max> <bins>+ −
+ −
n-total = total number of threads executed (default 10)+ −
n-parallel = number of threads running in parallel (2)+ −
n-allocs = number of malloc()'s / free()'s per thread (10000)+ −
size-max = max. size requested with malloc() in bytes (10000)+ −
bins = number of bins to maintain+ −
+ −
The first test `t-test1' maintains a completely seperate pool of+ −
allocated bins for each thread, and should therefore show full+ −
parallelism. On the other hand, `t-test2' creates only a single pool+ −
of bins, and each thread randomly allocates/frees any bin. Some lock+ −
contention is to be expected in this case, as the threads frequently+ −
cross each others arena.+ −
+ −
Performance results from t-test1 should be quite repeatable, while the+ −
behaviour of t-test2 depends on scheduling variations.+ −
+ −
Conclusion+ −
==========+ −
+ −
I'm always interested in performance data and feedback, just send mail+ −
to ptmalloc@malloc.de.+ −
+ −
Good luck!+ −