ptmalloc3 - a multi-thread malloc implementation
================================================

Wolfram Gloger (wg@malloc.de)

Jan 2006


Thanks
======

This release was partly funded by Pixar Animation Studios. I would
like to thank David Baraff of Pixar for his support and Doug Lea
(dl@cs.oswego.edu) for the great original malloc implementation.


Introduction
============

This package is a modified version of Doug Lea's malloc-2.8.3
implementation (available separately from ftp://g.oswego.edu/pub/misc)
that I adapted for multiple threads, while trying to avoid lock
contention as much as possible.

As part of the GNU C library, the source files may be available under
the GNU Library General Public License (see the comments in the
files). But as part of this stand-alone package, the code is also
available under the (probably less restrictive) conditions described
in the file 'COPYRIGHT'. In any case, there is no warranty whatsoever
for this package.

The current distribution should be available from:

    http://www.malloc.de/malloc/ptmalloc3.tar.gz


Compilation
===========

It should be possible to build ptmalloc3 on any UN*X-like system that
implements the sbrk(), mmap(), munmap() and mprotect() calls. Since
there are now several source files, a library (libptmalloc3.a) is
generated. See the Makefile for examples of the compile-time options.

Note that non-ANSI compilers are no longer supported.

Several example targets are provided in the Makefile:

 o Posix threads (pthreads), compile with "make posix"

 o Posix threads with explicit initialization, compile with
   "make posix-explicit" (known to be required on HPUX)

 o Posix threads without "tsd data hack" (see below), compile with
   "make posix-with-tsd"

 o Solaris threads, compile with "make solaris"

 o SGI sproc() threads, compile with "make sproc"

 o no threads, compile with "make nothreads" (may currently be broken)

For Linux:

 o "make linux-pthread" (almost the same as "make posix") or
   "make linux-shared"

Note that some compilers need special flags for multi-threaded code.
For example, with Solaris cc and Posix threads, one should use:

    % make posix SYS_FLAGS='-mt'

Some additional targets, ending in `-libc', are also provided in the
Makefile. They build the test programs against the standard malloc
implementation in libc, so that its performance can be compared.

A potential problem remains: If any of the system-specific functions
|
|
78 |
for getting/setting thread-specific data or for locking a mutex call
|
|
79 |
one of the malloc-related functions internally, the implementation
|
|
80 |
cannot work at all due to infinite recursion. One example seems to be
|
|
81 |
Solaris 2.4. I would like to hear if this problem occurs on other
|
|
82 |
systems, and whether similar workarounds could be applied.
|
|
83 |
|
|
84 |
For Posix threads, too, an optional hack like that has been integrated
|
|
85 |
(activated when defining USE_TSD_DATA_HACK) which depends on
|
|
86 |
`pthread_t' being convertible to an integral type (which is of course
|
|
87 |
not generally guaranteed). USE_TSD_DATA_HACK is now the default
|
|
88 |
because I haven't yet found a non-glibc pthreads system where this
|
|
89 |
hack is _not_ needed.
|
|
90 |
|
|
91 |
*NEW* and _important_: In (currently) one place in the ptmalloc3
|
|
92 |
source, a write memory barrier is needed, named
|
|
93 |
atomic_write_barrier(). This macro needs to be defined at the end of
|
|
94 |
malloc-machine.h. For gcc, a fallback in the form of a full memory
|
|
95 |
barrier is already defined, but you may need to add another definition
|
|
96 |
if you don't use gcc.
|
|
97 |
|
|
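As a rough illustration of the kind of definition meant here, the
sketch below shows one way such a macro could be provided. It is an
assumption, not the text of malloc-machine.h: the gcc branch uses the
__sync_synchronize() builtin (a full barrier), the other branch uses
the C11 atomic_thread_fence(), and publish()/published_value() are
hypothetical helpers that only show where a write barrier belongs.

```c
/* Sketch only: a real definition would go at the end of malloc-machine.h. */
#ifndef atomic_write_barrier
# if defined(__GNUC__)
   /* gcc builtin: a full memory barrier, stronger than strictly needed. */
#  define atomic_write_barrier() __sync_synchronize()
# elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L \
       && !defined(__STDC_NO_ATOMICS__)
#  include <stdatomic.h>
   /* C11: a release fence keeps earlier stores ahead of later stores. */
#  define atomic_write_barrier() atomic_thread_fence(memory_order_release)
# else
#  error "add a definition of atomic_write_barrier() for this compiler"
# endif
#endif

/* Hypothetical helpers, not part of ptmalloc3. */
static int payload;
static volatile int ready;

static void publish(int value)
{
    payload = value;          /* 1. write the data */
    atomic_write_barrier();   /* 2. order the data store before the flag */
    ready = 1;                /* 3. then raise the flag */
}

static int published_value(void)
{
    /* A reader on another thread would need a matching read barrier. */
    return ready ? payload : -1;
}
```

Without the barrier, a compiler or CPU may reorder the two stores in
publish(), so another thread could see the flag before the data.
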
Usage
=====

Just link libptmalloc3 into your application.

Some wicked systems (e.g. HPUX apparently) won't let malloc call _any_
thread-related functions before main(). On these systems,
USE_STARTER=2 must be defined during compilation (see "make
posix-explicit" above) and the global initialization function
ptmalloc_init() must be called explicitly, preferably at the start of
main().

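A minimal sketch of such an explicit call follows. Two things in it
are assumptions: the prototype `void ptmalloc_init(void)' (check the
ptmalloc3 sources for the exact declaration), and the macro
PTMALLOC3_EXPLICIT_INIT, a hypothetical switch that the application's
own build would set. The helper init_allocator_once() is also
hypothetical.

```c
/* Assumed prototype; only declared when the application is built for a
   library compiled with USE_STARTER=2. */
#ifdef PTMALLOC3_EXPLICIT_INIT
extern void ptmalloc_init(void);
#endif

/* Returns 1 if this call performed the initialization, 0 if it had
   already been done.  Not thread-safe: call it before creating threads. */
static int init_allocator_once(void)
{
    static int done;

    if (done)
        return 0;
#ifdef PTMALLOC3_EXPLICIT_INIT
    ptmalloc_init();
#endif
    done = 1;
    return 1;
}
```

The intended use is to make init_allocator_once() the first statement
of main(), before any allocation and before any thread is started.
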
Otherwise, when using ptmalloc3, no special precautions are necessary.

Link order is important
|
|
113 |
=======================
|
|
114 |
|
|
115 |
On some systems, when overriding malloc and linking against shared
|
|
116 |
libraries, the link order becomes very important. E.g., when linking
|
|
117 |
C++ programs on Solaris with Solaris threads [this is probably now
|
|
118 |
obsolete], don't rely on libC being included by default, but instead
|
|
119 |
put `-lthread' behind `-lC' on the command line:
|
|
120 |
|
|
121 |
CC ... libptmalloc3.a -lC -lthread
|
|
122 |
|
|
123 |
This is because there are global constructors in libC that need
|
|
124 |
malloc/ptmalloc, which in turn needs to have the thread library to be
|
|
125 |
already initialized.
|
|
126 |
|
|
127 |
Debugging hooks
|
|
128 |
===============
|
|
129 |
|
|
130 |
All calls to malloc(), realloc(), free() and memalign() are routed
|
|
131 |
through the global function pointers __malloc_hook, __realloc_hook,
|
|
132 |
__free_hook and __memalign_hook if they are not NULL (see the malloc.h
|
|
133 |
header file for declarations of these pointers). Therefore the malloc
|
|
134 |
implementation can be changed at runtime, if care is taken not to call
|
|
135 |
free() or realloc() on pointers obtained with a different
|
|
136 |
implementation than the one currently in effect. (The easiest way to
|
|
137 |
guarantee this is to set up the hooks before any malloc call, e.g.
|
|
138 |
with a function pointed to by the global variable
|
|
139 |
__malloc_initialize_hook).
|
|
140 |
|
|
141 |
You can now also tune other malloc parameters (normally adjused via
|
|
142 |
mallopt() calls from the application) with environment variables:
|
|
143 |
|
|
144 |
MALLOC_TRIM_THRESHOLD_ for deciding to shrink the heap (in bytes)
|
|
145 |
|
|
146 |
MALLOC_GRANULARITY_ The unit for allocating and deallocating
|
|
147 |
MALLOC_TOP_PAD_ memory from the system. The default
|
|
148 |
is 64k and this parameter _must_ be a
|
|
149 |
power of 2.
|
|
150 |
|
|
151 |
MALLOC_MMAP_THRESHOLD_ min. size for chunks allocated via
|
|
152 |
mmap() (in bytes)
|
|
153 |
|
|
154 |
Tests
|
|
155 |
=====
|
|
156 |
|
|
157 |
Two testing applications, t-test1 and t-test2, are included in this
|
|
158 |
source distribution. Both perform pseudo-random sequences of
|
|
159 |
allocations/frees, and can be given numeric arguments (all arguments
|
|
160 |
are optional):
|
|
161 |
|
|
162 |
% t-test[12] <n-total> <n-parallel> <n-allocs> <size-max> <bins>
|
|
163 |
|
|
164 |
n-total = total number of threads executed (default 10)
|
|
165 |
n-parallel = number of threads running in parallel (2)
|
|
166 |
n-allocs = number of malloc()'s / free()'s per thread (10000)
|
|
167 |
size-max = max. size requested with malloc() in bytes (10000)
|
|
168 |
bins = number of bins to maintain
|
|
169 |
|
|
170 |
The first test `t-test1' maintains a completely seperate pool of
|
|
171 |
allocated bins for each thread, and should therefore show full
|
|
172 |
parallelism. On the other hand, `t-test2' creates only a single pool
|
|
173 |
of bins, and each thread randomly allocates/frees any bin. Some lock
|
|
174 |
contention is to be expected in this case, as the threads frequently
|
|
175 |
cross each others arena.
|
|
176 |
|
|
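To give an idea of what "a pool of bins" means, here is a
single-threaded sketch of one such pool. It is not taken from the
t-test sources: struct bin, next_rand() and run_pool() are
hypothetical names, and the random number generator is a simple
stand-in.

```c
#include <stdlib.h>
#include <string.h>

struct bin {
    unsigned char *ptr;   /* current block, or NULL if the bin is empty */
    size_t size;
};

/* Small linear congruential generator (values 0..32767) so that a run
   with a given seed is repeatable. */
static unsigned next_rand(unsigned *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

/* Performs n_allocs free/malloc cycles on randomly chosen bins.
   Returns 0 on success, -1 on bad arguments or allocation failure. */
static int run_pool(size_t n_bins, size_t n_allocs, size_t size_max,
                    unsigned seed)
{
    struct bin *bins;
    size_t i;
    int status = 0;

    if (n_bins == 0 || size_max == 0)
        return -1;
    bins = calloc(n_bins, sizeof *bins);
    if (bins == NULL)
        return -1;

    for (i = 0; i < n_allocs; i++) {
        struct bin *b = &bins[next_rand(&seed) % n_bins];

        free(b->ptr);                        /* free(NULL) is a no-op */
        b->size = 1 + next_rand(&seed) % size_max;
        b->ptr = malloc(b->size);
        if (b->ptr == NULL) {
            b->size = 0;
            status = -1;
            break;
        }
        memset(b->ptr, 0xA5, b->size);       /* touch the memory */
    }

    for (i = 0; i < n_bins; i++)
        free(bins[i].ptr);
    free(bins);
    return status;
}
```

In a t-test1-style run, each thread would call something like
run_pool() on its own pool. Sharing one pool between threads, as
described for t-test2, would additionally need synchronization, which
is not shown here.
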
Performance results from t-test1 should be quite repeatable, while the
behaviour of t-test2 depends on scheduling variations.

Conclusion
==========

I'm always interested in performance data and feedback; just send mail
to ptmalloc@malloc.de.

Good luck!