ptmalloc3 - a multi-thread malloc implementation
================================================

Wolfram Gloger (wg@malloc.de)

Jan 2006


Thanks
======

This release was partly funded by Pixar Animation Studios. I would
like to thank David Baraff of Pixar for his support and Doug Lea
(dl@cs.oswego.edu) for the great original malloc implementation.


Introduction
============

This package is a modified version of Doug Lea's malloc-2.8.3
implementation (available separately from ftp://g.oswego.edu/pub/misc)
that I adapted for multiple threads, while trying to avoid lock
contention as much as possible.

As part of the GNU C library, the source files may be available under
the GNU Library General Public License (see the comments in the
files). But as part of this stand-alone package, the code is also
available under the (probably less restrictive) conditions described
in the file 'COPYRIGHT'. In any case, there is no warranty whatsoever
for this package.

The current distribution should be available from:

  http://www.malloc.de/malloc/ptmalloc3.tar.gz


Compilation
===========

It should be possible to build ptmalloc3 on any UN*X-like system that
implements the sbrk(), mmap(), munmap() and mprotect() calls. Since
there are now several source files, a library (libptmalloc3.a) is
generated. See the Makefile for examples of the compile-time options.

Note that support for non-ANSI compilers has been dropped.

Several example targets are provided in the Makefile:

 o Posix threads (pthreads), compile with "make posix"

 o Posix threads with explicit initialization, compile with
   "make posix-explicit" (known to be required on HPUX)

 o Posix threads without "tsd data hack" (see below), compile with
   "make posix-with-tsd"

 o Solaris threads, compile with "make solaris"

 o SGI sproc() threads, compile with "make sproc"

 o no threads, compile with "make nothreads" (currently out of order?)

For Linux:

 o "make linux-pthread" (almost the same as "make posix") or
   "make linux-shared"

Note that some compilers need special flags for multi-threaded code,
e.g. when using Solaris cc with Posix threads, one should use:

  % make posix SYS_FLAGS='-mt'

Some additional targets, ending in `-libc', are also provided in the
Makefile, to compare performance of the test programs to the case when
linking with the standard malloc implementation in libc.

A potential problem remains: If any of the system-specific functions
for getting/setting thread-specific data or for locking a mutex call
one of the malloc-related functions internally, the implementation
cannot work at all due to infinite recursion. One example seems to be
Solaris 2.4. I would like to hear if this problem occurs on other
systems, and whether similar workarounds could be applied.

For Posix threads, too, an optional workaround of this kind, the "tsd
data hack", has been integrated (activated when defining
USE_TSD_DATA_HACK). It depends on `pthread_t' being convertible to an
integral type, which is of course not generally guaranteed.
USE_TSD_DATA_HACK is now the default because I haven't yet found a
non-glibc pthreads system where this hack is _not_ needed.
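
The idea behind such a hack can be illustrated with a minimal,
hypothetical sketch; it is not the code in ptmalloc3. The names
tsd_table_t, tsd_set() and tsd_get() and the 256-slot table are
assumptions. Per-thread data is kept in a fixed array indexed by
pthread_self() converted to an integer, so neither malloc() nor
pthread_setspecific() is called and the recursion described above
cannot occur. The cast only compiles where pthread_t is an integer or
pointer type (on glibc it is unsigned long).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical sketch of a "tsd data hack". Assumes pthread_t converts
   to an integer, which POSIX does not guarantee. */
#define TSD_SLOTS 256

/* A plain array provided by the caller (e.g. a static one): setting or
   getting a value never calls malloc() or pthread_*specific(). */
typedef void *tsd_table_t[TSD_SLOTS];

/* Map the calling thread to a slot. Distinct threads can collide. */
static size_t tsd_slot(void)
{
    return (size_t)((unsigned long)pthread_self() % TSD_SLOTS);
}

/* Store a per-thread value in the caller's slot. */
static void tsd_set(tsd_table_t table, void *data)
{
    table[tsd_slot()] = data;
}

/* Fetch the value stored in the caller's slot (NULL if none). */
static void *tsd_get(tsd_table_t table)
{
    return table[tsd_slot()];
}
```

Because distinct threads can land in the same slot (thread IDs that
are aligned addresses make a plain modulo especially collision-prone),
a table like this in the sketch is only suitable for values that act
as hints, where an occasional wrong entry is harmless.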

*NEW* and _important_: In (currently) one place in the ptmalloc3
source, a write memory barrier named atomic_write_barrier() is
needed. This macro needs to be defined at the end of
malloc-machine.h. For gcc, a fallback in the form of a full memory
barrier is already defined, but you may need to add another definition
if you don't use gcc.
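
For a compiler other than gcc that supports C11 <stdatomic.h>, one
possible definition is a release fence. The sketch below is an
assumption, not the definition shipped in malloc-machine.h (C11
postdates this release), and the record, record_ready and
run_publish_demo() names are hypothetical. It shows the ordering such
a barrier provides: a writer fills in data, issues
atomic_write_barrier(), and only then sets a flag, so a reader that
sees the flag (and issues a matching acquire fence) also sees the data.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical definition for a non-gcc C11 compiler: a release fence
   keeps earlier stores from being reordered after later stores. */
#ifndef atomic_write_barrier
# define atomic_write_barrier() atomic_thread_fence(memory_order_release)
#endif

/* Hypothetical data published from one thread to another. */
struct record {
    int payload[4];
};

static struct record shared_record;
static atomic_int record_ready = 0;

/* Writer: fill in the record, then issue the write barrier before the
   store that makes the record visible to readers. */
static void *publish_record(void *arg)
{
    (void)arg;
    for (int i = 0; i < 4; i++)
        shared_record.payload[i] = (i + 1) * 10;
    atomic_write_barrier();
    atomic_store_explicit(&record_ready, 1, memory_order_relaxed);
    return NULL;
}

/* Reader: wait for the flag, then pair it with an acquire fence so the
   payload stores are guaranteed to be visible. Returns the payload sum. */
static int consume_record(void)
{
    while (atomic_load_explicit(&record_ready, memory_order_relaxed) == 0)
        ; /* spin until the writer publishes */
    atomic_thread_fence(memory_order_acquire);
    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += shared_record.payload[i];
    return sum;
}

/* Runs one writer thread and reads the record from the calling thread. */
static int run_publish_demo(void)
{
    pthread_t writer;
    if (pthread_create(&writer, NULL, publish_record, NULL) != 0)
        return -1;
    int sum = consume_record();
    pthread_join(writer, NULL);
    return sum;
}
```

A release fence orders earlier loads as well as stores, so it is
somewhat stronger than a pure store-store barrier; that is harmless
for correctness, at most slightly slower.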

Usage
=====

Just link libptmalloc3 into your application.

Some wicked systems (e.g. HPUX apparently) won't let malloc call _any_
thread-related functions before main(). On these systems,
USE_STARTER=2 must be defined during compilation (see "make
posix-explicit" above) and the global initialization function
ptmalloc_init() must be called explicitly, preferably at the start of
main().

Otherwise, when using ptmalloc3, no special precautions are necessary.

Link order is important
=======================

On some systems, when overriding malloc and linking against shared
libraries, the link order becomes very important. E.g., when linking
C++ programs on Solaris with Solaris threads [this is probably now
obsolete], don't rely on libC being included by default, but instead
put `-lthread' after `-lC' on the command line:

  CC ... libptmalloc3.a -lC -lthread

This is because there are global constructors in libC that need
malloc/ptmalloc, which in turn needs the thread library to be
already initialized.

Debugging hooks
===============

All calls to malloc(), realloc(), free() and memalign() are routed
through the global function pointers __malloc_hook, __realloc_hook,
__free_hook and __memalign_hook if they are not NULL (see the malloc.h
header file for declarations of these pointers). Therefore the malloc
implementation can be changed at runtime, if care is taken not to call
free() or realloc() on pointers obtained with a different
implementation than the one currently in effect. (The easiest way to
guarantee this is to set up the hooks before any malloc call, e.g.
with a function pointed to by the global variable
__malloc_initialize_hook.)
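
The routing described above can be illustrated with a self-contained
sketch. The names my_malloc_hook, my_free_hook, hooked_malloc() and
hooked_free() are hypothetical stand-ins, not the real hook variables:
those are declared in malloc.h with their own signatures (in glibc they
also receive the caller's address). Only malloc() and free() are
covered here; realloc() and memalign() would follow the same pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook pointers: when non-NULL, calls are routed through
   them instead of the default implementation. */
static void *(*my_malloc_hook)(size_t size) = NULL;
static void (*my_free_hook)(void *ptr) = NULL;

/* Hypothetical entry points that consult the hooks first. */
static void *hooked_malloc(size_t size)
{
    if (my_malloc_hook != NULL)
        return my_malloc_hook(size);
    return malloc(size);
}

static void hooked_free(void *ptr)
{
    if (my_free_hook != NULL) {
        my_free_hook(ptr);
        return;
    }
    free(ptr);
}

/* A debugging hook pair that counts live allocations. */
static long live_allocations = 0;

static void *counting_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        live_allocations++;
    return p;
}

static void counting_free(void *ptr)
{
    if (ptr != NULL)
        live_allocations--;
    free(ptr);
}

/* Install both hooks together, before any allocation, so that no block
   is freed by an implementation other than the one that allocated it. */
static void install_counting_hooks(void)
{
    my_malloc_hook = counting_malloc;
    my_free_hook = counting_free;
}
```

Installing the hooks as a pair and before the first allocation follows
the advice above: every block is then released by the same
implementation that handed it out.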

You can now also tune other malloc parameters (normally adjusted via
mallopt() calls from the application) with environment variables:

  MALLOC_TRIM_THRESHOLD_    for deciding to shrink the heap (in bytes)

  MALLOC_GRANULARITY_       The unit for allocating and deallocating
  MALLOC_TOP_PAD_           memory from the system. The default
                            is 64k and this parameter _must_ be a
                            power of 2.

  MALLOC_MMAP_THRESHOLD_    min. size for chunks allocated via
                            mmap() (in bytes)
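
The power-of-2 requirement on MALLOC_GRANULARITY_ can be checked with
the usual single-bit test. The helper below is a hypothetical sketch:
read_granularity() and its policy of falling back to the 64k default
on a missing or invalid value are assumptions, not ptmalloc3's actual
handling of the variable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define DEFAULT_GRANULARITY (64UL * 1024UL) /* documented 64k default */

/* A non-zero x is a power of 2 exactly when it has a single bit set,
   i.e. when clearing its lowest set bit leaves zero. */
static int is_power_of_two(unsigned long x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical helper: read MALLOC_GRANULARITY_ from the environment and
   return it if it is a valid power of 2, otherwise the default. */
static unsigned long read_granularity(void)
{
    const char *s = getenv("MALLOC_GRANULARITY_");
    if (s == NULL || *s == '\0')
        return DEFAULT_GRANULARITY;
    errno = 0;
    char *end;
    unsigned long v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0' || !is_power_of_two(v))
        return DEFAULT_GRANULARITY;
    return v;
}
```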

Tests
=====

Two testing applications, t-test1 and t-test2, are included in this
source distribution. Both perform pseudo-random sequences of
allocations/frees, and can be given numeric arguments (all arguments
are optional):

  % t-test[12] <n-total> <n-parallel> <n-allocs> <size-max> <bins>

    n-total    = total number of threads executed (default 10)
    n-parallel = number of threads running in parallel (2)
    n-allocs   = number of malloc()'s / free()'s per thread (10000)
    size-max   = max. size requested with malloc() in bytes (10000)
    bins       = number of bins to maintain

The first test `t-test1' maintains a completely separate pool of
allocated bins for each thread, and should therefore show full
parallelism. On the other hand, `t-test2' creates only a single pool
of bins, and each thread randomly allocates/frees any bin. Some lock
contention is to be expected in this case, as the threads frequently
cross each other's arenas.

Performance results from t-test1 should be quite repeatable, while the
behaviour of t-test2 depends on scheduling variations.
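
The private-pool structure of t-test1 can be sketched as follows. This
is a hypothetical reimplementation for illustration, not t-test1.c:
run_private_pools(), the xorshift generator and the byte-pattern
corruption check are assumptions, and all threads run at once (n-total
equals n-parallel). Since it only calls malloc() and free(), it
exercises whichever implementation it is linked against,
libptmalloc3.a or the malloc in libc.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORKERS 64

/* Hypothetical per-thread parameters, loosely mirroring t-test1's
   <n-allocs>, <size-max> and <bins> arguments. */
struct worker_args {
    unsigned seed;    /* per-thread PRNG seed (non-zero) */
    int n_allocs;     /* malloc()/free() operations for this thread */
    size_t size_max;  /* largest request, in bytes */
    int n_bins;       /* slots in this thread's private pool */
    long corrupt;     /* result: damaged blocks, or -1 on setup failure */
};

/* Per-thread xorshift generator, so threads never share PRNG state. */
static unsigned next_rand(unsigned *state)
{
    unsigned x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* One worker with a private pool: each step picks a random bin; a full
   bin has its fill pattern checked and is freed, an empty bin receives a
   new block of random size filled with the bin's index byte. */
static void *worker(void *p)
{
    struct worker_args *a = p;
    void **bins = calloc((size_t)a->n_bins, sizeof *bins);
    size_t *sizes = calloc((size_t)a->n_bins, sizeof *sizes);
    unsigned st = a->seed;

    a->corrupt = 0;
    if (bins == NULL || sizes == NULL) {
        free(bins);
        free(sizes);
        a->corrupt = -1;
        return NULL;
    }
    for (int i = 0; i < a->n_allocs; i++) {
        int b = (int)(next_rand(&st) % (unsigned)a->n_bins);
        if (bins[b] != NULL) {
            const unsigned char *c = bins[b];
            for (size_t k = 0; k < sizes[b]; k++)
                if (c[k] != (unsigned char)b) { a->corrupt++; break; }
            free(bins[b]);
            bins[b] = NULL;
        } else {
            sizes[b] = 1 + next_rand(&st) % a->size_max;
            bins[b] = malloc(sizes[b]);
            if (bins[b] != NULL)
                memset(bins[b], (unsigned char)b, sizes[b]);
        }
    }
    for (int b = 0; b < a->n_bins; b++)
        free(bins[b]);  /* release whatever is still allocated */
    free(bins);
    free(sizes);
    return NULL;
}

/* Runs n_parallel workers concurrently. Returns the total number of
   damaged blocks (0 when all is well), or -1 on bad arguments or when a
   thread or a worker's bookkeeping arrays could not be set up. */
static long run_private_pools(int n_parallel, int n_allocs,
                              size_t size_max, int n_bins)
{
    pthread_t tid[MAX_WORKERS];
    struct worker_args args[MAX_WORKERS];
    int created = 0, failed = 0;
    long total = 0;

    if (n_parallel < 1 || n_parallel > MAX_WORKERS || size_max < 1 || n_bins < 1)
        return -1;
    for (int t = 0; t < n_parallel; t++) {
        args[t] = (struct worker_args){ (unsigned)(t + 1) * 2654435761u,
                                        n_allocs, size_max, n_bins, 0 };
        if (pthread_create(&tid[t], NULL, worker, &args[t]) != 0)
            break;
        created++;
    }
    for (int t = 0; t < created; t++) {
        pthread_join(tid[t], NULL);
        if (args[t].corrupt < 0)
            failed = 1;
        else
            total += args[t].corrupt;
    }
    return (failed || created < n_parallel) ? -1 : total;
}
```

Turning this into a t-test2-style test would mean sharing one bins
array among all threads, which in turn needs some synchronization
around each bin (for example a mutex per bin) so that two threads
never free the same block.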

Conclusion
==========

I'm always interested in performance data and feedback; just send mail
to ptmalloc@malloc.de.

Good luck!