This document describes some caveats about the use of Valgrind with
Python.  Valgrind is used periodically by Python developers to try
to ensure there are no memory leaks or invalid memory reads/writes.

If you don't want to read about the details of using Valgrind, there
are still two things you must do to suppress the warnings.  First,
you must use a suppressions file.  One is supplied in
Misc/valgrind-python.supp.  Second, you must do one of the following:

  * Uncomment Py_USING_MEMORY_DEBUGGER in Objects/obmalloc.c,
    then rebuild Python
  * Uncomment the lines in Misc/valgrind-python.supp that
    suppress the warnings for PyObject_Free and PyObject_Realloc

If you want to use Valgrind more effectively and catch even more
memory leaks, you will need to configure Python with --without-pymalloc.
PyMalloc allocates a few big blocks, and most object allocations
don't call malloc; they use chunks doled out by PyMalloc from those
big blocks.  This means Valgrind can't detect many allocations
(and frees), except for those that are forwarded to the system malloc.
Note: configuring Python with --without-pymalloc makes Python run
much slower, especially when running under Valgrind.  You may need
to run the tests in batches under Valgrind to keep the memory usage
down to allow the tests to complete.  Running the tests with
--without-pymalloc seems to take about 5 times longer.

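To illustrate why Valgrind cannot see most allocations when a pooling
allocator sits on top of malloc, here is a minimal sketch in C.  It is
not PyMalloc's actual code: toy_pool, toy_alloc, toy_destroy, CHUNK_SIZE
and BLOCK_SIZE are hypothetical names and sizes chosen for illustration.
The point is only that many blocks are handed out while the system
malloc is called once, so a tool that tracks malloc/free sees a single
allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy allocator for illustration; not CPython's pymalloc. */
#define CHUNK_SIZE 4096u    /* assumed size of the one big block */
#define BLOCK_SIZE 32u      /* assumed size of each block handed out */

typedef struct {
    unsigned char *chunk;   /* the only memory obtained from malloc() */
    size_t used;            /* bytes already handed out from the chunk */
    size_t malloc_calls;    /* how many times the system malloc was called */
} toy_pool;

/* Hand out one fixed-size block; malloc() is called only the first time. */
static void *toy_alloc(toy_pool *p)
{
    assert(p != NULL);
    if (p->chunk == NULL) {
        p->chunk = malloc(CHUNK_SIZE);
        if (p->chunk == NULL)
            return NULL;
        p->malloc_calls++;
        p->used = 0;
    }
    if (p->used + BLOCK_SIZE > CHUNK_SIZE)
        return NULL;        /* a real allocator would obtain another chunk */
    void *block = p->chunk + p->used;
    p->used += BLOCK_SIZE;
    return block;
}

/* Release the whole chunk.  Blocks that were never handed back simply
   disappear with it, so a malloc-level leak checker has nothing to report. */
static void toy_destroy(toy_pool *p)
{
    assert(p != NULL);
    free(p->chunk);
    p->chunk = NULL;
    p->used = 0;
}
```

With a zero-initialized toy_pool, two calls to toy_alloc return two
distinct blocks while malloc_calls stays at 1.  Forgetting one of those
blocks is a leak from the program's point of view, but Valgrind only
ever observes one malloc and one matching free.
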
Apr 15, 2006:
  test_ctypes causes Valgrind 3.1.1 to fail (crash).
  test_socket_ssl should be skipped when running under Valgrind.
        The reason is that it purposely uses uninitialized memory.
        This causes many spurious warnings, so it's easier to just skip it.

Details:
--------
Python uses its own small-object allocation scheme on top of malloc,
called PyMalloc.

Valgrind may show some unexpected results when PyMalloc is used.
Starting with Python 2.3, PyMalloc is used by default.  You can disable
PyMalloc when configuring Python by adding the --without-pymalloc option.
If you disable PyMalloc, most of the information in this document and
the supplied suppressions file will not be useful.  As discussed above,
disabling PyMalloc lets Valgrind catch more problems.

If you run Valgrind on a default build of Python, you will see
many errors like:

        ==6399== Use of uninitialised value of size 4
        ==6399==    at 0x4A9BDE7E: PyObject_Free (obmalloc.c:711)
        ==6399==    by 0x4A9B8198: dictresize (dictobject.c:477)

These are expected and not a problem.  Tim Peters explains
the situation:

        PyMalloc needs to know whether an arbitrary address is one
        that's managed by it, or is managed by the system malloc.
        The current scheme allows this to be determined in constant
        time, regardless of how many memory areas are under pymalloc's
        control.

        The memory pymalloc manages itself is in one or more "arenas",
        each a large contiguous memory area obtained from malloc.
        The base address of each arena is saved by pymalloc
        in a vector.  Each arena is carved into "pools", and a field at
        the start of each pool contains the index of that pool's arena's
        base address in that vector.

        Given an arbitrary address, pymalloc computes the pool base
        address corresponding to it, then looks at "the index" stored
        near there.  If the index read up is out of bounds for the
        vector of arena base addresses pymalloc maintains, then
        pymalloc knows for certain that this address is not under
        pymalloc's control.  Otherwise the index is in bounds, and
        pymalloc compares

                the arena base address stored at that index in the vector

        to

                the arbitrary address pymalloc is investigating

        pymalloc controls this arbitrary address if and only if it lies
        in the arena the address's pool's index claims it lies in.

        It doesn't matter whether the memory pymalloc reads up ("the
        index") is initialized.  If it's not initialized, then
        whatever trash gets read up will lead pymalloc to conclude
        (correctly) that the address isn't controlled by it, either
        because the index is out of bounds, or the index is in bounds
        but the arena it represents doesn't contain the address.

        This determination has to be made on every call to one of
        pymalloc's free/realloc entry points, so its speed is critical
        (Python allocates and frees dynamic memory at a ferocious rate
        -- everything in Python, from integers to "stack frames",
        lives in the heap).
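
The check described above can be sketched in C roughly as follows.
This is a simplified model, not the code in Objects/obmalloc.c: it
simulates the address space with a byte array so that "addresses" are
plain offsets, and every name and size in it (memory, arena_base,
add_arena, address_in_range, POOL_SIZE, ARENA_SIZE, MAX_ARENAS) is an
assumption made for illustration.  It shows that whatever trash is read
as "the index", the answer is still correct.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model with assumed sizes; not the real obmalloc.c layout. */
#define POOL_SIZE   256u                  /* must be a power of two */
#define ARENA_SIZE  (4u * POOL_SIZE)      /* each arena holds four pools */
#define MEM_SIZE    (4u * ARENA_SIZE)     /* simulated address space */
#define MAX_ARENAS  4u

static unsigned char memory[MEM_SIZE];    /* addresses are offsets into this */
static size_t arena_base[MAX_ARENAS];     /* vector of arena base addresses */
static unsigned int num_arenas;

/* Register an arena and store its vector index at the start of each pool. */
static void add_arena(size_t base)
{
    assert(num_arenas < MAX_ARENAS);
    assert(base % POOL_SIZE == 0 && base + ARENA_SIZE <= MEM_SIZE);
    unsigned int idx = num_arenas++;
    arena_base[idx] = base;
    for (size_t pool = base; pool < base + ARENA_SIZE; pool += POOL_SIZE)
        memcpy(&memory[pool], &idx, sizeof idx);
}

/* Constant-time test: does addr lie inside an arena we manage? */
static int address_in_range(size_t addr)
{
    assert(addr < MEM_SIZE);
    size_t pool = addr & ~(size_t)(POOL_SIZE - 1);  /* pool base address */
    unsigned int idx;
    memcpy(&idx, &memory[pool], sizeof idx);        /* "the index"; may be trash */
    if (idx >= num_arenas)
        return 0;               /* out of bounds: certainly not ours */
    /* In bounds: ours only if addr lies in the arena the index names. */
    return addr >= arena_base[idx] && addr - arena_base[idx] < ARENA_SIZE;
}
```

In the model, filling memory with arbitrary bytes before calling
add_arena stands in for uninitialized memory.  Addresses inside the
registered arena are accepted; addresses elsewhere are rejected whether
the stale index happens to be out of bounds or in bounds, which is why
the uninitialised-value reports from PyObject_Free are harmless.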