________________________________________________________________________

PYBENCH - A Python Benchmark Suite

________________________________________________________________________

An extendable suite of low-level benchmarks for measuring
the performance of the Python implementation
(interpreter, compiler or VM).

pybench is a collection of tests that provides a standardized way to
measure the performance of Python implementations. It takes a very
close look at different aspects of Python programs and lets you
decide which factors are more important to you than others, rather
than wrapping everything up in one number, like other performance
tests do (e.g. pystone, which is included in the Python Standard
Library).

pybench has been used in the past by several Python developers to
track down performance bottlenecks or to demonstrate the impact of
optimizations and new features in Python.

The command line interface for pybench is the file pybench.py. Run
this script with the option '--help' to get a listing of the possible
options. Without options, pybench will simply execute the benchmark
and then print out a report to stdout.
|


Micro-Manual
------------

Run 'pybench.py -h' to see the help screen. Run 'pybench.py' to run
the benchmark suite using default settings and 'pybench.py -f <file>'
to have it store the results in a file as well.

It is usually a good idea to run pybench.py multiple times to see
whether the environment, timers and benchmark run-times are suitable
for doing benchmark tests.

You can use the comparison feature of pybench.py ('pybench.py -c
<file>') to check how well the system behaves in comparison to a
reference run.

If the differences are well below 10% for each test, then you have a
system that is good for doing benchmark testing. If you get random
differences of more than 10% or significant differences between the
values for minimum and average time, then you likely have some
background processes running which cause the readings to become
inconsistent. Examples include: web browsers, email clients, RSS
readers, music players, backup programs, etc.
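
The minimum-vs-average heuristic can be checked outside of pybench with
a small sketch (written in modern Python; measure() and workload() are
hypothetical helpers, not part of pybench):

```python
import time

def measure(workload, repeats=5):
    # Time the workload several times; a large gap between the
    # minimum and the average usually points at background noise.
    timings = []
    for _ in range(repeats):
        t0 = time.time()
        workload()
        timings.append(time.time() - t0)
    return min(timings), sum(timings) / len(timings)

def workload():
    # Trivial CPU-bound loop standing in for a real benchmark test.
    total = 0
    for i in range(100000):
        total += i
    return total

minimum, average = measure(workload)
# The minimum of a set of readings can never exceed their average.
print(minimum <= average)  # prints: True
```

pybench applies the same idea per test: it reports both the minimum and
the average time, so you can judge the noise level yourself.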
|

If you are only interested in a few tests of the whole suite, you can
use the filtering option, e.g. 'pybench.py -t string' will only
run/show the tests that have 'string' in their name.

This is the current output of pybench.py --help:
|

"""
------------------------------------------------------------------------
PYBENCH - a benchmark test suite for Python interpreters/compilers.
------------------------------------------------------------------------

Synopsis:
 pybench.py [option] files...

Options and default settings:
  -n arg           number of rounds (10)
  -f arg           save benchmark to file arg ()
  -c arg           compare benchmark with the one in file arg ()
  -s arg           show benchmark in file arg, then exit ()
  -w arg           set warp factor to arg (10)
  -t arg           run only tests with names matching arg ()
  -C arg           set the number of calibration runs to arg (20)
  -d               hide noise in comparisons (0)
  -v               verbose output (not recommended) (0)
  --with-gc        enable garbage collection (0)
  --with-syscheck  use default sys check interval (0)
  --timer arg      use given timer (time.time)
  -h               show this help text
  --help           show this help text
  --debug          enable debugging
  --copyright      show copyright
  --examples       show examples of usage

Version:
 2.0

The normal operation is to run the suite and display the
results. Use -f to save them for later reuse or comparisons.

Available timers:

   time.time
   time.clock
   systimes.processtime

Examples:

python2.1 pybench.py -f p21.pybench
python2.5 pybench.py -f p25.pybench
python pybench.py -s p25.pybench -c p21.pybench
"""
|

License
-------

See LICENSE file.


Sample output
-------------

|
"""
-------------------------------------------------------------------------------
PYBENCH 2.0
-------------------------------------------------------------------------------
* using Python 2.4.2
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

Calibrating tests. Please wait...

Running 10 round(s) of the suite at warp factor 10:

* Round 1 done in 6.388 seconds.
* Round 2 done in 6.485 seconds.
* Round 3 done in 6.786 seconds.
...
* Round 10 done in 6.546 seconds.

-------------------------------------------------------------------------------
Benchmark: 2006-06-12 12:09:25
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
        Platform ID:  Linux-2.6.8-24.19-default-x86_64-with-SuSE-9.2-x86-64
        Processor:    x86_64

    Python:
        Executable:   /usr/local/bin/python
        Version:      2.4.2
        Compiler:     GCC 3.3.4 (pre 3.3.5 20040809)
        Bits:         64bit
        Build:        Oct  1 2005 15:24:35 (#1)
        Unicode:      UCS2


Test                             minimum  average  operation  overhead
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:    126ms    145ms    0.28us   0.274ms
           BuiltinMethodLookup:    124ms    130ms    0.12us   0.316ms
                 CompareFloats:    109ms    110ms    0.09us   0.361ms
         CompareFloatsIntegers:    100ms    104ms    0.12us   0.271ms
               CompareIntegers:    137ms    138ms    0.08us   0.542ms
        CompareInternedStrings:    124ms    127ms    0.08us   1.367ms
                  CompareLongs:    100ms    104ms    0.10us   0.316ms
                CompareStrings:    111ms    115ms    0.12us   0.929ms
                CompareUnicode:    108ms    128ms    0.17us   0.693ms
                 ConcatStrings:    142ms    155ms    0.31us   0.562ms
                 ConcatUnicode:    119ms    127ms    0.42us   0.384ms
               CreateInstances:    123ms    128ms    1.14us   0.367ms
            CreateNewInstances:    121ms    126ms    1.49us   0.335ms
       CreateStringsWithConcat:    130ms    135ms    0.14us   0.916ms
       CreateUnicodeWithConcat:    130ms    135ms    0.34us   0.361ms
                  DictCreation:    108ms    109ms    0.27us   0.361ms
             DictWithFloatKeys:    149ms    153ms    0.17us   0.678ms
           DictWithIntegerKeys:    124ms    126ms    0.11us   0.915ms
            DictWithStringKeys:    114ms    117ms    0.10us   0.905ms
                      ForLoops:    110ms    111ms    4.46us   0.063ms
                    IfThenElse:    118ms    119ms    0.09us   0.685ms
                   ListSlicing:    116ms    120ms    8.59us   0.103ms
                NestedForLoops:    125ms    137ms    0.09us   0.019ms
          NormalClassAttribute:    124ms    136ms    0.11us   0.457ms
       NormalInstanceAttribute:    110ms    117ms    0.10us   0.454ms
           PythonFunctionCalls:    107ms    113ms    0.34us   0.271ms
             PythonMethodCalls:    140ms    149ms    0.66us   0.141ms
                     Recursion:    156ms    166ms    3.32us   0.452ms
                  SecondImport:    112ms    118ms    1.18us   0.180ms
           SecondPackageImport:    118ms    127ms    1.27us   0.180ms
         SecondSubmoduleImport:    140ms    151ms    1.51us   0.180ms
       SimpleComplexArithmetic:    128ms    139ms    0.16us   0.361ms
        SimpleDictManipulation:    134ms    136ms    0.11us   0.452ms
         SimpleFloatArithmetic:    110ms    113ms    0.09us   0.571ms
      SimpleIntFloatArithmetic:    106ms    111ms    0.08us   0.548ms
       SimpleIntegerArithmetic:    106ms    109ms    0.08us   0.544ms
        SimpleListManipulation:    103ms    113ms    0.10us   0.587ms
          SimpleLongArithmetic:    112ms    118ms    0.18us   0.271ms
                    SmallLists:    105ms    116ms    0.17us   0.366ms
                   SmallTuples:    108ms    128ms    0.24us   0.406ms
         SpecialClassAttribute:    119ms    136ms    0.11us   0.453ms
      SpecialInstanceAttribute:    143ms    155ms    0.13us   0.454ms
                StringMappings:    115ms    121ms    0.48us   0.405ms
              StringPredicates:    120ms    129ms    0.18us   2.064ms
                 StringSlicing:    111ms    127ms    0.23us   0.781ms
                     TryExcept:    125ms    126ms    0.06us   0.681ms
                TryRaiseExcept:    133ms    137ms    2.14us   0.361ms
                  TupleSlicing:    117ms    120ms    0.46us   0.066ms
               UnicodeMappings:    156ms    160ms    4.44us   0.429ms
             UnicodePredicates:    117ms    121ms    0.22us   2.487ms
             UnicodeProperties:    115ms    153ms    0.38us   2.070ms
                UnicodeSlicing:    126ms    129ms    0.26us   0.689ms
-------------------------------------------------------------------------------
                        Totals:   6283ms   6673ms
"""
|
________________________________________________________________________

Writing New Tests

________________________________________________________________________

pybench tests are simple modules defining one or more pybench.Test
subclasses.

Writing a test essentially boils down to providing two methods:
.test(), which runs .rounds iterations of .operations operations each,
and .calibrate(), which does the same except that it doesn't actually
execute the operations.

|
Here's an example:
------------------

from pybench import Test

class IntegerCounting(Test):

    # Version number of the test as float (x.yy); this is important
    # for comparisons of benchmark runs - tests with unequal version
    # number will not get compared.
    version = 1.0

    # The number of abstract operations done in each round of the
    # test. An operation is the basic unit of what you want to
    # measure. The benchmark will output the amount of run-time per
    # operation. Note that in order to raise the measured timings
    # significantly above noise level, it is often required to repeat
    # sets of operations more than once per test round. The measured
    # overhead per test round should be less than 1 second.
    operations = 20

    # Number of rounds to execute per test run. This should be
    # adjusted to a figure that results in a test run-time of between
    # 1-2 seconds (at warp 1).
    rounds = 100000

    def test(self):

        """ Run the test.

            The test needs to run self.rounds, executing
            self.operations number of operations each.

        """
        # Init the test
        a = 1

        # Run test rounds
        #
        # NOTE: Use xrange() for all test loops unless you want to face
        #       a 20MB process!
        #
        for i in xrange(self.rounds):

            # Repeat the operations per round to raise the run-time
            # per operation significantly above the noise level of the
            # for-loop overhead.

            # Execute 20 operations (a += 1):
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1

    def calibrate(self):

        """ Calibrate the test.

            This method should execute everything that is needed to
            set up and run the test - except for the actual operations
            that you intend to measure. pybench uses this method to
            measure the test implementation overhead.

        """
        # Init the test
        a = 1

        # Run test rounds (without actually doing any operation)
        for i in xrange(self.rounds):

            # Skip the actual execution of the operations, since we
            # only want to measure the test's administration overhead.
            pass

|
Registering a new test module
-----------------------------

To register a test module with pybench, the classes need to be
imported into the pybench.Setup module. pybench will then scan all the
symbols defined in that module for subclasses of pybench.Test and
automatically add them to the benchmark suite.
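
The scan can be pictured with a simplified sketch (this is not
pybench's actual code; the class and function names below are made up
for illustration):

```python
class Test:
    """Stand-in for pybench.Test."""

class IntegerCounting(Test):
    version = 1.0

class Helper:
    """Not a Test subclass, so the scan ignores it."""

def find_tests(namespace):
    # Collect the names of all Test subclasses in a module namespace,
    # skipping the Test base class itself.
    found = []
    for name, obj in namespace.items():
        if isinstance(obj, type) and issubclass(obj, Test) and obj is not Test:
            found.append(name)
    return sorted(found)

namespace = {"Test": Test, "IntegerCounting": IntegerCounting,
             "Helper": Helper, "find_tests": find_tests}
print(find_tests(namespace))  # prints: ['IntegerCounting']
```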
|


Breaking Comparability
----------------------

If a change is made to any individual test such that it is no
longer strictly comparable with previous runs, the '.version' class
variable should be updated. Thereafter, comparisons with previous
versions of the test will list as "n/a" to reflect the change.
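
The effect of a version bump can be sketched with a hypothetical
helper (not pybench's actual comparison code):

```python
def compare(old_version, new_version, old_time, new_time):
    # Results are only comparable when the test versions match;
    # otherwise the comparison is reported as "n/a".
    if old_version != new_version:
        return "n/a"
    return "%+.1f%%" % ((new_time - old_time) / old_time * 100.0)

print(compare(1.0, 1.0, 100.0, 110.0))  # prints: +10.0%
print(compare(1.0, 1.1, 100.0, 110.0))  # prints: n/a
```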
|


Version History
---------------

 2.0: rewrote parts of pybench which resulted in more repeatable
      timings:
      - made timer a parameter
      - changed the platform default timer to use high-resolution
        timers rather than process timers (which have a much lower
        resolution)
      - added option to select timer
      - added process time timer (using systimes.py)
      - changed to use min() as timing estimator (the average
        is still taken as well to provide an idea of the difference)
      - garbage collection is turned off by default
      - sys check interval is set to the highest possible value
      - calibration is now a separate step, done using
        a different strategy that allows measuring the test
        overhead more accurately
      - modified the tests to each give a run-time of between
        100-200ms using warp 10
      - changed default warp factor to 10 (from 20)
      - compared results with timeit.py and confirmed measurements
      - bumped all test versions to 2.0
      - updated platform.py to the latest version
      - changed the output format a bit to make it look
        nicer
      - refactored the APIs somewhat
 1.3+: Steve Holden added the NewInstances test and the filtering
      option during the NeedForSpeed sprint; this also triggered a
      long discussion on how to improve benchmark timing and finally
      resulted in the release of 2.0
 1.3: initial checkin into the Python SVN repository


Have fun,
--
Marc-Andre Lemburg
mal@lemburg.com