The most reliable way of running benchmarks is to do it in an otherwise idle
system. On a busy system, the results will vary according to the other tasks
demanding attention in the system.

We have managed to obtain quite reliable results by doing the following on
Linux (and you need root):

- switching the scheduler to a Real-Time mode
- setting the processor affinity to one single processor
- disabling the other thread of the same core

This should work rather well for CPU-intensive tasks. A task that is in
Real-Time mode will not be preempted by ordinary (non-real-time) tasks. But if
you make OS syscalls that block, especially I/O ones, your task will be
de-scheduled. Note that this includes page faults, so if you can, make sure
your benchmark's warmup code paths touch most of the data.

To do this you need a tool called schedtool (package schedtool), from
http://freequaos.host.sk/schedtool/

From this point on, we are using CPU0 for all tasks:

If you have a Hyperthreaded multi-core processor (Core-i5 and Core-i7), you
have to disable the other thread of the same core as CPU0. To discover which
one it is:

$ cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list

This will print something like 0,4, meaning that CPUs 0 and 4 are sibling
threads on the same core. So we'll turn CPU 4 off:

(as root)
# echo 0 > /sys/devices/system/cpu/cpu4/online

To turn it back on, echo 1 into the same file.

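The sibling lookup above can be scripted. The following is only a sketch: the
helper names expand_cpu_list and siblings_of are made up for this example, and
it assumes the sibling list uses commas and/or ranges (for instance 0,4 or
0-1), which are the two forms the kernel is known to print.

```shell
#!/bin/sh
# Expand a kernel CPU list such as "0,4" or "0-1,8" into one CPU number per line.
expand_cpu_list() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        # A plain number has no "-", so $last is empty: treat it as a
        # one-element range.
        seq "$first" "${last:-$first}"
    done
}

# Print the sibling threads of CPU $1, excluding CPU $1 itself.
# $2 optionally overrides the sibling list, so the function can be tried
# without reading sysfs.
siblings_of() {
    cpu=$1
    list=${2:-$(cat "/sys/devices/system/cpu/cpu$cpu/topology/thread_siblings_list")}
    expand_cpu_list "$list" | grep -vx "$cpu"
}

# Usage (as root): take every sibling of CPU 0 offline.
# for c in $(siblings_of 0); do
#     echo 0 > "/sys/devices/system/cpu/cpu$c/online"
# done
```

The loop at the end is left commented out because it changes system state;
echo 1 into the same files to bring the CPUs back.
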
To run a task on CPU 0 exclusively, using FIFO RT priority 10, you run the
following:

(as root)
# schedtool -F -p 10 -a 1 -e ./taskname

For example:
# schedtool -F -p 10 -a 1 -e ./tst_bench_qstring -tickcounter

Warning: if your task livelocks or takes far too long to complete, your system
may be unusable for a long time, especially if you don't have other cores to
run stuff on. To prevent that, first run it without schedtool and time it.

You can also limit the CPU time that the task is allowed to take. Run in the
same shell as you'll run schedtool:

$ ulimit -t 300
to limit it to 300 seconds (5 minutes).

If your task runs away, it will get a SIGXCPU after consuming 5 minutes of CPU
time (5 minutes running at 100%).

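A CPU-time limit set with ulimit stays in effect for the rest of the shell
session. If that is not wanted, the limit can be confined to a subshell. This
is a sketch only; run_limited is a hypothetical helper name, not part of
schedtool or of the benchmarks.

```shell
#!/bin/sh
# Run a command with a CPU-time limit without changing the limit of the
# calling shell. $1 is the limit in seconds, the rest is the command to run.
run_limited() {
    limit=$1
    shift
    # The parentheses start a subshell, so the ulimit applies only inside it.
    ( ulimit -t "$limit" && exec "$@" )
}

# Usage (as root), mirroring the schedtool command line used in this document:
# run_limited 300 schedtool -F -p 10 -a 1 -e ./taskname
```

The limit is inherited by schedtool and by the benchmark it starts, while the
interactive shell keeps its original setting.
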
If your app is multithreaded, you may want to give it more CPUs, like CPU0 and
CPU1 with -a 3 (it's a bitmask).

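Since the affinity value is a bitmask, CPU n contributes the value 1 << n.
The sketch below computes the mask for a list of CPU numbers; cpu_mask is a
made-up helper name, and the result is printed in decimal to match the -a 1
and -a 3 values used in this document.

```shell
#!/bin/sh
# Print the affinity bitmask (in decimal) for the CPU numbers given as
# arguments: bit n is set when CPU n is in the list.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    echo "$mask"
}

# Examples:
#   cpu_mask 0      prints 1  (CPU0 only)
#   cpu_mask 0 1    prints 3  (CPU0 and CPU1)
#   cpu_mask 0 2    prints 5  (CPU0 and CPU2)
```
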
For best results, you should disable ALL other cores and threads of the same
processor. The new Core-i7 has one processor with 4 cores, and each core can
run 2 threads; the older Mac Pros have two processors with 4 cores each. So on
those Mac Pros, you'd disable cores 1, 2 and 3, while on the Core-i7, you'll
need to disable all other CPUs.

However, disabling just the sibling thread seems to produce very reliable
results for me already, with variance often below 0.5% (even though there are
some measurable spikes).

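To check how stable your own runs are, you can compute the spread of several
measurements. The sketch below assumes that "variance" above is meant as the
relative standard deviation (sample standard deviation divided by the mean),
which is an interpretation, not something this document states. rel_stddev is
a hypothetical helper that reads one measurement per line from standard input.

```shell
#!/bin/sh
# Print the relative standard deviation, in percent, of the numbers read
# from stdin (one per line). Fails if fewer than two values are given.
rel_stddev() {
    awk '{ n++; sum += $1; sumsq += $1 * $1 }
         END {
             if (n < 2) exit 1
             mean = sum / n
             var = (sumsq - n * mean * mean) / (n - 1)   # sample variance
             if (var < 0) var = 0                        # guard against rounding
             printf "%.2f\n", 100 * sqrt(var) / mean
         }'
}

# Example: three runs measured at 99, 100 and 101 units.
# printf '99\n100\n101\n' | rel_stddev     prints 1.00
```
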
Other things to try:

Running the benchmark with highest priority, i.e. "sudo nice -n -19",
produces stable results on some machines. If the benchmark also
involves displaying something on the screen (on X11), running it with
"-sync" is a must. In that case the "real" cost is not correct,
but it is still useful for discovering regressions.

Also, not many people know about ionice(1):
  ionice - get/set program io scheduling class and priority