author | martin.trojer@nokia.com |
Fri, 31 Jul 2009 15:01:17 +0100 | |
changeset 1 | 2fb8b9db1c86 |
permissions | -rw-r--r-- |
1
2fb8b9db1c86
Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com

This is a patched version of zlib, modified to use
Pentium-optimized assembly code in the deflation algorithm. The files
changed/added by this patch are:

README.586
match.S
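
For orientation: match.S supplies an assembly version of longest_match(),
the routine in deflate.c that searches earlier input for the longest
string matching the current position. The C sketch below is a
deliberately simplified illustration of that kind of search, not zlib's
actual code; the function name, its parameters, and the NO_POS sentinel
are all hypothetical. It walks a chain of candidate positions and
compares bytes at each one, which is the kind of inner loop the assembly
version targets.

```c
#include <stddef.h>

/* Hypothetical sentinel marking the end of a candidate chain. */
#define NO_POS ((size_t)-1)

/*
 * Simplified, illustrative match search (NOT zlib's longest_match).
 *
 * window    - input bytes; positions [0, end) are valid
 * end       - one past the last valid position in window
 * cur       - position of the string we are trying to match
 * prev      - prev[i] is the next (earlier) candidate after i, or NO_POS
 * start     - first candidate position; must be less than cur
 * max_chain - maximum number of candidates to examine
 * match_pos - receives the position of the best match, if any
 *
 * Returns the length of the best match found (0 if none).
 */
static size_t longest_match_sketch(const unsigned char *window, size_t end,
                                   size_t cur, const size_t *prev,
                                   size_t start, unsigned max_chain,
                                   size_t *match_pos)
{
    size_t best_len = 0;
    size_t cand = start;

    while (cand != NO_POS && max_chain-- > 0) {
        size_t len = 0;

        /* Inner loop: byte-by-byte comparison against the candidate.
         * Because cand < cur, bounding cur + len also bounds cand + len. */
        while (cur + len < end && window[cand + len] == window[cur + len])
            len++;

        if (len > best_len) {
            best_len = len;
            *match_pos = cand;
        }
        cand = prev[cand];   /* follow the chain to an earlier candidate */
    }
    return best_len;
}
```

For example, with the window "abcdXabcYabcd", cur = 9, and a chain that
tries position 5 and then position 0, the sketch first finds a 3-byte
match at position 5 and then a 4-byte match at position 0, so it
returns 4.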

The effectiveness of these modifications is a bit marginal, as the
program's bottleneck seems to be mostly L1-cache contention, which
there is no real way to work around without rewriting the basic
algorithm. The speedup on average is around 5-10% (which is generally
less than the amount of variance between subsequent executions).
However, when used at level 9 compression, the cache contention can
drop enough for the assembly version to achieve a 10-20% speedup (and
sometimes more, depending on the amount of overall redundancy in the
files). Even here, though, cache contention can still be the limiting
factor, depending on the nature of the program using the zlib library.
This may also mean that better improvements will be seen on a Pentium
with MMX, which suffers much less from L1-cache contention, but I have
not yet verified this.

Note that this code has been tailored for the Pentium in particular,
and will not perform well on the Pentium Pro (due to the use of a
partial register in the inner loop).

If you are using an assembler other than GNU as, you will have to
translate match.S to use your assembler's syntax. (Have fun.)

Brian Raiter
breadbox@muppetlabs.com
April, 1998


Added for zlib 1.1.3:

The patches come from
http://www.muppetlabs.com/~breadbox/software/assembly.html

To compile zlib with this asm file, copy match.S to the zlib directory,
then run:

CFLAGS="-O3 -DASMV" ./configure
make OBJA=match.o