symbian-qemu-0.9.1-12/zlib-1.2.3/algorithm.txt
author johnathan.white@2718R8BGH51.accenture.com
Mon, 08 Mar 2010 18:45:03 +0000
changeset 46 b6935a90ca64
parent 1 2fb8b9db1c86
permissions -rw-r--r--
Modify framebuffer and NGA framebuffer to read screen size from board model dtb file. Optimise memory usuage of frame buffer Add example minigui application with hooks to profiler (which writes results to S:\). Modified NGA framebuffer to run its own dfc queue at high priority
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
1
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
     1
1. Compression algorithm (deflate)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
     2
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
     3
The deflation algorithm used by gzip (also zip and zlib) is a variation of
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
     4
LZ77 (Lempel-Ziv 1977, see reference below). It finds duplicated strings in
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
     5
the input data.  The second occurrence of a string is replaced by a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
     6
pointer to the previous string, in the form of a pair (distance,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
     7
length).  Distances are limited to 32K bytes, and lengths are limited
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
     8
to 258 bytes. When a string does not occur anywhere in the previous
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
     9
32K bytes, it is emitted as a sequence of literal bytes.  (In this
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    10
description, `string' must be taken as an arbitrary sequence of bytes,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    11
and is not restricted to printable characters.)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    12
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    13
Literals or match lengths are compressed with one Huffman tree, and
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    14
match distances are compressed with another tree. The trees are stored
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    15
in a compact form at the start of each block. The blocks can have any
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    16
size (except that the compressed data for one block must fit in
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    17
available memory). A block is terminated when deflate() determines that
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    18
it would be useful to start another block with fresh trees. (This is
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    19
somewhat similar to the behavior of LZW-based _compress_.)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    20
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    21
Duplicated strings are found using a hash table. All input strings of
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    22
length 3 are inserted in the hash table. A hash index is computed for
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    23
the next 3 bytes. If the hash chain for this index is not empty, all
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    24
strings in the chain are compared with the current input string, and
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    25
the longest match is selected.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    26
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    27
The hash chains are searched starting with the most recent strings, to
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    28
favor small distances and thus take advantage of the Huffman encoding.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    29
The hash chains are singly linked. There are no deletions from the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    30
hash chains, the algorithm simply discards matches that are too old.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    31
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    32
To avoid a worst-case situation, very long hash chains are arbitrarily
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    33
truncated at a certain length, determined by a runtime option (level
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    34
parameter of deflateInit). So deflate() does not always find the longest
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    35
possible match but generally finds a match which is long enough.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    36
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    37
deflate() also defers the selection of matches with a lazy evaluation
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    38
mechanism. After a match of length N has been found, deflate() searches for
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    39
a longer match at the next input byte. If a longer match is found, the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    40
previous match is truncated to a length of one (thus producing a single
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    41
literal byte) and the process of lazy evaluation begins again. Otherwise,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    42
the original match is kept, and the next match search is attempted only N
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    43
steps later.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    44
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    45
The lazy match evaluation is also subject to a runtime parameter. If
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    46
the current match is long enough, deflate() reduces the search for a longer
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    47
match, thus speeding up the whole process. If compression ratio is more
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    48
important than speed, deflate() attempts a complete second search even if
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    49
the first match is already long enough.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    50
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    51
The lazy match evaluation is not performed for the fastest compression
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    52
modes (level parameter 1 to 3). For these fast modes, new strings
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    53
are inserted in the hash table only when no match was found, or
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    54
when the match is not too long. This degrades the compression ratio
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    55
but saves time since there are both fewer insertions and fewer searches.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    56
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    57
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    58
2. Decompression algorithm (inflate)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    59
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    60
2.1 Introduction
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    61
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    62
The key question is how to represent a Huffman code (or any prefix code) so
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    63
that you can decode fast.  The most important characteristic is that shorter
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    64
codes are much more common than longer codes, so pay attention to decoding the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    65
short codes fast, and let the long codes take longer to decode.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    66
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    67
inflate() sets up a first level table that covers some number of bits of
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    68
input less than the length of longest code.  It gets that many bits from the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    69
stream, and looks it up in the table.  The table will tell if the next
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    70
code is that many bits or less and how many, and if it is, it will tell
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    71
the value, else it will point to the next level table for which inflate()
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    72
grabs more bits and tries to decode a longer code.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    73
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    74
How many bits to make the first lookup is a tradeoff between the time it
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    75
takes to decode and the time it takes to build the table.  If building the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    76
table took no time (and if you had infinite memory), then there would only
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    77
be a first level table to cover all the way to the longest code.  However,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    78
building the table ends up taking a lot longer for more bits since short
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    79
codes are replicated many times in such a table.  What inflate() does is
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    80
simply to make the number of bits in the first table a variable, and  then
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    81
to set that variable for the maximum speed.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    82
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    83
For inflate, which has 286 possible codes for the literal/length tree, the size
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    84
of the first table is nine bits.  Also the distance trees have 30 possible
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    85
values, and the size of the first table is six bits.  Note that for each of
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    86
those cases, the table ended up one bit longer than the ``average'' code
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    87
length, i.e. the code length of an approximately flat code which would be a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    88
little more than eight bits for 286 symbols and a little less than five bits
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    89
for 30 symbols.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    90
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    91
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    92
2.2 More details on the inflate table lookup
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    93
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    94
Ok, you want to know what this cleverly obfuscated inflate tree actually
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    95
looks like.  You are correct that it's not a Huffman tree.  It is simply a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    96
lookup table for the first, let's say, nine bits of a Huffman symbol.  The
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    97
symbol could be as short as one bit or as long as 15 bits.  If a particular
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    98
symbol is shorter than nine bits, then that symbol's translation is duplicated
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
    99
in all those entries that start with that symbol's bits.  For example, if the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   100
symbol is four bits, then it's duplicated 32 times in a nine-bit table.  If a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   101
symbol is nine bits long, it appears in the table once.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   102
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   103
If the symbol is longer than nine bits, then that entry in the table points
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   104
to another similar table for the remaining bits.  Again, there are duplicated
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   105
entries as needed.  The idea is that most of the time the symbol will be short
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   106
and there will only be one table look up.  (That's whole idea behind data
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   107
compression in the first place.)  For the less frequent long symbols, there
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   108
will be two lookups.  If you had a compression method with really long
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   109
symbols, you could have as many levels of lookups as is efficient.  For
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   110
inflate, two is enough.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   111
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   112
So a table entry either points to another table (in which case nine bits in
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   113
the above example are gobbled), or it contains the translation for the symbol
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   114
and the number of bits to gobble.  Then you start again with the next
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   115
ungobbled bit.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   116
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   117
You may wonder: why not just have one lookup table for how ever many bits the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   118
longest symbol is?  The reason is that if you do that, you end up spending
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   119
more time filling in duplicate symbol entries than you do actually decoding.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   120
At least for deflate's output that generates new trees every several 10's of
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   121
kbytes.  You can imagine that filling in a 2^15 entry table for a 15-bit code
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   122
would take too long if you're only decoding several thousand symbols.  At the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   123
other extreme, you could make a new table for every bit in the code.  In fact,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   124
that's essentially a Huffman tree.  But then you spend two much time
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   125
traversing the tree while decoding, even for short symbols.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   126
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   127
So the number of bits for the first lookup table is a trade of the time to
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   128
fill out the table vs. the time spent looking at the second level and above of
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   129
the table.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   130
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   131
Here is an example, scaled down:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   132
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   133
The code being decoded, with 10 symbols, from 1 to 6 bits long:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   134
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   135
A: 0
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   136
B: 10
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   137
C: 1100
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   138
D: 11010
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   139
E: 11011
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   140
F: 11100
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   141
G: 11101
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   142
H: 11110
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   143
I: 111110
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   144
J: 111111
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   145
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   146
Let's make the first table three bits long (eight entries):
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   147
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   148
000: A,1
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   149
001: A,1
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   150
010: A,1
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   151
011: A,1
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   152
100: B,2
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   153
101: B,2
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   154
110: -> table X (gobble 3 bits)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   155
111: -> table Y (gobble 3 bits)
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   156
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   157
Each entry is what the bits decode as and how many bits that is, i.e. how
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   158
many bits to gobble.  Or the entry points to another table, with the number of
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   159
bits to gobble implicit in the size of the table.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   160
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   161
Table X is two bits long since the longest code starting with 110 is five bits
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   162
long:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   163
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   164
00: C,1
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   165
01: C,1
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   166
10: D,2
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   167
11: E,2
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   168
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   169
Table Y is three bits long since the longest code starting with 111 is six
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   170
bits long:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   171
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   172
000: F,2
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   173
001: F,2
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   174
010: G,2
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   175
011: G,2
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   176
100: H,2
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   177
101: H,2
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   178
110: I,3
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   179
111: J,3
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   180
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   181
So what we have here are three tables with a total of 20 entries that had to
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   182
be constructed.  That's compared to 64 entries for a single table.  Or
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   183
compared to 16 entries for a Huffman tree (six two entry tables and one four
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   184
entry table).  Assuming that the code ideally represents the probability of
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   185
the symbols, it takes on the average 1.25 lookups per symbol.  That's compared
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   186
to one lookup for the single table, or 1.66 lookups per symbol for the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   187
Huffman tree.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   188
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   189
There, I think that gives you a picture of what's going on.  For inflate, the
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   190
meaning of a particular symbol is often more than just a letter.  It can be a
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   191
byte (a "literal"), or it can be either a length or a distance which
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   192
indicates a base value and a number of bits to fetch after the code that is
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   193
added to the base value.  Or it might be the special end-of-block code.  The
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   194
data structures created in inftrees.c try to encode all that information
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   195
compactly in the tables.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   196
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   197
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   198
Jean-loup Gailly        Mark Adler
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   199
jloup@gzip.org          madler@alumni.caltech.edu
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   200
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   201
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   202
References:
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   203
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   204
[LZ77] Ziv J., Lempel A., ``A Universal Algorithm for Sequential Data
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   205
Compression,'' IEEE Transactions on Information Theory, Vol. 23, No. 3,
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   206
pp. 337-343.
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   207
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   208
``DEFLATE Compressed Data Format Specification'' available in
2fb8b9db1c86 Initial QEMU (symbian-qemu-0.9.1-12) import
martin.trojer@nokia.com
parents:
diff changeset
   209
http://www.ietf.org/rfc/rfc1951.txt