src/3rdparty/zlib/algorithm.txt
author Alex Gilkes <alex.gilkes@nokia.com>
Mon, 11 Jan 2010 14:00:40 +0000 (2010-01-11)
changeset 0 1918ee327afb
permissions -rw-r--r--
Revision: 200952
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
0
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     1
1. Compression algorithm (deflate)
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     2
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     3
The deflation algorithm used by gzip (also zip and zlib) is a variation of
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     4
LZ77 (Lempel-Ziv 1977, see reference below). It finds duplicated strings in
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     5
the input data.  The second occurrence of a string is replaced by a
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     6
pointer to the previous string, in the form of a pair (distance,
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     7
length).  Distances are limited to 32K bytes, and lengths are limited
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     8
to 258 bytes. When a string does not occur anywhere in the previous
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
     9
32K bytes, it is emitted as a sequence of literal bytes.  (In this
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    10
description, `string' must be taken as an arbitrary sequence of bytes,
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    11
and is not restricted to printable characters.)
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    12
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    13
Literals or match lengths are compressed with one Huffman tree, and
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    14
match distances are compressed with another tree. The trees are stored
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    15
in a compact form at the start of each block. The blocks can have any
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    16
size (except that the compressed data for one block must fit in
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    17
available memory). A block is terminated when deflate() determines that
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    18
it would be useful to start another block with fresh trees. (This is
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    19
somewhat similar to the behavior of LZW-based _compress_.)
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    20
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    21
Duplicated strings are found using a hash table. All input strings of
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    22
length 3 are inserted in the hash table. A hash index is computed for
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    23
the next 3 bytes. If the hash chain for this index is not empty, all
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    24
strings in the chain are compared with the current input string, and
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    25
the longest match is selected.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    26
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    27
The hash chains are searched starting with the most recent strings, to
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    28
favor small distances and thus take advantage of the Huffman encoding.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    29
The hash chains are singly linked. There are no deletions from the
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    30
hash chains, the algorithm simply discards matches that are too old.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    31
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    32
To avoid a worst-case situation, very long hash chains are arbitrarily
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    33
truncated at a certain length, determined by a runtime option (level
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    34
parameter of deflateInit). So deflate() does not always find the longest
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    35
possible match but generally finds a match which is long enough.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    36
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    37
deflate() also defers the selection of matches with a lazy evaluation
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    38
mechanism. After a match of length N has been found, deflate() searches for
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    39
a longer match at the next input byte. If a longer match is found, the
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    40
previous match is truncated to a length of one (thus producing a single
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    41
literal byte) and the process of lazy evaluation begins again. Otherwise,
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    42
the original match is kept, and the next match search is attempted only N
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    43
steps later.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    44
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    45
The lazy match evaluation is also subject to a runtime parameter. If
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    46
the current match is long enough, deflate() reduces the search for a longer
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    47
match, thus speeding up the whole process. If compression ratio is more
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    48
important than speed, deflate() attempts a complete second search even if
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    49
the first match is already long enough.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    50
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    51
The lazy match evaluation is not performed for the fastest compression
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    52
modes (level parameter 1 to 3). For these fast modes, new strings
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    53
are inserted in the hash table only when no match was found, or
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    54
when the match is not too long. This degrades the compression ratio
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    55
but saves time since there are both fewer insertions and fewer searches.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    56
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    57
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    58
2. Decompression algorithm (inflate)
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    59
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    60
2.1 Introduction
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    61
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    62
The key question is how to represent a Huffman code (or any prefix code) so
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    63
that you can decode fast.  The most important characteristic is that shorter
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    64
codes are much more common than longer codes, so pay attention to decoding the
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    65
short codes fast, and let the long codes take longer to decode.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    66
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    67
inflate() sets up a first level table that covers some number of bits of
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    68
input less than the length of longest code.  It gets that many bits from the
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    69
stream, and looks it up in the table.  The table will tell if the next
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    70
code is that many bits or less and how many, and if it is, it will tell
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    71
the value, else it will point to the next level table for which inflate()
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    72
grabs more bits and tries to decode a longer code.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    73
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    74
How many bits to make the first lookup is a tradeoff between the time it
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    75
takes to decode and the time it takes to build the table.  If building the
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    76
table took no time (and if you had infinite memory), then there would only
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    77
be a first level table to cover all the way to the longest code.  However,
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    78
building the table ends up taking a lot longer for more bits since short
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    79
codes are replicated many times in such a table.  What inflate() does is
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    80
simply to make the number of bits in the first table a variable, and  then
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    81
to set that variable for the maximum speed.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    82
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    83
For inflate, which has 286 possible codes for the literal/length tree, the size
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    84
of the first table is nine bits.  Also the distance trees have 30 possible
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    85
values, and the size of the first table is six bits.  Note that for each of
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    86
those cases, the table ended up one bit longer than the ``average'' code
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    87
length, i.e. the code length of an approximately flat code which would be a
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    88
little more than eight bits for 286 symbols and a little less than five bits
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    89
for 30 symbols.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    90
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    91
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    92
2.2 More details on the inflate table lookup
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    93
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    94
Ok, you want to know what this cleverly obfuscated inflate tree actually
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    95
looks like.  You are correct that it's not a Huffman tree.  It is simply a
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    96
lookup table for the first, let's say, nine bits of a Huffman symbol.  The
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    97
symbol could be as short as one bit or as long as 15 bits.  If a particular
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    98
symbol is shorter than nine bits, then that symbol's translation is duplicated
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
    99
in all those entries that start with that symbol's bits.  For example, if the
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   100
symbol is four bits, then it's duplicated 32 times in a nine-bit table.  If a
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   101
symbol is nine bits long, it appears in the table once.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   102
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   103
If the symbol is longer than nine bits, then that entry in the table points
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   104
to another similar table for the remaining bits.  Again, there are duplicated
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   105
entries as needed.  The idea is that most of the time the symbol will be short
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   106
and there will only be one table look up.  (That's whole idea behind data
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   107
compression in the first place.)  For the less frequent long symbols, there
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   108
will be two lookups.  If you had a compression method with really long
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   109
symbols, you could have as many levels of lookups as is efficient.  For
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   110
inflate, two is enough.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   111
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   112
So a table entry either points to another table (in which case nine bits in
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   113
the above example are gobbled), or it contains the translation for the symbol
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   114
and the number of bits to gobble.  Then you start again with the next
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   115
ungobbled bit.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   116
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   117
You may wonder: why not just have one lookup table for how ever many bits the
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   118
longest symbol is?  The reason is that if you do that, you end up spending
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   119
more time filling in duplicate symbol entries than you do actually decoding.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   120
At least for deflate's output that generates new trees every several 10's of
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   121
kbytes.  You can imagine that filling in a 2^15 entry table for a 15-bit code
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   122
would take too long if you're only decoding several thousand symbols.  At the
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   123
other extreme, you could make a new table for every bit in the code.  In fact,
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   124
that's essentially a Huffman tree.  But then you spend two much time
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   125
traversing the tree while decoding, even for short symbols.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   126
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   127
So the number of bits for the first lookup table is a trade of the time to
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   128
fill out the table vs. the time spent looking at the second level and above of
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   129
the table.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   130
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   131
Here is an example, scaled down:
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   132
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   133
The code being decoded, with 10 symbols, from 1 to 6 bits long:
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   134
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   135
A: 0
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   136
B: 10
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   137
C: 1100
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   138
D: 11010
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   139
E: 11011
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   140
F: 11100
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   141
G: 11101
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   142
H: 11110
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   143
I: 111110
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   144
J: 111111
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   145
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   146
Let's make the first table three bits long (eight entries):
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   147
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   148
000: A,1
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   149
001: A,1
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   150
010: A,1
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   151
011: A,1
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   152
100: B,2
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   153
101: B,2
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   154
110: -> table X (gobble 3 bits)
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   155
111: -> table Y (gobble 3 bits)
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   156
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   157
Each entry is what the bits decode as and how many bits that is, i.e. how
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   158
many bits to gobble.  Or the entry points to another table, with the number of
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   159
bits to gobble implicit in the size of the table.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   160
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   161
Table X is two bits long since the longest code starting with 110 is five bits
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   162
long:
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   163
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   164
00: C,1
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   165
01: C,1
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   166
10: D,2
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   167
11: E,2
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   168
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   169
Table Y is three bits long since the longest code starting with 111 is six
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   170
bits long:
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   171
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   172
000: F,2
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   173
001: F,2
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   174
010: G,2
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   175
011: G,2
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   176
100: H,2
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   177
101: H,2
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   178
110: I,3
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   179
111: J,3
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   180
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   181
So what we have here are three tables with a total of 20 entries that had to
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   182
be constructed.  That's compared to 64 entries for a single table.  Or
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   183
compared to 16 entries for a Huffman tree (six two entry tables and one four
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   184
entry table).  Assuming that the code ideally represents the probability of
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   185
the symbols, it takes on the average 1.25 lookups per symbol.  That's compared
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   186
to one lookup for the single table, or 1.66 lookups per symbol for the
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   187
Huffman tree.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   188
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   189
There, I think that gives you a picture of what's going on.  For inflate, the
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   190
meaning of a particular symbol is often more than just a letter.  It can be a
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   191
byte (a "literal"), or it can be either a length or a distance which
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   192
indicates a base value and a number of bits to fetch after the code that is
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   193
added to the base value.  Or it might be the special end-of-block code.  The
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   194
data structures created in inftrees.c try to encode all that information
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   195
compactly in the tables.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   196
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   197
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   198
Jean-loup Gailly        Mark Adler
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   199
jloup@gzip.org          madler@alumni.caltech.edu
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   200
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   201
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   202
References:
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   203
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   204
[LZ77] Ziv J., Lempel A., ``A Universal Algorithm for Sequential Data
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   205
Compression,'' IEEE Transactions on Information Theory, Vol. 23, No. 3,
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   206
pp. 337-343.
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   207
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   208
``DEFLATE Compressed Data Format Specification'' available in
1918ee327afb Revision: 200952
Alex Gilkes <alex.gilkes@nokia.com>
parents:
diff changeset
   209
http://www.ietf.org/rfc/rfc1951.txt