--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/symbian-qemu-0.9.1-12/python-2.6.1/Objects/dictnotes.txt Fri Jul 31 15:01:17 2009 +0100
@@ -0,0 +1,270 @@
+NOTES ON OPTIMIZING DICTIONARIES
+================================
+
+
+Principal Use Cases for Dictionaries
+------------------------------------
+
+Passing keyword arguments
+ Typically, one read and one write for 1 to 3 elements.
+    Occurs frequently in normal Python code.
+
+Class method lookup
+ Dictionaries vary in size with 8 to 16 elements being common.
+ Usually written once with many lookups.
+ When base classes are used, there are many failed lookups
+ followed by a lookup in a base class.
+
+Instance attribute lookup and Global variables
+ Dictionaries vary in size. 4 to 10 elements are common.
+ Both reads and writes are common.
+
+Builtins
+ Frequent reads. Almost never written.
+    Size: 126 interned strings (as of Py2.3b1).
+ A few keys are accessed much more frequently than others.
+
+Uniquification
+ Dictionaries of any size. Bulk of work is in creation.
+ Repeated writes to a smaller set of keys.
+ Single read of each key.
+ Some use cases have two consecutive accesses to the same key.
+
+ * Removing duplicates from a sequence.
+ dict.fromkeys(seqn).keys()
+
+ * Counting elements in a sequence.
+ for e in seqn:
+ d[e] = d.get(e,0) + 1
+
+ * Accumulating references in a dictionary of lists:
+
+ for pagenumber, page in enumerate(pages):
+ for word in page:
+ d.setdefault(word, []).append(pagenumber)
+
+    Note that the second example is a use case characterized by a get and set
+ to the same key. There are similar use cases with a __contains__
+ followed by a get, set, or del to the same key. Part of the
+ justification for d.setdefault is combining the two lookups into one.
+
+Membership Testing
+    Dictionaries of any size.  Created once and then rarely changed.
+ Single write to each key.
+ Many calls to __contains__() or has_key().
+ Similar access patterns occur with replacement dictionaries
+ such as with the % formatting operator.
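As a concrete illustration of this build-once, read-many pattern (the keys and values here are invented), a replacement dictionary is written a single time and then serves many membership tests and %-formatting lookups:

```python
# Build stage: each key is written exactly once.
replacements = {"user": "alice", "host": "example.org"}

# Lookup stage: membership tests and %-formatting reads dominate.
assert "user" in replacements
print("%(user)s@%(host)s" % replacements)   # prints alice@example.org
```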
+
+Dynamic Mappings
+ Characterized by deletions interspersed with adds and replacements.
+ Performance benefits greatly from the re-use of dummy entries.
+
+
+Data Layout (assuming a 32-bit box with 64 bytes per cache line)
+----------------------------------------------------------------
+
+Smalldicts (8 entries) are attached to the dictobject structure
+and the whole group nearly fills two consecutive cache lines.
+
+Larger dicts use the first half of the dictobject structure (one cache
+line) and a separate, contiguous block of entries (at 12 bytes each
+for a total of 5.333 entries per cache line).
+
+
+Tunable Dictionary Parameters
+-----------------------------
+
+* PyDict_MINSIZE. Currently set to 8.
+    Must be a power of two.  New dicts have to zero out every cell.
+ Each additional 8 consumes 1.5 cache lines. Increasing improves
+ the sparseness of small dictionaries but costs time to read in
+ the additional cache lines if they are not already in cache.
+ That case is common when keyword arguments are passed.
+
+* Maximum dictionary load in PyDict_SetItem. Currently set to 2/3.
+ Increasing this ratio makes dictionaries more dense resulting
+ in more collisions. Decreasing it improves sparseness at the
+ expense of spreading entries over more cache lines and at the
+ cost of total memory consumed.
+
+ The load test occurs in highly time sensitive code. Efforts
+ to make the test more complex (for example, varying the load
+ for different sizes) have degraded performance.
+
+* Growth rate upon hitting maximum load. Currently set to *2.
+ Raising this to *4 results in half the number of resizes,
+ less effort to resize, better sparseness for some (but not
+ all dict sizes), and potentially doubles memory consumption
+ depending on the size of the dictionary. Setting to *4
+ eliminates every other resize step.
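A toy model (not CPython's actual resize code) shows how the growth factor trades resize count against memory, assuming the 2/3 maximum load, power-of-two table sizes, and PyDict_MINSIZE of 8 described above:

```python
def resizes(n_inserts, growth, minsize=8, max_load=2.0 / 3.0):
    """Count resizes while inserting n_inserts distinct keys,
    resizing whenever the table would exceed max_load.
    A simplified model; CPython's actual policy differs in detail."""
    size, used, count = minsize, 0, 0
    for _ in range(n_inserts):
        used += 1
        if used > size * max_load:
            size *= growth
            count += 1
    return count

# Growing by *4 eliminates roughly every other resize step.
print(resizes(10000, 2), resizes(10000, 4))
```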
+
+* Maximum sparseness (minimum dictionary load). What percentage
+ of entries can be unused before the dictionary shrinks to
+ free up memory and speed up iteration? (The current CPython
+ code does not represent this parameter directly.)
+
+* Shrinkage rate upon exceeding maximum sparseness. The current
+ CPython code never even checks sparseness when deleting a
+ key. When a new key is added, it resizes based on the number
+ of active keys, so that the addition may trigger shrinkage
+ rather than growth.
+
+Tune-ups should be measured across a broad range of applications and
+use cases. A change to any parameter will help in some situations and
+hurt in others. The key is to find settings that help the most common
+cases and do the least damage to the less common cases. Results will
+vary dramatically depending on the exact number of keys, whether the
+keys are all strings, whether reads or writes dominate, the exact
+hash values of the keys (some sets of values have fewer collisions than
+others). Any one test or benchmark is likely to prove misleading.
+
+While making a dictionary more sparse reduces collisions, it impairs
+iteration and key listing. Those methods loop over every potential
+entry.  Doubling the size of a dictionary results in twice as many
+non-overlapping memory accesses for keys(), items(), values(),
+__iter__(), iterkeys(), iteritems(), itervalues(), and update().
+Also, every dictionary iterates at least twice, once for the memset()
+when it is created and once by dealloc().
+
+Dictionary operations involving only a single key can be O(1) unless
+resizing is possible. By checking for a resize only when the
+dictionary can grow (and may *require* resizing), other operations
+remain O(1), and the odds of resize thrashing or memory fragmentation
+are reduced. In particular, an algorithm that empties a dictionary
+by repeatedly invoking .pop will see no resizing, which might
+not be necessary at all because the dictionary is eventually
+discarded entirely.
+
+
+Results of Cache Locality Experiments
+-------------------------------------
+
+When an entry is retrieved from memory, 4.333 adjacent entries are also
+retrieved into a cache line. Since accessing items in cache is *much*
+cheaper than a cache miss, an enticing idea is to probe the adjacent
+entries as a first step in collision resolution. Unfortunately, the
+introduction of any regularity into collision searches results in more
+collisions than the current random chaining approach.
+
+Exploiting cache locality at the expense of additional collisions fails
+to pay off when the entries are already loaded in cache (the expense
+is paid with no compensating benefit). This occurs in small dictionaries
+where the whole dictionary fits into a pair of cache lines. It also
+occurs frequently in large dictionaries which have a common access pattern
+where some keys are accessed much more frequently than others. The
+more popular entries *and* their collision chains tend to remain in cache.
+
+To exploit cache locality, change the collision resolution section
+in lookdict() and lookdict_string(). Set i^=1 at the top of the
+loop and move the i = (i << 2) + i + perturb + 1 to an unrolled
+version of the loop.
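The collision loop referred to above can be modelled in Python. This mirrors the recurrence in lookdict(), including CPython's PERTURB_SHIFT of 5, but it is only a sketch of the C code, not the implementation itself:

```python
PERTURB_SHIFT = 5  # as in CPython's dictobject.c

def probe_sequence(h, mask, n):
    """Return the first n table slots probed for hash value h
    in a table of size mask + 1 (a model of lookdict()'s loop)."""
    i = h & mask
    perturb = h
    slots = [i]
    for _ in range(n - 1):
        i = (i << 2) + i + perturb + 1   # i.e. i = 5*i + perturb + 1
        perturb >>= PERTURB_SHIFT
        slots.append(i & mask)           # only the slot index is masked
    return slots

# The sequence scatters quickly and, for a power-of-two table,
# eventually visits every slot.
print(probe_sequence(17, 7, 5))
```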
+
+This optimization strategy can be leveraged in several ways:
+
+* If the dictionary is kept sparse (through the tunable parameters),
+then the occurrence of additional collisions is lessened.
+
+* If lookdict() and lookdict_string() are specialized for small dicts
+and for large dicts, then the versions for large dicts can be given
+an alternate search strategy without increasing collisions in small dicts
+which already have the maximum benefit of cache locality.
+
+* If the use case for a dictionary is known to have a random key
+access pattern (as opposed to a more common pattern with a Zipf's law
+distribution), then there will be more benefit for large dictionaries
+because any given key is no more likely than another to already be
+in cache.
+
+* In use cases with paired accesses to the same key, the second access
+is always in cache and gets no benefit from efforts to further improve
+cache locality.
+
+Optimizing the Search of Small Dictionaries
+-------------------------------------------
+
+If lookdict() and lookdict_string() are specialized for smaller dictionaries,
+then a custom search approach can be implemented that exploits the small
+search space and cache locality.
+
+* The simplest example is a linear search of contiguous entries. This is
+ simple to implement, guaranteed to terminate rapidly, never searches
+ the same entry twice, and precludes the need to check for dummy entries.
+
+* A more advanced example is a self-organizing search so that the most
+ frequently accessed entries get probed first. The organization
+ adapts if the access pattern changes over time. Treaps are ideally
+ suited for self-organization with the most common entries at the
+  top of the heap and a rapid binary search pattern.  Most probes and
+  results are located at the top of the tree, allowing them to fit
+  in one or two cache lines.
+
+* Also, small dictionaries may be made more dense, perhaps filling all
+  eight cells to take maximum advantage of two cache lines.
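A minimal sketch of the first idea above, a linear scan of densely packed entries. The tuple layout here is illustrative only, not CPython's actual entry struct:

```python
# Toy model of a small-table linear search: entries are stored
# contiguously, so the scan terminates rapidly, never revisits a
# slot, and never needs to distinguish dummy (deleted) slots.
def small_lookup(entries, key):
    """entries: a dense list of (hash, key, value) tuples."""
    h = hash(key)
    for eh, ek, ev in entries:
        if eh == h and ek == key:   # cheap hash compare first
            return ev
    raise KeyError(key)

table = [(hash(k), k, v) for k, v in [("x", 1), ("y", 2)]]
print(small_lookup(table, "y"))     # prints 2
```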
+
+
+Strategy Pattern
+----------------
+
+Consider allowing the user to set the tunable parameters or to select a
+particular search method. Since some dictionary use cases have known
+sizes and access patterns, the user may be able to provide useful hints.
+
+1) For example, if membership testing or lookups dominate runtime and memory
+ is not at a premium, the user may benefit from setting the maximum load
+ ratio at 5% or 10% instead of the usual 66.7%. This will sharply
+ curtail the number of collisions but will increase iteration time.
+ The builtin namespace is a prime example of a dictionary that can
+ benefit from being highly sparse.
+
+2) Dictionary creation time can be shortened in cases where the ultimate
+ size of the dictionary is known in advance. The dictionary can be
+ pre-sized so that no resize operations are required during creation.
+   Not only does this save resizes, but key insertion also goes
+   more quickly because the first half of the keys are inserted into
+   a sparser environment than they otherwise would be.  The preconditions for this
+ strategy arise whenever a dictionary is created from a key or item
+ sequence and the number of *unique* keys is known.
+
+3) If the key space is large and the access pattern is known to be random,
+ then search strategies exploiting cache locality can be fruitful.
+ The preconditions for this strategy arise in simulations and
+ numerical analysis.
+
+4) If the keys are fixed and the access pattern strongly favors some of
+ the keys, then the entries can be stored contiguously and accessed
+ with a linear search or treap. This exploits knowledge of the data,
+ cache locality, and a simplified search routine. It also eliminates
+ the need to test for dummy entries on each probe. The preconditions
+ for this strategy arise in symbol tables and in the builtin dictionary.
+
+
+Readonly Dictionaries
+---------------------
+Some dictionary use cases pass through a build stage and then move to a
+more heavily exercised lookup stage with no further changes to the
+dictionary.
+
+An idea that emerged on python-dev is to be able to convert a dictionary
+to a read-only state. This can help prevent programming errors and also
+provide knowledge that can be exploited for lookup optimization.
+
+The dictionary can be immediately rebuilt (eliminating dummy entries),
+resized (to an appropriate level of sparseness), and the keys can be
+jostled (to minimize collisions). The lookdict() routine can then
+eliminate the test for dummy entries (saving about 1/4 of the time
+spent in the collision resolution loop).
+
+An additional possibility is to insert links into the empty spaces
+so that dictionary iteration can proceed in len(d) steps instead of
+(mp->mask + 1) steps. Alternatively, a separate tuple of keys can be
+kept just for iteration.
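A Python-level illustration of the read-only idea (the python-dev proposal targets the C internals; this sketch merely blocks mutation once the build stage is over):

```python
class ReadOnlyDict(dict):
    """Illustration only: freeze a dict after its build stage.
    The actual proposal would also rebuild, resize, and re-probe
    the table at the C level once it becomes read-only."""
    def _readonly(self, *args, **kwargs):
        raise TypeError("dictionary is read-only")
    __setitem__ = __delitem__ = _readonly
    clear = pop = popitem = setdefault = update = _readonly

d = ReadOnlyDict(a=1, b=2)   # build stage happens in the constructor
print(d["a"])                # lookups still work; prints 1
```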
+
+
+Caching Lookups
+---------------
+The idea is to exploit key access patterns by anticipating future lookups
+based on previous lookups.
+
+The simplest incarnation is to save the most recently accessed entry.
+This gives optimal performance for use cases where every get is followed
+by a set or del to the same key.
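That simplest incarnation can be sketched as a Python-level wrapper rather than a change to the C lookup routine (the class and method names here are invented):

```python
class LastEntryCache(object):
    """Toy model: remember the most recently accessed (key, value)
    pair so that a get followed by a set or get to the same key
    avoids a second full hash-table probe."""
    def __init__(self, d):
        self.d = d
        self.last = None            # (key, value) of the last access

    def get(self, key, default=None):
        if self.last is not None and self.last[0] == key:
            return self.last[1]     # served from the one-entry cache
        value = self.d.get(key, default)
        self.last = (key, value)
        return value

    def set(self, key, value):
        self.d[key] = value
        self.last = (key, value)

c = LastEntryCache({})
c.set("n", c.get("n", 0) + 1)       # counting: get then set, same key
print(c.d["n"])                     # prints 1
```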