MCL/sftools/depl/docscontent: comparison Symbian3/SDK/Source/GUID-C2FAEBB2-4A1A-5BB0-9670-4801525CBC6A.dita

equal deleted inserted replaced

-:43e37759235e
+:51a74ef9ed63
+<?xml version="1.0" encoding="utf-8"?>
+<!-- Copyright (c) 2007-2010 Nokia Corporation and/or its subsidiary(-ies) All rights reserved. -->
+<!-- This component and the accompanying materials are made available under the terms of the License
+"Eclipse Public License v1.0" which accompanies this distribution,
+and is available at the URL "http://www.eclipse.org/legal/epl-v10.html". -->
+<!-- Initial Contributors:
+Nokia Corporation - initial contribution.
+Contributors:
+-->
+<!DOCTYPE concept
+PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="GUID-C2FAEBB2-4A1A-5BB0-9670-4801525CBC6A" xml:lang="en"><title>SQL Index
+Tips</title><shortdesc>This document includes several tips for using SQL indexes.</shortdesc><prolog><metadata><keywords/></metadata></prolog><conbody>
+<section id="GUID-3895F9D0-DE9C-4375-B541-AC99CABB7B8A"><title>Introduction</title> <p>You can use indexes to speed up access.
+You create indexes automatically using PRIMARY KEY and UNIQUE. </p> <p><b>Intended
+audience:</b> </p> <p>This document is intended to be used by Symbian platform
+licensees and third party application developers. </p> </section>
+<section id="GUID-765F0DF1-ACB0-57DB-B9A8-3697E4637065"><title>Use an Index
+to Speed up Access</title> <p>Suppose you have a table like this: </p> <codeblock id="GUID-F70B25AB-A151-52CE-A413-1C62A2464D6A" xml:space="preserve">
+CREATE TABLE demo5(
+id INTEGER,
+content BLOB
+);
+</codeblock> <p>Further suppose that this table contains thousands or millions
+of rows and you want to access a single row with a particular ID: </p> <codeblock id="GUID-B02FA452-4093-5383-BAFA-AE035919D720" xml:space="preserve">
+SELECT content FROM demo5 WHERE id=?
+</codeblock> <p>The only want that SQLite can perform this query, and be certain
+to get every row with the chosen ID, is to examine every single row, check
+the ID of that row, and return the content if the ID matches. Examining every
+single row this way is called a <i>full table scan</i>. </p> <p>Reading and
+checking every row of a large table can be very slow, so you want to avoid
+full table scans. The usual way to do this is to create an index on the column
+you are searching against. In the example above, an appropriate index would
+be this: </p> <codeblock id="GUID-82E337F1-2CA2-51B0-A7BC-071A83779A18" xml:space="preserve">
+CREATE INDEX demo5_idx1 ON demo5(id);
+</codeblock> <p>With an index on the ID column, SQLite is able to use a binary
+search to locate entries that contain a particular value of ID. So if the
+table contains a million rows, the query can be satisfied with about 20 accesses
+rather than 1000000 accesses. This is a huge performance improvement. </p> <p>One
+of the features of the SQL language is that you do not have to figure out
+what indexes you may need in advance of coding your application. It is perfectly
+acceptable, even preferable, to write the code for your application using
+a database without any indexes. Then once the application is running and you
+can make speed measurements, add whatever indexes are needed in order to make
+it run faster. </p> <p>When you add indexes, the query optimizer within the
+SQL compiler is able to find new more efficient bytecode procedures for carrying
+out the operations that your SQL statements specify. In other words, by adding
+indexes late in the development cycle you have the power to completely reorganize
+your data access patterns without changing a single line of code. </p> </section>
+<section id="GUID-BB1F17C5-1174-5DF4-AA61-611173237F3F"><title>Create Indexes
+Automatically Using PRIMARY KEY and UNIQUE</title> <p>Any column of a table
+that is declared to be the PRIMARY KEY or that is declared UNIQUE will be
+indexed automatically. There is no need to create a separate index on that
+column using the CREATE INDEX statement. So, for example, this table declaration: </p> <codeblock id="GUID-E4BE6077-F639-5CE7-964A-276B0D58A129" xml:space="preserve">
+CREATE TABLE demo39a(
+id INTEGER,
+content BLOB
+);
+CREATE INDEX demo39_idx1 ON demo39a(id);
+</codeblock> <p>Is roughly equivalent to the following: </p> <codeblock id="GUID-DB3167E0-FA95-50CA-92C7-102B5C2C13E3" xml:space="preserve">
+CREATE TABLE demo39b(
+id INTEGER UNIQUE,
+content BLOB
+);
+</codeblock> <p>The two examples above are “roughly” equivalent, but not exactly
+equivalent. Both tables have an index on the ID column. In the first case,
+the index is created explicitly. In the second case, the index is implied
+by the UNIQUE keyword in the type declaration of the ID column. Both table
+designs use exactly the same amount of disk space, and both will run queries
+such as </p> <codeblock id="GUID-7ACAE270-6D20-557B-B7D1-C90EDD757E43" xml:space="preserve">
+SELECT content FROM demo39 WHERE id=?
+</codeblock> <p>using exactly the same bytecode. The only difference is that
+table demo39a lets you insert multiple rows with the same ID whereas table
+demo39b will raise an exception if you try to insert a new row with the same
+ID as an existing row. </p> <p>If you use the UNIQUE keyword in the CREATE
+INDEX statement of demo39a, like this: </p> <codeblock id="GUID-0EE5E186-CC4A-5CC3-AEAE-F1482F1F8F9A" xml:space="preserve">
+CREATE UNIQUE INDEX demo39_idx1 ON demo39a(id);
+</codeblock> <p>Then both table designs really would be exactly the same in
+every way. In fact, whenever SQLite sees the UNIQUE keyword on a column type
+declaration, all it does is create an automatic unique index on that column. </p> <p>The
+PRIMARY KEY modifier on a column type declaration works like UNIQUE; it causes
+a unique index to be created automatically. The main difference is that you
+are only allowed to have a single PRIMARY KEY. This restriction of only allowing
+a single PRIMARY KEY is part of the official SQL language definition. </p> <p>The
+idea is that a PRIMARY KEY is used to order the rows on disk. Some SQL database
+engines actually implement PRIMARY KEYs this way. But with SQLite, a PRIMARY
+KEY is like any other UNIQUE column, with only one exception: INTEGER PRIMARY
+KEY is a special case which is handled differently, as described in the next
+section. </p> </section>
+<section id="GUID-BF7A0301-8490-58ED-BB37-FAC403A84230"><title>Use Multi-Column
+Indexes</title> <p>SQLite is able to make use of multi-column indexes. The
+rule is that if an index is over columns <i>X</i>  <i> 0 </i>, <i>X</i>  <i> 1 </i>, <i>X</i>  <i> 2 </i>,
+..., <i>X</i>  <i> n </i> of some table, then the index can be used if the
+WHERE clause contains equality constraints for some prefix of those columns <i>X</i>  <i>0 </i>, <i>X</i>  <i>1 </i>, <i>X</i>  <i>2 </i>,
+..., <i>X</i>  <i>i </i> where <i>i</i> is less than <i>n</i>. </p> <p>As
+an example, suppose you have a table and index declared as follows: </p> <codeblock id="GUID-C18C97F7-23CA-5636-9F00-130A8FB3DEF5" xml:space="preserve">
+CREATE TABLE demo314(a,b,c,d,e,f,g);
+CREATE INDEX demo314_idx ON demo314(a,b,c,d,e,f);
+</codeblock> <p>Then the index might be used to help with a query that contained
+a WHERE clause like this: </p> <codeblock id="GUID-8A0944F4-1ACF-5267-B49F-EB83EFBB5670" xml:space="preserve">
+... WHERE a=1 AND b='Smith' AND c=1
+</codeblock> <p>All three terms of the WHERE clause would be used together
+with the index in order to narrow the search. But the index could not be used
+if there WHERE clause said: </p> <codeblock id="GUID-B5F1C17F-0F5E-5FC2-A9A4-DF19D699A076" xml:space="preserve">
+... WHERE b='Smith' AND c=1
+</codeblock> <p>The second WHERE clause does not contain equality terms for
+a prefix of the columns in the index because it omits a term for the “a” column. </p> <p>In
+a case like this: </p> <codeblock id="GUID-EF2CFE7D-0456-5414-847D-BADCC057CFD8" xml:space="preserve">
+... WHERE a=1 AND c=1
+</codeblock> <p>Only the “a=1” term in the WHERE clause could be used to help
+narrow the search. The “c=1” term is not part of the prefix of terms in the
+index which have equality constraints because there is no equality constraint
+on the “b” column. </p> <p>SQLite only allows a single index to be used per
+table within a simple SQL statement. For UPDATE and DELETE statements, this
+means that only a single index can ever be used, since those statements can
+only operate on a single table at a time. </p> <p>In a simple SELECT statement
+multiple indexes can be used if the SELECT statement is a join – one index
+per table in the join. In a compound SELECT statement (two or more SELECT
+statements connected by UNION or INTERSECT or EXCEPT) each SELECT statement
+is treated separately and can have its own indexes. Likewise, SELECT statements
+that appear in subexpressions are treated separately. </p> <p>Some other SQL
+database engines (for example PostgreSQL) allow multiple indexes to be used
+for each table in a SELECT. For example, if you had a table and index in PostgreSQL
+like this: </p> <codeblock id="GUID-F5DE8F24-7471-5992-9896-295CE173D855" xml:space="preserve">
+CREATE TABLE pg1(a INT, b INT, c INT, d INT);
+CREATE INDEX pg1_ix1 ON pg1(a);
+CREATE INDEX pg1_ix2 ON pg1(b);
+CREATE INDEX pg1_ix3 ON pg1(c);
+</codeblock> <p>And if you were to run a query like the following: </p> <codeblock id="GUID-A35663B7-4E7D-5CC0-BF5E-CF3A4CFED63F" xml:space="preserve">
+SELECT d FROM pg1 WHERE a=5 AND b=11 AND c=99;
+</codeblock> <p>Then PostgreSQL might attempt to optimize the query by using
+all three indexes, one for each term of the WHERE clause. </p> <p>SQLite does
+not work this way. SQLite is compelled to select a single index to use in
+the query. It might select any of the three indexes shown, depending on which
+one the optimizer things will give the best speedup. But in every case it
+will only select a single index and only a single term of the WHERE clause
+will be used. </p> <p>SQLite prefers to use a multi-column index such as this: </p> <codeblock id="GUID-40FB7075-1239-5089-BBC5-0D994F4A0C39" xml:space="preserve">
+CREATE INDEX pg1_ix_all ON pg1(a,b,c);
+</codeblock> <p>If the pg1_ix_all index is available for use when the SELECT
+statement above is prepared, SQLite will likely choose it over any of the
+single-column indexes because the multi-column index is able to make use of
+all 3 terms of the WHERE clause. </p> <p>You can trick SQLite into using multiple
+indexes on the same table by rewriting the query. Instead of the SELECT statement
+shown above, if you rewrite it as this: </p> <codeblock id="GUID-D7DE75D4-BB01-50DF-A9DC-956A83DED5D0" xml:space="preserve">
+SELECT d FROM pg1 WHERE RowID IN (
+SELECT RowID FROM pg1 WHERE a=5
+INTERSECT
+SELECT RowID FROM pg1 WHERE b=11
+INTERSECT
+SELECT RowID FROM pg1 WHERE c=99
+)
+</codeblock> <p>Then each of the individual SELECT statements will using a
+different single-column index and their results will be combined by the outer
+SELECT statement to give the correct result. The other SQL database engines
+like PostgreSQL that are able to make use of multiple indexes per table do
+so by treating the simpler SELECT statement shown first as if they where the
+more complicated SELECT statement shown here. </p> </section>
+<section id="GUID-E90057A8-70B6-590C-B8AE-616DA25BB543"><title>Use Inequality
+Constraints on the Last Index Term</title> <p>Terms in the WHERE clause of
+a query or UPDATE or DELETE statement are mostly likely to trigger the use
+of an index if they are an equality constraint – in other words if the term
+consists of the name of an indexed column, an equal sign (“=”), and an expression. </p> <p>So,
+for example, if you have a table and index that look like this: </p> <codeblock id="GUID-84AADB9D-5853-57C2-B489-87DC7FB7AADE" xml:space="preserve">
+CREATE TABLE demo315(a,b,c,d);
+CREATE INDEX demo315_idx1 ON demo315(a,b,c);
+</codeblock> <p>And a query like this: </p> <codeblock id="GUID-A2B7DA9F-DB82-5D06-80E2-7AF714E403D5" xml:space="preserve">
+SELECT d FROM demo315 WHERE a=512;
+</codeblock> <p>The single “a=512” term of the WHERE clause qualifies as an
+equality constraint and is likely to provoke the use of the demo315_idx1 index. </p> <p>SQLite
+supports two other kinds of equality constraints. One is the IN operator: </p> <codeblock id="GUID-EA5D7637-A6B8-5BC0-A72E-D576B0F945A3" xml:space="preserve">
+SELECT d FROM demo315 WHERE a IN (512,1024);
+SELECT d FROM demo315 WHERE a IN (SELECT x FROM someothertable);
+</codeblock> <p>There other is the IS NULL constraint: </p> <codeblock id="GUID-B2C1C84B-C33D-55C5-8484-24B28EFC8E37" xml:space="preserve">
+SELECT d FROM demo315 WHERE a IS NULL;
+</codeblock> <p>SQLite allows at most one term of an index to be constrained
+by an inequality such as less than “&lt;”, greater than “&gt;”, less than or
+equal to “&lt;=”, or greater than or equal to “&gt;=”. </p> <p>The column that
+the inequality constrains will be the right-most term of the index that is
+used. So, for example, in this query: </p> <codeblock id="GUID-563231B5-EC3A-57C2-BC6F-1A8129ADE308" xml:space="preserve">
+SELECT d FROM demo315 WHERE a=5 AND b&gt;11 AND c=1;
+</codeblock> <p>Only the first two terms of the WHERE clause will be used
+with the demo315_idx1 index. The third term, the “c=1” constraint, cannot
+be used because the “c” column occurs to the right of the “b” column in the
+index and the “b” column is constrained by an inequality. </p> <p>SQLite allows
+up to two inequalities on the same column as long as the two inequalities
+provide an upper and lower bound on the column. For example, in this query: </p> <codeblock id="GUID-4EB94886-EDFF-58F2-8692-011A67AC5A60" xml:space="preserve">
+SELECT d FROM demo315 WHERE a=5 AND b&gt;11 AND b&lt;23;
+</codeblock> <p>All three terms of the WHERE clause will be used because the
+two inequalities on the “b” column provide an upper and lower bound on the
+value of “b”. </p> <p>SQLite will only use the four inequalities mentioned
+above to help constrain a search: “&lt;”, “&gt;”, “&lt;=”, and “&gt;=”. Other inequality
+operators such as not equal to (“!=” or “&lt;&gt;”) and NOT NULL are not helpful
+to the query optimizer and will never be used to control an index and help
+make the query run faster. </p> </section>
+<section id="GUID-CAD0C181-37E7-578A-A7E1-7843447C247F"><title>Use Indexes
+To Help ORDER BY Clauses Evaluate Faster</title> <p>The default method for
+evaluating an ORDER BY clause in a SELECT statement is to first evaluate the
+SELECT statement and store the results in a temporary tables, then sort the
+temporary table according to the ORDER BY clause and scan the sorted temporary
+table to generate the final output. </p> <p>This method always works, but
+it requires three passes over the data (one pass to generate the result set,
+a second pass to sort the result set, and a third pass to output the results)
+and it requires a temporary storage space sufficiently large to contain the
+entire results set. </p> <p>Where possible, SQLite will avoid storing and
+sorting the result set by using an index that causes the results to emerge
+from the query in sorted order in the first place. </p> <p>The way to get
+SQLite to use an index for sorting is to provide an index that covers the
+same columns specified in the ORDER BY clause. For example, if the table and
+index are like this: </p> <codeblock id="GUID-F0103033-C5C8-5177-8AD7-70BCC45C33C9" xml:space="preserve">
+CREATE TABLE demo316(a,b,c,data);
+CREATE INDEX idx316 ON demo316(a,b,c);
+</codeblock> <p>And you do a query like this: </p> <codeblock id="GUID-D67BB6FF-E213-5B86-A2C1-E1992DA96A62" xml:space="preserve">
+SELECT data FROM demo316 ORDER BY a,b,c;
+</codeblock> <p>SQLite will use the idx316 index to implement the ORDER BY
+clause, obviating the need for temporary storage space and a separate sorting
+pass. </p> <p>An index can be used to satisfy the search constraints of a
+WHERE clause and to impose the ORDER BY ordering of outputs all at once. The
+trick is for the ORDER BY clause terms to occur immediately after the WHERE
+clause terms in the index. For example, one can write: </p> <codeblock id="GUID-02063968-34B5-5766-9D02-86D696D39C1E" xml:space="preserve">
+SELECT data FROM demo316 WHERE a=5 ORDER BY b,c;
+</codeblock> <p>The “a” column is used in the WHERE clause and the immediately
+following terms of the index, “b” and “c” are used in the ORDER BY clause.
+So in this case the idx316 index would be used both to speed up the search
+and to satisfy the ORDER BY clause. </p> <p>This query also uses the idx316
+index because, once again, the ORDER BY clause term “c” immediate follows
+the WHERE clause terms “a” and “b” in the index: </p> <codeblock id="GUID-6760EC7E-E86A-5EBD-BDDD-32A68BE78A9E" xml:space="preserve">
+SELECT data FROM demo316 WHERE a=5 AND b=17 ORDER BY c;
+</codeblock> <p>But now consider this: </p> <codeblock id="GUID-9363996C-8C30-5E04-B05F-392C8262F1F6" xml:space="preserve">
+SELECT data FROM demo316 WHERE a=5 ORDER BY c;
+</codeblock> <p>Here there is a gap between the ORDER BY term “c” and the
+WHERE clause term “a”. So the idx316 index cannot be used to satisfy both
+the WHERE clause and the ORDER BY clause. The index will be used on the WHERE
+clause and a separate sorting pass will occur to put the results in the correct
+order. </p> </section>
+<section id="GUID-109AF0DA-A054-504A-A432-76BD145B2AC4"><title>Add Result
+Columns To The End Of Indexes</title> <p>Queries will sometimes run faster
+if their result columns appear in the right-most entries of an index. Consider
+the following example: </p> <codeblock id="GUID-63292052-B523-5671-B3EE-E10A66C7275F" xml:space="preserve">
+CREATE TABLE demo317(a,b,c,data);
+CREATE INDEX idx317 ON demo316(a,b,c);
+</codeblock> <p>A query where all result column terms appears in the index,
+such as </p> <codeblock id="GUID-41F740E7-EAFC-583B-BFE6-E63DBEA354D7" xml:space="preserve">
+SELECT c FROM demo317 WHERE a=5 ORDER BY b;
+</codeblock> <p>will typically run about twice as fast or faster than a query
+that uses columns that are not in the index, e.g. </p> <codeblock id="GUID-098752F4-304A-5A84-834E-240D97D97C2D" xml:space="preserve">
+SELECT data FROM demo317 WHERE a=5 ORDER BY b;
+</codeblock> <p>The reason for this is that when all information is contained
+within the index entry only a single search has to be made for each row of
+output. But when some of the information is in the index and other parts are
+in the table, first there must be a search for the appropriate index entry
+then a separate search is made for the appropriate table row based on the
+RowID found in the index entry. Twice as much searching has to be done for
+each row of output generated. </p> <p>The extra query speed does not come
+for free, however. Adding additional columns to an index makes the database
+file larger. So when developing an application, the programmer will need to
+make a space versus time trade-off to determine whether the extra columns
+should be added to the index or not. </p> <p>Note that if any column of the
+result must be obtained from the original table, then the table row will have
+to be searched for anyhow. There will be no speed advantage, so you might
+as well omit the extra columns from the end of the index and save on storage
+space. The speed-up described in this section can only be realized when every
+column in a table is obtainable from the index. </p> <p>Taking into account
+the results of the previous few sections, the best set of columns to put in
+an index can be described as follows: </p> <ul>
+<li id="GUID-EBF4DEFB-2F5F-5D78-92FA-06FEAB0C3650"><p>The first columns in
+the index should be columns that have equality constraints in the WHERE clause
+of the query. </p> </li>
+<li id="GUID-E5CB725C-6304-5946-9E18-E69B5F1A6A88"><p>The second group of
+columns should match the columns specified in the ORDER BY clause. </p> </li>
+<li id="GUID-FBC00251-C3AD-5AC0-9102-EF66EA37DE4E"><p>Add additional columns
+to the end of the index that are used in the result set of the query. </p> </li>
+</ul> </section>
+<section id="GUID-D7B5B389-E031-5512-8186-235B22F0D9C1"><title>Resolve Indexing
+Ambiguities Using the Unary “+” Operator</title> <p>The SQLite query optimizer
+usually does a good job of choosing the best index to use for a particular
+query, especially if ANALYZE has been run to provide it with index performance
+statistics. But occasions do arise where it is useful to give the optimizer
+hints. </p> <p>One of the easiest ways to control the operation of the optimizer
+is to disqualify terms in the WHERE clause or ORDER BY clause as candidates
+for optimization by using the unary “+” operator. </p> <p>In SQLite, a unary
+“+” operator is a no-op. It makes no change to its operand, even if the operand
+is something other than a number. So you can always prefix a “+” to an expression
+in without changing the meaning of the expression. As the optimizer will only
+use terms in WHERE, HAVING, or ON clauses that have an index column name on
+one side of a comparison operator, you can prevent such a term from being
+used by the optimizer by prefixing the column name with a “+”. </p> <p>For
+example, suppose you have a database with a schema like this: </p> <codeblock id="GUID-E7747EFD-FE58-5EA4-88B3-097C0A303F52" xml:space="preserve">
+CREATE TABLE demo321(a,b,c,data);
+CREATE INDEX idx321a ON demo321(a);
+CREATE INDEX idx321b ON demo321(b);
+</codeblock> <p>If you issue a query such as this: </p> <codeblock id="GUID-87BD59FC-33A8-598B-B91F-607B26F7349D" xml:space="preserve">
+SELECT data FROM demo321 WHERE a=5 AND b=11;
+</codeblock> <p>The query optimizer might use the “a=5” term with idx321a
+or it might use the “b=11” term with the idx321b index. But if you want to
+force the use of the idx321a index you can accomplish that by disqualifying
+the second term of the WHERE clause as a candidate for optimization using
+a unary “+” like this: </p> <codeblock id="GUID-E6EAB459-726A-5FE4-8065-6C46AC2C5B5C" xml:space="preserve">
+SELECT data FROM demo321 WHERE a=5 AND +b=11;
+</codeblock> <p>The “+” in front of the “b=11” turns the left-hand side of
+the equals comparison operator into an expression instead of an indexed column
+name. The optimizer will then not recognize that the second term can be used
+with an index and so the optimizer is compelled to use the first “a=5” term. </p> <p>The
+unary “+” operator can also be used to disable ORDER BY clause optimizations.
+Consider this query: </p> <codeblock id="GUID-0488D466-77B7-50E0-AB85-FF033A2D75DC" xml:space="preserve">
+SELECT data FROM demo321 WHERE a=5 ORDER BY b;
+</codeblock> <p>The optimizer has the choice of using the “a=5” term of the
+WHERE clause with idx321a to restrict the search. Or it might choose to use
+do a full table scan with idx321b to satisfy the ORDER BY clause and thus
+avoid a separate sorting pass. You can force one choice or the other using
+a unary “+”. </p> <p>To force the use of idx321a on the WHERE clause, add
+the unary “+” in from of the “b” in the ORDER BY clause: </p> <codeblock id="GUID-E55A085F-D91F-58E0-B964-317BB3A9D7ED" xml:space="preserve">
+SELECT data FROM demo321 WHERE a=5 ORDER BY +b;
+</codeblock> <p>To go the other way and force the idx321b index to be used
+to satisfy the ORDER BY clause, disqualify the WHERE term by prefixing with
+a unary “+”: </p> <codeblock id="GUID-D97EF52A-1F74-57EB-AC11-7911B4E088B3" xml:space="preserve">
+SELECT data FROM demo321 WHERE +a=5 ORDER BY b;
+</codeblock> <p>The reader is cautioned not to overuse the unary “+” operator.
+The SQLite query optimizer usually picks the best index without any outside
+help. Premature use of unary “+” can confuse the optimizer and cause less
+than optimal performance. But in some cases it is useful to be able override
+the decisions of the optimizer, and the unary “+” operator is an excellent
+way to do this when it becomes necessary. </p> </section>
+<section id="GUID-7BEBC49C-0528-5D58-9626-2A92F3D0D9E8"><title>Avoid Indexing
+Large BLOBs and CLOBs</title> <p>SQLite stores indexes as b-trees. Each b-tree
+node uses one page of the database file. In order to maintain an acceptable
+fan-out, the b-tree module within SQLite requires that at least 4 entries
+must fit on each page of a b-tree. There is also some overhead associated
+with each b-tree page. So at the most there is about 250 bytes of space available
+on the main b-tree page for each index entry. </p> <p>If an index entry exceeds
+this allotment of approximately 250 bytes excess bytes are spilled to overflow
+pages. There is no arbitrary limit on the number of overflow pages or on the
+length of a b-tree entry, but for maximum efficiency it is best to avoid overflow
+pages, especially in indexes. This means that you should strive to keep the
+number of bytes in each index entry below 250. </p> <p>If you keep the size
+of indexes significantly smaller than 250 bytes, then the b-tree fan-out is
+increased and the binary search algorithm used to search for entries in an
+index has fewer pages to examine and therefore runs faster. So the fewer bytes
+used in each index entry the better, at least from a performance perspective. </p> <p>For
+these reasons, it is recommended that you avoid indexing large BLOBs and CLOBs.
+SQLite will continue to work when large BLOBs and CLOBs are indexed, but there
+will be a performance impact. </p> <p>On the other hand, if you need to lookup
+entries using a large BLOB or CLOB as the key, then by all means use an index.
+An index on a large BLOB or CLOB is not as fast as an index using more compact
+data types such as integers, but it is still many order of magnitude faster
+than doing a full table scan. So to be more precise, the advice of this section
+is that you should design your applications so that you do not need to lookup
+entries using a large BLOB or CLOB as the key. Try to arrange to have compact
+keys consisting of short strings or integers. </p> <p>Note that many other
+SQL database engines disallow the indexing of BLOBs and CLOBs in the first
+place. You simple cannot do it. SQLite is more flexible that most in that
+it does allow BLOBs and CLOBs to be indexed and it will use those indexes
+when appropriate. But for maximum performance, it is best to use smaller search
+keys. </p> </section>
+<section id="GUID-DD40F29F-DF93-536E-9B52-F9B9FF45155D"><title>Avoid Excess
+Indexes</title> <p>Some developers approach SQL-based application development
+with the attitude that indexes never hurt and that the more indexes you have,
+the faster your application will run. This is definitely not the case. There
+is a costs associated with each new index you create: </p> <ul>
+<li id="GUID-FD257BF7-F938-54B5-AC03-9536712D6281"><p>Each new index takes
+up additional space in the database file. The more indexes you have, the larger
+your database files will become for the same amount of data. </p> </li>
+<li id="GUID-E1B74FB6-246A-5148-AF06-04E1B4B949F1"><p>Every INSERT and UPDATE
+statement modifies both the original table and all indexes on that table.
+So the performance of INSERT and UPDATE decreases linearly with the number
+of indexes. </p> </li>
+<li id="GUID-56AAE2D1-71D6-5A23-8190-B0C80B204DED"><p>Compiling new SQL statements
+using <codeph>Prepare()</codeph> takes longer when there are more indexes
+for the optimizer to choose between. </p> </li>
+<li id="GUID-24B7F7D8-FAA9-5C78-B3C7-B886FA774C0B"><p>Surplus indexes give
+the optimizer more opportunities to make a bad choice. </p> </li>
+</ul> <p>Your policy on indexes should be to avoid them wherever you can.
+Indexes are powerful medicine and can work wonders to improve the performance
+of a program. But just as too many drugs can be worse than none at all, so
+also can too many indexes cause more harm than good. </p> <p>When building
+a new application, a good approach is to omit all explicitly declared indexes
+in the beginning and only add indexes as needed to address specific performance
+problems. </p> <p>Take care to avoid redundant indexes. For example, consider
+this schema: </p> <codeblock id="GUID-89F20101-1628-5783-82B0-2ABE84078C7D" xml:space="preserve">
+CREATE TABLE demo323a(a,b,c);
+CREATE INDEX idx323a1 ON demo323(a);
+CREATE INDEX idx323a2 ON demo323(a,b);
+</codeblock> <p>The idx323a1 index is redundant and can be eliminated. Anything
+that the idx323a1 index can do the idx323a2 index can do better. </p> <p>Other
+redundancies are not quite as apparent as the above. Recall that any column
+or columns that are declared UNIQUE or PRIMARY KEY (except for the special
+case of INTEGER PRIMARY KEY) are automatically indexed. So in the following
+schema: </p> <codeblock id="GUID-2FE7B726-4027-518C-9217-B4BD1ECDA991" xml:space="preserve">
+CREATE TABLE demo323b(x TEXT PRIMARY KEY, y INTEGER UNIQUE);
+CREATE INDEX idx323b1 ON demo323b(x);
+CREATE INDEX idx323b2 ON demo323b(y);
+</codeblock> <p>Both indexes are redundant and can be eliminated with no loss
+in query performance. Occasionally one sees a novice SQL programmer use both
+UNIQUE and PRIMARY KEY on the same column: </p> <codeblock id="GUID-CDE12649-BDB4-58D4-8981-02628BDF5C79" xml:space="preserve">
+CREATE TABLE demo323c(p TEXT UNIQUE PRIMARY KEY, q);
+</codeblock> <p>This has the effect of creating two indexes on the “p” column
+– one for the UNIQUE keywords and another for the PRIMARY KEY keyword. Both
+indexes are identical so clearly one can be omitted. A PRIMARY KEY is guaranteed
+to always be unique so the UNIQUE keyword can be removed from the demo323c
+table definition with no ambiguity or loss of functionality. </p> <p>It is
+not a fatal error to create too many indexes or redundant indexes. SQLite
+will continue to generate the correct answers but it may take longer to produce
+those answers and the resulting database files might be a little larger. So
+for best results, keep the number of indexes to a minimum. </p> </section>
+<section id="GUID-9337E315-BB5A-56D0-8319-6C398D26151F"><title>Avoid Tables
+and Indexes with an Excessive Number of Columns</title> <p>SQLite places no
+arbitrary limits on the number of columns in a table or index. There are known
+commercial applications using SQLite that construct tables with tens of thousands
+of columns each. And these applications actually work. </p> <p>However the
+database engine is optimized for the common case of tables with no more than
+a few dozen columns. For best performance you should try to stay in the optimized
+region. Furthermore, we note that relational databases with a large number
+of columns are usually not well normalized. So even apart from performance
+considerations, if you find your design has tables with more than a dozen
+or so columns, you really need to rethink how you are building your application. </p> <p>There
+are a number of places in <codeph>Prepare()</codeph> that run in time O(N<sup>2</sup>)
+where N is the number of columns in the table. The constant of proportionality
+is small in these cases so you should not have any problems for N of less
+than one hundred but for N on the order of a thousand, the time to run <codeph>Prepare()</codeph> can
+start to become noticeable. </p> <p>When the bytecode is running and it needs
+to access the i-th column of a table, the values of the previous i-1 columns
+must be accessed first. So if you have a large number of columns, accessing
+the last column can be an expensive operation. This fact also argues for putting
+smaller and more frequently accessed columns early in the table. </p> <p>There
+are certain optimizations that will only work if the table has 30 or fewer
+columns. The optimization that extracts all necessary information from an
+index and never refers to the underlying table works this way. So in some
+cases, keeping the number of columns in a table at or below 30 can result
+in a 2-fold speed improvement. </p> <p>Indexes will only be used if they contain
+30 or fewer columns. You can put as many columns in an index as you want,
+but if the number is greater than 30, the index will never improve performance
+and will never do anything but take up space in your database file. </p> </section>
+</conbody><related-links>
+<link href="GUID-22844C28-AB5B-5A6F-8863-7269464684B4.dita"><linktext>SQL Overview</linktext>
+</link>
+<link href="GUID-78773BCA-ADF6-53E6-AC80-5CB2AE1F8BCC.dita"><linktext>SQL Server
+Guide</linktext></link>
+<link href="GUID-E51836E1-D33E-506C-B75B-19B8E3CC313A.dita"><linktext>SQLite</linktext>
+</link>
+<link href="GUID-1F12E3F5-45B2-55EC-B021-00338277C608.dita"><linktext>SQL DB Overview</linktext>
+</link>
+<link href="GUID-43CA02E7-0101-5824-B91B-E15EE20C829A.dita"><linktext>Avoid Transient
+Tables</linktext></link>
+<link href="GUID-49A3419F-D20A-5C5D-B2FF-51724EF37704.dita"><linktext>Prevent Datafile
+Corruption</linktext></link>
+<link><linktext/></link>
+<link href="GUID-B994E6F7-228A-5433-B87F-91857C5D93D6.dita"><linktext>SQL Insertion
+Tips</linktext></link>
+<link href="GUID-4FC23DB7-4758-5DA4-81FF-0DAB169E2757.dita"><linktext>SQL Schema
+Tips</linktext></link>
+<link href="GUID-2A2920E0-5D40-5358-BC0C-8572CEFE078C.dita"><linktext>SQL Expressions</linktext>
+</link>
+<link href="GUID-126FCCCC-0E7D-59AE-959A-2F94A7319C4B.dita"><linktext>SQL Statement
+Tips</linktext></link>
+<link href="GUID-ACCCB148-DAF9-59EC-B585-8EF632B9BF04.dita"><linktext>SQL Joins</linktext>
+</link>
+<link href="GUID-B7E978C1-45CA-554C-8028-D901B97BA2E0.dita"><linktext> ANALYZE
+Command</linktext></link>
+<link href="GUID-AF5A75D7-0687-546C-87B2-0B7DF7D33217.dita"><linktext> SQL WHERE
+CLause Tips</linktext></link>
+</related-links></concept>

changeset 7	51a74ef9ed63
child 13	48780e181b38