Symbian3/SDK/Source/GUID-C2FAEBB2-4A1A-5BB0-9670-4801525CBC6A.dita
author Dominic Pinkman <dominic.pinkman@nokia.com>
Tue, 20 Jul 2010 12:00:49 +0100
changeset 13 48780e181b38
parent 7 51a74ef9ed63
permissions -rw-r--r--
Week 28 contribution of SDK documentation content. See release notes for details. Fixes bugs Bug 1897 and Bug 1522.

<?xml version="1.0" encoding="utf-8"?>
<!-- Copyright (c) 2007-2010 Nokia Corporation and/or its subsidiary(-ies) All rights reserved. -->
<!-- This component and the accompanying materials are made available under the terms of the License 
"Eclipse Public License v1.0" which accompanies this distribution, 
and is available at the URL "http://www.eclipse.org/legal/epl-v10.html". -->
<!-- Initial Contributors:
    Nokia Corporation - initial contribution.
Contributors: 
-->
<!DOCTYPE concept
  PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="GUID-C2FAEBB2-4A1A-5BB0-9670-4801525CBC6A" xml:lang="en"><title>SQL Index Tips</title><shortdesc>This document includes several tips for using SQL indexes.</shortdesc><prolog><metadata><keywords/></metadata></prolog><conbody>
<section id="GUID-3895F9D0-DE9C-4375-B541-AC99CABB7B8A"><title>Introduction</title> <p>You can use indexes to speed up access. You create indexes automatically
using PRIMARY KEY and UNIQUE. </p> <p><b>Intended audience:</b> </p> <p>This document is intended to be
used by Symbian platform licensees and third party application developers. </p> </section>
<section id="GUID-765F0DF1-ACB0-57DB-B9A8-3697E4637065"><title>Use
an Index to Speed up Access</title> <p>Suppose you have a table like
this: </p> <codeblock id="GUID-F70B25AB-A151-52CE-A413-1C62A2464D6A" xml:space="preserve">
CREATE TABLE demo5(
    id INTEGER,
    content BLOB
);
</codeblock> <p>Further suppose that this table contains thousands
or millions of rows and you want to access a single row with a particular
ID: </p> <codeblock id="GUID-B02FA452-4093-5383-BAFA-AE035919D720" xml:space="preserve">
SELECT content FROM demo5 WHERE id=?
</codeblock> <p>The only want that SQLite can perform this query,
and be certain to get every row with the chosen ID, is to examine
every single row, check the ID of that row, and return the content
if the ID matches. Examining every single row this way is called a <i>full table scan</i>. </p> <p>Reading and checking every row of a
large table can be very slow, so you want to avoid full table scans.
The usual way to do this is to create an index on the column you are
searching against. In the example above, an appropriate index would
be this: </p> <codeblock id="GUID-82E337F1-2CA2-51B0-A7BC-071A83779A18" xml:space="preserve">
CREATE INDEX demo5_idx1 ON demo5(id);
</codeblock> <p>With an index on the ID column, SQLite is able to
use a binary search to locate entries that contain a particular value
of ID. So if the table contains a million rows, the query can be satisfied
with about 20 accesses rather than 1000000 accesses. This is a huge
performance improvement. </p> <p>One of the features of the SQL language
is that you do not have to figure out what indexes you may need in
advance of coding your application. It is perfectly acceptable, even
preferable, to write the code for your application using a database
without any indexes. Then once the application is running and you
can make speed measurements, add whatever indexes are needed in order
to make it run faster. </p> <p>When you add indexes, the query optimizer
within the SQL compiler is able to find new more efficient bytecode
procedures for carrying out the operations that your SQL statements
specify. In other words, by adding indexes late in the development
cycle you have the power to completely reorganize your data access
patterns without changing a single line of code. </p> </section>
<section id="GUID-BB1F17C5-1174-5DF4-AA61-611173237F3F"><title>Create
Indexes Automatically Using PRIMARY KEY and UNIQUE</title> <p>Any
column of a table that is declared to be the PRIMARY KEY or that is
declared UNIQUE will be indexed automatically. There is no need to
create a separate index on that column using the CREATE INDEX statement.
So, for example, this table declaration: </p> <codeblock id="GUID-E4BE6077-F639-5CE7-964A-276B0D58A129" xml:space="preserve">
CREATE TABLE demo39a(
    id INTEGER,
    content BLOB
);

CREATE INDEX demo39_idx1 ON demo39a(id);
</codeblock> <p>Is roughly equivalent to the following: </p> <codeblock id="GUID-DB3167E0-FA95-50CA-92C7-102B5C2C13E3" xml:space="preserve">
CREATE TABLE demo39b(
    id INTEGER UNIQUE,
    content BLOB
);
</codeblock> <p>The two examples above are “roughly” equivalent, but
not exactly equivalent. Both tables have an index on the ID column.
In the first case, the index is created explicitly. In the second
case, the index is implied by the UNIQUE keyword in the type declaration
of the ID column. Both table designs use exactly the same amount of
disk space, and both will run queries such as </p> <codeblock id="GUID-7ACAE270-6D20-557B-B7D1-C90EDD757E43" xml:space="preserve">
SELECT content FROM demo39 WHERE id=?
</codeblock> <p>using exactly the same bytecode. The only difference
is that table demo39a lets you insert multiple rows with the same
ID whereas table demo39b will raise an exception if you try to insert
a new row with the same ID as an existing row. </p> <p>If you use
the UNIQUE keyword in the CREATE INDEX statement of demo39a, like
this: </p> <codeblock id="GUID-0EE5E186-CC4A-5CC3-AEAE-F1482F1F8F9A" xml:space="preserve">
CREATE UNIQUE INDEX demo39_idx1 ON demo39a(id);
</codeblock> <p>Then both table designs really would be exactly the
same in every way. In fact, whenever SQLite sees the UNIQUE keyword
on a column type declaration, all it does is create an automatic unique
index on that column. </p> <p>The PRIMARY KEY modifier on a column
type declaration works like UNIQUE; it causes a unique index to be
created automatically. The main difference is that you are only allowed
to have a single PRIMARY KEY. This restriction of only allowing a
single PRIMARY KEY is part of the official SQL language definition. </p> <p>The idea is that a PRIMARY KEY is used to order the rows on disk.
Some SQL database engines actually implement PRIMARY KEYs this way.
But with SQLite, a PRIMARY KEY is like any other UNIQUE column, with
only one exception: INTEGER PRIMARY KEY is a special case which is
handled differently, as described in the next section. </p> </section>
<section id="GUID-BF7A0301-8490-58ED-BB37-FAC403A84230"><title>Use
Multi-Column Indexes</title> <p>SQLite is able to make use of multi-column
indexes. The rule is that if an index is over columns <i>X</i>  <i> 0 </i>, <i>X</i>  <i> 1 </i>, <i>X</i>  <i> 2 </i>, ..., <i>X</i>  <i> n </i> of some table, then the index can be used if the WHERE
clause contains equality constraints for some prefix of those columns <i>X</i>  <i>0 </i>, <i>X</i>  <i>1 </i>, <i>X</i>  <i>2 </i>, ..., <i>X</i>  <i>i </i> where <i>i</i> is less than <i>n</i>. </p> <p>As
an example, suppose you have a table and index declared as follows: </p> <codeblock id="GUID-C18C97F7-23CA-5636-9F00-130A8FB3DEF5" xml:space="preserve">
CREATE TABLE demo314(a,b,c,d,e,f,g);
CREATE INDEX demo314_idx ON demo314(a,b,c,d,e,f);
</codeblock> <p>Then the index might be used to help with a query
that contained a WHERE clause like this: </p> <codeblock id="GUID-8A0944F4-1ACF-5267-B49F-EB83EFBB5670" xml:space="preserve">
... WHERE a=1 AND b='Smith' AND c=1
</codeblock> <p>All three terms of the WHERE clause would be used
together with the index in order to narrow the search. But the index
could not be used if there WHERE clause said: </p> <codeblock id="GUID-B5F1C17F-0F5E-5FC2-A9A4-DF19D699A076" xml:space="preserve">
... WHERE b='Smith' AND c=1
</codeblock> <p>The second WHERE clause does not contain equality
terms for a prefix of the columns in the index because it omits a
term for the “a” column. </p> <p>In a case like this: </p> <codeblock id="GUID-EF2CFE7D-0456-5414-847D-BADCC057CFD8" xml:space="preserve">
... WHERE a=1 AND c=1
</codeblock> <p>Only the “a=1” term in the WHERE clause could be used
to help narrow the search. The “c=1” term is not part of the prefix
of terms in the index which have equality constraints because there
is no equality constraint on the “b” column. </p> <p>SQLite only allows
a single index to be used per table within a simple SQL statement.
For UPDATE and DELETE statements, this means that only a single index
can ever be used, since those statements can only operate on a single
table at a time. </p> <p>In a simple SELECT statement multiple indexes
can be used if the SELECT statement is a join – one index per table
in the join. In a compound SELECT statement (two or more SELECT statements
connected by UNION or INTERSECT or EXCEPT) each SELECT statement is
treated separately and can have its own indexes. Likewise, SELECT
statements that appear in subexpressions are treated separately. </p> <p>Some other SQL database engines (for example PostgreSQL) allow
multiple indexes to be used for each table in a SELECT. For example,
if you had a table and index in PostgreSQL like this: </p> <codeblock id="GUID-F5DE8F24-7471-5992-9896-295CE173D855" xml:space="preserve">
CREATE TABLE pg1(a INT, b INT, c INT, d INT);
CREATE INDEX pg1_ix1 ON pg1(a);
CREATE INDEX pg1_ix2 ON pg1(b);
CREATE INDEX pg1_ix3 ON pg1(c);
</codeblock> <p>And if you were to run a query like the following: </p> <codeblock id="GUID-A35663B7-4E7D-5CC0-BF5E-CF3A4CFED63F" xml:space="preserve">
SELECT d FROM pg1 WHERE a=5 AND b=11 AND c=99;
</codeblock> <p>Then PostgreSQL might attempt to optimize the query
by using all three indexes, one for each term of the WHERE clause. </p> <p>SQLite does not work this way. SQLite is compelled to select
a single index to use in the query. It might select any of the three
indexes shown, depending on which one the optimizer things will give
the best speedup. But in every case it will only select a single index
and only a single term of the WHERE clause will be used. </p> <p>SQLite
prefers to use a multi-column index such as this: </p> <codeblock id="GUID-40FB7075-1239-5089-BBC5-0D994F4A0C39" xml:space="preserve">
CREATE INDEX pg1_ix_all ON pg1(a,b,c);
</codeblock> <p>If the pg1_ix_all index is available for use when
the SELECT statement above is prepared, SQLite will likely choose
it over any of the single-column indexes because the multi-column
index is able to make use of all 3 terms of the WHERE clause. </p> <p>You can trick SQLite into using multiple indexes on the same
table by rewriting the query. Instead of the SELECT statement shown
above, if you rewrite it as this: </p> <codeblock id="GUID-D7DE75D4-BB01-50DF-A9DC-956A83DED5D0" xml:space="preserve">
SELECT d FROM pg1 WHERE RowID IN (
    SELECT RowID FROM pg1 WHERE a=5
    INTERSECT
    SELECT RowID FROM pg1 WHERE b=11
    INTERSECT
    SELECT RowID FROM pg1 WHERE c=99
)
</codeblock> <p>Then each of the individual SELECT statements will
using a different single-column index and their results will be combined
by the outer SELECT statement to give the correct result. The other
SQL database engines like PostgreSQL that are able to make use of
multiple indexes per table do so by treating the simpler SELECT statement
shown first as if they where the more complicated SELECT statement
shown here. </p> </section>
<section id="GUID-E90057A8-70B6-590C-B8AE-616DA25BB543"><title>Use
Inequality Constraints on the Last Index Term</title> <p>Terms in
the WHERE clause of a query or UPDATE or DELETE statement are mostly
likely to trigger the use of an index if they are an equality constraint
– in other words if the term consists of the name of an indexed column,
an equal sign (“=”), and an expression. </p> <p>So, for example, if
you have a table and index that look like this: </p> <codeblock id="GUID-84AADB9D-5853-57C2-B489-87DC7FB7AADE" xml:space="preserve">
CREATE TABLE demo315(a,b,c,d);
CREATE INDEX demo315_idx1 ON demo315(a,b,c);
</codeblock> <p>And a query like this: </p> <codeblock id="GUID-A2B7DA9F-DB82-5D06-80E2-7AF714E403D5" xml:space="preserve">
SELECT d FROM demo315 WHERE a=512;
</codeblock> <p>The single “a=512” term of the WHERE clause qualifies
as an equality constraint and is likely to provoke the use of the
demo315_idx1 index. </p> <p>SQLite supports two other kinds of equality
constraints. One is the IN operator: </p> <codeblock id="GUID-EA5D7637-A6B8-5BC0-A72E-D576B0F945A3" xml:space="preserve">
SELECT d FROM demo315 WHERE a IN (512,1024);
SELECT d FROM demo315 WHERE a IN (SELECT x FROM someothertable);
</codeblock> <p>There other is the IS NULL constraint: </p> <codeblock id="GUID-B2C1C84B-C33D-55C5-8484-24B28EFC8E37" xml:space="preserve">
SELECT d FROM demo315 WHERE a IS NULL;
</codeblock> <p>SQLite allows at most one term of an index to be constrained
by an inequality such as less than “&lt;”, greater than “&gt;”, less
than or equal to “&lt;=”, or greater than or equal to “&gt;=”. </p> <p>The column that the inequality constrains will be the right-most
term of the index that is used. So, for example, in this query: </p> <codeblock id="GUID-563231B5-EC3A-57C2-BC6F-1A8129ADE308" xml:space="preserve">
SELECT d FROM demo315 WHERE a=5 AND b&gt;11 AND c=1;
</codeblock> <p>Only the first two terms of the WHERE clause will
be used with the demo315_idx1 index. The third term, the “c=1” constraint,
cannot be used because the “c” column occurs to the right of the “b”
column in the index and the “b” column is constrained by an inequality. </p> <p>SQLite allows up to two inequalities on the same column as long
as the two inequalities provide an upper and lower bound on the column.
For example, in this query: </p> <codeblock id="GUID-4EB94886-EDFF-58F2-8692-011A67AC5A60" xml:space="preserve">
SELECT d FROM demo315 WHERE a=5 AND b&gt;11 AND b&lt;23;
</codeblock> <p>All three terms of the WHERE clause will be used because
the two inequalities on the “b” column provide an upper and lower
bound on the value of “b”. </p> <p>SQLite will only use the four inequalities
mentioned above to help constrain a search: “&lt;”, “&gt;”, “&lt;=”,
and “&gt;=”. Other inequality operators such as not equal to (“!=” or
“&lt;&gt;”) and NOT NULL are not helpful to the query optimizer and will
never be used to control an index and help make the query run faster. </p> </section>
<section id="GUID-CAD0C181-37E7-578A-A7E1-7843447C247F"><title>Use
Indexes To Help ORDER BY Clauses Evaluate Faster</title> <p>The default
method for evaluating an ORDER BY clause in a SELECT statement is
to first evaluate the SELECT statement and store the results in a
temporary tables, then sort the temporary table according to the ORDER
BY clause and scan the sorted temporary table to generate the final
output. </p> <p>This method always works, but it requires three passes
over the data (one pass to generate the result set, a second pass
to sort the result set, and a third pass to output the results) and
it requires a temporary storage space sufficiently large to contain
the entire results set. </p> <p>Where possible, SQLite will avoid
storing and sorting the result set by using an index that causes the
results to emerge from the query in sorted order in the first place. </p> <p>The way to get SQLite to use an index for sorting is to provide
an index that covers the same columns specified in the ORDER BY clause.
For example, if the table and index are like this: </p> <codeblock id="GUID-F0103033-C5C8-5177-8AD7-70BCC45C33C9" xml:space="preserve">
CREATE TABLE demo316(a,b,c,data);
CREATE INDEX idx316 ON demo316(a,b,c);
</codeblock> <p>And you do a query like this: </p> <codeblock id="GUID-D67BB6FF-E213-5B86-A2C1-E1992DA96A62" xml:space="preserve">
SELECT data FROM demo316 ORDER BY a,b,c;
</codeblock> <p>SQLite will use the idx316 index to implement the
ORDER BY clause, obviating the need for temporary storage space and
a separate sorting pass. </p> <p>An index can be used to satisfy the
search constraints of a WHERE clause and to impose the ORDER BY ordering
of outputs all at once. The trick is for the ORDER BY clause terms
to occur immediately after the WHERE clause terms in the index. For
example, one can write: </p> <codeblock id="GUID-02063968-34B5-5766-9D02-86D696D39C1E" xml:space="preserve">
SELECT data FROM demo316 WHERE a=5 ORDER BY b,c;
</codeblock> <p>The “a” column is used in the WHERE clause and the
immediately following terms of the index, “b” and “c” are used in
the ORDER BY clause. So in this case the idx316 index would be used
both to speed up the search and to satisfy the ORDER BY clause. </p> <p>This query also uses the idx316 index because, once again, the
ORDER BY clause term “c” immediate follows the WHERE clause terms
“a” and “b” in the index: </p> <codeblock id="GUID-6760EC7E-E86A-5EBD-BDDD-32A68BE78A9E" xml:space="preserve">
SELECT data FROM demo316 WHERE a=5 AND b=17 ORDER BY c;
</codeblock> <p>But now consider this: </p> <codeblock id="GUID-9363996C-8C30-5E04-B05F-392C8262F1F6" xml:space="preserve">
SELECT data FROM demo316 WHERE a=5 ORDER BY c;
</codeblock> <p>Here there is a gap between the ORDER BY term “c”
and the WHERE clause term “a”. So the idx316 index cannot be used
to satisfy both the WHERE clause and the ORDER BY clause. The index
will be used on the WHERE clause and a separate sorting pass will
occur to put the results in the correct order. </p> </section>
<section id="GUID-109AF0DA-A054-504A-A432-76BD145B2AC4"><title>Add
Result Columns To The End Of Indexes</title> <p>Queries will sometimes
run faster if their result columns appear in the right-most entries
of an index. Consider the following example: </p> <codeblock id="GUID-63292052-B523-5671-B3EE-E10A66C7275F" xml:space="preserve">
CREATE TABLE demo317(a,b,c,data);
CREATE INDEX idx317 ON demo316(a,b,c);
</codeblock> <p>A query where all result column terms appears in the
index, such as </p> <codeblock id="GUID-41F740E7-EAFC-583B-BFE6-E63DBEA354D7" xml:space="preserve">
SELECT c FROM demo317 WHERE a=5 ORDER BY b;
</codeblock> <p>will typically run about twice as fast or faster than
a query that uses columns that are not in the index, e.g. </p> <codeblock id="GUID-098752F4-304A-5A84-834E-240D97D97C2D" xml:space="preserve">
SELECT data FROM demo317 WHERE a=5 ORDER BY b;
</codeblock> <p>The reason for this is that when all information is
contained within the index entry only a single search has to be made
for each row of output. But when some of the information is in the
index and other parts are in the table, first there must be a search
for the appropriate index entry then a separate search is made for
the appropriate table row based on the RowID found in the index entry.
Twice as much searching has to be done for each row of output generated. </p> <p>The extra query speed does not come for free, however. Adding
additional columns to an index makes the database file larger. So
when developing an application, the programmer will need to make a
space versus time trade-off to determine whether the extra columns
should be added to the index or not. </p> <p>Note that if any column
of the result must be obtained from the original table, then the table
row will have to be searched for anyhow. There will be no speed advantage,
so you might as well omit the extra columns from the end of the index
and save on storage space. The speed-up described in this section
can only be realized when every column in a table is obtainable from
the index. </p> <p>Taking into account the results of the previous
few sections, the best set of columns to put in an index can be described
as follows: </p> <ul>
<li id="GUID-EBF4DEFB-2F5F-5D78-92FA-06FEAB0C3650"><p>The first columns
in the index should be columns that have equality constraints in the
WHERE clause of the query. </p> </li>
<li id="GUID-E5CB725C-6304-5946-9E18-E69B5F1A6A88"><p>The second group
of columns should match the columns specified in the ORDER BY clause. </p> </li>
<li id="GUID-FBC00251-C3AD-5AC0-9102-EF66EA37DE4E"><p>Add additional
columns to the end of the index that are used in the result set of
the query. </p> </li>
</ul> </section>
<section id="GUID-D7B5B389-E031-5512-8186-235B22F0D9C1"><title>Resolve
Indexing Ambiguities Using the Unary “+” Operator</title> <p>The SQLite
query optimizer usually does a good job of choosing the best index
to use for a particular query, especially if ANALYZE has been run
to provide it with index performance statistics. But occasions do
arise where it is useful to give the optimizer hints. </p> <p>One
of the easiest ways to control the operation of the optimizer is to
disqualify terms in the WHERE clause or ORDER BY clause as candidates
for optimization by using the unary “+” operator. </p> <p>In SQLite,
a unary “+” operator is a no-op. It makes no change to its operand,
even if the operand is something other than a number. So you can always
prefix a “+” to an expression in without changing the meaning of the
expression. As the optimizer will only use terms in WHERE, HAVING,
or ON clauses that have an index column name on one side of a comparison
operator, you can prevent such a term from being used by the optimizer
by prefixing the column name with a “+”. </p> <p>For example, suppose
you have a database with a schema like this: </p> <codeblock id="GUID-E7747EFD-FE58-5EA4-88B3-097C0A303F52" xml:space="preserve">
CREATE TABLE demo321(a,b,c,data);
CREATE INDEX idx321a ON demo321(a);
CREATE INDEX idx321b ON demo321(b);
</codeblock> <p>If you issue a query such as this: </p> <codeblock id="GUID-87BD59FC-33A8-598B-B91F-607B26F7349D" xml:space="preserve">
SELECT data FROM demo321 WHERE a=5 AND b=11;
</codeblock> <p>The query optimizer might use the “a=5” term with
idx321a or it might use the “b=11” term with the idx321b index. But
if you want to force the use of the idx321a index you can accomplish
that by disqualifying the second term of the WHERE clause as a candidate
for optimization using a unary “+” like this: </p> <codeblock id="GUID-E6EAB459-726A-5FE4-8065-6C46AC2C5B5C" xml:space="preserve">
SELECT data FROM demo321 WHERE a=5 AND +b=11;
</codeblock> <p>The “+” in front of the “b=11” turns the left-hand
side of the equals comparison operator into an expression instead
of an indexed column name. The optimizer will then not recognize that
the second term can be used with an index and so the optimizer is
compelled to use the first “a=5” term. </p> <p>The unary “+” operator
can also be used to disable ORDER BY clause optimizations. Consider
this query: </p> <codeblock id="GUID-0488D466-77B7-50E0-AB85-FF033A2D75DC" xml:space="preserve">
SELECT data FROM demo321 WHERE a=5 ORDER BY b;
</codeblock> <p>The optimizer has the choice of using the “a=5” term
of the WHERE clause with idx321a to restrict the search. Or it might
choose to use do a full table scan with idx321b to satisfy the ORDER
BY clause and thus avoid a separate sorting pass. You can force one
choice or the other using a unary “+”. </p> <p>To force the use of
idx321a on the WHERE clause, add the unary “+” in from of the “b”
in the ORDER BY clause: </p> <codeblock id="GUID-E55A085F-D91F-58E0-B964-317BB3A9D7ED" xml:space="preserve">
SELECT data FROM demo321 WHERE a=5 ORDER BY +b;
</codeblock> <p>To go the other way and force the idx321b index to
be used to satisfy the ORDER BY clause, disqualify the WHERE term
by prefixing with a unary “+”: </p> <codeblock id="GUID-D97EF52A-1F74-57EB-AC11-7911B4E088B3" xml:space="preserve">
SELECT data FROM demo321 WHERE +a=5 ORDER BY b;
</codeblock> <p>The reader is cautioned not to overuse the unary “+”
operator. The SQLite query optimizer usually picks the best index
without any outside help. Premature use of unary “+” can confuse the
optimizer and cause less than optimal performance. But in some cases
it is useful to be able override the decisions of the optimizer, and
the unary “+” operator is an excellent way to do this when it becomes
necessary. </p> </section>
<section id="GUID-7BEBC49C-0528-5D58-9626-2A92F3D0D9E8"><title>Avoid
Indexing Large BLOBs and CLOBs</title> <p>SQLite stores indexes as
b-trees. Each b-tree node uses one page of the database file. In order
to maintain an acceptable fan-out, the b-tree module within SQLite
requires that at least 4 entries must fit on each page of a b-tree.
There is also some overhead associated with each b-tree page. So at
the most there is about 250 bytes of space available on the main b-tree
page for each index entry. </p> <p>If an index entry exceeds this
allotment of approximately 250 bytes excess bytes are spilled to overflow
pages. There is no arbitrary limit on the number of overflow pages
or on the length of a b-tree entry, but for maximum efficiency it
is best to avoid overflow pages, especially in indexes. This means
that you should strive to keep the number of bytes in each index entry
below 250. </p> <p>If you keep the size of indexes significantly smaller
than 250 bytes, then the b-tree fan-out is increased and the binary
search algorithm used to search for entries in an index has fewer
pages to examine and therefore runs faster. So the fewer bytes used
in each index entry the better, at least from a performance perspective. </p> <p>For these reasons, it is recommended that you avoid indexing
large BLOBs and CLOBs. SQLite will continue to work when large BLOBs
and CLOBs are indexed, but there will be a performance impact. </p> <p>On the other hand, if you need to lookup entries using a large
BLOB or CLOB as the key, then by all means use an index. An index
on a large BLOB or CLOB is not as fast as an index using more compact
data types such as integers, but it is still many order of magnitude
faster than doing a full table scan. So to be more precise, the advice
of this section is that you should design your applications so that
you do not need to lookup entries using a large BLOB or CLOB as the
key. Try to arrange to have compact keys consisting of short strings
or integers. </p> <p>Note that many other SQL database engines disallow
the indexing of BLOBs and CLOBs in the first place. You simple cannot
do it. SQLite is more flexible that most in that it does allow BLOBs
and CLOBs to be indexed and it will use those indexes when appropriate.
But for maximum performance, it is best to use smaller search keys. </p> </section>
<section id="GUID-DD40F29F-DF93-536E-9B52-F9B9FF45155D"><title>Avoid
Excess Indexes</title> <p>Some developers approach SQL-based application
development with the attitude that indexes never hurt and that the
more indexes you have, the faster your application will run. This
is definitely not the case. There is a costs associated with each
new index you create: </p> <ul>
<li id="GUID-FD257BF7-F938-54B5-AC03-9536712D6281"><p>Each new index
takes up additional space in the database file. The more indexes you
have, the larger your database files will become for the same amount
of data. </p> </li>
<li id="GUID-E1B74FB6-246A-5148-AF06-04E1B4B949F1"><p>Every INSERT
and UPDATE statement modifies both the original table and all indexes
on that table. So the performance of INSERT and UPDATE decreases linearly
with the number of indexes. </p> </li>
<li id="GUID-56AAE2D1-71D6-5A23-8190-B0C80B204DED"><p>Compiling new
SQL statements using <codeph>Prepare()</codeph> takes longer when
there are more indexes for the optimizer to choose between. </p> </li>
<li id="GUID-24B7F7D8-FAA9-5C78-B3C7-B886FA774C0B"><p>Surplus indexes
give the optimizer more opportunities to make a bad choice. </p> </li>
</ul> <p>Your policy on indexes should be to avoid them wherever you
can. Indexes are powerful medicine and can work wonders to improve
the performance of a program. But just as too many drugs can be worse
than none at all, so also can too many indexes cause more harm than
good. </p> <p>When building a new application, a good approach is
to omit all explicitly declared indexes in the beginning and only
add indexes as needed to address specific performance problems. </p> <p>Take care to avoid redundant indexes. For example, consider this
schema: </p> <codeblock id="GUID-89F20101-1628-5783-82B0-2ABE84078C7D" xml:space="preserve">
CREATE TABLE demo323a(a,b,c);
CREATE INDEX idx323a1 ON demo323(a);
CREATE INDEX idx323a2 ON demo323(a,b);
</codeblock> <p>The idx323a1 index is redundant and can be eliminated.
Anything that the idx323a1 index can do the idx323a2 index can do
better. </p> <p>Other redundancies are not quite as apparent as the
above. Recall that any column or columns that are declared UNIQUE
or PRIMARY KEY (except for the special case of INTEGER PRIMARY KEY)
are automatically indexed. So in the following schema: </p> <codeblock id="GUID-2FE7B726-4027-518C-9217-B4BD1ECDA991" xml:space="preserve">
CREATE TABLE demo323b(x TEXT PRIMARY KEY, y INTEGER UNIQUE);
CREATE INDEX idx323b1 ON demo323b(x);
CREATE INDEX idx323b2 ON demo323b(y);
</codeblock> <p>Both indexes are redundant and can be eliminated with
no loss in query performance. Occasionally one sees a novice SQL programmer
use both UNIQUE and PRIMARY KEY on the same column: </p> <codeblock id="GUID-CDE12649-BDB4-58D4-8981-02628BDF5C79" xml:space="preserve">
CREATE TABLE demo323c(p TEXT UNIQUE PRIMARY KEY, q);
</codeblock> <p>This has the effect of creating two indexes on the
“p” column – one for the UNIQUE keywords and another for the PRIMARY
KEY keyword. Both indexes are identical so clearly one can be omitted.
A PRIMARY KEY is guaranteed to always be unique so the UNIQUE keyword
can be removed from the demo323c table definition with no ambiguity
or loss of functionality. </p> <p>It is not a fatal error to create
too many indexes or redundant indexes. SQLite will continue to generate
the correct answers but it may take longer to produce those answers
and the resulting database files might be a little larger. So for
best results, keep the number of indexes to a minimum. </p> </section>
<section id="GUID-9337E315-BB5A-56D0-8319-6C398D26151F"><title>Avoid
Tables and Indexes with an Excessive Number of Columns</title> <p>SQLite places no arbitrary limits on the number of columns in a table
or index. There are known commercial applications using SQLite that
construct tables with tens of thousands of columns each. And these
applications actually work. </p> <p>However the database engine is
optimized for the common case of tables with no more than a few dozen
columns. For best performance you should try to stay in the optimized
region. Furthermore, we note that relational databases with a large
number of columns are usually not well normalized. So even apart from
performance considerations, if you find your design has tables with
more than a dozen or so columns, you really need to rethink how you
are building your application. </p> <p>There are a number of places
in <codeph>Prepare()</codeph> that run in time O(N<sup>2</sup>) where
N is the number of columns in the table. The constant of proportionality
is small in these cases so you should not have any problems for N
of less than one hundred but for N on the order of a thousand, the
time to run <codeph>Prepare()</codeph> can start to become noticeable. </p> <p>When the bytecode is running and it needs to access the i-th
column of a table, the values of the previous i-1 columns must be
accessed first. So if you have a large number of columns, accessing
the last column can be an expensive operation. This fact also argues
for putting smaller and more frequently accessed columns early in
the table. </p> <p>There are certain optimizations that will only
work if the table has 30 or fewer columns. The optimization that extracts
all necessary information from an index and never refers to the underlying
table works this way. So in some cases, keeping the number of columns
in a table at or below 30 can result in a 2-fold speed improvement. </p> <p>Indexes will only be used if they contain 30 or fewer columns.
You can put as many columns in an index as you want, but if the number
is greater than 30, the index will never improve performance and will
never do anything but take up space in your database file. </p> </section>
</conbody><related-links>
<link href="GUID-22844C28-AB5B-5A6F-8863-7269464684B4.dita"><linktext>SQL
Overview</linktext></link>
<link href="GUID-78773BCA-ADF6-53E6-AC80-5CB2AE1F8BCC.dita"><linktext>SQL
Server Guide</linktext></link>
<link href="GUID-E51836E1-D33E-506C-B75B-19B8E3CC313A.dita"><linktext>SQLite</linktext>
</link>
<link href="GUID-1F12E3F5-45B2-55EC-B021-00338277C608.dita"><linktext>SQL
DB Overview</linktext></link>
<link href="GUID-43CA02E7-0101-5824-B91B-E15EE20C829A.dita"><linktext>Avoid
Transient Tables</linktext></link>
<link href="GUID-49A3419F-D20A-5C5D-B2FF-51724EF37704.dita"><linktext>Prevent
Datafile Corruption</linktext></link>
<link href="GUID-B994E6F7-228A-5433-B87F-91857C5D93D6.dita">
<linktext>SQL Insertion Tips</linktext></link>
<link href="GUID-4FC23DB7-4758-5DA4-81FF-0DAB169E2757.dita"><linktext>SQL
Schema Tips</linktext></link>
<link href="GUID-2A2920E0-5D40-5358-BC0C-8572CEFE078C.dita"><linktext>SQL
Expressions</linktext></link>
<link href="GUID-126FCCCC-0E7D-59AE-959A-2F94A7319C4B.dita"><linktext>SQL
Statement Tips</linktext></link>
<link href="GUID-ACCCB148-DAF9-59EC-B585-8EF632B9BF04.dita"><linktext>SQL
Joins</linktext></link>
<link href="GUID-B7E978C1-45CA-554C-8028-D901B97BA2E0.dita"><linktext> ANALYZE
Command</linktext></link>
<link href="GUID-AF5A75D7-0687-546C-87B2-0B7DF7D33217.dita"><linktext> SQL
WHERE CLause Tips</linktext></link>
</related-links></concept>