Symbian3/SDK/Source/GUID-C2FAEBB2-4A1A-5BB0-9670-4801525CBC6A.dita
author Dominic Pinkman <Dominic.Pinkman@Nokia.com>
Wed, 31 Mar 2010 11:11:55 +0100
changeset 7 51a74ef9ed63
Week 12 contribution of API Specs and fix SDK submission

<?xml version="1.0" encoding="utf-8"?>
<!-- Copyright (c) 2007-2010 Nokia Corporation and/or its subsidiary(-ies) All rights reserved. -->
<!-- This component and the accompanying materials are made available under the terms of the License 
"Eclipse Public License v1.0" which accompanies this distribution, 
and is available at the URL "http://www.eclipse.org/legal/epl-v10.html". -->
<!-- Initial Contributors:
    Nokia Corporation - initial contribution.
Contributors: 
-->
<!DOCTYPE concept
  PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="GUID-C2FAEBB2-4A1A-5BB0-9670-4801525CBC6A" xml:lang="en"><title>SQL Index
Tips</title><shortdesc>This document includes several tips for using SQL indexes.</shortdesc><prolog><metadata><keywords/></metadata></prolog><conbody>
<section id="GUID-3895F9D0-DE9C-4375-B541-AC99CABB7B8A"><title>Introduction</title> <p>Indexes can greatly speed up access to the data in a table.
Some indexes are created automatically, for columns declared PRIMARY KEY or UNIQUE. </p> <p><b>Intended
audience:</b> </p> <p>This document is intended to be used by Symbian platform
licensees and third-party application developers. </p> </section>
<section id="GUID-765F0DF1-ACB0-57DB-B9A8-3697E4637065"><title>Use an Index
to Speed up Access</title> <p>Suppose you have a table like this: </p> <codeblock id="GUID-F70B25AB-A151-52CE-A413-1C62A2464D6A" xml:space="preserve">
CREATE TABLE demo5(
    id INTEGER,
    content BLOB
);
</codeblock> <p>Further suppose that this table contains thousands or millions
of rows and you want to access a single row with a particular ID: </p> <codeblock id="GUID-B02FA452-4093-5383-BAFA-AE035919D720" xml:space="preserve">
SELECT content FROM demo5 WHERE id=?
</codeblock> <p>Without an index, the only way that SQLite can perform this query, and be certain
to get every row with the chosen ID, is to examine every single row, check
the ID of that row, and return the content if the ID matches. Examining every
single row this way is called a <i>full table scan</i>. </p> <p>Reading and
checking every row of a large table can be very slow, so you want to avoid
full table scans. The usual way to do this is to create an index on the column
you are searching against. In the example above, an appropriate index would
be this: </p> <codeblock id="GUID-82E337F1-2CA2-51B0-A7BC-071A83779A18" xml:space="preserve">
CREATE INDEX demo5_idx1 ON demo5(id);
</codeblock> <p>With an index on the ID column, SQLite is able to use a binary
search to locate entries that contain a particular value of ID. So if the
table contains a million rows, the query can be satisfied with about 20 accesses
rather than 1000000 accesses. This is a huge performance improvement. </p> <p>One
of the features of the SQL language is that you do not have to figure out
what indexes you may need in advance of coding your application. It is perfectly
acceptable, even preferable, to write the code for your application using
a database without any indexes. Then once the application is running and you
can make speed measurements, add whatever indexes are needed in order to make
it run faster. </p> <p>When you add indexes, the query optimizer within the
SQL compiler is able to find new, more efficient bytecode procedures for carrying
out the operations that your SQL statements specify. In other words, by adding
indexes late in the development cycle you have the power to completely reorganize
your data access patterns without changing a single line of code. </p> </section>
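<example><p>The effect of adding an index late can be observed without changing the query. The sketch below assumes a test database and a tool that runs SQL statements directly, such as the sqlite3 command-line shell, and uses SQLite's EXPLAIN QUERY PLAN prefix, which describes the strategy SQLite would use for a statement instead of running it. A literal ID replaces the “?” parameter. The wording of the plan output varies between SQLite versions, so none is reproduced here. </p> <codeblock xml:space="preserve">
CREATE TABLE demo5(
    id INTEGER,
    content BLOB
);

-- With no index on "id", the plan reports a scan of the whole table.
EXPLAIN QUERY PLAN SELECT content FROM demo5 WHERE id=42;

-- Add the index; the query text itself does not change.
CREATE INDEX demo5_idx1 ON demo5(id);

-- The plan should now report a search of demo5 using demo5_idx1.
EXPLAIN QUERY PLAN SELECT content FROM demo5 WHERE id=42;
</codeblock></example>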
<section id="GUID-BB1F17C5-1174-5DF4-AA61-611173237F3F"><title>Create Indexes
Automatically Using PRIMARY KEY and UNIQUE</title> <p>Any column of a table
that is declared to be the PRIMARY KEY or that is declared UNIQUE will be
indexed automatically. There is no need to create a separate index on that
column using the CREATE INDEX statement. So, for example, this table declaration: </p> <codeblock id="GUID-E4BE6077-F639-5CE7-964A-276B0D58A129" xml:space="preserve">
CREATE TABLE demo39a(
    id INTEGER,
    content BLOB
);

CREATE INDEX demo39_idx1 ON demo39a(id);
</codeblock> <p>Is roughly equivalent to the following: </p> <codeblock id="GUID-DB3167E0-FA95-50CA-92C7-102B5C2C13E3" xml:space="preserve">
CREATE TABLE demo39b(
    id INTEGER UNIQUE,
    content BLOB
);
</codeblock> <p>The two examples above are “roughly” equivalent, but not exactly
equivalent. Both tables have an index on the ID column. In the first case,
the index is created explicitly. In the second case, the index is implied
by the UNIQUE keyword in the type declaration of the ID column. Both table
designs use exactly the same amount of disk space, and both will run queries
such as </p> <codeblock id="GUID-7ACAE270-6D20-557B-B7D1-C90EDD757E43" xml:space="preserve">
SELECT content FROM demo39a WHERE id=?
SELECT content FROM demo39b WHERE id=?
</codeblock> <p>using exactly the same bytecode. The only difference is that
table demo39a lets you insert multiple rows with the same ID, whereas table
demo39b rejects, with a constraint error, any attempt to insert a new row with the same
ID as an existing row. </p> <p>If you use the UNIQUE keyword in the CREATE
INDEX statement of demo39a, like this: </p> <codeblock id="GUID-0EE5E186-CC4A-5CC3-AEAE-F1482F1F8F9A" xml:space="preserve">
CREATE UNIQUE INDEX demo39_idx1 ON demo39a(id);
</codeblock> <p>Then both table designs really would be exactly the same in
every way. In fact, whenever SQLite sees the UNIQUE keyword on a column type
declaration, all it does is create an automatic unique index on that column. </p> <p>The
PRIMARY KEY modifier on a column type declaration works like UNIQUE; it causes
a unique index to be created automatically. The main difference is that you
are only allowed to have a single PRIMARY KEY. This restriction of only allowing
a single PRIMARY KEY is part of the official SQL language definition. </p> <p>The
idea is that a PRIMARY KEY is used to order the rows on disk. Some SQL database
engines actually implement PRIMARY KEYs this way. But with SQLite, a PRIMARY
KEY is like any other UNIQUE column, with only one exception: INTEGER PRIMARY
KEY is a special case which is handled differently: such a column becomes an alias
for the RowID, and no separate index is created for it. </p> </section>
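<example><p>The automatic indexes can be inspected directly. In the sketch below, demo39c and demo39d are hypothetical tables added for comparison. SQLite records each automatic index in the sqlite_master table with a generated name such as sqlite_autoindex_demo39b_1; an INTEGER PRIMARY KEY column is an alias for the RowID, so no index is recorded for it. </p> <codeblock xml:space="preserve">
-- UNIQUE on a column: SQLite creates a unique index automatically.
CREATE TABLE demo39b(
    id INTEGER UNIQUE,
    content BLOB
);

-- PRIMARY KEY on a non-INTEGER column: also indexed automatically.
CREATE TABLE demo39c(
    code TEXT PRIMARY KEY,
    content BLOB
);

-- INTEGER PRIMARY KEY: the column is an alias for the RowID,
-- so no separate index is created.
CREATE TABLE demo39d(
    id INTEGER PRIMARY KEY,
    content BLOB
);

-- Expect two rows, sqlite_autoindex_demo39b_1 and sqlite_autoindex_demo39c_1,
-- and no index for demo39d.
SELECT name, tbl_name FROM sqlite_master WHERE type='index' ORDER BY name;
</codeblock></example>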
<section id="GUID-BF7A0301-8490-58ED-BB37-FAC403A84230"><title>Use Multi-Column
Indexes</title> <p>SQLite is able to make use of multi-column indexes. The
rule is that if an index is over columns <i>X</i><sub>0</sub>, <i>X</i><sub>1</sub>, <i>X</i><sub>2</sub>,
..., <i>X</i><sub><i>n</i></sub> of some table, then the index can be used if the
WHERE clause contains equality constraints for some prefix of those columns, <i>X</i><sub>0</sub>, <i>X</i><sub>1</sub>, <i>X</i><sub>2</sub>,
..., <i>X</i><sub><i>i</i></sub>, where <i>i</i> is less than or equal to <i>n</i>. </p> <p>As
an example, suppose you have a table and index declared as follows: </p> <codeblock id="GUID-C18C97F7-23CA-5636-9F00-130A8FB3DEF5" xml:space="preserve">
CREATE TABLE demo314(a,b,c,d,e,f,g);
CREATE INDEX demo314_idx ON demo314(a,b,c,d,e,f);
</codeblock> <p>Then the index might be used to help with a query that contained
a WHERE clause like this: </p> <codeblock id="GUID-8A0944F4-1ACF-5267-B49F-EB83EFBB5670" xml:space="preserve">
... WHERE a=1 AND b='Smith' AND c=1
</codeblock> <p>All three terms of the WHERE clause would be used together
with the index in order to narrow the search. But the index could not be used
if the WHERE clause said: </p> <codeblock id="GUID-B5F1C17F-0F5E-5FC2-A9A4-DF19D699A076" xml:space="preserve">
... WHERE b='Smith' AND c=1
</codeblock> <p>The second WHERE clause does not contain equality terms for
a prefix of the columns in the index because it omits a term for the “a” column. </p> <p>In
a case like this: </p> <codeblock id="GUID-EF2CFE7D-0456-5414-847D-BADCC057CFD8" xml:space="preserve">
... WHERE a=1 AND c=1
</codeblock> <p>Only the “a=1” term in the WHERE clause could be used to help
narrow the search. The “c=1” term is not part of the prefix of terms in the
index which have equality constraints because there is no equality constraint
on the “b” column. </p> <p>SQLite only allows a single index to be used per
table within a simple SQL statement. For UPDATE and DELETE statements, this
means that only a single index can ever be used, since those statements can
only operate on a single table at a time. </p> <p>In a simple SELECT statement
multiple indexes can be used if the SELECT statement is a join – one index
per table in the join. In a compound SELECT statement (two or more SELECT
statements connected by UNION or INTERSECT or EXCEPT) each SELECT statement
is treated separately and can have its own indexes. Likewise, SELECT statements
that appear in subexpressions are treated separately. </p> <p>Some other SQL
database engines (for example PostgreSQL) allow multiple indexes to be used
for each table in a SELECT. For example, if you had a table and index in PostgreSQL
like this: </p> <codeblock id="GUID-F5DE8F24-7471-5992-9896-295CE173D855" xml:space="preserve">
CREATE TABLE pg1(a INT, b INT, c INT, d INT);
CREATE INDEX pg1_ix1 ON pg1(a);
CREATE INDEX pg1_ix2 ON pg1(b);
CREATE INDEX pg1_ix3 ON pg1(c);
</codeblock> <p>And if you were to run a query like the following: </p> <codeblock id="GUID-A35663B7-4E7D-5CC0-BF5E-CF3A4CFED63F" xml:space="preserve">
SELECT d FROM pg1 WHERE a=5 AND b=11 AND c=99;
</codeblock> <p>Then PostgreSQL might attempt to optimize the query by using
all three indexes, one for each term of the WHERE clause. </p> <p>SQLite does
not work this way. SQLite is compelled to select a single index to use in
the query. It might select any of the three indexes shown, depending on which
one the optimizer thinks will give the best speedup. But in every case it
will only select a single index and only a single term of the WHERE clause
will be used. </p> <p>SQLite prefers to use a multi-column index such as this: </p> <codeblock id="GUID-40FB7075-1239-5089-BBC5-0D994F4A0C39" xml:space="preserve">
CREATE INDEX pg1_ix_all ON pg1(a,b,c);
</codeblock> <p>If the pg1_ix_all index is available for use when the SELECT
statement above is prepared, SQLite will likely choose it over any of the
single-column indexes because the multi-column index is able to make use of
all 3 terms of the WHERE clause. </p> <p>You can trick SQLite into using multiple
indexes on the same table by rewriting the query. Instead of the SELECT statement
shown above, if you rewrite it as this: </p> <codeblock id="GUID-D7DE75D4-BB01-50DF-A9DC-956A83DED5D0" xml:space="preserve">
SELECT d FROM pg1 WHERE RowID IN (
    SELECT RowID FROM pg1 WHERE a=5
    INTERSECT
    SELECT RowID FROM pg1 WHERE b=11
    INTERSECT
    SELECT RowID FROM pg1 WHERE c=99
)
</codeblock> <p>Then each of the individual SELECT statements will use a
different single-column index and their results will be combined by the outer
SELECT statement to give the correct result. The other SQL database engines
like PostgreSQL that are able to make use of multiple indexes per table do
so by treating the simpler SELECT statement shown first as if it were the
more complicated SELECT statement shown here. </p> </section>
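<example><p>The prefix rule can be checked by explaining each of the WHERE clauses above against the demo314 schema. This is a sketch under two assumptions: the elided part of each query is filled in as SELECT g FROM demo314, and the statements are run with a tool that executes SQL directly. In recent SQLite versions the plan lists the index columns that constrain the search in parentheses after the index name, for example (a=? AND b=? AND c=?). </p> <codeblock xml:space="preserve">
CREATE TABLE demo314(a,b,c,d,e,f,g);
CREATE INDEX demo314_idx ON demo314(a,b,c,d,e,f);

-- Equality terms on the prefix a, b, c: all three narrow the index search.
EXPLAIN QUERY PLAN SELECT g FROM demo314 WHERE a=1 AND b='Smith' AND c=1;

-- No term on "a": the prefix rule is not met, so expect a full table scan
-- (at least when no ANALYZE statistics are available).
EXPLAIN QUERY PLAN SELECT g FROM demo314 WHERE b='Smith' AND c=1;

-- A gap at "b": only the "a=1" term can be used with the index.
EXPLAIN QUERY PLAN SELECT g FROM demo314 WHERE a=1 AND c=1;
</codeblock></example>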
<section id="GUID-E90057A8-70B6-590C-B8AE-616DA25BB543"><title>Use Inequality
Constraints on the Last Index Term</title> <p>Terms in the WHERE clause of
a query or UPDATE or DELETE statement are most likely to trigger the use
of an index if they are an equality constraint – in other words if the term
consists of the name of an indexed column, an equal sign (“=”), and an expression. </p> <p>So,
for example, if you have a table and index that look like this: </p> <codeblock id="GUID-84AADB9D-5853-57C2-B489-87DC7FB7AADE" xml:space="preserve">
CREATE TABLE demo315(a,b,c,d);
CREATE INDEX demo315_idx1 ON demo315(a,b,c);
</codeblock> <p>And a query like this: </p> <codeblock id="GUID-A2B7DA9F-DB82-5D06-80E2-7AF714E403D5" xml:space="preserve">
SELECT d FROM demo315 WHERE a=512;
</codeblock> <p>The single “a=512” term of the WHERE clause qualifies as an
equality constraint and is likely to provoke the use of the demo315_idx1 index. </p> <p>SQLite
supports two other kinds of equality constraints. One is the IN operator: </p> <codeblock id="GUID-EA5D7637-A6B8-5BC0-A72E-D576B0F945A3" xml:space="preserve">
SELECT d FROM demo315 WHERE a IN (512,1024);
SELECT d FROM demo315 WHERE a IN (SELECT x FROM someothertable);
</codeblock> <p>The other is the IS NULL constraint: </p> <codeblock id="GUID-B2C1C84B-C33D-55C5-8484-24B28EFC8E37" xml:space="preserve">
SELECT d FROM demo315 WHERE a IS NULL;
</codeblock> <p>SQLite allows at most one term of an index to be constrained
by an inequality such as less than “&lt;”, greater than “&gt;”, less than or
equal to “&lt;=”, or greater than or equal to “&gt;=”. </p> <p>The column that
the inequality constrains will be the right-most term of the index that is
used. So, for example, in this query: </p> <codeblock id="GUID-563231B5-EC3A-57C2-BC6F-1A8129ADE308" xml:space="preserve">
SELECT d FROM demo315 WHERE a=5 AND b&gt;11 AND c=1;
</codeblock> <p>Only the first two terms of the WHERE clause will be used
with the demo315_idx1 index. The third term, the “c=1” constraint, cannot
be used because the “c” column occurs to the right of the “b” column in the
index and the “b” column is constrained by an inequality. </p> <p>SQLite allows
up to two inequalities on the same column as long as the two inequalities
provide an upper and lower bound on the column. For example, in this query: </p> <codeblock id="GUID-4EB94886-EDFF-58F2-8692-011A67AC5A60" xml:space="preserve">
SELECT d FROM demo315 WHERE a=5 AND b&gt;11 AND b&lt;23;
</codeblock> <p>All three terms of the WHERE clause will be used because the
two inequalities on the “b” column provide an upper and lower bound on the
value of “b”. </p> <p>SQLite will only use the four inequalities mentioned
above to help constrain a search: “&lt;”, “&gt;”, “&lt;=”, and “&gt;=”. Other inequality
operators such as not equal to (“!=” or “&lt;&gt;”) and IS NOT NULL are not helpful
to the query optimizer and will never be used to control an index and help
make the query run faster. </p> </section>
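<example><p>A similar check works for the inequality rules, using the demo315 schema from this section. The third query, with “!=”, is an added illustration of an operator that cannot drive an index. The exact wording of the plan output depends on the SQLite version. </p> <codeblock xml:space="preserve">
CREATE TABLE demo315(a,b,c,d);
CREATE INDEX demo315_idx1 ON demo315(a,b,c);

-- Equality on "a" plus an inequality on "b": the index is used for a and b;
-- the "c=1" term is then checked against each candidate row.
EXPLAIN QUERY PLAN SELECT d FROM demo315 WHERE a=5 AND b&gt;11 AND c=1;

-- A lower and an upper bound on "b": all three terms constrain the search.
EXPLAIN QUERY PLAN SELECT d FROM demo315 WHERE a=5 AND b&gt;11 AND b&lt;23;

-- "!=" cannot drive the index; only "a=5" is used.
EXPLAIN QUERY PLAN SELECT d FROM demo315 WHERE a=5 AND b!=11;
</codeblock></example>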
<section id="GUID-CAD0C181-37E7-578A-A7E1-7843447C247F"><title>Use Indexes
To Help ORDER BY Clauses Evaluate Faster</title> <p>The default method for
evaluating an ORDER BY clause in a SELECT statement is to first evaluate the
SELECT statement and store the results in a temporary table, then sort the
temporary table according to the ORDER BY clause and scan the sorted temporary
table to generate the final output. </p> <p>This method always works, but
it requires three passes over the data (one pass to generate the result set,
a second pass to sort the result set, and a third pass to output the results)
and it requires a temporary storage space sufficiently large to contain the
entire result set. </p> <p>Where possible, SQLite will avoid storing and
sorting the result set by using an index that causes the results to emerge
from the query in sorted order in the first place. </p> <p>The way to get
SQLite to use an index for sorting is to provide an index that covers the
same columns specified in the ORDER BY clause. For example, if the table and
index are like this: </p> <codeblock id="GUID-F0103033-C5C8-5177-8AD7-70BCC45C33C9" xml:space="preserve">
CREATE TABLE demo316(a,b,c,data);
CREATE INDEX idx316 ON demo316(a,b,c);
</codeblock> <p>And you do a query like this: </p> <codeblock id="GUID-D67BB6FF-E213-5B86-A2C1-E1992DA96A62" xml:space="preserve">
SELECT data FROM demo316 ORDER BY a,b,c;
</codeblock> <p>SQLite will use the idx316 index to implement the ORDER BY
clause, obviating the need for temporary storage space and a separate sorting
pass. </p> <p>An index can be used to satisfy the search constraints of a
WHERE clause and to impose the ORDER BY ordering of outputs all at once. The
trick is for the ORDER BY clause terms to occur immediately after the WHERE
clause terms in the index. For example, one can write: </p> <codeblock id="GUID-02063968-34B5-5766-9D02-86D696D39C1E" xml:space="preserve">
SELECT data FROM demo316 WHERE a=5 ORDER BY b,c;
</codeblock> <p>The “a” column is used in the WHERE clause and the immediately
following terms of the index, “b” and “c”, are used in the ORDER BY clause.
So in this case the idx316 index would be used both to speed up the search
and to satisfy the ORDER BY clause. </p> <p>This query also uses the idx316
index because, once again, the ORDER BY clause term “c” immediately follows
the WHERE clause terms “a” and “b” in the index: </p> <codeblock id="GUID-6760EC7E-E86A-5EBD-BDDD-32A68BE78A9E" xml:space="preserve">
SELECT data FROM demo316 WHERE a=5 AND b=17 ORDER BY c;
</codeblock> <p>But now consider this: </p> <codeblock id="GUID-9363996C-8C30-5E04-B05F-392C8262F1F6" xml:space="preserve">
SELECT data FROM demo316 WHERE a=5 ORDER BY c;
</codeblock> <p>Here there is a gap between the ORDER BY term “c” and the
WHERE clause term “a”. So the idx316 index cannot be used to satisfy both
the WHERE clause and the ORDER BY clause. The index will be used on the WHERE
clause and a separate sorting pass will occur to put the results in the correct
order. </p> </section>
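<example><p>In recent SQLite versions, a separate sorting pass appears in the EXPLAIN QUERY PLAN output as a line reading USE TEMP B-TREE FOR ORDER BY, so the queries above can be compared directly. A minimal sketch against the demo316 schema: </p> <codeblock xml:space="preserve">
CREATE TABLE demo316(a,b,c,data);
CREATE INDEX idx316 ON demo316(a,b,c);

-- If idx316 supplies the order, the plan has no
-- "USE TEMP B-TREE FOR ORDER BY" line.
EXPLAIN QUERY PLAN SELECT data FROM demo316 WHERE a=5 ORDER BY b,c;
EXPLAIN QUERY PLAN SELECT data FROM demo316 WHERE a=5 AND b=17 ORDER BY c;

-- Gap between "a" and "c": expect the index to serve the WHERE clause
-- and a temporary B-tree to perform the separate sort.
EXPLAIN QUERY PLAN SELECT data FROM demo316 WHERE a=5 ORDER BY c;
</codeblock></example>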
<section id="GUID-109AF0DA-A054-504A-A432-76BD145B2AC4"><title>Add Result
Columns To The End Of Indexes</title> <p>Queries will sometimes run faster
if their result columns appear in the right-most entries of an index. Consider
the following example: </p> <codeblock id="GUID-63292052-B523-5671-B3EE-E10A66C7275F" xml:space="preserve">
CREATE TABLE demo317(a,b,c,data);
CREATE INDEX idx317 ON demo317(a,b,c);
</codeblock> <p>A query where all result columns appear in the index,
such as </p> <codeblock id="GUID-41F740E7-EAFC-583B-BFE6-E63DBEA354D7" xml:space="preserve">
SELECT c FROM demo317 WHERE a=5 ORDER BY b;
</codeblock> <p>will typically run about twice as fast or faster than a query
that uses columns that are not in the index, e.g. </p> <codeblock id="GUID-098752F4-304A-5A84-834E-240D97D97C2D" xml:space="preserve">
SELECT data FROM demo317 WHERE a=5 ORDER BY b;
</codeblock> <p>The reason for this is that when all information is contained
within the index entry, only a single search has to be made for each row of
output. But when some of the information is in the index and other parts are
in the table, first there must be a search for the appropriate index entry
then a separate search is made for the appropriate table row based on the
RowID found in the index entry. Twice as much searching has to be done for
each row of output generated. </p> <p>The extra query speed does not come
for free, however. Adding additional columns to an index makes the database
file larger. So when developing an application, the programmer will need to
make a space versus time trade-off to determine whether the extra columns
should be added to the index or not. </p> <p>Note that if any column of the
result must be obtained from the original table, then the table row will have
to be searched for anyhow. There will be no speed advantage, so you might
as well omit the extra columns from the end of the index and save on storage
space. The speed-up described in this section can only be realized when every
column used by the query is obtainable from the index. </p> <p>Taking into account
the results of the previous few sections, the best set of columns to put in
an index can be described as follows: </p> <ul>
<li id="GUID-EBF4DEFB-2F5F-5D78-92FA-06FEAB0C3650"><p>The first columns in
the index should be columns that have equality constraints in the WHERE clause
of the query. </p> </li>
<li id="GUID-E5CB725C-6304-5946-9E18-E69B5F1A6A88"><p>The second group of
columns should match the columns specified in the ORDER BY clause. </p> </li>
<li id="GUID-FBC00251-C3AD-5AC0-9102-EF66EA37DE4E"><p>Add additional columns
to the end of the index that are used in the result set of the query. </p> </li>
</ul> </section>
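<example><p>The difference between the two queries in this section, and the effect of the three rules listed above, can be seen in the query plans. In recent SQLite versions a plan that reads only the index is labelled USING COVERING INDEX. The index idx317_cover in this sketch is hypothetical: it applies the rules to the second query, with the equality column a first, the ORDER BY column b next, and the result column data last. </p> <codeblock xml:space="preserve">
CREATE TABLE demo317(a,b,c,data);
CREATE INDEX idx317 ON demo317(a,b,c);

-- Result column "c" is in the index, so the table need not be read.
EXPLAIN QUERY PLAN SELECT c FROM demo317 WHERE a=5 ORDER BY b;

-- "data" is not in idx317, so each match needs a second lookup in the table.
EXPLAIN QUERY PLAN SELECT data FROM demo317 WHERE a=5 ORDER BY b;

-- Equality column, then ORDER BY column, then result column.
CREATE INDEX idx317_cover ON demo317(a,b,data);

-- The optimizer can now answer the query from idx317_cover alone.
EXPLAIN QUERY PLAN SELECT data FROM demo317 WHERE a=5 ORDER BY b;
</codeblock></example>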
<section id="GUID-D7B5B389-E031-5512-8186-235B22F0D9C1"><title>Resolve Indexing
Ambiguities Using the Unary “+” Operator</title> <p>The SQLite query optimizer
usually does a good job of choosing the best index to use for a particular
query, especially if ANALYZE has been run to provide it with index performance
statistics. But occasions do arise where it is useful to give the optimizer
hints. </p> <p>One of the easiest ways to control the operation of the optimizer
is to disqualify terms in the WHERE clause or ORDER BY clause as candidates
for optimization by using the unary “+” operator. </p> <p>In SQLite, a unary
“+” operator is a no-op. It makes no change to its operand, even if the operand
is something other than a number. So you can always prefix a “+” to an expression
without changing the meaning of the expression. As the optimizer will only
use terms in WHERE, HAVING, or ON clauses that have an index column name on
one side of a comparison operator, you can prevent such a term from being
used by the optimizer by prefixing the column name with a “+”. </p> <p>For
example, suppose you have a database with a schema like this: </p> <codeblock id="GUID-E7747EFD-FE58-5EA4-88B3-097C0A303F52" xml:space="preserve">
CREATE TABLE demo321(a,b,c,data);
CREATE INDEX idx321a ON demo321(a);
CREATE INDEX idx321b ON demo321(b);
</codeblock> <p>If you issue a query such as this: </p> <codeblock id="GUID-87BD59FC-33A8-598B-B91F-607B26F7349D" xml:space="preserve">
SELECT data FROM demo321 WHERE a=5 AND b=11;
</codeblock> <p>The query optimizer might use the “a=5” term with idx321a
or it might use the “b=11” term with the idx321b index. But if you want to
force the use of the idx321a index you can accomplish that by disqualifying
the second term of the WHERE clause as a candidate for optimization using
a unary “+” like this: </p> <codeblock id="GUID-E6EAB459-726A-5FE4-8065-6C46AC2C5B5C" xml:space="preserve">
SELECT data FROM demo321 WHERE a=5 AND +b=11;
</codeblock> <p>The “+” in front of the “b=11” turns the left-hand side of
the equals comparison operator into an expression instead of an indexed column
name. The optimizer will then not recognize that the second term can be used
with an index and so the optimizer is compelled to use the first “a=5” term. </p> <p>The
unary “+” operator can also be used to disable ORDER BY clause optimizations.
Consider this query: </p> <codeblock id="GUID-0488D466-77B7-50E0-AB85-FF033A2D75DC" xml:space="preserve">
SELECT data FROM demo321 WHERE a=5 ORDER BY b;
</codeblock> <p>The optimizer has the choice of using the “a=5” term of the
WHERE clause with idx321a to restrict the search. Or it might choose to
do a full table scan in idx321b order to satisfy the ORDER BY clause and thus
avoid a separate sorting pass. You can force one choice or the other using
a unary “+”. </p> <p>To force the use of idx321a on the WHERE clause, add
the unary “+” in front of the “b” in the ORDER BY clause: </p> <codeblock id="GUID-E55A085F-D91F-58E0-B964-317BB3A9D7ED" xml:space="preserve">
SELECT data FROM demo321 WHERE a=5 ORDER BY +b;
</codeblock> <p>To go the other way and force the idx321b index to be used
to satisfy the ORDER BY clause, disqualify the WHERE term by prefixing with
a unary “+”: </p> <codeblock id="GUID-D97EF52A-1F74-57EB-AC11-7911B4E088B3" xml:space="preserve">
SELECT data FROM demo321 WHERE +a=5 ORDER BY b;
</codeblock> <p>The reader is cautioned not to overuse the unary “+” operator.
The SQLite query optimizer usually picks the best index without any outside
help. Premature use of unary “+” can confuse the optimizer and cause less
than optimal performance. But in some cases it is useful to be able to override
the decisions of the optimizer, and the unary “+” operator is an excellent
way to do this when it becomes necessary. </p> </section>
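<example><p>The effect of each unary “+” can be confirmed by comparing query plans for the demo321 schema. A minimal sketch; which index the optimizer picks for the first query depends on the SQLite version and on any ANALYZE statistics, so no output is shown. </p> <codeblock xml:space="preserve">
CREATE TABLE demo321(a,b,c,data);
CREATE INDEX idx321a ON demo321(a);
CREATE INDEX idx321b ON demo321(b);

-- The optimizer may pick either index here.
EXPLAIN QUERY PLAN SELECT data FROM demo321 WHERE a=5 AND b=11;

-- "+b" is no longer a bare column name, so only idx321a remains a candidate.
EXPLAIN QUERY PLAN SELECT data FROM demo321 WHERE a=5 AND +b=11;

-- "+a" disqualifies the WHERE term; idx321b can still help the ORDER BY.
EXPLAIN QUERY PLAN SELECT data FROM demo321 WHERE +a=5 ORDER BY b;
</codeblock></example>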
<section id="GUID-7BEBC49C-0528-5D58-9626-2A92F3D0D9E8"><title>Avoid Indexing
Large BLOBs and CLOBs</title> <p>SQLite stores indexes as b-trees. Each b-tree
node uses one page of the database file. In order to maintain an acceptable
fan-out, the b-tree module within SQLite requires that at least 4 entries
must fit on each page of a b-tree. There is also some overhead associated
with each b-tree page. So at most there are about 250 bytes of space available
on the main b-tree page for each index entry. </p> <p>If an index entry exceeds
this allotment of approximately 250 bytes, the excess bytes are spilled to overflow
pages. There is no arbitrary limit on the number of overflow pages or on the
length of a b-tree entry, but for maximum efficiency it is best to avoid overflow
pages, especially in indexes. This means that you should strive to keep the
number of bytes in each index entry below 250. </p> <p>If you keep the size
of index entries significantly smaller than 250 bytes, then the b-tree fan-out is
increased and the binary search algorithm used to search for entries in an
index has fewer pages to examine and therefore runs faster. So the fewer bytes
used in each index entry the better, at least from a performance perspective. </p> <p>For
these reasons, it is recommended that you avoid indexing large BLOBs and CLOBs.
SQLite will continue to work when large BLOBs and CLOBs are indexed, but there
will be a performance impact. </p> <p>On the other hand, if you need to look up
entries using a large BLOB or CLOB as the key, then by all means use an index.
An index on a large BLOB or CLOB is not as fast as an index using more compact
data types such as integers, but it is still many orders of magnitude faster
than doing a full table scan. So to be more precise, the advice of this section
is that you should design your applications so that you do not need to look up
entries using a large BLOB or CLOB as the key. Try to arrange to have compact
keys consisting of short strings or integers. </p> <p>Note that many other
SQL database engines disallow the indexing of BLOBs and CLOBs in the first
place. You simply cannot do it. SQLite is more flexible than most in that
it does allow BLOBs and CLOBs to be indexed and it will use those indexes
when appropriate. But for maximum performance, it is best to use smaller search
keys. </p> </section>
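<example><p>A minimal sketch of the layout recommended above, using a hypothetical table demo325: the large BLOB is stored but never indexed, and lookups go through a compact integer key or a short text name. The table, column, and index names are illustrative only. </p> <codeblock xml:space="preserve">
CREATE TABLE demo325(
    doc_id INTEGER PRIMARY KEY, -- compact key; an alias for the RowID
    doc_name TEXT,              -- short string used for lookups
    body BLOB                   -- large value; deliberately not indexed
);

-- Index the short name rather than the BLOB itself.
CREATE INDEX idx325_name ON demo325(doc_name);

-- Lookups use only the compact keys.
SELECT body FROM demo325 WHERE doc_id=17;
SELECT body FROM demo325 WHERE doc_name='report-2010-03';
</codeblock></example>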
<section id="GUID-DD40F29F-DF93-536E-9B52-F9B9FF45155D"><title>Avoid Excess
Indexes</title> <p>Some developers approach SQL-based application development
with the attitude that indexes never hurt and that the more indexes you have,
the faster your application will run. This is definitely not the case. There
are costs associated with each new index you create: </p> <ul>
<li id="GUID-FD257BF7-F938-54B5-AC03-9536712D6281"><p>Each new index takes
up additional space in the database file. The more indexes you have, the larger
your database files will become for the same amount of data. </p> </li>
<li id="GUID-E1B74FB6-246A-5148-AF06-04E1B4B949F1"><p>Every INSERT and UPDATE
statement modifies both the original table and all indexes on that table.
So the performance of INSERT and UPDATE decreases linearly with the number
of indexes. </p> </li>
<li id="GUID-56AAE2D1-71D6-5A23-8190-B0C80B204DED"><p>Compiling new SQL statements
using <codeph>Prepare()</codeph> takes longer when there are more indexes
for the optimizer to choose between. </p> </li>
<li id="GUID-24B7F7D8-FAA9-5C78-B3C7-B886FA774C0B"><p>Surplus indexes give
the optimizer more opportunities to make a bad choice. </p> </li>
</ul> <p>Your policy on indexes should be to avoid them wherever you can.
Indexes are powerful medicine and can work wonders to improve the performance
of a program. But just as too many drugs can be worse than none at all, so
also can too many indexes cause more harm than good. </p> <p>When building
a new application, a good approach is to omit all explicitly declared indexes
in the beginning and only add indexes as needed to address specific performance
problems. </p> <p>Take care to avoid redundant indexes. For example, consider
this schema: </p> <codeblock id="GUID-89F20101-1628-5783-82B0-2ABE84078C7D" xml:space="preserve">
CREATE TABLE demo323a(a,b,c);
CREATE INDEX idx323a1 ON demo323a(a);
CREATE INDEX idx323a2 ON demo323a(a,b);
</codeblock> <p>The idx323a1 index is redundant and can be eliminated. Anything
that the idx323a1 index can do the idx323a2 index can do better. </p> <p>Other
redundancies are not quite as apparent as the above. Recall that any column
or columns that are declared UNIQUE or PRIMARY KEY (except for the special
case of INTEGER PRIMARY KEY) are automatically indexed. So in the following
schema: </p> <codeblock id="GUID-2FE7B726-4027-518C-9217-B4BD1ECDA991" xml:space="preserve">
CREATE TABLE demo323b(x TEXT PRIMARY KEY, y INTEGER UNIQUE);
CREATE INDEX idx323b1 ON demo323b(x);
CREATE INDEX idx323b2 ON demo323b(y);
</codeblock> <p>Both indexes are redundant and can be eliminated with no loss
in query performance. Occasionally one sees a novice SQL programmer use both
UNIQUE and PRIMARY KEY on the same column: </p> <codeblock id="GUID-CDE12649-BDB4-58D4-8981-02628BDF5C79" xml:space="preserve">
CREATE TABLE demo323c(p TEXT UNIQUE PRIMARY KEY, q);
</codeblock> <p>This has the effect of creating two indexes on the “p” column
– one for the UNIQUE keyword and another for the PRIMARY KEY keyword. Both
indexes are identical so clearly one can be omitted. A PRIMARY KEY is guaranteed
to always be unique so the UNIQUE keyword can be removed from the demo323c
table definition with no ambiguity or loss of functionality. </p> <p>It is
not a fatal error to create too many indexes or redundant indexes. SQLite
will continue to generate the correct answers but it may take longer to produce
those answers and the resulting database files might be a little larger. So
for best results, keep the number of indexes to a minimum. </p> </section>
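<example><p>Redundant indexes are easier to spot when the indexes on a table, and their columns, are listed. The sketch below uses SQLite's PRAGMA index_list and PRAGMA index_info statements on the demo323a example and then drops the redundant index. Automatic indexes created for UNIQUE and PRIMARY KEY columns also appear in the index_list output, which helps when looking for the less obvious redundancies described above. </p> <codeblock xml:space="preserve">
CREATE TABLE demo323a(a,b,c);
CREATE INDEX idx323a1 ON demo323a(a);
CREATE INDEX idx323a2 ON demo323a(a,b);

-- One row per index on the table, including automatic indexes.
PRAGMA index_list(demo323a);

-- The columns of each index, in index order.
PRAGMA index_info(idx323a1);
PRAGMA index_info(idx323a2);

-- idx323a1 indexes a prefix of idx323a2, so it can be dropped.
DROP INDEX idx323a1;
</codeblock></example>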
<section id="GUID-9337E315-BB5A-56D0-8319-6C398D26151F"><title>Avoid Tables
and Indexes with an Excessive Number of Columns</title> <p>SQLite places no
arbitrary limits on the number of columns in a table or index. There are known
commercial applications using SQLite that construct tables with tens of thousands
of columns each. And these applications actually work. </p> <p>However the
database engine is optimized for the common case of tables with no more than
a few dozen columns. For best performance you should try to stay in the optimized
region. Furthermore, we note that relational databases with a large number
of columns are usually not well normalized. So even apart from performance
considerations, if you find your design has tables with more than a dozen
or so columns, you really need to rethink how you are building your application. </p> <p>There
are a number of places in <codeph>Prepare()</codeph> that run in time O(N<sup>2</sup>)
where N is the number of columns in the table. The constant of proportionality
is small in these cases so you should not have any problems for N of less
than one hundred but for N on the order of a thousand, the time to run <codeph>Prepare()</codeph> can
start to become noticeable. </p> <p>When the bytecode is running and it needs
to access the i-th column of a table, the values of the previous i-1 columns
must be accessed first. So if you have a large number of columns, accessing
the last column can be an expensive operation. This fact also argues for putting
smaller and more frequently accessed columns early in the table. </p> <p>There
are certain optimizations that will only work if the table has 30 or fewer
columns. The optimization that extracts all necessary information from an
index and never refers to the underlying table works this way. So in some
cases, keeping the number of columns in a table at or below 30 can result
in a 2-fold speed improvement. </p> <p>Indexes will only be used if they contain
30 or fewer columns. You can put as many columns in an index as you want,
but if the number is greater than 30, the index will never improve performance
and will never do anything but take up space in your database file. </p> </section>
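<example><p>As an illustration of the advice above, the hypothetical tables below keep small, frequently used columns first and move rarely used columns into a second table that shares the same key. This is one possible layout under those assumptions, not a prescribed design. </p> <codeblock xml:space="preserve">
-- Small, frequently accessed columns first; the bulky BLOB last.
CREATE TABLE demo330(
    id INTEGER PRIMARY KEY,
    name TEXT,
    flags INTEGER,
    thumbnail BLOB
);

-- Rarely used columns live in a side table with the same key,
-- which keeps demo330 narrow.
CREATE TABLE demo330_detail(
    id INTEGER PRIMARY KEY, -- same value as the matching demo330.id
    description TEXT,
    notes TEXT
);

-- Fetch the detail columns only when they are actually needed.
SELECT d.description
  FROM demo330 AS m JOIN demo330_detail AS d ON d.id = m.id
 WHERE m.name = 'example';
</codeblock></example>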
</conbody><related-links>
<link href="GUID-22844C28-AB5B-5A6F-8863-7269464684B4.dita"><linktext>SQL Overview</linktext>
</link>
<link href="GUID-78773BCA-ADF6-53E6-AC80-5CB2AE1F8BCC.dita"><linktext>SQL Server
Guide</linktext></link>
<link href="GUID-E51836E1-D33E-506C-B75B-19B8E3CC313A.dita"><linktext>SQLite</linktext>
</link>
<link href="GUID-1F12E3F5-45B2-55EC-B021-00338277C608.dita"><linktext>SQL DB Overview</linktext>
</link>
<link href="GUID-43CA02E7-0101-5824-B91B-E15EE20C829A.dita"><linktext>Avoid Transient
Tables</linktext></link>
<link href="GUID-49A3419F-D20A-5C5D-B2FF-51724EF37704.dita"><linktext>Prevent Datafile
Corruption</linktext></link>
<link href="GUID-B994E6F7-228A-5433-B87F-91857C5D93D6.dita"><linktext>SQL Insertion
Tips</linktext></link>
<link href="GUID-4FC23DB7-4758-5DA4-81FF-0DAB169E2757.dita"><linktext>SQL Schema
Tips</linktext></link>
<link href="GUID-2A2920E0-5D40-5358-BC0C-8572CEFE078C.dita"><linktext>SQL Expressions</linktext>
</link>
<link href="GUID-126FCCCC-0E7D-59AE-959A-2F94A7319C4B.dita"><linktext>SQL Statement
Tips</linktext></link>
<link href="GUID-ACCCB148-DAF9-59EC-B585-8EF632B9BF04.dita"><linktext>SQL Joins</linktext>
</link>
<link href="GUID-B7E978C1-45CA-554C-8028-D901B97BA2E0.dita"><linktext>ANALYZE
Command</linktext></link>
<link href="GUID-AF5A75D7-0687-546C-87B2-0B7DF7D33217.dita"><linktext>SQL WHERE
Clause Tips</linktext></link>
</related-links></concept>