SQL Index -Tips

diff -r 80ef3a206772 -r 48780e181b38 Symbian3/SDK/Source/GUID-C2FAEBB2-4A1A-5BB0-9670-4801525CBC6A.dita --- a/Symbian3/SDK/Source/GUID-C2FAEBB2-4A1A-5BB0-9670-4801525CBC6A.dita Fri Jul 16 17:23:46 2010 +0100 +++ b/Symbian3/SDK/Source/GUID-C2FAEBB2-4A1A-5BB0-9670-4801525CBC6A.dita Tue Jul 20 12:00:49 2010 +0100 @@ -1,461 +1,477 @@ - - - - - -SQL Index -TipsThis document includes several tips for using SQL indexes. -

Introduction

You can use indexes to speed up access. -You create indexes automatically using PRIMARY KEY and UNIQUE.

Intended -audience:

This document is intended to be used by Symbian platform -licensees and third party application developers.

Use an Index -to Speed up Access

Suppose you have a table like this:

-CREATE TABLE demo5( - id INTEGER, - content BLOB -); -

Further suppose that this table contains thousands or millions -of rows and you want to access a single row with a particular ID:

-SELECT content FROM demo5 WHERE id=? -

The only want that SQLite can perform this query, and be certain -to get every row with the chosen ID, is to examine every single row, check -the ID of that row, and return the content if the ID matches. Examining every -single row this way is called a full table scan.

Reading and -checking every row of a large table can be very slow, so you want to avoid -full table scans. The usual way to do this is to create an index on the column -you are searching against. In the example above, an appropriate index would -be this:

-CREATE INDEX demo5_idx1 ON demo5(id); -

With an index on the ID column, SQLite is able to use a binary -search to locate entries that contain a particular value of ID. So if the -table contains a million rows, the query can be satisfied with about 20 accesses -rather than 1000000 accesses. This is a huge performance improvement.

One -of the features of the SQL language is that you do not have to figure out -what indexes you may need in advance of coding your application. It is perfectly -acceptable, even preferable, to write the code for your application using -a database without any indexes. Then once the application is running and you -can make speed measurements, add whatever indexes are needed in order to make -it run faster.

When you add indexes, the query optimizer within the -SQL compiler is able to find new more efficient bytecode procedures for carrying -out the operations that your SQL statements specify. In other words, by adding -indexes late in the development cycle you have the power to completely reorganize -your data access patterns without changing a single line of code.

Create Indexes -Automatically Using PRIMARY KEY and UNIQUE

Any column of a table -that is declared to be the PRIMARY KEY or that is declared UNIQUE will be -indexed automatically. There is no need to create a separate index on that -column using the CREATE INDEX statement. So, for example, this table declaration:

-CREATE TABLE demo39a( - id INTEGER, - content BLOB -); - -CREATE INDEX demo39_idx1 ON demo39a(id); -

Is roughly equivalent to the following:

-CREATE TABLE demo39b( - id INTEGER UNIQUE, - content BLOB -); -

The two examples above are “roughly” equivalent, but not exactly -equivalent. Both tables have an index on the ID column. In the first case, -the index is created explicitly. In the second case, the index is implied -by the UNIQUE keyword in the type declaration of the ID column. Both table -designs use exactly the same amount of disk space, and both will run queries -such as

-SELECT content FROM demo39 WHERE id=? -

using exactly the same bytecode. The only difference is that -table demo39a lets you insert multiple rows with the same ID whereas table -demo39b will raise an exception if you try to insert a new row with the same -ID as an existing row.

If you use the UNIQUE keyword in the CREATE -INDEX statement of demo39a, like this:

-CREATE UNIQUE INDEX demo39_idx1 ON demo39a(id); -

Then both table designs really would be exactly the same in -every way. In fact, whenever SQLite sees the UNIQUE keyword on a column type -declaration, all it does is create an automatic unique index on that column.

The -PRIMARY KEY modifier on a column type declaration works like UNIQUE; it causes -a unique index to be created automatically. The main difference is that you -are only allowed to have a single PRIMARY KEY. This restriction of only allowing -a single PRIMARY KEY is part of the official SQL language definition.

The -idea is that a PRIMARY KEY is used to order the rows on disk. Some SQL database -engines actually implement PRIMARY KEYs this way. But with SQLite, a PRIMARY -KEY is like any other UNIQUE column, with only one exception: INTEGER PRIMARY -KEY is a special case which is handled differently, as described in the next -section.

Use Multi-Column -Indexes

SQLite is able to make use of multi-column indexes. The -rule is that if an index is over columns X 0 , X 1 , X 2 , -..., X n of some table, then the index can be used if the -WHERE clause contains equality constraints for some prefix of those columns X 0 , X 1 , X 2 , -..., X i where i is less than n.

As -an example, suppose you have a table and index declared as follows:

-CREATE TABLE demo314(a,b,c,d,e,f,g); -CREATE INDEX demo314_idx ON demo314(a,b,c,d,e,f); -

Then the index might be used to help with a query that contained -a WHERE clause like this:

-... WHERE a=1 AND b='Smith' AND c=1 -

All three terms of the WHERE clause would be used together -with the index in order to narrow the search. But the index could not be used -if there WHERE clause said:

-... WHERE b='Smith' AND c=1 -

The second WHERE clause does not contain equality terms for -a prefix of the columns in the index because it omits a term for the “a” column.

In -a case like this:

-... WHERE a=1 AND c=1 -

Only the “a=1” term in the WHERE clause could be used to help -narrow the search. The “c=1” term is not part of the prefix of terms in the -index which have equality constraints because there is no equality constraint -on the “b” column.

SQLite only allows a single index to be used per -table within a simple SQL statement. For UPDATE and DELETE statements, this -means that only a single index can ever be used, since those statements can -only operate on a single table at a time.

In a simple SELECT statement -multiple indexes can be used if the SELECT statement is a join – one index -per table in the join. In a compound SELECT statement (two or more SELECT -statements connected by UNION or INTERSECT or EXCEPT) each SELECT statement -is treated separately and can have its own indexes. Likewise, SELECT statements -that appear in subexpressions are treated separately.

Some other SQL -database engines (for example PostgreSQL) allow multiple indexes to be used -for each table in a SELECT. For example, if you had a table and index in PostgreSQL -like this:

-CREATE TABLE pg1(a INT, b INT, c INT, d INT); -CREATE INDEX pg1_ix1 ON pg1(a); -CREATE INDEX pg1_ix2 ON pg1(b); -CREATE INDEX pg1_ix3 ON pg1(c); -

And if you were to run a query like the following:

-SELECT d FROM pg1 WHERE a=5 AND b=11 AND c=99; -

Then PostgreSQL might attempt to optimize the query by using -all three indexes, one for each term of the WHERE clause.

SQLite does -not work this way. SQLite is compelled to select a single index to use in -the query. It might select any of the three indexes shown, depending on which -one the optimizer things will give the best speedup. But in every case it -will only select a single index and only a single term of the WHERE clause -will be used.

SQLite prefers to use a multi-column index such as this:

-CREATE INDEX pg1_ix_all ON pg1(a,b,c); -

If the pg1_ix_all index is available for use when the SELECT -statement above is prepared, SQLite will likely choose it over any of the -single-column indexes because the multi-column index is able to make use of -all 3 terms of the WHERE clause.

You can trick SQLite into using multiple -indexes on the same table by rewriting the query. Instead of the SELECT statement -shown above, if you rewrite it as this:

-SELECT d FROM pg1 WHERE RowID IN ( - SELECT RowID FROM pg1 WHERE a=5 - INTERSECT - SELECT RowID FROM pg1 WHERE b=11 - INTERSECT - SELECT RowID FROM pg1 WHERE c=99 -) -

Then each of the individual SELECT statements will using a -different single-column index and their results will be combined by the outer -SELECT statement to give the correct result. The other SQL database engines -like PostgreSQL that are able to make use of multiple indexes per table do -so by treating the simpler SELECT statement shown first as if they where the -more complicated SELECT statement shown here.

Use Inequality -Constraints on the Last Index Term

Terms in the WHERE clause of -a query or UPDATE or DELETE statement are mostly likely to trigger the use -of an index if they are an equality constraint – in other words if the term -consists of the name of an indexed column, an equal sign (“=”), and an expression.

So, -for example, if you have a table and index that look like this:

-CREATE TABLE demo315(a,b,c,d); -CREATE INDEX demo315_idx1 ON demo315(a,b,c); -

And a query like this:

-SELECT d FROM demo315 WHERE a=512; -

The single “a=512” term of the WHERE clause qualifies as an -equality constraint and is likely to provoke the use of the demo315_idx1 index.

SQLite -supports two other kinds of equality constraints. One is the IN operator:

-SELECT d FROM demo315 WHERE a IN (512,1024); -SELECT d FROM demo315 WHERE a IN (SELECT x FROM someothertable); -

There other is the IS NULL constraint:

-SELECT d FROM demo315 WHERE a IS NULL; -

SQLite allows at most one term of an index to be constrained -by an inequality such as less than “<”, greater than “>”, less than or -equal to “<=”, or greater than or equal to “>=”.

The column that -the inequality constrains will be the right-most term of the index that is -used. So, for example, in this query:

-SELECT d FROM demo315 WHERE a=5 AND b>11 AND c=1; -

Only the first two terms of the WHERE clause will be used -with the demo315_idx1 index. The third term, the “c=1” constraint, cannot -be used because the “c” column occurs to the right of the “b” column in the -index and the “b” column is constrained by an inequality.

SQLite allows -up to two inequalities on the same column as long as the two inequalities -provide an upper and lower bound on the column. For example, in this query:

-SELECT d FROM demo315 WHERE a=5 AND b>11 AND b<23; -

All three terms of the WHERE clause will be used because the -two inequalities on the “b” column provide an upper and lower bound on the -value of “b”.

SQLite will only use the four inequalities mentioned -above to help constrain a search: “<”, “>”, “<=”, and “>=”. Other inequality -operators such as not equal to (“!=” or “<>”) and NOT NULL are not helpful -to the query optimizer and will never be used to control an index and help -make the query run faster.

Use Indexes -To Help ORDER BY Clauses Evaluate Faster

The default method for -evaluating an ORDER BY clause in a SELECT statement is to first evaluate the -SELECT statement and store the results in a temporary tables, then sort the -temporary table according to the ORDER BY clause and scan the sorted temporary -table to generate the final output.

This method always works, but -it requires three passes over the data (one pass to generate the result set, -a second pass to sort the result set, and a third pass to output the results) -and it requires a temporary storage space sufficiently large to contain the -entire results set.

Where possible, SQLite will avoid storing and -sorting the result set by using an index that causes the results to emerge -from the query in sorted order in the first place.

The way to get -SQLite to use an index for sorting is to provide an index that covers the -same columns specified in the ORDER BY clause. For example, if the table and -index are like this:

-CREATE TABLE demo316(a,b,c,data); -CREATE INDEX idx316 ON demo316(a,b,c); -

And you do a query like this:

-SELECT data FROM demo316 ORDER BY a,b,c; -

SQLite will use the idx316 index to implement the ORDER BY -clause, obviating the need for temporary storage space and a separate sorting -pass.

An index can be used to satisfy the search constraints of a -WHERE clause and to impose the ORDER BY ordering of outputs all at once. The -trick is for the ORDER BY clause terms to occur immediately after the WHERE -clause terms in the index. For example, one can write:

-SELECT data FROM demo316 WHERE a=5 ORDER BY b,c; -

The “a” column is used in the WHERE clause and the immediately -following terms of the index, “b” and “c” are used in the ORDER BY clause. -So in this case the idx316 index would be used both to speed up the search -and to satisfy the ORDER BY clause.

This query also uses the idx316 -index because, once again, the ORDER BY clause term “c” immediate follows -the WHERE clause terms “a” and “b” in the index:

-SELECT data FROM demo316 WHERE a=5 AND b=17 ORDER BY c; -

But now consider this:

-SELECT data FROM demo316 WHERE a=5 ORDER BY c; -

Here there is a gap between the ORDER BY term “c” and the -WHERE clause term “a”. So the idx316 index cannot be used to satisfy both -the WHERE clause and the ORDER BY clause. The index will be used on the WHERE -clause and a separate sorting pass will occur to put the results in the correct -order.

Add Result -Columns To The End Of Indexes

Queries will sometimes run faster -if their result columns appear in the right-most entries of an index. Consider -the following example:

-CREATE TABLE demo317(a,b,c,data); -CREATE INDEX idx317 ON demo316(a,b,c); -

A query where all result column terms appears in the index, -such as

-SELECT c FROM demo317 WHERE a=5 ORDER BY b; -

will typically run about twice as fast or faster than a query -that uses columns that are not in the index, e.g.

-SELECT data FROM demo317 WHERE a=5 ORDER BY b; -

The reason for this is that when all information is contained -within the index entry only a single search has to be made for each row of -output. But when some of the information is in the index and other parts are -in the table, first there must be a search for the appropriate index entry -then a separate search is made for the appropriate table row based on the -RowID found in the index entry. Twice as much searching has to be done for -each row of output generated.

The extra query speed does not come -for free, however. Adding additional columns to an index makes the database -file larger. So when developing an application, the programmer will need to -make a space versus time trade-off to determine whether the extra columns -should be added to the index or not.

Note that if any column of the -result must be obtained from the original table, then the table row will have -to be searched for anyhow. There will be no speed advantage, so you might -as well omit the extra columns from the end of the index and save on storage -space. The speed-up described in this section can only be realized when every -column in a table is obtainable from the index.

Taking into account -the results of the previous few sections, the best set of columns to put in -an index can be described as follows:

The first columns in -the index should be columns that have equality constraints in the WHERE clause -of the query.
The second group of -columns should match the columns specified in the ORDER BY clause.
Add additional columns -to the end of the index that are used in the result set of the query.

Resolve Indexing -Ambiguities Using the Unary “+” Operator

The SQLite query optimizer -usually does a good job of choosing the best index to use for a particular -query, especially if ANALYZE has been run to provide it with index performance -statistics. But occasions do arise where it is useful to give the optimizer -hints.

One of the easiest ways to control the operation of the optimizer -is to disqualify terms in the WHERE clause or ORDER BY clause as candidates -for optimization by using the unary “+” operator.

In SQLite, a unary -“+” operator is a no-op. It makes no change to its operand, even if the operand -is something other than a number. So you can always prefix a “+” to an expression -in without changing the meaning of the expression. As the optimizer will only -use terms in WHERE, HAVING, or ON clauses that have an index column name on -one side of a comparison operator, you can prevent such a term from being -used by the optimizer by prefixing the column name with a “+”.

For -example, suppose you have a database with a schema like this:

-CREATE TABLE demo321(a,b,c,data); -CREATE INDEX idx321a ON demo321(a); -CREATE INDEX idx321b ON demo321(b); -

If you issue a query such as this:

-SELECT data FROM demo321 WHERE a=5 AND b=11; -

The query optimizer might use the “a=5” term with idx321a -or it might use the “b=11” term with the idx321b index. But if you want to -force the use of the idx321a index you can accomplish that by disqualifying -the second term of the WHERE clause as a candidate for optimization using -a unary “+” like this:

-SELECT data FROM demo321 WHERE a=5 AND +b=11; -

The “+” in front of the “b=11” turns the left-hand side of -the equals comparison operator into an expression instead of an indexed column -name. The optimizer will then not recognize that the second term can be used -with an index and so the optimizer is compelled to use the first “a=5” term.

The -unary “+” operator can also be used to disable ORDER BY clause optimizations. -Consider this query:

-SELECT data FROM demo321 WHERE a=5 ORDER BY b; -

The optimizer has the choice of using the “a=5” term of the -WHERE clause with idx321a to restrict the search. Or it might choose to use -do a full table scan with idx321b to satisfy the ORDER BY clause and thus -avoid a separate sorting pass. You can force one choice or the other using -a unary “+”.

To force the use of idx321a on the WHERE clause, add -the unary “+” in from of the “b” in the ORDER BY clause:

-SELECT data FROM demo321 WHERE a=5 ORDER BY +b; -

To go the other way and force the idx321b index to be used -to satisfy the ORDER BY clause, disqualify the WHERE term by prefixing with -a unary “+”:

-SELECT data FROM demo321 WHERE +a=5 ORDER BY b; -

The reader is cautioned not to overuse the unary “+” operator. -The SQLite query optimizer usually picks the best index without any outside -help. Premature use of unary “+” can confuse the optimizer and cause less -than optimal performance. But in some cases it is useful to be able override -the decisions of the optimizer, and the unary “+” operator is an excellent -way to do this when it becomes necessary.

Avoid Indexing -Large BLOBs and CLOBs

SQLite stores indexes as b-trees. Each b-tree -node uses one page of the database file. In order to maintain an acceptable -fan-out, the b-tree module within SQLite requires that at least 4 entries -must fit on each page of a b-tree. There is also some overhead associated -with each b-tree page. So at the most there is about 250 bytes of space available -on the main b-tree page for each index entry.

If an index entry exceeds -this allotment of approximately 250 bytes excess bytes are spilled to overflow -pages. There is no arbitrary limit on the number of overflow pages or on the -length of a b-tree entry, but for maximum efficiency it is best to avoid overflow -pages, especially in indexes. This means that you should strive to keep the -number of bytes in each index entry below 250.

If you keep the size -of indexes significantly smaller than 250 bytes, then the b-tree fan-out is -increased and the binary search algorithm used to search for entries in an -index has fewer pages to examine and therefore runs faster. So the fewer bytes -used in each index entry the better, at least from a performance perspective.

For -these reasons, it is recommended that you avoid indexing large BLOBs and CLOBs. -SQLite will continue to work when large BLOBs and CLOBs are indexed, but there -will be a performance impact.

On the other hand, if you need to lookup -entries using a large BLOB or CLOB as the key, then by all means use an index. -An index on a large BLOB or CLOB is not as fast as an index using more compact -data types such as integers, but it is still many order of magnitude faster -than doing a full table scan. So to be more precise, the advice of this section -is that you should design your applications so that you do not need to lookup -entries using a large BLOB or CLOB as the key. Try to arrange to have compact -keys consisting of short strings or integers.

Note that many other -SQL database engines disallow the indexing of BLOBs and CLOBs in the first -place. You simple cannot do it. SQLite is more flexible that most in that -it does allow BLOBs and CLOBs to be indexed and it will use those indexes -when appropriate. But for maximum performance, it is best to use smaller search -keys.

Avoid Excess -Indexes

Some developers approach SQL-based application development -with the attitude that indexes never hurt and that the more indexes you have, -the faster your application will run. This is definitely not the case. There -is a costs associated with each new index you create:

Each new index takes -up additional space in the database file. The more indexes you have, the larger -your database files will become for the same amount of data.
Every INSERT and UPDATE -statement modifies both the original table and all indexes on that table. -So the performance of INSERT and UPDATE decreases linearly with the number -of indexes.
Compiling new SQL statements -using Prepare() takes longer when there are more indexes -for the optimizer to choose between.
Surplus indexes give -the optimizer more opportunities to make a bad choice.

Your policy on indexes should be to avoid them wherever you can. -Indexes are powerful medicine and can work wonders to improve the performance -of a program. But just as too many drugs can be worse than none at all, so -also can too many indexes cause more harm than good.

When building -a new application, a good approach is to omit all explicitly declared indexes -in the beginning and only add indexes as needed to address specific performance -problems.

Take care to avoid redundant indexes. For example, consider -this schema:

-CREATE TABLE demo323a(a,b,c); -CREATE INDEX idx323a1 ON demo323(a); -CREATE INDEX idx323a2 ON demo323(a,b); -

The idx323a1 index is redundant and can be eliminated. Anything -that the idx323a1 index can do the idx323a2 index can do better.

Other -redundancies are not quite as apparent as the above. Recall that any column -or columns that are declared UNIQUE or PRIMARY KEY (except for the special -case of INTEGER PRIMARY KEY) are automatically indexed. So in the following -schema:

-CREATE TABLE demo323b(x TEXT PRIMARY KEY, y INTEGER UNIQUE); -CREATE INDEX idx323b1 ON demo323b(x); -CREATE INDEX idx323b2 ON demo323b(y); -

Both indexes are redundant and can be eliminated with no loss -in query performance. Occasionally one sees a novice SQL programmer use both -UNIQUE and PRIMARY KEY on the same column:

-CREATE TABLE demo323c(p TEXT UNIQUE PRIMARY KEY, q); -

This has the effect of creating two indexes on the “p” column -– one for the UNIQUE keywords and another for the PRIMARY KEY keyword. Both -indexes are identical so clearly one can be omitted. A PRIMARY KEY is guaranteed -to always be unique so the UNIQUE keyword can be removed from the demo323c -table definition with no ambiguity or loss of functionality.

It is -not a fatal error to create too many indexes or redundant indexes. SQLite -will continue to generate the correct answers but it may take longer to produce -those answers and the resulting database files might be a little larger. So -for best results, keep the number of indexes to a minimum.

Avoid Tables -and Indexes with an Excessive Number of Columns

SQLite places no -arbitrary limits on the number of columns in a table or index. There are known -commercial applications using SQLite that construct tables with tens of thousands -of columns each. And these applications actually work.

However the -database engine is optimized for the common case of tables with no more than -a few dozen columns. For best performance you should try to stay in the optimized -region. Furthermore, we note that relational databases with a large number -of columns are usually not well normalized. So even apart from performance -considerations, if you find your design has tables with more than a dozen -or so columns, you really need to rethink how you are building your application.

There -are a number of places in Prepare() that run in time O(N²) -where N is the number of columns in the table. The constant of proportionality -is small in these cases so you should not have any problems for N of less -than one hundred but for N on the order of a thousand, the time to run Prepare() can -start to become noticeable.

When the bytecode is running and it needs -to access the i-th column of a table, the values of the previous i-1 columns -must be accessed first. So if you have a large number of columns, accessing -the last column can be an expensive operation. This fact also argues for putting -smaller and more frequently accessed columns early in the table.

There -are certain optimizations that will only work if the table has 30 or fewer -columns. The optimization that extracts all necessary information from an -index and never refers to the underlying table works this way. So in some -cases, keeping the number of columns in a table at or below 30 can result -in a 2-fold speed improvement.

Indexes will only be used if they contain -30 or fewer columns. You can put as many columns in an index as you want, -but if the number is greater than 30, the index will never improve performance -and will never do anything but take up space in your database file.

-SQL Overview - -SQL Server -Guide -SQLite - -SQL DB Overview - -Avoid Transient -Tables -Prevent Datafile -Corruption - -SQL Insertion -Tips -SQL Schema -Tips -SQL Expressions - -SQL Statement -Tips -SQL Joins - - ANALYZE -Command - SQL WHERE -CLause Tips + + + + + +SQL Index TipsThis document includes several tips for using SQL indexes. +

Introduction

You can use indexes to speed up access. You create indexes automatically +using PRIMARY KEY and UNIQUE.

Intended audience:

This document is intended to be +used by Symbian platform licensees and third party application developers.

Use +an Index to Speed up Access

Suppose you have a table like +this:

+CREATE TABLE demo5( + id INTEGER, + content BLOB +); +

Further suppose that this table contains thousands +or millions of rows and you want to access a single row with a particular +ID:

+SELECT content FROM demo5 WHERE id=? +

The only want that SQLite can perform this query, +and be certain to get every row with the chosen ID, is to examine +every single row, check the ID of that row, and return the content +if the ID matches. Examining every single row this way is called a full table scan.

Reading and checking every row of a +large table can be very slow, so you want to avoid full table scans. +The usual way to do this is to create an index on the column you are +searching against. In the example above, an appropriate index would +be this:

+CREATE INDEX demo5_idx1 ON demo5(id); +

With an index on the ID column, SQLite is able to +use a binary search to locate entries that contain a particular value +of ID. So if the table contains a million rows, the query can be satisfied +with about 20 accesses rather than 1000000 accesses. This is a huge +performance improvement.

One of the features of the SQL language +is that you do not have to figure out what indexes you may need in +advance of coding your application. It is perfectly acceptable, even +preferable, to write the code for your application using a database +without any indexes. Then once the application is running and you +can make speed measurements, add whatever indexes are needed in order +to make it run faster.

When you add indexes, the query optimizer +within the SQL compiler is able to find new more efficient bytecode +procedures for carrying out the operations that your SQL statements +specify. In other words, by adding indexes late in the development +cycle you have the power to completely reorganize your data access +patterns without changing a single line of code.

Create +Indexes Automatically Using PRIMARY KEY and UNIQUE

Any +column of a table that is declared to be the PRIMARY KEY or that is +declared UNIQUE will be indexed automatically. There is no need to +create a separate index on that column using the CREATE INDEX statement. +So, for example, this table declaration:

+CREATE TABLE demo39a( + id INTEGER, + content BLOB +); + +CREATE INDEX demo39_idx1 ON demo39a(id); +

Is roughly equivalent to the following:

+CREATE TABLE demo39b( + id INTEGER UNIQUE, + content BLOB +); +

The two examples above are “roughly” equivalent, but +not exactly equivalent. Both tables have an index on the ID column. +In the first case, the index is created explicitly. In the second +case, the index is implied by the UNIQUE keyword in the type declaration +of the ID column. Both table designs use exactly the same amount of +disk space, and both will run queries such as

+SELECT content FROM demo39 WHERE id=? +

using exactly the same bytecode. The only difference +is that table demo39a lets you insert multiple rows with the same +ID whereas table demo39b will raise an exception if you try to insert +a new row with the same ID as an existing row.

If you use +the UNIQUE keyword in the CREATE INDEX statement of demo39a, like +this:

+CREATE UNIQUE INDEX demo39_idx1 ON demo39a(id); +

Then both table designs really would be exactly the +same in every way. In fact, whenever SQLite sees the UNIQUE keyword +on a column type declaration, all it does is create an automatic unique +index on that column.

The PRIMARY KEY modifier on a column +type declaration works like UNIQUE; it causes a unique index to be +created automatically. The main difference is that you are only allowed +to have a single PRIMARY KEY. This restriction of only allowing a +single PRIMARY KEY is part of the official SQL language definition.

The idea is that a PRIMARY KEY is used to order the rows on disk. +Some SQL database engines actually implement PRIMARY KEYs this way. +But with SQLite, a PRIMARY KEY is like any other UNIQUE column, with +only one exception: INTEGER PRIMARY KEY is a special case which is +handled differently, as described in the next section.

Use +Multi-Column Indexes

SQLite is able to make use of multi-column +indexes. The rule is that if an index is over columns X 0 , X 1 , X 2 , ..., X n of some table, then the index can be used if the WHERE +clause contains equality constraints for some prefix of those columns X 0 , X 1 , X 2 , ..., X i where i is less than n.

As +an example, suppose you have a table and index declared as follows:

+CREATE TABLE demo314(a,b,c,d,e,f,g); +CREATE INDEX demo314_idx ON demo314(a,b,c,d,e,f); +

Then the index might be used to help with a query +that contained a WHERE clause like this:

+... WHERE a=1 AND b='Smith' AND c=1 +

All three terms of the WHERE clause would be used +together with the index in order to narrow the search. But the index +could not be used if there WHERE clause said:

+... WHERE b='Smith' AND c=1 +

The second WHERE clause does not contain equality +terms for a prefix of the columns in the index because it omits a +term for the “a” column.

In a case like this:

+... WHERE a=1 AND c=1 +

Only the “a=1” term in the WHERE clause could be used +to help narrow the search. The “c=1” term is not part of the prefix +of terms in the index which have equality constraints because there +is no equality constraint on the “b” column.

SQLite only allows +a single index to be used per table within a simple SQL statement. +For UPDATE and DELETE statements, this means that only a single index +can ever be used, since those statements can only operate on a single +table at a time.

In a simple SELECT statement multiple indexes +can be used if the SELECT statement is a join – one index per table +in the join. In a compound SELECT statement (two or more SELECT statements +connected by UNION or INTERSECT or EXCEPT) each SELECT statement is +treated separately and can have its own indexes. Likewise, SELECT +statements that appear in subexpressions are treated separately.

Some other SQL database engines (for example PostgreSQL) allow +multiple indexes to be used for each table in a SELECT. For example, +if you had a table and index in PostgreSQL like this:

+CREATE TABLE pg1(a INT, b INT, c INT, d INT); +CREATE INDEX pg1_ix1 ON pg1(a); +CREATE INDEX pg1_ix2 ON pg1(b); +CREATE INDEX pg1_ix3 ON pg1(c); +

And if you were to run a query like the following:

+SELECT d FROM pg1 WHERE a=5 AND b=11 AND c=99; +

Then PostgreSQL might attempt to optimize the query +by using all three indexes, one for each term of the WHERE clause.

SQLite does not work this way. SQLite is compelled to select +a single index to use in the query. It might select any of the three +indexes shown, depending on which one the optimizer things will give +the best speedup. But in every case it will only select a single index +and only a single term of the WHERE clause will be used.

SQLite +prefers to use a multi-column index such as this:

+CREATE INDEX pg1_ix_all ON pg1(a,b,c); +

If the pg1_ix_all index is available for use when +the SELECT statement above is prepared, SQLite will likely choose +it over any of the single-column indexes because the multi-column +index is able to make use of all 3 terms of the WHERE clause.

You can trick SQLite into using multiple indexes on the same +table by rewriting the query. Instead of the SELECT statement shown +above, if you rewrite it as this:

+SELECT d FROM pg1 WHERE RowID IN ( + SELECT RowID FROM pg1 WHERE a=5 + INTERSECT + SELECT RowID FROM pg1 WHERE b=11 + INTERSECT + SELECT RowID FROM pg1 WHERE c=99 +) +

Then each of the individual SELECT statements will +using a different single-column index and their results will be combined +by the outer SELECT statement to give the correct result. The other +SQL database engines like PostgreSQL that are able to make use of +multiple indexes per table do so by treating the simpler SELECT statement +shown first as if they where the more complicated SELECT statement +shown here.

Use +Inequality Constraints on the Last Index Term

Terms in +the WHERE clause of a query or UPDATE or DELETE statement are mostly +likely to trigger the use of an index if they are an equality constraint +– in other words if the term consists of the name of an indexed column, +an equal sign (“=”), and an expression.

So, for example, if +you have a table and index that look like this:

+CREATE TABLE demo315(a,b,c,d); +CREATE INDEX demo315_idx1 ON demo315(a,b,c); +

And a query like this:

+SELECT d FROM demo315 WHERE a=512; +

The single “a=512” term of the WHERE clause qualifies +as an equality constraint and is likely to provoke the use of the +demo315_idx1 index.

SQLite supports two other kinds of equality +constraints. One is the IN operator:

+SELECT d FROM demo315 WHERE a IN (512,1024); +SELECT d FROM demo315 WHERE a IN (SELECT x FROM someothertable); +

There other is the IS NULL constraint:

+SELECT d FROM demo315 WHERE a IS NULL; +

SQLite allows at most one term of an index to be constrained +by an inequality such as less than “<”, greater than “>”, less +than or equal to “<=”, or greater than or equal to “>=”.

The column that the inequality constrains will be the right-most +term of the index that is used. So, for example, in this query:

+SELECT d FROM demo315 WHERE a=5 AND b>11 AND c=1; +

Only the first two terms of the WHERE clause will +be used with the demo315_idx1 index. The third term, the “c=1” constraint, +cannot be used because the “c” column occurs to the right of the “b” +column in the index and the “b” column is constrained by an inequality.

SQLite allows up to two inequalities on the same column as long +as the two inequalities provide an upper and lower bound on the column. +For example, in this query:

+SELECT d FROM demo315 WHERE a=5 AND b>11 AND b<23; +

All three terms of the WHERE clause will be used because +the two inequalities on the “b” column provide an upper and lower +bound on the value of “b”.

SQLite will only use the four inequalities +mentioned above to help constrain a search: “<”, “>”, “<=”, +and “>=”. Other inequality operators such as not equal to (“!=” or +“<>”) and NOT NULL are not helpful to the query optimizer and will +never be used to control an index and help make the query run faster.

Use +Indexes To Help ORDER BY Clauses Evaluate Faster

The default +method for evaluating an ORDER BY clause in a SELECT statement is +to first evaluate the SELECT statement and store the results in a +temporary tables, then sort the temporary table according to the ORDER +BY clause and scan the sorted temporary table to generate the final +output.

This method always works, but it requires three passes +over the data (one pass to generate the result set, a second pass +to sort the result set, and a third pass to output the results) and +it requires a temporary storage space sufficiently large to contain +the entire results set.

Where possible, SQLite will avoid +storing and sorting the result set by using an index that causes the +results to emerge from the query in sorted order in the first place.

The way to get SQLite to use an index for sorting is to provide +an index that covers the same columns specified in the ORDER BY clause. +For example, if the table and index are like this:

+CREATE TABLE demo316(a,b,c,data); +CREATE INDEX idx316 ON demo316(a,b,c); +

And you do a query like this:

+SELECT data FROM demo316 ORDER BY a,b,c; +

SQLite will use the idx316 index to implement the +ORDER BY clause, obviating the need for temporary storage space and +a separate sorting pass.

An index can be used to satisfy the +search constraints of a WHERE clause and to impose the ORDER BY ordering +of outputs all at once. The trick is for the ORDER BY clause terms +to occur immediately after the WHERE clause terms in the index. For +example, one can write:

+SELECT data FROM demo316 WHERE a=5 ORDER BY b,c; +

The “a” column is used in the WHERE clause and the +immediately following terms of the index, “b” and “c” are used in +the ORDER BY clause. So in this case the idx316 index would be used +both to speed up the search and to satisfy the ORDER BY clause.

This query also uses the idx316 index because, once again, the +ORDER BY clause term “c” immediate follows the WHERE clause terms +“a” and “b” in the index:

+SELECT data FROM demo316 WHERE a=5 AND b=17 ORDER BY c; +

But now consider this:

+SELECT data FROM demo316 WHERE a=5 ORDER BY c; +

Here there is a gap between the ORDER BY term “c” +and the WHERE clause term “a”. So the idx316 index cannot be used +to satisfy both the WHERE clause and the ORDER BY clause. The index +will be used on the WHERE clause and a separate sorting pass will +occur to put the results in the correct order.

Add +Result Columns To The End Of Indexes

Queries will sometimes +run faster if their result columns appear in the right-most entries +of an index. Consider the following example:

+CREATE TABLE demo317(a,b,c,data); +CREATE INDEX idx317 ON demo316(a,b,c); +

A query where all result column terms appears in the +index, such as

+SELECT c FROM demo317 WHERE a=5 ORDER BY b; +

will typically run about twice as fast or faster than +a query that uses columns that are not in the index, e.g.

+SELECT data FROM demo317 WHERE a=5 ORDER BY b; +

The reason for this is that when all information is +contained within the index entry only a single search has to be made +for each row of output. But when some of the information is in the +index and other parts are in the table, first there must be a search +for the appropriate index entry then a separate search is made for +the appropriate table row based on the RowID found in the index entry. +Twice as much searching has to be done for each row of output generated.

The extra query speed does not come for free, however. Adding +additional columns to an index makes the database file larger. So +when developing an application, the programmer will need to make a +space versus time trade-off to determine whether the extra columns +should be added to the index or not.

Note that if any column +of the result must be obtained from the original table, then the table +row will have to be searched for anyhow. There will be no speed advantage, +so you might as well omit the extra columns from the end of the index +and save on storage space. The speed-up described in this section +can only be realized when every column in a table is obtainable from +the index.

Taking into account the results of the previous +few sections, the best set of columns to put in an index can be described +as follows:

The first columns +in the index should be columns that have equality constraints in the +WHERE clause of the query.
The second group +of columns should match the columns specified in the ORDER BY clause.
Add additional +columns to the end of the index that are used in the result set of +the query.

Resolve +Indexing Ambiguities Using the Unary “+” Operator

The SQLite +query optimizer usually does a good job of choosing the best index +to use for a particular query, especially if ANALYZE has been run +to provide it with index performance statistics. But occasions do +arise where it is useful to give the optimizer hints.

One +of the easiest ways to control the operation of the optimizer is to +disqualify terms in the WHERE clause or ORDER BY clause as candidates +for optimization by using the unary “+” operator.

In SQLite, +a unary “+” operator is a no-op. It makes no change to its operand, +even if the operand is something other than a number. So you can always +prefix a “+” to an expression in without changing the meaning of the +expression. As the optimizer will only use terms in WHERE, HAVING, +or ON clauses that have an index column name on one side of a comparison +operator, you can prevent such a term from being used by the optimizer +by prefixing the column name with a “+”.

For example, suppose +you have a database with a schema like this:

+CREATE TABLE demo321(a,b,c,data); +CREATE INDEX idx321a ON demo321(a); +CREATE INDEX idx321b ON demo321(b); +

If you issue a query such as this:

+SELECT data FROM demo321 WHERE a=5 AND b=11; +

The query optimizer might use the “a=5” term with +idx321a or it might use the “b=11” term with the idx321b index. But +if you want to force the use of the idx321a index you can accomplish +that by disqualifying the second term of the WHERE clause as a candidate +for optimization using a unary “+” like this:

+SELECT data FROM demo321 WHERE a=5 AND +b=11; +

The “+” in front of the “b=11” turns the left-hand +side of the equals comparison operator into an expression instead +of an indexed column name. The optimizer will then not recognize that +the second term can be used with an index and so the optimizer is +compelled to use the first “a=5” term.

The unary “+” operator +can also be used to disable ORDER BY clause optimizations. Consider +this query:

+SELECT data FROM demo321 WHERE a=5 ORDER BY b; +

The optimizer has the choice of using the “a=5” term +of the WHERE clause with idx321a to restrict the search. Or it might +choose to use do a full table scan with idx321b to satisfy the ORDER +BY clause and thus avoid a separate sorting pass. You can force one +choice or the other using a unary “+”.

To force the use of +idx321a on the WHERE clause, add the unary “+” in from of the “b” +in the ORDER BY clause:

+SELECT data FROM demo321 WHERE a=5 ORDER BY +b; +

To go the other way and force the idx321b index to +be used to satisfy the ORDER BY clause, disqualify the WHERE term +by prefixing with a unary “+”:

+SELECT data FROM demo321 WHERE +a=5 ORDER BY b; +

The reader is cautioned not to overuse the unary “+” +operator. The SQLite query optimizer usually picks the best index +without any outside help. Premature use of unary “+” can confuse the +optimizer and cause less than optimal performance. But in some cases +it is useful to be able override the decisions of the optimizer, and +the unary “+” operator is an excellent way to do this when it becomes +necessary.

Avoid +Indexing Large BLOBs and CLOBs

SQLite stores indexes as +b-trees. Each b-tree node uses one page of the database file. In order +to maintain an acceptable fan-out, the b-tree module within SQLite +requires that at least 4 entries must fit on each page of a b-tree. +There is also some overhead associated with each b-tree page. So at +the most there is about 250 bytes of space available on the main b-tree +page for each index entry.

If an index entry exceeds this +allotment of approximately 250 bytes excess bytes are spilled to overflow +pages. There is no arbitrary limit on the number of overflow pages +or on the length of a b-tree entry, but for maximum efficiency it +is best to avoid overflow pages, especially in indexes. This means +that you should strive to keep the number of bytes in each index entry +below 250.

If you keep the size of indexes significantly smaller +than 250 bytes, then the b-tree fan-out is increased and the binary +search algorithm used to search for entries in an index has fewer +pages to examine and therefore runs faster. So the fewer bytes used +in each index entry the better, at least from a performance perspective.

For these reasons, it is recommended that you avoid indexing +large BLOBs and CLOBs. SQLite will continue to work when large BLOBs +and CLOBs are indexed, but there will be a performance impact.

On the other hand, if you need to lookup entries using a large +BLOB or CLOB as the key, then by all means use an index. An index +on a large BLOB or CLOB is not as fast as an index using more compact +data types such as integers, but it is still many order of magnitude +faster than doing a full table scan. So to be more precise, the advice +of this section is that you should design your applications so that +you do not need to lookup entries using a large BLOB or CLOB as the +key. Try to arrange to have compact keys consisting of short strings +or integers.

Note that many other SQL database engines disallow +the indexing of BLOBs and CLOBs in the first place. You simple cannot +do it. SQLite is more flexible that most in that it does allow BLOBs +and CLOBs to be indexed and it will use those indexes when appropriate. +But for maximum performance, it is best to use smaller search keys.

Avoid +Excess Indexes

Some developers approach SQL-based application +development with the attitude that indexes never hurt and that the +more indexes you have, the faster your application will run. This +is definitely not the case. There is a costs associated with each +new index you create:

Each new index +takes up additional space in the database file. The more indexes you +have, the larger your database files will become for the same amount +of data.
Every INSERT +and UPDATE statement modifies both the original table and all indexes +on that table. So the performance of INSERT and UPDATE decreases linearly +with the number of indexes.
Compiling new +SQL statements using Prepare() takes longer when +there are more indexes for the optimizer to choose between.
Surplus indexes +give the optimizer more opportunities to make a bad choice.

Your policy on indexes should be to avoid them wherever you +can. Indexes are powerful medicine and can work wonders to improve +the performance of a program. But just as too many drugs can be worse +than none at all, so also can too many indexes cause more harm than +good.

When building a new application, a good approach is +to omit all explicitly declared indexes in the beginning and only +add indexes as needed to address specific performance problems.

Take care to avoid redundant indexes. For example, consider this +schema:

+CREATE TABLE demo323a(a,b,c); +CREATE INDEX idx323a1 ON demo323(a); +CREATE INDEX idx323a2 ON demo323(a,b); +

The idx323a1 index is redundant and can be eliminated. +Anything that the idx323a1 index can do the idx323a2 index can do +better.

Other redundancies are not quite as apparent as the +above. Recall that any column or columns that are declared UNIQUE +or PRIMARY KEY (except for the special case of INTEGER PRIMARY KEY) +are automatically indexed. So in the following schema:

+CREATE TABLE demo323b(x TEXT PRIMARY KEY, y INTEGER UNIQUE); +CREATE INDEX idx323b1 ON demo323b(x); +CREATE INDEX idx323b2 ON demo323b(y); +

Both indexes are redundant and can be eliminated with +no loss in query performance. Occasionally one sees a novice SQL programmer +use both UNIQUE and PRIMARY KEY on the same column:

+CREATE TABLE demo323c(p TEXT UNIQUE PRIMARY KEY, q); +

This has the effect of creating two indexes on the +“p” column – one for the UNIQUE keywords and another for the PRIMARY +KEY keyword. Both indexes are identical so clearly one can be omitted. +A PRIMARY KEY is guaranteed to always be unique so the UNIQUE keyword +can be removed from the demo323c table definition with no ambiguity +or loss of functionality.

It is not a fatal error to create +too many indexes or redundant indexes. SQLite will continue to generate +the correct answers but it may take longer to produce those answers +and the resulting database files might be a little larger. So for +best results, keep the number of indexes to a minimum.

Avoid +Tables and Indexes with an Excessive Number of Columns

SQLite places no arbitrary limits on the number of columns in a table +or index. There are known commercial applications using SQLite that +construct tables with tens of thousands of columns each. And these +applications actually work.

However the database engine is +optimized for the common case of tables with no more than a few dozen +columns. For best performance you should try to stay in the optimized +region. Furthermore, we note that relational databases with a large +number of columns are usually not well normalized. So even apart from +performance considerations, if you find your design has tables with +more than a dozen or so columns, you really need to rethink how you +are building your application.

There are a number of places +in Prepare() that run in time O(N²) where +N is the number of columns in the table. The constant of proportionality +is small in these cases so you should not have any problems for N +of less than one hundred but for N on the order of a thousand, the +time to run Prepare() can start to become noticeable.

When the bytecode is running and it needs to access the i-th +column of a table, the values of the previous i-1 columns must be +accessed first. So if you have a large number of columns, accessing +the last column can be an expensive operation. This fact also argues +for putting smaller and more frequently accessed columns early in +the table.

There are certain optimizations that will only +work if the table has 30 or fewer columns. The optimization that extracts +all necessary information from an index and never refers to the underlying +table works this way. So in some cases, keeping the number of columns +in a table at or below 30 can result in a 2-fold speed improvement.

Indexes will only be used if they contain 30 or fewer columns. +You can put as many columns in an index as you want, but if the number +is greater than 30, the index will never improve performance and will +never do anything but take up space in your database file.

+SQL +Overview +SQL +Server Guide +SQLite + +SQL +DB Overview +Avoid +Transient Tables +Prevent +Datafile Corruption + +SQL Insertion Tips +SQL +Schema Tips +SQL +Expressions +SQL +Statement Tips +SQL +Joins + ANALYZE +Command + SQL +WHERE CLause Tips

\ No newline at end of file