Symbian3/SDK/Source/GUID-C2FAEBB2-4A1A-5BB0-9670-4801525CBC6A.dita
changeset 7 51a74ef9ed63
child 13 48780e181b38
equal deleted inserted replaced
6:43e37759235e 7:51a74ef9ed63
       
     1 <?xml version="1.0" encoding="utf-8"?>
       
     2 <!-- Copyright (c) 2007-2010 Nokia Corporation and/or its subsidiary(-ies) All rights reserved. -->
       
     3 <!-- This component and the accompanying materials are made available under the terms of the License 
       
     4 "Eclipse Public License v1.0" which accompanies this distribution, 
       
     5 and is available at the URL "http://www.eclipse.org/legal/epl-v10.html". -->
       
     6 <!-- Initial Contributors:
       
     7     Nokia Corporation - initial contribution.
       
     8 Contributors: 
       
     9 -->
       
    10 <!DOCTYPE concept
       
    11   PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
       
    12 <concept id="GUID-C2FAEBB2-4A1A-5BB0-9670-4801525CBC6A" xml:lang="en"><title>SQL Index
       
    13 Tips</title><shortdesc>This document includes several tips for using SQL indexes.</shortdesc><prolog><metadata><keywords/></metadata></prolog><conbody>
       
    14 <section id="GUID-3895F9D0-DE9C-4375-B541-AC99CABB7B8A"><title>Introduction</title> <p>You can use indexes to speed up access.
       
    15 You create indexes automatically using PRIMARY KEY and UNIQUE. </p> <p><b>Intended
       
    16 audience:</b> </p> <p>This document is intended to be used by Symbian platform
       
    17 licensees and third party application developers. </p> </section>
       
    18 <section id="GUID-765F0DF1-ACB0-57DB-B9A8-3697E4637065"><title>Use an Index
       
    19 to Speed up Access</title> <p>Suppose you have a table like this: </p> <codeblock id="GUID-F70B25AB-A151-52CE-A413-1C62A2464D6A" xml:space="preserve">
       
    20 CREATE TABLE demo5(
       
    21     id INTEGER,
       
    22     content BLOB
       
    23 );
       
    24 </codeblock> <p>Further suppose that this table contains thousands or millions
       
    25 of rows and you want to access a single row with a particular ID: </p> <codeblock id="GUID-B02FA452-4093-5383-BAFA-AE035919D720" xml:space="preserve">
       
    26 SELECT content FROM demo5 WHERE id=?
       
    27 </codeblock> <p>The only want that SQLite can perform this query, and be certain
       
    28 to get every row with the chosen ID, is to examine every single row, check
       
    29 the ID of that row, and return the content if the ID matches. Examining every
       
    30 single row this way is called a <i>full table scan</i>. </p> <p>Reading and
       
    31 checking every row of a large table can be very slow, so you want to avoid
       
    32 full table scans. The usual way to do this is to create an index on the column
       
    33 you are searching against. In the example above, an appropriate index would
       
    34 be this: </p> <codeblock id="GUID-82E337F1-2CA2-51B0-A7BC-071A83779A18" xml:space="preserve">
       
    35 CREATE INDEX demo5_idx1 ON demo5(id);
       
    36 </codeblock> <p>With an index on the ID column, SQLite is able to use a binary
       
    37 search to locate entries that contain a particular value of ID. So if the
       
    38 table contains a million rows, the query can be satisfied with about 20 accesses
       
    39 rather than 1000000 accesses. This is a huge performance improvement. </p> <p>One
       
    40 of the features of the SQL language is that you do not have to figure out
       
    41 what indexes you may need in advance of coding your application. It is perfectly
       
    42 acceptable, even preferable, to write the code for your application using
       
    43 a database without any indexes. Then once the application is running and you
       
    44 can make speed measurements, add whatever indexes are needed in order to make
       
    45 it run faster. </p> <p>When you add indexes, the query optimizer within the
       
    46 SQL compiler is able to find new more efficient bytecode procedures for carrying
       
    47 out the operations that your SQL statements specify. In other words, by adding
       
    48 indexes late in the development cycle you have the power to completely reorganize
       
    49 your data access patterns without changing a single line of code. </p> </section>
       
    50 <section id="GUID-BB1F17C5-1174-5DF4-AA61-611173237F3F"><title>Create Indexes
       
    51 Automatically Using PRIMARY KEY and UNIQUE</title> <p>Any column of a table
       
    52 that is declared to be the PRIMARY KEY or that is declared UNIQUE will be
       
    53 indexed automatically. There is no need to create a separate index on that
       
    54 column using the CREATE INDEX statement. So, for example, this table declaration: </p> <codeblock id="GUID-E4BE6077-F639-5CE7-964A-276B0D58A129" xml:space="preserve">
       
    55 CREATE TABLE demo39a(
       
    56     id INTEGER,
       
    57     content BLOB
       
    58 );
       
    59 
       
    60 CREATE INDEX demo39_idx1 ON demo39a(id);
       
    61 </codeblock> <p>Is roughly equivalent to the following: </p> <codeblock id="GUID-DB3167E0-FA95-50CA-92C7-102B5C2C13E3" xml:space="preserve">
       
    62 CREATE TABLE demo39b(
       
    63     id INTEGER UNIQUE,
       
    64     content BLOB
       
    65 );
       
    66 </codeblock> <p>The two examples above are “roughly” equivalent, but not exactly
       
    67 equivalent. Both tables have an index on the ID column. In the first case,
       
    68 the index is created explicitly. In the second case, the index is implied
       
    69 by the UNIQUE keyword in the type declaration of the ID column. Both table
       
    70 designs use exactly the same amount of disk space, and both will run queries
       
    71 such as </p> <codeblock id="GUID-7ACAE270-6D20-557B-B7D1-C90EDD757E43" xml:space="preserve">
       
    72 SELECT content FROM demo39 WHERE id=?
       
    73 </codeblock> <p>using exactly the same bytecode. The only difference is that
       
    74 table demo39a lets you insert multiple rows with the same ID whereas table
       
    75 demo39b will raise an exception if you try to insert a new row with the same
       
    76 ID as an existing row. </p> <p>If you use the UNIQUE keyword in the CREATE
       
    77 INDEX statement of demo39a, like this: </p> <codeblock id="GUID-0EE5E186-CC4A-5CC3-AEAE-F1482F1F8F9A" xml:space="preserve">
       
    78 CREATE UNIQUE INDEX demo39_idx1 ON demo39a(id);
       
    79 </codeblock> <p>Then both table designs really would be exactly the same in
       
    80 every way. In fact, whenever SQLite sees the UNIQUE keyword on a column type
       
    81 declaration, all it does is create an automatic unique index on that column. </p> <p>The
       
    82 PRIMARY KEY modifier on a column type declaration works like UNIQUE; it causes
       
    83 a unique index to be created automatically. The main difference is that you
       
    84 are only allowed to have a single PRIMARY KEY. This restriction of only allowing
       
    85 a single PRIMARY KEY is part of the official SQL language definition. </p> <p>The
       
    86 idea is that a PRIMARY KEY is used to order the rows on disk. Some SQL database
       
    87 engines actually implement PRIMARY KEYs this way. But with SQLite, a PRIMARY
       
    88 KEY is like any other UNIQUE column, with only one exception: INTEGER PRIMARY
       
    89 KEY is a special case which is handled differently, as described in the next
       
    90 section. </p> </section>
       
    91 <section id="GUID-BF7A0301-8490-58ED-BB37-FAC403A84230"><title>Use Multi-Column
       
    92 Indexes</title> <p>SQLite is able to make use of multi-column indexes. The
       
    93 rule is that if an index is over columns <i>X</i>  <i> 0 </i>, <i>X</i>  <i> 1 </i>, <i>X</i>  <i> 2 </i>,
       
    94 ..., <i>X</i>  <i> n </i> of some table, then the index can be used if the
       
    95 WHERE clause contains equality constraints for some prefix of those columns <i>X</i>  <i>0 </i>, <i>X</i>  <i>1 </i>, <i>X</i>  <i>2 </i>,
       
    96 ..., <i>X</i>  <i>i </i> where <i>i</i> is less than <i>n</i>. </p> <p>As
       
    97 an example, suppose you have a table and index declared as follows: </p> <codeblock id="GUID-C18C97F7-23CA-5636-9F00-130A8FB3DEF5" xml:space="preserve">
       
    98 CREATE TABLE demo314(a,b,c,d,e,f,g);
       
    99 CREATE INDEX demo314_idx ON demo314(a,b,c,d,e,f);
       
   100 </codeblock> <p>Then the index might be used to help with a query that contained
       
   101 a WHERE clause like this: </p> <codeblock id="GUID-8A0944F4-1ACF-5267-B49F-EB83EFBB5670" xml:space="preserve">
       
   102 ... WHERE a=1 AND b='Smith' AND c=1
       
   103 </codeblock> <p>All three terms of the WHERE clause would be used together
       
   104 with the index in order to narrow the search. But the index could not be used
       
   105 if there WHERE clause said: </p> <codeblock id="GUID-B5F1C17F-0F5E-5FC2-A9A4-DF19D699A076" xml:space="preserve">
       
   106 ... WHERE b='Smith' AND c=1
       
   107 </codeblock> <p>The second WHERE clause does not contain equality terms for
       
   108 a prefix of the columns in the index because it omits a term for the “a” column. </p> <p>In
       
   109 a case like this: </p> <codeblock id="GUID-EF2CFE7D-0456-5414-847D-BADCC057CFD8" xml:space="preserve">
       
   110 ... WHERE a=1 AND c=1
       
   111 </codeblock> <p>Only the “a=1” term in the WHERE clause could be used to help
       
   112 narrow the search. The “c=1” term is not part of the prefix of terms in the
       
   113 index which have equality constraints because there is no equality constraint
       
   114 on the “b” column. </p> <p>SQLite only allows a single index to be used per
       
   115 table within a simple SQL statement. For UPDATE and DELETE statements, this
       
   116 means that only a single index can ever be used, since those statements can
       
   117 only operate on a single table at a time. </p> <p>In a simple SELECT statement
       
   118 multiple indexes can be used if the SELECT statement is a join – one index
       
   119 per table in the join. In a compound SELECT statement (two or more SELECT
       
   120 statements connected by UNION or INTERSECT or EXCEPT) each SELECT statement
       
   121 is treated separately and can have its own indexes. Likewise, SELECT statements
       
   122 that appear in subexpressions are treated separately. </p> <p>Some other SQL
       
   123 database engines (for example PostgreSQL) allow multiple indexes to be used
       
   124 for each table in a SELECT. For example, if you had a table and index in PostgreSQL
       
   125 like this: </p> <codeblock id="GUID-F5DE8F24-7471-5992-9896-295CE173D855" xml:space="preserve">
       
   126 CREATE TABLE pg1(a INT, b INT, c INT, d INT);
       
   127 CREATE INDEX pg1_ix1 ON pg1(a);
       
   128 CREATE INDEX pg1_ix2 ON pg1(b);
       
   129 CREATE INDEX pg1_ix3 ON pg1(c);
       
   130 </codeblock> <p>And if you were to run a query like the following: </p> <codeblock id="GUID-A35663B7-4E7D-5CC0-BF5E-CF3A4CFED63F" xml:space="preserve">
       
   131 SELECT d FROM pg1 WHERE a=5 AND b=11 AND c=99;
       
   132 </codeblock> <p>Then PostgreSQL might attempt to optimize the query by using
       
   133 all three indexes, one for each term of the WHERE clause. </p> <p>SQLite does
       
   134 not work this way. SQLite is compelled to select a single index to use in
       
   135 the query. It might select any of the three indexes shown, depending on which
       
   136 one the optimizer things will give the best speedup. But in every case it
       
   137 will only select a single index and only a single term of the WHERE clause
       
   138 will be used. </p> <p>SQLite prefers to use a multi-column index such as this: </p> <codeblock id="GUID-40FB7075-1239-5089-BBC5-0D994F4A0C39" xml:space="preserve">
       
   139 CREATE INDEX pg1_ix_all ON pg1(a,b,c);
       
   140 </codeblock> <p>If the pg1_ix_all index is available for use when the SELECT
       
   141 statement above is prepared, SQLite will likely choose it over any of the
       
   142 single-column indexes because the multi-column index is able to make use of
       
   143 all 3 terms of the WHERE clause. </p> <p>You can trick SQLite into using multiple
       
   144 indexes on the same table by rewriting the query. Instead of the SELECT statement
       
   145 shown above, if you rewrite it as this: </p> <codeblock id="GUID-D7DE75D4-BB01-50DF-A9DC-956A83DED5D0" xml:space="preserve">
       
   146 SELECT d FROM pg1 WHERE RowID IN (
       
   147     SELECT RowID FROM pg1 WHERE a=5
       
   148     INTERSECT
       
   149     SELECT RowID FROM pg1 WHERE b=11
       
   150     INTERSECT
       
   151     SELECT RowID FROM pg1 WHERE c=99
       
   152 )
       
   153 </codeblock> <p>Then each of the individual SELECT statements will using a
       
   154 different single-column index and their results will be combined by the outer
       
   155 SELECT statement to give the correct result. The other SQL database engines
       
   156 like PostgreSQL that are able to make use of multiple indexes per table do
       
   157 so by treating the simpler SELECT statement shown first as if they where the
       
   158 more complicated SELECT statement shown here. </p> </section>
       
   159 <section id="GUID-E90057A8-70B6-590C-B8AE-616DA25BB543"><title>Use Inequality
       
   160 Constraints on the Last Index Term</title> <p>Terms in the WHERE clause of
       
   161 a query or UPDATE or DELETE statement are mostly likely to trigger the use
       
   162 of an index if they are an equality constraint – in other words if the term
       
   163 consists of the name of an indexed column, an equal sign (“=”), and an expression. </p> <p>So,
       
   164 for example, if you have a table and index that look like this: </p> <codeblock id="GUID-84AADB9D-5853-57C2-B489-87DC7FB7AADE" xml:space="preserve">
       
   165 CREATE TABLE demo315(a,b,c,d);
       
   166 CREATE INDEX demo315_idx1 ON demo315(a,b,c);
       
   167 </codeblock> <p>And a query like this: </p> <codeblock id="GUID-A2B7DA9F-DB82-5D06-80E2-7AF714E403D5" xml:space="preserve">
       
   168 SELECT d FROM demo315 WHERE a=512;
       
   169 </codeblock> <p>The single “a=512” term of the WHERE clause qualifies as an
       
   170 equality constraint and is likely to provoke the use of the demo315_idx1 index. </p> <p>SQLite
       
   171 supports two other kinds of equality constraints. One is the IN operator: </p> <codeblock id="GUID-EA5D7637-A6B8-5BC0-A72E-D576B0F945A3" xml:space="preserve">
       
   172 SELECT d FROM demo315 WHERE a IN (512,1024);
       
   173 SELECT d FROM demo315 WHERE a IN (SELECT x FROM someothertable);
       
   174 </codeblock> <p>There other is the IS NULL constraint: </p> <codeblock id="GUID-B2C1C84B-C33D-55C5-8484-24B28EFC8E37" xml:space="preserve">
       
   175 SELECT d FROM demo315 WHERE a IS NULL;
       
   176 </codeblock> <p>SQLite allows at most one term of an index to be constrained
       
   177 by an inequality such as less than “&lt;”, greater than “&gt;”, less than or
       
   178 equal to “&lt;=”, or greater than or equal to “&gt;=”. </p> <p>The column that
       
   179 the inequality constrains will be the right-most term of the index that is
       
   180 used. So, for example, in this query: </p> <codeblock id="GUID-563231B5-EC3A-57C2-BC6F-1A8129ADE308" xml:space="preserve">
       
   181 SELECT d FROM demo315 WHERE a=5 AND b&gt;11 AND c=1;
       
   182 </codeblock> <p>Only the first two terms of the WHERE clause will be used
       
   183 with the demo315_idx1 index. The third term, the “c=1” constraint, cannot
       
   184 be used because the “c” column occurs to the right of the “b” column in the
       
   185 index and the “b” column is constrained by an inequality. </p> <p>SQLite allows
       
   186 up to two inequalities on the same column as long as the two inequalities
       
   187 provide an upper and lower bound on the column. For example, in this query: </p> <codeblock id="GUID-4EB94886-EDFF-58F2-8692-011A67AC5A60" xml:space="preserve">
       
   188 SELECT d FROM demo315 WHERE a=5 AND b&gt;11 AND b&lt;23;
       
   189 </codeblock> <p>All three terms of the WHERE clause will be used because the
       
   190 two inequalities on the “b” column provide an upper and lower bound on the
       
   191 value of “b”. </p> <p>SQLite will only use the four inequalities mentioned
       
   192 above to help constrain a search: “&lt;”, “&gt;”, “&lt;=”, and “&gt;=”. Other inequality
       
   193 operators such as not equal to (“!=” or “&lt;&gt;”) and NOT NULL are not helpful
       
   194 to the query optimizer and will never be used to control an index and help
       
   195 make the query run faster. </p> </section>
       
   196 <section id="GUID-CAD0C181-37E7-578A-A7E1-7843447C247F"><title>Use Indexes
       
   197 To Help ORDER BY Clauses Evaluate Faster</title> <p>The default method for
       
   198 evaluating an ORDER BY clause in a SELECT statement is to first evaluate the
       
   199 SELECT statement and store the results in a temporary tables, then sort the
       
   200 temporary table according to the ORDER BY clause and scan the sorted temporary
       
   201 table to generate the final output. </p> <p>This method always works, but
       
   202 it requires three passes over the data (one pass to generate the result set,
       
   203 a second pass to sort the result set, and a third pass to output the results)
       
   204 and it requires a temporary storage space sufficiently large to contain the
       
   205 entire results set. </p> <p>Where possible, SQLite will avoid storing and
       
   206 sorting the result set by using an index that causes the results to emerge
       
   207 from the query in sorted order in the first place. </p> <p>The way to get
       
   208 SQLite to use an index for sorting is to provide an index that covers the
       
   209 same columns specified in the ORDER BY clause. For example, if the table and
       
   210 index are like this: </p> <codeblock id="GUID-F0103033-C5C8-5177-8AD7-70BCC45C33C9" xml:space="preserve">
       
   211 CREATE TABLE demo316(a,b,c,data);
       
   212 CREATE INDEX idx316 ON demo316(a,b,c);
       
   213 </codeblock> <p>And you do a query like this: </p> <codeblock id="GUID-D67BB6FF-E213-5B86-A2C1-E1992DA96A62" xml:space="preserve">
       
   214 SELECT data FROM demo316 ORDER BY a,b,c;
       
   215 </codeblock> <p>SQLite will use the idx316 index to implement the ORDER BY
       
   216 clause, obviating the need for temporary storage space and a separate sorting
       
   217 pass. </p> <p>An index can be used to satisfy the search constraints of a
       
   218 WHERE clause and to impose the ORDER BY ordering of outputs all at once. The
       
   219 trick is for the ORDER BY clause terms to occur immediately after the WHERE
       
   220 clause terms in the index. For example, one can write: </p> <codeblock id="GUID-02063968-34B5-5766-9D02-86D696D39C1E" xml:space="preserve">
       
   221 SELECT data FROM demo316 WHERE a=5 ORDER BY b,c;
       
   222 </codeblock> <p>The “a” column is used in the WHERE clause and the immediately
       
   223 following terms of the index, “b” and “c” are used in the ORDER BY clause.
       
   224 So in this case the idx316 index would be used both to speed up the search
       
   225 and to satisfy the ORDER BY clause. </p> <p>This query also uses the idx316
       
   226 index because, once again, the ORDER BY clause term “c” immediate follows
       
   227 the WHERE clause terms “a” and “b” in the index: </p> <codeblock id="GUID-6760EC7E-E86A-5EBD-BDDD-32A68BE78A9E" xml:space="preserve">
       
   228 SELECT data FROM demo316 WHERE a=5 AND b=17 ORDER BY c;
       
   229 </codeblock> <p>But now consider this: </p> <codeblock id="GUID-9363996C-8C30-5E04-B05F-392C8262F1F6" xml:space="preserve">
       
   230 SELECT data FROM demo316 WHERE a=5 ORDER BY c;
       
   231 </codeblock> <p>Here there is a gap between the ORDER BY term “c” and the
       
   232 WHERE clause term “a”. So the idx316 index cannot be used to satisfy both
       
   233 the WHERE clause and the ORDER BY clause. The index will be used on the WHERE
       
   234 clause and a separate sorting pass will occur to put the results in the correct
       
   235 order. </p> </section>
       
   236 <section id="GUID-109AF0DA-A054-504A-A432-76BD145B2AC4"><title>Add Result
       
   237 Columns To The End Of Indexes</title> <p>Queries will sometimes run faster
       
   238 if their result columns appear in the right-most entries of an index. Consider
       
   239 the following example: </p> <codeblock id="GUID-63292052-B523-5671-B3EE-E10A66C7275F" xml:space="preserve">
       
   240 CREATE TABLE demo317(a,b,c,data);
       
   241 CREATE INDEX idx317 ON demo316(a,b,c);
       
   242 </codeblock> <p>A query where all result column terms appears in the index,
       
   243 such as </p> <codeblock id="GUID-41F740E7-EAFC-583B-BFE6-E63DBEA354D7" xml:space="preserve">
       
   244 SELECT c FROM demo317 WHERE a=5 ORDER BY b;
       
   245 </codeblock> <p>will typically run about twice as fast or faster than a query
       
   246 that uses columns that are not in the index, e.g. </p> <codeblock id="GUID-098752F4-304A-5A84-834E-240D97D97C2D" xml:space="preserve">
       
   247 SELECT data FROM demo317 WHERE a=5 ORDER BY b;
       
   248 </codeblock> <p>The reason for this is that when all information is contained
       
   249 within the index entry only a single search has to be made for each row of
       
   250 output. But when some of the information is in the index and other parts are
       
   251 in the table, first there must be a search for the appropriate index entry
       
   252 then a separate search is made for the appropriate table row based on the
       
   253 RowID found in the index entry. Twice as much searching has to be done for
       
   254 each row of output generated. </p> <p>The extra query speed does not come
       
   255 for free, however. Adding additional columns to an index makes the database
       
   256 file larger. So when developing an application, the programmer will need to
       
   257 make a space versus time trade-off to determine whether the extra columns
       
   258 should be added to the index or not. </p> <p>Note that if any column of the
       
   259 result must be obtained from the original table, then the table row will have
       
   260 to be searched for anyhow. There will be no speed advantage, so you might
       
   261 as well omit the extra columns from the end of the index and save on storage
       
   262 space. The speed-up described in this section can only be realized when every
       
   263 column in a table is obtainable from the index. </p> <p>Taking into account
       
   264 the results of the previous few sections, the best set of columns to put in
       
   265 an index can be described as follows: </p> <ul>
       
   266 <li id="GUID-EBF4DEFB-2F5F-5D78-92FA-06FEAB0C3650"><p>The first columns in
       
   267 the index should be columns that have equality constraints in the WHERE clause
       
   268 of the query. </p> </li>
       
   269 <li id="GUID-E5CB725C-6304-5946-9E18-E69B5F1A6A88"><p>The second group of
       
   270 columns should match the columns specified in the ORDER BY clause. </p> </li>
       
   271 <li id="GUID-FBC00251-C3AD-5AC0-9102-EF66EA37DE4E"><p>Add additional columns
       
   272 to the end of the index that are used in the result set of the query. </p> </li>
       
   273 </ul> </section>
       
   274 <section id="GUID-D7B5B389-E031-5512-8186-235B22F0D9C1"><title>Resolve Indexing
       
   275 Ambiguities Using the Unary “+” Operator</title> <p>The SQLite query optimizer
       
   276 usually does a good job of choosing the best index to use for a particular
       
   277 query, especially if ANALYZE has been run to provide it with index performance
       
   278 statistics. But occasions do arise where it is useful to give the optimizer
       
   279 hints. </p> <p>One of the easiest ways to control the operation of the optimizer
       
   280 is to disqualify terms in the WHERE clause or ORDER BY clause as candidates
       
   281 for optimization by using the unary “+” operator. </p> <p>In SQLite, a unary
       
   282 “+” operator is a no-op. It makes no change to its operand, even if the operand
       
   283 is something other than a number. So you can always prefix a “+” to an expression
       
   284 in without changing the meaning of the expression. As the optimizer will only
       
   285 use terms in WHERE, HAVING, or ON clauses that have an index column name on
       
   286 one side of a comparison operator, you can prevent such a term from being
       
   287 used by the optimizer by prefixing the column name with a “+”. </p> <p>For
       
   288 example, suppose you have a database with a schema like this: </p> <codeblock id="GUID-E7747EFD-FE58-5EA4-88B3-097C0A303F52" xml:space="preserve">
       
   289 CREATE TABLE demo321(a,b,c,data);
       
   290 CREATE INDEX idx321a ON demo321(a);
       
   291 CREATE INDEX idx321b ON demo321(b);
       
   292 </codeblock> <p>If you issue a query such as this: </p> <codeblock id="GUID-87BD59FC-33A8-598B-B91F-607B26F7349D" xml:space="preserve">
       
   293 SELECT data FROM demo321 WHERE a=5 AND b=11;
       
   294 </codeblock> <p>The query optimizer might use the “a=5” term with idx321a
       
   295 or it might use the “b=11” term with the idx321b index. But if you want to
       
   296 force the use of the idx321a index you can accomplish that by disqualifying
       
   297 the second term of the WHERE clause as a candidate for optimization using
       
   298 a unary “+” like this: </p> <codeblock id="GUID-E6EAB459-726A-5FE4-8065-6C46AC2C5B5C" xml:space="preserve">
       
   299 SELECT data FROM demo321 WHERE a=5 AND +b=11;
       
   300 </codeblock> <p>The “+” in front of the “b=11” turns the left-hand side of
       
   301 the equals comparison operator into an expression instead of an indexed column
       
   302 name. The optimizer will then not recognize that the second term can be used
       
   303 with an index and so the optimizer is compelled to use the first “a=5” term. </p> <p>The
       
   304 unary “+” operator can also be used to disable ORDER BY clause optimizations.
       
   305 Consider this query: </p> <codeblock id="GUID-0488D466-77B7-50E0-AB85-FF033A2D75DC" xml:space="preserve">
       
   306 SELECT data FROM demo321 WHERE a=5 ORDER BY b;
       
   307 </codeblock> <p>The optimizer has the choice of using the “a=5” term of the
       
   308 WHERE clause with idx321a to restrict the search. Or it might choose to use
       
   309 do a full table scan with idx321b to satisfy the ORDER BY clause and thus
       
   310 avoid a separate sorting pass. You can force one choice or the other using
       
   311 a unary “+”. </p> <p>To force the use of idx321a on the WHERE clause, add
       
   312 the unary “+” in from of the “b” in the ORDER BY clause: </p> <codeblock id="GUID-E55A085F-D91F-58E0-B964-317BB3A9D7ED" xml:space="preserve">
       
   313 SELECT data FROM demo321 WHERE a=5 ORDER BY +b;
       
   314 </codeblock> <p>To go the other way and force the idx321b index to be used
       
   315 to satisfy the ORDER BY clause, disqualify the WHERE term by prefixing with
       
   316 a unary “+”: </p> <codeblock id="GUID-D97EF52A-1F74-57EB-AC11-7911B4E088B3" xml:space="preserve">
       
   317 SELECT data FROM demo321 WHERE +a=5 ORDER BY b;
       
   318 </codeblock> <p>The reader is cautioned not to overuse the unary “+” operator.
       
   319 The SQLite query optimizer usually picks the best index without any outside
       
   320 help. Premature use of unary “+” can confuse the optimizer and cause less
       
   321 than optimal performance. But in some cases it is useful to be able override
       
   322 the decisions of the optimizer, and the unary “+” operator is an excellent
       
   323 way to do this when it becomes necessary. </p> </section>
       
   324 <section id="GUID-7BEBC49C-0528-5D58-9626-2A92F3D0D9E8"><title>Avoid Indexing
       
   325 Large BLOBs and CLOBs</title> <p>SQLite stores indexes as b-trees. Each b-tree
       
   326 node uses one page of the database file. In order to maintain an acceptable
       
   327 fan-out, the b-tree module within SQLite requires that at least 4 entries
       
   328 must fit on each page of a b-tree. There is also some overhead associated
       
   329 with each b-tree page. So at the most there is about 250 bytes of space available
       
   330 on the main b-tree page for each index entry. </p> <p>If an index entry exceeds
       
   331 this allotment of approximately 250 bytes excess bytes are spilled to overflow
       
   332 pages. There is no arbitrary limit on the number of overflow pages or on the
       
   333 length of a b-tree entry, but for maximum efficiency it is best to avoid overflow
       
   334 pages, especially in indexes. This means that you should strive to keep the
       
   335 number of bytes in each index entry below 250. </p> <p>If you keep the size
       
   336 of indexes significantly smaller than 250 bytes, then the b-tree fan-out is
       
   337 increased and the binary search algorithm used to search for entries in an
       
   338 index has fewer pages to examine and therefore runs faster. So the fewer bytes
       
   339 used in each index entry the better, at least from a performance perspective. </p> <p>For
       
   340 these reasons, it is recommended that you avoid indexing large BLOBs and CLOBs.
       
   341 SQLite will continue to work when large BLOBs and CLOBs are indexed, but there
       
   342 will be a performance impact. </p> <p>On the other hand, if you need to lookup
       
   343 entries using a large BLOB or CLOB as the key, then by all means use an index.
       
   344 An index on a large BLOB or CLOB is not as fast as an index using more compact
       
   345 data types such as integers, but it is still many order of magnitude faster
       
   346 than doing a full table scan. So to be more precise, the advice of this section
       
   347 is that you should design your applications so that you do not need to lookup
       
   348 entries using a large BLOB or CLOB as the key. Try to arrange to have compact
       
   349 keys consisting of short strings or integers. </p> <p>Note that many other
       
   350 SQL database engines disallow the indexing of BLOBs and CLOBs in the first
       
   351 place. You simple cannot do it. SQLite is more flexible that most in that
       
   352 it does allow BLOBs and CLOBs to be indexed and it will use those indexes
       
   353 when appropriate. But for maximum performance, it is best to use smaller search
       
   354 keys. </p> </section>
       
   355 <section id="GUID-DD40F29F-DF93-536E-9B52-F9B9FF45155D"><title>Avoid Excess
       
   356 Indexes</title> <p>Some developers approach SQL-based application development
       
   357 with the attitude that indexes never hurt and that the more indexes you have,
       
   358 the faster your application will run. This is definitely not the case. There
       
   359 is a costs associated with each new index you create: </p> <ul>
       
   360 <li id="GUID-FD257BF7-F938-54B5-AC03-9536712D6281"><p>Each new index takes
       
   361 up additional space in the database file. The more indexes you have, the larger
       
   362 your database files will become for the same amount of data. </p> </li>
       
   363 <li id="GUID-E1B74FB6-246A-5148-AF06-04E1B4B949F1"><p>Every INSERT and UPDATE
       
   364 statement modifies both the original table and all indexes on that table.
       
   365 So the performance of INSERT and UPDATE decreases linearly with the number
       
   366 of indexes. </p> </li>
       
   367 <li id="GUID-56AAE2D1-71D6-5A23-8190-B0C80B204DED"><p>Compiling new SQL statements
       
   368 using <codeph>Prepare()</codeph> takes longer when there are more indexes
       
   369 for the optimizer to choose between. </p> </li>
       
   370 <li id="GUID-24B7F7D8-FAA9-5C78-B3C7-B886FA774C0B"><p>Surplus indexes give
       
   371 the optimizer more opportunities to make a bad choice. </p> </li>
       
   372 </ul> <p>Your policy on indexes should be to avoid them wherever you can.
       
   373 Indexes are powerful medicine and can work wonders to improve the performance
       
   374 of a program. But just as too many drugs can be worse than none at all, so
       
   375 also can too many indexes cause more harm than good. </p> <p>When building
       
   376 a new application, a good approach is to omit all explicitly declared indexes
       
   377 in the beginning and only add indexes as needed to address specific performance
       
   378 problems. </p> <p>Take care to avoid redundant indexes. For example, consider
       
   379 this schema: </p> <codeblock id="GUID-89F20101-1628-5783-82B0-2ABE84078C7D" xml:space="preserve">
       
   380 CREATE TABLE demo323a(a,b,c);
       
   381 CREATE INDEX idx323a1 ON demo323(a);
       
   382 CREATE INDEX idx323a2 ON demo323(a,b);
       
   383 </codeblock> <p>The idx323a1 index is redundant and can be eliminated. Anything
       
   384 that the idx323a1 index can do the idx323a2 index can do better. </p> <p>Other
       
   385 redundancies are not quite as apparent as the above. Recall that any column
       
   386 or columns that are declared UNIQUE or PRIMARY KEY (except for the special
       
   387 case of INTEGER PRIMARY KEY) are automatically indexed. So in the following
       
   388 schema: </p> <codeblock id="GUID-2FE7B726-4027-518C-9217-B4BD1ECDA991" xml:space="preserve">
       
   389 CREATE TABLE demo323b(x TEXT PRIMARY KEY, y INTEGER UNIQUE);
       
   390 CREATE INDEX idx323b1 ON demo323b(x);
       
   391 CREATE INDEX idx323b2 ON demo323b(y);
       
   392 </codeblock> <p>Both indexes are redundant and can be eliminated with no loss
       
   393 in query performance. Occasionally one sees a novice SQL programmer use both
       
   394 UNIQUE and PRIMARY KEY on the same column: </p> <codeblock id="GUID-CDE12649-BDB4-58D4-8981-02628BDF5C79" xml:space="preserve">
       
   395 CREATE TABLE demo323c(p TEXT UNIQUE PRIMARY KEY, q);
       
   396 </codeblock> <p>This has the effect of creating two indexes on the “p” column
       
   397 – one for the UNIQUE keywords and another for the PRIMARY KEY keyword. Both
       
   398 indexes are identical so clearly one can be omitted. A PRIMARY KEY is guaranteed
       
   399 to always be unique so the UNIQUE keyword can be removed from the demo323c
       
   400 table definition with no ambiguity or loss of functionality. </p> <p>It is
       
   401 not a fatal error to create too many indexes or redundant indexes. SQLite
       
   402 will continue to generate the correct answers but it may take longer to produce
       
   403 those answers and the resulting database files might be a little larger. So
       
   404 for best results, keep the number of indexes to a minimum. </p> </section>
       
   405 <section id="GUID-9337E315-BB5A-56D0-8319-6C398D26151F"><title>Avoid Tables
       
   406 and Indexes with an Excessive Number of Columns</title> <p>SQLite places no
       
   407 arbitrary limits on the number of columns in a table or index. There are known
       
   408 commercial applications using SQLite that construct tables with tens of thousands
       
   409 of columns each. And these applications actually work. </p> <p>However the
       
   410 database engine is optimized for the common case of tables with no more than
       
   411 a few dozen columns. For best performance you should try to stay in the optimized
       
   412 region. Furthermore, we note that relational databases with a large number
       
   413 of columns are usually not well normalized. So even apart from performance
       
   414 considerations, if you find your design has tables with more than a dozen
       
   415 or so columns, you really need to rethink how you are building your application. </p> <p>There
       
   416 are a number of places in <codeph>Prepare()</codeph> that run in time O(N<sup>2</sup>)
       
   417 where N is the number of columns in the table. The constant of proportionality
       
   418 is small in these cases so you should not have any problems for N of less
       
   419 than one hundred but for N on the order of a thousand, the time to run <codeph>Prepare()</codeph> can
       
   420 start to become noticeable. </p> <p>When the bytecode is running and it needs
       
   421 to access the i-th column of a table, the values of the previous i-1 columns
       
   422 must be accessed first. So if you have a large number of columns, accessing
       
   423 the last column can be an expensive operation. This fact also argues for putting
       
   424 smaller and more frequently accessed columns early in the table. </p> <p>There
       
   425 are certain optimizations that will only work if the table has 30 or fewer
       
   426 columns. The optimization that extracts all necessary information from an
       
   427 index and never refers to the underlying table works this way. So in some
       
   428 cases, keeping the number of columns in a table at or below 30 can result
       
   429 in a 2-fold speed improvement. </p> <p>Indexes will only be used if they contain
       
   430 30 or fewer columns. You can put as many columns in an index as you want,
       
   431 but if the number is greater than 30, the index will never improve performance
       
   432 and will never do anything but take up space in your database file. </p> </section>
       
   433 </conbody><related-links>
       
   434 <link href="GUID-22844C28-AB5B-5A6F-8863-7269464684B4.dita"><linktext>SQL Overview</linktext>
       
   435 </link>
       
   436 <link href="GUID-78773BCA-ADF6-53E6-AC80-5CB2AE1F8BCC.dita"><linktext>SQL Server
       
   437 Guide</linktext></link>
       
   438 <link href="GUID-E51836E1-D33E-506C-B75B-19B8E3CC313A.dita"><linktext>SQLite</linktext>
       
   439 </link>
       
   440 <link href="GUID-1F12E3F5-45B2-55EC-B021-00338277C608.dita"><linktext>SQL DB Overview</linktext>
       
   441 </link>
       
   442 <link href="GUID-43CA02E7-0101-5824-B91B-E15EE20C829A.dita"><linktext>Avoid Transient
       
   443 Tables</linktext></link>
       
   444 <link href="GUID-49A3419F-D20A-5C5D-B2FF-51724EF37704.dita"><linktext>Prevent Datafile
       
   445 Corruption</linktext></link>
       
   446 <link><linktext/></link>
       
   447 <link href="GUID-B994E6F7-228A-5433-B87F-91857C5D93D6.dita"><linktext>SQL Insertion
       
   448 Tips</linktext></link>
       
   449 <link href="GUID-4FC23DB7-4758-5DA4-81FF-0DAB169E2757.dita"><linktext>SQL Schema
       
   450 Tips</linktext></link>
       
   451 <link href="GUID-2A2920E0-5D40-5358-BC0C-8572CEFE078C.dita"><linktext>SQL Expressions</linktext>
       
   452 </link>
       
   453 <link href="GUID-126FCCCC-0E7D-59AE-959A-2F94A7319C4B.dita"><linktext>SQL Statement
       
   454 Tips</linktext></link>
       
   455 <link href="GUID-ACCCB148-DAF9-59EC-B585-8EF632B9BF04.dita"><linktext>SQL Joins</linktext>
       
   456 </link>
       
   457 <link href="GUID-B7E978C1-45CA-554C-8028-D901B97BA2E0.dita"><linktext> ANALYZE
       
   458 Command</linktext></link>
       
   459 <link href="GUID-AF5A75D7-0687-546C-87B2-0B7DF7D33217.dita"><linktext> SQL WHERE
       
   460 CLause Tips</linktext></link>
       
   461 </related-links></concept>