|
1 <?xml version="1.0" encoding="utf-8"?> |
|
2 <!-- Copyright (c) 2007-2010 Nokia Corporation and/or its subsidiary(-ies) All rights reserved. --> |
|
3 <!-- This component and the accompanying materials are made available under the terms of the License |
|
4 "Eclipse Public License v1.0" which accompanies this distribution, |
|
5 and is available at the URL "http://www.eclipse.org/legal/epl-v10.html". --> |
|
6 <!-- Initial Contributors: |
|
7 Nokia Corporation - initial contribution. |
|
8 Contributors: |
|
9 --> |
|
10 <!DOCTYPE concept |
|
11 PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> |
|
12 <concept id="GUID-C2FAEBB2-4A1A-5BB0-9670-4801525CBC6A" xml:lang="en"><title>SQL Index |
|
13 Tips</title><shortdesc>This document includes several tips for using SQL indexes.</shortdesc><prolog><metadata><keywords/></metadata></prolog><conbody> |
|
14 <section id="GUID-3895F9D0-DE9C-4375-B541-AC99CABB7B8A"><title>Introduction</title> <p>You can use indexes to speed up access. |
|
15 You create indexes automatically using PRIMARY KEY and UNIQUE. </p> <p><b>Intended |
|
16 audience:</b> </p> <p>This document is intended to be used by Symbian platform |
|
17 licensees and third party application developers. </p> </section> |
|
18 <section id="GUID-765F0DF1-ACB0-57DB-B9A8-3697E4637065"><title>Use an Index |
|
19 to Speed up Access</title> <p>Suppose you have a table like this: </p> <codeblock id="GUID-F70B25AB-A151-52CE-A413-1C62A2464D6A" xml:space="preserve"> |
|
20 CREATE TABLE demo5( |
|
21 id INTEGER, |
|
22 content BLOB |
|
23 ); |
|
24 </codeblock> <p>Further suppose that this table contains thousands or millions |
|
25 of rows and you want to access a single row with a particular ID: </p> <codeblock id="GUID-B02FA452-4093-5383-BAFA-AE035919D720" xml:space="preserve"> |
|
26 SELECT content FROM demo5 WHERE id=? |
|
27 </codeblock> <p>The only want that SQLite can perform this query, and be certain |
|
28 to get every row with the chosen ID, is to examine every single row, check |
|
29 the ID of that row, and return the content if the ID matches. Examining every |
|
30 single row this way is called a <i>full table scan</i>. </p> <p>Reading and |
|
31 checking every row of a large table can be very slow, so you want to avoid |
|
32 full table scans. The usual way to do this is to create an index on the column |
|
33 you are searching against. In the example above, an appropriate index would |
|
34 be this: </p> <codeblock id="GUID-82E337F1-2CA2-51B0-A7BC-071A83779A18" xml:space="preserve"> |
|
35 CREATE INDEX demo5_idx1 ON demo5(id); |
|
36 </codeblock> <p>With an index on the ID column, SQLite is able to use a binary |
|
37 search to locate entries that contain a particular value of ID. So if the |
|
38 table contains a million rows, the query can be satisfied with about 20 accesses |
|
39 rather than 1000000 accesses. This is a huge performance improvement. </p> <p>One |
|
40 of the features of the SQL language is that you do not have to figure out |
|
41 what indexes you may need in advance of coding your application. It is perfectly |
|
42 acceptable, even preferable, to write the code for your application using |
|
43 a database without any indexes. Then once the application is running and you |
|
44 can make speed measurements, add whatever indexes are needed in order to make |
|
45 it run faster. </p> <p>When you add indexes, the query optimizer within the |
|
46 SQL compiler is able to find new more efficient bytecode procedures for carrying |
|
47 out the operations that your SQL statements specify. In other words, by adding |
|
48 indexes late in the development cycle you have the power to completely reorganize |
|
49 your data access patterns without changing a single line of code. </p> </section> |
|
50 <section id="GUID-BB1F17C5-1174-5DF4-AA61-611173237F3F"><title>Create Indexes |
|
51 Automatically Using PRIMARY KEY and UNIQUE</title> <p>Any column of a table |
|
52 that is declared to be the PRIMARY KEY or that is declared UNIQUE will be |
|
53 indexed automatically. There is no need to create a separate index on that |
|
54 column using the CREATE INDEX statement. So, for example, this table declaration: </p> <codeblock id="GUID-E4BE6077-F639-5CE7-964A-276B0D58A129" xml:space="preserve"> |
|
55 CREATE TABLE demo39a( |
|
56 id INTEGER, |
|
57 content BLOB |
|
58 ); |
|
59 |
|
60 CREATE INDEX demo39_idx1 ON demo39a(id); |
|
61 </codeblock> <p>Is roughly equivalent to the following: </p> <codeblock id="GUID-DB3167E0-FA95-50CA-92C7-102B5C2C13E3" xml:space="preserve"> |
|
62 CREATE TABLE demo39b( |
|
63 id INTEGER UNIQUE, |
|
64 content BLOB |
|
65 ); |
|
66 </codeblock> <p>The two examples above are “roughly” equivalent, but not exactly |
|
67 equivalent. Both tables have an index on the ID column. In the first case, |
|
68 the index is created explicitly. In the second case, the index is implied |
|
69 by the UNIQUE keyword in the type declaration of the ID column. Both table |
|
70 designs use exactly the same amount of disk space, and both will run queries |
|
71 such as </p> <codeblock id="GUID-7ACAE270-6D20-557B-B7D1-C90EDD757E43" xml:space="preserve"> |
|
72 SELECT content FROM demo39 WHERE id=? |
|
73 </codeblock> <p>using exactly the same bytecode. The only difference is that |
|
74 table demo39a lets you insert multiple rows with the same ID whereas table |
|
75 demo39b will raise an exception if you try to insert a new row with the same |
|
76 ID as an existing row. </p> <p>If you use the UNIQUE keyword in the CREATE |
|
77 INDEX statement of demo39a, like this: </p> <codeblock id="GUID-0EE5E186-CC4A-5CC3-AEAE-F1482F1F8F9A" xml:space="preserve"> |
|
78 CREATE UNIQUE INDEX demo39_idx1 ON demo39a(id); |
|
79 </codeblock> <p>Then both table designs really would be exactly the same in |
|
80 every way. In fact, whenever SQLite sees the UNIQUE keyword on a column type |
|
81 declaration, all it does is create an automatic unique index on that column. </p> <p>The |
|
82 PRIMARY KEY modifier on a column type declaration works like UNIQUE; it causes |
|
83 a unique index to be created automatically. The main difference is that you |
|
84 are only allowed to have a single PRIMARY KEY. This restriction of only allowing |
|
85 a single PRIMARY KEY is part of the official SQL language definition. </p> <p>The |
|
86 idea is that a PRIMARY KEY is used to order the rows on disk. Some SQL database |
|
87 engines actually implement PRIMARY KEYs this way. But with SQLite, a PRIMARY |
|
88 KEY is like any other UNIQUE column, with only one exception: INTEGER PRIMARY |
|
89 KEY is a special case which is handled differently, as described in the next |
|
90 section. </p> </section> |
|
91 <section id="GUID-BF7A0301-8490-58ED-BB37-FAC403A84230"><title>Use Multi-Column |
|
92 Indexes</title> <p>SQLite is able to make use of multi-column indexes. The |
|
93 rule is that if an index is over columns <i>X</i> <i> 0 </i>, <i>X</i> <i> 1 </i>, <i>X</i> <i> 2 </i>, |
|
94 ..., <i>X</i> <i> n </i> of some table, then the index can be used if the |
|
95 WHERE clause contains equality constraints for some prefix of those columns <i>X</i> <i>0 </i>, <i>X</i> <i>1 </i>, <i>X</i> <i>2 </i>, |
|
96 ..., <i>X</i> <i>i </i> where <i>i</i> is less than <i>n</i>. </p> <p>As |
|
97 an example, suppose you have a table and index declared as follows: </p> <codeblock id="GUID-C18C97F7-23CA-5636-9F00-130A8FB3DEF5" xml:space="preserve"> |
|
98 CREATE TABLE demo314(a,b,c,d,e,f,g); |
|
99 CREATE INDEX demo314_idx ON demo314(a,b,c,d,e,f); |
|
100 </codeblock> <p>Then the index might be used to help with a query that contained |
|
101 a WHERE clause like this: </p> <codeblock id="GUID-8A0944F4-1ACF-5267-B49F-EB83EFBB5670" xml:space="preserve"> |
|
102 ... WHERE a=1 AND b='Smith' AND c=1 |
|
103 </codeblock> <p>All three terms of the WHERE clause would be used together |
|
104 with the index in order to narrow the search. But the index could not be used |
|
105 if there WHERE clause said: </p> <codeblock id="GUID-B5F1C17F-0F5E-5FC2-A9A4-DF19D699A076" xml:space="preserve"> |
|
106 ... WHERE b='Smith' AND c=1 |
|
107 </codeblock> <p>The second WHERE clause does not contain equality terms for |
|
108 a prefix of the columns in the index because it omits a term for the “a” column. </p> <p>In |
|
109 a case like this: </p> <codeblock id="GUID-EF2CFE7D-0456-5414-847D-BADCC057CFD8" xml:space="preserve"> |
|
110 ... WHERE a=1 AND c=1 |
|
111 </codeblock> <p>Only the “a=1” term in the WHERE clause could be used to help |
|
112 narrow the search. The “c=1” term is not part of the prefix of terms in the |
|
113 index which have equality constraints because there is no equality constraint |
|
114 on the “b” column. </p> <p>SQLite only allows a single index to be used per |
|
115 table within a simple SQL statement. For UPDATE and DELETE statements, this |
|
116 means that only a single index can ever be used, since those statements can |
|
117 only operate on a single table at a time. </p> <p>In a simple SELECT statement |
|
118 multiple indexes can be used if the SELECT statement is a join – one index |
|
119 per table in the join. In a compound SELECT statement (two or more SELECT |
|
120 statements connected by UNION or INTERSECT or EXCEPT) each SELECT statement |
|
121 is treated separately and can have its own indexes. Likewise, SELECT statements |
|
122 that appear in subexpressions are treated separately. </p> <p>Some other SQL |
|
123 database engines (for example PostgreSQL) allow multiple indexes to be used |
|
124 for each table in a SELECT. For example, if you had a table and index in PostgreSQL |
|
125 like this: </p> <codeblock id="GUID-F5DE8F24-7471-5992-9896-295CE173D855" xml:space="preserve"> |
|
126 CREATE TABLE pg1(a INT, b INT, c INT, d INT); |
|
127 CREATE INDEX pg1_ix1 ON pg1(a); |
|
128 CREATE INDEX pg1_ix2 ON pg1(b); |
|
129 CREATE INDEX pg1_ix3 ON pg1(c); |
|
130 </codeblock> <p>And if you were to run a query like the following: </p> <codeblock id="GUID-A35663B7-4E7D-5CC0-BF5E-CF3A4CFED63F" xml:space="preserve"> |
|
131 SELECT d FROM pg1 WHERE a=5 AND b=11 AND c=99; |
|
132 </codeblock> <p>Then PostgreSQL might attempt to optimize the query by using |
|
133 all three indexes, one for each term of the WHERE clause. </p> <p>SQLite does |
|
134 not work this way. SQLite is compelled to select a single index to use in |
|
135 the query. It might select any of the three indexes shown, depending on which |
|
136 one the optimizer things will give the best speedup. But in every case it |
|
137 will only select a single index and only a single term of the WHERE clause |
|
138 will be used. </p> <p>SQLite prefers to use a multi-column index such as this: </p> <codeblock id="GUID-40FB7075-1239-5089-BBC5-0D994F4A0C39" xml:space="preserve"> |
|
139 CREATE INDEX pg1_ix_all ON pg1(a,b,c); |
|
140 </codeblock> <p>If the pg1_ix_all index is available for use when the SELECT |
|
141 statement above is prepared, SQLite will likely choose it over any of the |
|
142 single-column indexes because the multi-column index is able to make use of |
|
143 all 3 terms of the WHERE clause. </p> <p>You can trick SQLite into using multiple |
|
144 indexes on the same table by rewriting the query. Instead of the SELECT statement |
|
145 shown above, if you rewrite it as this: </p> <codeblock id="GUID-D7DE75D4-BB01-50DF-A9DC-956A83DED5D0" xml:space="preserve"> |
|
146 SELECT d FROM pg1 WHERE RowID IN ( |
|
147 SELECT RowID FROM pg1 WHERE a=5 |
|
148 INTERSECT |
|
149 SELECT RowID FROM pg1 WHERE b=11 |
|
150 INTERSECT |
|
151 SELECT RowID FROM pg1 WHERE c=99 |
|
152 ) |
|
153 </codeblock> <p>Then each of the individual SELECT statements will using a |
|
154 different single-column index and their results will be combined by the outer |
|
155 SELECT statement to give the correct result. The other SQL database engines |
|
156 like PostgreSQL that are able to make use of multiple indexes per table do |
|
157 so by treating the simpler SELECT statement shown first as if they where the |
|
158 more complicated SELECT statement shown here. </p> </section> |
|
159 <section id="GUID-E90057A8-70B6-590C-B8AE-616DA25BB543"><title>Use Inequality |
|
160 Constraints on the Last Index Term</title> <p>Terms in the WHERE clause of |
|
161 a query or UPDATE or DELETE statement are mostly likely to trigger the use |
|
162 of an index if they are an equality constraint – in other words if the term |
|
163 consists of the name of an indexed column, an equal sign (“=”), and an expression. </p> <p>So, |
|
164 for example, if you have a table and index that look like this: </p> <codeblock id="GUID-84AADB9D-5853-57C2-B489-87DC7FB7AADE" xml:space="preserve"> |
|
165 CREATE TABLE demo315(a,b,c,d); |
|
166 CREATE INDEX demo315_idx1 ON demo315(a,b,c); |
|
167 </codeblock> <p>And a query like this: </p> <codeblock id="GUID-A2B7DA9F-DB82-5D06-80E2-7AF714E403D5" xml:space="preserve"> |
|
168 SELECT d FROM demo315 WHERE a=512; |
|
169 </codeblock> <p>The single “a=512” term of the WHERE clause qualifies as an |
|
170 equality constraint and is likely to provoke the use of the demo315_idx1 index. </p> <p>SQLite |
|
171 supports two other kinds of equality constraints. One is the IN operator: </p> <codeblock id="GUID-EA5D7637-A6B8-5BC0-A72E-D576B0F945A3" xml:space="preserve"> |
|
172 SELECT d FROM demo315 WHERE a IN (512,1024); |
|
173 SELECT d FROM demo315 WHERE a IN (SELECT x FROM someothertable); |
|
174 </codeblock> <p>There other is the IS NULL constraint: </p> <codeblock id="GUID-B2C1C84B-C33D-55C5-8484-24B28EFC8E37" xml:space="preserve"> |
|
175 SELECT d FROM demo315 WHERE a IS NULL; |
|
176 </codeblock> <p>SQLite allows at most one term of an index to be constrained |
|
177 by an inequality such as less than “<”, greater than “>”, less than or |
|
178 equal to “<=”, or greater than or equal to “>=”. </p> <p>The column that |
|
179 the inequality constrains will be the right-most term of the index that is |
|
180 used. So, for example, in this query: </p> <codeblock id="GUID-563231B5-EC3A-57C2-BC6F-1A8129ADE308" xml:space="preserve"> |
|
181 SELECT d FROM demo315 WHERE a=5 AND b>11 AND c=1; |
|
182 </codeblock> <p>Only the first two terms of the WHERE clause will be used |
|
183 with the demo315_idx1 index. The third term, the “c=1” constraint, cannot |
|
184 be used because the “c” column occurs to the right of the “b” column in the |
|
185 index and the “b” column is constrained by an inequality. </p> <p>SQLite allows |
|
186 up to two inequalities on the same column as long as the two inequalities |
|
187 provide an upper and lower bound on the column. For example, in this query: </p> <codeblock id="GUID-4EB94886-EDFF-58F2-8692-011A67AC5A60" xml:space="preserve"> |
|
188 SELECT d FROM demo315 WHERE a=5 AND b>11 AND b<23; |
|
189 </codeblock> <p>All three terms of the WHERE clause will be used because the |
|
190 two inequalities on the “b” column provide an upper and lower bound on the |
|
191 value of “b”. </p> <p>SQLite will only use the four inequalities mentioned |
|
192 above to help constrain a search: “<”, “>”, “<=”, and “>=”. Other inequality |
|
193 operators such as not equal to (“!=” or “<>”) and NOT NULL are not helpful |
|
194 to the query optimizer and will never be used to control an index and help |
|
195 make the query run faster. </p> </section> |
|
196 <section id="GUID-CAD0C181-37E7-578A-A7E1-7843447C247F"><title>Use Indexes |
|
197 To Help ORDER BY Clauses Evaluate Faster</title> <p>The default method for |
|
198 evaluating an ORDER BY clause in a SELECT statement is to first evaluate the |
|
199 SELECT statement and store the results in a temporary tables, then sort the |
|
200 temporary table according to the ORDER BY clause and scan the sorted temporary |
|
201 table to generate the final output. </p> <p>This method always works, but |
|
202 it requires three passes over the data (one pass to generate the result set, |
|
203 a second pass to sort the result set, and a third pass to output the results) |
|
204 and it requires a temporary storage space sufficiently large to contain the |
|
205 entire results set. </p> <p>Where possible, SQLite will avoid storing and |
|
206 sorting the result set by using an index that causes the results to emerge |
|
207 from the query in sorted order in the first place. </p> <p>The way to get |
|
208 SQLite to use an index for sorting is to provide an index that covers the |
|
209 same columns specified in the ORDER BY clause. For example, if the table and |
|
210 index are like this: </p> <codeblock id="GUID-F0103033-C5C8-5177-8AD7-70BCC45C33C9" xml:space="preserve"> |
|
211 CREATE TABLE demo316(a,b,c,data); |
|
212 CREATE INDEX idx316 ON demo316(a,b,c); |
|
213 </codeblock> <p>And you do a query like this: </p> <codeblock id="GUID-D67BB6FF-E213-5B86-A2C1-E1992DA96A62" xml:space="preserve"> |
|
214 SELECT data FROM demo316 ORDER BY a,b,c; |
|
215 </codeblock> <p>SQLite will use the idx316 index to implement the ORDER BY |
|
216 clause, obviating the need for temporary storage space and a separate sorting |
|
217 pass. </p> <p>An index can be used to satisfy the search constraints of a |
|
218 WHERE clause and to impose the ORDER BY ordering of outputs all at once. The |
|
219 trick is for the ORDER BY clause terms to occur immediately after the WHERE |
|
220 clause terms in the index. For example, one can write: </p> <codeblock id="GUID-02063968-34B5-5766-9D02-86D696D39C1E" xml:space="preserve"> |
|
221 SELECT data FROM demo316 WHERE a=5 ORDER BY b,c; |
|
222 </codeblock> <p>The “a” column is used in the WHERE clause and the immediately |
|
223 following terms of the index, “b” and “c” are used in the ORDER BY clause. |
|
224 So in this case the idx316 index would be used both to speed up the search |
|
225 and to satisfy the ORDER BY clause. </p> <p>This query also uses the idx316 |
|
226 index because, once again, the ORDER BY clause term “c” immediate follows |
|
227 the WHERE clause terms “a” and “b” in the index: </p> <codeblock id="GUID-6760EC7E-E86A-5EBD-BDDD-32A68BE78A9E" xml:space="preserve"> |
|
228 SELECT data FROM demo316 WHERE a=5 AND b=17 ORDER BY c; |
|
229 </codeblock> <p>But now consider this: </p> <codeblock id="GUID-9363996C-8C30-5E04-B05F-392C8262F1F6" xml:space="preserve"> |
|
230 SELECT data FROM demo316 WHERE a=5 ORDER BY c; |
|
231 </codeblock> <p>Here there is a gap between the ORDER BY term “c” and the |
|
232 WHERE clause term “a”. So the idx316 index cannot be used to satisfy both |
|
233 the WHERE clause and the ORDER BY clause. The index will be used on the WHERE |
|
234 clause and a separate sorting pass will occur to put the results in the correct |
|
235 order. </p> </section> |
|
236 <section id="GUID-109AF0DA-A054-504A-A432-76BD145B2AC4"><title>Add Result |
|
237 Columns To The End Of Indexes</title> <p>Queries will sometimes run faster |
|
238 if their result columns appear in the right-most entries of an index. Consider |
|
239 the following example: </p> <codeblock id="GUID-63292052-B523-5671-B3EE-E10A66C7275F" xml:space="preserve"> |
|
240 CREATE TABLE demo317(a,b,c,data); |
|
241 CREATE INDEX idx317 ON demo316(a,b,c); |
|
242 </codeblock> <p>A query where all result column terms appears in the index, |
|
243 such as </p> <codeblock id="GUID-41F740E7-EAFC-583B-BFE6-E63DBEA354D7" xml:space="preserve"> |
|
244 SELECT c FROM demo317 WHERE a=5 ORDER BY b; |
|
245 </codeblock> <p>will typically run about twice as fast or faster than a query |
|
246 that uses columns that are not in the index, e.g. </p> <codeblock id="GUID-098752F4-304A-5A84-834E-240D97D97C2D" xml:space="preserve"> |
|
247 SELECT data FROM demo317 WHERE a=5 ORDER BY b; |
|
248 </codeblock> <p>The reason for this is that when all information is contained |
|
249 within the index entry only a single search has to be made for each row of |
|
250 output. But when some of the information is in the index and other parts are |
|
251 in the table, first there must be a search for the appropriate index entry |
|
252 then a separate search is made for the appropriate table row based on the |
|
253 RowID found in the index entry. Twice as much searching has to be done for |
|
254 each row of output generated. </p> <p>The extra query speed does not come |
|
255 for free, however. Adding additional columns to an index makes the database |
|
256 file larger. So when developing an application, the programmer will need to |
|
257 make a space versus time trade-off to determine whether the extra columns |
|
258 should be added to the index or not. </p> <p>Note that if any column of the |
|
259 result must be obtained from the original table, then the table row will have |
|
260 to be searched for anyhow. There will be no speed advantage, so you might |
|
261 as well omit the extra columns from the end of the index and save on storage |
|
262 space. The speed-up described in this section can only be realized when every |
|
263 column in a table is obtainable from the index. </p> <p>Taking into account |
|
264 the results of the previous few sections, the best set of columns to put in |
|
265 an index can be described as follows: </p> <ul> |
|
266 <li id="GUID-EBF4DEFB-2F5F-5D78-92FA-06FEAB0C3650"><p>The first columns in |
|
267 the index should be columns that have equality constraints in the WHERE clause |
|
268 of the query. </p> </li> |
|
269 <li id="GUID-E5CB725C-6304-5946-9E18-E69B5F1A6A88"><p>The second group of |
|
270 columns should match the columns specified in the ORDER BY clause. </p> </li> |
|
271 <li id="GUID-FBC00251-C3AD-5AC0-9102-EF66EA37DE4E"><p>Add additional columns |
|
272 to the end of the index that are used in the result set of the query. </p> </li> |
|
273 </ul> </section> |
|
274 <section id="GUID-D7B5B389-E031-5512-8186-235B22F0D9C1"><title>Resolve Indexing |
|
275 Ambiguities Using the Unary “+” Operator</title> <p>The SQLite query optimizer |
|
276 usually does a good job of choosing the best index to use for a particular |
|
277 query, especially if ANALYZE has been run to provide it with index performance |
|
278 statistics. But occasions do arise where it is useful to give the optimizer |
|
279 hints. </p> <p>One of the easiest ways to control the operation of the optimizer |
|
280 is to disqualify terms in the WHERE clause or ORDER BY clause as candidates |
|
281 for optimization by using the unary “+” operator. </p> <p>In SQLite, a unary |
|
282 “+” operator is a no-op. It makes no change to its operand, even if the operand |
|
283 is something other than a number. So you can always prefix a “+” to an expression |
|
284 in without changing the meaning of the expression. As the optimizer will only |
|
285 use terms in WHERE, HAVING, or ON clauses that have an index column name on |
|
286 one side of a comparison operator, you can prevent such a term from being |
|
287 used by the optimizer by prefixing the column name with a “+”. </p> <p>For |
|
288 example, suppose you have a database with a schema like this: </p> <codeblock id="GUID-E7747EFD-FE58-5EA4-88B3-097C0A303F52" xml:space="preserve"> |
|
289 CREATE TABLE demo321(a,b,c,data); |
|
290 CREATE INDEX idx321a ON demo321(a); |
|
291 CREATE INDEX idx321b ON demo321(b); |
|
292 </codeblock> <p>If you issue a query such as this: </p> <codeblock id="GUID-87BD59FC-33A8-598B-B91F-607B26F7349D" xml:space="preserve"> |
|
293 SELECT data FROM demo321 WHERE a=5 AND b=11; |
|
294 </codeblock> <p>The query optimizer might use the “a=5” term with idx321a |
|
295 or it might use the “b=11” term with the idx321b index. But if you want to |
|
296 force the use of the idx321a index you can accomplish that by disqualifying |
|
297 the second term of the WHERE clause as a candidate for optimization using |
|
298 a unary “+” like this: </p> <codeblock id="GUID-E6EAB459-726A-5FE4-8065-6C46AC2C5B5C" xml:space="preserve"> |
|
299 SELECT data FROM demo321 WHERE a=5 AND +b=11; |
|
300 </codeblock> <p>The “+” in front of the “b=11” turns the left-hand side of |
|
301 the equals comparison operator into an expression instead of an indexed column |
|
302 name. The optimizer will then not recognize that the second term can be used |
|
303 with an index and so the optimizer is compelled to use the first “a=5” term. </p> <p>The |
|
304 unary “+” operator can also be used to disable ORDER BY clause optimizations. |
|
305 Consider this query: </p> <codeblock id="GUID-0488D466-77B7-50E0-AB85-FF033A2D75DC" xml:space="preserve"> |
|
306 SELECT data FROM demo321 WHERE a=5 ORDER BY b; |
|
307 </codeblock> <p>The optimizer has the choice of using the “a=5” term of the |
|
308 WHERE clause with idx321a to restrict the search. Or it might choose to use |
|
309 do a full table scan with idx321b to satisfy the ORDER BY clause and thus |
|
310 avoid a separate sorting pass. You can force one choice or the other using |
|
311 a unary “+”. </p> <p>To force the use of idx321a on the WHERE clause, add |
|
312 the unary “+” in from of the “b” in the ORDER BY clause: </p> <codeblock id="GUID-E55A085F-D91F-58E0-B964-317BB3A9D7ED" xml:space="preserve"> |
|
313 SELECT data FROM demo321 WHERE a=5 ORDER BY +b; |
|
314 </codeblock> <p>To go the other way and force the idx321b index to be used |
|
315 to satisfy the ORDER BY clause, disqualify the WHERE term by prefixing with |
|
316 a unary “+”: </p> <codeblock id="GUID-D97EF52A-1F74-57EB-AC11-7911B4E088B3" xml:space="preserve"> |
|
317 SELECT data FROM demo321 WHERE +a=5 ORDER BY b; |
|
318 </codeblock> <p>The reader is cautioned not to overuse the unary “+” operator. |
|
319 The SQLite query optimizer usually picks the best index without any outside |
|
320 help. Premature use of unary “+” can confuse the optimizer and cause less |
|
321 than optimal performance. But in some cases it is useful to be able override |
|
322 the decisions of the optimizer, and the unary “+” operator is an excellent |
|
323 way to do this when it becomes necessary. </p> </section> |
|
324 <section id="GUID-7BEBC49C-0528-5D58-9626-2A92F3D0D9E8"><title>Avoid Indexing |
|
325 Large BLOBs and CLOBs</title> <p>SQLite stores indexes as b-trees. Each b-tree |
|
326 node uses one page of the database file. In order to maintain an acceptable |
|
327 fan-out, the b-tree module within SQLite requires that at least 4 entries |
|
328 must fit on each page of a b-tree. There is also some overhead associated |
|
329 with each b-tree page. So at the most there is about 250 bytes of space available |
|
330 on the main b-tree page for each index entry. </p> <p>If an index entry exceeds |
|
331 this allotment of approximately 250 bytes excess bytes are spilled to overflow |
|
332 pages. There is no arbitrary limit on the number of overflow pages or on the |
|
333 length of a b-tree entry, but for maximum efficiency it is best to avoid overflow |
|
334 pages, especially in indexes. This means that you should strive to keep the |
|
335 number of bytes in each index entry below 250. </p> <p>If you keep the size |
|
336 of indexes significantly smaller than 250 bytes, then the b-tree fan-out is |
|
337 increased and the binary search algorithm used to search for entries in an |
|
338 index has fewer pages to examine and therefore runs faster. So the fewer bytes |
|
339 used in each index entry the better, at least from a performance perspective. </p> <p>For |
|
340 these reasons, it is recommended that you avoid indexing large BLOBs and CLOBs. |
|
341 SQLite will continue to work when large BLOBs and CLOBs are indexed, but there |
|
342 will be a performance impact. </p> <p>On the other hand, if you need to lookup |
|
343 entries using a large BLOB or CLOB as the key, then by all means use an index. |
|
344 An index on a large BLOB or CLOB is not as fast as an index using more compact |
|
345 data types such as integers, but it is still many order of magnitude faster |
|
346 than doing a full table scan. So to be more precise, the advice of this section |
|
347 is that you should design your applications so that you do not need to lookup |
|
348 entries using a large BLOB or CLOB as the key. Try to arrange to have compact |
|
349 keys consisting of short strings or integers. </p> <p>Note that many other |
|
350 SQL database engines disallow the indexing of BLOBs and CLOBs in the first |
|
351 place. You simple cannot do it. SQLite is more flexible that most in that |
|
352 it does allow BLOBs and CLOBs to be indexed and it will use those indexes |
|
353 when appropriate. But for maximum performance, it is best to use smaller search |
|
354 keys. </p> </section> |
|
355 <section id="GUID-DD40F29F-DF93-536E-9B52-F9B9FF45155D"><title>Avoid Excess |
|
356 Indexes</title> <p>Some developers approach SQL-based application development |
|
357 with the attitude that indexes never hurt and that the more indexes you have, |
|
358 the faster your application will run. This is definitely not the case. There |
|
359 is a costs associated with each new index you create: </p> <ul> |
|
360 <li id="GUID-FD257BF7-F938-54B5-AC03-9536712D6281"><p>Each new index takes |
|
361 up additional space in the database file. The more indexes you have, the larger |
|
362 your database files will become for the same amount of data. </p> </li> |
|
363 <li id="GUID-E1B74FB6-246A-5148-AF06-04E1B4B949F1"><p>Every INSERT and UPDATE |
|
364 statement modifies both the original table and all indexes on that table. |
|
365 So the performance of INSERT and UPDATE decreases linearly with the number |
|
366 of indexes. </p> </li> |
|
367 <li id="GUID-56AAE2D1-71D6-5A23-8190-B0C80B204DED"><p>Compiling new SQL statements |
|
368 using <codeph>Prepare()</codeph> takes longer when there are more indexes |
|
369 for the optimizer to choose between. </p> </li> |
|
370 <li id="GUID-24B7F7D8-FAA9-5C78-B3C7-B886FA774C0B"><p>Surplus indexes give |
|
371 the optimizer more opportunities to make a bad choice. </p> </li> |
|
372 </ul> <p>Your policy on indexes should be to avoid them wherever you can. |
|
373 Indexes are powerful medicine and can work wonders to improve the performance |
|
374 of a program. But just as too many drugs can be worse than none at all, so |
|
375 also can too many indexes cause more harm than good. </p> <p>When building |
|
376 a new application, a good approach is to omit all explicitly declared indexes |
|
377 in the beginning and only add indexes as needed to address specific performance |
|
378 problems. </p> <p>Take care to avoid redundant indexes. For example, consider |
|
379 this schema: </p> <codeblock id="GUID-89F20101-1628-5783-82B0-2ABE84078C7D" xml:space="preserve"> |
|
380 CREATE TABLE demo323a(a,b,c); |
|
381 CREATE INDEX idx323a1 ON demo323(a); |
|
382 CREATE INDEX idx323a2 ON demo323(a,b); |
|
383 </codeblock> <p>The idx323a1 index is redundant and can be eliminated. Anything |
|
384 that the idx323a1 index can do the idx323a2 index can do better. </p> <p>Other |
|
385 redundancies are not quite as apparent as the above. Recall that any column |
|
386 or columns that are declared UNIQUE or PRIMARY KEY (except for the special |
|
387 case of INTEGER PRIMARY KEY) are automatically indexed. So in the following |
|
388 schema: </p> <codeblock id="GUID-2FE7B726-4027-518C-9217-B4BD1ECDA991" xml:space="preserve"> |
|
389 CREATE TABLE demo323b(x TEXT PRIMARY KEY, y INTEGER UNIQUE); |
|
390 CREATE INDEX idx323b1 ON demo323b(x); |
|
391 CREATE INDEX idx323b2 ON demo323b(y); |
|
392 </codeblock> <p>Both indexes are redundant and can be eliminated with no loss |
|
393 in query performance. Occasionally one sees a novice SQL programmer use both |
|
394 UNIQUE and PRIMARY KEY on the same column: </p> <codeblock id="GUID-CDE12649-BDB4-58D4-8981-02628BDF5C79" xml:space="preserve"> |
|
395 CREATE TABLE demo323c(p TEXT UNIQUE PRIMARY KEY, q); |
|
396 </codeblock> <p>This has the effect of creating two indexes on the “p” column |
|
397 – one for the UNIQUE keywords and another for the PRIMARY KEY keyword. Both |
|
398 indexes are identical so clearly one can be omitted. A PRIMARY KEY is guaranteed |
|
399 to always be unique so the UNIQUE keyword can be removed from the demo323c |
|
400 table definition with no ambiguity or loss of functionality. </p> <p>It is |
|
401 not a fatal error to create too many indexes or redundant indexes. SQLite |
|
402 will continue to generate the correct answers but it may take longer to produce |
|
403 those answers and the resulting database files might be a little larger. So |
|
404 for best results, keep the number of indexes to a minimum. </p> </section> |
|
405 <section id="GUID-9337E315-BB5A-56D0-8319-6C398D26151F"><title>Avoid Tables |
|
406 and Indexes with an Excessive Number of Columns</title> <p>SQLite places no |
|
407 arbitrary limits on the number of columns in a table or index. There are known |
|
408 commercial applications using SQLite that construct tables with tens of thousands |
|
409 of columns each. And these applications actually work. </p> <p>However the |
|
410 database engine is optimized for the common case of tables with no more than |
|
411 a few dozen columns. For best performance you should try to stay in the optimized |
|
412 region. Furthermore, we note that relational databases with a large number |
|
413 of columns are usually not well normalized. So even apart from performance |
|
414 considerations, if you find your design has tables with more than a dozen |
|
415 or so columns, you really need to rethink how you are building your application. </p> <p>There |
|
416 are a number of places in <codeph>Prepare()</codeph> that run in time O(N<sup>2</sup>) |
|
417 where N is the number of columns in the table. The constant of proportionality |
|
418 is small in these cases so you should not have any problems for N of less |
|
419 than one hundred but for N on the order of a thousand, the time to run <codeph>Prepare()</codeph> can |
|
420 start to become noticeable. </p> <p>When the bytecode is running and it needs |
|
421 to access the i-th column of a table, the values of the previous i-1 columns |
|
422 must be accessed first. So if you have a large number of columns, accessing |
|
423 the last column can be an expensive operation. This fact also argues for putting |
|
424 smaller and more frequently accessed columns early in the table. </p> <p>There |
|
425 are certain optimizations that will only work if the table has 30 or fewer |
|
426 columns. The optimization that extracts all necessary information from an |
|
427 index and never refers to the underlying table works this way. So in some |
|
428 cases, keeping the number of columns in a table at or below 30 can result |
|
429 in a 2-fold speed improvement. </p> <p>Indexes will only be used if they contain |
|
430 30 or fewer columns. You can put as many columns in an index as you want, |
|
431 but if the number is greater than 30, the index will never improve performance |
|
432 and will never do anything but take up space in your database file. </p> </section> |
|
433 </conbody><related-links> |
|
434 <link href="GUID-22844C28-AB5B-5A6F-8863-7269464684B4.dita"><linktext>SQL Overview</linktext> |
|
435 </link> |
|
436 <link href="GUID-78773BCA-ADF6-53E6-AC80-5CB2AE1F8BCC.dita"><linktext>SQL Server |
|
437 Guide</linktext></link> |
|
438 <link href="GUID-E51836E1-D33E-506C-B75B-19B8E3CC313A.dita"><linktext>SQLite</linktext> |
|
439 </link> |
|
440 <link href="GUID-1F12E3F5-45B2-55EC-B021-00338277C608.dita"><linktext>SQL DB Overview</linktext> |
|
441 </link> |
|
442 <link href="GUID-43CA02E7-0101-5824-B91B-E15EE20C829A.dita"><linktext>Avoid Transient |
|
443 Tables</linktext></link> |
|
444 <link href="GUID-49A3419F-D20A-5C5D-B2FF-51724EF37704.dita"><linktext>Prevent Datafile |
|
445 Corruption</linktext></link> |
|
446 <link><linktext/></link> |
|
447 <link href="GUID-B994E6F7-228A-5433-B87F-91857C5D93D6.dita"><linktext>SQL Insertion |
|
448 Tips</linktext></link> |
|
449 <link href="GUID-4FC23DB7-4758-5DA4-81FF-0DAB169E2757.dita"><linktext>SQL Schema |
|
450 Tips</linktext></link> |
|
451 <link href="GUID-2A2920E0-5D40-5358-BC0C-8572CEFE078C.dita"><linktext>SQL Expressions</linktext> |
|
452 </link> |
|
453 <link href="GUID-126FCCCC-0E7D-59AE-959A-2F94A7319C4B.dita"><linktext>SQL Statement |
|
454 Tips</linktext></link> |
|
455 <link href="GUID-ACCCB148-DAF9-59EC-B585-8EF632B9BF04.dita"><linktext>SQL Joins</linktext> |
|
456 </link> |
|
457 <link href="GUID-B7E978C1-45CA-554C-8028-D901B97BA2E0.dita"><linktext> ANALYZE |
|
458 Command</linktext></link> |
|
459 <link href="GUID-AF5A75D7-0687-546C-87B2-0B7DF7D33217.dita"><linktext> SQL WHERE |
|
460 CLause Tips</linktext></link> |
|
461 </related-links></concept> |