Symbian3/PDK/Source/GUID-ACCCB148-DAF9-59EC-B585-8EF632B9BF04.dita
changeset 5 f345bda72bc4
parent 3 46218c8b8afa
child 14 578be2adaf3e
equal deleted inserted replaced
4:4816d766a08a 5:f345bda72bc4
     7     Nokia Corporation - initial contribution.
     7     Nokia Corporation - initial contribution.
     8 Contributors: 
     8 Contributors: 
     9 -->
     9 -->
    10 <!DOCTYPE concept
    10 <!DOCTYPE concept
    11   PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
    11   PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
    12 <concept xml:lang="en" id="GUID-ACCCB148-DAF9-59EC-B585-8EF632B9BF04"><title>SQL Joins</title><prolog><metadata><keywords/></metadata></prolog><conbody><p>This guide explains how to use CROSS JOIN phrases to override the optimizer's ordering of tables. </p> <section><title>Introduction</title> <p>SQLite uses the “CROSS JOIN” phrase as a means to override the table reordering decisions of the query optimizer. The CROSS JOIN connector is rarely needed and should probably never be used prior to the final performance tuning phase of application development. Even then, SQLite usually gets the order of tables in a join right without any extra help. But on those rare occasions when SQLite gets it wrong, the CROSS JOIN connector is an invaluable way of tweaking the optimizer to do what you want. </p> <p><b>Intended audience:</b> </p> <p>This document is intended to be used by Symbian OS licensees and third party application developers. </p> </section> <section id="GUID-8B5F579E-B94D-527C-9530-367B5B0729C9"><title>Use CROSS JOIN to Force a Particular Join Ordering</title> <p>The SQLite query optimizer will attempt to reorder the tables in a join in order to find the most efficient way to evaluate the join. The optimizer usually does this job well, but occasionally it will make a bad choice. When that happens, it might be necessary to override the optimizer's choice by explicitly specifying the order of tables in the SELECT statement. </p> <p>To illustrate the problem, consider the following schema: </p> <codeblock id="GUID-8B2D4E3C-7231-5D63-B8CD-CD687750CD1E" xml:space="preserve">
    12 <concept id="GUID-ACCCB148-DAF9-59EC-B585-8EF632B9BF04" xml:lang="en"><title>SQL Joins</title><shortdesc>This guide explains how to use CROSS JOIN phrases to override the
       
    13 optimizer's ordering of tables.</shortdesc><prolog><metadata><keywords/></metadata></prolog><conbody>
       
    14 <section id="GUID-C6BDCE52-468D-459F-980C-A9041B56BDF1"><title>Introduction</title> <p>SQLite uses the “CROSS JOIN” phrase
       
    15 as a means to override the table reordering decisions of the query optimizer.
       
    16 The CROSS JOIN connector is rarely needed and should probably never be used
       
    17 prior to the final performance tuning phase of application development. Even
       
    18 then, SQLite usually gets the order of tables in a join right without any
       
    19 extra help. But on those rare occasions when SQLite gets it wrong, the CROSS
       
    20 JOIN connector is an invaluable way of tweaking the optimizer to do what you
       
    21 want. </p> <p><b>Intended
       
    22 audience:</b> </p> <p>This document is intended to be used by Symbian platform
       
    23 licensees and third party application developers. </p> </section>
       
    24 <section id="GUID-8B5F579E-B94D-527C-9530-367B5B0729C9"><title>Use CROSS JOIN
       
    25 to Force a Particular Join Ordering</title> <p>The SQLite query optimizer
       
    26 will attempt to reorder the tables in a join in order to find the most efficient
       
    27 way to evaluate the join. The optimizer usually does this job well, but occasionally
       
    28 it will make a bad choice. When that happens, it might be necessary to override
       
    29 the optimizer's choice by explicitly specifying the order of tables in the
       
    30 SELECT statement. </p> <p>To illustrate the problem, consider the following
       
    31 schema: </p> <codeblock id="GUID-8B2D4E3C-7231-5D63-B8CD-CD687750CD1E" xml:space="preserve">
    13 CREATE TABLE node(
    32 CREATE TABLE node(
    14     id INTEGER PRIMARY KEY,
    33     id INTEGER PRIMARY KEY,
    15     name TEXT
    34     name TEXT
    16 );
    35 );
    17 
    36 
    22     dest INTEGER REFERENCES node,
    41     dest INTEGER REFERENCES node,
    23     PRIMARY KEY(orig, dest)
    42     PRIMARY KEY(orig, dest)
    24 );
    43 );
    25 
    44 
    26 CREATE INDEX edge_idx ON edge(dest,orig);
    45 CREATE INDEX edge_idx ON edge(dest,orig);
    27 </codeblock> <p>This schema defines a directed graph with the ability to store a name on each node of the graph. Similar designs (though usually more complicated) arise frequently in application development. Now consider a three-way join against the above schema: </p> <codeblock id="GUID-61383F86-E84F-58F4-B2B1-81CD25635B88" xml:space="preserve">
    46 </codeblock> <p>This schema defines a directed graph with the ability to store
       
    47 a name on each node of the graph. Similar designs (though usually more complicated)
       
    48 arise frequently in application development. Now consider a three-way join
       
    49 against the above schema: </p> <codeblock id="GUID-61383F86-E84F-58F4-B2B1-81CD25635B88" xml:space="preserve">
    28 SELECT e.*
    50 SELECT e.*
    29 FROM edge AS e,
    51 FROM edge AS e,
    30  node AS n1,
    52  node AS n1,
    31  node AS n2
    53  node AS n2
    32 WHERE n1.name = 'alice'
    54 WHERE n1.name = 'alice'
    33  AND n2.name = 'bob'
    55  AND n2.name = 'bob'
    34  AND e.orig = n1.id
    56  AND e.orig = n1.id
    35  AND e.dest = n2.id;
    57  AND e.dest = n2.id;
    36 </codeblock> <p>This query asks for information about all edges that go from nodes labelled “alice” over to nodes labelled “bob”. </p> <p>There are many ways that the optimizer might choose to implement this query, but they all boil down to two basic designs. The first option looks for edges between all pairs of nodes. The following pseudocode illustrates: </p> <codeblock id="GUID-19A9F537-D791-5894-9FA9-E205B3768CF7" xml:space="preserve">
    58 </codeblock> <p>This query asks for information about all edges that go from
       
    59 nodes labelled “alice” over to nodes labelled “bob”. </p> <p>There are many
       
    60 ways that the optimizer might choose to implement this query, but they all
       
    61 boil down to two basic designs. The first option looks for edges between all
       
    62 pairs of nodes. The following pseudocode illustrates: </p> <codeblock id="GUID-19A9F537-D791-5894-9FA9-E205B3768CF7" xml:space="preserve">
    37 foreach n1 where n1.name='alice' do:
    63 foreach n1 where n1.name='alice' do:
    38     foreach n2 where n2.name='bob' do:
    64     foreach n2 where n2.name='bob' do:
    39         foreach e where e.orig=n1.id and e.dest=n2.id do:
    65         foreach e where e.orig=n1.id and e.dest=n2.id do:
    40             return e.*
    66             return e.*
    41         end
    67         end
    42     end
    68     end
    43 end
    69 end
    44 </codeblock> <p>The second design is to loop over all 'alice' nodes and follow edges off of those nodes looking for nodes named 'bob'. (The roles of 'alice' and 'bob' might be reversed here without changing the fundamental character or the algorithm): </p> <codeblock id="GUID-02A72F7B-D5AB-5306-95F9-9832EF05FB25" xml:space="preserve">
    70 </codeblock> <p>The second design is to loop over all 'alice' nodes and follow
       
    71 edges off of those nodes looking for nodes named 'bob'. (The roles of 'alice'
       
    72 and 'bob' might be reversed here without changing the fundamental character
       
    73 or the algorithm): </p> <codeblock id="GUID-02A72F7B-D5AB-5306-95F9-9832EF05FB25" xml:space="preserve">
    45 foreach n1 where n1.name='alice' do:
    74 foreach n1 where n1.name='alice' do:
    46     foreach e where e.orig=n1.id do:
    75     foreach e where e.orig=n1.id do:
    47         foreach n2 where n2.id=e.dest and n2.name='bob' do:
    76         foreach n2 where n2.id=e.dest and n2.name='bob' do:
    48             return e.*
    77             return e.*
    49         end
    78         end
    50     end
    79     end
    51 end
    80 end
    52 </codeblock> <p>The first algorithm above corresponds to a join order of n1-n2-e. The second algorithm corresponds to a join order of n1-e-n2. </p> <p>The question the optimizer has to answer is which of these two algorithms is likely to give the fastest result, and it turns out that the answer depends on the nature of the data stored in the database. </p> <p>Let the number of alice nodes be M and the number of bob nodes be N. Consider two scenarios: In the first scenario, M and N are both 2 but there are thousands of edges on each node. In this case, the first algorithm is preferred. With the first algorithm, the inner loop checks for the existence of an edge between a pair of nodes and outputs the result if found. But because there are only 2 alice and bob nodes each, the inner loop only has to run 4 times and the query is very quick. </p> <p>The second algorithm would take much longer here. The outer loop of the second algorithm only executes twice, but because there are a large number of edges leaving each 'alice' node, the middle loop has to iterate many thousands of times. So in the first scenario, we prefer to use the first algorithm. </p> <p>Now consider the case where M and N are both 3500. But suppose each of these nodes is connected by only one or two edges. In this case, the second algorithm is preferred. </p> <p>With the second algorithm, the outer loop still has to run 3500 times, but the middle loop only runs once or twice for each outer loop and the inner loop will only run once for each middle loop, if at all. So the total number of iterations of the inner loop is around 7000. </p> <p>The first algorithm, on the other hand, has to run both its outer loop and its middle loop 3500 times each, resulting in 12 million iterations of the middle loop. Thus in the second scenario, second algorithm is nearly 2000 times faster than the first. </p> <p>In this particular example, if you run ANALYZE on your database to collect statistics on the tables, the optimizer will be able to figure out the best algorithm to use. But if you do not want to run ANALYZE or if you do not want to waste database space storing the SQLITE_STAT1 statistics table that ANALYZE generates, you can manually override the decision of the optimizer by specifying a particular order for tables in a join. You do this by substituting the keyword phrase “CROSS JOIN” in place of commas in the FROM clause. </p> <p>The “CROSS JOIN” phrase forces the table to the left to be used before the table to the right. For example, to force the first algorithm, write the query this way: </p> <codeblock id="GUID-E0FE82CE-0D32-535F-A4F7-EF582BFA7AE9" xml:space="preserve">
    81 </codeblock> <p>The first algorithm above corresponds to a join order of n1-n2-e.
       
    82 The second algorithm corresponds to a join order of n1-e-n2. </p> <p>The question
       
    83 the optimizer has to answer is which of these two algorithms is likely to
       
    84 give the fastest result, and it turns out that the answer depends on the nature
       
    85 of the data stored in the database. </p> <p>Let the number of alice nodes
       
    86 be M and the number of bob nodes be N. Consider two scenarios: In the first
       
    87 scenario, M and N are both 2 but there are thousands of edges on each node.
       
    88 In this case, the first algorithm is preferred. With the first algorithm,
       
    89 the inner loop checks for the existence of an edge between a pair of nodes
       
    90 and outputs the result if found. But because there are only 2 alice and bob
       
    91 nodes each, the inner loop only has to run 4 times and the query is very quick. </p> <p>The
       
    92 second algorithm would take much longer here. The outer loop of the second
       
    93 algorithm only executes twice, but because there are a large number of edges
       
    94 leaving each 'alice' node, the middle loop has to iterate many thousands of
       
    95 times. So in the first scenario, we prefer to use the first algorithm. </p> <p>Now
       
    96 consider the case where M and N are both 3500. But suppose each of these nodes
       
    97 is connected by only one or two edges. In this case, the second algorithm
       
    98 is preferred. </p> <p>With the second algorithm, the outer loop still has
       
    99 to run 3500 times, but the middle loop only runs once or twice for each outer
       
   100 loop and the inner loop will only run once for each middle loop, if at all.
       
   101 So the total number of iterations of the inner loop is around 7000. </p> <p>The
       
   102 first algorithm, on the other hand, has to run both its outer loop and its
       
   103 middle loop 3500 times each, resulting in 12 million iterations of the middle
       
   104 loop. Thus in the second scenario, second algorithm is nearly 2000 times faster
       
   105 than the first. </p> <p>In this particular example, if you run ANALYZE on
       
   106 your database to collect statistics on the tables, the optimizer will be able
       
   107 to figure out the best algorithm to use. But if you do not want to run ANALYZE
       
   108 or if you do not want to waste database space storing the SQLITE_STAT1 statistics
       
   109 table that ANALYZE generates, you can manually override the decision of the
       
   110 optimizer by specifying a particular order for tables in a join. You do this
       
   111 by substituting the keyword phrase “CROSS JOIN” in place of commas in the
       
   112 FROM clause. </p> <p>The “CROSS JOIN” phrase forces the table to the left
       
   113 to be used before the table to the right. For example, to force the first
       
   114 algorithm, write the query this way: </p> <codeblock id="GUID-E0FE82CE-0D32-535F-A4F7-EF582BFA7AE9" xml:space="preserve">
    53 SELECT *
   115 SELECT *
    54 FROM node AS n1 CROSS JOIN
   116 FROM node AS n1 CROSS JOIN
    55  node AS n2 CROSS JOIN
   117  node AS n2 CROSS JOIN
    56  edge AS e
   118  edge AS e
    57 WHERE n1.name = 'alice'
   119 WHERE n1.name = 'alice'
    65  node AS n2
   127  node AS n2
    66 WHERE n1.name = 'alice'
   128 WHERE n1.name = 'alice'
    67  AND n2.name = 'bob'
   129  AND n2.name = 'bob'
    68  AND e.orig = n1.id
   130  AND e.orig = n1.id
    69  AND e.dest = n2.id;
   131  AND e.dest = n2.id;
    70 </codeblock> <p>The CROSS JOIN keyword phrase is perfectly valid SQL syntax according to the SQL standard, but it is syntax that is rarely if ever used in real-world SQL statements. Because it is so rarely used otherwise, SQLite has appropriated the phrase as a means to override the table reordering decisions of the query optimizer. </p> <p>The CROSS JOIN connector is rarely needed and should probably never be used prior to the final performance tuning phase of application development. Even then, SQLite usually gets the order of tables in a join right without any extra help. But on those rare occasions when SQLite gets it wrong, the CROSS JOIN connector is an invaluable way of tweaking the optimizer to do what you want. </p> </section> </conbody><related-links><link href="GUID-22844C28-AB5B-5A6F-8863-7269464684B4.dita"><linktext>SQL Overview</linktext> </link> <link href="GUID-78773BCA-ADF6-53E6-AC80-5CB2AE1F8BCC.dita"><linktext>SQL Server Guide</linktext> </link> <link href="GUID-E51836E1-D33E-506C-B75B-19B8E3CC313A.dita"><linktext>SQLite</linktext> </link> <link href="GUID-1F12E3F5-45B2-55EC-B021-00338277C608.dita"><linktext>SQL DB Overview</linktext> </link> <link href="GUID-43CA02E7-0101-5824-B91B-E15EE20C829A.dita"><linktext>Avoid Transient
   132 </codeblock> <p>The CROSS JOIN keyword phrase is perfectly valid SQL syntax
    71                 Tables</linktext> </link> <link href="GUID-49A3419F-D20A-5C5D-B2FF-51724EF37704.dita"><linktext>Prevent
   133 according to the SQL standard, but it is syntax that is rarely if ever used
    72                 Datafile Corruption</linktext> </link> <link href="GUID-C2FAEBB2-4A1A-5BB0-9670-4801525CBC6A.dita"><linktext>SQL Index
   134 in real-world SQL statements. Because it is so rarely used otherwise, SQLite
    73                 Tips</linktext> </link> <link href="GUID-B994E6F7-228A-5433-B87F-91857C5D93D6.dita"><linktext>SQL Insertion
   135 has appropriated the phrase as a means to override the table reordering decisions
    74                 Tips</linktext> </link> <link href="GUID-4FC23DB7-4758-5DA4-81FF-0DAB169E2757.dita"><linktext>SQL Schema
   136 of the query optimizer. </p> <p>The CROSS JOIN connector is rarely needed
    75                 Tips</linktext> </link> <link href="GUID-2A2920E0-5D40-5358-BC0C-8572CEFE078C.dita"><linktext>SQL
   137 and should probably never be used prior to the final performance tuning phase
    76                 Expressions</linktext> </link> <link href="GUID-126FCCCC-0E7D-59AE-959A-2F94A7319C4B.dita"><linktext>SQL Statement
   138 of application development. Even then, SQLite usually gets the order of tables
    77                 Tips</linktext> </link> <link><linktext/></link><link href="GUID-B7E978C1-45CA-554C-8028-D901B97BA2E0.dita"><linktext> ANALYZE
   139 in a join right without any extra help. But on those rare occasions when SQLite
    78                 Command</linktext> </link> <link href="GUID-AF5A75D7-0687-546C-87B2-0B7DF7D33217.dita"><linktext> SQL WHERE CLause
   140 gets it wrong, the CROSS JOIN connector is an invaluable way of tweaking the
    79                 Tips</linktext> </link> </related-links></concept>
   141 optimizer to do what you want. </p> </section>
       
   142 </conbody><related-links>
       
   143 <link href="GUID-22844C28-AB5B-5A6F-8863-7269464684B4.dita"><linktext>SQL Overview</linktext>
       
   144 </link>
       
   145 <link href="GUID-78773BCA-ADF6-53E6-AC80-5CB2AE1F8BCC.dita"><linktext>SQL Server
       
   146 Guide</linktext></link>
       
   147 <link href="GUID-1F12E3F5-45B2-55EC-B021-00338277C608.dita"><linktext>SQL DB Overview</linktext>
       
   148 </link>
       
   149 <link href="GUID-43CA02E7-0101-5824-B91B-E15EE20C829A.dita"><linktext>Avoid Transient
       
   150 Tables</linktext></link>
       
   151 <link href="GUID-49A3419F-D20A-5C5D-B2FF-51724EF37704.dita"><linktext>Prevent Datafile
       
   152 Corruption</linktext></link>
       
   153 <link href="GUID-C2FAEBB2-4A1A-5BB0-9670-4801525CBC6A.dita"><linktext>SQL Index
       
   154 Tips</linktext></link>
       
   155 <link href="GUID-B994E6F7-228A-5433-B87F-91857C5D93D6.dita"><linktext>SQL Insertion
       
   156 Tips</linktext></link>
       
   157 <link href="GUID-4FC23DB7-4758-5DA4-81FF-0DAB169E2757.dita"><linktext>SQL Schema
       
   158 Tips</linktext></link>
       
   159 <link href="GUID-2A2920E0-5D40-5358-BC0C-8572CEFE078C.dita"><linktext>SQL Expressions</linktext>
       
   160 </link>
       
   161 <link href="GUID-126FCCCC-0E7D-59AE-959A-2F94A7319C4B.dita"><linktext>SQL Statement
       
   162 Tips</linktext></link>
       
   163 <link href="GUID-B7E978C1-45CA-554C-8028-D901B97BA2E0.dita"><linktext> ANALYZE
       
   164 Command</linktext></link>
       
   165 <link href="GUID-AF5A75D7-0687-546C-87B2-0B7DF7D33217.dita"><linktext> SQL WHERE
       
   166 CLause Tips</linktext></link>
       
   167 </related-links></concept>