Avoid +Transient Tables

Avoid Queries +that Require Transient Tables

Complex queries sometimes require +SQLite to compute intermediate results that are stored in transient tables. +Transient tables are nameless entities that exist for the duration of a single +SQL statement and are automatically deleted at the conclusion of the statement.

We +use the term transient to describe these tables rather than temporary to avoid +confusion with TEMP tables. TEMP tables are tables that are private to a particular +database connection and which persist for the duration of that database connection. +TEMP tables have names and work like regular tables with the exceptions that +they are only visible to the current database connection and are automatically +deleted when the connection is closed. Transient tables, in contrast, are +invisible internal holding areas for intermediate results that exist only +until the end of the current SQL statement.

Transient tables will +be stored either in RAM or on disk depending on device configuration. The +Symbian default configuration is to store transient tables on disk although +downstream device manufacturers may alter this.

When configured to +store transient tables on disk, in reality a combination of memory and file +space is used to hold the table. Transient tables go through the same page +cache mechanism that regular database files go through. So as long as the +transient tables do not grow too large, they will always be held in memory +in the page cache rather than being written to disk. Information is only written +to disk when the cache overflows.

When transient tables are configured +to be stored in memory rather than in a file, the cache is considered infinitely +large and will never overflow and thus nothing will ever be written out to +the file system. Storing transient tables in memory rather than in files avoids +the overhead of file I/O but requires potentially unbounded amounts of memory. +Using file-based transient tables puts an upper bound on the amount of memory +required but adds file I/O overhead.

One strategy for dealing with +transient tables is to avoid them all together. If you never use transient +tables then it does not really matter if they are stored in memory or on disk. +The remainder of this section will enumerate all of the things that might +provoke SQLite to create a transient table. If you avoid all of these things +in your code, then you never need to worry about where your transient tables +are stored.

A transient table is created whenever you use the DISTINCT +keyword in a query:

+SELECT DISTINCT name FROM artist; +

The DISTINCT keyword guarantees that each row of the result +set will be different from all other rows. In order to enforce this SQLite +creates a transient table that stores all prior output rows from the query. +If a new row can be found in the transient table then the new row is skipped. +The same situation occurs when DISTINCT is used within an aggregate function:

+SELECT avg(DISTINCT cnt) FROM playlist; +

In this context the DISTINCT keyword means that the aggregate +function is only applied to distinct elements of the result. As before, a +transient table is used to record prior values of the result so that SQLite +can tell if new results have been seen before.

The UNION, INTERSECT, +and EXCEPT operators used to create compound queries always generate a distinct +set of rows. Even though the DISTINCT keyword does not appear, it is implied +for these connectors, and a transient table is used to enforce the distinctness. +In contrast, the UNION ALL operator does not require a transient table.

A +transient table might be used to implement an ORDER BY or GROUP BY clause. +SQLite always tries to use an index to satisfy an ORDER BY or GROUP BY clause +if it can. But if no indexes are available which can satisfy the ORDER BY +or GROUP BY, then the entire results set is loaded into a transient table +and sorted there.

Subqueries on the right-hand side of the IN operator +use a transient table. Consider an example:

+SELECT * FROM ex334a WHERE id IN (SELECT id FROM ex334b WHERE amt>7); +

The results of the (SELECT id FROM ex334b WHERE amt>7) subquery +are stored in a transient table. Then the value of the id column is +checked against this table for each row of the outer query to determine if +that row should be included in the result set.

Sometimes subqueries +in the FROM clause of a query will result in a transient table. This is not +always the case because SQLite tries very hard to convert subqueries in the +FROM clause into a join that does not use subqueries. SQLite uses the term flattening to +describe the conversion of FROM clause subqueries into joins. Flattening is +an optimization that makes queries run faster. But in some cases, flattening +cannot occur. When the flattening optimization is inhibited, the results of +the subqueries are stored in transient tables and then a separate query is +run against those transient tables to generate the final results.

Consider +the following schema and query:

+CREATE TABLE t1(a,b,c); +CREATE TABLE t2(x,y,z); +SELECT * FROM t1 JOIN (SELECT x,y FROM t2); +

The subquery in the FROM clause is plainly visible in the +SELECT statement above. But if the subquery were disguised as a view, it might +be less noticeable. A view is really just a macro that serves as a place-holder +for a subquery. So the SELECT statement above is equivalent to the following:

+CREATE VIEW v2 AS SELECT x, y FROM t2; +SELECT * FROM t1 JOIN v2; +

In either case above, whether the subquery is stated explicitly +or is implied by the use of a view, flattening occurs and the query is converted +into this:

+SELECT a,b,c,x,y FROM t1 JOIN t2; +

Had flattening not occurred, it would have been necessary +to evaluate the v2 view or the subquery into a transient table then execute +the outer query using the transient table as one of the two tables in the +join. SQLite prefers to flatten the query because a flattened query generally +uses fewer resources and is better able to take advantage of indexes. The +rules for determining when flattening occurs and when it does not are complex. +Flattening occurs if all of the following conditions in the outer query and +in the subquery are satisfied:

The subquery and the +outer query do not both use aggregates.
The subquery is not +an aggregate or the outer query is not a join.
The subquery is not +the right operand of a left outer join, or the subquery is not itself a join.
The subquery is not +DISTINCT or the outer query is not a join.
The subquery is not +DISTINCT or the outer query does not use aggregates.
The subquery does not +use aggregates or the outer query is not DISTINCT.
The subquery has a FROM +clause.
The subquery does not +use LIMIT or the outer query is not a join.
The subquery does not +use LIMIT or the outer query does not use aggregates.
The subquery does not +use aggregates or the outer query does not use LIMIT.
The subquery and the +outer query do not both have ORDER BY clauses.
The subquery is not +the right term of a LEFT OUTER JOIN or the subquery has no WHERE clause.
The subquery and outer +query do not both use LIMIT
The subquery does not +use OFFSET

Nobody really expects a programmer to memorize or even understand +the above set of flattening rules. As a short-cut, perhaps it is best to remember +that a complicated subquery or view in the FROM clause of a complicated query +might defeat the flattening optimization and thus require the use of transient +tables.

One other obscure use of transient tables is when there is +an INSTEAD OF DELETE or INSTEAD OF UPDATE trigger on a view. When such triggers +exists and a DELETE or an UPDATE is executed against that view then a transient +table is created which stores copies of the rows to be deleted or updated. +Since it is unusual to have INSTEAD OF triggers in the first place this case +rarely arises.

In summary, transient tables are used to implement +the following features:

The DISTINCT keyword +or other situations where distinct results are required such as compound queries +using UNION, INTERSECT, or EXCEPT.
ORDER BY or GROUP BY +clauses that cannot be satisfied by indexes.
Subqueries on the right-hand +side of the IN operator.
Subqueries or views +in the FROM clause of a query that cannot be flattened.
DELETE or UPDATE against +a view with INSTEAD OF triggers.

The resource-limited nature of the environment, where file I/O is +expensive and memory is scarce, means you will be well served to avoid these +constructs and thus avoid the need for transient tables.