ANALYZE Command

ANALYZE CommandThis guide gives details of how to use the ANALYZE command to help -the optimizer. -

Introduction

The ANALYZE command is used to provide -the query optimizer with statistical information about the distribution of -values in a database so that it can make better decisions about which indexes -to use.

Intended -audience:

This document is intended to be used by Symbian platfomr -licensees and third party application developers.

Use the ANALYZE -Command to Help the Optimizer

A big part of the job of the SQL -compiler is to choose which of many implementation options should be used -for a particular SQL statement.

Often a table will have two or more -indexes and the optimizer will need to choose which one of those indexes to -use. Or the optimizer will have to decide between using an index to implement -the WHERE clause or the ORDER BY clause. For a join, the optimizer has to -pick an appropriate order for the table in the join. And so forth.

By -default, the only information the optimizer has to go on when trying to choose -between two or more indexes is the database schema. In many cases the schema -alone is not sufficient to make a wise choice. Suppose, for example, you have -a table with two indexes:

-CREATE TABLE demo318(a,b,c,data); -CREATE INDEX idx318a ON demo381(a); -CREATE INDEX idx318b ON demo381(b); -

And further suppose SQLite is trying to compile a statement -such as the following:

-SELECT data FROM demo318 WHERE a=0 AND b=11; -

The optimizer has to decide whether it should use idx318a -with the “a=5” term of the WHERE clause or if it should use idx318b with the -“b=11” term.

Without additional information, the two approaches are -equivalent and so the choice is arbitrary. But the choice of indexes might -have a big impact on performance.

Suppose that the demo318 table contains -one million rows of which 55% have a=0, 40% have a=1, and the rest have a=2 -or a=3. If the idx381a index is chosen, it will only narrow down the set of -candidate rows by half. In this situation you are unlikely to notice any performance -gain at all. In fact, in this extreme case, using the index is likely to make -the query slower!

On the other hand, let us assume that there are -never more than 2 or 3 rows with the same value for column b. In this case -using index idx381b allows us to reduce the set of candidate rows from one -million to 2 or 3 which will yield a huge speed improvement.

So clearly, -in this example we hope that the optimizer chooses index idx318b. But without -knowledge of the table contents and the statistical distributions of values -for the “a” and “b” columns, the optimizer has no way of knowing which of -the idx318a and idx318b indexes will work better.

The ANALYZE command -is used to provide the query optimizer with statistical information about -the distribution of values in a database so that it can make better decisions -about which indexes to use.

When you run the ANALYZE command, SQLite -creates a special table named SQLITE_STAT1. It then reads the entire database, -gathers statistics and puts those statistics in the SQLITE_STAT1 table so -that the optimizer can find them. ANALYZE is a relatively slow operation, -since it has to read every single row of every single table in your database.

The -statistical information generated by ANALYZE is not kept up-to-date by subsequent -changes to the database. But many databases have a relatively constant statistical -profile so running ANALYZE once and using the same statistical profile forever -thereafter is often sufficient.

For embedded applications, you can -often get by without ever having to run ANALYZE on a real database on the -embedded device. Instead, generate a template database that contains typical -data for your application and then run ANALYZE on that database on a developer -workstation. Then just copy the content of the resulting SQLITE_STAT1 table -over to the embedded device.

The SQLITE_STAT1 table is special in -that you cannot CREATE or DROP it, but you can still SELECT, DELETE, and UPDATE -the SQLITE_STAT1 table. So to create the SQLITE_STAT1 table on the embedded -device, just run ANALYZE when the database is empty. That will take virtually -no time and will leave you with an empty SQLITE_STAT1 table. Then copy the -content of the SQLITE_STAT1 table that was created on the developer workstation -into the database on the embedded device and the optimizer will begin using -it just as if it had been created locally.

There is one entry in the -SQLITE_STAT1 table for each index in your schema. Each entry contains the -name of the table being indexed, the name of the index itself, and a string -composed of two or more integers separated by spaces. These integers are the -statistics that the ANALYZE command collects for each index.

The first -integer is the total number of entries in the table. The second integer is -the average number of rows in a result from a query that had an equality constraint -on the first term of the index. The third integer (if there is one) is the -average number of rows that would result if the index were used with equality -constraints on the first two terms of the index. And so forth for as many -terms as there are in the index.

The smaller the second and subsequent -integers are, the more selective an index is. And the more selective the index -is, the fewer rows have to be processed when using that index. So the optimizer -tries to use indexes with small second and subsequent numbers.

The -details are complex and do not really add any understanding to what is going -on. The key point is that the second and subsequent integers in the SQLITE_STAT1 -table are a measure of the selectivity of an index and that smaller numbers -are better.

The statistics gathered by the ANALYZE command reflect -the state of the database at a single point in time. While making the final -performance tweaks on an application, some developers may want to adjust these -statistics manually to better reflect the kinds of data they expect to encounter -in the field. This is easy to do. The SQLITE_STAT1 table is readable and writable -using ordinary SQL statements. You are free to adjust or modify the statistics -in anyway that seems to performance benefit.

-SQL Overview - -SQL Server -Guide -SQL DB Overview - -Avoid Transient -Tables -Prevent Datafile -Corruption -SQL Index -Tips -SQL Insertion -Tips -SQL Schema -Tips -SQL Expressions - -SQL Statement -Tips -SQL Joins - - SQL WHERE -CLause Tips + + + + + +ANALYZE CommandThis guide gives details of how to use the ANALYZE command to help +the optimizer. +

Introduction

The ANALYZE command is used to provide +the query optimizer with statistical information about the distribution of +values in a database so that it can make better decisions about which indexes +to use.

Intended +audience:

This document is intended to be used by Symbian platfomr +licensees and third party application developers.

Use the ANALYZE +Command to Help the Optimizer

A big part of the job of the SQL +compiler is to choose which of many implementation options should be used +for a particular SQL statement.

Often a table will have two or more +indexes and the optimizer will need to choose which one of those indexes to +use. Or the optimizer will have to decide between using an index to implement +the WHERE clause or the ORDER BY clause. For a join, the optimizer has to +pick an appropriate order for the table in the join. And so forth.

By +default, the only information the optimizer has to go on when trying to choose +between two or more indexes is the database schema. In many cases the schema +alone is not sufficient to make a wise choice. Suppose, for example, you have +a table with two indexes:

+CREATE TABLE demo318(a,b,c,data); +CREATE INDEX idx318a ON demo381(a); +CREATE INDEX idx318b ON demo381(b); +

And further suppose SQLite is trying to compile a statement +such as the following:

+SELECT data FROM demo318 WHERE a=0 AND b=11; +

The optimizer has to decide whether it should use idx318a +with the “a=5” term of the WHERE clause or if it should use idx318b with the +“b=11” term.

Without additional information, the two approaches are +equivalent and so the choice is arbitrary. But the choice of indexes might +have a big impact on performance.

Suppose that the demo318 table contains +one million rows of which 55% have a=0, 40% have a=1, and the rest have a=2 +or a=3. If the idx381a index is chosen, it will only narrow down the set of +candidate rows by half. In this situation you are unlikely to notice any performance +gain at all. In fact, in this extreme case, using the index is likely to make +the query slower!

On the other hand, let us assume that there are +never more than 2 or 3 rows with the same value for column b. In this case +using index idx381b allows us to reduce the set of candidate rows from one +million to 2 or 3 which will yield a huge speed improvement.

So clearly, +in this example we hope that the optimizer chooses index idx318b. But without +knowledge of the table contents and the statistical distributions of values +for the “a” and “b” columns, the optimizer has no way of knowing which of +the idx318a and idx318b indexes will work better.

The ANALYZE command +is used to provide the query optimizer with statistical information about +the distribution of values in a database so that it can make better decisions +about which indexes to use.

When you run the ANALYZE command, SQLite +creates a special table named SQLITE_STAT1. It then reads the entire database, +gathers statistics and puts those statistics in the SQLITE_STAT1 table so +that the optimizer can find them. ANALYZE is a relatively slow operation, +since it has to read every single row of every single table in your database.

The +statistical information generated by ANALYZE is not kept up-to-date by subsequent +changes to the database. But many databases have a relatively constant statistical +profile so running ANALYZE once and using the same statistical profile forever +thereafter is often sufficient.

For embedded applications, you can +often get by without ever having to run ANALYZE on a real database on the +embedded device. Instead, generate a template database that contains typical +data for your application and then run ANALYZE on that database on a developer +workstation. Then just copy the content of the resulting SQLITE_STAT1 table +over to the embedded device.

The SQLITE_STAT1 table is special in +that you cannot CREATE or DROP it, but you can still SELECT, DELETE, and UPDATE +the SQLITE_STAT1 table. So to create the SQLITE_STAT1 table on the embedded +device, just run ANALYZE when the database is empty. That will take virtually +no time and will leave you with an empty SQLITE_STAT1 table. Then copy the +content of the SQLITE_STAT1 table that was created on the developer workstation +into the database on the embedded device and the optimizer will begin using +it just as if it had been created locally.

There is one entry in the +SQLITE_STAT1 table for each index in your schema. Each entry contains the +name of the table being indexed, the name of the index itself, and a string +composed of two or more integers separated by spaces. These integers are the +statistics that the ANALYZE command collects for each index.

The first +integer is the total number of entries in the table. The second integer is +the average number of rows in a result from a query that had an equality constraint +on the first term of the index. The third integer (if there is one) is the +average number of rows that would result if the index were used with equality +constraints on the first two terms of the index. And so forth for as many +terms as there are in the index.

The smaller the second and subsequent +integers are, the more selective an index is. And the more selective the index +is, the fewer rows have to be processed when using that index. So the optimizer +tries to use indexes with small second and subsequent numbers.

The +details are complex and do not really add any understanding to what is going +on. The key point is that the second and subsequent integers in the SQLITE_STAT1 +table are a measure of the selectivity of an index and that smaller numbers +are better.

The statistics gathered by the ANALYZE command reflect +the state of the database at a single point in time. While making the final +performance tweaks on an application, some developers may want to adjust these +statistics manually to better reflect the kinds of data they expect to encounter +in the field. This is easy to do. The SQLITE_STAT1 table is readable and writable +using ordinary SQL statements. You are free to adjust or modify the statistics +in anyway that seems to performance benefit.

+SQL Overview + +SQL Server +Guide +SQL DB Overview + +Avoid Transient +Tables +Prevent Datafile +Corruption +SQL Index +Tips +SQL Insertion +Tips +SQL Schema +Tips +SQL Expressions + +SQL Statement +Tips +SQL Joins + + SQL WHERE +CLause Tips

\ No newline at end of file