Symbian3/PDK/Source/GUID-B994E6F7-228A-5433-B87F-91857C5D93D6.dita
author Graeme Price <GRAEME.PRICE@NOKIA.COM>
Fri, 15 Oct 2010 14:32:18 +0100
changeset 15 307f4279f433
parent 14 578be2adaf3e
permissions -rw-r--r--
Initial contribution of the Adaptation Documentation.

<?xml version="1.0" encoding="utf-8"?>
<!-- Copyright (c) 2007-2010 Nokia Corporation and/or its subsidiary(-ies) All rights reserved. -->
<!-- This component and the accompanying materials are made available under the terms of the License 
"Eclipse Public License v1.0" which accompanies this distribution, 
and is available at the URL "http://www.eclipse.org/legal/epl-v10.html". -->
<!-- Initial Contributors:
    Nokia Corporation - initial contribution.
Contributors: 
-->
<!DOCTYPE concept
  PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="GUID-B994E6F7-228A-5433-B87F-91857C5D93D6" xml:lang="en"><title>SQL Insertion
Tips</title><shortdesc>This guide provides some tips for using COMMIT statements.</shortdesc><prolog><metadata><keywords/></metadata></prolog><conbody>
<section id="GUID-A3DC5F3F-92C6-4847-A9C5-A227F777D7D3"><title>Introduction</title> <p>INSERT, UPDATE and DELETE operations
are all very fast. However, COMMIT statements are very slow. You need to consider
several techniques to ensure you make the best use of COMMIT statements. </p> <p><b>Intended audience:</b> </p> <p>This document is intended to be used by
Symbian platfomr licensees and third party application developers. </p> </section>
<section id="GUID-C12416F0-87D5-59AA-A08F-A1741A3FD3ED"><title>INSERT and
UPDATE are Fast but COMMIT is Slow</title> <p>A programmer migrating to this
database engine might write a test program to see how many INSERT statements
per second it can do. They create an empty database with a single empty table.
Then they write a loop that runs a few thousand times and does a single INSERT
statement on each iteration. Upon timing this program they find that it appears
to only be doing a couple of dozen INSERTs per second. </p> <p>“Everybody
I talked to says SQLite is suppose to be really fast”, the new programmer
will typically complain, “But I'm only getting 20 or 30 INSERTs per second!” </p> <p>In
reality, SQLite can achieve around 50000 or more INSERTs per second on a modern
workstation, although less on a typical embedded platform. But the characteristics
of the underlying storage medium and the fact that the database engine guarantees
atomic updates to the database mean that it can only achieve a few dozen COMMIT
operations per second. </p> <p>Unless you take specific action to tell SQLite
to do otherwise, it will automatically insert a COMMIT operation after every
insert. So the programmers described above are really measuring the number
of transactions per second, not the number of INSERTs. This is a very important
distinction. </p> <p>Why is COMMIT so much slower than INSERT? SQLite guarantees
that changes to a database are ACID – Atomic, Consistent, Isolated, and Durable.
The Atomic and Durable parts are what take the time. </p> <p>In order to be
Atomic, the database engine has to go through an elaborate protocol with the
underlying file system, which ultimately means that every modified page of
the database file must be written twice. </p> <p>In order to be durable, the
COMMIT operation must not return until all content has been safely written
to nonvolatile media. At least two consecutive non-concurrent writes to flash
memory must occur in order to COMMIT. </p> <p>An atomic and durable COMMIT
is a very powerful feature that can help you to build a system that is resilient,
even in the face of unplanned system crashes or power failures. But the price
of this resilience is that COMMIT is a relatively slow operation. Hence if
performance is a priority you should strive to minimize the number of COMMITs. </p> <p>If
you need to do more than one INSERT or UPDATE or DELETE operation, you are
advised to put them all inside a single explicit transaction by running the
BEGIN statement prior to the first changes and executing COMMIT once all changes
have finished. In this way, all your changes occur within a single transaction
and only a single time-consuming COMMIT operation must occur. </p> <p>If you
omit the explicit BEGIN...COMMIT, then SQLite automatically inserts an implicit
BEGIN...COMMIT around each of your INSERT, UPDATE, and DELETE statements,
which means you end of doing many COMMITs which will always be much slower
than doing just one. </p> </section>
<section id="GUID-0E57010E-E3AB-573E-B550-72DE36725D1B"><title>Batch INSERT,
UPDATE, and DELETE Operations Using TEMP Tables</title> <p>As described above,
when you have many changes to make to a database, you are advised to make
all those changes within a single explicit transaction by preceding the first
change with a BEGIN statement and concluding the changes with a COMMIT statement. </p> <p>The
problem with BEGIN...COMMIT is that BEGIN acquires an exclusive lock on the
database file which is not released until the COMMIT completes. That means
that only a single connection to the database can be in the middle of a BEGIN...COMMIT
at one time. If another thread or process tries to start a BEGIN...COMMIT
while the first is busy, the second has to wait. To avoid holding up other
threads and processes, therefore, every BEGIN should be followed by a COMMIT
as quickly as possible. </p> <p>But sometimes you run into a situation where
you have to make periodic INSERTs or UPDATEs to a database based on timed
or external events. For example, you may want to do an INSERT into an event
log table once every 250 milliseconds or so. You could do a separate INSERT
for each event, but that would mean doing a separate COMMIT four times per
second, which is perhaps more overhead than you desire. On the other hand,
if you did a BEGIN and accumulated several seconds worth of INSERTs you could
avoid doing a COMMIT except for every 10<sup>th</sup> second or so. The trouble
there is that other threads and processes are unable to write to the database
while the event log is holding its transaction open. </p> <p>The usual method
for avoiding this dilemma is to store all of the INSERTs in a separate TEMP
table, then periodically flush the content of the TEMP table into the main
database with a single operation. </p> <p>A TEMP table works just like a regular
database table except that a TEMP table is only visible to the database connection
that creates it, and the TEMP table is automatically dropped when the database
connection is closed. You create a TEMP table by inserting the “TEMP” or “TEMPORARY”
keyword in between “CREATE” and “TABLE”, like this: </p> <codeblock id="GUID-90068863-645B-551F-8A85-A12ED647F1AA" xml:space="preserve">
CREATE TEMP TABLE event_accumulator(
        eventId INTEGER,
        eventArg TEXT
    );
</codeblock> <p>Because TEMP tables are ephemeral (meaning that they do not
persist after the database connection closes) SQLite does not need to worry
about making writes to a TEMP table atomic or durable. Hence a COMMIT to a
TEMP table is very quick. </p> <p>A process can do multiple INSERTs into a
TEMP table without having to enclose those INSERTs within an explicit BEGIN...COMMIT
for efficiency. Writes to a TEMP table are always efficient regardless of
whether or not they are enclosed in an explicit transaction. </p> <p>So as
events arrive, they can be written into the TEMP table using isolated INSERT
statements. But because the TEMP table is ephemeral, one must take care to
periodically flush the contents of the TEMP table into the main database where
they will persist. So every 10 seconds or so (depending on the application
requirements) you can run code like this: </p> <codeblock id="GUID-F51D81FE-5F85-5C22-96AC-4CC69AF00BC4" xml:space="preserve">
BEGIN;
INSERT INTO event_log SELECT * FROM event_accumulator;
DELETE FROM event_accumulator;
COMMIT;
</codeblock> <p>These statements transfer the content of the ephemeral event_accumulator
table over to the persistent event_log table as a single atomic operation.
Since this transfer occurs relatively infrequently, minimal database overhead
is incurred. </p> </section>
<section id="GUID-83C876AB-7C3F-5BFC-8F02-6503A0B3D8D6"><title>Use Bound Parameters</title> <p>Suppose
you have a descriptor, nameDes, and you want to insert that value into the
namelist table of a database. One way to proceed is to construct an appropriate
INSERT statement that contains the desired string value as a SQL string literal,
then run that INSERT statement. Pseudo-code for this approach follows: </p> <codeblock id="GUID-B96D98B4-ED28-566D-A0AA-073E4BEC6954" xml:space="preserve">
_LIT(KSql, “INSERT INTO namelist VALUES('%S')”);
sqlBuf.Format(KSql, nameDes);
sqlDatabase.Execute(sql);
</codeblock> <p>The INSERT statement is constructed by the call to <codeph>Format()</codeph> on
the second line of the example above. The first argument is a template for
the SQL statement. The value of the nameDes descriptor is inserted where the
%S occurs in the template. Notice that the %S is surrounded by single quotes
so that the string will be properly contained in SQL standard quotes. </p> <p>This
approach works as long as the value in nameDes does not contain any single-quote
characters. If nameDes does contain one or more single-quotes, then the string
literal in the INSERT statement will not be well-formed and a syntax error
might occur. Or worse, if a hostile user is able to control the content of
nameDes, they might be able to put text in nameDes that looked something like
this: </p> <codeblock id="GUID-12627B2F-D98A-5170-9E6F-6C3EFA33EA87" xml:space="preserve">
hi'); DELETE FROM critical_table; SELECT 'hi
</codeblock> <p>This would result in the sqlBuf variable holding </p> <codeblock id="GUID-FD6A2B42-8260-535B-80D3-B99C726B02FC" xml:space="preserve">
INSERT INTO namelist VALUES('hi'); DELETE FROM critical_table; SELECT 'hi'
</codeblock> <p>Your adversary has managed to convert your single INSERT statement
into three separate SQL statements, one of which does things that you probably
do not want to happen. This is called an “SQL Injection Attack”. You want
to be very, very careful to avoid SQL injection attacks as they can seriously
compromise the security of your application. </p> <p>SQLite allows you to
specify parameters in SQL statements and then substitute values for those
parameters prior to running the SQL. Parameters can take several forms, including: </p> <codeblock id="GUID-8255AFE5-CFA2-5DAD-A168-2A7294021AC6" xml:space="preserve">
?
?NNN
:AAA
@AAA
$AAA
</codeblock> <p>In the above, NNN means any sequence of digits and AAA means
any sequence of alphanumeric characters and underscores. In this example we
will stick with the first and simplest form – the question mark. The operation
above would be rewritten as shown below. (Error checking is omitted from this
example for brevity.) </p> <codeblock id="GUID-292857E5-9219-5AD5-B05A-9C9A47055741" xml:space="preserve">
_LIT(KSql, “INSERT INTO namelist VALUES(?)”);
RSqlStatement s;
s.PrepareL(db, KSql);
s.BindText(1, nameDes);
s.Exec();
s.Close();
</codeblock> <p> <codeph> PrepareL()</codeph> compiles the SQL statement held
in the literal KSql. This statement contains a single parameter. The value
for this parameter is initially NULL. </p> <p>The <codeph>BindText()</codeph> sets
the value of this parameter to the content of the <codeph>nameDes</codeph> descriptor
and then <codeph>Exec()</codeph> executes the SQL statement with the bound
parameter value. </p> <p>There are variations of <codeph>BindXxx()</codeph> to
bind other kinds of values such as integers, floating point numbers, and binary
large objects (BLOBs). The key point to observe is that none of these values
need to be quoted or escaped in any way. And there is no possibility of being
vulnerable to an SQL injection attack. </p> <p>Besides reducing your vulnerability
to SQL injection attacks, the use of bound parameters also happens to be more
efficient that constructing SQL statements from scratch, especially when inserting
large strings or BLOBs. </p> </section>
<section id="GUID-3EC5A35D-67E0-5C85-9DA3-CD6AA40BB6A7"><title>Cache and Reuse
Prepared Statements</title> <p>Using <xref href="GUID-0176BF07-DF94-3259-8F90-DE030E35CE9A.dita"><apiname>RSqlStatement</apiname></xref> is a
two-step process. Firstly the statement must be compiled using <codeph>Prepare()</codeph>.
Then the resulting prepared statement is run using either <codeph>Exec()</codeph> or <codeph>Next()</codeph>. </p> <p>The
relative amount of time spent doing each of these steps depends on the nature
of the SQL statement. SELECT statements that return a large result set or
UPDATE or DELETE statements that touch many rows of a table will normally
spend most of their time in the Virtual Machine module and relatively little
time being compiled. But simple INSERT statements on the other hand, can take
twice as long to compile as they take to run in the virtual machine. </p> <p>A
simple way to reduce the CPU load of an application that uses SQLite is to
cache the prepared statements and reuse them. Of course, one rarely needs
to run the exact same SQL statement more than once. But if a statement contains
one or more bound parameters, you can bind new values to the parameters prior
to each run and thus accomplish something different with each invocation. </p> <p>This
technique is especially effective when doing multiple INSERTs into the same
table. Instead of preparing a separate insert for each row, create a single
generic insert statement like this: </p> <codeblock id="GUID-C66EEE5F-27C3-59A8-9DD4-2E32A49367AF" xml:space="preserve">
INSERT INTO important_table VALUES(?,?,?,?,?)
</codeblock> <p>Then for each row to be inserted, use one or more of the <codeph>BindXxx()</codeph> interfaces
to bind values to the parameters in the insert statement, and call <codeph>Exec()</codeph> to
do the insert, then call <codeph>Reset()</codeph> to rewind the program counter
of the internal bytecode in preparation for the next run. </p> <p>For INSERT
statements, reusing a single prepared statement in this way will typically
make your code run two or three times faster. </p> <p>You can manually manage
a cache of prepared statements, keeping around only those prepared statements
that you know will be needed again and closing prepared statements using <codeph>Close()</codeph> when
you are done with them or when they are about to fall out of scope. But depending
on the application, it can be more convenient to create a wrapper class around
the SQL interface that manages the cache automatically. </p> <p>A wrapper
class can keep around the 5 or 10 most recently used prepared statements and
reuse those statements if the same SQL is requested. Handling the prepared
statement cache automatically in a wrapper has the advantage that it frees
you to focus more mental energy on writing a great application and less effort
on operating the database interface. It also makes the programming task less
error prone since with an automatic class, there is no chance of accidentally
omitting a call to <codeph>Close()</codeph> and leaking prepared statements. </p> <p>The
downside is that a cache wrapper will not have the foresight of a human programmer
and will often cache prepared statements that are no longer needed, thus using
excess memory, or sometimes discard prepared statements just before they are
needed again. </p> <p>This is a classic ease-of-programming versus performance
trade-off. For applications that are intended for a high-power workstation,
it can be best to go with a wrapper class that handles the cache automatically.
But when designing an application for a resource constrained devices where
performance is critical and engineering design talent is plentiful, it may
be better to manage the cache manually. </p> <p>Regardless of whether or not
the prepared statement cache is managed manually or automatically using a
wrapper class, reusing prepared statements is always a good thing, and can
in some cases double or triple the performance of the application. </p> </section>
</conbody><related-links>
<link href="GUID-22844C28-AB5B-5A6F-8863-7269464684B4.dita"><linktext>SQL Overview</linktext>
</link>
<link href="GUID-78773BCA-ADF6-53E6-AC80-5CB2AE1F8BCC.dita"><linktext>SQL Server
Guide</linktext></link>
<link href="GUID-E51836E1-D33E-506C-B75B-19B8E3CC313A.dita"><linktext>SQLite</linktext>
</link>
<link href="GUID-1F12E3F5-45B2-55EC-B021-00338277C608.dita"><linktext>SQL DB Overview</linktext>
</link>
<link href="GUID-43CA02E7-0101-5824-B91B-E15EE20C829A.dita"><linktext>Avoid Transient
Tables</linktext></link>
<link href="GUID-49A3419F-D20A-5C5D-B2FF-51724EF37704.dita"><linktext>Prevent Datafile
Corruption</linktext></link>
<link href="GUID-C2FAEBB2-4A1A-5BB0-9670-4801525CBC6A.dita"><linktext>SQL Index
Tips</linktext></link>
<link href="GUID-4FC23DB7-4758-5DA4-81FF-0DAB169E2757.dita"><linktext>SQL Schema
Tips</linktext></link>
<link href="GUID-2A2920E0-5D40-5358-BC0C-8572CEFE078C.dita"><linktext>SQL Expressions</linktext>
</link>
<link href="GUID-126FCCCC-0E7D-59AE-959A-2F94A7319C4B.dita"><linktext>SQL Statement
Tips</linktext></link>
<link href="GUID-ACCCB148-DAF9-59EC-B585-8EF632B9BF04.dita"><linktext>SQL Joins</linktext>
</link>
<link href="GUID-B7E978C1-45CA-554C-8028-D901B97BA2E0.dita"><linktext> ANALYZE
Command</linktext></link>
<link href="GUID-AF5A75D7-0687-546C-87B2-0B7DF7D33217.dita"><linktext> SQL WHERE
CLause Tips</linktext></link>
</related-links></concept>