diff -r 43e37759235e -r 51a74ef9ed63 Symbian3/SDK/Source/GUID-B994E6F7-228A-5433-B87F-91857C5D93D6.dita --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/Symbian3/SDK/Source/GUID-B994E6F7-228A-5433-B87F-91857C5D93D6.dita Wed Mar 31 11:11:55 2010 +0100 @@ -0,0 +1,243 @@ + + + + + +SQL Insertion +TipsThis guide provides some tips for using COMMIT statements. +
Introduction

INSERT, UPDATE and DELETE operations +are all very fast. However, COMMIT statements are very slow. You need to consider +several techniques to ensure you make the best use of COMMIT statements.

Intended audience:

This document is intended to be used by +Symbian platfomr licensees and third party application developers.

+
INSERT and +UPDATE are Fast but COMMIT is Slow

A programmer migrating to this +database engine might write a test program to see how many INSERT statements +per second it can do. They create an empty database with a single empty table. +Then they write a loop that runs a few thousand times and does a single INSERT +statement on each iteration. Upon timing this program they find that it appears +to only be doing a couple of dozen INSERTs per second.

“Everybody +I talked to says SQLite is suppose to be really fast”, the new programmer +will typically complain, “But I'm only getting 20 or 30 INSERTs per second!”

In +reality, SQLite can achieve around 50000 or more INSERTs per second on a modern +workstation, although less on a typical embedded platform. But the characteristics +of the underlying storage medium and the fact that the database engine guarantees +atomic updates to the database mean that it can only achieve a few dozen COMMIT +operations per second.

Unless you take specific action to tell SQLite +to do otherwise, it will automatically insert a COMMIT operation after every +insert. So the programmers described above are really measuring the number +of transactions per second, not the number of INSERTs. This is a very important +distinction.

Why is COMMIT so much slower than INSERT? SQLite guarantees +that changes to a database are ACID – Atomic, Consistent, Isolated, and Durable. +The Atomic and Durable parts are what take the time.

In order to be +Atomic, the database engine has to go through an elaborate protocol with the +underlying file system, which ultimately means that every modified page of +the database file must be written twice.

In order to be durable, the +COMMIT operation must not return until all content has been safely written +to nonvolatile media. At least two consecutive non-concurrent writes to flash +memory must occur in order to COMMIT.

An atomic and durable COMMIT +is a very powerful feature that can help you to build a system that is resilient, +even in the face of unplanned system crashes or power failures. But the price +of this resilience is that COMMIT is a relatively slow operation. Hence if +performance is a priority you should strive to minimize the number of COMMITs.

If +you need to do more than one INSERT or UPDATE or DELETE operation, you are +advised to put them all inside a single explicit transaction by running the +BEGIN statement prior to the first changes and executing COMMIT once all changes +have finished. In this way, all your changes occur within a single transaction +and only a single time-consuming COMMIT operation must occur.

If you +omit the explicit BEGIN...COMMIT, then SQLite automatically inserts an implicit +BEGIN...COMMIT around each of your INSERT, UPDATE, and DELETE statements, +which means you end of doing many COMMITs which will always be much slower +than doing just one.

+
Batch INSERT, +UPDATE, and DELETE Operations Using TEMP Tables

As described above, +when you have many changes to make to a database, you are advised to make +all those changes within a single explicit transaction by preceding the first +change with a BEGIN statement and concluding the changes with a COMMIT statement.

The +problem with BEGIN...COMMIT is that BEGIN acquires an exclusive lock on the +database file which is not released until the COMMIT completes. That means +that only a single connection to the database can be in the middle of a BEGIN...COMMIT +at one time. If another thread or process tries to start a BEGIN...COMMIT +while the first is busy, the second has to wait. To avoid holding up other +threads and processes, therefore, every BEGIN should be followed by a COMMIT +as quickly as possible.

But sometimes you run into a situation where +you have to make periodic INSERTs or UPDATEs to a database based on timed +or external events. For example, you may want to do an INSERT into an event +log table once every 250 milliseconds or so. You could do a separate INSERT +for each event, but that would mean doing a separate COMMIT four times per +second, which is perhaps more overhead than you desire. On the other hand, +if you did a BEGIN and accumulated several seconds worth of INSERTs you could +avoid doing a COMMIT except for every 10th second or so. The trouble +there is that other threads and processes are unable to write to the database +while the event log is holding its transaction open.

The usual method +for avoiding this dilemma is to store all of the INSERTs in a separate TEMP +table, then periodically flush the content of the TEMP table into the main +database with a single operation.

A TEMP table works just like a regular +database table except that a TEMP table is only visible to the database connection +that creates it, and the TEMP table is automatically dropped when the database +connection is closed. You create a TEMP table by inserting the “TEMP” or “TEMPORARY” +keyword in between “CREATE” and “TABLE”, like this:

+CREATE TEMP TABLE event_accumulator( + eventId INTEGER, + eventArg TEXT + ); +

Because TEMP tables are ephemeral (meaning that they do not +persist after the database connection closes) SQLite does not need to worry +about making writes to a TEMP table atomic or durable. Hence a COMMIT to a +TEMP table is very quick.

A process can do multiple INSERTs into a +TEMP table without having to enclose those INSERTs within an explicit BEGIN...COMMIT +for efficiency. Writes to a TEMP table are always efficient regardless of +whether or not they are enclosed in an explicit transaction.

So as +events arrive, they can be written into the TEMP table using isolated INSERT +statements. But because the TEMP table is ephemeral, one must take care to +periodically flush the contents of the TEMP table into the main database where +they will persist. So every 10 seconds or so (depending on the application +requirements) you can run code like this:

+BEGIN; +INSERT INTO event_log SELECT * FROM event_accumulator; +DELETE FROM event_accumulator; +COMMIT; +

These statements transfer the content of the ephemeral event_accumulator +table over to the persistent event_log table as a single atomic operation. +Since this transfer occurs relatively infrequently, minimal database overhead +is incurred.

+
Use Bound Parameters

Suppose +you have a descriptor, nameDes, and you want to insert that value into the +namelist table of a database. One way to proceed is to construct an appropriate +INSERT statement that contains the desired string value as a SQL string literal, +then run that INSERT statement. Pseudo-code for this approach follows:

+_LIT(KSql, “INSERT INTO namelist VALUES('%S')”); +sqlBuf.Format(KSql, nameDes); +sqlDatabase.Execute(sql); +

The INSERT statement is constructed by the call to Format() on +the second line of the example above. The first argument is a template for +the SQL statement. The value of the nameDes descriptor is inserted where the +%S occurs in the template. Notice that the %S is surrounded by single quotes +so that the string will be properly contained in SQL standard quotes.

This +approach works as long as the value in nameDes does not contain any single-quote +characters. If nameDes does contain one or more single-quotes, then the string +literal in the INSERT statement will not be well-formed and a syntax error +might occur. Or worse, if a hostile user is able to control the content of +nameDes, they might be able to put text in nameDes that looked something like +this:

+hi'); DELETE FROM critical_table; SELECT 'hi +

This would result in the sqlBuf variable holding

+INSERT INTO namelist VALUES('hi'); DELETE FROM critical_table; SELECT 'hi' +

Your adversary has managed to convert your single INSERT statement +into three separate SQL statements, one of which does things that you probably +do not want to happen. This is called an “SQL Injection Attack”. You want +to be very, very careful to avoid SQL injection attacks as they can seriously +compromise the security of your application.

SQLite allows you to +specify parameters in SQL statements and then substitute values for those +parameters prior to running the SQL. Parameters can take several forms, including:

+? +?NNN +:AAA +@AAA +$AAA +

In the above, NNN means any sequence of digits and AAA means +any sequence of alphanumeric characters and underscores. In this example we +will stick with the first and simplest form – the question mark. The operation +above would be rewritten as shown below. (Error checking is omitted from this +example for brevity.)

+_LIT(KSql, “INSERT INTO namelist VALUES(?)”); +RSqlStatement s; +s.PrepareL(db, KSql); +s.BindText(1, nameDes); +s.Exec(); +s.Close(); +

PrepareL() compiles the SQL statement held +in the literal KSql. This statement contains a single parameter. The value +for this parameter is initially NULL.

The BindText() sets +the value of this parameter to the content of the nameDes descriptor +and then Exec() executes the SQL statement with the bound +parameter value.

There are variations of BindXxx() to +bind other kinds of values such as integers, floating point numbers, and binary +large objects (BLOBs). The key point to observe is that none of these values +need to be quoted or escaped in any way. And there is no possibility of being +vulnerable to an SQL injection attack.

Besides reducing your vulnerability +to SQL injection attacks, the use of bound parameters also happens to be more +efficient that constructing SQL statements from scratch, especially when inserting +large strings or BLOBs.

+
Cache and Reuse +Prepared Statements

Using RSqlStatement is a +two-step process. Firstly the statement must be compiled using Prepare(). +Then the resulting prepared statement is run using either Exec() or Next().

The +relative amount of time spent doing each of these steps depends on the nature +of the SQL statement. SELECT statements that return a large result set or +UPDATE or DELETE statements that touch many rows of a table will normally +spend most of their time in the Virtual Machine module and relatively little +time being compiled. But simple INSERT statements on the other hand, can take +twice as long to compile as they take to run in the virtual machine.

A +simple way to reduce the CPU load of an application that uses SQLite is to +cache the prepared statements and reuse them. Of course, one rarely needs +to run the exact same SQL statement more than once. But if a statement contains +one or more bound parameters, you can bind new values to the parameters prior +to each run and thus accomplish something different with each invocation.

This +technique is especially effective when doing multiple INSERTs into the same +table. Instead of preparing a separate insert for each row, create a single +generic insert statement like this:

+INSERT INTO important_table VALUES(?,?,?,?,?) +

Then for each row to be inserted, use one or more of the BindXxx() interfaces +to bind values to the parameters in the insert statement, and call Exec() to +do the insert, then call Reset() to rewind the program counter +of the internal bytecode in preparation for the next run.

For INSERT +statements, reusing a single prepared statement in this way will typically +make your code run two or three times faster.

You can manually manage +a cache of prepared statements, keeping around only those prepared statements +that you know will be needed again and closing prepared statements using Close() when +you are done with them or when they are about to fall out of scope. But depending +on the application, it can be more convenient to create a wrapper class around +the SQL interface that manages the cache automatically.

A wrapper +class can keep around the 5 or 10 most recently used prepared statements and +reuse those statements if the same SQL is requested. Handling the prepared +statement cache automatically in a wrapper has the advantage that it frees +you to focus more mental energy on writing a great application and less effort +on operating the database interface. It also makes the programming task less +error prone since with an automatic class, there is no chance of accidentally +omitting a call to Close() and leaking prepared statements.

The +downside is that a cache wrapper will not have the foresight of a human programmer +and will often cache prepared statements that are no longer needed, thus using +excess memory, or sometimes discard prepared statements just before they are +needed again.

This is a classic ease-of-programming versus performance +trade-off. For applications that are intended for a high-power workstation, +it can be best to go with a wrapper class that handles the cache automatically. +But when designing an application for a resource constrained devices where +performance is critical and engineering design talent is plentiful, it may +be better to manage the cache manually.

Regardless of whether or not +the prepared statement cache is managed manually or automatically using a +wrapper class, reusing prepared statements is always a good thing, and can +in some cases double or triple the performance of the application.

+
+SQL Overview + +SQL Server +Guide +SQLite + +SQL DB Overview + +Avoid Transient +Tables +Prevent Datafile +Corruption +SQL Index +Tips +SQL Schema +Tips +SQL Expressions + +SQL Statement +Tips +SQL Joins + + ANALYZE +Command + SQL WHERE +CLause Tips +
\ No newline at end of file