diff -r 48780e181b38 -r 578be2adaf3e Symbian3/PDK/Source/GUID-B994E6F7-228A-5433-B87F-91857C5D93D6.dita
--- a/Symbian3/PDK/Source/GUID-B994E6F7-228A-5433-B87F-91857C5D93D6.dita	Tue Jul 20 12:00:49 2010 +0100
+++ b/Symbian3/PDK/Source/GUID-B994E6F7-228A-5433-B87F-91857C5D93D6.dita	Fri Aug 13 16:47:46 2010 +0100
@@ -1,243 +1,243 @@

SQL Insertion Tips
This guide provides some tips for using COMMIT statements efficiently.
Introduction

INSERT, UPDATE and DELETE operations -are all very fast. However, COMMIT statements are very slow. You need to consider -several techniques to ensure you make the best use of COMMIT statements.

Intended audience:

This document is intended to be used by
Symbian platform licensees and third-party application developers.

-
INSERT and -UPDATE are Fast but COMMIT is Slow

A programmer migrating to this -database engine might write a test program to see how many INSERT statements -per second it can do. They create an empty database with a single empty table. -Then they write a loop that runs a few thousand times and does a single INSERT -statement on each iteration. Upon timing this program they find that it appears -to only be doing a couple of dozen INSERTs per second.

“Everybody
I talked to says SQLite is supposed to be really fast,” the new programmer
will typically complain, “but I'm only getting 20 or 30 INSERTs per second!”

In
reality, SQLite can achieve 50,000 or more INSERTs per second on a modern
workstation, although fewer on a typical embedded platform. But the characteristics
of the underlying storage medium, and the fact that the database engine guarantees
atomic updates to the database, mean that it can only achieve a few dozen COMMIT
operations per second.

Unless you take specific action to tell SQLite -to do otherwise, it will automatically insert a COMMIT operation after every -insert. So the programmers described above are really measuring the number -of transactions per second, not the number of INSERTs. This is a very important -distinction.

Why is COMMIT so much slower than INSERT? SQLite guarantees -that changes to a database are ACID – Atomic, Consistent, Isolated, and Durable. -The Atomic and Durable parts are what take the time.

In order to be -Atomic, the database engine has to go through an elaborate protocol with the -underlying file system, which ultimately means that every modified page of -the database file must be written twice.

In order to be durable, the -COMMIT operation must not return until all content has been safely written -to nonvolatile media. At least two consecutive non-concurrent writes to flash -memory must occur in order to COMMIT.

An atomic and durable COMMIT -is a very powerful feature that can help you to build a system that is resilient, -even in the face of unplanned system crashes or power failures. But the price -of this resilience is that COMMIT is a relatively slow operation. Hence if -performance is a priority you should strive to minimize the number of COMMITs.

If -you need to do more than one INSERT or UPDATE or DELETE operation, you are -advised to put them all inside a single explicit transaction by running the -BEGIN statement prior to the first changes and executing COMMIT once all changes -have finished. In this way, all your changes occur within a single transaction -and only a single time-consuming COMMIT operation must occur.

If you
omit the explicit BEGIN...COMMIT, then SQLite automatically inserts an implicit
BEGIN...COMMIT around each of your INSERT, UPDATE, and DELETE statements,
which means you end up doing many COMMITs, which will always be much slower
than doing just one.
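The effect is easy to see from any SQLite client. As a minimal sketch (using Python's standard sqlite3 module rather than the Symbian SQL API, and an in-memory database for illustration; a file-backed database would show a far larger gap because each COMMIT must reach durable storage), the following wraps a batch of INSERTs in one explicit transaction so that only a single COMMIT is paid for the whole batch:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t(x INTEGER)")

# One explicit transaction around all the INSERTs: the context manager
# issues BEGIN before the block and a single COMMIT when it exits.
with db:
    for i in range(1000):
        db.execute("INSERT INTO t VALUES(?)", (i,))

count = db.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 1000 rows inserted under a single COMMIT
```

Without the explicit transaction, each INSERT would carry its own implicit COMMIT, which is where the "20 or 30 per second" figure comes from on real storage.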

-
Batch INSERT, -UPDATE, and DELETE Operations Using TEMP Tables

As described above, -when you have many changes to make to a database, you are advised to make -all those changes within a single explicit transaction by preceding the first -change with a BEGIN statement and concluding the changes with a COMMIT statement.

The -problem with BEGIN...COMMIT is that BEGIN acquires an exclusive lock on the -database file which is not released until the COMMIT completes. That means -that only a single connection to the database can be in the middle of a BEGIN...COMMIT -at one time. If another thread or process tries to start a BEGIN...COMMIT -while the first is busy, the second has to wait. To avoid holding up other -threads and processes, therefore, every BEGIN should be followed by a COMMIT -as quickly as possible.

But sometimes you run into a situation where
you have to make periodic INSERTs or UPDATEs to a database based on timed
or external events. For example, you may want to do an INSERT into an event
log table once every 250 milliseconds or so. You could do a separate INSERT
for each event, but that would mean doing a separate COMMIT four times per
second, which is perhaps more overhead than you desire. On the other hand,
if you did a BEGIN and accumulated several seconds' worth of INSERTs, you could
avoid doing a COMMIT except once every 10 seconds or so. The trouble
there is that other threads and processes are unable to write to the database
while the event log is holding its transaction open.

The usual method -for avoiding this dilemma is to store all of the INSERTs in a separate TEMP -table, then periodically flush the content of the TEMP table into the main -database with a single operation.

A TEMP table works just like a regular -database table except that a TEMP table is only visible to the database connection -that creates it, and the TEMP table is automatically dropped when the database -connection is closed. You create a TEMP table by inserting the “TEMP” or “TEMPORARY” -keyword in between “CREATE” and “TABLE”, like this:

-CREATE TEMP TABLE event_accumulator( - eventId INTEGER, - eventArg TEXT - ); -

Because TEMP tables are ephemeral (meaning that they do not -persist after the database connection closes) SQLite does not need to worry -about making writes to a TEMP table atomic or durable. Hence a COMMIT to a -TEMP table is very quick.

A process can do multiple INSERTs into a -TEMP table without having to enclose those INSERTs within an explicit BEGIN...COMMIT -for efficiency. Writes to a TEMP table are always efficient regardless of -whether or not they are enclosed in an explicit transaction.

So as -events arrive, they can be written into the TEMP table using isolated INSERT -statements. But because the TEMP table is ephemeral, one must take care to -periodically flush the contents of the TEMP table into the main database where -they will persist. So every 10 seconds or so (depending on the application -requirements) you can run code like this:

-BEGIN; -INSERT INTO event_log SELECT * FROM event_accumulator; -DELETE FROM event_accumulator; -COMMIT; -

These statements transfer the content of the ephemeral event_accumulator -table over to the persistent event_log table as a single atomic operation. -Since this transfer occurs relatively infrequently, minimal database overhead -is incurred.
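The same accumulate-and-flush pattern can be sketched with Python's standard sqlite3 module (the table and column names follow the example above; the event values are made up for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE event_log(eventId INTEGER, eventArg TEXT)")
# TEMP table: visible only to this connection, dropped when it closes.
db.execute("CREATE TEMP TABLE event_accumulator(eventId INTEGER, eventArg TEXT)")

# Events arrive one at a time and are written with isolated INSERTs;
# writes to a TEMP table are cheap even outside an explicit transaction.
for event_id in range(4):
    db.execute("INSERT INTO event_accumulator VALUES(?, ?)",
               (event_id, "arg%d" % event_id))

# Periodic flush: move everything to the persistent table in one transaction.
with db:
    db.execute("INSERT INTO event_log SELECT * FROM event_accumulator")
    db.execute("DELETE FROM event_accumulator")

logged = db.execute("SELECT COUNT(*) FROM event_log").fetchone()[0]
pending = db.execute("SELECT COUNT(*) FROM event_accumulator").fetchone()[0]
print(logged, pending)  # 4 0
```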

-
Use Bound Parameters

Suppose -you have a descriptor, nameDes, and you want to insert that value into the -namelist table of a database. One way to proceed is to construct an appropriate -INSERT statement that contains the desired string value as a SQL string literal, -then run that INSERT statement. Pseudo-code for this approach follows:

_LIT(KSql, "INSERT INTO namelist VALUES('%S')");
sqlBuf.Format(KSql, &nameDes);
sqlDatabase.Exec(sqlBuf);

The INSERT statement is constructed by the call to Format() on -the second line of the example above. The first argument is a template for -the SQL statement. The value of the nameDes descriptor is inserted where the -%S occurs in the template. Notice that the %S is surrounded by single quotes -so that the string will be properly contained in SQL standard quotes.

This -approach works as long as the value in nameDes does not contain any single-quote -characters. If nameDes does contain one or more single-quotes, then the string -literal in the INSERT statement will not be well-formed and a syntax error -might occur. Or worse, if a hostile user is able to control the content of -nameDes, they might be able to put text in nameDes that looked something like -this:

-hi'); DELETE FROM critical_table; SELECT 'hi -

This would result in the sqlBuf variable holding

-INSERT INTO namelist VALUES('hi'); DELETE FROM critical_table; SELECT 'hi' -

Your adversary has managed to convert your single INSERT statement -into three separate SQL statements, one of which does things that you probably -do not want to happen. This is called an “SQL Injection Attack”. You want -to be very, very careful to avoid SQL injection attacks as they can seriously -compromise the security of your application.
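To make the danger concrete, here is a sketch using Python's standard sqlite3 module; critical_table and the hostile input are the hypothetical names from the example above, and executescript() stands in for any API that will run more than one statement at a time:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE namelist(name TEXT)")
db.execute("CREATE TABLE critical_table(v TEXT)")
db.execute("INSERT INTO critical_table VALUES('precious')")

# Hypothetical attacker-controlled input, as in the example above.
name = "hi'); DELETE FROM critical_table; SELECT ('hi"

# Naive string formatting splices the input straight into the SQL text,
# turning one INSERT into three statements, one of them destructive.
sql_buf = "INSERT INTO namelist VALUES('%s')" % name
db.executescript(sql_buf)

remaining = db.execute("SELECT COUNT(*) FROM critical_table").fetchone()[0]
print(remaining)  # 0 -- the injected DELETE ran and emptied critical_table
```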

SQLite allows you to -specify parameters in SQL statements and then substitute values for those -parameters prior to running the SQL. Parameters can take several forms, including:

-? -?NNN -:AAA -@AAA -$AAA -

In the above, NNN means any sequence of digits and AAA means -any sequence of alphanumeric characters and underscores. In this example we -will stick with the first and simplest form – the question mark. The operation -above would be rewritten as shown below. (Error checking is omitted from this -example for brevity.)

_LIT(KSql, "INSERT INTO namelist VALUES(?)");
RSqlStatement s;
s.PrepareL(db, KSql);
s.BindText(1, nameDes);
s.Exec();
s.Close();

PrepareL() compiles the SQL statement held -in the literal KSql. This statement contains a single parameter. The value -for this parameter is initially NULL.

The BindText() sets -the value of this parameter to the content of the nameDes descriptor -and then Exec() executes the SQL statement with the bound -parameter value.

There are variations of BindXxx() to -bind other kinds of values such as integers, floating point numbers, and binary -large objects (BLOBs). The key point to observe is that none of these values -need to be quoted or escaped in any way. And there is no possibility of being -vulnerable to an SQL injection attack.

Besides reducing your vulnerability
to SQL injection attacks, the use of bound parameters also happens to be more
efficient than constructing SQL statements from scratch, especially when inserting
large strings or BLOBs.
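For contrast, here is a sketch of the same hostile input handled with a bound parameter (again using Python's standard sqlite3 module rather than RSqlStatement; the input string is the hypothetical one from the earlier example):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE namelist(name TEXT)")

# The same hostile input as before, now passed as a bound parameter.
name = "hi'); DELETE FROM critical_table; SELECT ('hi"
db.execute("INSERT INTO namelist VALUES(?)", (name,))

# The value is stored verbatim: no quoting, no escaping, and no chance
# for the input to be interpreted as extra SQL statements.
stored = db.execute("SELECT name FROM namelist").fetchone()[0]
assert stored == name
```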

-
Cache and Reuse -Prepared Statements

Using RSqlStatement is a -two-step process. Firstly the statement must be compiled using Prepare(). -Then the resulting prepared statement is run using either Exec() or Next().

The
relative amount of time spent doing each of these steps depends on the nature
of the SQL statement. SELECT statements that return a large result set, or
UPDATE or DELETE statements that touch many rows of a table, will normally
spend most of their time in the Virtual Machine module and relatively little
time being compiled. Simple INSERT statements, on the other hand, can take
twice as long to compile as they take to run in the virtual machine.

A -simple way to reduce the CPU load of an application that uses SQLite is to -cache the prepared statements and reuse them. Of course, one rarely needs -to run the exact same SQL statement more than once. But if a statement contains -one or more bound parameters, you can bind new values to the parameters prior -to each run and thus accomplish something different with each invocation.

This -technique is especially effective when doing multiple INSERTs into the same -table. Instead of preparing a separate insert for each row, create a single -generic insert statement like this:

-INSERT INTO important_table VALUES(?,?,?,?,?) -

Then for each row to be inserted, use one or more of the BindXxx() interfaces -to bind values to the parameters in the insert statement, and call Exec() to -do the insert, then call Reset() to rewind the program counter -of the internal bytecode in preparation for the next run.

For INSERT -statements, reusing a single prepared statement in this way will typically -make your code run two or three times faster.
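Python's sqlite3 module hides the prepare/bind/reset cycle behind its cursor API, but executemany() is implemented along exactly these lines: the generic INSERT is compiled once, then bound, run, and reset for each row. A sketch, with made-up column values:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE important_table(a, b, c, d, e)")

# One generic parameterized INSERT, reused for every row: the statement
# is compiled once, then bound / executed / reset per row, rather than
# being re-prepared from scratch each time.
rows = [(i, i * 2, i * 3, i * 4, i * 5) for i in range(100)]
db.executemany("INSERT INTO important_table VALUES(?,?,?,?,?)", rows)
db.commit()

inserted = db.execute("SELECT COUNT(*) FROM important_table").fetchone()[0]
print(inserted)  # 100
```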

You can manually manage -a cache of prepared statements, keeping around only those prepared statements -that you know will be needed again and closing prepared statements using Close() when -you are done with them or when they are about to fall out of scope. But depending -on the application, it can be more convenient to create a wrapper class around -the SQL interface that manages the cache automatically.

A wrapper -class can keep around the 5 or 10 most recently used prepared statements and -reuse those statements if the same SQL is requested. Handling the prepared -statement cache automatically in a wrapper has the advantage that it frees -you to focus more mental energy on writing a great application and less effort -on operating the database interface. It also makes the programming task less -error prone since with an automatic class, there is no chance of accidentally -omitting a call to Close() and leaking prepared statements.

The -downside is that a cache wrapper will not have the foresight of a human programmer -and will often cache prepared statements that are no longer needed, thus using -excess memory, or sometimes discard prepared statements just before they are -needed again.

This is a classic ease-of-programming versus performance
trade-off. For applications that are intended for a high-powered workstation,
it can be best to go with a wrapper class that handles the cache automatically.
But when designing an application for resource-constrained devices where
performance is critical and engineering design talent is plentiful, it may
be better to manage the cache manually.

Regardless of whether
the prepared statement cache is managed manually or automatically using a
wrapper class, reusing prepared statements is always a good thing, and can
in some cases double or triple the performance of the application.

-
SQL Overview
SQL Server Guide
SQLite
SQL DB Overview
Avoid Transient Tables
Prevent Datafile Corruption
SQL Index Tips
SQL Schema Tips
SQL Expressions
SQL Statement Tips
SQL Joins
ANALYZE Command
SQL WHERE Clause Tips