Database storage overhead

Database storage is very efficient. This document gives details of the storage overhead.

Space utilization is efficient

  • the underlying permanent file store adds minimal overhead to maintain its structure: store imposes a fixed 46 bytes + two bytes per additional 16K; each stream requires seven bytes.

  • the database has minimal space requirements to store its schema and data structure.

  • including the stream overhead, row storage overhead can be less than two bytes per row in ideal clustering conditions, even with lower clustering the overhead is usually below 1% of the data volume.

The storage required for a row can be determined as follows:

  • each non-null value requires the storage for fixed width columns, and the raw storage plus one byte for variable width columns.

  • long columns are stored embedded in the row data when they are small enough, and otherwise they are stored in a separate stream: this makes more efficient use of the both space and speed. When embedded they require just one bit more than the short columns, when separated they require eight bytes in the row data plus any stream overhead.

  • each nullable column requires one extra bit.

  • bits are packed into bytes in the row storage.

Indexes

Indexes are implemented using STORE B+trees. Note in particular that indexes use fixed length keys, so that indexes on longer text fields can consume significant space.

If the key for the index is k bytes, the number of rows to index is n , the index page size is P , and the B-tree packing density is r :

a = [(P-8)/(k+4)] * r

Where [x] is the largest integer <= x . Then the number of pages required, N , is

N = n * (1/a + 1/(a*a))

Each page requires P+7 bytes in the store, so the total indexing overhead, S , is

S = N * (P+7)

For DBMS P=512 and r=0.86 .