How the SQLite Database Works
How the SQLite Database Works
Let's discuss some architectural details of database implementation by using an early version of SQLite.
Join the DZone community and get the full member experience.Join For Free
Built by the engineers behind Netezza and the technology behind Amazon Redshift, AnzoGraph™ is a native, Massively Parallel Processing (MPP) distributed Graph OLAP (GOLAP) database that executes queries more than 100x faster than other vendors.
A database is an essential part of building a software system, which used to store and read data efficiently. Here, we are going to discuss some architectural details of database implementation by using an early version of SQLite.
SQLite is a small database application that is used in millions of software and devices. SQLite was invented by D.Richard Hipp in August 2000. SQLite is a high performance, lightweight relational database. If you are willing to learn the internals of a database at a coding level, then SQLite is the best open-source database available out there with highly readable source code with lots of documentation. Reading later versions of SQLite becomes a little harder since it contains lots of new features. In order to understand the basic implementation of database internals, you should have good knowledge about data structures, some knowledge about the Theory of Computing, and how an operating system works.
Here, we will look into the SQLite 2.5.0 version. You can find out a simple implementation of the SQLite backend on GitHub. Also, I have found this repository which contains SQLite 2.5.0 implementation for code reading.
Keeping data in a flat file is not that efficient to read and store the data. Databases organize the data in proper order such that data reading and writing make much faster. Data can be structured, semi-structured or unstructured. Databases are mainly used to store structured and semi-structured data. Databases can be dived as follows based on the type of data structure going to store.
- Relational database: Commonly used database type which has a table structure. Tables can have relations with other tables. SQL language used to manipulate data on this type of database.
- Key-value database: data stored along with a key associated with it. Data can be retrieved back with the given key. In-memory databases are commonly found as this type of database.
- Object database: Data structure is more like an object rather than a table.
- Graph database: Graph database is a collection of node and edges mostly used in data mining and social media applications.
SQLite Database Architecture
SQLite database architecture split into two different sections named as core and backend. Core section contains Interface, Tokenizer, Parser, Code generator, and the virtual machine, which create an execution order for database transactions. Backend contains B-tree, Pager and OS interface to access the file system. Tokenizer, Parser and code generator altogether named as the compiler which generates a set of opcodes that runs on a virtual machine.
Where Do I Start?
To understand the architecture of a database you need to have the following prerequisites.
- Good understanding of data structures and algorithm. Especially data structures such as B-tree, Linked list, Hashmaps, etc.
- Some understanding of computer architecture. How to read-write into a disk, how paging and caching works.
- Theoretical computers such as Finite Automata, Context Free Grammer and some Regular Expression knowledge.
VFS (Virtual File System)
File access on Unix and Windows are different from each other. VFS provides common API to access files without considering the type of the operating system it runs on. This API includes functions to open, read, write and close a file. Here follows are some of the API used in VFS to read, write data into a file.
/* Create a connection to file to read write zFilename : file name id : file pointer pReadonly : read or write */ int sqliteOsOpenReadWrite(const char *zFilename, OsFile *id,int *pReadonly);
/* Acqure the lock to read file. This function should be called before caling to any file read function. Return 0 if success id : file pointer */ int sqliteOsReadLock(OsFile *id);
/* Get the write lock to write into a file. This function should called before doing any write operation into the file system. Return 0 if success id : file pointer */ int sqliteOsWriteLock(OsFile *id);
/* Move to the given number of offest to read or write into the file */ int sqliteOsSeek(OsFile *id, int offset);
/* Read amt bytes from the file with offset pointed by sqliteOsSeek */ int sqliteOsRead(OsFile *id, void *pBuf, int amt);
/* Write amt bytes from the pBuf into the file */ int sqliteOsWrite(OsFile *id, const void *pBuf, int amt);
Here you can start using a file by using sqliteOpenReadWrite function. This function gives you a pointer to a file which can be used to read or write data into it. Next, the lock should be acquired before doing any read-write operations. If it is only a read operation then read lock should be acquired. Write lock should be acquired for both read and write transactions.
Then read and write can be done by seeking the location by providing the offset of the file into the sqliteOsSeek function. Offset is number of byte from the starting point of the file to the place the data should write or read.
Pages are the smallest unit of a transaction of the database on the file system. When a database needs to read data from a file, it requests it as a page. Once the page is loaded into the database engine, it can store the page if it accesses frequently on its cache. Pages are numbered and start from one. The first page is called the root page. The size of a page is constant.
/* Open pager with the given file name */ int sqlitepager_open(Pager **ppPager,const char *zFilename,int nPage,int nEx);
/* Get page specified by the page number */ int sqlitepager_get(Pager *pPager, Pgno pgno, void **ppPage);
/* Start to write data into a page specified in pData */ int sqlitepager_write(void *pData);
/* Commit page changes into the file */ int sqlitepager_commit(Pager*);
/* Close the connection to the file */ int sqlitepager_close(Pager *pPager);
Btree is a data structure that used to store data as a tree based on its value. The simplest form of the BTree is the Binary tree. Databases use Btree data structure to store indexes to improve the performance of the database. Each node of Btree contains a list of keys which are used to indexing. Btree indexes can be created for each of the columns on a table. Each of the Btree has root page, which is the starting point for any of the Btree search.
To point a row on a Btree, special pointer used called "Cursor". The cursor used to point a record given with page id and offset called idx. SQLite store database schema on a table which is known as "master_table". master_table always stored on the first page of the database.
/* Open file connection to a page file name specified by zFileName with nCache size cache */ int sqliteBtreeOpen(const char *zFilename, int mode, int nCache, Btree **ppBtree)
/* Start transaction. This function should called before any btree modification operations */ int sqliteBtreeBeginTrans(Btree *pBt)
/* Insert key pKey with nKey byte and value pData with nData byte put into the Btree */ int sqliteBtreeInsert(BtCursor *pCur, const void *pKey, int nKey, const void *pData, int nData)
/* Write data into the file */ int sqliteBtreeCommit(Btree *pBt)
/* Move cursor to the matching pKey with nKey bytes */ int sqliteBtreeMoveto(BtCursor *pCur, const void *pKey, int nKey, int *pRes)
VDBE(Virtual Database Engine)
VDBE is a virtual machine that runs a set of operations, which was generated by the Code Generator. All SQL commands including insert, delete, update, select converted into a set of opcodes and then runs it on a virtual machine. Each opcode contains three input named as p1, p2, and p3. You can think this input as input for a function.
Below I added sample execution opcodes stack for following SQL select statement. PC is the instruction id for program counter. The most interesting thing in SQLite for me is I can see the set of VBDE opcode instruction for a given SQL code by just appending "explain" keyword at the beginning of SQL query.
explain select * from foo;
|1||ColumnCount||1||0||Set column count to 1|
|2||ColumnName||0||0||value||Set column name as "value"|
Open cursor and point it to the third page which is the root page for foo table(p3 is not important
|4||VerifyCookies||46||0||Make sure the schema was not changed|
|5||Rewind||0||11||Move cursor to the first entry|
|6||Column||0||0||Read data from Btree payload and put it onto the stack|
Pop the top two elements from the stack. If they are not equal, then jump to instruction P2. Otherwise, continue to the next instruction.
Pop P1 values off the stack and form them into an array. This would be the result row for this SQL expression
Move cursor to next record, if data exit goto P2 else go to next line
|11||Close||0||0||Close the cursor|
Tokenizer, Parser, and Code Generator are together known as Compiler, which generates sets of opcode that runs on VBDE. Tokenizer generates a set of tokens by scanning SQL code. Then, it validates the syntax and generates the parse tree. Lemon parser used to parse the given SQL code by pre defined Context Free Grammer. Code Generator converts this parse tree into a mini program, written in SQLite opcodes.
SQLite is a simple, lightweight, high-performance, relational database that is widely used in software designs. Ealy versions of SQLite were written with simple architecture and highly readable code. Pager provides an abstraction layer to read-write data into the file system as fixed size blocks. While Btree provides a way to store data in the memory efficient way to access data faster. When SQL comes into the SQLite, it converts SQL into the SQLite machine code and runs it on VBDE. The result sends back to the user through the API.
Opinions expressed by DZone contributors are their own.