When we create an application we will almost certainly need a way to
store custom data. For small amounts of data we can use .ini
files or on Windows we can use the registry, but for larger amounts of
data (e.g., the application's notion of a “document”), we will need to
create a storage format. Such formats can be binary or plain text,
custom or predefined.
For example, the SVG image format is plain text (XML markup; UTF-8 encoding) and predefined, whereas the FLAC audio format is binary and predefined. In this article we will discuss custom formats, and show how to use SQLite as a custom binary format.
For custom data, using a plain text format is very appealing. Such formats are human readable and so tend to be much easier to debug than binary formats. And if we use an existing text format such as JSON or XML, libraries exist in almost every language for reading and writing them. For small documents, such formats work well, but for large amounts of data they can be quite slow to read and write, especially XML.
Creating a custom binary format can be tricky, especially when debugging, but can produce very compact documents which are very fast to read and write.
The main problem with custom plain text and binary formats is that it can be quite painful to change them. Suppose, for example, that version 2 of our application needs to store additional data. We need to be able to read our version 1 files and fill in the gaps with sensible defaults so that they can be saved as version 2 files. Over the years I've tried many different solutions both plain text and binary, but nowadays I use a one-size-fits-all solution: every custom document type I create is a SQLite database.
SQLite's “Lite” may make it sound flaky but in fact it is the most widely deployed database in the world (inside practically every smartphone), and its test suite at well over three billion automated tests provides 100% code coverage — so it is extremely reliable. SQLite is an in-process database which means that it exists inside the same operating system process as the application that uses it, so it is normally used as a single-user database. Each SQLite database is stored in a single file, no matter how many tables, views, and triggers it contains, so it is very easy to use as an application's document storage format.
Let's imagine that we want to store simple card index cards, each with a title and a body:
class IndexCard:
def __init__(self, title, body=None):
self.title = title
self.body = body
We could, of course, store them in memory, loading and saving them all in one go. But if we use SQLite, we only need to read in the cards we're interested in and can add, delete, or update them individually with very little memory overhead. If our data items were a lot bigger and more complex and used a lot more memory per-item, this could make a big difference to performance.
We'll hold our cards in a model that wraps a SQLite database and which uses the third-party APSW library rather than the standard library's rather weaker sqlite3 module.
VERSION = 1
class IndexCardModel:
def __init__(self, filename):
create = not os.path.exists(filename)
self.db = apsw.Connection(filename)
cursor = self.db.cursor()
cursor.execute(Sql.Prepare)
pragma = Sql.SetVersion.format(VERSION)
with Transaction(self.db) as cursor:
if create:
cursor.execute(Sql.Create)
else:
version = getfirst(Sql.GetVersion)
if version < VERSION:
pass # Update Database structure here
cursor.execute(pragma)
A model must be created with a filename which may be an existing
database of cards or a non-existent file that must be created. The
Transaction class is from our Automated
Cleanup with Python article. SQLite allows us to set and get a
user_version
from a database. We use it here to set our
database version when creating a new database and to check if an
existing database is up-to-date.
The Sql
namespace is one we have created like this:
Sql = types.SimpleNamespace()
Sql.GetVersion = "PRAGMA user_version;"
Sql.SetVersion = "PRAGMA user_version = {};"
Sql.Prepare = """PRAGMA foreign_keys = 1;
PRAGMA temp_store = MEMORY;"""
Sql.Create = """CREATE TABLE cards (
cid INTEGER PRIMARY KEY NOT NULL,
title TEXT NOT NULL, body TEXT);"""
Sql.Add = """INSERT INTO cards (title, body)
VALUES (:title, :body);
SELECT LAST_INSERT_ROWID();"""
Sql.Insert = """INSERT INTO cards (cid, title, body)
VALUES (:cid, :title, :body);"""
Sql.Update = """UPDATE cards SET title = :title, body = :body
WHERE cid = :cid;"""
Sql.Delete = "DELETE FROM cards WHERE cid = :cid;"
Sql.Get = "SELECT title, body FROM cards WHERE cid = :cid;"
Sql.Select = """SELECT cid, title, body FROM cards
ORDER BY LOWER(title);"""
Sql.Count = "SELECT COUNT(*) FROM cards;"
Sql.Exists = "SELECT COUNT(*) FROM cards WHERE cid = :cid;"
In this particular case we could have used a
collections.namedtuple
since the values aren't changed, but using a
types.SimpleNamespace
makes
for an easier syntax in this case.
def getfirst(cursor, sql, params=None, Class=int):
params = {} if params is None else params
first = cursor.execute(sql, params).fetchone()[0]
return bool(int(first)) if isinstance(Class, bool) else Class(first)
This is a helper function for extracting the first field from the first
record produced by a query and returning it with the correct type. We
have to special case Booleans because bool("0")
is
True
(because it is a
non-empty string), but bool(int("0"))
is False
since
bool(0)
is False
.
def close(self):
self.db.close()
self.db = None
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.db.close()
These methods allow a model instance to be used in a with
statement, or
in a try
... finally
block with an explicit call to
close()
to guarantee
finalization.
def add(self, card): # cid = model.add(card)
with Transaction(self.db) as cursor:
return getfirst(cursor, Sql.Add, vars(card))
This method adds a new card and returns its cid
(Card ID). Python's
built-in vars()
function returns a key-value dictionary of the object it
is given, i.e., a copy of the object's __dict__
, if it has one. So for
IndexCards
it returns a {"title": "card title", "body":
"card body"}
dict
. APSW's cursor.execute()
method takes a SQL query string and
optionally a dict
where the values are substituted for the corresponding
keys in the query. The Sql.Add
query has two separate statements, the
second returning the newly inserted card's cid
.
def __getitem__(self, cid): # card = model[cid]
with Transaction(self.db) as cursor:
row = cursor.execute(Sql.Get, dict(cid=cid)).fetchone()
return IndexCard(*row) if row is not None else None
This special method allows us to retrieve a card by its cid
. We've
chosen to fail safe by returning None
for an invalid
cid
, but perhaps a
better alternative would be to raise a custom InvalidCidError
exception.
def __delitem__(self, cid): # del model[cid]
with Transaction(self.db) as cursor:
cursor.execute(Sql.Delete, dict(cid=cid))
This method allows a card identified by its cid to be deleted. Strictly speaking we don't need a transaction for this method or the previous one since they are both single statements, but using a transaction is safer since if the SQL was changed to be two or more statements they would still work correctly.
def __setitem__(self, cid, card): # model[cid] = card
d = vars(card)
d["cid"] = cid
with Transaction(self.db) as cursor:
exists = getfirst(cursor, Sql.Exists, d, bool)
sql = Sql.Update if exists else Sql.Insert
cursor.execute(sql, d)
The transaction is needed here because we do two separate statements both of which must succeed or fail. The first statment is used to tell us if we're updating an existing card or replacing (really, inserting) with a new card, and the second to perform the update or insert.
def __iter__(self): # for cid, card in model:
with Transaction(self.db) as cursor:
for row in cursor.execute(Sql.Select):
cid = row[0]
card = IndexCard(row[1], row[2])
yield cid, card
Users of our model can iterate over all the cards using this method.
If we wanted to allow a constraint (e.g., all cards with an empty body
or all cards with a title containing a specific word), we would have to
provide a separate method, perhaps with the signature iterate(self,
where="", orderby="ORDER BY LOWER(title)")
.
def __len__(self): # size = len(model)
with Transaction(self.db) as cursor:
return getfirst(cursor, Sql.Count)
This method returns the number of cards in the database, and completes our implementation.
You might like to consider raising a custom InvalidCidError
if
a non-existent cid
is passed to __getitem__()
,
__delitem__()
, or __setitem__()
. And more
interestingly, you might like to add support for undo/redo, although
doing so is quite a bit of work.
For those who don't much like writing raw SQL, an excellent alternative is to use an Object-Relational Mapper (ORM) such as SQLAlchemy or sqlobject.org's SQLObject. These make it possible to create Python classes such that a class maps to a database table and an instance to a database record with all the underlying SQL handled by the ORM.
For more see Python Programming Tips
Your Privacy • Copyright © 2006 Qtrac Ltd. All Rights Reserved.