Automated Cleanup with Python

First published in Micro Mart #1420, June 2016

One of the most useful programming patterns is to acquire some resource—a file handle, a lock, or a database or network connection—do some work with it, and then release it. In C++ this is easily achieved using constructors and destructors—but Python (like Java) doesn't guarantee that a destructor is ever called. Fortunately, Python provides a nice solution: context managers.

Let's begin by seeing how we can read in a text file without using a context manager:

file = None
lines = None
try:
    file = open(filename, "rt", encoding="utf-8")
    lines = file.readlines()
finally:
    if file is not None:
        file.close()

When this code finishes we are guaranteed that the file has been closed (if it was opened in the first place), even in the face of exceptions. If an exception did occur, it would be raised after the file was closed. Here's how to write the code in modern Python:

lines = None
with open(filename, "rt", encoding="utf-8") as file:
    lines = file.readlines()

These three lines do exactly the same thing as the first example. This works because Python's open() function returns a file object and file objects support the context manager protocol. So the middle line says: call the open() function and use its return value as a context manager—and name this return value file so that it can be accessed within the scope of the context manager. The scope is any code indented under the with statement.
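To check the claim that the file is closed however the block is left, here is a small sketch. The temporary directory, the file name demo.txt, and the simulated ValueError are assumptions made purely for illustration.

```python
import os
import tempfile

# A throwaway directory keeps the demonstration self-contained.
with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "demo.txt")  # hypothetical file name

    with open(filename, "wt", encoding="utf-8") as file:
        file.write("alpha\nbeta\n")
    closed_after_write = file.closed  # the name survives the block; the file is closed

    lines = None
    try:
        with open(filename, "rt", encoding="utf-8") as file:
            lines = file.readlines()
            raise ValueError("simulated failure")  # leave the block abnormally
    except ValueError:
        pass
    closed_after_error = file.closed  # closed even though an exception was raised
```

Both flags end up True: the file object is closed on a normal exit and on an exceptional one, which is what the try ... finally version achieves by hand.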

The with context_manager as syntax is so useful that in almost every new Python release more Python objects in the standard library are turned into context managers. For example, the standard library's shelve, subprocess, tempfile, threading, unittest, and zipfile modules all provide functions that return context managers, and there are many others.
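As a brief illustration, the following sketch uses three of the standard library's context managers together: threading.Lock, tempfile.TemporaryDirectory, and zipfile.ZipFile. The archive and member names are made up for the example.

```python
import os
import tempfile
import threading
import zipfile

lock = threading.Lock()
with lock:  # acquired on entry, released on exit
    held_inside = lock.locked()
held_after = lock.locked()

with tempfile.TemporaryDirectory() as tmpdir:  # directory is deleted on exit
    archive = os.path.join(tmpdir, "demo.zip")  # hypothetical archive name
    with zipfile.ZipFile(archive, "w") as zf:  # archive is closed on exit
        zf.writestr("hello.txt", "hello")
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
dir_exists_after = os.path.exists(tmpdir)
```

In each case the cleanup (releasing the lock, closing the archive, deleting the directory) happens when the with block is left, without any explicit call.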

One of Python's nicest features is that it allows us to fully integrate our own classes and functions so that they work just like built-ins, and this extends to context managers. To make a class whose instances are context managers (i.e., to make it support the context manager protocol), the class must have two special methods: __enter__() and __exit__(), both with specific signatures. (A special method is one that we write but never explicitly call: calls are made by Python itself in response to the use of particular syntax.) Here's an example used to ensure that a SQLite database transaction will be committed—or rolled back in the face of an exception:

class Transaction:

    def __init__(self, db):
        self.cursor = db.cursor()

    def __enter__(self): # Start transaction
        self.cursor.execute("BEGIN;")
        return self.cursor

    def __exit__(self, exc_type, exc_val, exc_tb):
        # End transaction
        if exc_type is None:
            self.cursor.execute("COMMIT;")
        else: # Exception will be raised
            self.cursor.execute("ROLLBACK;")

Here's how we could use the Transaction class:

import apsw # third-party module; filename and pid are assumed to be defined

db = None
try:
    db = apsw.Connection(filename)
    with Transaction(db) as cursor:
        cursor.execute("DELETE FROM sales WHERE pid = ?", (pid,))
        cursor.execute("DELETE FROM products WHERE pid = ?", (pid,))
finally:
    if db is not None:
        db.close()

The apsw module provides a comprehensive interface to the SQLite database, but it is not in the standard library, so it must be downloaded separately. We could always use the standard (but less functional) sqlite3 module instead.
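For readers without apsw, here is a sketch of the same idea using the standard sqlite3 module. It is not the article's original code. It assumes an in-memory database with made-up sales and products tables, and it opens the connection with isolation_level=None so that sqlite3 stays in autocommit mode and our explicit BEGIN, COMMIT, and ROLLBACK statements are the only transaction control.

```python
import sqlite3


class Transaction:
    """Same shape as the Transaction class above."""

    def __init__(self, db):
        self.cursor = db.cursor()

    def __enter__(self):
        self.cursor.execute("BEGIN;")
        return self.cursor

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is None:
            self.cursor.execute("COMMIT;")
        else:  # the exception propagates after the rollback
            self.cursor.execute("ROLLBACK;")


db = None
try:
    # Assumption: autocommit mode, so sqlite3 never opens a transaction itself.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute("CREATE TABLE products (pid INTEGER PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE sales (sid INTEGER PRIMARY KEY, pid INTEGER)")
    db.execute("INSERT INTO products VALUES (1, 'widget'), (2, 'gadget')")
    db.execute("INSERT INTO sales (pid) VALUES (1), (1), (2)")

    with Transaction(db) as cursor:  # succeeds, so it is committed
        cursor.execute("DELETE FROM sales WHERE pid = ?", (1,))
        cursor.execute("DELETE FROM products WHERE pid = ?", (1,))

    try:
        with Transaction(db) as cursor:  # fails halfway, so it is rolled back
            cursor.execute("DELETE FROM sales WHERE pid = ?", (2,))
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass

    remaining_products = db.execute(
        "SELECT pid FROM products ORDER BY pid").fetchall()
    remaining_sales = db.execute(
        "SELECT pid FROM sales ORDER BY pid").fetchall()
finally:
    if db is not None:
        db.close()
```

After this runs, product 1 and its sales are gone, while product 2 and its sale survive because the second transaction was rolled back.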

When the Transaction(db) call is encountered, a new Transaction object is created (i.e., its __init__() method is called), which in turn creates a database cursor. The returned Transaction object is assumed to be a context manager (since it is in a with statement), so its __enter__() method is immediately called, which in this case begins a SQLite transaction. The __enter__() method's return value is assigned to the variable that follows the as, in this case cursor. When the code leaves the context of the with statement (i.e., after the attempt to delete a product), or if an uncaught exception occurs within the with statement, the context manager's __exit__() method is called. In this case, if there was no exception we commit the transaction to the database and both deletions take place; otherwise we roll back and nothing is deleted. In either case we are guaranteed to preserve the integrity of our database.
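The order of these calls can be made visible with a small tracing class. The Traced class and its events list are hypothetical names invented for this sketch; they record each step so the sequence can be inspected afterwards.

```python
class Traced:
    """Hypothetical context manager that records each protocol step."""

    def __init__(self, name, events):
        self.name = name
        self.events = events
        self.events.append("init")

    def __enter__(self):
        self.events.append("enter")
        return self.name.upper()  # this becomes the variable after "as"

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is None:
            self.events.append("exit: no exception")
        else:
            self.events.append("exit: " + exc_type.__name__)
        # Returning None (falsy) lets any exception propagate.


events = []

with Traced("demo", events) as value:
    events.append("body sees " + value)

try:
    with Traced("demo", events) as value:
        raise KeyError("simulated failure")
except KeyError:
    events.append("caught outside")
```

The events list shows __init__() running first, then __enter__(), then the body, then __exit__(). In the second block __exit__() receives the exception's type before the exception reaches the enclosing try.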

It is also possible to create an atomic context manager class which can ensure that a sequence of actions on a mutable data structure such as a dict, list, or set either all happen—or don't happen at all. For an example, see Programming in Python 3.
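One way such a class might look is sketched below. This is not the book's implementation: the Atomic name and the copy-then-commit approach are assumptions made for illustration. The body works on a deep copy, and the original is only updated if no exception occurs.

```python
import copy


class Atomic:
    """Hypothetical sketch: apply all changes to a list, dict, or set, or none."""

    def __init__(self, container):
        self.original = container

    def __enter__(self):
        self.working = copy.deepcopy(self.original)
        return self.working  # the body mutates the copy, not the original

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is None:  # commit: copy the changes back in place
            if isinstance(self.original, list):
                self.original[:] = self.working
            else:  # dict and set both provide clear() and update()
                self.original.clear()
                self.original.update(self.working)
        # On an exception the copy is simply discarded.


items = [1, 2, 3]
with Atomic(items) as working:
    working.append(4)
after_success = list(items)

try:
    with Atomic(items) as working:
        working.append(5)
        raise ValueError("simulated failure")
except ValueError:
    pass
after_failure = list(items)
```

The deep copy makes this simple but potentially expensive for large structures, so it is best seen as a starting point rather than a general solution.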

The context manager protocol can also be used to monitor, rather than to manage, state. Here's a useful example that can be used to time little bits of code:

import time

class Timer:

    def __init__(self, message, minSecs=None):
        self.message = message
        self.minSecs = minSecs
        self.monotime = 0

    def __enter__(self):
        self.monotime = time.monotonic()

    def __exit__(self, exc_type, exc_val, exc_tb):
        elapsed = time.monotonic() - self.monotime
        if self.minSecs is None or elapsed > self.minSecs:
            print("{} {:.3f} sec".format(self.message, elapsed))

The time.monotonic() function returns a reference time in seconds (as a float) that never goes backwards, so it is unaffected by changes to the system clock (for example, if the program is running when the clocks go back). The Timer class can be used as follows:

with Timer("slow function"):
    slowFunction(args)

This will print how long slowFunction() took to run. We can provide a second argument, e.g., with Timer("slow function", 2), that means the time is only printed if the duration exceeds the second argument's number of seconds. And, of course, we could time multiple statements by including them all within a single with Timer statement.

The Timer class requires us to remember (or copy and paste) the rather complex signature of the __exit__() method. Fortunately, the standard library's contextlib module provides a simpler and shorter way to create context managers:

import contextlib
import time

@contextlib.contextmanager
def timer(message, minSecs=None):
    monotime = time.monotonic() # here we __enter__()
    yield # The body of the with statement executes here
    # here we __exit__(); not reached if the body raised an exception
    elapsed = time.monotonic() - monotime
    if minSecs is None or elapsed > minSecs:
        print("{} {:.3f} sec".format(message, elapsed))

The timer() function can be used just like the Timer class:

with timer("slow function"):
    slowFunction(args)

If we need to supply a value for the variable after the as, we can provide it as the yield statement's argument. However, if the cleanup code must run even when the body of the with statement raises an exception, we have to wrap the yield in a try/except/finally construct, which makes this approach less convenient than using a class. Personally, I always create a class.
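Here is a sketch of the generator form with both features: a yielded value and a try ... finally around the yield. The timed() function and its record dictionary are hypothetical names chosen for this example.

```python
import contextlib
import time


@contextlib.contextmanager
def timed(label):
    """Hypothetical variant that yields a record and always fills it in."""
    record = {"label": label, "elapsed": None}
    start = time.monotonic()
    try:
        yield record  # this is the value bound by "as"
    finally:
        # Runs whether the body finished normally or raised an exception.
        record["elapsed"] = time.monotonic() - start


with timed("quick") as quick:
    total = sum(range(1000))

try:
    with timed("failing") as failing:
        raise ValueError("simulated failure")
except ValueError:
    pass
```

In both cases the elapsed time is recorded; without the try ... finally, the code after the yield would be skipped when the body raises.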

The contextlib module provides some useful generic context managers, as we will see in a moment. First though, let's see how to delete a file that may or may not exist:

try:
    os.remove(filename)
except FileNotFoundError:
    pass

A nicer alternative is to use contextlib's suppress() context manager, which throws away the specific exception it is given if that exception is raised but lets any other exception through:

with contextlib.suppress(FileNotFoundError):
    os.remove(filename)

Two other really useful context managers are contextlib.redirect_stdout() and contextlib.redirect_stderr(). These are especially helpful in unit tests since they make it easy to capture output that would normally be written to the console to check that it matches the output we expect. For example, suppose we want to test a function which prints odd numbers given a list of numbers:

import contextlib
import io

def print_odd_numbers(data):
    # Join only the odd values so there is never a stray trailing space,
    # whether or not the last item in data is odd.
    print(" ".join(str(datum) for datum in data if datum % 2))

expected = "1 3 5 7 9 11 13 15 17 19\n"
out = io.StringIO()
with contextlib.redirect_stdout(out):
    print_odd_numbers(range(20))
actual = out.getvalue()
assert actual == expected

We start by creating the string we expect to be output. Then we create an io.StringIO object called out. This object can be treated like a file opened for reading and writing text, so we can call out.write(str) or print(str, file=out). But the print_odd_numbers() function doesn't know anything about our out object; it simply calls print(), which writes to sys.stdout. Fortunately, we can overcome this problem by using the context manager to temporarily redirect any output sent to sys.stdout to our out object. Once we leave the context of the with statement, sys.stdout is automatically restored, and we can obtain anything written within the context manager by calling the io.StringIO.getvalue() method.
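The same technique works for sys.stderr. In the sketch below, warn() is a hypothetical function that writes to standard error, and the checks confirm that the redirection applies only inside the with block.

```python
import contextlib
import io
import sys


def warn(message):
    """Hypothetical function that reports problems on standard error."""
    print("warning: " + message, file=sys.stderr)


original_stderr = sys.stderr
err = io.StringIO()
with contextlib.redirect_stderr(err):
    warn("disk nearly full")
    redirected_inside = sys.stderr is err
restored_after = sys.stderr is original_stderr
captured = err.getvalue()
```

Because warn() looks up sys.stderr each time it is called, it picks up the temporary replacement without needing to know about it.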

Context managers are used throughout the Python 3 standard library—and their use is growing. Furthermore, as we've seen, we can easily create our own custom context managers. The contextlib documentation provides examples and links to further information. In addition, the documentation covers the closing() context manager which can be used for any Python object that has a close() method but which isn't itself a context manager, and the ExitStack class which can be used to handle a whole bunch of context managers in one go.
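As a taste of these two, here is a minimal sketch. The Resource class is hypothetical: it stands in for any object that has a close() method but does not itself support the context manager protocol.

```python
import contextlib


class Resource:
    """Hypothetical object with close() but no __enter__()/__exit__()."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def close(self):
        self.log.append("closed " + self.name)


log = []

# closing() calls the object's close() method when the block is left.
with contextlib.closing(Resource("a", log)) as res:
    log.append("using " + res.name)

# ExitStack collects any number of context managers and exits them all,
# in reverse order of entry, when its own block is left.
with contextlib.ExitStack() as stack:
    resources = []
    for name in ("b", "c", "d"):
        resources.append(
            stack.enter_context(contextlib.closing(Resource(name, log))))
    log.append("using b, c, d")
```

ExitStack is particularly handy when the number of resources is only known at run time, as with the loop here.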

For more see Python Programming Tips
