Python Sorted Collections

A short plea for adding sorted collections to Python's standard library

Python's “batteries included” standard library seems to have a battery missing. And the argument that “we never had it before” has worn thin. It is time that Python offered a full range of collection classes out of the box, including sorted ones.

Many long-term Python programmers are used to holding unsorted collections and sorting them only when needed. This is a perfectly sensible approach that works well for many use cases. Furthermore, Python's sort algorithm is highly optimized and works especially well for partially sorted collections, so this approach is usually very fast.

In fact, some developers seem so used to sorting when needed that they can be dismissive of the value of intrinsically sorted collections. Yet most mainstream languages (such as Java and C++) provide both sorted and unsorted collections and leave it to their users to decide which is the appropriate kind of collection to use. Yet Python, for most of its history—and despite claiming to have a “batteries included” standard library—has denied its users this choice (at least, out of the box).

Back in 2012, the release of Python 3.1 at last included an intrinsically-sorted collection: the collections.OrderedDict class. This provides an insertion-ordered dictionary, and is extremely useful. I, personally use it myself, and I'm seeing more and more uses of it in other people's code. The most recent documentation shows newly added functionality and some interesting use cases.

Implementing a sorted list in Python is easy using the bisect module. And it isn't much more difficult to create a sorted list that accepts a key function. Since it is so easy, it might be argued that there is no point adding a sorted list to the standard library. But in practice, time after time developers create their own custom sorted list—a sure sign that such a class belongs in the standard library. There are, of course, many examples to choose from, but one of those that ought to be considered is the SortedContainers package.

Implementing an intrinsically sorted dictionary, i.e., a dictionary that is sorted using its keys' < operator, or implementing a key-function sorted dictionary isn't difficult—unless you want decent performance that is. I've implemented a red-black tree in pure Python and achieved absolutely dismal performance, so now I use my own dict subclass in conjunction with a custom sorted list. The fact is that Python's built-in dict is implemented in C and is extremely efficient, so it is very hard to compete with.

Yet, time after time, the need for an intrinsically sorted dictionary arises. And again, people keep coming up with their own solutions. Some use an in-memory SQLite database, others implement pure Python solutions (like the one in the SortedContainers package), and still others wrap the C++ map class (which is an intrisically sorted dictionary). Why do so many developers go to so much trouble to create these things? Can they all be wrong, should they all be using dict and sorting on demand? Or could it be that for their use cases what they need really is an intrisically sorted dictionary?

A look on PyPI soon reveals just how many different implementations there are.

It is my hope that the Python developers will implement—or adopt—an intrisically sorted list and dictionary. It could start out as pure Python (like the SortedContainers package), and perhaps over time will end up being backed by a C implemention as happened with the decimal module.


Book Cover

Top