In this article we'll see what generators are, why Python 3 favours
them, and how to create our own. Incidentally, Python 2 methods that
returned sequences (e.g., dict.keys()
), generally returned
lists, while in Python 3 they return generators. Let's start with a
couple of Python 3 examples:
N = 1000000 # 1_000_000 allowed in Python 3.6
lst = list(range(N))
gen = range(N)
print(sys.getsizeof(lst)) # 9000112
print(sys.getsizeof(gen)) # 48
So the list occupies almost 9MB while the generator consumes a mere 48
bytes! Clearly generators can save a lot of memory. And it gets better.
If we change N
to, say, 20 million, the list will occupy 180MB,
while the generator still consumes just 48 bytes.
A list stores object references, so if we want to store a million numbers, the list will contain a million objects. But a generator stores a tiny bit of code—the code needed to generate each item on demand. When I timed the code with twenty million numbers, it took 0.36 seconds to create the list, and 1.1 seconds to iterate over it using a for loop, for a total of 1.4 seconds. For the generator it took no time to create (well, less than 0.000004 seconds), and 1.3 seconds to iterate.
For lists, we pay an up-front cost in processing and memory to create
the list, and get fast iteration. For generators there's effectively no
setup cost and no memory overhead, but there is a tiny processing cost
per item we iterate over. In practice, we normally do some per-item
processing which is usually so much more expensive than the tiny cost of
generating each item, that generators are just as fast in practical use
as lists. For these reasons, Python 3 always provides generators rather
than lists when sequences of items are required. And, of course, we can
always get a list (or tuple) simply by wrapping the generator in
list()
or tuple()
—as shown in the example.
Like lists, generators have a literal syntax, for example:
odd_lst = [x for x in range(1000) if x % 2]
odd_gen = (x for x in range(1000) if x % 2)
The odd_lst
list's size varies proportionally (e.g., 528 bytes
for odd numbers less than 100, 4272 bytes for less than 1000, and so
on), while the generator is a fixed size of 72 bytes. This generator is
slightly bigger than the previous one because the generating code is a
bit more complex.
Using the built-in range()
function, or the literal syntax with
parentheses is straightforward and useful. Furthermore, many of Python's
built in types return a generator rather than a list: for example,
dict.keys()
, dict.values()
, and dict.items()
.
One other distinction between lists and generators is worth pointing
out: lists are always either empty or of finite length, but a generator
may be of any length—even infinite. Naturally, we would like to be
able to make our own generators when appropriate. Let's start with a
simple generator function that returns odd numbers:
def odds():
start = 1
while True:
yield start
start += 2
Although tiny, this function is quite sophisticated. When it is called
it creates a local variable called start
with value 1. It then
immediately begins an infinite loop whose first statement is yield
start
. When a yield statement is reached, the function returns the
value of the yield statement (or None
if there isn't a value
given), and then suspends execution of the function. This means that all
the function's state (in this case the value of start
), is
stored. The second time the function is called it resumes from the
statement following the yield
, so in this case start
is incremented to value 3, then the loop continues, and again the
yield
is reached, so this time 3 is returned (yielded), and the
state of start
is stored as 3 and the function is again
suspended. Here's how we can use the function:
for x in odds():
print(x, end=" ")
if x > 20:
break
print()
# 1 3 5 7 9 11 13 15 17 19 21
The odds()
generator function contains an infinite loop, so we
must only ever call it a finite number of times or else it will never
stop. Here's an improved version:
def odds(start=1, end=None):
while True:
if end is not None and start >= end:
break
yield start
start += 2
We can call this with optional start and end values. If we give neither
it will behave like the original version producing 1, 3, 5, .... If we
give a start and end, or just an end (e.g., odds(end=40)
), then
we will get a finite generator:
for x in odds(20, 40):
print(x, end=" ")
print()
# 20 22 24 26 28 30 32 34 36 38
The two odds()
functions are sufficient to show how to create
generator functions. But what if we want to use a generator in a class?
We might, for instance, have a class that contains multiple items for
which we want to provide an iterator (like dict.keys()
). Here
is a simple Polygon class whose instances store a sequence of integers
representing pairs of (x, y) coordinates:
class Polygon:
def __init__(self, *points):
self._points = array.array("i", points)
def x(self, index=0):
return self._points[index * 2]
def setX(self, index, value):
self._points[index * 2] = value
def y(self, index=0):
return self._points[1 + (index * 2)]
def setY(self, index, value):
self._points[1 + (index * 2)] = value
This class uses the array module's array class to store the points in a
very compact form (e.g., 56 bytes for 20 points vs. 472 bytes for a list
of 40 integers), but at the price of some tiny processing overhead to
get and set points. The *points
syntax means that the
constructor will accept zero or more arguments, so if none are given the
polygon will be created with an empty array. What if we want to iterate
over all the points in a polygon? We could easily add a
points()
method:
def points(self):
points = []
x = None
for coord in self._points:
if x is None:
x = coord
else:
points.append((x, coord))
x = None
return points
The points()
method can be used as follows:
polygon = Polygon(range(40))
for point in polygon.points():
print(point)
# (0, 1)\n(2, 3) etc.
for x, y in polygon.points():
print(x, y)
# 0 1\n2 3 etc.
The points()
method has three disadvantages. First, users of
our Polygon
class must learn it, when really they expect to be
able to iterate the same as with any other class, e.g., for point in
polygon. Second, the class creates a (possibly big) list of integers
rather than returning each pair on demand. Third, although correct, the
code is inelegant.
Integrating our Polygon
class so that it supports iteration is
really easy. We simply rename the points()
method to
__iter__()
, and make it return an iterator—and while
we're at it, we'll improve the code too:
def __iter__(self):
points = []
for x, y in zip(self._points[::2], self._points[1::2]):
points.append((x, y))
return iter(points)
Python's built-in zip()
function takes two or more iterables
and returns a two-(or more)-tuple of the first items, then the second
items, and so on, until one of the iterators runs out of items. Here,
we've used slicing syntax [::2]
which means step from the first
(zeroth) to the last in steps of two (i.e.,
[0:len(_self.points):2]
), and [1::2]
which means step
from the second to the last in steps of two (i.e.,
[1:len(self_points):2]
). Once we have this code in place we can
write for point in polygon:
or for x, y in polygon:
.
Here's our final refinement:
def __iter__(self):
for x, y in zip(self._points[::2], self._points[1::2]):
yield (x, y)
This version of the method doesn't store any data at all, simply
yielding each x, y coordinate pair in turn. The built-in zip()
function is also a generator, so that doesn't create any overhead
either. And this new __iter__()
can be used with exactly the
same for point in polygon:
or for x, y in polygon:
syntax as before. And if we ever needed a tuple or list of a polygon's
points we would simply write tuple(polygon)
or
list(polygon)
and under the hood Python would automatically use
the Polygon.__iter__()
method.
In general, it is better to provide functions or methods that return a generator than ones which return lists. Generators consume hardly any memory, and users of our functions or methods have the choice of using the generator as-is, or if they really need a tuple or list, forcing the generator to generate all its values.
For more see Python Programming Tips
Your Privacy • Copyright © 2006 Qtrac Ltd. All Rights Reserved.