One problem that frequently arises in practice is the need to take a string and return a copy of it with all whitespace (leading, trailing, and internal) removed.
Most languages support stripping whitespace from the ends out of the box; Python, for example, has the str.strip() method. But fewer languages directly support the complete whitespace removal required here.
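A quick check makes the difference concrete. str.strip() trims only the ends and leaves internal whitespace in place, whereas the functions below remove all of it. The sample string is purely illustrative.

```python
s = "  hello \t wide\nworld  "

# str.strip() removes only leading and trailing whitespace.
stripped = s.strip()
print(repr(stripped))  # 'hello \t wide\nworld'

# Removing *all* whitespace needs something more, e.g. a per-character filter.
removed = ''.join(c for c in s if not c.isspace())
print(repr(removed))  # 'hellowideworld'
```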
According to The Zen of Python, “There should be one—and preferably only one—obvious way to do it”. But that was written many years ago, and in modern Python, it is easy to solve this problem in more than half a dozen different ways.
This version shows how someone coming to Python from a procedural language might tackle the problem.
def nows_indexing(s):
    t = ''
    for i in range(len(s)):
        if not s[i].isspace():
            t += s[i]
    return t
Notice that every character needs one index lookup into the string, and every non-space character needs a second one. This approach also uses string concatenation (+=). In older Python versions this was very slow, but more recent versions seem to have put a lot of effort into optimizing it. Even so, in performance terms this weighs in at around 248 units of time on our test machine. That is about 10x slower than the best function we'll see.
Python newcomers usually learn how to iterate over collections and strings, so they might create a function like this one.
def nows_iadd(s):
    t = ''
    for c in s:
        if not c.isspace():
            t += c
    return t
In modern Python this isn't quite as bad as some of the other functions, typically taking around 177 units of time.
Here’s an approach that might be considered by real old-timers when they first learnt about functional-style programming:
def nows_filter(s):
    return ''.join(filter(lambda c: not c.isspace(), s))
The best that can be said about this is that it is short. Performance-wise it typically takes about 267 units of time and is the slowest of all the examples shown. The best we’ll see is more than 10x faster!
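A variation on the same functional theme avoids the lambda. It passes the unbound method str.isspace to itertools.filterfalse, which keeps only the items for which the predicate is false. We did not time this variant, and nows_filterfalse is a hypothetical name, so treat any speed-up as something to measure rather than assume.

```python
from itertools import filterfalse

def nows_filterfalse(s):  # hypothetical name; not part of the timed set
    # filterfalse() yields the characters for which str.isspace() is False,
    # i.e. every non-whitespace character; join() glues them back together.
    return ''.join(filterfalse(str.isspace, s))

print(nows_filterfalse(" a b\tc\n"))  # abc
```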
Many textbooks recommend doing this kind of thing by adding each wanted character to a list and then joining the list at the end.
def nows_list_append_join(s):
    t = []
    for c in s:
        if not c.isspace():
            t.append(c)
    return ''.join(t)
Many Python programmers would expect this to be faster than the string concatenation (+=) used in the two previous approaches, but in practice that isn't the case. This one usually takes about 190 units of time, which beats nows_indexing() but is slower than nows_iadd().
Modern Python programmers know how useful generators are. For example:
def nows_generator_join(s):
    return ''.join(c for c in s if not c.isspace())
But it is easy to forget that generators perform best when the processing done on each iteration costs a lot more than the (tiny) generator overhead. This isn't one of those cases, and typical performance is a dismal 181 units of time.
List comprehensions build lists in memory. This can be expensive compared with using generators—at least for large lists or where the creation of each element is expensive. But for small lists, list comprehensions can provide good performance.
def nows_list_comp_join(s):
    return ''.join([c for c in s if not c.isspace()])
This typically takes around 154 units of time, comprehensively beating the generator approach. Keep in mind that this is not a generalisable result, so for any given situation it is best to use the timeit module or similar to compare.
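A minimal timeit harness along these lines is one way to run such a comparison on your own data. The sample TEXT, the number/repeat counts, and the choice to report the fastest run are assumptions made for illustration. The article's "units of time" came from its own machine and test data, so absolute numbers will differ.

```python
import timeit

def nows_generator_join(s):
    return ''.join(c for c in s if not c.isspace())

def nows_list_comp_join(s):
    return ''.join([c for c in s if not c.isspace()])

def nows_split_join(s):
    return ''.join(s.split())

# Hypothetical test data: a modest string with mixed whitespace.
TEXT = "The quick\tbrown fox\njumps over  the lazy dog. " * 20

def compare(funcs, text, number=2_000, repeat=5):
    """Return {function name: best time in seconds for `number` calls}."""
    results = {}
    for func in funcs:
        # repeat() runs the timing loop several times; the minimum is the
        # measurement least disturbed by other activity on the machine.
        times = timeit.repeat(lambda: func(text), number=number, repeat=repeat)
        results[func.__name__] = min(times)
    return results

if __name__ == "__main__":
    funcs = [nows_generator_join, nows_list_comp_join, nows_split_join]
    timings = compare(funcs, TEXT)
    for name, secs in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{name:22} {secs:.4f}s")
```

Checking that every candidate returns the same result on the test data is worth doing before trusting any timings.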
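A simple regular expression can be used to match any amount of whitespace. The \s class already matches newlines, tabs, and (for str patterns) other Unicode whitespace, so neither an explicit \n nor the re.MULTILINE flag is needed. That flag only changes how ^ and $ match.

import re

# \s matches any whitespace character, including newlines; no flags required.
NOWS_RX = re.compile(r'\s+')

def nows_re_sub_ws(s):
    return NOWS_RX.sub('', s)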
This function is very simple (assuming we understand regular expressions), but has disappointing performance of around 110 units of time.
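A quick check confirms two things. \s needs no flag to match newlines and other whitespace, and re.MULTILINE only affects the ^ and $ anchors. The module-level re.sub() is used here for brevity; the sample strings are illustrative.

```python
import re

text = "a b\tc\nd\r\ne\u00a0f"

# No flags needed: \s matches space, tab, newline, carriage return,
# and (for str patterns) Unicode whitespace such as U+00A0.
cleaned = re.sub(r'\s+', '', text)
print(cleaned)  # abcdef

# re.MULTILINE changes only ^ and $, e.g. ^ matching at each line start.
with_flag = re.findall(r'^\w', "x1\ny2", re.MULTILINE)
without_flag = re.findall(r'^\w', "x1\ny2")
print(with_flag)     # ['x', 'y']
print(without_flag)  # ['x']
```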
Python's str class provides a static method called maketrans, which can create a "translation" table mapping characters to characters (or strings). It is also possible to map characters to None, which has the effect of deleting them. Once a translation table has been created, it can be used with the str.translate method.
NOWS_TABLE = str.maketrans({' ': None, '\n': None, '\t': None})

def nows_str_translate(s):
    return s.translate(NOWS_TABLE)
Of course, in this example we haven't actually deleted every possible whitespace character, just a few to show how it is done. As for performance, it takes a respectable 80 units of time. (Incidentally, this shoots up to over 130 units if we map to '' (the empty string) rather than None.)
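To delete every character that Python regards as whitespace, one option is to collect every code point for which str.isspace() is true. Those characters can then be passed as the third argument to str.maketrans(), which maps each of them to None. This is a sketch we have not timed, and ALL_WHITESPACE, NOWS_ALL_TABLE, and nows_str_translate_all are hypothetical names. Building the table scans all of Unicode once, so it belongs at module level rather than inside the function.

```python
import sys

# Every code point that str.isspace() considers whitespace (a small set).
ALL_WHITESPACE = ''.join(
    chr(cp) for cp in range(sys.maxunicode + 1) if chr(cp).isspace())

# The third maketrans() argument lists characters to map to None (delete).
NOWS_ALL_TABLE = str.maketrans('', '', ALL_WHITESPACE)

def nows_str_translate_all(s):  # hypothetical name
    return s.translate(NOWS_ALL_TABLE)

print(nows_str_translate_all(" a\tb\r\nc\u00a0d\u2003e"))  # abcde
```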
The standard Python interpreter, CPython, is written in C, which is amongst the fastest languages available. So it shouldn't be a surprise that offloading all the work to built-in methods implemented in C gives good performance.
def nows_split_join(s):
    return ''.join(s.split())
This is probably the simplest code of all the examples. Called with no arguments, the str.split() method splits the string at each run of whitespace. It returns a list of the non-whitespace substrings and discards the whitespace itself, so joining them gives the result we want. (It is also possible to split on a specified separator string, but splitting on whitespace is the default.) This typically takes a mere 24 units of time. That is less than a tenth of the time taken by a couple of the earlier functions, and far faster than any of the others.
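Looking at split()'s output directly shows what nows_split_join() is joining. With no argument, split() breaks the string at runs of any whitespace and drops empty strings. With an explicit separator, it splits on that exact string and keeps empty fields, which is why the no-argument form is the one wanted here. The sample string is illustrative.

```python
s = "  alpha\tbeta\n\ngamma  "

words = s.split()        # split on runs of any whitespace
fields = s.split(' ')    # split on single spaces only
print(words)             # ['alpha', 'beta', 'gamma']
print(fields)            # ['', '', 'alpha\tbeta\n\ngamma', '', '']
print(''.join(words))    # alphabetagamma
```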
On our test machine using our test data, the nows_split_join() function comfortably outperformed every other method we tried for removing all whitespace from strings. Of course, in our real code we just call it nows().
def nows(s):
    return ''.join(s.split())
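One way to confirm that nows() removes exactly the characters str.isspace() reports as whitespace (the test most of the other versions use) is a brute-force comparison over every Unicode code point. This is a sanity check we're suggesting, not part of the original benchmark; nows_reference is a hypothetical helper.

```python
import sys

def nows(s):
    return ''.join(s.split())

def nows_reference(s):  # hypothetical helper: the per-character definition
    return ''.join(c for c in s if not c.isspace())

# Put 'x' between every code point so each one is tested in context.
everything = 'x'.join(chr(cp) for cp in range(sys.maxunicode + 1))
assert nows(everything) == nows_reference(everything)
print("nows() agrees with str.isspace() on all", sys.maxunicode + 1, "code points")
```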
For more, see Python Programming Tips.
Copyright © 2006 Qtrac Ltd. All Rights Reserved.