Premature micro-optimisations that keep me awake at night

So it’s idiomatic Python to use containers rather than extended Boolean expressions or if/else or switch statements, e.g. this:

if key in ('foo', 'bar'): ...

Rather than:

if key == 'foo' or key == 'bar': ...

Or:

keys = { 'foo': 1, 'bar': 2 }
value = keys[key]

Rather than:

if key == 'foo': value = 1
elif key == 'bar': value = 2

But what’s the performance cost of the idiomatic approaches? If these statements were in a tight loop dealing with large amounts of data or 3D rendering, would it be better to avoid the idiomatic forms?

The long-winded Boolean expressions might cost more, since each comparison is evaluated in turn, but it would have to be a long chain of disjunctions before it cost more than instantiating a tuple. The same goes for the dict construction and lookup.

I assume that Python needs to evaluate and construct the container every time its definition is executed, but does CPython do any optimisation around that?

Any thoughts?

So it’s idiomatic Python to use containers rather than extended Boolean
expressions or if/else or switch statements, e.g. this:

if key in ('foo', 'bar'): ...

Rather than:

if key == 'foo' or key == 'bar': ...

I think we can usually agree that the former is more readable than the
latter?

Or:

keys = { 'foo': 1, 'bar': 2 }
value = keys[key]

I confess I write this one as:

 value = {
     'foo': 1,
     'bar': 2,
 }[key]

on the occasions I do use it.

Rather than:

if key == 'foo': value = 1
elif key == 'bar': value = 2

But what’s the performance cost of the idiomatic approaches? If
these statements were in a tight loop dealing with large amounts of
data or 3D rendering, would it be better to avoid the idiomatic forms?

I believe the Python compiler recognises a “const tuple” and precompiles
it, so key in ('foo', 'bar') is in fact faster. I think I might have
seen Chris demo this in recent months, maybe in the thread on a literal
frozenset syntax.

The long-winded Boolean expressions might cost more, since each comparison is evaluated in turn, but it would have to be a long chain of disjunctions before it cost more than instantiating a tuple. The same goes for the dict construction and lookup.

I assume that Python needs to evaluate and construct the container every time its definition is executed, but does CPython do any optimisation around that?

CPython optimises the tuple case.

For the dict, I usually prebuild the literal example you use. So instead
of, eg:

 class Foo:
     def key_for(self, key):
         return { 'foo': 1, 'bar': 2, }[key]

I’ll usually make the effort to go:

 class Foo:
     FOO_BAR_TABLE = { 'foo': 1, 'bar': 2, }
     def key_for(self, key):
         return self.FOO_BAR_TABLE[key]

or whatever the natural equivalent is. This has the advantage of (a)
building the dict just once, so that key_for has O(1) cost, and (b)
giving the dict a good name, providing more clarity about the purpose
of the method.

Cheers,
Cameron Simpson cs@cskk.id.au

If you care about the timings, find out! Python comes with a timeit module that makes this easy. You’ll need to test a matrix of examples - whatever syntactic options you want to choose, tested against several situations like “key is equal to the first option”, “key is equal to the last option”, and “key is not found”.
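A sketch of such a matrix, driving timeit from Python rather than the shell (the variant and case names are my own, and absolute timings will vary by machine and Python version):

```python
import timeit

# Each syntactic variant is timed against a first-match, last-match,
# and not-found key, as suggested above.
variants = {
    "or-chain": "key == 'foo' or key == 'bar'",
    "tuple":    "key in ('foo', 'bar')",
    "set":      "key in {'foo', 'bar'}",
}
cases = {
    "first":   "key = 'foo'",
    "last":    "key = 'bar'",
    "missing": "key = 'fum'",
}

for vname, stmt in variants.items():
    for cname, setup in cases.items():
        t = timeit.timeit(stmt, setup=setup, number=1_000_000)
        print(f"{vname:8} {cname:8} {t:.3f}s")
```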

Other things to be aware of:

  • Some variants will have slightly different semantics, which can affect performance too. The condition key == 'foo' or key == 'bar' will fetch the variable key twice, whereas key in ('foo', 'bar') fetches it once.
  • What’s it like if you add more options? Let’s say that 'fum' becomes a valid option too. How much effort is it for you, and how error-prone, to add it to the list? On that basis, I would recommend one of the inclusion options rather than multiple equality checks.
  • What if some are constants and some are variables? There’s a huge performance difference between if x in ['foo', 'bar', 'fum', 'spam', 'ham']: and if x in ['foo', 'bar', fum, 'spam', 'ham']: due to optimizations that can’t be done in the second case. Test things out with the dis module.

Absolutely it does! You can find out more with the dis module. As long as the values in the collection are all constants, and the only thing you do with the collection is test for membership, the collection is created once and retained. For something typed in as a list or tuple, a tuple is retained; for something typed in as a set, you get a frozenset instead. Either way, this is quite a bit faster than constructing the collection every time.
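That behaviour is easy to verify by inspecting the compiled constants; a small CPython-specific sketch:

```python
# Membership tests against literal containers are folded to immutable
# constants: a literal list becomes a tuple constant, and a literal set
# becomes a frozenset constant.
list_code = compile("x in ['foo', 'bar']", "<demo>", "eval")
set_code = compile("x in {'foo', 'bar'}", "<demo>", "eval")

print([c for c in list_code.co_consts if isinstance(c, tuple)])
print([c for c in set_code.co_consts if isinstance(c, frozenset)])
```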

There’s a lot to play around with. I definitely recommend experimenting. Just be aware that you’re playing around with some very small numbers, so most of the time, the correct conclusion is “it doesn’t matter”. But it’s still a lot of fun to try things out!

If the values you are comparing to are variables or expressions, chained equality is about half the cost of using a tuple:

[steve ~]$ python3.10 -m timeit -s "a = 1; b = 2; x = 3" "x == a or x == b"
10000000 loops, best of 5: 34.3 nsec per loop
[steve ~]$ python3.10 -m timeit -s "a = 1; b = 2; x = 3" "x in (a, b)"
5000000 loops, best of 5: 62.2 nsec per loop

If you pre-build the tuple ahead of time, it becomes just as fast as chained equality:

[steve ~]$ python3.10 -m timeit -s "a = 1; b = 2; t = (a, b); x = 3" "x in t"
10000000 loops, best of 5: 36.5 nsec per loop

If the values are constants, the tuple version is optimized to be virtually as fast as chained equality:

[steve ~]$ python3.10 -m timeit -s "x = 3" "x in (1, 2)"
10000000 loops, best of 5: 36.3 nsec per loop

But the fastest tactic seems to be to use a pre-built frozenset:

[steve ~]$ python3.10 -m timeit -s "a = 1; b = 2; t = frozenset((a, b)); x = 3" "x in t"
10000000 loops, best of 5: 23.1 nsec per loop

So it looks like there is still a bit of room to optimize the tuple of literals case, at least in Python 3.10.