The cost of nested definitions

My actual problem is more involved than this question needs, so I’ll keep it simple. Imagine the following:

This is a set of helper functions; they only serve to make other functions more readable.

def functionA(*args, **kwargs):
    ...

def functionB(*args, **kwargs):
    ...

def functionC(*args, **kwargs):
    ...

But then I have my main functions:

def mainA(*args, **kwargs):
    ...
    a = functionA(*args, **kwargs)
    ...

def mainB(*args, **kwargs):
    ...
    b = functionB(*args, **kwargs)
    ...

def mainC(*args, **kwargs):
    ...
    c = functionC(*args, **kwargs)
    ...

Thus functionA is only ever used by mainA, and likewise functionB by mainB and functionC by mainC.

Would it be more efficient to do this instead?

def mainA(*args, **kwargs):
    def functionA(*args, **kwargs):
        ...

    ...
    a = functionA(*args, **kwargs)
    ...

def mainB(*args, **kwargs):
    def functionB(*args, **kwargs):
        ...

    ...
    b = functionB(*args, **kwargs)
    ...

def mainC(*args, **kwargs):
    def functionC(*args, **kwargs):
        ...

    ...
    c = functionC(*args, **kwargs)
    ...

My experience from C suggests that when the file is analyzed, the functions are created once and can then be called. But if they are nested inside other functions, they will be re-created on every call, which would reduce performance.

Nesting them is purely for readability and keeping everything together. But am I impacting performance here?

Edit: similarly, functions can be nested inside classes; does that also affect performance?

It would have a performance cost, but an incredibly minor one - some brief testing suggests it’s similar in cost to making a dict. Functions are split into two different objects: the code object, which contains all the immutable data, and the function object, which is mutable. During parsing/compilation of the module, the code objects are all created. Then when you define a function, it just takes that constant and creates the function object.
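You can see this split directly in the interpreter; a minimal sketch (the file and address details in the real reprs are elided here):

>>> def outer():
...     def inner():
...         pass
...     return inner
... 
>>> # The bytecode for inner is compiled once, stored as a constant of outer:
>>> [c for c in outer.__code__.co_consts if hasattr(c, "co_name")]
[<code object inner at ...>]
>>> # Each call builds a fresh (cheap) function object around that same code object:
>>> f1, f2 = outer(), outer()
>>> f1 is f2
False
>>> f1.__code__ is f2.__code__
True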

The bigger cost is something else really - since they’re nested, these functions are inaccessible from outside, so you can’t do things like call them directly from test code. You might be fine with that; it depends.
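For example, assuming the code lives in a hypothetical module helpers.py, test code has nothing to import in the nested version:

# helpers.py
def mainA(*args, **kwargs):
    def functionA(*args, **kwargs):
        return 42
    return functionA(*args, **kwargs)

# in a test file:
from helpers import mainA        # fine
from helpers import functionA    # ImportError - functionA exists only
                                 # inside mainA's body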


If you want to know what the performance impact of a code change or an alternative implementation is, you can use the timeit module to benchmark the code.

If you are interested in seeing the bytecode that Python compiles, you can use the dis module to disassemble it; see dis.dis().
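A quick sketch of both tools; the exact disassembly varies between CPython versions, but you should see a MAKE_FUNCTION instruction that wraps an already-compiled code object:

>>> import dis, timeit
>>> def outer():
...     def inner():
...         pass
... 
>>> dis.dis(outer)         # look for LOAD_CONST (<code object inner ...>) then MAKE_FUNCTION
>>> timeit.timeit(outer)   # seconds for one million calls, by default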


This is actually wrong. C doesn’t re-create anything for a nested function (which some compilers, e.g. GCC, support as an extension), because there is nothing to create. Functions aren’t first-class objects in C; there’s just the compiled code. You can make pointers to functions, but the only thing you can really do with them - portably - is store the pointer and use it to call the function. In particular, C doesn’t natively support closures; if you try to return a pointer to the local function, you can’t expect it to work correctly for the caller (the outer function’s locals are gone from the stack by that point).

Python does create objects that represent functions and that support closures, and for a nested function it will need to do this every time. However, it does not recompile the code or anything like that. It already has the compiled bytecode stored, which it can use to create the new function object. (It may also need to create some “cell” objects to support the closure: because Python is late-binding, it can’t just capture the current values of the outer function’s locals.)
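A small illustration of that late binding through cell objects (the addresses in the real output are elided):

>>> def outer():
...     x = 1
...     def inner():
...         return x
...     x = 2
...     return inner
... 
>>> f = outer()
>>> f()            # inner sees the final value of x, not the value at definition time
2
>>> f.__closure__  # the cell object through which inner reads x
(<cell at ...: int object at ...>,)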

The creation does not take a lot of time:

>>> def outer():
...     def inner():
...         pass
... 
>>> def simple():
...     pass
... 
>>> import timeit
>>> timeit.timeit(outer)
0.10839796392247081
>>> timeit.timeit(simple)
0.04258948704227805

For comparison:

>>> def noloop():
...     for i in range(0):
...         pass
... 
>>> timeit.timeit(noloop)
0.12932194117456675

Creating the function object took less time than setting up a trivial loop that doesn’t even run. (The default compiler really does not try to optimize much at the bytecode level; historical speed improvements in Python, as far as I’m aware, have focused on the interpreter and a bit on bytecode design.)


There is potentially some additional cost, since nonlocal (closure) lookups are required for some things that might otherwise be local or global (both of which are faster, locals especially). But (a) this cost is probably insignificant, so spend more time thinking about how the code reads in source than how it runs; and (b) if it really worries you, time it. Measure. Don’t just hope for the best - try it on your actual code. You can easily do microbenchmarks to find out how long it takes to construct a function or look up a variable, as in the sketch below, but to know how much that will actually affect your code, measure your code! There’s no substitute for data. :)
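A sketch of such a microbenchmark, comparing local, global, and closure (nonlocal) reads; the absolute numbers will vary by machine and CPython version:

import timeit

x_global = 1

def read_local():
    x = 1
    for _ in range(1000):
        y = x         # LOAD_FAST: a plain local read

def read_global():
    for _ in range(1000):
        y = x_global  # LOAD_GLOBAL: a module-level read

def make_closure_reader():
    x = 1
    def read_closure():
        for _ in range(1000):
            y = x     # LOAD_DEREF: a read through a closure cell
    return read_closure

for f in (read_local, read_global, make_closure_reader()):
    print(f.__name__, timeit.timeit(f, number=10_000))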