There are good reasons why Python doesn’t have automatic tail recursion elimination. (Implementation is difficult, and developers need guarantees before it makes sense to write recursive algorithms that rely on it.) But consider a case like this:
def add_two(x: int) -> int:
    return x + 2

def double_and_add_two(x: int) -> int:
    y = 2 * x
    return add_two(y)
Here, a function (double_and_add_two) calls another function with the same signature immediately before returning. Instead of creating a new stack frame for this inner call, we can reuse the outer function’s stack frame and simply jump to the beginning of the inner function. When the call is recursive, this allows arbitrarily deep recursion without a stack overflow.
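To see the motivation concretely, here is a small made-up example (countdown is not part of the proposal): even a call in tail position grows CPython’s stack today, so deep recursion fails.

import sys

def countdown(n: int) -> int:
    # The recursive call is in tail position, but CPython still allocates a
    # new frame for every call, so the stack grows linearly with n.
    if n == 0:
        return 0
    return countdown(n - 1)

try:
    countdown(sys.getrecursionlimit() + 100)
except RecursionError:
    print("RecursionError: no tail call elimination today")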
Now, we want to allow the programmer to explicitly request a tail call, with an error raised if a tail call isn’t possible. Rust reserves a keyword for exactly this: become. The above would then be:
def add_two(x: int) -> int:
    return x + 2

def double_and_add_two(x: int) -> int:
    y = 2 * x
    become add_two(y)  # explicit tail call
become can only be used in front of a single call to a function with the same signature; otherwise it raises an error.
This can also be used for recursive functions:
def factorial(n, k):
    if n > 1:
        become factorial(n - 1, k * n)  # explicit tail call
    return k
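For comparison (not part of the proposal), frame reuse makes this equivalent to an explicit loop; here is a sketch of the loop form, with a made-up name:

def factorial_loop(n, k):
    # Equivalent loop form: each "tail call" becomes rebinding the parameters
    # and jumping back to the top, which is exactly what frame reuse amounts to.
    while n > 1:
        n, k = n - 1, k * n
    return k

print(factorial_loop(5, 1))  # 120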
Don’t recent versions of Python implement a limited form of tail call elimination already? I think using an article from 2009 as evidence misses far too many recent changes in how the interpreter works.
Definitely +1 on adding this feature, but I’ll take the opportunity to bikeshed the spelling, as I think it’d be valuable to ensure that the keyword return is still used.
Perhaps tail return?
def factorial(n, k):
    if n > 1:
        tail return factorial(n - 1, k * n)  # explicit tail call
    return k
I’m sure everyone is going to have their own opinions about what feels right, but just to offer an alternative perspective, I feel like what’s going on behind the scenes would be different enough here that I would personally want a syntax that’s distinct from return.
Though I wouldn’t have thought of it on my own, having seen become in the example, it feels like a perfect fit since we’re effectively asking (the frame for) this function call to become (the frame for) the other function call, whereas return feels like it’s about switching between frames.
If other languages are also going to use become for this behavior, that might be additional reason to like it.
If I were writing code that used this feature, I would want a real error in the case where I accidentally misapplied it, rather than falling back to the less-efficient regular return. So I would personally want the failure modes (using become / tail return on a function with a different signature, or using it outside of a function, or using it on something that isn’t actually a tail call) to actually raise an exception.
Hang on. Are you demanding that a tail call be recursion, as opposed to calling some other function? That might run into issues with decorated functions, where technically the one you’re calling isn’t the one you’re in - and might not have the same signature.
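For example, here is a minimal sketch of the decorator issue (log_calls is a made-up decorator for illustration): the name being called at runtime is bound to the wrapper, whose signature is (*args, **kwargs) rather than that of the original function.

import functools

def log_calls(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):  # the object actually bound to the name
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@log_calls
def factorial(n, k):
    if n > 1:
        return factorial(n - 1, k * n)  # the "recursive" call goes through wrapper
    return k

print(factorial(5, 1))  # 120, with a log line per call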
I like the idea. Something you’ll have to contend with (I don’t believe it’s a nonstarter, just something that requires a plan) is how things like tracebacks should be surfaced to users from within a tail call.
I personally think it’s sufficient for the interpreter to point to the fact that it doesn’t have the entire prior stack frame if it’s explicitly opted into with become, rather than making prior “incidentally” optimizable cases change their existing debugging experience.
Maybe having some new builtin function become(value: Any) -> SOME_SPECIAL_TYPING_FORM: ... or tail(value: Any) -> SOME_SPECIAL_TYPING_FORM: ... would be easier. One thing many people (including myself) like about Python is its simplicity, including the fact that there are quite few keywords, so adding a special keyword for a case used only for optimization seems unnecessary. I don’t see many people using a become val or tail return val feature. IMO the parser should recognize return tail(...) (or become(...)), set an attribute on the AST node, and emit a special bytecode opcode that tells the interpreter this is a tail return.
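For illustration only, here is roughly what that spelling could look like at the call site. tail is a placeholder identity function defined here just so the snippet runs today; under this idea the parser, not the function itself, would do the real work.

from typing import Any

def tail(value: Any) -> Any:
    # Placeholder only: in the idea above, the parser would recognize
    # `return tail(...)` and emit a dedicated opcode; at runtime this
    # stand-in simply returns its argument unchanged.
    return value

def factorial(n, k):
    if n > 1:
        return tail(factorial(n - 1, k * n))
    return k

print(factorial(5, 1))  # 120 (no actual tail call elimination happens here)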
Maybe it could just set some flag in the interpreter at runtime then? This flag, say EXPLICIT_TAIL, would be false most of the time, but when set it would apply to the next few instructions (perhaps up to a count given by EXPLICIT_TAIL_LIMIT). IMO this would be the best option then, no?
Interpreter flags are a bad solution, generally… There are plenty of reasons for this that I won’t detail here.
We could also have a decorator like @tail_call (or whatever) but that would imply inspecting and stuff… probably not worth it regarding performance.
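For what it’s worth, a decorator can already approximate this today using a trampoline rather than inspection. Here is a sketch with illustrative names; the per-step thunk allocation is exactly the kind of overhead being alluded to.

import functools

def tail_call(func):
    # Trampoline sketch (illustrative, not the proposal): the decorated
    # function returns a zero-argument thunk for each "tail call", and the
    # wrapper keeps calling thunks in a loop, so the Python stack never grows.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        while callable(result):  # caveat: breaks if the real result is callable
            result = result()
        return result
    return wrapper

@tail_call
def factorial(n, k=1):
    if n > 1:
        # Return a thunk instead of recursing; __wrapped__ (set by
        # functools.wraps) skips the trampoline for the inner step.
        return lambda: factorial.__wrapped__(n - 1, k * n)
    return k

print(factorial(2000))  # no RecursionError, at the cost of one lambda per step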
Making become a keyword means any existing usage of a variable named become (become = value) would break.
I guess tail return is the best possible syntax for this. It is only a “half-keyword” (existing uses of the name tail would not break). It is quite explicitly a modified kind of return (thus it is beginner-friendly). It allows mixing returns and tail-returns in a function, and the parsing has no ambiguity (just like become).
Right, given Python’s dynamic nature, I guess the only real requirement is that the number of function parameters is the same? Whatever is needed so that call frames can be re-used. I’m unfortunately not really sufficiently familiar with Python’s internals to say.
Python, the language, doesn’t have “internals” in that way. CPython, the implementation, does; but if you tie a language proposal to CPython’s current implementation, you are locking every implementation to that. It would be better to define the proposal without reference to internals.
No, please no. Debugging is NOT something that you do and then are done with. Debugging is something that we are always doing. Having poor tracebacks most of the time, with an environment variable to get better tracebacks, is not a good way to do things. I’ve seen that done, and it sucks. Python is better than that [1].
[1] Aside from some quirks of asyncio, which I hope can be fixed in the future.
I think this is likely to be a non-starter as a user opt-in, but it might work as a general optimization, with (as much as people hate it) an environment variable or interpreter arg to opt in/out of how much of the stack is discarded. Optimizations have trade-offs.
How many times have people here actually hit a stack overflow without it being something they’ve done wrong? For me, it’s almost always if not always something I’ve done wrong.
“It uses tail calls between small C functions that implement individual Python opcodes”

and

“This is not to be confused with tail call optimization of Python functions, which is currently not implemented in CPython.”
So not exactly what OP is suggesting.
I don’t even think that would be a requirement. I’d be willing to bet that at some level the arguments are dispatched from the caller as a tuple and a dictionary (*args, **kwargs) that then get unpacked into locals, for all functions. I suspect most stack frames are fairly similar: you can access a frame through inspect, and every frame has the same attributes according to the docs. That said, I know very little about CPython’s internals, but I know some of Python’s, having poked at it in weird ways.
I’d be slightly OK with this in statically typed, compiled languages like Rust, because it would result in an error for the maintainer during compilation when the code doesn’t behave as they expect, not an error for the end user. Python is dynamically typed, and this could result in runtime errors for the end user if they pass a differently typed object to a function that uses this.
What counts as a function call? x + y calls x.__add__ or, if that fails, y.__radd__, but there’s another function along the way that works out which to use. Is invoking a property method something that could be tail call optimized? I would be in favour of letting the interpreter work out the right thing to do and support whatever optimization work the core devs think is highest priority, because I’m totally clueless on this matter.
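To make that concrete, a tiny made-up example: even a plain return x + y in tail position is really operator dispatch chosen at runtime, not a single direct call.

class Celsius:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Called first for x + y; returning NotImplemented here would make the
        # interpreter try type(y).__radd__ next.
        return Celsius(self.value + other.value)

def combine(x: Celsius, y: Celsius) -> Celsius:
    # Looks like a tail position, but it is really a call to type(x).__add__
    # (and possibly type(y).__radd__), selected by the interpreter at runtime.
    return x + y

print(combine(Celsius(1), Celsius(2)).value)  # 3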
I mean, for the purpose of this discussion, it kind-of does. It would be reusing frame objects and associated attributes for tail calls. Frame objects are in the documentation and I’d say they’re part of the internals of Python as opposed to CPython.
It’s entirely permissible for a Python implementation to just return None there. But more importantly, specifics like “number of parameters” are very much an implementation detail. The normal expectation would be that you can access a dictionary of locals, but knowing how many parameters is less clear. Locking a feature to “tail call only if same number of parameters” seems backwards. (It also doesn’t adequately handle pos-or-kwd parameters; how would you know what the correct way to call that is?)
If Python were to gain a feature of “replace the current stack frame with a call to f(…)”, it shouldn’t matter how many parameters you have. It’s just a stack frame replacement. The biggest problem, IMO, is tracebacks.
And quite frankly, I have yet to see much need for a way to marginally improve recursion at the cost of debugging. If you’re calling yourself, there needs to be a justification for not using a while loop or somesuch. But if you might be calling some other function, the loss of traceback is huge.
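To make the traceback concern concrete, here is a small made-up example of what is at stake: today an exception raised in the callee shows every frame in the chain, and replacing the caller’s frame would drop the intermediate entries.

import traceback

def outer():
    # If this frame were replaced by the callee's, outer() would vanish
    # from the traceback printed below.
    return inner()

def inner():
    raise ValueError("boom")

try:
    outer()
except ValueError:
    traceback.print_exc()  # today: shows <module> -> outer -> inner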
inspect allows you to access frame objects via FrameInfo, but I was referring to the frame objects defined in the data model docs.
The attributes it lists are the back pointer, the executing code object, locals, globals, builtins, and the current instruction. I’d assume this is what the tail call optimization would be reusing.
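For reference, those data model attributes map onto what you can see on a live frame object today. Here is a small sketch using inspect; nothing in it is specific to the proposal.

import inspect

def show_frame():
    frame = inspect.currentframe()
    print(frame.f_back.f_code.co_name)   # back pointer: the caller's frame
    print(frame.f_code.co_name)          # executing code object
    print(frame.f_locals)                # local namespace
    print(frame.f_globals is globals())  # global namespace
    print("print" in frame.f_builtins)   # builtins namespace
    print(frame.f_lasti)                 # index of the current instruction

show_frame()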