Copy a generator with state

Personally, besides it being a very cool and impressive feature (as quoted earlier), my interest in it is as part of a “durable execution” mechanism.

I have a program that runs a flow/task with multiple steps and has to wait for an external event between steps. However, my program may need to scale down, update, restart, or stop for any other reason while it is waiting.

My ideal solution for this would be a generator. I would advance it with next(), then pickle it and persist it somewhere. If at any point my program stops, the next time it starts I can just let it pick up from where it stopped and .send() it the event when it arrives.
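A minimal sketch of that generator-based flow, assuming a hypothetical two-step task (the name approval_flow and the event values are made up for illustration). It works fine within one process, but the persistence step fails, because CPython generators cannot be pickled:

```python
import pickle

def approval_flow(request_id):
    # Hypothetical task: each yield pauses until an external event is sent in.
    approver = yield "waiting for approval"
    note = f"{request_id} approved by {approver}"  # a local that survives across steps
    amount = yield "waiting for payment"
    yield f"done: {note}, paid {amount}"

flow = approval_flow("req-42")
status = next(flow)  # runs up to the first wait: "waiting for approval"

# The missing piece: persisting the suspended generator between steps.
try:
    pickle.dumps(flow)
    persisted = True
except TypeError:  # CPython raises "cannot pickle 'generator' object"
    persisted = False

# Without a restart, the same flow just continues in-process.
status = flow.send("alice")  # "waiting for payment"
result = flow.send(100)      # "done: req-42 approved by alice, paid 100"
```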

The code for implementing the tasks then becomes a lot easier to read and follow. I have a single function that always keeps its state between steps, so I can define a variable locally and use it in another step, even if that step runs in a different process! The flow is very easy to follow, and it just looks like a normal function.

Instead, right now, I have a class with multiple methods, each representing a step, and another method, something like get_steps, that defines their order. It’s much less readable, and each step also needs a name, which is just an extra burden. In addition, I have to keep track of the task’s current state and step myself.
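For comparison, a rough sketch of that step-class style, using the same hypothetical flow (ApprovalTask, on_approval and on_payment are illustrative names; only get_steps comes from the description above). Everything is plain attributes, so it pickles, but progress and shared values have to be tracked by hand:

```python
import pickle

class ApprovalTask:
    """Hypothetical step-based version of the same flow."""

    def __init__(self, request_id):
        self.request_id = request_id
        self.step_index = 0  # progress has to be tracked by hand
        self.note = None     # values shared between steps become attributes

    def get_steps(self):
        # The step order lives apart from the steps themselves.
        return [self.on_approval, self.on_payment]

    def on_approval(self, approver):
        self.note = f"{self.request_id} approved by {approver}"
        return "waiting for payment"

    def on_payment(self, amount):
        return f"done: {self.note}, paid {amount}"

    def handle(self, event):
        # Dispatch the event to the current step, then advance.
        result = self.get_steps()[self.step_index](event)
        self.step_index += 1
        return result

task = ApprovalTask("req-42")
task.handle("alice")                         # "waiting for payment"
restored = pickle.loads(pickle.dumps(task))  # e.g. persisted across a restart
result = restored.handle(100)                # "done: req-42 approved by alice, paid 100"
```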

I think picklable generators would be a very powerful primitive for this, and for many other use cases.


I do want to acknowledge that this would probably require significant work to add to CPython, and, as with every idea here, this discussion is worth much less without someone who is willing to implement it, or even a proof of concept.


I don’t know how CPython works internally, but I realized that we can effectively copy a running generator using only plain Python and a bit of metaprogramming.

It will require significant work and some hacky code, but I think it is possible (or at least a proof of concept for the most common cases); I just need to spend some time on it. I will post again when I have something decent.

Okay, I’m going to address this thoroughly, since I’ve attempted this and understand the topic to an extent.

In CPython (at least in version 3.11, let’s say), the short answer is that as long as the things that make up a generator are retrievable and copyable/picklable, it can be copied and pickled.

There will be cases that cannot be copied or pickled, because the assumptions are violated, e.g. the source code or frame locals cannot be retrieved, or the objects used in the frame are not copyable/picklable.
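To make those assumptions concrete, here is a small sketch (with hypothetical generators counter and guarded) of what can be read from a suspended generator in CPython, plus one way the assumptions break: a frame local, here a threading.Lock, that cannot be pickled.

```python
import inspect
import pickle
import threading

def counter(limit):
    total = 0
    for i in range(limit):
        total += i
        yield total

gen = counter(5)
next(gen)
next(gen)

# What a copier can read from a suspended generator:
state = inspect.getgeneratorstate(gen)      # "GEN_SUSPENDED"
frame_locals = dict(gen.gi_frame.f_locals)  # {"limit": 5, "total": 1, "i": 1}
paused_line = gen.gi_frame.f_lineno         # line number of the paused yield
# In CPython the range iterator driving the for loop is not in f_locals;
# it lives on the frame's value stack.

# An assumption violation: a frame local that cannot be pickled.
def guarded(lock):
    with lock:
        yield "inside"

locked = guarded(threading.Lock())
next(locked)
try:
    pickle.dumps(locked.gi_frame.f_locals["lock"])
    lock_picklable = True
except TypeError:  # "cannot pickle '_thread.lock' object"
    lock_picklable = False
locked.close()  # raises GeneratorExit at the yield, which releases the lock
```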

However, if the assumptions are met, then users can freely shallow or deep copy a generator with state.

Shallow copying may mean both copies share state, whereas deep copying may mean the states start out identical but are completely independent.

Why?

This is because if you can get a generator’s source code, you can analyze it and recreate the generator in a copyable form. For example, you can rewrite the source code so that it can be sliced by f_lineno each time, with adjustments for when execution is inside code blocks.

Why does that help? You can create a class that rewrites the code into that form when it is initialized and then, every time its __next__ method is called, execs the newly prepared code object and saves the frame. After that, all that’s left is to slice the source code and update the variables from the saved frame. In other words, we’ve created a manual version of a generator in pure Python that relies only on source code and frame locals.

Note: I’m oversimplifying and skipping over many headaches I’ve had to deal with while implementing this, but it is a valid introductory way of thinking about what it’s doing at a high level.
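The approach above rewrites and re-executes sliced source code. As a much simpler stand-in for the same core idea, here is a hand-written “manual generator” (the class ManualCounter is hypothetical and is not the poster’s implementation) whose only state is a resume label plus a dict of saved locals. Because that state is plain data, ordinary pickling just works:

```python
import pickle

class ManualCounter:
    """Hypothetical hand-rewritten equivalent of:

        def counter(limit):
            total = 0
            for i in range(limit):
                total += i
                yield total
    """

    def __init__(self, limit):
        self.resume_at = "start"              # stands in for the saved f_lineno
        self.saved_locals = {"limit": limit}  # stands in for the saved frame locals

    def __iter__(self):
        return self

    def __next__(self):
        loc = self.saved_locals
        if self.resume_at == "start":
            loc["total"] = 0
            loc["i"] = 0                      # the implicit loop position, made explicit
        elif self.resume_at == "after_yield":
            loc["i"] += 1
        else:                                 # already exhausted
            raise StopIteration
        if loc["i"] < loc["limit"]:
            loc["total"] += loc["i"]
            self.resume_at = "after_yield"
            return loc["total"]
        self.resume_at = "done"
        raise StopIteration

gen = ManualCounter(5)
first_two = [next(gen), next(gen)]           # [0, 1]
restored = pickle.loads(pickle.dumps(gen))   # plain data, so pickling just works
rest = list(restored)                        # [3, 6, 10]
original_rest = list(gen)                    # [3, 6, 10]; the original is unaffected
```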

In terms of copying, all that happens is a transfer into a new instance of that object: with the same state if shallow copying, and with each variable deep-copied if deep copying.
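A sketch of those two copy semantics on a hypothetical manual generator (ManualListIter, with its state kept in a saved-locals dict): __copy__ builds a new instance whose locals refer to the same objects, while __deepcopy__ deep-copies every saved variable.

```python
import copy

class ManualListIter:
    """Hypothetical manual generator equivalent to:

        def g(a):
            for x in a:
                yield x
    """

    def __init__(self, a):
        # "index" is the implicit for-loop position, made explicit.
        self.saved_locals = {"a": a, "index": 0}

    def __iter__(self):
        return self

    def __next__(self):
        loc = self.saved_locals
        if loc["index"] >= len(loc["a"]):
            raise StopIteration
        x = loc["a"][loc["index"]]
        loc["index"] += 1
        return x

    def __copy__(self):
        # Shallow: new instance and new locals dict, same variable objects.
        new = ManualListIter.__new__(ManualListIter)
        new.saved_locals = dict(self.saved_locals)
        return new

    def __deepcopy__(self, memo):
        # Deep: every saved variable is deep-copied too.
        new = ManualListIter.__new__(ManualListIter)
        memo[id(self)] = new
        new.saved_locals = copy.deepcopy(self.saved_locals, memo)
        return new

a = [1, 2]
gen = ManualListIter(a)
next(gen)                      # 1
shallow = copy.copy(gen)
deep = copy.deepcopy(gen)
a.append(3)                    # mutate the list the generator was given
shallow_rest = list(shallow)   # [2, 3]: shares `a`, so it sees the append
deep_rest = list(deep)         # [2]: has its own copy of `a`
original_rest = list(gen)      # [2, 3]: copying didn't advance the original
```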

Again, this is only possible if the assumptions are not violated, which means there are lots of use cases where it works and lots where it does not. My take is that, assuming people actually want this, maybe it should be introduced to Python (once I’ve finished developing and testing it); otherwise, it can be a standalone library available if needed.

Additional notes:

  • There are further details, not mentioned here, that I’ve figured out, such as how I’ve managed closures and nonlocals, but in short, it turns out you can retrieve and deal with these.
  • It likely carries some overhead, mostly from adjusting the code every time __next__ is called.
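As a small illustration of why closures and nonlocals are retrievable (the functions make_counter and counter are hypothetical), CPython exposes the free-variable names on the generator’s code object, includes their values in the frame locals, and lets you read the cells through the function’s __closure__. The cell is shared with the running generator, which is exactly what a copier has to decide whether to share or duplicate:

```python
def make_counter(step):
    def counter():
        total = 0
        while True:
            total += step  # `step` is a free variable held in a closure cell
            yield total
    return counter

counter_fn = make_counter(3)
gen = counter_fn()
next(gen)                                  # 3

free_names = gen.gi_code.co_freevars       # ("step",)
snapshot = dict(gen.gi_frame.f_locals)     # {"total": 3, "step": 3}
cell = counter_fn.__closure__[0]
cell_value = cell.cell_contents            # 3

# The cell is shared with the running generator, so rebinding it
# (as a `nonlocal` assignment would) changes what the generator sees next.
cell.cell_contents = 10
after = next(gen)                          # 13
```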

In short, the point I’m making is that you can generalize the idea of copyable generators without changing CPython internals (at least in 3.11, as far as I’m aware), but that doesn’t mean every generator can be copied, since there are limitations, e.g. when the things making up a generator are not picklable/copyable.


This is a major contribution, many thanks! I hope we can push this implementation to the python ecosystem and finally get rid of all the issues coming from not being able to copy generators after so many years in the language.

This problem does not have a solution. You can write code that returns something, and that “copy” will work for your use case, but it will work incorrectly for other use cases, and it is impossible to programmatically distinguish a correctly working case from an incorrectly working one, because correctness depends on the wider context.

So I’m against including something like this in the stdlib. We’ve already had a negative experience with copying itertools.tee().
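One possible illustration of that “wider context” (an editorial example, not necessarily the case the poster had in mind): a hypothetical manual generator, ManualNumbered, that consumes a one-shot source such as a socket or queue. A shallow copy shares the source, so the two “copies” steal items from each other, and a deep copy fails because the source itself cannot be copied. Nothing in the generator tells the copier which behaviour the caller actually wanted.

```python
import copy

class ManualNumbered:
    """Hypothetical manual generator equivalent to:

        def numbered(source):
            for n, item in enumerate(source, start=1):
                yield n, item
    """

    def __init__(self, source):
        self.it = iter(source)  # the hidden loop iterator, made explicit
        self.n = 0

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self.it)  # StopIteration propagates when the source ends
        self.n += 1
        return self.n, item

# A one-shot source, standing in for something like a socket or a queue.
source = (f"msg{k}" for k in range(1, 5))
gen = ManualNumbered(source)
next(gen)                    # (1, "msg1")

twin = copy.copy(gen)        # default shallow copy: both share the one-shot source
from_twin = next(twin)       # (2, "msg2")
from_gen = next(gen)         # (2, "msg3"): the original never sees "msg2"

try:
    copy.deepcopy(gen)       # would need to copy the source generator itself
    deep_ok = True
except TypeError:
    deep_ok = False
```

With a replayable source such as a list, the deep copy would typically succeed and give an independent twin; whether sharing or duplicating is “correct” depends on what the source represents.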


Out of curiosity, can you give short examples of this wider context? My custom generator implementation can handle a wide range of things a regular generator handles, and it can be trivially converted back into a regular generator if needed.

My implementation handles:

  • async generators
  • the .send() and .close() methods, including generator exception handling on .close()
  • yield and yield from
  • yield used as an expression that receives sent values, e.g. (yield ...), including nested forms
  • type checking
  • copying and pickling
  • nonlocals and closures
  • exceptions
  • yield values used as default function arguments
  • implicitly defined for-loop iterators, made explicit
  • retrieving the source code of expressions as well as functions
  • adjusting generator expressions
  • both initialized generators and uninitialized ones (a function that creates a generator)

and maybe other things that I’m not recalling right now.

With the example you gave before here’s what can happen:

```python
## your example
a = [1]
# infinite loop of appending f(x) e.g. f( f( f(  ... (1) ) ) )
for x in g(a):
    if x:
        a.append(f(x))

## using my implementation (shallow copying)
a = [1]
# infinite loop of appending f(x) e.g. f( f( f(  ... (1) ) ) )
for x in Generator(g(a)).copy(shallow=True):
    if x:
        a.append(f(x))

## using my implementation (deep copying)
a = [1]
# appends f(1)
for x in Generator(g(a)).copy(shallow=False):
    if x:
        a.append(f(x))
```
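The difference between those two outcomes comes down to whether the copied generator shares the list `a` or has its own copy of it. Since the earlier definitions of g and f aren’t quoted here, this sketch assumes hypothetical stand-ins (g iterates over its argument, f(x) = x + 1) and does not reproduce the poster’s Generator class; it approximates the two cases by passing either the shared list or a deep copy of it, and caps the shared case so it terminates:

```python
import copy
from itertools import islice

def g(a):  # hypothetical stand-in for the generator from the earlier example
    for x in a:
        yield x

def f(x):  # hypothetical stand-in for f
    return x + 1

# Shared list: the generator sees every append, so the loop never ends on its own.
a = [1]
for x in islice(g(a), 5):  # cap at 5 items to keep the demo finite
    if x:
        a.append(f(x))
shared = a                 # [1, 2, 3, 4, 5, 6]

# Independent copy of the state: the generator only sees the original [1].
a = [1]
for x in g(copy.deepcopy(a)):
    if x:
        a.append(f(x))
independent = a            # [1, 2], i.e. only f(1) was appended
```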