Adding Call Context to concurrent.futures.Future

tebeka · July 16, 2023, 3:19pm

Hi. I’d like to suggest adding call context to concurrent.futures.Future. I’d love some feedback on the proposal.

Rationale

Assume you are scheduling several callables to be executed on an executor. When you get to extracting the result from a future, you need some external information to connect the future back to the call that produced it.

For example, the ThreadPoolExecutor example in the official documentation uses a dict that maps each future to its URL.

We suggest adding the call context (fn, args, kwargs) to the future returned from “submit”. This way users won’t need an external data structure to keep a mapping from future to context, and future callbacks will also be able to access this information.

The code in the documentation example will become:

# load_url and URLS are the ones defined in the documentation example.
# Start the load operations; each future carries its own call context
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(load_url, url, 60) for url in URLS]

# Recover the URL from the future itself instead of an external mapping
for future in concurrent.futures.as_completed(futures):
    url = future.context.args[0]
    try:
        data = future.result()
    except Exception as exc:
        print('%r generated an exception: %s' % (url, exc))
    else:
        print('%r page is %d bytes' % (url, len(data)))

Proposed API

concurrent.futures.Future will gain a new “context” attribute, which is an object with three attributes:

fn: the callable
args: the *args passed to the callable
kwargs: the **kwargs passed to the callable
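
To make the shape of the proposed attribute concrete, here is a minimal sketch that emulates it today with a subclass. The names CallContext and ContextThreadPoolExecutor are hypothetical and are not part of concurrent.futures; the proposal itself would presumably set the attribute inside the executor rather than after submit returns.

```python
import concurrent.futures
from typing import Any, Callable, NamedTuple


class CallContext(NamedTuple):
    # Hypothetical container; the name and layout are assumptions.
    fn: Callable[..., Any]
    args: tuple
    kwargs: dict


class ContextThreadPoolExecutor(concurrent.futures.ThreadPoolExecutor):
    """Hypothetical executor that records the call on each future."""

    def submit(self, fn, /, *args, **kwargs):
        future = super().submit(fn, *args, **kwargs)
        # Attach the call context as a plain attribute on the future.
        future.context = CallContext(fn, args, kwargs)
        return future


def square(n, *, offset=0):
    return n * n + offset


with ContextThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(square, n, offset=1) for n in range(3)]

# Map the first positional argument of each call to its result.
results = {f.context.args[0]: f.result() for f in futures}
```

Because the attribute holds strong references to fn, args and kwargs, those objects stay alive for as long as the future does.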

Alternative APIs

Add fn, args & kwargs directly to the future object.
Allow the user to provide a “context” (any object) to the executor “submit” function. This context will be added to the future.
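
The second alternative can be approximated without changing the executor. The sketch below uses a hypothetical helper, submit_with_context, that tags the returned future with whatever object the caller passes; the helper name and the sample data are assumptions made for illustration.

```python
import concurrent.futures


def submit_with_context(executor, context, fn, /, *args, **kwargs):
    """Hypothetical helper: submit fn and tag the future with `context`."""
    future = executor.submit(fn, *args, **kwargs)
    future.context = context  # any object the caller chooses
    return future


def word_count(text):
    return len(text.split())


documents = {"a.txt": "one two three", "b.txt": "four five"}

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    futures = [
        submit_with_context(executor, name, word_count, text)
        for name, text in documents.items()
    ]
    # The context identifies each result, so no side mapping is needed.
    counts = {
        f.context: f.result()
        for f in concurrent.futures.as_completed(futures)
    }
```

With this variant the caller decides what is kept alive, so a small key such as a file name can stand in for large arguments.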

pf_moore · July 16, 2023, 3:41pm

This sounds like a nice addition. I’ve certainly had cases where I could have used this. It’s not hard to do it manually, just by returning a tuple of the actual result and any context you need, but it is extra boilerplate. Having the data available automatically would be convenient.
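
A minimal sketch of the manual approach mentioned above, where the callable is wrapped so that it returns its own arguments along with the result. The names with_context and fetch_length and the sample URLS are hypothetical.

```python
import concurrent.futures


def with_context(fn):
    """Wrap fn so its result is returned together with the call arguments."""
    def wrapper(*args, **kwargs):
        return (args, kwargs), fn(*args, **kwargs)
    return wrapper


def fetch_length(url, timeout):
    # Stand-in for real work; just measures the URL string.
    return len(url)


URLS = ["http://a.example/", "http://bb.example/"]

sizes = {}
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(with_context(fetch_length), url, 60) for url in URLS]
    for future in concurrent.futures.as_completed(futures):
        (args, kwargs), size = future.result()
        sizes[args[0]] = size
```

One limitation of this workaround: if the callable raises, result() re-raises the exception and the tuple is never returned, so the context is lost exactly when it would be most useful for error reporting. The nested wrapper is also not picklable, so this form is only suitable for thread-based executors.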

EpicWink · July 16, 2023, 8:37pm

This proposal would extend the lifetime of the function and its input arguments beyond when the future is complete, which in some scenarios may be an unacceptable or even breaking use of memory.

Perhaps this functionality could be enabled via an argument to the executor, making it backwards compatible.

tebeka · July 17, 2023, 6:09am

Good point. IMO, when the future object is GCed, the reference counts of the context objects are decremented, and they will be freed if their refcount reaches 0. If the future is still referenced by something, it probably means the user wants access to it. The future object already holds a reference to the result and exception, so I think it’s OK.

The other option is to use a weakref for the context.
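
A short sketch of what the weakref option would have to deal with, assuming CPython. Built-in containers such as tuple do not support weak references, so an args tuple could not be weakly referenced directly; a wrapper object can be, but then something else has to keep it alive. The CallContext class here is hypothetical.

```python
import weakref

args = ("https://example.com/", 60)

# Tuples cannot be the target of a weak reference in CPython.
try:
    weakref.ref(args)
    tuple_supports_weakref = True
except TypeError:
    tuple_supports_weakref = False


class CallContext:
    """Hypothetical context object; plain class instances can be weakly referenced."""

    def __init__(self, fn, args, kwargs):
        self.fn = fn
        self.args = args
        self.kwargs = kwargs


ctx = CallContext(len, args, {})
ref = weakref.ref(ctx)

# While a strong reference exists, the weak reference resolves to the object.
alive_while_referenced = ref() is ctx
```

If the future held only a weak reference to such a context object, the context would disappear once the caller dropped its own reference, which is part of why this option makes the API harder to reason about.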

csm10495 · July 18, 2023, 5:09am

I’d use this too.

I wonder a bit about the memory usage. If you currently are using a ProcessPoolExecutor and get this update, in theory you can have objects that were used only as args/kwargs that now would be living longer in the memory of the original process.

In general I guess that doesn’t matter but if someone is using a lot of memory, this could be enough to push them over the edge.

Funny enough after writing this, I figure that most additions could have the same argument: ‘this adds some data that could mess with someone on the edge of their max memory’.

Another thought: would args/kwargs be copied or held ‘by reference’? The reason I ask is that in ThreadPoolExecutor, if they’re the same objects, should they be locked … and would they still be editable inside the thread itself? For example, if an arg is a dict and I add a value to the dict inside the execution, would it show up in future.context.args[0]?

To try to make this thought make more sense:

def mess_with_dict(d):
    d['hello'] = 'world'

with concurrent.futures.ThreadPoolExecutor() as executor:
    my_dict = dict()
    future = executor.submit(mess_with_dict, my_dict)
    future.result()

assert my_dict.get('hello') == 'world'

# are they the same object? .. what if I used ProcessPoolExecutor?
assert my_dict is future.context.args[0]

tebeka · July 18, 2023, 5:51am

If you keep the future, then the context will be there as well. Once the future object is GCed, so will the context (if its refcount reaches 0). To be extra cautious we can think about using weakref, but IMO it’ll make the API more confusing.

As for thread vs process: even though the APIs are similar, there are differences that you need to be aware of even without the context (e.g. not everything is picklable). In the last assert, use == and not is.

csm10495 · July 18, 2023, 5:58am

Oh I specifically used is because I’m wondering in the thread case if they’d be the same object?

(So I guess same id(..) in CPython)

tebeka · July 18, 2023, 8:44am

Yup, same as id. You can’t use is if you’re crossing a process boundary, so the user will need to be aware of the specific executor used.
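
The identity question can be checked today without the proposed attribute. In the sketch below, the thread case passes the very same dict object to the worker, while a pickle round trip is used as a stand-in for what happens to arguments sent to a ProcessPoolExecutor worker. That stand-in is an approximation for illustration, not the executor’s actual transfer code.

```python
import concurrent.futures
import pickle


def mess_with_dict(d):
    d["hello"] = "world"
    return d


my_dict = {}
with concurrent.futures.ThreadPoolExecutor() as executor:
    returned = executor.submit(mess_with_dict, my_dict).result()

# Threads share memory: the worker received and mutated the same object.
same_object_in_thread = returned is my_dict
mutation_visible = my_dict == {"hello": "world"}

# Crossing a process boundary involves pickling, which produces a copy.
round_tripped = pickle.loads(pickle.dumps(my_dict))
equal_after_pickle = round_tripped == my_dict
identical_after_pickle = round_tripped is my_dict
```

So with a thread pool both == and is would hold for a stored argument, while anything that has been pickled can only be compared with ==.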