Hi. I’d like to suggest adding call context to concurrent.futures.Future. I’d love some feedback on the proposal.
Assume you are scheduling several callables to be executed on an executor. When you get to extracting the result from a future, you need some external information to connect the future back to the call that produced it.
For example, in the official documentation we’re using a map from the future to the URL.
We suggest adding the call context (fn, args, kwargs) to the future returned from “submit”. This way users won’t need an external data structure to keep a mapping from future to context, and future callbacks will also be able to access this information.
The code in the documentation example will become:
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations; each future carries its own call context
    futures = [executor.submit(load_url, url, 60) for url in URLS]
    for future in concurrent.futures.as_completed(futures):
        # args is the tuple of positional arguments; the URL is the first one
        url = future.context.args[0]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
concurrent.futures.Future will gain a new “context” attribute, which is an object with three attributes:
- fn: the callable
- args: The *args to the callable
- kwargs: The **kwargs to the callable
Alternatives:
- Add fn, args & kwargs directly to the future object
- Allow the user to provide a “context” (any object) to the executor “submit” function. This context will be added to the future
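To experiment with the idea without changing the standard library, the proposed attribute can be approximated with a small executor subclass. This is only a sketch of one possible shape: CallContext and ContextThreadPoolExecutor are hypothetical names, not existing APIs, and the sketch relies on Future instances accepting arbitrary attributes.

```python
import concurrent.futures
from collections import namedtuple

# Hypothetical container mirroring the proposed fn/args/kwargs attributes.
CallContext = namedtuple("CallContext", ["fn", "args", "kwargs"])


class ContextThreadPoolExecutor(concurrent.futures.ThreadPoolExecutor):
    """Illustrative subclass; not part of the standard library."""

    def submit(self, fn, /, *args, **kwargs):
        future = super().submit(fn, *args, **kwargs)
        # Attach the call details to the future before handing it back.
        future.context = CallContext(fn, args, kwargs)
        return future


def square(x):
    return x * x


with ContextThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(square, n) for n in (2, 3, 4)]
    # Map each input back to its result using only the future itself.
    results = {
        f.context.args[0]: f.result()
        for f in concurrent.futures.as_completed(futures)
    }
```

The context is set right after submit() returns, so any callback registered afterwards with add_done_callback can read future.context.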
This sounds like a nice addition. I’ve certainly had cases where I could have used this. It’s not hard to do it manually, just by returning a tuple of the actual result and any context you need, but it is extra boilerplate. Having the data available automatically would be convenient.
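For comparison, the manual workaround mentioned above might look like the following. It is a minimal sketch; with_context and word_count are hypothetical names chosen for illustration.

```python
import concurrent.futures


def with_context(fn, *args, **kwargs):
    """Hypothetical helper: run fn and return its result plus the call details."""
    return fn(*args, **kwargs), (fn, args, kwargs)


def word_count(text):
    return len(text.split())


texts = ["a b c", "hello world"]
counts = {}
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(with_context, word_count, t) for t in texts]
    for future in concurrent.futures.as_completed(futures):
        # The context travels inside the result tuple.
        result, (fn, args, kwargs) = future.result()
        counts[args[0]] = result
```

One limitation of this approach: if fn raises, future.result() raises too and the tuple is never returned, so the context is not available in the error path unless the wrapper also catches and repackages the exception.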
This proposal would extend the lifetime of the function and its input arguments beyond when the future is complete, which in some scenarios may be an unacceptable or even breaking use of memory.
Perhaps this functionality could be enabled via an argument to the executor, making it backwards compatible.
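The lifetime concern can be made concrete with a small experiment. The sketch below emulates the proposal by setting a context attribute on the future by hand (that attribute does not exist today) and uses a weak reference purely to observe whether the argument is still alive. BigInput and process are hypothetical names, and the final observation depends on CPython’s reference counting.

```python
import concurrent.futures
import gc
import weakref


class BigInput:
    """Stand-in for a large argument object."""


def process(item):
    return "done"


item = BigInput()
item_ref = weakref.ref(item)  # observer only; does not keep item alive

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(process, item)
    future.result()

# Emulate the proposal: the future now holds a strong reference to the args.
future.context = (process, (item,), {})

del item
gc.collect()
alive_while_future_kept = item_ref() is not None

del future
gc.collect()
alive_after_future_dropped = item_ref() is not None
```

As long as the future is kept, the argument stays in memory even though the caller dropped its own reference; once the future goes away, so does the argument.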
Good point. IMO, when the future object is GCed, the reference counts of the objects in the context are decremented, and they are freed if a refcount reaches 0. If the future is still referenced by something, it probably means the holder wants access to it. The future object already holds a reference to the result and the exception, so I think it’s OK.
The other option is to use a weakref for the context.
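One practical wrinkle with the weakref option is that many common argument types cannot be weakly referenced at all. The sketch below illustrates this; try_weakref and Payload are hypothetical names, and the last check relies on CPython’s reference counting.

```python
import weakref


class Payload:
    """Plain user-defined class; its instances support weak references."""


def try_weakref(obj):
    """Return a weak reference to obj, or None if the type does not allow one."""
    try:
        return weakref.ref(obj)
    except TypeError:
        return None


payload = Payload()
payload_ref = try_weakref(payload)

# Typical call arguments: none of these built-in types support weak references.
str_ref = try_weakref("first-arg")
tuple_ref = try_weakref(("first-arg", 60))
dict_ref = try_weakref({"timeout": 60})

alive_before = payload_ref() is payload
del payload  # last strong reference gone, so the weak reference is cleared
alive_after = payload_ref() is not None
```

So a weakref-based context would either need a fallback to strong references for such types, or would silently lose the arguments as soon as the caller drops them.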
I’d use this too.
I wonder a bit about the memory usage. If you are currently using a ProcessPoolExecutor and get this update, in theory you can have objects that were used only as args/kwargs that would now live longer in the memory of the original process.
In general I guess that doesn’t matter but if someone is using a lot of memory, this could be enough to push them over the edge.
Funny enough after writing this, I figure that most additions could have the same argument: ‘this adds some data that could mess with someone on the edge of their max memory’.
Another thought: would args/kwargs be copied or be ‘by reference’? The reason I ask is that in ThreadPoolExecutor, technically, if they’re the same values, should they be Lock’d … and then editable inside the thread itself? Like, if an arg is a dict and I add a value to the dict inside the execution, would it show up in the context stored on the future?
To try to make this thought make more sense:
import concurrent.futures

def mess_with_dict(d):
    d['hello'] = 'world'

with concurrent.futures.ThreadPoolExecutor() as executor:
    my_dict = dict()
    future = executor.submit(mess_with_dict, my_dict)

# leaving the with block waits for the submitted call to finish
assert my_dict.get('hello') == 'world'
# are they the same object? .. what if I used ProcessPoolExecutor?
assert my_dict is future.context.args[0]
If you keep the future, then the context will be there as well. Once the future object is GCed, so will the context (if its refcount reaches 0). To be extra cautious we can think about using weakref, but IMO it’ll make the API more confusing.
As for thread vs process: even though the API is similar, there are differences that you need to be aware of even without the context (e.g. not everything is pickleable). In the last assert, use == and not is.
Oh, I specifically used is because I’m wondering if, in the thread case, they’d be the same object? (So I guess same id(..) in CPython.)
Yup, same id. You can’t use is if you’re crossing a process boundary, so the user will need to be aware of the specific executor used.
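The difference can be sketched without the proposed attribute. In the thread case the worker receives the very same object. In the process case the arguments are pickled on their way to the worker, so the snippet below uses a pickle round trip as a stand-in for the process boundary rather than starting a real ProcessPoolExecutor. That substitution is a simplification for illustration.

```python
import concurrent.futures
import pickle


def mess_with_dict(d):
    d["hello"] = "world"
    return id(d)


# Thread case: the worker mutates the caller's own dict.
my_dict = {}
with concurrent.futures.ThreadPoolExecutor() as executor:
    worker_id = executor.submit(mess_with_dict, my_dict).result()
same_object_in_thread = worker_id == id(my_dict)

# Process case, simulated: the worker only ever sees an unpickled copy.
original = {}
args = (original,)
received_args = pickle.loads(pickle.dumps(args))
mess_with_dict(received_args[0])
```

Here my_dict ends up containing the new key, while original stays empty because only the copy was changed. A context stored on the future in the parent process would therefore hold the caller’s objects, not the ones the worker actually mutated.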