Name for new itertools object to make iterators thread-safe

In the free-threading build, several iterators are not thread-safe under concurrent iteration. Work is in progress to prevent concurrent iteration from corrupting the interpreter state, but it is considered acceptable for an iterator to return “funny” results. For example, concurrent iteration over enumerate(range(10)) can yield a mismatched pair such as (5, 6), instead of only pairs of the form (a, a). For more information, see gh-124397 (Strategy for Iterators in Free Threading) and the Container thread-safety section of PEP 703.

To make an iterable thread-safe, we will create a new object in itertools. The current working name is itertools.serialize, but we do not like the name. A rough Python equivalent is:

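from collections.abc import Iterator
from threading import Lock

class serialize(Iterator):
    def __init__(self, it):
        self._it = iter(it)
        self._lock = Lock()

    def __next__(self):
        # Only one thread at a time may advance the underlying iterator.
        with self._lock:
            return next(self._it)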
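To make the intended use concrete, here is a minimal sketch of several threads draining one shared generator through the wrapper. The helper names (squares, worker, results) are made up for illustration, and the class is the rough Python equivalent from this post, not the C implementation.

```python
from collections.abc import Iterator
from threading import Lock, Thread


class serialize(Iterator):
    """Rough Python equivalent from the post (working name only)."""

    def __init__(self, it):
        self._it = iter(it)
        self._lock = Lock()

    def __next__(self):
        # Only one thread at a time may advance the underlying iterator.
        with self._lock:
            return next(self._it)


def squares(n):
    # A plain generator; not safe to advance from several threads at once.
    for i in range(n):
        yield i * i


shared = serialize(squares(1000))
results = []
results_lock = Lock()


def worker():
    # list() repeatedly calls shared.__next__, each call under the lock.
    local = list(shared)
    with results_lock:
        results.extend(local)


threads = [Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every item is delivered to exactly one worker; the order across
# workers is unspecified, so compare after sorting.
print(sorted(results) == [i * i for i in range(1000)])
```

Note that the wrapper only serializes the individual next() calls. Which thread receives which item is still up to the scheduler.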

A draft PR with a C implementation is python/cpython pull request #133272, “gh-124397: Add itertools.serialize (name tbd)”.

Any suggestions for a better name?

itertools.thread_safe()?

This is to be used in place of iter, so the name should involve iter, like iterlock, iter_locked, …

It’s a bit of a mouthful, but how about itertools.sequentially_consistent?

I would think that usually it will be passed an existing iterable, and so spelling it thread_safe() (or something like that) would be intuitive to me. i.e. “give me a thread-safe version of the iterator I want to use”.

It’s a tricky bit of naming because of how iterables are implicitly coerced into iterators in a lot of code.

The fact that it is in itertools kind of suggests that it has to do with iter, but a slightly longer name that makes this explicit could also be an option.

Especially when it is imported via from ... import ....

Then somewhere far down in code:

b = thread_safe(a)

might lack the context needed to quickly figure out what it is. The name is rather general and could just as easily belong to some local function.

Thus, I would not be against (and probably in favour of):

b = thread_safe_iter(a)

Or:

b = iter_thread_safe(a)

I believe that threading and asyncio are better places for iterator wrappers that are thread-safe and asynchronously safe respectively.

We also need to think about decorator names that will make the generator function thread-safe or async-safe, respectively.

Is it acceptable, really? Would it be that costly to ensure that all built-in iterators are thread-safe by default?

We should definitely do this - nobody should need to be wrapping enumerate or range.

But we probably also need a primitive for wrapping up pure-Python iterators that weren’t written with threading in mind (currently that’s practically all of them, and anyone trying to run in free-threaded is going to be blocked if they don’t have their own mechanism).

Agreed, but those are already not thread-safe, so there’s no breakage of working user code :slight_smile:

In the GIL build, we get an exception (in CPython, ValueError: generator already executing) if we try to use the same generator created by a generator function simultaneously in different threads. So these wrappers would be useful even in the GIL build.
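For anyone who wants to see this failure deterministically, here is a small sketch that uses two events to force a second next() call while the generator body is still running in another thread. On the CPython versions I am familiar with, the exception raised is ValueError ("generator already executing"); the names slow_gen, entered and release are just for illustration.

```python
import threading

entered = threading.Event()  # set once the generator body is running
release = threading.Event()  # lets the generator body continue


def slow_gen():
    entered.set()
    release.wait(timeout=5)  # stay "executing" until the main thread is done
    yield 1


g = slow_gen()
first_result = []


def first_consumer():
    first_result.append(next(g))


t = threading.Thread(target=first_consumer)
t.start()
entered.wait(timeout=5)

caught = None
try:
    next(g)  # the generator body is still running in the other thread
except ValueError as exc:
    caught = exc
finally:
    release.set()
    t.join()

print(type(caught).__name__, caught)
```

With a locking wrapper around g, the second next() would simply block until the first call finished instead of raising.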

The cost of making built-in iterators thread-safe using locks is about 5% for typical cases (see python/cpython issue #123471, “Make concurrent iteration over pairwise, combinations, permutations, cwr, product, etc. from itertools safe under free-threading”, for benchmarks on itertools.product; I measured roughly 5% for the builtin range as well). But it was decided that adding locks is not (always) worth the performance hit (see python/cpython issue #120496, “Sequence iterator thread-safety”, and issue #124397, “Strategy for Iterators in Free Threading”).

For further discussion on this I suggest opening a new topic.

Thanks for the suggestions. I decided to move the new object to the threading module and go with the name iter_locked. Rationale:

  • iter_locked (or locked_iter) describes well what the object does
  • sequentially_consistent is quite poetic, definitely my runner-up
  • thread_safe in the name suggests it is safe, but with concurrent iteration things can still go wrong

The final name is not set in stone: the new PR (python/cpython pull request #133908, “gh-124397: Add threading.iter_locked”) still has to be reviewed by a core dev.

This rationale is not very consistent with the PR documenting the function as “Make an iterable thread-safe”.

These are micro-benchmarks and are probably not representative of typical cases. 5% is actually quite small for a micro-benchmark.

I will let moderators split the thread if they want to.

I agree that 5% is acceptable for reliable semantics. I also don’t expect raw iterator performance to be a major bottleneck in overall performance numbers.

Now if one wanted to introduce non-thread-safe iterators for performance, then I could see such a library existing on PyPI.

But personally, I think the stdlib should prefer unsurprising semantics over raw performance when we have to make a choice between the two options.
