When exactly was this bug fixed?

I found the GitHub issue “re.split doesn't split with zero-width regex” (python/cpython Issue #47512) by following a link to the old bug tracker from an old accepted Stack Overflow answer. I don’t see any resolution described. According to GitHub’s migrated version, the issue was closed in 2010 as “completed”, but the archived bug tracker gives the resolution as “rejected”… and mentions a patch being accepted and merged in 2017, long after the supposed closure.

It seems there was a patch proposed as early as 2004, which got rejected when the corresponding issue was marked as a duplicate of the first one. Then there’s another report from 2003 that also seems to have gone nowhere.

Indeed, the bug is no longer there in contemporary versions of Python. For example:

```
$ python
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.split(r'\b', 'a b')
['', 'a', ' ', 'b', '']
>>> re.split(r'(?<!\w)(?=\w)', 'a b')
['', 'a ', 'b']
```
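One way to see what the fixed behaviour means for a purely zero-width pattern: ask `re.finditer` where the pattern matches, then cut the string at exactly those positions. This is a minimal sketch, assuming the pattern can only match the empty string and has no capturing groups; the helper name `cut_at_zero_width_matches` is made up for illustration. On 3.7+ it should agree with the `re.split` output in the session above.

```python
import re
from typing import List


def cut_at_zero_width_matches(pattern: str, text: str) -> List[str]:
    """Hypothetical helper: split `text` at every position where `pattern` matches.

    Assumes every match is empty (zero-width) and the pattern has no groups.
    """
    positions = []
    for m in re.finditer(pattern, text):
        if m.start() != m.end():
            raise ValueError("this sketch only handles zero-width matches")
        positions.append(m.start())
    # Each piece runs from one cut point to the next, including both string ends.
    cuts = [0] + positions + [len(text)]
    return [text[i:j] for i, j in zip(cuts, cuts[1:])]


# \b matches at offsets 0, 1, 2 and 3 of 'a b', so cutting there gives 5 pieces.
print([m.start() for m in re.finditer(r'\b', 'a b')])      # [0, 1, 2, 3]
print(cut_at_zero_width_matches(r'\b', 'a b'))             # ['', 'a', ' ', 'b', '']
print(cut_at_zero_width_matches(r'(?<!\w)(?=\w)', 'a b'))  # ['', 'a ', 'b']
```

Unlike `re.split`, this ignores capturing groups and rejects non-empty matches, which keeps it to exactly the case the bug report was about.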

I don’t know how to parse all of this. It seems like information got mangled by the bug tracker migration, and the legacy format is basically incomprehensible to me.

What version of Python actually introduced the fix?

In general, how am I meant to answer such a question myself?

(Reading the Stack Overflow page a bit further, it seems people claim the bug was fixed in 3.7. I think that lines up with the 2017 timeline, but it doesn’t give me clear evidence or documentation. It also seems strange that the fix really would have taken that long.)

The re.split docs say 3.7 (see the “Changed in version 3.7” note there). What’s New In Python 3.7 mentions the change in two sections (search for “split” on that page), and each links to the relevant issue, where I think you can find the details.
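Besides reading the docs and What’s New, you can also ask the running interpreter directly. This is a minimal sketch, assuming the behaviours described in this thread: pre-3.5 versions leave the string unsplit, 3.5 and 3.6 raise `ValueError` or emit `FutureWarning`, and 3.7+ split on the empty matches. The function name `splits_on_empty_matches` is hypothetical.

```python
import re
import sys
import warnings


def splits_on_empty_matches() -> bool:
    """Hypothetical probe: does this interpreter's re.split() split on empty matches?"""
    with warnings.catch_warnings():
        # Escalate any FutureWarning raised by re.split into an exception.
        warnings.simplefilter("error")
        try:
            result = re.split(r'\b', 'a b')
        except (ValueError, FutureWarning):
            # 3.5/3.6 style: the pattern is rejected or flagged instead of used.
            return False
    # Older versions skip the empty matches and hand back the string unsplit.
    return result == ['', 'a', ' ', 'b', '']


print(sys.version_info[:2], splits_on_empty_matches())
```

For ordinary code a plain `sys.version_info >= (3, 7)` check is simpler; a behavioural probe like this is mostly useful for confirming what a particular build actually does.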


> The most visible change is the change in re.split(). This is compatibility breaking change, and it affects third-party code. But ValueError or FutureWarning were raised for patterns that will change the behavior in this PR for two Python releases, since Python 3.5.

That explains at least part of the lag time, anyway.
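Why it counted as compatibility-breaking: a pattern that can match both empty and non-empty text, such as `r'\W*'`, used to split only at its non-empty matches, whereas from 3.7 every empty match is a split point too. The sketch below approximates the old behaviour by dropping empty matches before cutting. `legacy_split` is a hypothetical helper, intended only for patterns without capturing groups, and only a rough stand-in for the real pre-3.7 implementation.

```python
import re
from typing import List


def legacy_split(pattern: str, text: str) -> List[str]:
    """Hypothetical, rough emulation of pre-3.7 re.split: empty matches never split."""
    pieces = []
    prev = 0
    for m in re.finditer(pattern, text):
        if m.start() == m.end():
            continue  # old behaviour: an empty match is not a split point
        pieces.append(text[prev:m.start()])
        prev = m.end()
    pieces.append(text[prev:])
    return pieces


text = '...words...'
print(legacy_split(r'\W*', text))  # ['', 'words', '']
print(re.split(r'\W*', text))      # on 3.7+: ['', '', 'w', 'o', 'r', 'd', 's', '', '']
```

Code that relied on the first result silently gets the second after upgrading, which is the kind of breakage the two-release warning period was meant to surface.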