The decimal module has a mature, well-supported and complete C implementation based on the libmpdec library, thanks to Stefan Krah. (The missing parts are some new formatting options, for which we fall back to the pure-Python version.) We are going to remove the bundled copy of libmpdec in the 3.16 release.
Should we now preserve the pure-Python implementation?
It’s rather uncommon for other cases where we have C-coded versions. The bug tracker is also full of issues (mostly closed as wontfix) where people mention differences between the two implementations of the decimal module. Just a few examples:
The source of this issue is obvious: decimal is a big module; _pydecimal.py is now 6389 LOC, which is comparable with the C wrapper to libmpdec, _decimal.c. (In fact, it’s the biggest top-level module in Lib.)
I don’t think that _pydecimal.py has any practical use on systems where the C-coded extension is available, which now means all Tier 1 platforms. It’s used on WASI, but honestly I don’t think that the decimal module is something that must be present on every system.
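For readers following along, here is a quick way to check which implementation a given interpreter actually loaded. This is a sketch; it relies on the fact that only the C accelerator defines `__libmpdec_version__`:

```python
import decimal

# Only the C implementation (_decimal) exposes __libmpdec_version__;
# the pure-Python fallback (_pydecimal) does not, so this attribute is
# a reasonable way to tell which one was imported.
if hasattr(decimal, "__libmpdec_version__"):
    print("decimal is backed by libmpdec", decimal.__libmpdec_version__)
else:
    print("decimal is the pure-Python fallback (_pydecimal)")
```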
This code is also used by some other Python implementations (PyPy, maybe more); that’s one reason I’ve found to keep this pure-Python module in the CPython tree (see PEP 399). Are there other reasons for this maintenance burden? E.g. I’m not sure this module is helpful for teaching (I would rather use the Fraction class as an example of a numerical type).
I don’t have much involvement in decimal, so count me only -0 against removal of the Python version. I expect we will see WASI/Emscripten move to Tier 1 someday, and I’m not sure I agree with the argument for removal just because those platforms are not yet Tier 1.
The other point worth noting beyond the PEP-399 rationale is that Python sources are much more easily read and introspected from regular installations — even when using _datetime, pydatetime has been useful in the past to quickly verify how a function works.
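For example, the pure-Python module can be introspected with `inspect`, while the C version has no Python source to show. A small sketch assuming the usual CPython layout:

```python
import inspect
import _pydecimal

# The pure-Python implementation is readable straight from an installed
# Python: this prints the actual source of Decimal.compare.
print(inspect.getsource(_pydecimal.Decimal.compare)[:120])

# The C-implemented method has no Python source, so inspect raises
# (TypeError for built-ins, when the accelerator is loaded).
import decimal
try:
    inspect.getsource(decimal.Decimal.compare)
except (TypeError, OSError):
    print("C implementation: no Python source available")
```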
The argument here is that not all stdlib modules are available on all platforms. IMO, it’s fine for the decimal module.
Thanks, this PEP should be mentioned. Though I think its main argument is the existence of other Python implementations (and that was mentioned).
The pure Python version isn’t only there for platform support. It’s also there so that interpreter implementations that don’t support the C extension modules can use the pure Python version instead.
Edit: I missed the last paragraph of the OP on initial reading. Centralising the support burden of the pure Python versions is an intentional policy decision, since we want to lower barriers to experimentation with new interpreter implementations, not raise them.
I, for one, feel very CPython-biased here in wishing we didn’t have this maintenance burden of parallel implementations and could revisit PEP 399-like decisions. It made sense in the community at the time. But I don’t have the visibility outside our project into whether it still does, so I don’t feel great making such a decision.
A pile of standard questions arise in my mind:
What parts of PEP 399 still make sense? Maintenance is not free.
What other Python implementations that use any of these are actually relevant today?
Which of these pure Python module implementations do said VMs actually use in a way that actually matters to the other-VM users? (a support matrix, including how up to date each is with the language version)
Could the other VMs not take on the follow-to-match-cpython-behavior maintenance of each of these in a collective separate repo?
Why should that maintenance burden be ours to bear?
This doesn’t necessarily need a blanket PEP 399 rewrite; I think the future of each of these pure Python sibling implementations (there aren’t that many) could be decided on a case-by-case basis.
That’s primarily a timing issue (PEP 399 wasn’t imposed retroactively on existing modules), but exceptions do get granted (especially for modules that are just thin wrappers around third party libraries).
The big ecosystem level benefit of CPython taking on the maintenance burden is that the dual execution of the test suite means that new features are added to both implementations in parallel rather than the pure Python one lagging behind as other implementations play catch up. I’m not sure if there’s anything other than the decimal module that is currently significantly impacted by the policy, though.
And if stalemate wins (as I suspect) in the decimal case, I think we should solve some issues to reduce maintenance load.
For instance, how do we sync interfaces and docstrings across the different implementations? Preferably there should be just one source of this data (sometimes it does make sense to have different function signatures and/or different documentation for the C and Python implementations, but I believe such cases are rather rare).
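As a rough illustration, a hypothetical consistency check (not existing CPython code) could flag docstring drift between the two modules:

```python
import decimal as c_mod
import _pydecimal as py_mod

# Compare docstrings on the names the two modules share; any mismatch is
# a candidate for the "separate copies" drift discussed above.
common = set(dir(c_mod)) & set(dir(py_mod))
drifted = [
    name for name in sorted(common)
    if getattr(getattr(c_mod, name), "__doc__", None)
    != getattr(getattr(py_mod, name), "__doc__", None)
]
print(f"{len(drifted)} of {len(common)} shared names have differing docstrings")
```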
Well, maybe. This argument is natural and came to my mind before reading the PEP.
But… it’s easy to make a CFFI wrapper for libmpdec on PyPy, and I’d guess such a decimal module would be more performant. Though it’s missing, due to the existence of _pydecimal.py in the CPython tree…
I fundamentally disagree with the argument put forward in this paragraph.
“Tier 1” is not for choosing whether features are available or not. It’s for whether we block release due to known issues. Only if every platform in every tier considered mpdec the de facto standard library for this (as they do for math and cmath) would I consider removing the pure-Python version. One exception in our own support matrix is enough to justify keeping the portable version.
There is no system libmpdec on Ubuntu 24.04. I didn’t notice it for half a year.
The pure Python implementation uses a more efficient algorithm for some operations (e.g. Decimal-to-string conversion).
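Whatever algorithm each side uses internally, the observable output must match. A quick cross-check, sketched under the assumption that both modules are importable:

```python
import decimal
import _pydecimal

# Same division at 200 digits of precision in each implementation; the
# conversion algorithms may differ, but str() output must be identical.
ctx_c = decimal.Context(prec=200)
ctx_p = _pydecimal.Context(prec=200)
s_c = str(ctx_c.divide(decimal.Decimal(1), decimal.Decimal(7)))
s_p = str(ctx_p.divide(_pydecimal.Decimal(1), _pydecimal.Decimal(7)))
assert s_c == s_p
```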
If there are two implementations, the Python implementation is usually primary. The C implementation can miss some corner cases and lack support for some features of the Python implementation (there are many examples for pickle, ElementTree, datetime, etc. that I have worked with over the past year). When you add a new feature, it’s convenient to implement it in Python, write tests, and then implement it in C.
This seems to be the case for _decimal, though the lack of features is limited (some formatting stuff). In this case we should try to push the missing code to the C version. (Ditto if some pure-Python algorithm is better, which I doubt.)
BTW, I think that removal of _pydecimal is out of the question, per this thread. But we should think about how to reduce the maintenance cost, e.g. for docstrings (right now we have separate copies for the C and pure-Python versions).
Stefan Krah used _pydecimal.py to prototype heroic efforts at making Decimal.pow() always return the infinitely precise result when it was representable in the current context. Last I looked (about a year ago?), libmpdec still settled for “really close, but maybe not exact in all possible cases”. I don’t know that anyone is relying on the Python version’s stronger guarantees, though.
“Push missing code to the C version” in this case means pushing into C code we don’t control (libmpdec is external to us, of course). We have a C wrapper, but in his days here Stefan also maintained that. We’ve fiddled a bit with it since on our own, but I don’t think anyone stepped up to fill the void of “deep knowledge” about it.
I agree. decimal became part of the core language before libmpdec existed, and it would be foolhardy to rely on its seemingly sole developer to keep libmpdec available, and especially given the - umm - prickly relationship he had with our community.
I’ve spent a few days overall myself wrestling with _pydecimal.py issues, but the bright side is that the time I spend on that is time I don’t spend annoying people with posts here.
My memory of that was faulty. Achieving just that much is satisfied by any implementation that “merely” guarantees strictly less than 1 ULP error.
_pydecimal.py is much stronger than that: it guarantees correct rounding in all cases, under the full meaning of “correct” for nearest/even rounding: <= 0.5 ULP error, with ties rounded to even. It’s impossible to do better than that “even in theory”. In fact, the Python docs claim even more:
power(x, y, modulo=None)
Return x to the power of y, reduced modulo modulo if given.
…
The result will be inexact unless y is integral and the result is finite and can be expressed exactly in ‘precision’ digits. The rounding mode of the context is used. Results are always correctly rounded in the Python version.
That is, the Python version also intends to honor the rounding mode in effect, and to trigger the “inexact” flag only when the delivered result does suffer a rounding error.
Very ambitious; hard to achieve, requiring both careful code and careful numerical analysis informing the code.
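The documented behaviour can be exercised directly against the pure-Python version. A small sketch (`Context`, `power` and the `Inexact` signal are all part of the public decimal API):

```python
from _pydecimal import Context, Decimal, Inexact

ctx = Context(prec=28)

# 2**10 = 1024 is exactly representable in 28 digits, so per the docs the
# result is exact and the Inexact flag stays clear.
ctx.clear_flags()
exact = ctx.power(Decimal(2), Decimal(10))
assert exact == Decimal(1024)
assert not ctx.flags[Inexact]

# 2**0.5 = sqrt(2) cannot be represented exactly, so the (correctly
# rounded) result must raise the Inexact flag.
ctx.clear_flags()
ctx.power(Decimal(2), Decimal("0.5"))
assert ctx.flags[Inexact]
```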