To experiment with faster float-to-string conversions there is now the floatium package. The package makes repr(float), str(float), f"{x:.3f}" and similar conversions much faster (and also speeds up float(str) parsing). Install it with pip install floatium. Benchmarks on CPython 3.14.3:
| Corpus | Operation | Stock (ns) | floatium (ns) | Speedup |
|---|---|---|---|---|
| random_uniform | repr(x) | 284 | 96 | 2.95× |
| random_uniform | f"{x:.4f}" | 119 | 103 | 1.16× |
| random_uniform | float(s) | 121 | 44 | 2.79× |
| random_bits | repr(x) | 820 | 134 | 6.11× |
| random_bits | f"{x:.4f}" | 1,933 | 196 | 9.86× |
| random_bits | float(s) | 275 | 61 | 4.52× |
| financial | repr(x) | 171 | 80 | 2.14× |
| financial | f"{x:.4f}" | 145 | 101 | 1.43× |
| financial | float(s) | 37 | 36 | 1.01× |
| scientific | repr(x) | 640 | 135 | 4.74× |
| scientific | f"{x:.4f}" | 1,081 | 161 | 6.71× |
| scientific | float(s) | 212 | 58 | 3.64× |
| integer_valued | repr(x) | 143 | 88 | 1.62× |
| integer_valued | f"{x:.4f}" | 169 | 106 | 1.60× |
| integer_valued | float(s) | 43 | 42 | 1.02× |
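For anyone who wants to reproduce numbers of this kind locally, here is a minimal timing sketch that uses only the standard library. The two corpus generators are an assumption about what names like random_uniform and random_bits mean; they are not floatium's benchmark code, and the script does not touch floatium itself. The idea is to run it once on stock CPython and once with the package active, then compare.

```python
import math
import random
import struct
import timeit


def random_uniform(n, seed=0):
    """Assumed corpus: floats drawn uniformly from [0, 1)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def random_bits(n, seed=0):
    """Assumed corpus: floats built from random 64-bit patterns (finite only)."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        (x,) = struct.unpack("<d", struct.pack("<Q", rng.getrandbits(64)))
        if math.isfinite(x):  # skip NaN and infinities
            out.append(x)
    return out


def ns_per_call(func, values, repeat=5):
    """Best-of-`repeat` time per call, in nanoseconds (includes loop overhead)."""
    def run():
        for v in values:
            func(v)
    best = min(timeit.repeat(run, number=1, repeat=repeat))
    return best / len(values) * 1e9


def bench(values):
    """Time the three operations from the table on one corpus."""
    strings = [repr(v) for v in values]
    return {
        "repr(x)": ns_per_call(repr, values),
        'f"{x:.4f}"': ns_per_call(lambda v: f"{v:.4f}", values),
        "float(s)": ns_per_call(float, strings),
    }


if __name__ == "__main__":
    corpora = {
        "random_uniform": random_uniform(10_000),
        "random_bits": random_bits(10_000),
    }
    for name, corpus in corpora.items():
        for op, ns in bench(corpus).items():
            print(f"{name:15s} {op:12s} {ns:8.0f} ns")
```

The absolute numbers include Python-level loop overhead, so only the ratio between two runs on the same machine is meaningful.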
The package should be fully compatible with CPython.
The microbenchmarks show significant improvements, but does this have an impact on real-world cases? If so, please report it (on GitHub, here, or via DM).
Please report any bugs or differences on the GitHub issue tracker. We will use them to improve both the CPython unit tests and the package.
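One cheap way to look for differences is a round-trip check over random bit patterns. The sketch below is an assumption about what a useful self-check might look like, not part of floatium's test suite, and it calls no floatium API: run it in a process with the package active and in one without.

```python
import math
import random
import struct


def random_finite_floats(n, seed=0):
    """Yield `n` finite floats built from random 64-bit patterns."""
    rng = random.Random(seed)
    count = 0
    while count < n:
        (x,) = struct.unpack("<d", struct.pack("<Q", rng.getrandbits(64)))
        if math.isfinite(x):
            count += 1
            yield x


def same_bits(a, b):
    """Bit-exact comparison, so that 0.0 and -0.0 count as different."""
    return struct.pack("<d", a) == struct.pack("<d", b)


def round_trip_failures(values):
    """Return the values whose repr() or str() does not parse back exactly."""
    bad = []
    for x in values:
        if not same_bits(float(repr(x)), x) or not same_bits(float(str(x)), x):
            bad.append(x)
    return bad


if __name__ == "__main__":
    failures = round_trip_failures(random_finite_floats(100_000))
    print(f"{len(failures)} round-trip failures")
```

A round-trip check only shows that the output parses back to the same value. To check that the strings are identical to stock CPython, the repr() outputs from the two processes would also have to be compared directly.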
Will this be in CPython? That will probably take some time. The backends implemented in floatium are either C++ (currently not supported in CPython) or slightly modified versions of the upstream packages (for performance reasons). For inclusion in CPython we probably need a fast, fully compliant, unmodified, C-based implementation that replaces dtoa.c.
Those benchmarks are for the C++ backends, right? What would the numbers be for the C libraries?
Have you tried to run the whole CPython test suite with this?
(1), the C++ restriction, is relevant only if we bundle the libraries in the CPython source tree (with configure options to override this). In principle, though, we could require such libraries from the system, as we now do for libmpdec. In that case the language doesn’t matter. Maybe we shouldn’t exclude that possibility?
(2), the modifications, looks more serious to me. Have those modifications been discussed with upstream?
IIUC, in both cases (C or C++) we would end up with at least two required external dependencies, right?
If you’re going down that route, there’s a rust port of the current state-of-the-art (faster than Ryu and everything since then) that would be a good candidate.
Not sure if it qualifies, but you can check out copium, a full re-implementation of deepcopy(), initially written in C, then rewritten in Rust with zero performance cost.
I haven’t had time to polish it for a proper announcement yet (the config/patching API may change, and the FFI-boundary abstractions are to be redesigned), so please consider it unstable for now. That said, feedback is welcome!
Full benchmarks for the backends are at floatium/BENCHMARKS.md at main · eendebakpt/floatium · GitHub. Unmodified Ryu (C version) is not one of the backends, but it performs well (except for a few minor regressions, which are handled in the modifications).
Yes, locally. With the latest version of the package it passes.
It is possible. But with an external library that is not guaranteed on all platforms we would have to keep the current dtoa.c in place, which I would prefer not to.
Not yet, but I plan to at some point.
Yes, one for float-to-str and one for str-to-float. Replacing those in CPython can be done independently (although both conversions share the same BigInt code from dtoa.c, so it makes sense to combine them).
I also would prefer to drop custom dtoa.c code. But I’m not sure if we have to provide a fallback path (current “legacy repr” code also could be used for this, in principle).
That’s unfortunate. Maybe you can try more libraries, perhaps written in Rust?
How portable are those libraries? Can we remove “legacy repr” and always rely on their code?
If a platform is currently using dtoa.c (“short float repr”), I don’t think we should switch it to the legacy repr mode as part of this change. That would be extremely unpopular.
And ideally if we’re changing things we could take the number of platforms using the legacy code to zero and delete it. I don’t know if that’s realistic.
I experimented with using a modern float to string algorithm in cpython last month, wondering if there were any quick wins. While it was easy to make microbenchmarks impressively faster, I was unable to see any worthwhile difference in the pyperformance test suite, so I tabled it for the time being as not being worth the volume of code required.
The reality is… how often does Python code really generate strings from floats in sufficient quantity for this to be noticed by real world applications?
Things serializing mass quantities of floats… presumably use another data format? Or, if they’re emitting JSON full of decimal floats, they can use a library like ujson, which is known to use best-in-class float conversions both ways.
My own focus was on float to str, but it is also worthwhile doing str to float in a more modern manner. But we shouldn’t have it add a huge pile of code unless it is actually meaningful. pyperformance may be lacking a real world application where it would be.
I gave it a try in pylint for the scientific notation checker. It calls float() on every float literal and builds suggestions with repr(), str() and format() on big floats, so it looked like a natural candidate. It’s not so heavy in float/str conversion after all.
On the raw primitives I do reproduce a speedup: float(str): 1.29x, repr(float): 1.50x, str(float): 1.63x, f"{x:.15g}": 1.61x. It seems the longer the float, the greater the speedup, but the floats in my corpus generally had 6 to 8 significant digits.
I measured roughly 15 to 21 ms of added interpreter startup from the import and autopatch. It’s free for a long-lived process, but for a short-lived CLI like a linter (where you might lint one file with 10 floats, or 439 floats for the whole CPython project) it outweighs the conversion savings. The startup time would need to be less than 5 ms to be worth it for 10^6 linted floats.
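To get a feel for how many float literals a code base actually contains, something like the following can be run over a source file. It is a rough sketch built on the standard ast module and is an assumption about how one might count them; it is not how pylint's checker does it.

```python
import ast


def float_literals(source):
    """Return the values of all float literals found in Python source text."""
    tree = ast.parse(source)
    return [
        node.value
        for node in ast.walk(tree)
        if isinstance(node, ast.Constant) and isinstance(node.value, float)
    ]


if __name__ == "__main__":
    sample = "x = 1.5\ny = 2\nz = 1e3\n"
    print(len(float_literals(sample)), "float literals")
```

Integer and complex literals are ignored, and a negative literal such as -1.5 is counted once, because the parser sees it as a unary minus applied to 1.5.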
Thanks for reporting, this is valuable feedback. Most of the import time was due to floatium importing packages like typing and contextlib (these packages are typically avoided by CLI apps). I created a new release where the import time is < 2 ms.
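Import cost like this can be measured by timing the import in a fresh interpreter. The helper below is a sketch with a hypothetical name (import_time_ms) and uses json purely as a stand-in module; CPython's built-in `python -X importtime` option gives a per-module breakdown of the same thing.

```python
import subprocess
import sys

# Code run in the child process: time a single import statement.
_SNIPPET = (
    "import time\n"
    "t0 = time.perf_counter()\n"
    "import {module}\n"
    "print(time.perf_counter() - t0)\n"
)


def import_time_ms(module, runs=5):
    """Best-of-`runs` wall time to import `module` in a fresh process, in ms."""
    best = float("inf")
    for _ in range(runs):
        out = subprocess.run(
            [sys.executable, "-c", _SNIPPET.format(module=module)],
            check=True,
            capture_output=True,
            text=True,
        ).stdout
        best = min(best, float(out) * 1000.0)
    return best


if __name__ == "__main__":
    print(f"json: {import_time_ms('json'):.2f} ms")
```

Modules that the interpreter already loads at startup will report close to zero, so the figure is only meaningful for modules that are not imported by default.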
I would very much like to see one more benchmark, one that’s a proxy for cache thrashing.
Python programs typically spend relatively little time formatting floats. Thus it’s very important to test relative speedup in a complex program that does a certain (smallish) fraction of float ↔ str conversion.
As a thought experiment, consider a library that performs the conversion using huge lookup tables. It could be faster in throughput (converting many numbers) but not latency (competing for cache with the Python interpreter).
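A very rough shape for such a benchmark might be a loop that mostly does ordinary interpreter work and only occasionally converts a float, timed end to end. Everything in this sketch (the function names, the dict workload, the 1-in-20 ratio) is an assumption chosen for illustration, and it is at best a crude proxy for real cache pressure.

```python
import random
import time


def mixed_workload(values, float_every=20, rounds=20_000):
    """Do dict updates on every step and one repr(float) every `float_every` steps."""
    table = {}
    out = []
    n = len(values)
    for i in range(rounds):
        # Scatter keys over a 4096-entry table to keep the interpreter busy.
        key = (i * 2654435761) % 4096
        table[key] = table.get(key, 0) + i
        if i % float_every == 0:
            out.append(repr(values[i % n]))
    return len(out), len(table)


def time_workload(values, **kwargs):
    """Return (elapsed seconds, workload result) for one run."""
    t0 = time.perf_counter()
    result = mixed_workload(values, **kwargs)
    return time.perf_counter() - t0, result


if __name__ == "__main__":
    rng = random.Random(0)
    floats = [rng.random() * 10 ** rng.randint(-10, 10) for _ in range(1000)]
    elapsed, (conversions, _) = time_workload(floats)
    print(f"{conversions} conversions in {elapsed * 1e3:.2f} ms total")
```

Comparing the total time with and without a faster conversion backend shows the speedup as a program would see it, rather than the throughput of a tight conversion loop.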
Nice, I now see about 2.7 ms of overhead. I tried it on astropy (970 files, 19,936 float literals, the floatiest repo I could find). Stringifying every astropy literal once costs 4.4 ms without floatium vs 2.8 ms with floatium (over a 1,300-second lint run). It’s also interesting that around 80% of the floats in astropy are small (0.5 / 1.0), and CPython is already very fast for those. The output using floatium is exactly the same, though. I didn’t find any mismatches.
Considering it’s +1.1 ms with floatium (2.7 ms of startup + 2.8 ms vs 4.4 ms) in the best-case linting scenario (the whole repo at once, and a very float-heavy one), the only realistic way this can be used in pylint is if the algorithm is merged into CPython directly, imo.
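The arithmetic behind that conclusion can be written out as a small break-even calculation. The function names are hypothetical, and the model assumes that each literal is stringified exactly once and that the saving per float is constant; the inputs are the astropy figures quoted above.

```python
def net_cost_ms(startup_ms, stock_ms, patched_ms):
    """Extra wall time from using the faster backend (negative means a win)."""
    return startup_ms + patched_ms - stock_ms


def break_even_floats(startup_ms, stock_ms, patched_ms, n_floats):
    """Number of conversions needed before the startup cost is paid back."""
    saved_per_float_ms = (stock_ms - patched_ms) / n_floats
    return startup_ms / saved_per_float_ms


if __name__ == "__main__":
    # 2.7 ms startup, 4.4 ms stock vs 2.8 ms patched over 19,936 literals.
    print(f"net cost: {net_cost_ms(2.7, 4.4, 2.8):+.1f} ms")
    print(f"break-even: {break_even_floats(2.7, 4.4, 2.8, 19936):,.0f} floats")
```

With these inputs the saving is about 80 ns per float, so the import only pays for itself after roughly 33,600 conversions in a single process, which is more than even the astropy run performs.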