Accelerator for Ascii85/Base85

The base64 module supports not only the standard encodings Base 16, Base 32 and Base 64 specified in RFC 4648, but also the non-standard encodings Ascii85, Base85 and Z85. The latter are implemented in Python. While the Python code was highly optimized without loss of maintainability, it is still 100 times slower and uses 50 times more memory than the encodings implemented in C. There is a PR for implementing these functions in C.
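For reference, these are the pure-Python functions in question, as exposed by the base64 module (a quick round-trip demo; no expected output claimed here since the encoded bytes depend on the alphabet of each variant):

```python
import base64

data = b"Hello, world!"

# Ascii85, the variant used e.g. in Adobe PostScript/PDF.
a85 = base64.a85encode(data)

# Base85 with the RFC 1924 alphabet, used e.g. by git binary patches.
b85 = base64.b85encode(data)

# Both round-trip back to the original input.
assert base64.a85decode(a85) == data
assert base64.b85decode(b85) == data
```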

The question is: should we keep the Python code? binascii is a non-optional dependency of base64; any CPython build must have binascii for base64 to work. Alternative Python implementations, if they use base64.py, must implement binascii. Should we provide a Python fallback for the case where they don't implement Ascii85/Base85 in binascii (this code is more complex than for Base 64), or should they just copy the Python code from an older version? Supporting both implementations has a non-trivial cost, so we have to weigh whether it's worth doing.

If binascii is already a non-optional dependency for the more common Base 64 encoding, and no Python fallback was needed there, it doesn't seem worth the hassle to provide a pure Python fallback for the less common encodings.


Agreed, I'd drop the old Python code.


Wouldn't that be against the spirit of PEP 399?

We don't have to have a pure Python version of every stdlib extension module, do we? Only if it can't be supported on all platforms. This seems like simple enough C code that any C compiler can handle it.

Why do we care that it’s slow? Has any actual user ever complained?

The complaint was not only that it is slow (100x), but that it consumes an enormous amount of memory (50x) compared with Base 64.

When these functions were added in Issue 17618: base85 encoding - Python tracker, @pitrou initially planned to add both Python and C implementations (and to create a Python implementation of binascii), but after we finished the Python implementation, we never returned to the idea of a C implementation. Mercurial has a C accelerator for their implementation.


I see. Then by all means go ahead!

If there's already a binascii.py and a _binascii.c, I would follow that pattern.

The “enormous amount of memory” part can probably be fixed even in pure Python, no?

Yes. I have a patch that does this, using bytearray objects as buffers. But it does not help with performance; it makes it slightly worse. We could also split the large input into parts, encode each part in a performance-efficient way, and then concatenate the results (this is more complicated: it requires corrections for the prefix, suffix and line wrapping). But there is already a C implementation that is 100 times faster with zero memory overhead, so I stopped trying to optimize the Python implementation.
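The split-and-concatenate idea is simplest for plain b85encode, which has no prefix, suffix or line wrapping: each 4-byte input group maps independently to 5 output characters, so splitting on 4-byte boundaries gives the same result as one big encode. A minimal sketch (the function name and chunk size are hypothetical, not from any patch; Ascii85 with adobe=True or wrapcol would need the corrections mentioned above):

```python
import base64

def b85encode_chunked(data, chunk_size=4096):
    """Encode in fixed-size chunks to bound peak memory.

    chunk_size must be a multiple of 4: Base85 encodes each 4-byte
    group independently, so chunking on 4-byte boundaries produces
    output identical to encoding the whole input at once.
    """
    assert chunk_size % 4 == 0
    out = bytearray()
    for i in range(0, len(data), chunk_size):
        out += base64.b85encode(data[i:i + chunk_size])
    return bytes(out)

payload = b"some large payload " * 1000
assert b85encode_chunked(payload) == base64.b85encode(payload)
```

This caps the temporary buffers at roughly the chunk size instead of the whole input, though per-chunk call overhead means it trades a little speed for the memory bound.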
