struct PyLongWriter
A Python int writer instance.
The instance must be destroyed by PyLongWriter_Finish().

Is it correct? It’s not obvious to me from the PEP’s text when PyLongWriter_Discard should and should not be called.
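For reference, here is how I read the writer lifecycle from the PEP (a minimal sketch: the Create/Finish/Discard signatures are from PEP 757, the error check is a placeholder, and the 4-byte digit cast assumes current CPython’s 30-bit digits as reported by PyLong_GetNativeLayout()):

void *digits;
PyLongWriter *writer = PyLongWriter_Create(0, 1, &digits);  /* sign=0, 1 digit */
if (writer == NULL) {
    return NULL;
}
((uint32_t *)digits)[0] = 123;       /* fill the single digit */
if (0 /* placeholder: error detected while filling digits */) {
    PyLongWriter_Discard(writer);    /* abandon; Finish must not also be called */
    return NULL;
}
return PyLongWriter_Finish(writer);  /* consumes the writer, returns the int */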
Do you mean that this PyLong_Export will fail with small ints? That doesn’t sound easier for the user:

With both digits_array and value:
- PyLong_Export: use digits_array, or use compact value data, or fail
- PyLong_AsNativeBytes: use the result or fail

With only digits_array:
- PyLong_AsLongAndOverflow: use the result, or continue
- PyLong_Export: use digits_array, or fail, or continue
- PyLong_AsNativeBytes: use the result or fail

Conceptually, IMO it’s better for PyLong_Export to give you the internal representation of an int, rather than only do that for “big enough” ints – which suggests PyLong_Export itself should be a fallback for some other strategy.
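To make the first flow concrete, here is a rough sketch of the “both fields” calling pattern, using the PyLongExport struct as the PEP defines it (use_compact and use_digits are hypothetical helpers standing in for the consumer’s code):

PyLongExport export_long;
if (PyLong_Export(obj, &export_long) < 0) {
    return NULL;                       /* or fall back to PyLong_AsNativeBytes() */
}
if (export_long.digits == NULL) {
    use_compact(export_long.value);    /* small int: the int64_t value is valid */
}
else {
    use_digits(export_long.negative, export_long.ndigits, export_long.digits);
}
PyLong_FreeExport(&export_long);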
Please don’t use enum in a public API. The size of an enum is rather hard to get right without a C compiler, and the API should be usable in non-C languages too. Use an int here, with:

#define PyLong_ExportType_Error -1
#define PyLong_ExportType_DigitArray 0
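A short illustration of the concern; the signature below is only a sketch of how the defines would be used, not the PEP’s final wording:

/* An enum's underlying type is implementation-defined, so its size may vary
   across compilers; a plain int plus #defines has a fixed, known ABI. */
int PyLong_Export(PyObject *obj, PyLongExport *export_long);
/* would return PyLong_ExportType_DigitArray (0) on success and
   PyLong_ExportType_Error (-1) on failure */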
That’s optional. E.g. current gmpy2 code just uses ob_digit.
But we don’t yet have a really different representation for small integers. As you can see:
static inline Py_ssize_t
PyUnstable_Long_CompactValue(const PyLongObject *op)
{
    Py_ssize_t sign;
    /* The low bits of lv_tag encode the sign: 0 = positive, 1 = zero,
       2 = negative, so this yields +1, 0 or -1. */
    sign = 1 - (op->long_value.lv_tag & _PyLong_SIGN_MASK);
    /* A compact int fits in a single digit. */
    return sign * (Py_ssize_t)op->long_value.ob_digit[0];
}
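For completeness, the matching guard (PyUnstable_Long_IsCompact is the corresponding check in CPython 3.12+; calling CompactValue on a non-compact int is not valid):

if (PyUnstable_Long_IsCompact((PyLongObject *)obj)) {
    Py_ssize_t small = PyUnstable_Long_CompactValue((PyLongObject *)obj);
    /* fast path: 'small' holds the entire value */
}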
So the proposed version, with just one export format, covers all integer ranges. Someday this might fail for small integers, and then we would have to introduce a new export format.
If that’s the case, gmpy2 should only need PyLong_Export – it should not need to use PyLong_AsLongAndOverflow at all. Otherwise, PyLong_Export fails its mission to give you the data in the fastest way we can (with long-term support).
I guess that’s one for performance benchmarks. Is skipping a PyLong_FreeExport call worth the overhead of an extra branch? (Plus all the other differences, but this seems like the main one.)
This being a rather specialized API, I don’t think ergonomics are an overriding priority, compared to extensibility (a switch with 2 existing cases is easier to extend if/when a new format comes up) and docs (with 2 cases they would IMO be clearer, even if they’re longer).

I would rather say that its mission is to provide maximally transparent access to an int’s internals, especially in the case of “big” values, where no public API exists yet. Currently it’s just an array of “digits”. I think we should create a new export format only when this simple picture is broken.
No, PyLong_AsLongAndOverflow provides some performance boost in the case when the value doesn’t fit in one digit but still fits in a C long.
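For concreteness, that fast path looks roughly like this (the error handling follows PyLong_AsLongAndOverflow’s documented contract):

int overflow;
long v = PyLong_AsLongAndOverflow(obj, &overflow);
if (overflow == 0 && !(v == -1 && PyErr_Occurred())) {
    /* fits in a C long, possibly spanning several 30-bit digits: fast path */
}
else {
    /* overflow (or error): fall back to the digits export */
}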
Probably it’s better to leave some micro-optimizations for small integers to users. E.g. one such optimization was previously rejected in gmpy2 (“Update int <-> mpz conversion”, aleaxit/gmpy PR #484 on GitHub).
I think it’s documented, no need to guess.
Hi,
I didn’t reply recently since I got overwhelmed by the discussion. I expected that the C API Working Group would take a decision, either accept or reject the PEP, rather than restarting the discussion (almost from scratch). The current PEP 757 design is the result of many months of work and tries to address most known issues. It’s not perfect, but it’s the best that Sergey and I could propose.
Sergey: I don’t understand why you agreed to add PyLongExport.value and now you are moving backwards and arguing against it. There are reasons why we added this member: int64_t is a nice trick to handle small integers, and the PyLong_Export() caller doesn’t have to attempt different APIs to check whether it’s a “small” or “compact” integer; it’s part of the API.

About preparing the structure for a potential future different export format: I really dislike this. I don’t want to waste memory until it’s needed (adding a reserved member).
member). If tomorrow PyLongExport
structure no longer fits, we should deprecate the PyLong_Export()
function and add a new one. It’s something very common in Python C API to deprecate functions and replace one function with another. Trying to make PyLong_Export()
“too future proof” would make it really hard to use in practice.
I has already been said, PyLong_Export() format relies on the “native layout” which is flexible enough to support any digits array. It sounds very unlikely that a new “bigint” implementation doesn’t fit with its design.
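For reference, the “native layout” is PEP 757’s PyLongLayout struct; a consumer reads it roughly like this (the values in the comment reflect current CPython and are an assumption, not a guarantee):

const PyLongLayout *layout = PyLong_GetNativeLayout();
/* e.g. on current CPython: bits_per_digit == 30, digit_size == 4,
   least significant digit stored first */
unsigned bits_per_digit = layout->bits_per_digit;
unsigned bytes_per_digit = layout->digit_size;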
I also dislike changing the PyLong_Export() return value to return a kind which can be a new kind in the future. It sounds very inconvenient to have to support future hypothetical formats without knowing them in advance. It was proposed to kill the process in this case: I dislike that. I don’t think that falling back on PyLong_AsNativeBytes() makes sense here. If you already support PyLong_AsNativeBytes(), just call it instead of PyLong_Export().
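For context, the PyLong_AsNativeBytes() fallback under discussion looks roughly like this (a flags value of -1 requests the documented native defaults):

int64_t buf;
Py_ssize_t needed = PyLong_AsNativeBytes(obj, &buf, sizeof(buf), -1);
if (needed < 0) {
    return NULL;               /* conversion error */
}
if ((size_t)needed <= sizeof(buf)) {
    /* value fit in 8 bytes; otherwise allocate 'needed' bytes and retry */
}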
Last but not least, @encukou wants to remove the “always succeeds if it’s an integer” sentence. If tomorrow Python internals evolve, maybe PyLong_Export() will have to allocate memory and so can fail. That’s why it’s paired with PyLong_FreeExport(). I’m open to removing this sentence; I created a PR for that, but @skirpichev doesn’t seem to embrace it yet (at least, he hasn’t approved it yet).

In short, I have spent enough time on PEP 757; its API is not perfect, but I don’t want to rewrite it anymore. If you have questions about why it’s designed this way, just ask. If you consider that the API is not good enough for your criteria, maybe we should just give up on the PEP 757 abstraction and leave the situation as it is: gmpy2 and SAGE use CPython internals and don’t support PyPy, and Python-FLINT uses an inefficient hex export (and so supports PyPy). These projects are working well with the current code, so maybe it’s not worth it to provide a “better” API.
I propose to only make a single change to the current PEP 757: remove the following sentence.

This function always succeeds if obj is a Python int object or a subclass.
No, I think we don’t want to have to introduce new CPython APIs each time an implementation detail changes. We would like third-party libraries to be able to get good performance while not changing their code every two years to accommodate internal CPython changes.
Also, ideally this API would end up in the stable ABI, which requires that it’s not too coupled to internal implementation details.
In general, I agree with you. But what do you think about my question on PyLongWriter_Discard?
I was thinking that this trick allows us to have both (1) the fixed PyLongExport structure and (2) a PyLong_Export() API that doesn’t fail.

But that simple picture was objected to. It seems that, to address @encukou’s criticism and save (2), PyLong_Export() must return some code for the export type we offer (for the given integer value). With new CPython releases, new export types might be introduced.
So, with this, there is now no real need to include anything other than the simple “array of digits” view. Currently, it’s an optimization that has some benefits and some drawbacks (which actually will depend on the bigint library used). Let’s offer just a simple view to our end users and let them do micro-optimizations for their workload.
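As an illustration of how a bigint library could consume that “array of digits” view, in the spirit of the PEP’s gmpy2 example (GMP’s mpz_import is real; export_long and layout are as in the sketches above, and z is an already initialized mpz_t):

mpz_import(z, export_long.ndigits,
           layout->digits_order,      /* -1 = least significant digit first */
           layout->digit_size,        /* bytes per digit */
           layout->digit_endianness,  /* -1 = little endian, 1 = big endian */
           layout->digit_size * 8 - layout->bits_per_digit,  /* unused "nail" bits */
           export_long.digits);
if (export_long.negative) {
    mpz_neg(z, z);
}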
@encukou wants to remove the “always succeeds if it’s an integer” sentence.

An export API that fails doesn’t make sense. But I thought that this objection of @encukou’s was gone with the addition of the export format type. Is that not true? See the PR referenced above.
Like the Unicode export, this isn’t an export API, it’s an efficient access to internals API. If we can’t provide efficient access, we should fail so that you can go use an API that will succeed.
We have hundreds[1] of ways to reliably export from a Python int, but none with anything approaching guaranteed performance characteristics. This API can guarantee that the time taken does not scale with the value, but the tradeoff is that if we can’t guarantee that, we have to fail rather than silently doing it the slow way.
If you just want a successful export, use AsNativeBytes, but it might be slow for some values.
If you want a fast export, use Export, but it might fail for some values (at some point in the future).
(FWIW, I agree with expanding the return value, provided we require callers actually use the value now. So the switch () case Py_Long_Specific_Macro example is fine, but a >= 0 check is not. That way we can keep the ABI stable, change the format, and not break anyone as long as we don’t expand the total size of the union.)
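A sketch of the distinction being drawn (PyLong_Export_DigitArray is a placeholder kind name, per the post):

switch (PyLong_Export(obj, &export_long)) {
    case PyLong_Export_DigitArray:
        /* a format this code knows: safe to interpret the data */
        break;
    case -1:
        return NULL;  /* error */
    default:
        /* an unknown future format: fall back or fail, never guess */
        break;
}
/* By contrast, checking `PyLong_Export(obj, &export_long) >= 0` would
   silently accept future formats and misread their data. */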
[1] Okay, maybe like 15.
I lost track of this for some time during the discussions above but I’ve just read through the PEP and PR again.
The API as shown looks good to me.
We made this change: PyLong_Export() can now fail.