Should `Py_Is` be preferred to `==`?

dg-pb · December 15, 2024, 11:49am

I had a brief conversation, which I come back to from time to time.

In short, should using Py_Is and other similar macros/functions be strongly preferred to their explicit variants?

I see a couple of reasons why this would be better practice (as opposed to explicitly typing out the contents of the macro):

“Plenty of people learn the internals from reading the CPython source, so it would be nice to make it easier on them :)” - @ZeroIntensity
There is always a possibility of new additions that would need to break the equivalence of == and Py_Is, such as: Backquotes for deferred expression. Adhering to C API standards would therefore make things smoother if there are unexpected turns.

I am not suggesting modifying existing code.

But I think it might be worth considering using Py_Is instead of == going forward, in the same spirit as using Py_TYPE instead of its explicit variant.
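To make the comparison concrete, here is a minimal sketch that compiles without Python.h. The names object, my_is and my_type are hypothetical stand-ins for PyObject, Py_Is and Py_TYPE; the only fact taken from CPython is that Py_Is(x, y) currently amounts to a plain pointer comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PyObject so the sketch compiles without Python.h. */
typedef struct object {
    size_t refcnt;
    struct object *type;
} object;

/* Mirrors what Py_Is(x, y) does on CPython today: a pointer comparison. */
static inline int my_is(const object *x, const object *y)
{
    return x == y;
}

/* Accessor instead of explicit field access, analogous to Py_TYPE(ob)
   versus reading the type field directly. */
static inline object *my_type(const object *ob)
{
    assert(ob != NULL);
    return ob->type;
}
```

In both cases the accessor compiles to the same thing as the explicit form; what differs is that the intent is named and the implementation can change behind it.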

Are there any other similar cases?

Nineteendo · December 15, 2024, 3:38pm

We’re talking about C here, where == on two pointers means they hold the same address.

ZeroIntensity · December 15, 2024, 3:39pm

I don’t think we could ever do that for compatibility reasons, but Py_Is seems more clear to me.

storchaka · December 15, 2024, 5:40pm

I consider Py_Is stillborn, like the iso646.h macros or trigraphs. A change that makes Py_Is and == non-equivalent would break not only CPython code but every Python extension. It would also likely require changing assignment and comparison with NULL, so PyObject *x = foo() and if (x) would need to be rewritten with some new macros. This is such a large breaking change that it is equivalent to rewriting all extensions from C in another programming language. It would kill the Python ecosystem.

dg-pb · December 15, 2024, 7:19pm

Thanks, I have updated my estimate of the probability of such an event occurring in the next 20 years from 1% to 0.01%…

Then, this is the main reason.

It would provide a clearer separation between:

if (Py_Is(py_obj1, py_obj2)) {}
// and
if (c_obj1 == c_obj2) {}
if (py_obj_maybe_null == NULL) {}

Can Py_Is be deemed slightly better practice than == when checking the identity of Python objects?

guido · December 15, 2024, 7:31pm

I don’t see why. It is clumsy to type, and has no benefits over ==. I wholly agree with Serhiy.

ZeroIntensity · December 15, 2024, 9:02pm

The main case where I see it as better is when you’re rewriting Python code in C. It’s more clear in that case that you actually want is and not equality, but otherwise I use ==.

erlendaasland · December 17, 2024, 8:30am

FTR, here’s the original issue and PR:

Issue: gh-87919
PR: gh-25227

The APIs [1] are targeted at extension module developers.

Carl Friedrich Bolz-Tereick left the following comment on the original issue:

Just chiming in to say that for PyPy this API would be extremely useful, because PyPy’s “is” is not implementable with a pointer comparison on the C level (due to unboxing we need to compare integers, floats, etc by value). Right now, C extension code that compares pointers is subtly broken and cannot be fixed by us.
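A minimal sketch of the situation described in that comment, assuming a toy interpreter in which integers are boxed on demand. The types and names (box, box_is) are hypothetical and do not reflect PyPy's real object layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical boxed value handed to C code; not PyPy's real layout. */
typedef enum { KIND_INT, KIND_OTHER } kind;

typedef struct {
    kind k;
    long int_value;   /* meaningful only when k == KIND_INT */
} box;

/* Identity as this toy interpreter defines it: integers compare by
   value, everything else by address. */
static bool box_is(const box *x, const box *y)
{
    assert(x != NULL && y != NULL);
    if (x->k == KIND_INT && y->k == KIND_INT) {
        return x->int_value == y->int_value;
    }
    return x == y;
}
```

Two boxes created separately for the same integer have different addresses, so x == y is false in C while box_is(x, y) is true. An extension that compares the pointers directly gets the wrong answer, and the interpreter has no hook to correct it.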

I’m aligned with Peter’s remark that extension module developers often look to the CPython repo for inspiration. I think it would be worth recommending these (and similar APIs that are agnostic to CPython implementation details) for the C code in Modules/ [2].

[1] Py_Is, Py_IsNone, Py_IsTrue, and Py_IsFalse
[2] … only the extension modules in Modules/, and nowhere else

vstinner · December 17, 2024, 9:09am

It should be used in new code, but I don’t think that it’s worth it to proactively replace every “x == y” in existing code.

Why limit it to Modules/? That sounds like an arbitrary and complicated rule.

erlendaasland · December 17, 2024, 9:20am

This is true for any semantic change to any public C API, so I consider this comment off-topic and borderline FUD. Perhaps you are misunderstanding the intentions of the API. Please see @cfbolz’s comment on the original issue for why this API has a purpose.

storchaka · December 17, 2024, 10:27am

No other change to public C API has such enormous effect. For now, using Py_Is is nothing but code obfuscation. I discourage using it and will not approve any PR that uses it.

If PyPI has issues with using == for identity checks, they have a large issue with the C API.

steve.dower · December 17, 2024, 11:47am

(Corrected PyPy myself)

They do have a large issue - it’s called movable objects. A big part of the reason we can’t even begin to experiment with movable objects, or indirection via handles, is because we use the PyObject pointer as the identity. [1]

I’m not a fan of overly strict guidelines anywhere, but I can certainly see the value in at least allowing people to use Py_Is where it makes sense in their code. Forbidding it is more limiting to ourselves than anyone else.

[1] We can try these under C++, because we can override == there, but not in C.

guido · December 17, 2024, 9:20pm

I think this is a thorny problem (how friendly do we want to be for PyPy) but it’s too early for threats like “I won’t approve”. Let’s keep the discussion open.

markshannon · December 18, 2024, 10:32am

I think the way to support PyPy and GraalPython is with HPy or something like it.

Adding macros like Py_Is is the worst of both worlds. It doesn’t by itself allow C extensions to work with PyPy or GraalPython, but it does obfuscate the C code and uses a name that might have been useful with an HPy-like interface in the future.

vstinner · December 18, 2024, 12:59pm

HPy has an HPy_Is(ctx, x, y) function. Using Py_Is(x, y) eases the migration to HPy: it’s easier to replace Py_Is() with HPy_Is() than to go through all x == y and x != y comparisons to check whether objects are being compared.
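A minimal sketch of why a handle-based API needs an explicit identity function, assuming a scheme in which the same object can be reached through more than one handle. The names context, handle, handle_new and handle_is are hypothetical and are not HPy's actual types or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HANDLES 16

/* Hypothetical opaque handle: an index into a per-context table. */
typedef struct { size_t index; } handle;

typedef struct {
    void *slots[MAX_HANDLES];
    size_t used;
} context;

/* Every call yields a fresh handle, even for an object that already
   has one. */
static handle handle_new(context *ctx, void *obj)
{
    assert(ctx->used < MAX_HANDLES);
    ctx->slots[ctx->used] = obj;
    return (handle){ ctx->used++ };
}

/* Identity is decided through the context, never by comparing the
   handles themselves. */
static bool handle_is(const context *ctx, handle x, handle y)
{
    return ctx->slots[x.index] == ctx->slots[y.index];
}
```

Two handles obtained for the same object carry different indices here, so comparing them directly would report "different" while handle_is reports "same". Code that already funnels identity checks through one named call has a single, searchable spot to change.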

cfbolz · December 18, 2024, 3:20pm

I understand that changing every existing == to Py_Is is way too disruptive in most cases. This is particularly true for code that is part of CPython, which is not going to be used in alternative Python implementations anyway.

However, there is still value in having the function as part of the API, for several reasons:

code generators like Cython can generate it
when an extension module reports a bug when running on PyPy that is caused by using ==, there is a way to fix it that works on all implementations.

Also, I wanted to give some more background on how PyPy does things: right now, we mostly don’t break == in C, at the cost of a much less efficient emulation of the C-API. There is one situation where we do break ==, but that happens only for immutable builtin types, where users mostly already know that you shouldn’t depend on identity. Python’s is operator deals with that case correctly, but when using == there is no way to achieve this fix in C extensions.

barry-scott · December 18, 2024, 4:38pm

The research I recall from extreme programming is that it is better to only code what you need and not attempt to predict what you may need in the future. The research shows it is wasted effort as the predictions are almost always wrong.

When HPy is ready I plan to port to it, but I won’t change code beforehand in an attempt to make the port possibly easier.

dg-pb · January 5, 2025, 9:05am

My main aim is to figure out what is best to be used in new CPython PRs:
a) Use ==
b) Use Py_Is
c) Both are ok

Personally I would probably favour (b). My reasoning is as follows:

(c) is vague, and the whole reason I started this is to settle on a better practice, so as to eliminate repetitive comments and changes in PRs requested by different people with different opinions. Thus, it is either (a) or (b).
Imagine that only one could be used, and not the other, until the end of time. Which one would be better? I would say (b). == just happens to work given the current implementation, while Py_Is was introduced for a reason, and replacing == with Py_Is will always be correct: it will work on different implementations provided they adopt the official API, with all the benefits and reasons laid out above. The costs of using Py_Is seem to be on the softer side, such as readability and the fact that within the isolated CPython space it is only an obfuscation.

So this is my take from what I have seen here.

However, what is more important to me is to have a consensus, so I don’t need to keep losing time on this small detail going forward. If people want to use == while there are no hard issues with it, that seems perfectly reasonable to me too.

pitrou · January 5, 2025, 10:08pm

The point here is not to make switching to HPy easier, it’s to make compatibility with PyPy better (even without using HPy).

steve-s · January 7, 2025, 8:00am

Note: == comparison also doesn’t work for tagged pointers. There is already PyStackRef_Is in CPython [0], so I would argue for Py_Is not only for the sake of a potential migration to a third-party HPy API or of supporting alternative interpreters, but also for the sake of a potential migration of some code to stack refs within CPython. Who knows, maybe CPython will expose a public stack refs API for extensions too (or HPy will get a stack refs based backend on CPython, which would make it more appealing even on CPython).

There is an important difference between internal code and public API design. On top of that, tagged pointers have been shown to be useful and effective in research and in practice and are already used within CPython, so while == comparison still works today on CPython, I think we can say that an eventual need to migrate code from == to Py_Is is not some wild theoretical prediction that is unlikely to materialize.
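A minimal sketch of why == fails for tagged pointers, assuming a single tag bit stored in the lowest bit of the address. The names tagged_ref, make_ref and ref_is are hypothetical; this illustrates the general technique and is not the actual definition of PyStackRef_Is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: one tag bit kept in the lowest bit of the address. */
#define TAG_MASK ((uintptr_t)1)

typedef struct { uintptr_t bits; } tagged_ref;

static tagged_ref make_ref(void *obj, bool tagged)
{
    uintptr_t bits = (uintptr_t)obj;
    /* The low bit must be free, so obj needs at least 2-byte alignment. */
    assert((bits & TAG_MASK) == 0);
    return (tagged_ref){ bits | (tagged ? TAG_MASK : 0) };
}

/* Identity ignores the tag bit, so a tagged and an untagged reference
   to the same object compare as identical. */
static bool ref_is(tagged_ref x, tagged_ref y)
{
    return (x.bits & ~TAG_MASK) == (y.bits & ~TAG_MASK);
}
```

A tagged and an untagged reference to one object differ in their raw bits, so comparing the bits directly reports "different" while ref_is reports "same".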

[0] gh-117139: Convert the evaluation stack to stack refs, by Fidget-Spinner (Pull Request #118450, python/cpython)