Should `Py_Is` be preferred to `==`?

I had a brief conversation, which I come back to from time to time.

In short, should using Py_Is and other similar macros/functions be strongly preferred to their explicit variants?

I see a couple of reasons why this would be better practice (as opposed to explicitly typing out the contents of the macro):

  1. “Plenty of people learn the internals from reading the CPython source, so it would be nice to make it easier on them :)” - @ZeroIntensity
  2. There is always a possibility of new additions that would need to break the equivalence of == and Py_Is, such as: Backquotes for deferred expression. Adhering to C API standards would therefore make things smoother if such unexpected turns happen.

I am not suggesting modifying existing code.

But I think it might be worth using Py_Is instead of == going forward, in the same spirit as using Py_TYPE instead of its explicit variant.

Are there any other similar cases?

We’re talking about C here, where == means the variable has the same address.
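To make the distinction concrete, here is a minimal sketch of identity versus equality in C. SketchObject, Sketch_Is and sketch_equal are hypothetical stand-ins so the snippet compiles without Python.h; the macro body mirrors what I understand CPython's current Py_Is definition to be (a plain pointer comparison), which is an assumption worth checking against Include/object.h.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for PyObject; not the real CPython struct. */
typedef struct {
    const char *text;
} SketchObject;

/* Assumed to mirror Py_Is as it stands today: a pointer comparison. */
#define Sketch_Is(x, y) ((x) == (y))

/* Value equality, roughly what Python-level == would do for strings. */
static bool sketch_equal(const SketchObject *a, const SketchObject *b)
{
    return strcmp(a->text, b->text) == 0;
}

/* Two distinct objects that happen to hold equal contents. */
static SketchObject sketch_a = {"spam"};
static SketchObject sketch_b = {"spam"};
```

With these definitions, Sketch_Is(&sketch_a, &sketch_b) is false while sketch_equal(&sketch_a, &sketch_b) is true, which is the same split as `is` versus `==` in Python.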

I don’t think we could ever do that for compatibility reasons, but Py_Is seems more clear to me.

I consider Py_Is stillborn, like the iso646.h macros or trigraphs. A change which makes Py_Is and == not equivalent would break not only CPython code, but every single Python extension. It would also likely require changing assignment and comparison with NULL, so PyObject *x = foo() and if (x) would need to be rewritten with some new macros. This is such a large breaking change that it is equivalent to rewriting all extensions from C into another programming language. It would kill the Python ecosystem.

Thanks, I have updated my estimate of the probability of such an event occurring in the next 20 years from 1% to 0.01%…

Then, this is the main reason.

It would provide a clearer separation between:

if (Py_Is(py_obj1, py_obj2)) {}
// and
if (c_obj1 == c_obj2) {}
if (py_obj_maybe_null == NULL) {}

Can Py_Is be deemed slightly better practice than == when checking the identity of Python objects?

I don’t see why. It is clumsy to type, and has no benefits over ==. I wholly agree with Serhiy.

The main case where I see it as better is when you’re rewriting Python code in C. It’s more clear in that case that you actually want is and not equality, but otherwise I use ==.

FTR, here’s the original issue and PR:

The API(s)[1] are targeted towards extension module developers.

Carl Friedrich Bolz-Tereick left the following comment on the original issue:

Just chiming in to say that for PyPy this API would be extremely useful, because PyPy’s “is” is not implementable with a pointer comparison on the C level (due to unboxing we need to compare integers, floats, etc by value). Right now, C extension code that compares pointers is subtly broken and cannot be fixed by us.
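A rough illustration of the problem described in that comment, as a sketch only: it is not how PyPy actually implements its C-API emulation, and every name here (SketchBox, sketch_box_int, sketch_is) is hypothetical. It assumes a runtime that keeps integers unboxed internally and creates a fresh box whenever C code needs a pointer, so two pointers to "the same" integer can differ even though the runtime's notion of identity says they match.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef enum { SKETCH_INT, SKETCH_OTHER } SketchKind;

/* Hypothetical boxed value handed out to C code. */
typedef struct {
    SketchKind kind;
    long value;   /* meaningful only when kind == SKETCH_INT */
} SketchBox;

/* Each call allocates a new box, so repeated calls with the same
   integer return different addresses. Returns NULL on allocation failure. */
static SketchBox *sketch_box_int(long value)
{
    SketchBox *box = malloc(sizeof *box);
    if (box != NULL) {
        box->kind = SKETCH_INT;
        box->value = value;
    }
    return box;
}

/* Identity as such a runtime might define it: integers compare by
   value, everything else by address. */
static bool sketch_is(const SketchBox *x, const SketchBox *y)
{
    if (x == y) {
        return true;
    }
    if (x == NULL || y == NULL) {
        return false;
    }
    if (x->kind == SKETCH_INT && y->kind == SKETCH_INT) {
        return x->value == y->value;
    }
    return false;
}
```

In this model, two boxes made from the same integer fail a pointer comparison but pass sketch_is, which is the kind of mismatch an identity function in the API lets an implementation paper over.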

I’m aligned with Peter’s remark that extension module developers often look to the CPython repo for inspiration. I think it would be worth recommending these (and similar CPython implementation detail agnostic APIs) for the C code in Modules/[2].


  1. Py_Is, Py_IsNone, Py_IsTrue, and Py_IsFalse ↩︎

  2. only the extension modules in Modules/, and nowhere else ↩︎

It should be used in new code, but I don’t think that it’s worth it to proactively replace every “x == y” in existing code.

Why limit it to Modules/? That sounds like an arbitrary and complicated rule :)

This is true for any semantic change to any public C API, so I consider this comment off-topic and borderline FUD. Perhaps you are misunderstanding the intentions of the API. Please see @cfbolz’s comment on the original issue for why this API has a purpose.

No other change to the public C API would have such an enormous effect. For now, using Py_Is is nothing but code obfuscation. I discourage using it and will not approve any PR that uses it.

If PyPI has issues with using == for identity checks, they have a large issue with the C API.

(Corrected PyPy myself)

They do have a large issue - it’s called movable objects. A big part of the reason we can’t even begin to experiment with movable objects, or indirection via handles, is because we use the PyObject pointer as the identity.[1]
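To sketch why pointer identity gets in the way here, consider a toy handle table. Everything below is hypothetical (SketchHandle, sketch_new_handle, sketch_move, sketch_is) and is not taken from CPython or any real runtime. It assumes handles are indices into a table whose slots track the object's current address, so two different handles can name the same object, and the object can be relocated without invalidating either handle.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_HANDLES 8
#define SKETCH_NO_HANDLE ((size_t)-1)

typedef struct { int payload; } SketchObject;
typedef size_t SketchHandle;

/* Each slot holds the current address of the object a handle refers to. */
static SketchObject *sketch_table[SKETCH_MAX_HANDLES];
static size_t sketch_used = 0;

/* Hand out a fresh handle for obj; returns SKETCH_NO_HANDLE when full. */
static SketchHandle sketch_new_handle(SketchObject *obj)
{
    if (sketch_used >= SKETCH_MAX_HANDLES) {
        return SKETCH_NO_HANDLE;
    }
    sketch_table[sketch_used] = obj;
    return sketch_used++;
}

/* Relocate an object and repoint every handle that referred to it. */
static void sketch_move(SketchObject *from, SketchObject *to)
{
    *to = *from;
    for (size_t i = 0; i < sketch_used; i++) {
        if (sketch_table[i] == from) {
            sketch_table[i] = to;
        }
    }
}

/* Identity goes through the table, not through the handle values.
   Both handles are assumed valid. */
static bool sketch_is(SketchHandle x, SketchHandle y)
{
    return sketch_table[x] == sketch_table[y];
}
```

Here comparing the two handles with == would report "different objects", while sketch_is gives the right answer both before and after the move. Code that hard-codes == on the raw value has no hook through which a runtime could supply that behaviour.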

I’m not a fan of overly strict guidelines anywhere, but I can certainly see the value in at least allowing people to use Py_Is where it makes sense in their code. Forbidding it is more limiting to ourselves than anyone else.


  1. We can try these under C++, because we can override == there, but not in C. ↩︎

I think this is a thorny problem (how friendly do we want to be for PyPy) but it’s too early for threats like “I won’t approve”. Let’s keep the discussion open.

I think the way to support PyPy and GraalPython is with HPy or something like it.

Adding macros like Py_Is is the worst of both worlds. It doesn’t by itself allow C extensions to work with PyPy or GraalPython, but it does obfuscate the C code and uses a name that might have been useful with an HPy-like interface in the future.

HPy has an HPy_Is(ctx, x, y) function. Using Py_Is(x, y) eases the migration to HPy: it’s easier to replace Py_Is() with HPy_Is() than to go through all x == y and x != y comparisons to check whether objects are being compared.

I understand that changing every existing == to Py_Is is way too disruptive in most cases. This is particularly true for code that is part of CPython, which is not going to be used in alternative Python implementations anyway.

However, there is still value in having the function as part of the API, for several reasons:

  • code generators like Cython can generate it
  • when an extension module reports a bug when running on PyPy that is caused by using ==, there is a way to fix it that works on all implementations.

Also, I wanted to give some more background on how PyPy does things: right now, we mostly don’t break == in C, at the cost of a much less efficient emulation of the C-API. There is one situation where we do break ==, but that happens only for immutable builtin types, where users mostly already know that you shouldn’t depend on identity. Python’s is deals with that case correctly, but when using == there is no way to achieve this fix in C extensions.

The research I recall from extreme programming is that it is better to only code what you need and not attempt to predict what you may need in the future. The research shows it is wasted effort as the predictions are almost always wrong.

When HPy is ready I plan to port to it, but I will not change code now in an attempt to make the port possibly easier beforehand.