Status of `PyLong_FromPid()` and `PyLong_AsPid()`

PyLong_FromPid and PyLong_AsPid are defined as macro aliases of PyLong_FromLong/PyLong_FromLongLong and PyLong_AsLong/PyLong_AsLongLong, depending on the size of pid_t:

#if !defined(SIZEOF_PID_T) || SIZEOF_PID_T == SIZEOF_INT
#define _Py_PARSE_PID "i"
#define PyLong_FromPid PyLong_FromLong
#define PyLong_AsPid PyLong_AsLong
#elif SIZEOF_PID_T == SIZEOF_LONG
#define _Py_PARSE_PID "l"
#define PyLong_FromPid PyLong_FromLong
#define PyLong_AsPid PyLong_AsLong
#elif defined(SIZEOF_LONG_LONG) && SIZEOF_PID_T == SIZEOF_LONG_LONG
#define _Py_PARSE_PID "L"
#define PyLong_FromPid PyLong_FromLongLong
#define PyLong_AsPid PyLong_AsLongLong
#else
#error "sizeof(pid_t) is neither sizeof(int), sizeof(long) or sizeof(long long)"
#endif /* SIZEOF_PID_T */

On the one hand, they are always available, independently of the Py_LIMITED_API setting, and their names do not start with an underscore, so they can be considered public API.

On the other hand, they are not documented: there has never been any reference to them, not even a NEWS entry, and they are not mentioned in any PEP. So they can be considered private API that got non-underscored names by mistake.

They were added in python/cpython@fb94c5f (“Replaces the internals of the subprocess module from fork through e…”). Note that PARSE_PID, added in the same commit, was later renamed to _Py_PARSE_PID. Note also that PyLong_FromSocket_t and PyLong_AsSocket_t, which were defined in a similar way, were later removed from the public headers.

So how should we treat PyLong_FromPid() and PyLong_AsPid(): as an undocumented part of the public API, or as private functions with wrong names? Can we rename them and move them to private headers? There is a problem with PyLong_AsPid() (python/cpython issue #117021, “Unchecked signed integer overflow in PyLong_AsPid()”) which cannot be solved with the current definition.
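To make the overflow concern concrete: when pid_t is int-sized, the macro above resolves to PyLong_AsLong, so the caller receives a long and narrows it to pid_t with no range check. The sketch below is plain C with no Python API involved. The helper name `example_checked_to_pid` is hypothetical, and it assumes pid_t is a signed integer type without padding bits. It only illustrates the kind of range check the alias cannot perform.

```c
#include <limits.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical helper: store `value` in *out only if it fits in pid_t.
 * Returns 0 on success, -1 if the value is out of range.
 * Assumes pid_t is a signed integer type with no padding bits. */
static int
example_checked_to_pid(long long value, pid_t *out)
{
    /* Largest and smallest values representable in pid_t. */
    const intmax_t pid_max =
        (intmax_t)((UINTMAX_C(1) << (sizeof(pid_t) * CHAR_BIT - 1)) - 1);
    const intmax_t pid_min = -pid_max - 1;

    if (value < pid_min || value > pid_max) {
        return -1;  /* would be silently narrowed by a plain cast */
    }
    *out = (pid_t)value;
    return 0;
}
```

A plain `pid_t p = PyLong_AsLong(obj);` skips this check whenever long is wider than pid_t.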

I think we can probably get away with changing them (HPy exposes them but doesn’t document them as public API). So they’d break, but theoretically none of their users will?

Alternatively, if we need a fix, add the _PyLong_AsPid name with correct behaviour and deprecate the public names? Writing it around PyLong_AsNativeBytes should get correct behaviour for whatever the size is when built, as you pass in sizeof(pid_t) directly.


Well, admittedly I don’t have any skin in the game here, but I’d think that anyone writing any kind of extension that deals with PIDs on Linux (and does something not already covered by subprocess or signal) would want to have access to something like this. The _Py_PARSE_PID value seems to be a typecode that will be used by some wrapper functionality that should be the actual API; but simply converting between a Python int and a numeric value to use as a PID for a system call, sounds potentially useful. Without exposing those names, someone might be tempted to DIY and possibly get it wrong. So maybe it would be better to document them? I find it hard to imagine a maintenance issue that wouldn’t simultaneously trigger massive changes everywhere else (e.g., hypothetically supporting new 128-bit architectures).

PyLong_AsNativeBytes and PyLong_FromNativeBytes solve the problem, but they seem quite a bit harder to use. PyLong_AsNativeBytes requires passing a pid_t “buffer” by pointer rather than just returning one, and similarly PyLong_FromNativeBytes would counterintuitively require taking the address of the source pid_t even though it isn’t allocated, modified, expensive to copy etc. Both would also require passing not only sizeof(pid_t), but also -1 for an endianness flag.

They are public API and changing/removing them needs a deprecation period. Our backward compatibility policy is pretty explicit nowadays: if something is not documented at all, it is not automatically considered private.

Adding new correct private versions and deprecating the old names sounds good. And dogfooding PyLong_AsNativeBytes is the way to go :‍)

As for the limited API, we can simply stop defining the old names when Py_LIMITED_API+0 >= 0x030d0000. (The stable ABI promises long-term stability; the API for it is just limited. And preprocessor #defines are not part of any ABI.) We don’t need a deprecation period: users can bump Py_LIMITED_API on their own schedule, when they’re ready to deal with the changes.
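A minimal sketch of how such a guard could behave, written as standalone C so it compiles without Python.h. `EXAMPLE_HAVE_OLD_PID_NAMES` is a hypothetical marker used only to make the result observable; in the real header the `#else` branch would hold the existing `#define PyLong_FromPid ...` lines instead.

```c
/* Simulate an extension that opts into the 3.13 limited API. */
#define Py_LIMITED_API 0x030d0000

/* The "+0" keeps the expression valid when Py_LIMITED_API is defined
 * with an empty value; an undefined macro evaluates to 0 inside #if. */
#if Py_LIMITED_API+0 >= 0x030d0000
   /* Limited API 3.13 or newer: the old names are not provided. */
#  define EXAMPLE_HAVE_OLD_PID_NAMES 0
#else
   /* Older limited API, or the full API: the old names stay. */
#  define EXAMPLE_HAVE_OLD_PID_NAMES 1
#endif
```

Extensions that leave Py_LIMITED_API undefined or set to an older version would keep seeing the old names until they choose to bump it.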


There is a precedent: PyLong_FromSocket_t and PyLong_AsSocket_t were removed from the headers without a deprecation period. The case of PyLong_FromPid and PyLong_AsPid looks very similar.

The problem with PyLong_AsPid is that it should be defined as an alias of PyLong_AsInt on most non-Windows 64-bit platforms, but PyLong_AsInt has only been part of the limited C API since 3.13. For users of the limited C API < 3.13, and in Python < 3.13, it would have to be defined some other way, perhaps as an inline function, and that has its own drawbacks. Any solution may break someone's code.

Yes, in 2010, a decade before PEP 387 was accepted. Please be more careful today.

Any solution may break someone's code.

Then choose to not break existing working code. Deprecate the functions, and move on :‍)

AFAIK, negative pids are special. Are there users that run into this signed integer overflow in practice, without tests catching it?


Okay, I just was not sure that they are officially part of the public C API and the limited C API.
