PEP 739: Static description file for build details of Python installations

FFY00 · January 30, 2024, 4:03pm

I released a PEP draft for a proposal to define the format of a static description file for Python installations.

I’d like to get feedback on which other information we should include in this file.

Scope

This PEP is only meant to standardize the format. Methods of distribution are another can of worms (think about normal distributions, then WASM, etc.), which I feel should be defined separately.

Even though we don’t standardize where to find the static description file, just having this format and having implementations providing it on their own terms would be a big improvement for cross-builds and other similar use-cases.

Motivation

Having a static file that describes a Python distribution is the first step towards standardizing installation introspection, which plays a big role in many different workflows: package building infrastructure, launchers, etc.

Why static?

We define a static file to target use-cases where running the Python interpreter is undesirable, and/or impossible.

Ideally, this should be the basis for future work in sysconfig, which would additionally provide information that can only be known at runtime.

steve.dower · January 30, 2024, 4:49pm

The big thing I think is missing from the PEP right now is future name reservations. Really just some text to say “implementations should not specify additional keys, except in the implementation object” (or make a new key to put them in).

Luckily, only one implementation at a time will create these files, so we don’t have to worry about collisions.
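
A minimal sketch of how a consumer might flag keys outside the reserved set. It assumes the reserved top-level keys are the four used in the PEP draft's example (schema_version, language, implementation, c_api); the function name and the vendor_extra key are hypothetical, and none of this is part of the proposal:

```python
# Assumption: the reserved top-level keys are the ones in the PEP draft's example.
RESERVED_TOP_LEVEL_KEYS = {"schema_version", "language", "implementation", "c_api"}


def unreserved_keys(data):
    """Return the top-level keys of `data` that are not reserved, sorted."""
    return sorted(set(data) - RESERVED_TOP_LEVEL_KEYS)


example = {
    "schema_version": 1,
    # Extra keys inside the implementation object are allowed.
    "implementation": {"name": "cpython", "vendor_extra": True},
    # This one sits at the top level, so it is flagged.
    "vendor_extra": True,
}
print(unreserved_keys(example))  # ['vendor_extra']
```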

I also think we should be able to get the basic set of paths into this file as well (prefix, exec_prefix, executable, stdlib, platstdlib, include, Windows libs). I’d be totally fine with them being relative to the file rather than absolute, as that’s generally how we know them at compile time. It may require rewriting the file on install, but that seems an okay time to do it.
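
As a rough illustration of the "relative to the file" idea, here is a sketch that resolves path entries against the directory containing the description file. The "paths" key, its layout, and the function name are assumptions made for this example; the PEP draft does not define them:

```python
import json
import os
from pathlib import Path


def load_install_paths(description_file):
    """Return the entries under the (hypothetical) "paths" key as absolute paths.

    Relative entries are interpreted relative to the directory that
    contains the description file; absolute entries are kept as they are.
    """
    description_file = Path(description_file)
    data = json.loads(description_file.read_text(encoding="utf-8"))
    base = description_file.parent
    # normpath collapses ".." segments without touching the filesystem,
    # so this works even when the installation is not present locally.
    return {
        name: Path(os.path.normpath(base / rel))
        for name, rel in data.get("paths", {}).items()
    }
```

With this layout the file can be moved together with the installation without being rewritten, as long as the relative layout is preserved.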

adamsilkey · January 30, 2024, 4:52pm

Quick nitpicky questions/thoughts from reading through it:

Since version_parts is equivalent to version_info, what’s the reasoning for calling it version_parts instead of version_info?
Would it be valuable to link to the docs for sys version info? sys — System-specific parameters and functions — Python 3.12.1 documentation
Similar question for the API/ABI Versioning: API and ABI Versioning — Python 3.12.1 documentation
Under the Language subheading, “Subsection with details related to the language specification.” should say “Python language” to match what’s below.

FFY00 · January 30, 2024, 6:21pm

Yes, definitely, thanks for catching that.

Yeah, I think that’s possible, we just need to account for the cases where paths are not available (due to platform, kind of build, etc).

To be more explicit, it was definitely a subjective call. I am open to change it to version_info, to match the sys attribute.

It already does. Maybe it’s just not noticeable enough? We can explicitly mention that, what do you think?

Good catch, thanks!

ofek · January 30, 2024, 7:03pm

For usability, I would recommend adding a key that represents the relative path to the Python binary. For the new feature in Hatch that manages Python distributions, I had to hardcode the paths (CPython standalone project, PyPy, etc.).

Also cc @indygreg

steve.dower · January 30, 2024, 7:09pm

They can just be absent in that case, right?

I imagine using them to either [cross-]compile binaries for it (and so need the headers/import libs) or to copy/install things into it for later deployment (e.g. generate a script that can launch the executable after being copied to a “real” system). For cases where these don’t make sense, then I’m fine with the paths just being missing - my build tool will simply error out if it can’t find the files it needs and make the user get a “proper” package.
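
A small sketch of that "error out if the paths are missing" behaviour, assuming the tool has already loaded the path entries into a dict; the function name and the key names are hypothetical:

```python
def require_paths(paths, needed):
    """Return the needed entries of `paths`, or raise if any are absent."""
    missing = sorted(name for name in needed if name not in paths)
    if missing:
        raise LookupError("description file lacks paths: " + ", ".join(missing))
    return {name: paths[name] for name in needed}
```

A cross-compiling tool might call require_paths(paths, ["include", "stdlib"]) and surface the error to the user unchanged.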

adamsilkey · January 30, 2024, 7:34pm

I see it now.

Personally, I don’t like hiding links behind any code styling, because it makes the link invisible (unless you know it’s there/happen to hover over it.) That said, it looks like this is the standard of how it’s done across PEPs, so I’m not too fussed about it one way or the other. (Here’s an example: PEP 705 – TypedDict: Read-only items | peps.python.org)

I would, however, link to the documentation on PY_VERSION, in whatever way you prefer: API and ABI Versioning — Python 3.12.1 documentation

FFY00 · January 30, 2024, 7:36pm

I think this is a bit tricky, but I agree that we should provide a key for the Python interpreter. The tricky bit is doing this in a way that works well for multiple use-cases. For example, if we make the path relative, then for virtual environments we need to copy the file over, and when the base installation is updated, the copy will be outdated.

Maybe a good solution would be to have some sort of way to define the “origin” for the path. I would like to be able to have this work without requiring special handling for different scenarios, but also avoid as much complexity as I can.

ofek · January 30, 2024, 7:38pm

I’m totally fine with this not supporting virtual environments because those paths are basically standardized. I’m only interested in the parent installation that one would download/build.

FFY00 · January 30, 2024, 7:43pm

Yes, I think the paths can just be missing. For things like the interpreter, stdlib, include, etc. that is okay, but if we also want to have paths where you can install things, like purelib/platlib, then it becomes an issue. So we need to think this through very carefully, and hopefully involve the various impacted parties in these discussions.

FFY00 · January 30, 2024, 7:46pm

I’m in agreement with this. Bringing virtual environments in would add a lot of complexity, but I understand why some might want to.

One option there would be to split the file describing the installation and environment.

FFY00 · January 30, 2024, 7:47pm

I think that would be a good improvement, I’ll add it on my next PEP PR. Thanks!

pf_moore · January 30, 2024, 7:48pm

I note that you’ve put me as PEP delegate. I’m perfectly happy to do this, although as this is something that the Python interpreter will provide, it’s arguably something for the SC, so if you’d prefer them to make the call on this that’s also fine with me.

My main comment otherwise is that I agree with @steve.dower that having the installation paths in this file would be useful. Being able to install packages into a Python installation, without needing to invoke the interpreter for that installation, is a useful capability to support. For example, it would be good if pip could support something more robust than --target for installing into an embedded Python installation (where there may not even be a standalone interpreter).

Like @ofek, I’m OK if this doesn’t support virtual environments. I do think it could be convenient, but catering for venvs shouldn’t be a reason to delay the main proposal, which is for base interpreters.

FFY00 · January 30, 2024, 8:02pm

For me, I think whatever makes you the most comfortable is the best.

I have addressed this above, I believe. I think it would probably be best to detach installable paths from this file, which is meant to just represent the Python installation.

In my opinion, it would be a mistake to address the use-case of installing files to the default environment, and not to virtual environments. This creates a difference between the two, requiring special handling for both. I think the use-case should just be “installing files”.

The easiest way I see for this to happen is to have a static environment description file, in addition to the Python installation one.

What do you think?

pf_moore · January 30, 2024, 8:08pm

Leave it with me, then. The SC can weigh in if they care.

That sounds fine to me - your previous reply appeared while I was writing my post, so I didn’t see it before hitting “Send”. Sorry about that.

steve.dower · January 30, 2024, 8:09pm

I’d argue that “the default contents of sys.path” represents the Python installation, and so some way to find Lib and Lib/site-packages (or their equivalent) from the file is fine.

I agree we don’t want to define all the other install locations that packaging tools want. Those can be done independently. (In particular, the “includes” I want here are literally Python’s includes, and not those belonging to any other package.)

FFY00 · January 30, 2024, 8:24pm

But sys.path is dependent on factors other than just the executable. More concretely, it depends on the presence of a pyvenv.cfg file, which signals the existence of a virtual environment.

So, to be clear, I agree with the inclusion of all paths other than the user-installable ones. What I think is messy here is that for some paths, like the “includes”, it is currently not clear whether users should be installing to them. In my opinion, we should have two different sets of “include” paths, as you hint (I think), one for core, and one for user packages.

I think we are on the same page about the “include” paths, right? I am not sure on the site-packages though, as I believe we shouldn’t include them in this file, if we don’t want to create a lot of complexity.

steve.dower · January 30, 2024, 9:11pm

Nah, PYTHONPATH and pyvenv.cfg make it non-default, so we can ignore them for this file. The paths I’m suggesting are hard-coded into the built executable, so are perfectly fine candidates for also embedding in this file IMHO. (I exclude user site-packages too, mainly because there’s no reasonable way to put that into a static file.)

pitrou · January 30, 2024, 9:20pm

Reading through the PEP, and in particular looking at the example, several oddities stand out:

{
  "schema_version": 1,
  "language": {
    "version": "3.13.1",
    "version_parts": {
      "major": 3,
      "minor": 13,
      "micro": 1,
      "releaselevel": "final",
      "serial": 0
    }
  },
  "implementation": {
    "name": "cpython",
    "hexversion": "...",
    "cache_tag": "cpython-313",
    "multiarch": "x86_64-linux-gnu"
  },
  "c_api": {
  }
}

language.version is supposed to be the version of the language specification supported, but it has a “micro” component which corresponds to the bugfix level of a software release. It should be “3.13” not “3.13.1”.

Similarly, language.version_parts claims to mimic sys.version_info, which is a particular release identifier for the software implementation of Python, but it also falls under the language section, which is supposed to hold “details related to the language specification”.

So, it seems the PEP itself confuses language version and implementation version.

Also, for some reason the implementation has a hexversion child field (why? In which context is this more convenient than a string or tuple?), but no regular version string or object.

I would therefore expect something like this:

{
  "schema_version": 1,
  "language": {
    "version": "3.13"
  },
  "implementation": {
    "name": "cpython",
    "version": "3.13.1",
    "version_parts": {
      "major": "3",
      "minor": "13",
      "micro": "1",
      "releaselevel": "final",
      "serial": "0"
    }
  }
}

Like others I would also expect information about paths to be present. One example for a hypothetical system CPython 3.10 install:

{
  "executable": "/usr/bin/python3.10",
  "shared_library": "/usr/lib/x86_64-linux-gnu/libpython3.10.so",
  "base_package_paths": [
    "/usr/lib/python310.zip",
    "/usr/lib/python3.10",
    "/usr/lib/python3.10/lib-dynload"
  ],
  "site_package_paths": [
    "/usr/local/lib/python3.10/dist-packages",
    "/usr/lib/python3.10/dist-packages"
  ],
  "user_site_package_paths": [
    "$HOME/.local/lib/python3.10/site-packages"
  ]
}

steve.dower · January 30, 2024, 9:45pm

It’s handy for version comparisons in C preprocessors and other situations that don’t have rich objects for versions (I have some build pipelines that use it). I’d probably use it if it were there, but could live without it.
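
For readers unfamiliar with the encoding, here is a sketch of how a hexversion value relates to the version parts, following the documented layout of sys.hexversion and PY_VERSION_HEX. The function name is hypothetical, and whether the file would store the value as an integer or as a string is not settled in this thread:

```python
# Release level nibbles as documented for PY_VERSION_HEX.
RELEASE_LEVELS = {"alpha": 0xA, "beta": 0xB, "candidate": 0xC, "final": 0xF}


def hexversion(major, minor, micro, releaselevel, serial):
    """Pack the five version parts into a single integer."""
    return (
        (major << 24)
        | (minor << 16)
        | (micro << 8)
        | (RELEASE_LEVELS[releaselevel] << 4)
        | serial
    )


print(hex(hexversion(3, 12, 1, "final", 0)))  # 0x30c01f0
```

Because the parts are packed from most to least significant, a plain integer comparison orders versions correctly, which is what makes the value convenient in a C preprocessor check such as `#if PY_VERSION_HEX >= 0x030C0000`.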