PEP 739: Static description file for build details of Python installations

steve.dower · February 12, 2024, 1:10pm

Okay, PEP author has spoken. Please include it in Rejected Ideas though - there was enough discussion/interest that it’ll probably be brought up again if not.

I expect both of these to be true anyway. Certainly in the “preparing an install for deployment” case (where a venv usually won’t work anyway), having a venv-specific environment-details.json isn’t going to prevent anyone from having to special-case it.

FFY00 · February 27, 2024, 8:15pm

I pushed some updates to the PEP. Could you have a look and provide further feedback? Thanks in advance.

FFY00 · March 13, 2024, 1:01pm

If there isn’t any further feedback, I’d like to request a review from the PEP delegate, whenever possible, @pf_moore.

I’ll start drafting a PEP for a static file describing the environment in the meantime.

pf_moore · March 17, 2024, 2:22pm

I’ve now reviewed the PEP, as it currently stands. I am not pronouncing on it at this point, as there are a number of questions that came up in my review that I think need to be answered before it’s ready for pronouncement.

In spite of these comments, I still don’t see anything in the PEP that makes this explicit. Furthermore, while looking at the structure of the file, I’m concerned by the description of schema_version:

Version of the schema to parse the file contents. It SHOULD be 1 for the format described in this document.

Surely it must be 1? As this statement stands, it’s perfectly legitimate for someone to publish a file with a schema version of banana - which I assume is not the intention. It could also be 2, even if someone produced a version 2 spec that removed fields that version 1 requires (the resulting file would not be a valid version 2 file, but it would be a valid version 1 file, just with a schema version string of 2… )
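To make the concern concrete, here is a minimal sketch of the strict check a consumer would probably want if the spec said MUST rather than SHOULD. The integer type is an assumption taken from the examples in this thread, and `check_schema_version` and `SUPPORTED_SCHEMA_VERSIONS` are hypothetical names, not anything the PEP defines.

```python
import json

# Assumption: only the version-1 format described in the draft exists so far.
SUPPORTED_SCHEMA_VERSIONS = {1}


def check_schema_version(data: dict) -> int:
    """Return the schema version, or raise if this reader does not know it."""
    version = data.get("schema_version")
    # bool is a subclass of int in Python, so reject True/False explicitly.
    if isinstance(version, bool) or not isinstance(version, int):
        raise ValueError(f"schema_version must be an integer, got {version!r}")
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise ValueError(f"unsupported schema_version: {version}")
    return version


print(check_schema_version(json.loads('{"schema_version": 1}')))
```

With a check like this, a file declaring `"banana"` or `2` is rejected up front instead of being parsed as if it were version 1.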

The wording for this in the spec is confusing (to me, at least):

The path to the Python interpreter. Either an absolute path, or a relative path to the directory containing this file, if applicable.

It’s actually only while typing this message that I finally worked out what was intended, which I think would be better worded as

The path to the Python interpreter. This SHOULD be an absolute path. If this static description is made available as a file in the filesystem, the interpreter path MAY be relative to the directory containing the description file.

Given how clumsy it is to word that, I feel as though things would be simpler if we required the static description to be provided in an actual filesystem file. What’s the use case for an installation providing it in some other way?

In the discussion about the interpreter path, the question of virtual environments was raised. The conclusion seemed to be that this static description was only intended to be defined for the “base” interpreter, but the PEP makes no mention of that.

I’d like to see a minimal example, as well as the more complete one given. So:

{
  "schema_version": 1,
  "language": {
    "version": "3.13"
  },
  "implementation": {
    "name": "cpython",
    "version": {
      "major": 3,
      "minor": 13,
      "micro": 1,
      "releaselevel": "final",
      "serial": 0
    }
  }
}

This gives a better sense of what consumers can rely on (i.e., barely anything… ) In fact, I question what use such a minimal description would be in practice. If the only way this is going to be usable is if consumers start relying on optional or implementation-defined data, I don’t think it works very well as a standard.

In contrast, I’d like the existing example to include an interpreter path, so it shows all of the defined items from the spec.

Also, it would be helpful if the proposal made it mandatory for the interpreter, libpython and c_api sections to be present if the implementation provided those things. So, for example, the lack of an interpreter section would explicitly mean that the installation didn’t provide a standalone interpreter.
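As a rough sketch of what that rule would buy a consumer: if the three sections were mandatory whenever the feature exists, a plain key check would be enough to decide what an installation offers. This assumes the suggestion above is adopted (it is not in the draft), and `provided_features` is a hypothetical helper name.

```python
import json

# Sections that, under the suggested rule, are present if and only if the
# installation provides the corresponding feature.
OPTIONAL_SECTIONS = ("interpreter", "libpython", "c_api")


def provided_features(data: dict) -> dict[str, bool]:
    """Map each optional section name to whether the description includes it."""
    return {name: name in data for name in OPTIONAL_SECTIONS}


minimal = json.loads(
    '{"schema_version": 1, "language": {"version": "3.13"},'
    ' "c_api": {"headers": "/usr/include/python3.13"}}'
)
print(provided_features(minimal))
```

Under the draft as written, a missing key only means "not stated", so the same check cannot be used to reject an installation.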

I’d like to see some practical use cases of this description file included in the rationale, in enough detail to allow readers to understand how the proposed file helps. The only suggestions given are cross-compilation (which I know nothing about, so I can’t see how this proposal helps, unfortunately) and launchers. But if I’m writing a launcher, what matters most of all is discovery - in other words, how do I find the Python installation(s) that I want to choose from - and the PEP is silent on that. Once I have found one or more installations, I need to know if they are the right version, which is something the proposal would help with, but there’s no indication how I can find the file, even though I’ve found the installation. If I did find the file, I could use the version and the interpreter^[1] values from it, but there’s a huge chain of unknowns that prevent me getting to that point. And to be honest, I feel that most other use cases would have the same problem - while the file might be useful, there needs to be a reliable way to find it first if it’s to help. If you have any use cases that don’t need a discovery mechanism, they definitely need to be added to the rationale - otherwise, you should seriously consider standardising a discovery mechanism.

[1] Assuming it’s acceptable to say “no interpreter key implies I can reject this installation as having no interpreter”, which I mentioned above.

steve.dower · March 18, 2024, 4:02pm

I think “provided by the user” is fine as a discovery mechanism. Standardising across platforms, across Linux distros or even across install tools is such a huge task that it basically has to be out of scope.

If the tool can find either the Python executable (e.g. by searching PATH) or the install directory (which presumably includes either a python.exe or a bin/pythonX), they ought^[1] to be able to locate this file without having to launch Python itself.

Perhaps adding DisplayName and SupportUrl attributes (like in PEP 514) would help clarify the benefit to launchers even without a global discovery mechanism?

[1] With potential updates to this PEP.

pradyunsg · March 21, 2024, 4:20pm

If that’s what we’re hoping to achieve here, we need a decent amount of additional information to be made available to pip.

github.com/pypa/pip issue: Various package-index filtering flags do not affect the environment markers (opened 01:17PM UTC, 16 Dec 2022, by pradyunsg; labels: "state: needs discussion", "type: feature request")

This is a common pattern that I'm seeing in pip's issue tracker; so filing an issue to (a) trigger a discussion of how to improve documentation / output / errors to better deal with the mismatch in user expectations vs behaviour here and (b) consolidate those issues.

Basically, the fundamental problem here is that pip has various options that affect the _wheels_ that pip will consider when triaging things (`--platform <platform>`, `--python-version <python_version>`, `--implementation <implementation>`, `--abi <abi>`) but those do not affect the marker evaluation environment for dependencies. This results in a subtle failure mode: `pip install [options] package` will use a wheel for package that has a (hypothetical) Python version 9.22 but when evaluating environment markers (https://packaging.python.org/en/latest/specifications/dependency-specifiers/#environment-markers), that would still use the environment for the current Python interpreter. It is *not* possible to compute the environment markers based on the values passed in by the user via the CLI.

Namely, pip needs enough information to compute platform compatibility tags, destination paths and environment markers. Or, at least, pip will need a bespoke mechanism/file that’s separate from this with all the relevant information being provided there.

pf_moore · March 21, 2024, 5:04pm

I’d need some concrete examples of where this file would be used to be convinced of that. I get that it’s hard (if not impossible) to standardise a discovery mechanism, but that just means the PEP needs to present a convincing argument that the file without a discovery mechanism is still useful, and I don’t think it’s done that. Part of the problem here, of course, is that the amount of required information is minimal, so it feels like many potential use cases will need to rely on data that isn’t actually standardised. If the standard doesn’t say how to find the data, and it doesn’t include the data you need in practice, I start to question why we would have a standard at all…

steve.dower · March 21, 2024, 5:17pm

Sure, here’s a case where currently I require the user to pass the full path to a Python includes directory and libs directory as environment variables. In practice, it usually requires setting the build platform and wheel tag manually as well: GitHub - zooba/pymsbuild: An MSBuild wrapper for use building Python extensions

Cross-compiling wheels

Cross compilation may be used by overriding the wheel tag, ABI tag, or build platform, as well as the source for Python’s includes and libraries. These all use environment variables, to ensure that the same setting can flow through a package installer’s own process.

…

The platform is used to determine the MSBuild target platform. It cannot yet automatically select the correct Python libraries, and so you will need to set PYTHON_INCLUDES and PYTHON_LIBS (or with a PYMSBUILD_ prefix) environment variables as well to locate the correct files.

…

With this file, only the path to the file (or its parent directory, more likely) would be required and everything else can be inferred/calculated from that. And most likely the parent directory is the location where the user has already downloaded/extracted a package containing those files, so they know where it is.

pf_moore · March 21, 2024, 6:46pm

Don’t you need more data than the file provides for that, though? PYTHON_LIBS is one of the paths that isn’t covered in this proposal, for example, and the wheel and ABI tags aren’t available as the specification stands, either.

Sure, you can say your tool needs this data to be there, but if you can’t rely on the implementation providing it, at that point how is it any different than specifying a tool-specific config file? It’s a little better, certainly, but enough to warrant being a standard? I still need convincing of that.

steve.dower · March 21, 2024, 7:23pm

I thought the libraries path was included? And I’d love to have wheel and ABI tag in there,^[1] but will settle for inferring it from the version info and architecture. Right now, the best I can do is to parse patchlevel.h for version number and guess the architecture.

[1] Especially because of the possibility of setting a custom platform tag, though packaging would have to learn to read it…
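For readers unfamiliar with the patchlevel.h workaround mentioned above, a minimal sketch of that kind of parsing follows. The macro names are the ones CPython's patchlevel.h defines; the sample text is abbreviated, and `parse_patchlevel` is a hypothetical helper, not what pymsbuild actually does.

```python
import re

# Matches lines such as "#define PY_MINOR_VERSION 13" and captures the macro
# name plus the first token of its value.
_DEFINE = re.compile(r"^\s*#\s*define\s+(PY_\w+)\s+(\S+)", re.MULTILINE)


def parse_patchlevel(text: str) -> dict[str, str]:
    """Return the PY_* macros found in the text of a patchlevel.h header."""
    return dict(_DEFINE.findall(text))


sample = """
#define PY_MAJOR_VERSION        3
#define PY_MINOR_VERSION        13
#define PY_MICRO_VERSION        1
#define PY_RELEASE_LEVEL        PY_RELEASE_LEVEL_FINAL
#define PY_RELEASE_SERIAL       0
"""

macros = parse_patchlevel(sample)
version = tuple(
    int(macros[name])
    for name in ("PY_MAJOR_VERSION", "PY_MINOR_VERSION", "PY_MICRO_VERSION")
)
print(version)
```

This recovers the version without running the target interpreter, but nothing in the header states the architecture, which is why that part remains a guess.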

pf_moore · March 21, 2024, 8:35pm

The only guaranteed values in the file are language.version, implementation.name, implementation.version (which must be sys.version_info), interpreter.path (if there’s an interpreter - I assume that the lack of this key means “no interpreter is available”, although I’ve asked for that to be made explicit), libpython.dynamic, libpython.static and c_api.headers (same caveat for these three).

So there’s no architecture information guaranteed.

The implementation section SHOULD be the same as sys.implementation, but (a) it’s not required to, and (b) sys.implementation only guarantees name, version, hexversion and cache_tag (and that’s all CPython on Windows at least includes).

So as far as I can see, you’ll still have to guess the architecture, and make something up for the libraries path.

FFY00 · April 8, 2024, 8:15pm

Sorry for the delay, I was a bit overwhelmed.

That makes sense, I will try to rewrite it to clarify it must be a “valid” schema version.

Paul Moore:

The wording for this in the spec is confusing (to me, at least):

The path to the Python interpreter. Either an absolute path, or a relative path to the directory containing this file, if applicable.

It’s actually only while typing this message that I finally worked out what was intended, which I think would be better worded as

The path to the Python interpreter. This SHOULD be an absolute path. If this static description is made available as a file in the filesystem, the interpreter path MAY be relative to the directory containing the description file.

Given how clumsy it is to word that, I feel as though things would be simpler if we required the static description to be provided in an actual filesystem file. What’s the use case for an installation providing it in some other way?

Yes, sorry, writing clear specifications, like your rewrite example, is not one of my strong suits, and definitely something I need to improve. Your rewrite aligns with what I was trying to say, and is clearer.

One of the examples I had in mind was, for example, shipping this file together with the interpreter binary in WASM.

Not quite. I think the conclusion, at least from my part, was that this is supposed to describe details tied to the build. The inclusion of the interpreter here can be confusing, so maybe it’s just best to take it out, and leave it up to the environment details file (I haven’t pushed the PEP for that yet).

One of my main use-cases for this file would be to describe a Python installation to a build system like Meson, for example (see Prototype support for `--python.target-config` by FFY00 · Pull Request #12193 · mesonbuild/meson · GitHub). In this type of situation, relying on an automatically generated file is not a huge concern, as package builders can just fill any missing information, if needed — especially in cross-compilation scenarios, which is where this PEP provides the most value.

That said, relying on an automatically generated file is still a relevant use-case, and something to keep in mind.

I think this is reasonable.

Yes, exactly. See the Meson example I provided above — the main benefit comes from standardizing a way to provide the build details.

pf_moore · April 8, 2024, 8:42pm

I don’t completely follow that PR, but if I’m not misunderstanding, that only references a python.version value from the file, which isn’t even in the PEP. So I’m rather unclear how that is actually intended to work.

I’m not sure I understand this at all. Who is the “package builder” in this case? The person running python -m build against the project source? Or the project maintainers themselves? And how would such a person “fill in the missing information”? And for that matter, what do you mean by “missing information”? Are you expecting that using data from that file that isn’t guaranteed by the spec will be a common use case? That seems contrary to the whole idea of having a standard, which is to not have to rely on implementation-defined data.

Maybe if you provided a complete, end-to-end example of how someone creating and building a project using Meson would use this PEP to improve their workflow, that would help clarify. It doesn’t have to be in the PEP at this point, just a post here explaining the use case would be enough for now.

FFY00 · April 9, 2024, 2:43am

That was just an early prototype PR for this proposal before it became a PEP, so it doesn’t match the current spec, nor does it provide a finished implementation. The idea is to use the static description file instead of introspecting the Python installation.

Anyone building the package.

I’m not, let’s ignore that.

Sure, let’s say I am a Linux distribution packager — I ship a Python installation, as well as Python packages, as part of my distribution. In this scenario, I build all Python packages from source (by running python -m build or an equivalent workflow).
Currently, pretty much all build backends introspect the Python interpreter, by running it, to find information needed for the build process (eg. What’s the Python version? Is there a shared libpython? Where is it?). This is fine if I am targeting the same system that is being used for the packaging process, but is problematic for cross-compilation.
With this proposal, we have a static description file for the Python installation that Meson can take in instead of having to run the Python interpreter for introspection.
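A rough sketch of the difference, in case it helps: the first function is the kind of introspection build backends do today, which requires executing the target interpreter; the second only reads a file. The file name `build-details.json` and both function names are placeholders for illustration, and this is not how Meson implements either path.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Code executed inside the target interpreter. This only works when that
# interpreter can actually run on the build machine.
_PROBE = (
    "import json, sys, sysconfig;"
    "print(json.dumps({'version': list(sys.version_info[:3]),"
    " 'ext_suffix': sysconfig.get_config_var('EXT_SUFFIX')}))"
)


def introspect(python: str) -> dict:
    """Query an interpreter by running it (problematic when cross-compiling)."""
    result = subprocess.run(
        [python, "-c", _PROBE], check=True, capture_output=True, text=True
    )
    return json.loads(result.stdout)


def read_description(path) -> dict:
    """Load a static description file without executing any Python code."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    description = Path(tmp) / "build-details.json"  # hypothetical file name
    description.write_text(
        json.dumps({"schema_version": 1, "language": {"version": "3.13"}}),
        encoding="utf-8",
    )
    static_version = read_description(description)["language"]["version"]

print(static_version)
print(introspect(sys.executable)["version"])
```

When the target is, say, an aarch64 interpreter on an x86_64 builder, only the second function is usable.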

If this still isn’t clear enough, I am happy to have a meeting to clarify things.

pf_moore · April 9, 2024, 7:48am

No, that’s fine - the missing bit of information was that the PR didn’t implement the spec as it is now written.

rgommers · May 10, 2024, 11:49am

+1 for this. Here is a bit of feedback from an initial attempt to write a JSON config file like this and use it for cross compilation purposes.

I started from the example in the current version of PEP 739:

Example in current PEP 739 draft

{
  "schema_version": 1,
  "language": {
    "version": "3.13"
  },
  "implementation": {
    "name": "cpython",
    "version": {
      "major": 3,
      "minor": 13,
      "micro": 1,
      "releaselevel": "final",
      "serial": 0
    },
    "hexversion": 51184112,
    "cache_tag": "cpython-313",
    "_multiarch": "x86_64-linux-gnu"
  },
  "libpython": {
    "dynamic": "/usr/lib/libpython3.13.so.1.0",
    "static": "/usr/lib/python3.13/config-3.13-x86_64-linux-gnu/libpython3.13.a"
  },
  "c_api": {
    "headers": "/usr/include/python3.13"
  }
}

I needed to add several things, mostly data that currently has to be obtained from sysconfig.get_config_vars(), but a few other things too:

- EXT_SUFFIX (config var): file extension that is expected when targeting the full C API
- the file extension expected when targeting the limited API (not in the sysconfig config vars; currently obtainable as importlib.machinery.EXTENSION_SUFFIXES[1]).
  - Side note that this currently doesn’t seem to include the architecture or Python implementation. For consistency with EXT_SUFFIX one would expect something like replacing the 312 in '.cpython-312-aarch64-linux-gnu.so' with abi3, but instead it’s only abi3.so. Probably subject to change/improvement in the future?
- LIBPYTHON (config var): whether or not libpython has to be linked
- LIBPC (config var): location of the directory where the pkg-config files for Python are installed
- sysconfig.get_platform() return value
- sys.base_prefix: install prefix of the interpreter.
  - Needed in combination with sys.prefix to determine whether a venv is targeted. That latter part seems out of scope for this particular file I think, based on the discussion in this thread so far (needs a separate file, together with install paths probably). However, the base prefix seems like a property of the installed interpreter itself, and is not related to venvs or install paths.
  - This property may also be useful to make the file more easily relocatable. Right now one needs absolute paths in a number of places; it’d be great to be able to spell them as "{base-prefix}/...". Please consider allowing this.
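For reference, this is roughly how the values in the list above can be collected today from an interpreter that can be run natively. It is only a sketch: the dictionary keys are made up for illustration, and several of the config vars are None on some platforms (LIBPC on Windows, for example).

```python
import importlib.machinery
import sys
import sysconfig


def collect_build_details() -> dict:
    """Gather the values listed above from the running interpreter."""
    return {
        # Suffix for extension modules built against the full C API.
        "ext_suffix": sysconfig.get_config_var("EXT_SUFFIX"),
        # All accepted suffixes; the limited-API one is in here, but its
        # position in the list is not something to rely on.
        "extension_suffixes": list(importlib.machinery.EXTENSION_SUFFIXES),
        "libpython": sysconfig.get_config_var("LIBPYTHON"),
        "libpc": sysconfig.get_config_var("LIBPC"),
        "platform": sysconfig.get_platform(),
        "base_prefix": sys.base_prefix,
    }


for key, value in collect_build_details().items():
    print(f"{key}: {value!r}")
```

None of this is available when the interpreter cannot run on the build machine, which is the gap the extra fields are meant to fill.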

Furthermore, I removed the following items from the example, the first two from the "implementation" section and the last from the "libpython" section:

- "hexversion": too hard to figure out the correct value for a non-native Python by hand, and I didn’t need it. If this is kept, please consider describing how to obtain or calculate this value for an interpreter that cannot be run locally.
- "_multiarch": it seems wrong to include a value with an underscore, and again I didn’t need this. If this is kept, please consider removing the underscore.
- "static": because the interpreter I was targeting didn’t ship a static libpython

This resulted in the following:

{
  "schema_version": 1,
  "language": {
    "version": "3.12"
  },
  "implementation": {
    "name": "cpython",
    "version": {
      "major": 3,
      "minor": 12,
      "micro": 3,
      "releaselevel": "final",
      "serial": 0
    },
    "cache_tag": "cpython-312"
  },
  "libpython": {
    "dynamic": "/home/rgommers/mambaforge/envs/host-env-aarch64/lib/libpython3.12.so",
    "link_libpython": false
  },
  "c_api": {
    "headers": "/home/rgommers/mambaforge/envs/host-env-aarch64/include/python3.12"
  },
  "extension_suffixes": {
    "minor-version": ".cpython-312-aarch64-linux-gnu.so",
    "limited-api": ".abi3.so"
  },
  "platform": "linux-aarch64",
  "pkg-config-dir": "/home/rgommers/mambaforge/envs/host-env-aarch64/lib/pkgconfig",
  "base-prefix": "/home/rgommers/mambaforge/envs/host-env-aarch64"
}

Here is what the version using `{base-prefix}` looks like:

{
  "schema_version": 1,
  "language": {
    "version": "3.12"
  },
  "implementation": {
    "name": "cpython",
    "version": {
      "major": 3,
      "minor": 12,
      "micro": 3,
      "releaselevel": "final",
      "serial": 0
    },
    "cache_tag": "cpython-312"
  },
  "libpython": {
    "dynamic": "{base-prefix}/lib/libpython3.12.so",
    "link_libpython": false
  },
  "c_api": {
    "headers": "{base-prefix}/include/python3.12"
  },
  "extension_suffixes": {
    "minor-version": ".cpython-312-aarch64-linux-gnu.so",
    "limited-api": ".abi3.so"
  },
  "platform": "linux-aarch64",
  "pkg-config-dir": "{base-prefix}/lib/pkgconfig",
  "base-prefix": "/home/rgommers/mambaforge/envs/host-env-aarch64"
}

This looks like everything that is needed for cross-compiling NumPy and SciPy, except for install schemes/paths.
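A minimal sketch of how a consumer might expand the `{base-prefix}` placeholder, assuming plain string substitution is all that is intended. The placeholder syntax is only the proposal from this post, and `expand_base_prefix` is a hypothetical helper.

```python
def expand_base_prefix(data: dict, key: str = "base-prefix") -> dict:
    """Return a copy of data with "{base-prefix}" replaced in every string."""
    prefix = data[key]
    placeholder = "{" + key + "}"

    def expand(value):
        if isinstance(value, str):
            return value.replace(placeholder, prefix)
        if isinstance(value, dict):
            return {k: expand(v) for k, v in value.items()}
        if isinstance(value, list):
            return [expand(v) for v in value]
        return value  # numbers, booleans and null are left untouched

    return expand(data)


description = {
    "libpython": {"dynamic": "{base-prefix}/lib/libpython3.12.so", "link_libpython": False},
    "c_api": {"headers": "{base-prefix}/include/python3.12"},
    "base-prefix": "/home/rgommers/mambaforge/envs/host-env-aarch64",
}
expanded = expand_base_prefix(description)
print(expanded["c_api"]["headers"])
```

Relocating the whole description then means editing one value instead of every path.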

Two more things that can be retrieved from parsing the extension suffix but will be nicer to include explicitly, e.g. under a "build-configuration" key:

"debug": the result of sysconfig.get_config_var('Py_DEBUG')
"free-threaded" or "gil-disabled": the result of sysconfig.get_config_var('Py_GIL_DISABLED')

EDIT: the same argument goes for the sysconfig.get_platform() result: this is composed of sys.platform and platform.machine(). Adding those separately seems nicer.
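On a native interpreter those values can be read as in the sketch below. The key names are only the ones floated above, not anything agreed, and the config vars may be None or 0 on builds where the option is off or not defined.

```python
import platform
import sys
import sysconfig


def build_configuration() -> dict:
    """Collect the build flags and platform pieces discussed above."""
    return {
        # get_config_var returns None when the variable is not defined, so
        # bool() maps "unset" and 0 to False and 1 to True.
        "debug": bool(sysconfig.get_config_var("Py_DEBUG")),
        "free-threaded": bool(sysconfig.get_config_var("Py_GIL_DISABLED")),
        "sys-platform": sys.platform,
        "machine": platform.machine(),
    }


print(build_configuration())
```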

steve.dower · May 10, 2024, 1:31pm

Unfortunately, it’s not even as well defined as that (on Windows, EXTENSION_SUFFIXES[1] is just .pyd and is the last element in that list). Given we’re pushing people towards limited API, exposing the intended suffix for that specific case is probably a good idea here.

I was hopeful that this file would replace pkg-config (in a cross-platform way). What did you need this directory for? (e.g. for information missing from this file? or just a more convenient format for other platform-specific tools?)

The intent is to calculate this relative to the file, otherwise we can’t make this file static. In fact, all the paths are supposed to be relative to the file (unless they’re absolute, which ought to be allowed but not required).

We can probably define this by reference to PY_VERSION_HEX. I think it’s worth keeping.
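For anyone who needs the value for an interpreter they cannot run, here is a small sketch of the calculation, following the bit layout CPython uses for PY_VERSION_HEX (release levels alpha, beta, candidate and final map to 0xA, 0xB, 0xC and 0xF). The function name `hexversion` is just for illustration.

```python
import sys

# Release level codes as used by CPython's PY_VERSION_HEX encoding.
_RELEASE_LEVELS = {"alpha": 0xA, "beta": 0xB, "candidate": 0xC, "final": 0xF}


def hexversion(major: int, minor: int, micro: int, releaselevel: str, serial: int) -> int:
    """Pack the five version fields into a single integer like sys.hexversion."""
    return (
        (major << 24)
        | (minor << 16)
        | (micro << 8)
        | (_RELEASE_LEVELS[releaselevel] << 4)
        | serial
    )


# The 3.13.1 final example from the PEP draft.
print(hexversion(3, 13, 1, "final", 0))  # 51184112
# Sanity check against the running interpreter.
print(hexversion(*sys.version_info) == sys.hexversion)
```

The first result matches the "hexversion": 51184112 in the PEP's example, so the field is fully derivable from implementation.version.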

Agreed. Arguably, we should include the information needed to calculate the ABI suffix, not the other way around.

Generally +1 to everything else. Thanks for taking the time to try it out!

rgommers · May 10, 2024, 2:28pm

Mostly the latter. This static file could potentially replace pkg-config usage over time, assuming it’d be as widely adopted and became available for all Python versions that a build tool may want to support. But for now it’d just be a lot of extra work for no real benefit to avoid using pkg-config when the static file is available.

There is one thing that isn’t in the static description file though, and that’s the actual compile and link flags one needs. I’m not sure that it’s a good idea to go in that direction. Example: I proposed adding link_libpython, because that’s what I needed from the LIBPYTHON config var. However, if you look at the .pc files, you see:

$ cat /usr/lib/pkgconfig/python3.pc | grep Libs
Libs:

$ cat /usr/lib/pkgconfig/python3-embed.pc | grep Libs:
Libs: -L${libdir} -lpython3.11

What the link flags look like will differ per platform and per compiler. Build systems tend to already have support for converting paths as needed (e.g., changing a MinGW-style /c in -L/c/foo to the MSVC-style C:/) and for dealing with various inconsistencies in the .pc files themselves and when mixing different compiler toolchains. I’d much prefer to leave that alone, at least for now.

Relative paths are indeed nicer when shipping the file together with an interpreter. But they don’t work as well when a user provides the file for an interpreter that doesn’t have it, which will be required for at least the next several years. I don’t want to wait until Python 3.17 or so before starting to use this file :)

steve.dower · May 10, 2024, 2:45pm

I’m not sure what @FFY00 thought about it, but I’m happy enough to have the actual options in there for when the same compiler is being used (or if they can be adapted). Include and lib directories are probably best specified as lists of directories, but a blob of other native build options would be fine by me.

Of course, it doesn’t mean that a consumer of them can just use it blindly and expect 100% reliability across all systems, but that’s no regression from today. (And these are build-specific files, which means we’ll know which compiler was used when the file is created - they aren’t going to be general purpose information.)

Well, a relative path starting with an anchor will ignore the base, so you can have absolute paths in a relative path field.

The alternative doesn’t work though. If the fields are required to be absolute, you can’t put a relative path in there without a variable. So we define them as relative to a particular place (the directory containing the static description file) and tools can easily calculate absolute paths.
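A short sketch of that rule using pathlib, where joining onto an absolute path discards the base. The description file location is a made-up example, `resolve` is a hypothetical helper, and PurePosixPath is used only so the result is the same on every platform.

```python
from pathlib import PurePosixPath


def resolve(description_file: str, value: str) -> PurePosixPath:
    """Resolve a path field relative to the directory holding the description."""
    # If value is absolute, the "/" operator ignores the base directory.
    return PurePosixPath(description_file).parent / value


description = "/opt/python/build-details.json"  # hypothetical location
print(resolve(description, "include/python3.13"))
print(resolve(description, "/usr/include/python3.13"))
```

So a field defined as "relative to the description file" accepts absolute paths for free, whereas a field defined as "absolute" cannot hold a relative one.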

It may become more verbose if you are defining a custom file separately from the runtime it’s for (unless you’re then consuming it through a tool that knows how to rebase the location correctly), but that’s not the primary scenario. It should be possible, but it doesn’t have to be the easiest case.

rgommers · May 10, 2024, 5:08pm

Sure, shipping with the interpreter is the more important of the two use cases in the long run. There is no real conflict though; it is possible to make both easy. I suggest tweaking the paragraph under “Scope” to say that shipping with the interpreter and providing the file separately are both in scope, and that the former is the primary use case. Then update the one example to use relative paths instead of absolute (or provide two examples).

Then, tweak the rules like so:

Add base-prefix (seems needed anyway), as a path that may be either relative to the static file, or absolute.
If other paths are given as relative, make them relative to base-prefix.

This is simpler than my initial example (no need for {base-prefix} in other paths), and will make the “user provides the static file” case much nicer without changing anything important for the primary use case. If anything, I suspect that the primary case may become nicer too. E.g., say CPython decides to put the file in {base-prefix}/lib/python3.13/. Then you’d get relative paths like this (for the example in the PEP):

  "libpython": {
    "dynamic": "../libpython3.13.so.1.0",
    "static": "config-3.13-x86_64-linux-gnu/libpython3.13.a"
  },
  "c_api": {
    "headers": "../../include/python3.13"
  }

This looks quite asymmetric and unusual. If it’s relative to base-prefix, it’ll look like:

  "libpython": {
    "dynamic": "lib/libpython3.13.so.1.0",
    "static": "lib/python3.13/config-3.13-x86_64-linux-gnu/libpython3.13.a"
  },
  "c_api": {
    "headers": "include/python3.13"
  }
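The two rules can be sketched as a two-step resolution: first anchor base-prefix at the description file, then anchor everything else at base-prefix. The file location and the name `resolve_paths` are assumptions for illustration, and posixpath.normpath collapses ".." purely textually, so symlinks are ignored here.

```python
import posixpath


def resolve_paths(description_file: str, base_prefix: str, paths: dict) -> dict:
    """Resolve base-prefix against the file, then other paths against it."""
    file_dir = posixpath.dirname(description_file)
    # An absolute base_prefix wins, because join() drops file_dir in that case.
    prefix = posixpath.normpath(posixpath.join(file_dir, base_prefix))
    return {
        name: posixpath.normpath(posixpath.join(prefix, value))
        for name, value in paths.items()
    }


resolved = resolve_paths(
    "/usr/lib/python3.13/build-details.json",  # hypothetical file location
    "../..",  # base-prefix, given relative to the description file
    {
        "dynamic": "lib/libpython3.13.so.1.0",
        "headers": "include/python3.13",
    },
)
print(resolved)
```

A user-provided file kept elsewhere would just set base-prefix to an absolute path and leave every other entry unchanged.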

I’m not a huge fan of this wording in the PEP by the way:

and the specifics of how that file is provided are completely up to them.

It’s perfectly fine to say that implementers are free to provide the file or not, and that they are also free to put it where it makes most sense to them. However, a recommended default location for regular Python installs, to be used unless the implementer has a reason to make a different choice, is much more helpful. It doesn’t matter much where that default is, but having every implementer do something different will not really help anyone.

Same for the name of the file - even if there are corner cases that require freedom to choose a different naming scheme, leaving it completely open is unhelpful. So I suggest picking some-name.json, and then recommend that if modifications are needed, to make it some-name-xxx.json. That way one has at least some idea on how to start looking for this file.
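To illustrate why even a loose naming convention helps, here is a sketch of the lookup a tool could do in a directory it has been pointed at. `some-name` is the placeholder from the paragraph above, not a real file name, and `find_description_files` is hypothetical.

```python
import tempfile
from pathlib import Path


def find_description_files(directory, stem: str = "some-name") -> list[Path]:
    """Return <stem>.json first (if present), then any <stem>-xxx.json variants."""
    directory = Path(directory)
    exact = directory / f"{stem}.json"
    variants = sorted(directory.glob(f"{stem}-*.json"))
    return ([exact] if exact.is_file() else []) + variants


with tempfile.TemporaryDirectory() as tmp:
    for name in ("some-name.json", "some-name-debug.json", "unrelated.json"):
        (Path(tmp) / name).write_text("{}", encoding="utf-8")
    found = [path.name for path in find_description_files(tmp)]

print(found)
```

Without a recommended name, the only option is to ask the user for the full path every time.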