PEP 711: PyBI: a standard format for distributing Python Binaries

Hey all, finally got around to posting this properly!

If anyone else is excited about making this real, I could very much use some help with two things:

  • Cleaning up my janky PyBI building code (the Windows and macOS scripts aren’t so bad, but the Linux code monkeypatches auditwheel and hacks up the manylinux build process)

  • Setting up automation to build+upload new PyBIs whenever a new CPython release comes out. Right now I’m running the scripts by hand, when I remember, on a personal machine :slight_smile:

Anyway: PEP text is available at PEP 711 – PyBI: a standard format for distributing Python Binaries | peps.python.org, or here it is inline so you can use Discourse quoting to comment on particular parts.


PEP 711 – PyBI: a standard format for distributing Python Binaries

Abstract

“Like wheels, but instead of a pre-built python package, it’s a pre-built python interpreter”

Motivation

End goal: PyPI.org has pre-built packages for all Python versions on all popular platforms, so automated tools can easily grab any of them and set it up. It becomes quick and easy to try Python prereleases, pin Python versions in CI, make a temporary environment to reproduce a bug report that only happens on a specific Python point release, etc.

First step (this PEP): define a standard packaging file format to hold pre-built Python interpreters, that reuses existing Python packaging standards as much as possible.

Examples

Example pybi builds are available at pybi.vorpus.org. They’re zip files, so you can unpack them and poke around inside if you want to get a feel for how they’re laid out.

You can also look at the tooling I used to create them.

Specification

Filename

Filename: {distribution}-{version}[-{build tag}]-{platform tag}.pybi

This matches the wheel file format defined in PEP 427, except dropping the {python tag} and {abi tag} and changing the extension from .whl to .pybi.

For example:

  • cpython-3.9.3-manylinux_2014.pybi
  • cpython-3.10b2-win_amd64.pybi

Just like for wheels, if a pybi supports multiple platforms, you can separate them by dots to make a “compressed tag set”:

  • cpython-3.9.5-macosx_11_0_x86_64.macosx_11_0_arm64.pybi

(Though in practice this probably won’t be used much, e.g. the above filename is more idiomatically written as cpython-3.9.5-macosx_11_0_universal2.pybi.)
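As a non-normative illustration, here's a minimal sketch of how a tool might split a pybi filename into its components (parse_pybi_filename is a made-up helper, not part of the spec):

def parse_pybi_filename(filename):
    """Split {distribution}-{version}[-{build tag}]-{platform tag}.pybi
    into its parts. A sketch only; assumes no "-" inside the fields,
    just like the wheel filename rules."""
    if not filename.endswith(".pybi"):
        raise ValueError(f"not a pybi filename: {filename}")
    parts = filename[: -len(".pybi")].split("-")
    if len(parts) == 3:
        distribution, version, platform = parts
        build = None
    elif len(parts) == 4:
        distribution, version, build, platform = parts
    else:
        raise ValueError(f"can't parse pybi filename: {filename}")
    # A compressed tag set uses "." to separate multiple platform tags.
    return distribution, version, build, platform.split(".")

# parse_pybi_filename("cpython-3.9.5-macosx_11_0_x86_64.macosx_11_0_arm64.pybi")
# -> ("cpython", "3.9.5", None, ["macosx_11_0_x86_64", "macosx_11_0_arm64"])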

File contents

A .pybi file is a zip file that can be unpacked directly into an arbitrary location and then used as a self-contained Python environment. There’s no .data directory or install scheme keys, because the Python environment knows which install scheme it’s using, so it can just put things in the right places to start with.

The “arbitrary location” part is important: the pybi can’t contain any hardcoded absolute paths. In particular, any preinstalled scripts MUST NOT embed absolute paths in their shebang lines.

Similar to wheels’ <package>-<version>.dist-info directory, the pybi archive must contain a top-level directory named pybi-info/. (Rationale: calling it pybi-info instead of dist-info makes sure that tools don’t get confused about which kind of metadata they’re looking at; leaving off the {name}-{version} part is fine because only one pybi can be installed into a given directory.) The pybi-info/ directory contains at least the following files:

  • .../PYBI: metadata about the archive itself, in the same RFC822-ish format as METADATA and WHEEL files:

    Pybi-Version: 1.0
    Generator: {name} {version}
    Tag: {platform tag}
    Tag: {another platform tag}
    Tag: {...and so on...}
    Build: 1   # optional
    
  • .../RECORD: same as in wheels, except see the note about symlinks, below.

  • .../METADATA: In the same format as described in the current core metadata spec, except that the following keys are forbidden because they don’t make sense:

    • Requires-Dist
    • Provides-Extra
    • Requires-Python

There are also some new, required keys, described below.
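As a non-normative illustration: because pybi-info/ has a fixed name, a tool can read all of this metadata straight out of the archive without unpacking it. A minimal sketch using Python's standard zipfile and email modules (the filename below is hypothetical):

import zipfile
from email.parser import Parser

def read_pybi_info(path):
    # Both PYBI and METADATA use the RFC822-ish format that
    # email.parser already understands.
    with zipfile.ZipFile(path) as zf:
        pybi = Parser().parsestr(zf.read("pybi-info/PYBI").decode("utf-8"))
        metadata = Parser().parsestr(zf.read("pybi-info/METADATA").decode("utf-8"))
    return pybi, metadata

pybi, metadata = read_pybi_info("cpython-3.10.8-manylinux_2_17_x86_64.pybi")
print(pybi["Pybi-Version"])                # e.g. "1.0"
print(pybi.get_all("Tag"))                 # all supported platform tags
print(metadata.get_all("Pybi-Wheel-Tag"))  # wheel tag templates (see below)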

Pybi-specific core metadata

Here’s an example of the new METADATA fields, before we give the full details:

Pybi-Environment-Marker-Variables: {"implementation_name": "cpython", "implementation_version": "3.10.8", "os_name": "posix", "platform_machine": "x86_64", "platform_system": "Linux", "python_full_version": "3.10.8", "platform_python_implementation": "CPython", "python_version": "3.10", "sys_platform": "linux"}
Pybi-Paths: {"stdlib": "lib/python3.10", "platstdlib": "lib/python3.10", "purelib": "lib/python3.10/site-packages", "platlib": "lib/python3.10/site-packages", "include": "include/python3.10", "platinclude": "include/python3.10", "scripts": "bin", "data": "."}
Pybi-Wheel-Tag: cp310-cp310-PLATFORM
Pybi-Wheel-Tag: cp310-abi3-PLATFORM
Pybi-Wheel-Tag: cp310-none-PLATFORM
Pybi-Wheel-Tag: cp39-abi3-PLATFORM
Pybi-Wheel-Tag: cp38-abi3-PLATFORM
Pybi-Wheel-Tag: cp37-abi3-PLATFORM
Pybi-Wheel-Tag: cp36-abi3-PLATFORM
Pybi-Wheel-Tag: cp35-abi3-PLATFORM
Pybi-Wheel-Tag: cp34-abi3-PLATFORM
Pybi-Wheel-Tag: cp33-abi3-PLATFORM
Pybi-Wheel-Tag: cp32-abi3-PLATFORM
Pybi-Wheel-Tag: py310-none-PLATFORM
Pybi-Wheel-Tag: py3-none-PLATFORM
Pybi-Wheel-Tag: py39-none-PLATFORM
Pybi-Wheel-Tag: py38-none-PLATFORM
Pybi-Wheel-Tag: py37-none-PLATFORM
Pybi-Wheel-Tag: py36-none-PLATFORM
Pybi-Wheel-Tag: py35-none-PLATFORM
Pybi-Wheel-Tag: py34-none-PLATFORM
Pybi-Wheel-Tag: py33-none-PLATFORM
Pybi-Wheel-Tag: py32-none-PLATFORM
Pybi-Wheel-Tag: py31-none-PLATFORM
Pybi-Wheel-Tag: py30-none-PLATFORM
Pybi-Wheel-Tag: py310-none-any
Pybi-Wheel-Tag: py3-none-any
Pybi-Wheel-Tag: py39-none-any
Pybi-Wheel-Tag: py38-none-any
Pybi-Wheel-Tag: py37-none-any
Pybi-Wheel-Tag: py36-none-any
Pybi-Wheel-Tag: py35-none-any
Pybi-Wheel-Tag: py34-none-any
Pybi-Wheel-Tag: py33-none-any
Pybi-Wheel-Tag: py32-none-any
Pybi-Wheel-Tag: py31-none-any
Pybi-Wheel-Tag: py30-none-any

Specification:

  • Pybi-Environment-Marker-Variables: The value of all PEP 508 environment marker variables that are static across installs of this Pybi, as a JSON dict. So for example:
    • python_version will always be present, because a Python 3.10 package always has python_version == "3.10".

    • platform_version will generally not be present, because it gives detailed information about the OS where Python is running, for example:

      #60-Ubuntu SMP Thu May 6 07:46:32 UTC 2021
      

      platform_release has similar issues.

    • platform_machine will usually be present, except for macOS universal2 pybis: these can potentially be run in either x86-64 or arm64 mode, and we don’t know which until the interpreter is actually invoked, so we can’t record it in static metadata.

    Rationale: In many cases, this should allow a resolver running on Linux to compute package pins for a Python environment on Windows, or vice-versa, so long as the resolver has access to the target platform’s .pybi file. (Note that Requires-Python constraints can be checked by using the python_full_version value; there’s a sketch of this after the list.) While we have to leave out a few keys sometimes, they’re either fairly useless (platform_version, platform_release) or can be reconstructed by the resolver (platform_machine).

    The markers are also just generally useful information to have accessible. For example, if you have a pypy3-7.3.2 pybi, and you want to know what version of the Python language that supports, then that’s recorded in the python_version marker.

    (Note: we may want to deprecate/remove platform_version and platform_release? They’re problematic and I can’t figure out any cases where they’re useful. But that’s out of scope of this particular PEP.)

  • Pybi-Paths: The install paths needed to install wheels (same keys as sysconfig.get_paths()), as relative paths starting at the root of the zip file, as a JSON dict.

    These paths MUST be written in Unix format, using forward slashes as a separator, not backslashes.

    It must be possible to invoke the Python interpreter by running {paths["scripts"]}/python. If there are alternative interpreter entry points (e.g. pythonw for Windows GUI apps), then they should also be in that directory under their conventional names, with no version number attached. (You can also have a python3.11 symlink if you want; there’s no rule against that. It’s just that python has to exist and work.)

    Rationale: Pybi-Paths and Pybi-Wheel-Tags (see below) are together enough to let an installer choose wheels and install them into an unpacked pybi environment, without invoking Python. Besides, we need to write down the interpreter location somewhere, so it’s two birds with one stone.
  • Pybi-Wheel-Tag: The wheel tags supported by this interpreter, in preference order (most-preferred first, least-preferred last), except that the special platform tag PLATFORM should replace any platform tags that depend on the final installation system.

    Discussion: It would be nice™ if installers could compute a pybi’s corresponding wheel tags ahead of time, so that they could install wheels into the unpacked pybi without needing to actually invoke the python interpreter to query its tags – both for efficiency and to allow for more exotic use cases like setting up a Windows environment from a Linux host.

    But unfortunately, it’s impossible to compute the full set of platform tags supported by a Python installation ahead of time, because they can depend on the final system:
    • A pybi tagged manylinux_2_12_x86_64 can always use wheels tagged as manylinux_2_12_x86_64. It also might be able to use wheels tagged manylinux_2_17_x86_64, but only if the final installation system has glibc 2.17+.

    • A pybi tagged macosx_11_0_universal2 (= x86-64 + arm64 support in the same binary) might be able to use wheels tagged as macosx_11_0_arm64, but only if it’s installed on an “Apple Silicon” machine and running in arm64 mode.

    In these two cases, an installation tool can still work out the appropriate set of wheel tags by computing the local platform tags, taking the wheel tag templates from Pybi-Wheel-Tag, and swapping in the actual supported platforms in place of the magic PLATFORM string (see the sketch after this list).

    However, there are other cases that are even more complicated:

    • You can (usually) run both 32- and 64-bit apps on 64-bit Windows. So a pybi installer might compute the set of allowable pybi tags on the current platform as [win32, win_amd64]. But you can’t then just take that set and swap it into the pybi’s wheel tag template or you get nonsense:

      [ "cp39-cp39-win32", "cp39-cp39-win_amd64", "cp39-abi3-win32", "cp39-abi3-win_amd64", ... ]
      

To handle this, the installer needs to somehow understand that a manylinux_2_12_x86_64 pybi can use a manylinux_2_17_x86_64 wheel as long as those are both valid tags on the current machine, but a win32 pybi can’t use a win_amd64 wheel, even if those are both valid tags on the current machine.

    • A pybi tagged macosx_11_0_universal2 might be able to use wheels tagged as macosx_11_0_x86_64, but only if it’s installed on an x86-64 machine or it’s installed on an ARM machine and the interpreter is invoked with the magic incantation that tells macOS to run a binary in x86-64 mode. So how the installer plans to invoke the pybi matters too!

So actually using Pybi-Wheel-Tag values is less trivial than it might seem, and they’re probably only useful with fairly sophisticated tooling. But, smart pybi installers will already have to understand a lot of these platform compatibility issues in order to select a working pybi, and for the cross-platform pinning/environment building case, users can potentially provide whatever information is needed to disambiguate exactly what platform they’re targeting. So, it’s still useful enough to include in the PyBI metadata – tools that don’t find it useful can simply ignore it.
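To make that concrete, here’s a rough sketch of how an installer might consume these fields. This is not part of the spec: it assumes the packaging library, the helper names are made up, and the PLATFORM expansion shown is the naive version that works for the manylinux/macOS cases but, as just discussed, not for Windows:

import json
import os.path
from packaging.specifiers import SpecifierSet

def interpreter_path(pybi_root, pybi_paths_json):
    # Pybi-Paths lets us locate the interpreter without running it.
    paths = json.loads(pybi_paths_json)
    return os.path.join(pybi_root, paths["scripts"], "python")

def satisfies_requires_python(marker_vars_json, requires_python):
    # Requires-Python constraints can be checked against the recorded
    # python_full_version marker variable.
    marker_vars = json.loads(marker_vars_json)
    return SpecifierSet(requires_python).contains(marker_vars["python_full_version"])

def concrete_wheel_tags(pybi_wheel_tags, local_platform_tags):
    # Swap the local platform tags in for the magic PLATFORM string,
    # preserving the pybi's preference order.
    out = []
    for tag in pybi_wheel_tags:
        if tag.endswith("-PLATFORM"):
            out.extend(tag[: -len("PLATFORM")] + plat for plat in local_platform_tags)
        else:
            out.append(tag)
    return out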

You can probably generate these metadata values by running this script on the built interpreter:

import packaging.markers
import packaging.tags
import sysconfig
import os.path
import json
import sys

marker_vars = packaging.markers.default_environment()
# Delete any keys that depend on the final installation
del marker_vars["platform_release"]
del marker_vars["platform_version"]
# Darwin binaries are often multi-arch, so play it safe and
# delete the architecture marker. (Better would be to only
# do this if the pybi actually is multi-arch.)
if marker_vars["sys_platform"] == "darwin":
    del marker_vars["platform_machine"]

# Copied and tweaked version of packaging.tags.sys_tags
tags = []
interp_name = packaging.tags.interpreter_name()
if interp_name == "cp":
    tags += list(packaging.tags.cpython_tags(platforms=["xyzzy"]))
else:
    tags += list(packaging.tags.generic_tags(platforms=["xyzzy"]))

tags += list(packaging.tags.compatible_tags(platforms=["xyzzy"]))

# Gross hack: packaging.tags normalizes platforms by lowercasing them,
# so we generate the tags with a unique string and then replace it
# with our special uppercase placeholder.
str_tags = [str(t).replace("xyzzy", "PLATFORM") for t in tags]

(base_path,) = sysconfig.get_config_vars("installed_base")
# For some reason, macOS framework builds report their
# installed_base as a directory deep inside the framework.
while "Python.framework" in base_path:
    base_path = os.path.dirname(base_path)
paths = {
    key: os.path.relpath(path, base_path).replace("\\", "/")
    for (key, path) in sysconfig.get_paths().items()
}

json.dump({"marker_vars": marker_vars, "tags": str_tags, "paths": paths}, sys.stdout)

This emits a JSON dict on stdout with separate entries for the marker variables, the wheel tags, and the install paths.

Symlinks

Currently, symlinks are used by default in all Unix Python installs (e.g., bin/python3 -> bin/python3.9). And furthermore, symlinks are required to store macOS framework builds in .pybi files. So, unlike wheel files, we absolutely have to support symlinks in .pybi files for them to be useful at all.

Representing symlinks in zip files

The de-facto standard for representing symlinks in zip files is the Info-Zip symlink extension, which works as follows:

  • The symlink’s target path is stored as if it were the file contents
  • The top 4 bits of the Unix permissions field are set to 0xa, i.e.: permissions & 0xf000 == 0xa000
  • The Unix permissions field, in turn, is stored as the top 16 bits of the “external attributes” field.

So if using Python’s zipfile module, you can check whether a ZipInfo represents a symlink by doing:

(zip_info.external_attr >> 16) & 0xf000 == 0xa000

Or if using Rust’s zip crate, the equivalent check is:

fn is_symlink(zip_file: &zip::ZipFile) -> bool {
    match zip_file.unix_mode() {
        Some(mode) => mode & 0xf000 == 0xa000,
        None => false,
    }
}

If you’re on Unix, your zip and unzip commands probably understand this format already.
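For the writing side, here’s a minimal sketch of storing a symlink with Python’s zipfile module (add_symlink is a made-up helper):

import zipfile

def add_symlink(zf, name, target):
    info = zipfile.ZipInfo(name)
    info.create_system = 3  # 3 = Unix, so the mode bits below are meaningful
    # Unix permissions live in the top 16 bits of external_attr;
    # 0xa000 (S_IFLNK) marks the entry as a symlink.
    info.external_attr = (0xA000 | 0o777) << 16
    zf.writestr(info, target)  # the target path is stored as the "contents"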

Representing symlinks in RECORD files

Normally, a RECORD file lists each file + its hash + its length:

my/favorite/file,sha256=...,12345

For symlinks, we instead write:

name/of/symlink,symlink=path/to/symlink/target,

That is: we use a special “hash function” called symlink, and then store the actual symlink target as the “hash value”. And the length is left empty.

Rationale: we’re already committed to the RECORD file containing a redundant check on everything in the main archive, so for symlinks we at least need to store some kind of hash, plus some kind of flag to indicate that this is a symlink. Given that symlink target strings are roughly the same size as a hash, we might as well store them directly. This also makes the symlink information easier to access for tools that don’t understand the Info-Zip symlink extension, and makes it possible to losslessly unpack and repack a Unix pybi on a Windows system, which someone might find handy at some point.
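As a non-normative illustration, a RECORD consumer might handle this as follows (iter_record_entries is a made-up helper):

import csv

def iter_record_entries(record_text):
    # Yields ("symlink", path, target) or ("file", path, hash, size).
    for row in csv.reader(record_text.splitlines()):
        if not row:
            continue
        path, hash_field, size = row
        if hash_field.startswith("symlink="):
            yield ("symlink", path, hash_field[len("symlink="):])
        else:
            yield ("file", path, hash_field, int(size) if size else None)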

Storing symlinks in pybi files

When a pybi creator stores a symlink, they MUST use both of the mechanisms defined above: storing it in the zip archive directly using the Info-Zip representation, and also recording it in the RECORD file.

Pybi consumers SHOULD validate that the symlinks in the archive and RECORD file are consistent with each other.

We also considered using only the RECORD file to store symlinks, but then the vanilla unzip tool wouldn’t be able to unpack them, and that would make it hard to install a pybi from a shell script.

Limitations

Symlinks enable a lot of potential messiness. To keep things under control, we impose the following restrictions:

  • Symlinks MUST NOT be used in .pybis targeting Windows, or other platforms that are missing first-class symlink support.
  • Symlinks MUST NOT be used inside the pybi-info directory. (Rationale: there’s no need, and it makes things simpler for resolvers that need to extract info from pybi-info without unpacking the whole archive.)
  • Symlink targets MUST be relative paths, and MUST be inside the pybi directory.
  • If A/B/... is recorded as a symlink in the archive, then there MUST NOT be any other entries in the archive named like A/B/.../C.

    For example, if an archive has a symlink foo -> bar, and then later in the archive there’s a regular file named foo/blah.py, then a naive unpacker could potentially end up writing a file called bar/blah.py. Don’t be naive.

Unpackers MUST verify that these rules are followed, because without them attackers could create evil symlinks like foo -> /etc/passwd or foo -> ../../../../../etc + foo/passwd -> ... and cause havoc.
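As a sketch of the kind of check an unpacker might do (pure path logic, assuming archive paths use forward slashes as required above; check_symlink is a made-up helper):

import posixpath

def check_symlink(link_path, target):
    if posixpath.isabs(target):
        raise ValueError(f"absolute symlink target: {link_path} -> {target}")
    # Resolve the target relative to the symlink's own directory...
    resolved = posixpath.normpath(posixpath.join(posixpath.dirname(link_path), target))
    # ...and reject anything that climbs out of the unpacked pybi directory.
    if resolved == ".." or resolved.startswith("../"):
        raise ValueError(f"symlink escapes pybi: {link_path} -> {target}")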

Non-normative comments

Why not just use conda?

This isn’t really in the scope of this PEP, but since conda is a popular way to distribute binary Python interpreters, it’s a natural question.

The simple answer is: conda is great! But, there are lots of python users who aren’t conda users, and they deserve nice things too. This PEP just gives them another option.

The deeper answer is: the maintainers who upload packages to PyPI are the backbone of the Python ecosystem. They’re the first audience for Python packaging tools. And one thing they want is to upload a package once, and have it be accessible across all the different ways Python is deployed: in Debian and Fedora and Homebrew and FreeBSD, in Conda environments, in big companies’ monorepos, in Nix, in Blender plugins, in RenPy games, … you get the idea.

All of these environments have their own tooling and strategies for managing packages and dependencies. So what’s special about PyPI and wheels is that they’re designed to describe dependencies in a standard, abstract way, that all these downstream systems can consume and convert into their local conventions. That’s why package maintainers use Python-specific metadata and upload to PyPI: because it lets them address all of those systems simultaneously. Every time you build a Python package for conda, there’s an intermediate wheel that’s generated, because wheels are the common language that Python package build systems and conda can use to talk to each other.

But then, if you’re a maintainer releasing an sdist+wheels, then you naturally want to test what you’re releasing, which may depend on arbitrary PyPI packages and versions. So you need tools that build Python environments directly from PyPI, and conda is fundamentally not designed to do that. So conda and pip are both necessary for different cases, and this proposal happens to be targeting the pip side of that equation.

Sdists (or not)

It might be cool to have an “sdist” equivalent for pybis, i.e., some kind of format for a Python source release that’s structured-enough to let tools automatically fetch and build it into a pybi, for platforms where prebuilt pybis aren’t available. But, this isn’t necessary for the MVP and opens a can of worms, so let’s worry about it later.

What packages should be bundled inside a pybi?

Pybi builders have the power to pick and choose what exactly goes inside. For example, you could include some preinstalled packages in the pybi’s site-packages directory, or prune out bits of the stdlib that you don’t want. We can’t stop you! Though if you do preinstall packages, then it’s strongly recommended to also include the correct metadata (.dist-info etc.), so that it’s possible for pip or other tools to figure out what’s going on.

For my prototype “general purpose” pybis, what I chose is:

  • Make sure site-packages is empty.

    Rationale: for traditional standalone python installers that are targeted at end-users, you probably want to include at least pip, to avoid bootstrapping issues (PEP 453). But pybis are different: they’re designed to be installed by “smart” tooling that consumes the pybi as part of some kind of larger automated deployment process. It’s easier for these installers to start from a blank slate and then add whatever they need, than for them to start with some preinstalled packages that they may or may not want. (And besides, you can still run python -m ensurepip.)
  • Include the full stdlib, except for test.

    Rationale: the top-level test module contains CPython’s own test suite. It’s huge (CPython without test is ~37 MB, then test adds another ~25 MB on top of that!), and essentially never used by regular user code. Also, as precedent, the official nuget packages, the official manylinux images, and multiple Linux distributions all leave it out, and this hasn’t caused any major problems.

    So this seems like the best way to balance broad compatibility with reasonable download/install sizes.
  • I’m not shipping any .pyc files. They take up space in the download, can be generated on the final system at minimal cost, and dropping them removes a source of location-dependence. (.pyc files store the absolute path of the corresponding .py file and include it in tracebacks; but, pybis are relocatable, so the correct path isn’t known until after install.)

Backwards Compatibility

No backwards compatibility considerations.

Security Implications

No security implications, beyond the fact that anyone who takes it upon themselves to distribute binaries has to come up with a plan to manage their security (e.g., whether they roll a new build after an OpenSSL CVE drops). But collectively, we core Python folks are already maintaining binary builds for all major platforms (macOS + Windows through python.org, and Linux builds through the official manylinux image), so even if we do start releasing official CPython builds on PyPI it doesn’t really raise any new security issues.

How to Teach This

This isn’t targeted at end-users; their experience will simply be that e.g. their pyenv or tox invocation magically gets faster and more reliable (if those projects’ maintainers decide to take advantage of this PEP).

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.

Is there a conflict here? Does the vanilla unzip do this kind of verification?

Aside from that: how would I properly install such a binary, as opposed to simply using it relocatably? Would there be more to it than putting it in a system-blessed location and configuring PATH? What about py on Windows - how would it become aware of an unzipped .pybi? What about Linux distributions that like to split Python between /usr/bin and /usr/lib?

If the idea going forward is to say goodbye to all of that and have every Python installation be a self-contained, relocatable thing, that’s definitely something that Linux vendors, and Apple, would have to take into account. Or are we going to treat the system Python as a one-off and teach everyone to install a separate relocatable Python if they want to write their own code? (Arguably good practice anyway, but still.)

Vanilla unzip does have countermeasures against writing outside the target directory, yeah. I’m not sure exactly what semantics it enforces – the ones I wrote in the PEP might be a bit stricter than necessary – but they’re similar in spirit at least. And anyway, if someone really wants to make and use an unpacker that messes up their system we can’t really stop them :-).

Think of pybis as a low-level building block. They’re unopinionated. They say, here’s a Python and the core information you need to work with it – you figure out what you want to do with it, if anything. If someone wants to put one on PATH or if py wants to come up with a scheme to manage environments that’s up to them. It’s certainly not going to stop Linux distros or Apple or Microsoft or conda from distributing Python in other ways. It’s just about giving you more options.

Personally I am excited about the idea of having a single workflow manager that automatically handles installation/pinning/etc for both the interpreter and the packages within it, so “installing python” becomes an invisible detail that users don’t even think about. But there’s nothing about pybis that forces you to use them that way; you can imagine lots of different workflows or UIs around them.

Okay, I like the sound of that.

As an aside: I’m not really sure what people are talking about with terminology like “workflow manager”. Exactly what tasks are involved in this?

I guess mostly it is about “development workflow tools”, such as Hatch, PDM, Poetry, and so on. Such a tool can cover tasks like packaging a project into distributions, uploading distributions to PyPI, bumping versions, managing dependencies and dependency constraints (including lock files), running tasks (the test suite, for example), and more. Where you once had to use setuptools, build, twine, tox, and pyinvoke, you might now be able to do all of this with just one tool.

And the idea is that if we had relocatable Python binaries, then tools (old and new) might be able to cover even more use cases (or implement existing use cases more efficiently). For example, tox would be able to automatically fetch a PyBI to test your library against a Python version that is not installed yet. Maybe it would also open the door to new tools that cover non-development workflows as well (personally I wish for something like pipx combined with pyenv and the py launcher).

When I talk about a workflow manager (I can’t speak for others) I mean a tool like hatch, poetry or PDM that provides commands to do the various parts of your development process (and usually has an opinionated view on how you do those steps). Creating virtual environments for you to run tests and automate build processes, initialising a project structure, etc.

Tools like tox and nox are similar, but less opinionated and don’t try to cover the full development process. I tend to refer to these as “task runners” or “environment managers” to distinguish them from the workflow tools.

But none of this is precise, or formally agreed terminology, as far as I know.

I think of workflow tools for Python as tools that handle the underlying details for the user and give them a single unified tool that has install / publish / run / test etc commands. They usually provide functionality that the “default” tools do not (eg: environment-agnostic lockfiles, automated environment management etc) and serve as an end-to-end tool that enables them to trim the scope and define it as they deem appropriate.

(yes, I copy-pasted from https://pradyunsg.me/blog/2023/01/21/thoughts-on-python-packaging/#on-existing-workflow-tools)

…Huh. I’ve been using Poetry to write pyproject.toml files for me / template the initial folder structure, and provide a build backend, and take the dependency resolution as a free perk that might actually do something for me some day. But it never really occurred to me that I might actually want something else to run venv or pip for me.

I do still like the idea of being able to swap out individual pieces of that kit - I really was never a fan of twine but it’s hard to put my finger on why, and the setuptools part has been an annoying moving target. It’s nice to see build replacing that (and moving away from having an executable configuration file) but it’s a little hard to swallow that it doesn’t come with Python when both distutils and setuptools did. I haven’t yet had serious issues with dependencies on anything and I can’t see the point in a task runner at all. For my needs I probably would be satisfied with build and a more sophisticated initial-project template (especially with TOML support entering the standard library), I suppose.

First of all, this is awesome! I’ll need to start digging into this, building some of my own PyBIs, and use it in my Python provisioning scripts :+1:

A few questions from my first impression of the format:

  1. How should this handle CPython’s optional dependencies, e.g. readline and sqlite3? OpenSSL is mentioned in the Security Implications section, which seems to imply that they are expected to be vendored in the PyBI? Although I assume the system pthread can be linked since it’s traditionally a part of manylinux?
  2. In a similar vein to the first point, do you know of any limitations in the distributed Python? Existing solutions on the same topic generally have a few of them, such as python-build-standalone, but the PEP doesn’t list any. If there are limitations (whether known or not at the moment), do you expect them to be resolved in the canonical CPython code base if needed?
  3. How do you distinguish this from the embedded distribution for Windows? Aside from the fact that it only works on Windows, of course.

I also kind of envision dependencies may make sense at some point (especially considering optional dependencies mentioned above), but that can certainly be considered out of scope for now.

Thank you so much for all the work you’ve put into this, this is great! After a first read of the proposal, it looks pretty good. I have a couple suggestions. :heart:

It would be good if we could add more metadata to register the provenance of the release (eg. Source-Hash, Source-URL, etc. fields in the PYBI file), and to add a hard requirement to build from pristine sources (other than minor build system fixups, if necessary), so that we avoid getting into the same situation we are in right now with vendor patching (eg. if someone builds from modified sources, they should be required to add a local version tag and a Source-Dirty: True, or distribute it with a different implementation name). What do you think?

One question: is the pybi-info directory required to be installed? I don’t think that’s very clear, unless I missed something (sorry in advance for my dyslexia and ADHD combo :smiling_face_with_tear:), and it would be super useful to register that information (sidenote[1]).

IMO we should define the file name, and do it per platform.

Eg.

A Python interpreter entrypoint (an executable on most platforms) must be present on the following path:

Cheers!


  1. Perhaps it would make sense to standardize a way to record provenance (build information) directly on Python, as that would be super useful for the other distribution channels. ↩︎

Another afterthought: would it make sense to recommend standardized user-writable locations where tooling could install and cache PyBI interpreters (eg. XDG_DATA_HOME/pybi and XDG_CACHE_HOME/pybi on Linux)? I think this is one of the things that we should probably have done for wheels too.

One additional question I forgot to include in my previous comment:

  1. Does this use the framework build on macOS? Either way, are both modes supported, and if so how are they distinguished?

Can you use JSON instead? The format of METADATA is really janky and underspecified (line continuations alone …). Pretty much every new file we have put into wheels for metadata has been JSON-based for a reason. :sweat_smile:

My suspicion is your answer to this question/suggestion is, “let’s start small” (and I do appreciate including all the metadata necessary to do a resolve for library dependencies without executing code with the interpreter), but it would be great to have more data about the interpreter for other tooling like editors. Support a way for other tools to assist in environment/interpreter discovery · brettcannon/python-launcher · Discussion #168 · GitHub is where I have been thinking on this topic; it’s about details of the interpreter and how to use it, which editors and the Python Launcher would like to know. Having that available without having to run the interpreter would be handy.

I’m also noticing that the PEP doesn’t say whether the PYBI file gets installed anywhere. Is the idea that it doesn’t? If it isn’t installed, that seems like an unfortunate waste, because of what could be done with it after the interpreter is installed.

The key word there, though, is “did”. :wink: Since that’s no longer the case I would not expect the stdlib to ship packaging-related stuff like installer and build so they can update faster than Python does.

I talked to @thomas about this at the core dev sprints, and if I remember correctly there’s a way to do the builds such that you can ship e.g. sqlite3 as a .so with the interpreter but let a copy of sqlite3 on the machine override it if available. This would help with the typical “what about OpenSSL” concern (although PyPy has been shipping OpenSSL in-box and they told me they haven’t had any issues or concerns from doing that).

I finally stopped dragging my feet about this and posted about it at What information is useful to know statically about an interpreter? Perhaps we can standardize this metadata along with, or independently of, this PEP? That way PyBIs can focus more on the distribution bit and less on this small part.

I’m assuming they’ll be vendored – same as we do for wheels, and for all the existing Python.org binary distributions. Or it would be possible for someone to make alternate builds that leave out modules they don’t care about, to use internally or publish under an alternative name like pypi.org/p/sir-robins-minimal-cpython or whatever.

The only limitation I know of currently is that setuptools sometimes generates the wrong include path when invoking the C compiler (#3786), but that’s a minor issue and can be fixed on setuptools side, or might even be fixed already. It’s possible that we’ll run into some more, but I expect they’ll all be clear bugs that are easy to fix – these are much more conventional installations than python-build-standalone; really the only thing special is that we can’t compile-in any absolute paths.

The embedded distribution for Windows is actually a weird beast – it’s trimmed down and if you tried to use it as a general purpose python environment you’d run into weird errors. But my Windows pybis are very similar to the nuget distribution for Windows that Steve maintains – in fact they are literally the same files, just wrapped with different metadata :slight_smile: That’s the main thing pybis bring to the table: metadata designed to fit in with our other packaging standards. (I’m even using auditwheel to vendor binaries into the Linux pybis.)

I like the idea, but since it makes just as much sense for regular packages, I think it’s orthogonal to this proposal – you should make a PEP to add these to the core metadata standard, so that both pybis and wheels can benefit :slight_smile:

I don’t think we can enforce this. If the Python core team starts making official pybi releases at pypi.org/p/cpython then obviously those will follow best practices, but if someone else wants to upload pypi.org/p/sir-bevederes-mangled-cpython then I think that’s fine? Good, even – there are interesting cpython forks like cinder and it would be fantastic if you could test them out by just adding cinder >= 3.11 to your environment specification. And obviously we won’t let randos upload to pypi.org/p/cpython.

I’d rather not mandate anything about the install process, because these are designed to be flexible low-level building blocks that might get used in all kinds of ways. Consider eg a project like Blender that supports Python plugins – they might want to grab a pybi as part of their build process and vendor it into their final binary, after transforming it in various ways (eg deleting stuff they don’t use, rearranging stuff to fit their layout, whatever). I don’t even know what these mandates would mean in that situation.

(Also a shared cache requires a shared cache format, and I don’t feel up to designing one right now.)

I know, I had to port Python’s email.parser to Rust :-). And I’m sure you noticed me jamming JSON into METADATA rather than deal with trying to extend the format.

But the PYBI file is almost exactly the same as the WHEEL file that wheels include, so I think it’s simpler to keep similar things similar? Eg posy uses mostly the same code to parse and unpack pybis and wheels, which is doable because the formats are so close.

Oh, totally down to add more metadata if it’s useful. The current set was refined based on what I was trying to do with it, so there might be blind spots. I’ll reply over in your thread.

(It’s probably the pybi-info/METADATA file that you want?)

The way posy is handling this currently is to just unpack the whole zip file and drop it into an internal location (inside a content-addressed store, with hashes as directory names). So the pybi-info/ metadata is all there, but OTOH you probably can’t find the environment without some kind of help, so I’m not sure how useful that is…

But I can see how, if you find a python some other way (eg searching $PATH?), then it’d be handy if you had a chance to find a corresponding pybi-info/ (I guess in this case by walking up the path from the bin dir?). We could put some hints about this in the non-normative guidance section if you’re worried?

I wonder if allowing dependencies can be useful for this—we can have a minimal pybi, a few pybis containing the optional parts, and a “full” pybi that depends on those, with extras defined for users to choose what they want. But I’m probably getting too far ahead of things.

I think it makes sense to mandate that the metadata directory is always unpacked with the rest of the files, to guarantee that any manager down the line could display the description or long description, or uninstall the Python using the RECORD file.


I also think the metadata file should only have one text format, requiring one text parser. I prefer JSON as well, as it’s less ambiguous, but if not then I think the following is a solution (similar to the HTTP Authorization header):

Pybi-Path: stdlib lib/python3.10
Pybi-Path: platstdlib lib/python3.10
...

You mention 3 fields from the core metadata spec, but what about Provides-Extra, Provides-Dist and Obsoletes-Dist?


I could see this being used to set up a new Python in virtual environments. Is there anything fundamentally blocking this?

For what it’s worth, I’m excited by this prospect - one of my own long-term projects is re-imagining the standard library (in a world where its sole mandate is to provide a full suite of “batteries”, and it allows itself to rely on third parties, be incrementally updated etc.), so this sounds like a great way to publish that vision.

On the other hand - what if someone uploads something in PyBI format that doesn’t (even try to) implement Python? What if it’s malware? Do we care? I don’t think PyPI cares about the analogous concern (and neither does the core NPM repository, etc. for other languages). But if we don’t care, then how do I (and sirs Robin and Bevedere) establish trust?

Not quite. What we talked about is how to link the third-party libraries into the extension module in such a way that it doesn’t interfere with other copies of the third-party library – and more importantly, not interfere with other extension modules depending on (a different version/build of) the same third-party library. (You do that by statically linking the third-party library in the extension module, and hiding all the symbols except for the PyInit_ function.)

Bundling the third-party .so’s but letting them be overridden by system-installed versions is possible, but not as straight-forward. (You would need to try and resolve the third-party dependency via the regular linker, and explicitly dlopen() the bundled version if it isn’t found.) The bundled copy isn’t isolated that way, so you still run the risk of conflict with other extension modules using the same kind of trick.

Why not just use Nix? Nix already allows Python environments that can import arbitrary packages from PyPI without rebuilding the entire interpreter environment, and achieves the goal of interoperability with multiple systems (without needing to bother with “many” Linux, it should just work).

Moreover, the goal of avoiding absolute paths is laudable but ultimately untenable due to the dynamic linker - unless you are aiming for static linking (this is why patchelf exists).
