User experience with porting off setup.py

Are we okay with the only tutorial on PyPA being a packaging tutorial that requires users to learn what a build backend is and then go figure out how to choose one? Is this what we want for first-time package builders[1]? Why have only one tutorial[2]?

The hybrid approach of including too much for beginners and too little for experienced users makes this a bad tutorial for everyone. Just as an exercise: how well does the PyPA packaging tutorial follow Diátaxis’ definition of a tutorial?


  1. Given that you can build a pure-Python package without specifying a backend at all (see the sketch after these footnotes). ↩︎

  2. I’d like to see a frontend-centric tutorial that guides a beginner without mentioning backends, perhaps with a note at the end pointing them to an advanced tutorial or a dedicated backend guide. ↩︎
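
A quick illustration of footnote 1 - a minimal sketch, assuming a reasonably recent setuptools, which is what frontends fall back to when no [build-system] table is present:

```toml
# Minimal hypothetical pyproject.toml with no [build-system] table at all.
# Per PEP 517/518, frontends then default to setuptools, so a pure-Python
# project builds without the author ever naming a backend.
[project]
name = "example-package"   # placeholder name
version = "0.1.0"
```

With just this file, python -m build or pip install . produces a working package.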

6 Likes

IMO, one place with 5-7 separate topic guides / how-tos rather than “tutorials”. FWIW, these are the ones I’d personally love to see:

Topic Guides

Packaging a pure-Python project with basic defaults

Walk through the simplest possible case and give some of the more important definitions. This needs to mention that backends etc. exist, and link to the other guides. It is important to convey to beginners that if they hit a wall with the simplest things, there is more to look into, and to give some idea where.

Using and choosing build backends

Covers the key definitions: what is a backend, and why does it matter? Discussion and comparison of common/popular backends and their configuration, at least one full example, and lots of links to backend projects’ docs.

Handling advanced cases with compilation steps

A couple of full examples, plus discussion of and links to tools like scikit-build or the advanced build capabilities of other backends. There’s lots to potentially cover here, or at least mention and link to: Cython, wrapping C, C++, or Fortran, building/bundling TypeScript. Maybe suggest DPO as a place for deep technical questions.

Modernizing legacy setup.py usage

Address all the things that came up for the OP here, with links to the other guides as appropriate. Quick tips/FAQs for common things, e.g. “what do I do with these command line arguments?”

Publishing packages

Information about PyPI and twine.
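
The core of that publishing workflow is short enough to sketch here (assuming build and twine are installed; twine prompts for PyPI credentials or an API token):

```sh
python -m build                # produce sdist and wheel under dist/
python -m twine check dist/*   # verify the metadata renders correctly
python -m twine upload dist/*  # upload to PyPI
```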


For all of the above, having corresponding GH repos that users can clone and immediately toy with would be really nice as well, where appropriate.

17 Likes

3 posts were split to a new topic: Is it possible to go back to setuptools and setup.py as a frontend?

Captured in this issue: Add topic guides based on Discourse Discussion · Issue #1334 · pypa/packaging.python.org · GitHub

5 Likes

I wanted to chime in that I’ve been very encouraged with the activity on this thread!

I had typed up a lot of content that I ultimately stripped from my blog post on what I thought should be done and folks here seem to be gravitating towards a lot of what I was going to say.

In case you missed it, there was some additional discussion on this blog post on Twitter/X, Lobste.rs, and HN. The largest themes I got were:

  • Me too. I’m clearly not alone in this boat. Some people even said they gave up porting off python setup.py because they couldn’t figure out how to do it.
  • There was a lot of sentiment that authoritative guidance on what to do in Python packaging land is severely lacking. That good, trustworthy, modern documentation is hard to find.
  • Lots of people incorrectly believe that all of setuptools is deprecated and they need to delete setup.py.
  • People seemed to be largely dissatisfied with the extra complexity from the introduction of pyproject.toml. However, I think the reasons are highly varied. (Some people just don’t like any change. Others are complaining about the lack of porting docs. Etc.)

In addition to the topics discussed so far, I want to raise a few more from my post.

Lack of a Build Frontend in the Default Distribution

I don’t fully understand why a build frontend isn’t present in Python distributions by default.

I think that shipping a build frontend in Python distributions could make things vastly simpler for end-users by eliminating a lot of cognitive overhead with having to think about which build frontend to use and how to install it.

Many languages do things this way (Ruby’s gem, Rust’s cargo, Go’s go). End-users seem to love the unified toolchain approach. And the presence of a default tool doesn’t undermine innovation in the larger ecosystem.

Before Python 3.12 (or earlier 3.x releases if we want to be pedantic about setuptools availability), we had the ability to produce sdists and wheels using just the standard library’s distutils + [ensure]pip. But 3.12 fully removed this capability. If we want to be customer-focused and ease the transition for existing package maintainers, shipping a [simple-to-use] build frontend in the distribution seems like an effective way to do that.
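
For concreteness, the out-of-the-box difference looks roughly like this (a sketch; the second half assumes network access to PyPI):

```sh
# Pre-3.12: an sdist could be produced with nothing beyond the stdlib
python setup.py sdist

# Today: you must first obtain a third-party build frontend
python -m pip install build
python -m build               # produces sdist and wheel under dist/
```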

Securely Installing Packages in pyproject.toml

Are my blog’s assertions about pyproject.toml build system package installation being intrinsically insecure accurate?

This question can be answered by stating how to deterministically bootstrap a Python build system frontend and backend, plus all transitive dependencies, in a way that is robust against new package versions being published and resistant to content tampering.

Is there any documentation on how to do this bootstrapping securely? Are there any discussions on it folks can link me to?

FWIW I have a half-baked idea for package registries to store content-digest-indexed manifests - think requirements.txt files or poetry’s equivalent - and then for package installer frontends like pip to be able to do something like pip install flask@sha256:deadbeef42... to download a content-addressed manifest stating all transitive dependencies to install. This way, deterministic install descriptors can be generated and used for reproducible, tamper-resistant installs. All an end-user has to do is refer to a short, immutable content digest instead of having to maintain the manifest themselves. This is conceptually similar to how OCI (read: Docker) image registries work - image manifest & image index.
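
The closest existing mechanism I’m aware of is pip’s hash-checking mode, which gives tamper resistance, though not the short-digest UX sketched above (the versions below are illustrative):

```sh
# requirements-build.txt must pin every package, including transitive
# dependencies, each with its expected digest, e.g.:
#   build==1.0.3 --hash=sha256:<digest>
#   packaging==23.2 --hash=sha256:<digest>
# pip then refuses anything unpinned or whose download does not match:
python -m pip install --require-hashes -r requirements-build.txt
```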

9 Likes

2 posts were split to a new topic: Proposed alternative governance structure for Python packaging

build has a dependency on packaging and pyproject_hooks. If they are installed normally, that is problematic: other tools also depend on them, and you could no longer get a consistent environment whenever the packaging or pyproject_hooks version wanted by some tool is incompatible with the one wanted by build. Effectively, you would be adding dependency constraints to all Python environments. Vendoring is not good either, because build also has an importable API: you would easily get two versions of the same package (packaging and build._vendored.packaging) imported into the same Python process.
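
A small sketch of why two copies bite in practice, using pip’s real vendored copy of packaging for illustration (importing pip internals is unsupported; this is purely to demonstrate the effect):

```python
import packaging.version as site_version          # the site-packages copy
import pip._vendor.packaging.version as vendored  # pip's vendored copy

v = site_version.Version("1.0")
# The two classes are textually identical but are distinct objects to
# Python, so cross-copy isinstance checks (and except clauses) quietly fail:
print(isinstance(v, vendored.Version))  # False
```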

I am sympathetic to the desire of a unified toolchain, but I don’t think the stdlib is the best way to go about it.

Also note that it has long been the case that you need an extra tool, twine, to upload packages to PyPI. This is nothing new under the sun.

This is a legitimate concern.

So? What’s wrong with that? I know of a handful of projects that vendor their dependencies and then import them into package-local namespaces so they don’t conflict with modules from site-packages. Pip is in that list. I admit it is a crude and somewhat less efficient solution. But it works.

For the rare project that wants to import build as an API and also needs to use packaging and/or pyproject_hooks, it seems to me that build could gain an API that forces it to import the global version of dependencies to avoid the multiple version problem. Again, a pattern I’ve seen in the wild.

I also believe this is sub-optimal and somewhat user-hostile. But the set of people who want to upload packages is smaller than the set who want to build them, which in turn is smaller than the set who want to install packages. So it isn’t the highest priority to address.

I will note that at the point where there exists a unified packaging tool that does everything under the sun, this problem becomes moot: you must ship a tool to install packages out-of-the-box, and if that tool also does building and uploading, you get those features for free without having to debate whether to include another tool.

6 Likes

Pip does not have an importable API, though.

You also have to consider that packaging standards are quite in flux, and if everyone gets build preinstalled (or if it even becomes part of the stdlib), then rolling out changes will be much harder. Pip has “please upgrade” automatic notices to help with that, but as an installation tool, it is privileged.

1 Like

I pinned versions of build dependencies in my pyproject.toml and the OpenIndiana package maintainer filed an issue that it breaks downstream packaging if my pinned version is different from whatever OpenIndiana is using.
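
Concretely, the pinning in question looked something like this (illustrative versions, not my exact file):

```toml
[build-system]
# Exact pins give me reproducible builds, but force downstream distros to
# have precisely these versions available when rebuilding from source.
requires = ["setuptools==68.2.2", "wheel==0.41.2"]
build-backend = "setuptools.build_meta"
```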

This is exactly the kind of [undocumented] negative downstream packaging effect I was worried about when adopting pyproject.toml :confused:

So I guess as a package maintainer I have to choose between determinism and the convenience of downstream packagers.

If I ignore pyproject.toml and just invoke python setup.py build from a [deterministic] virtualenv [using a requirements.txt with all versions pinned], I can have both.

Maybe the problem here is downstream distro packagers aren’t yet using modern packaging workflows. But something tells me they won’t like the new workflows for a myriad of reasons (including the non-determinism and the fact that the build frontend really likes to download things from the Internet).

3 Likes

You might want to read this discussion: PEP 665: Specifying Installation Requirements for Python Projects. PEP 665, proposing a lock file format (with hashes for security), was rejected due to lack of sdist support.

Yes, this is a key problem with pyproject.toml. It is actually not generic: the constraints you write in it are specific to how you build wheels for PyPI - they do not apply to other distros. This is not really documented anywhere, and it is widely misunderstood by distro packagers.

This is not a pyproject.toml (metadata) issue, but is because of build isolation. You can still have both here, just set your virtualenv up the way you did before and turn off build isolation with, e.g. python -m build -wnx.
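
Spelled out (-w builds a wheel, -n disables build isolation, -x skips the build-dependency check), that workflow is roughly:

```sh
python -m venv buildenv
buildenv/bin/pip install -r requirements-build-pinned.txt  # your pinned build deps (hypothetical file name)
buildenv/bin/python -m build -wnx                          # build in this env, no isolation
```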

4 Likes

This is a significant problem because there is no other way to communicate this information to distros. I would have thought that the dependency spec in pyproject.toml should represent compatible ranges such that if everything is built from source then the build should be expected to succeed.

Separately, of course, there is a need to pin specific versions of dependencies for generating the specific wheels that go to PyPI. Ideally that should be specified in some other way, though. Maybe that means you need a separate pyproject.toml file for e.g. cibuildwheel, or is it possible to configure these in a [tool.cibuildwheel] table somehow?

If I understand correctly, numpy uses version pinning because of ABI compatibility between PyPI wheels, but there should really be a different way to say “this wheel requires this exact other binary wheel”.

Not well. Pip’s vendoring causes problems for Linux distributors who don’t like vendoring (for legitimate reasons), and debundling causes problems for pip.

It’s the best we can do, not a good approach to recommend.

If a lack of bootstrapping is a problem, would it be feasible to move build (and its dependencies) into the stdlib on the same terms as importlib.metadata and importlib.resources? I don’t know how painful these are from a maintainer’s perspective, but from a downstream developer’s perspective it’s very nice to be able to say “I need XYZ feature in importlib.metadata, so I will add the dependency importlib_metadata >= X; python_version < Y, and will drop it when python_requires >= Y”.
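
The backport pattern referred to here, for anyone unfamiliar (a sketch; the version numbers are placeholders):

```python
import sys

# Declared in metadata as: importlib_metadata >= X; python_version < "3.10"
if sys.version_info >= (3, 10):  # placeholder for the Python version with the feature
    from importlib import metadata
else:
    import importlib_metadata as metadata  # PyPI backport with the same API

print(metadata.version("pip"))
```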

I get that it would be a sort of undoing of the distutils purge, but build and packaging seem at least as important as importlib.*, and that approach seems to work to balance the flexibility/stability tradeoff of stdlib vs PyPI.

2 Likes

Not at all. Everything that this tool would have to do is specified by PEPs, which gives us stability that distutils never had (specifically, distutils had to invoke platform-specific compilers, which are not covered by our own PEPs).

The challenge is persuading the maintainers of these libraries to give up their current flexibility. We’re somewhat failing that with importlib.resources at the moment, leading to more drastic API changes over time than some of us are comfortable with. But I think with a clearly specified API and willing maintainers, there’s nothing actually stopping this from happening.


Also, we’re onto a very new topic here. Whoever continues this path, please start a new thread.

4 Likes

I would not recommend pinning build dependencies in pyproject.toml, for the same reasons that project.dependencies or setup.py install_requires should not be pinned or bounded more tightly than strictly necessary.

The difference when using pyproject.toml is that it enables build isolation by default. But you can disable that with pip’s --no-build-isolation or build’s --no-isolation option, so as to use the same deterministic build environment you had when invoking setup.py directly.
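
For example (a sketch; assumes your pinned build dependencies live in a hypothetical requirements-build-pinned.txt):

```sh
pip install -r requirements-build-pinned.txt  # populate the environment yourself
pip install --no-build-isolation .            # build/install using that environment
```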

So you can have pyproject.toml and determinism.

1 Like

With cibuildwheel the environment is always isolated, but could you use something like CIBW_BEFORE_BUILD='pip install -r requirements-build-pinned.txt'?
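
If I’m reading cibuildwheel’s docs correctly, the same thing can also live in pyproject.toml instead of an environment variable:

```toml
[tool.cibuildwheel]
before-build = "pip install -r requirements-build-pinned.txt"
```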

How would you pass this over to someone who does pip install foo and ends up installing from the sdist?

Is it not expected that anyone with the sdist should be able to reproduce the same versions of build requirements that were used to generate the PyPI wheels?

1 Like

They can, but it’s not expected to be the default.

The default is that they get a successful build of their own. To get a build with matching dependencies they’re going to have to find another reference (such as a doc page) that explains how to do it.

The difference is that for the default, they no longer need to find that page. Pre-pyproject.toml, it was impossible to build any package without looking up its specific docs.

1 Like

This is a valid use case, but my point was about the difference between building with a standard build frontend and direct setup.py invocation. I wanted to highlight that both stand on the same footing with respect to build reproducibility.