Python packaging documentation feedback and discussion

pf_moore · March 14, 2023, 11:03am

That would be an extremely good PEP, following on from PEP 621. No-one has proposed it yet, though. I don’t have a feel for how controversial it would be - I’ve not investigated enough to know for sure if whatever differences currently exist between backends are shallow, or if they are fundamental disagreements.

oscarbenjamin · March 14, 2023, 12:09pm

It isn’t easy to glean this from the docs for the different projects (another reason why it’s problematic that the guide says to go read their docs).

The setuptools way of specifying files is too complicated and likely should only be preserved for backwards compatibility. Otherwise hatch probably has the clearest documentation but altogether the rules seem most complex. The flit, poetry and pdm docs for this are quite short which leaves me wondering if that’s because the rules are very simple or just incompletely documented. It seems that there are differences around defaults and whether the tools assume src-layout. They all seem to exclude files based on .gitignore except pdm which doesn’t mention this. I can’t tell whether any of them is completely documented. Probably someone needs to be an expert to know what the differences are.

The complexity in understanding how this works in each case mainly comes from these tools trying to be too convenient by default. If the various defaults and implicit rules were things that had to be spelled explicitly like ignore_files = [".gitignore"] then it would be easier both to document and understand how this works.
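To make the "spelled explicitly" idea concrete, here is a minimal sketch of a selection step that ignores nothing unless an ignore file is named in the configuration. The `ignore_files` key is the hypothetical one from the paragraph above, not an option of any existing backend, and the matching is a deliberate simplification of real .gitignore semantics (no negation, no anchoring).

```python
from fnmatch import fnmatch

# Hypothetical explicit configuration: nothing is ignored unless listed here.
config = {"ignore_files": [".gitignore"]}

# Stand-in for files on disk, so the sketch is self-contained.
ignore_file_texts = {".gitignore": "# build output\n*.pyc\n__pycache__\ndist\n"}


def load_patterns(texts):
    """Collect non-empty, non-comment lines from each ignore file's text."""
    patterns = []
    for text in texts:
        for line in text.splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                patterns.append(line)
    return patterns


def keep(path, patterns):
    """Keep a path unless the whole path or one of its components matches."""
    parts = path.split("/")
    return not any(
        fnmatch(path, pattern) or any(fnmatch(part, pattern) for part in parts)
        for pattern in patterns
    )


patterns = load_patterns(ignore_file_texts[name] for name in config["ignore_files"])
files = [
    "src/pkg/__init__.py",
    "src/pkg/__pycache__/x.pyc",
    "dist/pkg-1.0.tar.gz",
    "README.md",
]
selected = [path for path in files if keep(path, patterns)]
print(selected)
```

With an empty `ignore_files` list the same code keeps every file, which is the point: the behaviour is fully determined by what is written down.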

Relevant docs for hatch:

For poetry:

For PDM:
https://pdm.fming.dev/latest/pyproject/build/#include-and-exclude-files

For flit:
https://flit.pypa.io/en/stable/pyproject_toml.html#sdist-section

For setuptools:
https://setuptools.pypa.io/en/latest/userguide/pyproject_config.html#setuptools-specific-configuration

CAM-Gerlach · March 14, 2023, 2:37pm

FYI, just to make sure everyone is aware, @ambv helpfully split this off from Structure of the Python Strategy Discussions into a separate thread for us. Thanks @ambv !

lwasser · March 14, 2023, 3:25pm

Could someone please link to that thread, by chance? There are so many here. Many thanks @ambv !!

ambv · March 14, 2023, 3:27pm

The link to the original topic is already below your first post here. For ease of finding it:

pf_moore · March 13, 2023, 5:57pm

I agree 100%. I’m about as far from a “packaging newcomer” as it’s possible to get, and yet I still regularly look for advice on how to set up a new project. Where do I go? Who is doing feature comparisons between packaging tools (the tools themselves aren’t!)? Where can I get views on more advanced things like how to set up automated releases of my project? Not necessarily simple “follow these steps and you’re done”, but “this tool works better if you prefer/need X, and this tool if you prefer/need Y”.

I don’t know about raw beginners (most raw beginners I’ve dealt with consider packaging their code for publication to be a distant dream, not a consideration for right now) but it’s the intermediate users, with bolted-together but out of date and difficult to maintain workflows, who want “something better” but who may not even know what “better” could mean, that I think we are letting down. People who’ve used setuptools forever, but have heard that there are newer, better, tools and want to try them. But who can’t find any way (short of trying out all the alternatives, or spending days reading docs) of working out what’s best for them.

That’s a great summary. It should be in the docs somewhere!

I’m tempted to mention the Diátaxis framework here, but I suspect others know more about it than I do. But in the terms of that framework, you seem to be talking about tutorials, whereas I’m more concerned that we don’t have enough in the way of how-to guides and explanations (reference is where the tool-specific docs come in). “How to choose a build backend”, “how to set up automated documentation for your project”, “how to automatically deploy your project”, “an explanation of task runners, and how they should be used”. That sort of thing.

I’ve just skimmed the Scikit-HEP docs, and there’s a bunch of stuff like that in there. Could that be included in the PUG? If not, then it would be informative to understand why not.

lwasser · March 14, 2023, 3:41pm

thank you so much!

steve.dower · March 14, 2023, 4:32pm

FWIW, I (pymsbuild) would probably just ignore any standard here, unless it happened to align perfectly with how mine currently works. I’m not even interested in contributing to developing a standard, frankly. Everyone should get to design their UX here, or else there should actually be only a single backend with all the functionality and all the rest should cease to exist.

(I know that sounds dramatic, but specifying files to include and their particular metadata/behaviour is the core UX of defining a package. I’m not going to work on a developer tool where I have no control over the UX.)

jeanas · March 14, 2023, 5:02pm

Is the distinction between different build backends about the UX or about the functionality?

I don’t know about MSbuild, but backends like meson-python, scikit-build or SIP have fundamental differences in what they allow you to build. The UX for basic stuff from the pure-Python side is mostly orthogonal to that, AFAICS. I for one would appreciate standardization on that point.

lwasser · March 14, 2023, 5:20pm

my two cents:
for most users they just need to know whether the back end can handle custom build steps / compilation steps (or not). But also the front end that they chose needs to be able to talk to that back end that handles custom steps.

Pure python packages they can use any back end. Those users don’t need to know too much about back ends if they don’t want to know about back ends

case in point - PDM can work with meson-python as a back end (there was a bug, but it was fixed and it works now). That allows you to build a package that requires more complex steps (many packages in the scientific core space are moving to meson-python). There are some distinctions between backends in terms of what each includes by default in the sdist (e.g. setuptools needs a MANIFEST.in file or else it includes EVERYTHING in your repo. I learned this the hard way). But you can normally customize what is included in the dist in modern packaging tool builds (flit, pdm, etc.) with a pyproject.toml entry.
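Since the defaults differ between backends, one way to avoid surprises is to look inside the sdist after building it (for example with `python -m build --sdist`) rather than relying on what you expect the backend to include. A minimal sketch using only the standard library; the function name is just for illustration.

```python
import tarfile


def list_sdist_files(sdist_path):
    """Return the sorted names of regular files inside a .tar.gz sdist."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        return sorted(
            member.name for member in archive.getmembers() if member.isfile()
        )
```

Calling `list_sdist_files("dist/example_pkg-0.1.0.tar.gz")` (a hypothetical path) and printing the result shows exactly which files the chosen backend shipped, whichever mechanism it used to find them.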

steve.dower · March 14, 2023, 5:35pm

I choose based on both. Sometimes I use flit, sometimes I use setuptools, sometimes I use pymsbuild. For certain projects, they all provide sufficient functionality, but the UX differences are worth taking into account.

For some projects, the functionality really justifies choosing one over another - for example, if a native dependency is built with MSBuild or dotnet build, then I can use pymsbuild to reference that project and know that it’ll build the correct configuration with correct dependency tracking, because it links it all together using the same build tool (MSBuild).

For basic, pure-Python apps that just zip up everything in the repo, yeah it’s possible to standardise things. I think I’ve written maybe 2 of those in the last five years.

But if we standardise that, we’re setting a really low bar for “you’ll need something non-standard if you do X”. Most of my teams at work that I support would need something non-standard - and most would need different non-standards.

At that point, you don’t have a standard anymore: you have a distraction. So if we were to recommend something for this case, I’d want it to be really clearly labelled as a suggestion for the most simple cases, and not as a “standard” or recommendation for users generally. And I’m not sure that’s useful, which is why I’d rather have a range of suggestions with rationale and target audience for each one.

rgommers · March 14, 2023, 6:55pm

Same here - fine to align where possible between backends with similar designs, but it’s not spec/standardization-worthy. And this particular idea won’t work at all for meson-python. The design uses .gitignore and .gitattributes, but also understands install targets (install: true in meson.build files). The design in hatchling would be duplicate and quite cumbersome; listing files in pyproject.toml doesn’t work for us.

pf_moore · March 14, 2023, 7:08pm

Cool - sounds like that’s my answer, then, file discovery is, and should remain, backend-specific, controlled by the backend’s own config in [tool.backend]. I certainly have no desire to argue otherwise (even though I do personally wish that backends were more consistent in the “simple” case of a pure-Python project).

henryiii · March 14, 2023, 7:14pm

With the exception of the “simple Python packaging” page, which was pretty much upstreamed to the current tutorial, and does contain selections for backends, most of it pretty much assumes a specific tool: nox for task runners, pytest for testing, mypy for type checking, Ruff for linting, black for formatting, build for building, GHA for CI, cibuildwheel for binaries, GitHub for git hosting, pre-commit for running linters. I have been looking for somewhere to move the pages (along with the cookie cutter & WebAssembly repo review tool) that’s not HEP specific and more obviously general, and briefly suggested the PyPA, but was told it would likely not be a good idea in the PyPA, as if it became too official it would become hard to keep it focused. People don’t like it when their favorite package is not in the “official recommendations”, which is what at least packaging.python.org is seen as.

I do think we should have tutorial pages, guide pages, etc (Diátaxis). Unfortunately, people seem to obsess over the tutorial page, and some of the other existing guide pages have gotten pretty badly out of date. I’d really love to see everything completely describe PEP 621, rather than the current mix of updated pages and old pages. I think that’s a much better use of time than repeating the six months of arguments (without looking them up!) about the current tabbed box. And, I’d forgotten, the current tutorial says

You can choose from a number of backends; this tutorial uses Hatchling by default, but it will work identically with setuptools, Flit, PDM, and others that support the [project] table for metadata.

I think that’s perfect - it’s making a selection (Hatchling), it’s mentioning the ones it will visually show in a second via tabs, and it says others should work too. I’d probably only switch “will” for “should”, since some more obscure PEP 621 backends have not yet taken the time to make sure they work for a simple example like this out of the box - some of them don’t support src layout without configuration, for example. This recommendation might be why we aren’t seeing confusion in the issue tracker. And we do see people with other issues, so it’s not like people abandoned this as a useless resource when this was updated to PEP 621 & tabs!
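For anyone who has not seen the tabs: for a simple project the only part of pyproject.toml that changes between these backends is the [build-system] table. A rough sketch that renders that table for each of the four backends named in the tutorial sentence. The requirement strings and version bounds are my assumption based on each project's documentation and may have changed, so check the backend's own docs before copying them.

```python
# Backend name -> (requires, build-backend).
# Values are assumptions taken from each project's docs; bounds are illustrative.
BACKENDS = {
    "hatchling": (["hatchling"], "hatchling.build"),
    "setuptools": (["setuptools>=61.0"], "setuptools.build_meta"),
    "flit": (["flit_core>=3.4"], "flit_core.buildapi"),
    "pdm": (["pdm-backend"], "pdm.backend"),
}


def build_system_table(backend):
    """Render the [build-system] table for one backend as TOML text."""
    requires, entry_point = BACKENDS[backend]
    quoted = ", ".join(f'"{requirement}"' for requirement in requires)
    return (
        "[build-system]\n"
        f"requires = [{quoted}]\n"
        f'build-backend = "{entry_point}"\n'
    )


for name in BACKENDS:
    print(f"# {name}")
    print(build_system_table(name))
```

The [project] table stays the same in every case, which is what makes the "pick one, click a tab to see the others" presentation possible.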

I’d love to see a separate “guide” page covering things like this, with friendly comparisons of tools. But I think the reason the current tutorial page works so well is it doesn’t distract too much with the choices - it tells you the tutorial uses hatchling, briefly mentions there are others, and forces you to interact (by clicking) if you want to visually see exactly what it would look like to use a different backend.

In my mind, I’d think this would be a [source] section. If present, a tool would have to at least include the files it specifies; without it a tool could use whatever mechanism it wanted. ignorefile = ".gitignore", include=[...], and exclude=[...], and base="src", for example, would probably cover a lot of simple to moderately complex packages. I expect there would be some significant challenges, though - some tools likely wouldn’t want to have to be able to parse a .gitignore file (and other ignore styles might be included?), and some might be convinced it’s better to talk to git than use an ignore file (it’s not!).
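As a rough illustration of how a tool might consume such a table, here is a sketch that applies `include`, `exclude` and `base` to a list of paths. The [source] table does not exist in any standard; the key names are the ones floated above, and the resolution rules (exclude wins over include, `base` is a single directory) are my assumptions. Handling of `ignorefile` is left out to keep it short, and `fnmatch` is looser than .gitignore matching because its `*` also crosses `/`.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical [source] table, written as the dict a TOML parser would produce.
source = {
    "base": "src",
    "include": ["src/*.py", "README.md"],
    "exclude": ["*/tests/*"],
}


def select(files, source):
    """Keep paths that match an include pattern and no exclude pattern."""
    chosen = []
    for path in files:
        included = any(fnmatch(path, pat) for pat in source.get("include", []))
        excluded = any(fnmatch(path, pat) for pat in source.get("exclude", []))
        if included and not excluded:
            chosen.append(path)
    return chosen


def package_roots(chosen, base):
    """Names of the top-level package directories found directly under base."""
    roots = set()
    for path in chosen:
        parts = PurePosixPath(path).parts
        if len(parts) > 2 and parts[0] == base:
            roots.add(parts[1])
    return sorted(roots)


files = [
    "src/pkg/__init__.py",
    "src/pkg/core.py",
    "src/pkg/tests/test_core.py",
    "README.md",
    "notes.txt",
]
chosen = select(files, source)
print(chosen)
print(package_roots(chosen, source["base"]))
```

Even this toy version has to pick answers to questions the proposal leaves open (pattern dialect, precedence between include and exclude), which hints at where the challenges mentioned above would show up.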

FYI, meson-python relies on meson to do the file placement; it would probably opt out of supporting this, but it could probably support it via a hybrid system. Scikit-build-core already does a hybrid - you can mix CMake installed files (usually/recommended to be the binary parts) and Python packages. Setuptools uses MANIFEST.in still, hatchling uses .gitignore, flit-core uses manual specification (unless it’s being triggered from Flit, in which case it uses git commands, I’m not a fan of the inconsistency), and IIRC pdm uses git commands. Because of this tutorial, the main tools support automatic src discovery.

fungi · March 14, 2023, 7:31pm

PBR’s pbr.build as a pyproject build-system.build-backend asks Git what files to include on top of specific generated/untracked files configured for the project, which is almost the same as “using .gitignore” I guess but not quite. Also it operates as a SetupTools plugin, so it’s really telling SetupTools what goes into the sdist more than anything. But basically the idea is that it builds things you don’t want checked into revision control yet want included in your release tarballs, particularly things built from Git metadata like tags, notes, authors, change history, and similar stuff that is “in Git” but not necessarily in files contained in the worktree for any branch.

BrenBarn · March 14, 2023, 8:31pm

I’m going to pull in some quotes from another thread because I think they’re relevant here:

That comment was made in reference to whether PyPA should make “official” recommendations, but I think there are overlapping considerations between that and this thread. For me personally (and I think for others looking for more opinionated references), the point of an “official” recommendation is it would be so official it would go into the docs.

I agree with this, but to me the question is, if those users can use any back end, then in documentation that is tutorial/howto style, why is it necessary to tell them anything but “use this one, click here for information about others if you’re interested”? For instance in the PyPA guide the very first sentence is:

This tutorial walks you through how to package a simple Python project.

If that is the goal of that tutorial, that tutorial only needs to tell people one recommended way to do that task. We don’t need to tell people how to do other tasks in that particular tutorial. (I think we do need docs explaining how to choose a build backend for more complex tasks, but we don’t need to burden the entry-level tutorial with that.)

But let’s remember you’re an expert! It’s true we need documentation that can help people become experts, but I think what’s more urgently needed is documentation for non-experts.

There’s one thing that I strongly believe, but that I think is getting a bit lost in the discussion of various build backends: documentation for Python can and should privilege the use case of creating a Python package. For me, that also includes pure-Python packages that have dependencies on non-pure-Python packages. As long as the dependencies can be satisfied by auto-getting pre-built binaries^[1], that to me falls into the category of “simple thing that we should have an end-to-end walkthrough for”.

Although many people in this discussion are intimately familiar with the complexity required to build non-Python dependencies, I think that does not represent the broader userspace. There are orders of magnitude more packages that depend on, say, Numpy or Qt while being pure Python themselves, than there are packages like Numpy or Qt that directly build non-Python parts as part of their build process. I think we should recognize this and have a tutorial that clearly states its limitations (i.e., it does not handle direct non-Python deps), and that tutorial can give explicit examples using a single backend.

From the other thread again:

I may be misunderstanding this, but let me ask again: if there are multiple backends that all work the same... why do we need them all?

[1] whether those come from PyPI or anaconda or conda-forge

steve.dower · March 14, 2023, 8:47pm

Well, we don’t need multiple that work the same. But I don’t think any actually work identically, certainly different enough that the authors felt motivated to create one.

I don’t actually think any are close enough to start asking them to stop. They all have a reason to exist.

abravalheri · March 14, 2023, 9:20pm

Sorry, I could not help but comment about these simplifications, which are very rough and do not provide a good overview of how setuptools works (other than helping to spread the misconception that it needs a MANIFEST.in).

I think we should refrain from incentivising some of these myths. Instead the message I would like to get across is the following:

- Setuptools can work without any additional configuration. By default it will include a pre-defined set of files. In a basic project, that should be enough to get you started.
- Most of the time, maintainers will suggest deriving the information from the VCS (e.g. via setuptools-scm or pbr).
  - In this aspect setuptools is not far from hatch/hatchling, flit and others. The difference is that instead of being monolithic, setuptools decided to offload the “file finding” to plugins, extracting code that previously lived inside the project into other projects and incentivising/contributing to new ones.
- Some people don’t like to use VCS and have complex use-cases for which the defaults are not enough. Those can use MANIFEST.in.
  - I expect this to be the smallest portion of the user base.

More information can be found in Controlling files in the distribution - setuptools 69.0.3.post20231214 documentation.

If we are just talking about things in the realm of the possibilities, setuptools can also be very simple to configure. If you don’t have dependencies, touch pyproject.toml will get you started, and then you can incrementally add on top of that.

lwasser · March 14, 2023, 10:40pm

hi friends - i just wanted to chime in to say that if the work that we’ve been doing with pyOpenSci can be helpful to this bigger picture of what PyPA wants to present just let us / me know. My only intention here is to help beginners (and scientists) navigate the ecosystem given our peer review process. That is the focus of our work right now. there are so many tool options!

I also want to say that i appreciate all of the work around tools such as setuptools, etc, the packaging tutorials and content on PyPI and the incredible wealth of technical information in scikit-hep! so any comment I make relates to an outcome of a tool that surprised me (i use setuptools but it surprised me!). Or perhaps a technical knowledge pre-requisite that some materials require (which is ok for a chunk of our shared audience!) you all do incredible work here for the open source ecosystem.

i think collectively we all share an end goal of wanting the best for packaging users at all technical levels! and at the end of the day it’s HARD to write content for users at varying levels and often better to focus on intended audiences.

i want to help achieve this shared goal!

oscarbenjamin · March 14, 2023, 11:24pm

How does it currently work?

I think I’ve heard you say before that one of the ways that these tools get into difficulty is by trying to guess what the user wants rather than just making everything explicit. My impression after briefly reviewing the docs for a few of these tools is precisely that: I would rather a simple explicit spec to say “include this, exclude that, don’t make any guesses”.

I don’t think anyone proposed the hatchling design in particular.

Does it not work to list most files in pyproject.toml?

Of course build-generated files need something else to specify them but even in SciPy the majority of files going into the sdist/wheel are just .py files waiting to be zipped up.

Maybe these UX differences are worth taking into account for someone who knows a lot about packaging and spends a lot of time packaging things. For someone who does not want to spend lots of time on packaging (despite bizarrely being on the equivalent of this mailing list for over 10 years!) my preference is to minimise the amount of time that I spend thinking about this and just get the job done. If the UX has some rough edges but the way to do everything is clearly documented and easy enough to understand and maintain then that is fine for me.

The different tools may have a “reason to exist” for some people who want to use them. That’s not the same as saying that because they exist I should waste time learning all of them just to figure out a reasonable way to do simple things (at a level where they are all comparable in capability).