Modernising my packages - am I thinking about this all wrong?

I’ve got quite a few personal packages, and they’re all in a monorepo. I’ve got a release script which at its end generates a setup.py and uses it to make an sdist and then uses twine to upload it. My script is quite old.

Just as when I first set up my stuff, I’ve spent some days wading through the packaging docs, learning not enough and coming to the conclusion (just now) that I’m thinking about this all wrong.

My objectives: to shift from setup.py to setup.cfg and/or pyproject.toml, and to stop running setup.py directly (which I gather is now discouraged - I’ve read some long articles about why that is so).

What I thought was happening was that I should be making setup.cfg and/or pyproject.toml files so that my package could be usable by modern tools, i.e. that I was shipping the config files to end users or end-user tools, and that pip install itself runs my setup.py at install time.

What I now think is that maybe that’s not the objective, but instead that I need to upload the required end products such as the sdist and possibly wheel files to PyPI. And that those end products contain the metadata and where it comes from at my end (eg pyproject.toml) is irrelevant.

This is (to me) completely unobvious from the packaging docs. Instead, I’ve spent a lot of time wading through the (several, apparently all optional) mechanisms.

So, my questions:

Is the functional approach:

  • choose a toolset (flit, poetry, something which can read a pyproject.toml?)
  • use it to generate the sdist and wheels?
  • then choose an upload tool (twine seems nice and is working for me now with sdists) and upload to PyPI?
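For concreteness, here is the sort of flow I have in mind (a sketch; it assumes the build and twine packages are installed, and a real upload needs PyPI credentials):

```shell
# Install the build frontend and the upload tool.
python -m pip install --upgrade build twine

# Build an sdist and a wheel into dist/; build reads pyproject.toml
# to discover the chosen build backend (setuptools, flit, etc).
python -m build

# Sanity-check the metadata, then upload to PyPI.
python -m twine check dist/*
python -m twine upload dist/*
```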

I suspect that I’m a user in that annoying middle ground:

  • not someone who can be satisfied with a simplistic “get started” recipe that walks through a single static approach - I hate “magic”
  • not someone with deep knowledge of the apparently constantly-in-flux packaging landscape

What I feel the packaging site lacks is a core overview of the objectives and flow. Maybe I’ve missed it. For myself, what I would find useful is some short doc which outlines what “publishing a package” actually has to achieve - the essential concepts:

  • specify your package (metadata)
  • build the upload artifacts (sdist, wheels?)
  • upload the artifacts (eg using twine)

Now, I’m just making up the above list because it is not clear to me from the packaging docs, which are Very Many.

Is the above list sensible or useful?

Does it imply that I bring no benefit to myself or others by trying to make multiple config things (setup.py, setup.cfg, pyproject.toml) and that I can just pick one at my end (eg the TOML file) and a suitable tool which can read it and make artifacts? i.e. no end user sees these config files?

Is there a short page listing working tools and what config inputs they work with?

Cheers,
Cameron Simpson cs@cskk.id.au

5 Likes

Hi @cameron, thank you very much for taking the time to provide this feedback.
I am sorry to hear that you have been facing some problems with the existing documentation. Hopefully we can incrementally make it better!

I think you managed to capture a central aspect of the discussion:

  • If you manage to publish well-formed wheel files to PyPI, then most of your users will have a good experience (and yes, I believe that most of them will not look into the contents of the file…)

I also believe that it is good to adopt a packaging approach that is compatible with PEP 517, i.e. anything that you can build using python -m build.

That said, achieving those two points is not that hard (unless you have a very personalised setup.py script that adds a lot of custom behaviour). Setuptools does support PEP 517, so you can benefit from it almost instantaneously and start uploading wheels to PyPI quite easily.

With time, you can also migrate to another configuration format if you so wish, and even start exploring other backends if that is important for you.

1 Like

I don’t really know any page like that, but in general it works as the following:

  • setuptools: setup.py, setup.cfg (and soon experimental support for pyproject.toml)
  • flit: pyproject.toml
  • poetry: pyproject.toml (but in a non-standard way)
  • hatch: pyproject.toml
  • pdm: pyproject.toml
  • whey: pyproject.toml
  • trampolim: pyproject.toml

… there might be many other backends but most of them should work with python -m build and pip
(some of the tools listed above may also have custom files or at least used to have).

Also note that when we talk about pyproject.toml there are 2 things to consider:

  1. Having a pyproject.toml with at least the [build-system] table is pretty much mandatory these days so pip can install your project. The exception to this rule is setuptools, which works as a “fallback” if a setup.py file exists (but even in this case, it is recommended to add one).
  2. Some tools adopt a standard way of specifying the configuration in the form of a [project] table inside pyproject.toml. However, it is not mandatory for tools to use this [project] table, so some of them may implement their own thing (e.g. poetry) or use a different file (e.g. setuptools). That said, even when tools adopt the [project] table it might be necessary to specify extra configuration parameters via tool-specific/non-standard tables.
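To make point 1 concrete, a minimal [build-system] table looks like this (the version pins below are illustrative, not required):

```toml
# Minimal build-system declarations -- pick the one for your backend.

# setuptools:
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

# flit (shown commented out; a pyproject.toml has only one [build-system] table):
# [build-system]
# requires = ["flit_core>=3.2"]
# build-backend = "flit_core.buildapi"
```

A frontend like pip reads this table to know what to install into the isolated build environment before invoking the backend.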
3 Likes

This is indeed the big migration in Python packaging, yup, from dynamic, tool-specific packaging metadata to static, declarative metadata in standard, tool-independent formats.

The ideal is for users to not have to do this (unless they have specific needs and want to build from source), but rather to ensure you ship already built packages in a standard, tool-independent format (wheels) that don’t require executing a build step to install, and rather can just be unpacked and ready to use immediately.

What you describe, in fact, is the legacy approach, which was the case when Setuptools was the assumed build backend, and all distribution was only via sdists, which to a rough approximation are basically source archives with a (loosely) specified format and some extra metadata. All packages were assumed to have a setup.py, which was executed (e.g. by pip, or directly in legacy approaches) on the user’s machine to build the package, which was then installed.

Nowadays, that’s changed on both counts:

  • While you should always still ship a sdist just in case, the great majority of packages upload built wheels and users consume them, where the build backend (Setuptools, Flit, Poetry, etc) runs at packaging time, and the resulting archive can be (more or less) directly unpacked into the site-packages directory by any compatible installer tool (e.g. pip, installer), without needing to run any bespoke code or the project’s chosen build backend.
  • Moreover, for building a project’s source or a sdist into a distribution package, the approach of just assuming all projects use setuptools and running a dynamic setup.py file has been replaced with the standardized hooks specified by PEP 517, which allow any build frontend (pip, build, etc) to interface with any build backend (Setuptools, Flit, Poetry, etc). And for declaring the build system and build-time dependencies, various assumptions and bespoke hacks have been replaced with the [build-system] table in pyproject.toml, as originally specified in PEP 518.
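Concretely, PEP 517 boils the frontend/backend contract down to a couple of Python hooks that every backend must provide. A simplified sketch of the two mandatory hooks (real backends such as setuptools.build_meta implement these for real):

```python
# Sketch of the two mandatory PEP 517 hooks a build backend module exposes.
# A frontend (pip, build) imports the module named by build-backend in
# pyproject.toml and calls these; it never runs setup.py directly.

def build_sdist(sdist_directory, config_settings=None):
    """Build a sdist into sdist_directory; return its filename."""
    raise NotImplementedError  # a real backend creates the .tar.gz here

def build_wheel(wheel_directory, config_settings=None,
                metadata_directory=None):
    """Build a wheel into wheel_directory; return its filename."""
    raise NotImplementedError  # a real backend creates the .whl here
```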

Yes, you should always generate and upload both sdists and wheels, which is the default with build, especially if your packages don’t contain platform-specific C extensions (if they do, cibuildwheel is very helpful to run in your CI to generate wheels for the various platforms and Python versions automatically). Wheels are (generally, not always) smaller, faster and more reliable to install, are standardized, and avoid a lot of the complexity, pitfalls and dynamism of sdists, while the latter are a useful fallback for platforms that your wheels don’t support and users who want to build from source.

Wheels (by design) help insulate your users from your chosen build backend, and PEP 517 gives you much more freedom with the build backends you use for both developers and users who build from source (either from your source tree, or a sdist). However, this certainly doesn’t mean your choices in terms of how to specify your metadata are “irrelevant”:

  • At minimum, you need to specify [build-system] in pyproject.toml so the tools you, other developers and users consuming sdists use to build, package and install your package know how to do so.
  • Specifying your metadata in a standard format (i.e. PEP 621 [project] table) allows you to more easily switch build backends, and other tools (dependency checkers, linters, project management, etc) to read, understand and write it.
  • Preferring static/declarative configuration (pyproject.toml, setup.cfg) over dynamic executable files (setup.py) has major advantages for security, reliability, integration and introspection.
  • Invoking setup.py directly is bad, as it is deprecated, doesn’t take advantage of modern tooling and has many issues and caveats.
  • The “old ways” are a pain for tool maintainers to maintain, and will eventually stop working.

More specifically:

  1. You choose a build backend (e.g. Setuptools, Flit, Poetry, etc), the tool that actually takes your source tree and configuration, standardized and bespoke, and turns that into a wheel that can be installed.
  2. You then use a build frontend (e.g. build; some of the aforementioned packaging tools also include frontend functionality themselves) to invoke the build backend and tell it to build your project (Pip is also a build frontend, as well as an installer)
  3. You then choose an upload tool (as you mention, twine is the standard tool, but like before some packaging tools include that functionality themselves)
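As a sanity check, the “install it yourself to test” step can be done in a throwaway virtual environment (a sketch; the wheel filename and venv path here are placeholders):

```shell
# Try the freshly built wheel in an isolated environment before
# (or just after) uploading it.
python -m venv /tmp/try-install
/tmp/try-install/bin/pip install dist/my_package-1.0.0-py3-none-any.whl

# Confirm the installed package imports cleanly.
/tmp/try-install/bin/python -c "import my_package"
```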

That’s basically the list, the fourth step being actually installing the package (yourself to test, and your users for real). It may not be perfect, but that is what the Packaging Tutorial covers; after basic setup steps for the project, the top level headings closely mirror the aforementioned list:

Configuring metadata
Generating distribution archives
Uploading the distribution archives
Installing your newly uploaded package

If there’s things you found confusing, I’m sure more specific feedback would be helpful to improve it.

Not exactly, but there is this listing of major packaging-related projects and what they do.

3 Likes

This recommendation appears to break two important use cases: Linux distros like Debian/Ubuntu which require source builds, and enterprises consuming open source packages from PyPI.

I’ve been out of Debian/Ubuntu development for a few years, but at my $work (and I’m sure I’m not alone), our CI systems cannot access the internet (i.e. pypi.org). Therefore we have processes and tools to import PyPI packages into our internal repositories, after performing some static analysis checks for security and licensing. While we can and do import binary wheels – especially for libraries that have complex, difficult to reproduce build environments – these have to be exceptions. We prefer to import sdists, do the automated static analysis, and build the wheels internally for consistency, security, and reproducibility.

The move away from uploading buildable sdists to PyPI always worries me that we’re taking too much risk into our software supply chain.

3 Likes

I don’t think there’s any suggestion (in general, that is) that projects shouldn’t provide sdists. Quite the opposite, sdists are important, and I’d expect all projects (at least, all open source projects…) to ship buildable sdists. But shipping wheels as well makes the installation experience smoother and easier for users on platforms where wheels are available, because the build process can be tricky, so letting end users avoid it is a win. This is especially true for code with a compiled part, but it also applies for pure Python projects.

1 Like

Okay, good to know! I think we should be careful in our messaging so that the meme of wheels-only uploads doesn’t get baked in. Tool developers should ensure that they are preserving the default of uploading sdists. I’d argue that should be the case even where complex build requirements make it unfeasible for most consumers to build the wheel, because there will be consumers willing and able to invest in their internal infrastructure.

No argument there! I don’t mind that tools make it easy to also upload wheels, with or without compiled parts. I hope that library authors with compiled binary artifacts do continue to strive for external buildability, but I understand that’s not always easy. I’d still expect that these are rare, even if they’re highly important to the ecosystem.

2 Likes

6 posts were split to a new topic: Should sdists include docs and tests?

I think the years of experience of Linux distributions is a parallel
here: when producing for example an rpm, both an srpm and an installable
rpm are produced, and the release to the package repository consists of
both. Very few ever use the srpm, but it’s there and provides the proof
of reproducibility; the vast majority of users consume the “binary”
package (which just like with Python wheels can be “noarch” - no binary
bits).

Just to clarify my first comment:

If you manage to publish well-formed wheel files to PyPI, then most of your users will have a good experience

The intention here was not to promote disregarding sdists.

I think ideally all the packages should have both sdists and wheels published. Also I always include tests and docs in the sdist (big fan of setuptools-scm here).

1 Like

I realized this didn’t come across as I intended: it wasn’t meant to be a recommendation against shipping sdists on PyPI, but rather that ideally users consume wheels, while you still provide sdists. In fact, several of my later points rely on the latter when I explain why the project configuration format package authors use is still important, because it is still completely relevant when users install from sdists.

But I definitely realize how that other line could be interpreted to imply that you shouldn’t ship sdists, which was not what I meant. I’ll update it.

Coming from a package author/upstream perspective, I don’t have a problem with doing so, but what I’ve had a hard time understanding is why downstreams don’t just use the source tarballs, as they are the definitive source form of the project, whereas the sdist is nominally for user consumption. I can see why that is the case for special cases like @pf_moore’s, where very restrictive corporate policies are in place, but not for Linux distro downstreams or other open source projects I have more inclination to spend my volunteer time supporting.

1 Like

(post deleted by author)

By C.A.M. Gerlach via Discussions on Python.org at 24Mar2022 02:05:

[… detailed response …]

I just wanted to post to thank everyone here for the near immediate and
helpful replies. I’m running down the Tutorial, which seems more
explanatory than I’d thought, to clarify what to update in my
processes.

Thank you all,
Cameron Simpson cs@cskk.id.au

2 Likes

Thanks for the clarification @CAM-Gerlach !

1 Like

By Cameron Simpson via Discussions on Python.org at 24Mar2022 22:01:

I just wanted to post to thank everyone here for the near immediate and
helpful replies. I’m running down the Tutorial, which seems more
explanatory than I’d thought, to clarify what to update in my
processes.

Well, I spent a big chunk of the weekend on updating my release script.
The tutorial was, while providing a bit more context than I’d expected,
still a tool-specific recipe for a simple setup.

I still feel that PyPA lacks a “package release overview” document which
outlines the steps involved and their purpose, in order to provide
context for the specifics mentioned in places like the tutorial and a
mental framework on which to hang individual documents like the PEPs.

So I’ve written what I would like to have available, ideally listed just
above the reference to the tutorial:

https://github.com/cameron-simpson/css/blob/pypi/doc/pypa-the-missing-outline.md

Are you folks open to adding this, suitably revised for correctness?

The objective is the flow: what to do, and why.

Cheers,
Cameron Simpson cs@cskk.id.au

1 Like

Some content suggestions:

  • Mention that you need setuptools >= 61.0 for pyproject.toml support (maybe a version for each of the backends?)
  • Include references to further reading for package data-files, extension modules, testing, CI
  • Rename “Upload Artifacts” to “Build Artifacts” (or similar): the former to me sounds like an action, not a thing
  • The default for build is to build an sdist, then use that to build a wheel. To me that sounds more resilient (passing --wheel builds the wheel directly from source, not testing the sdist)
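To make that last bullet concrete, the invocation modes of build are (to the best of my understanding of its CLI):

```shell
# pypa/build invocation modes:
python -m build            # default: build a sdist, then a wheel *from that sdist*
python -m build --sdist    # build only the sdist
python -m build --wheel    # build only a wheel, directly from the source tree
```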

How does this improve the packaging story over Packaging Python Projects — Python Packaging User Guide? That seems to provide all the steps you provide, while being more in-depth and including more steps. Is it that you think it’s too detailed? Is it because it has mainly setuptools-specific configuration and examples (I agree: now that setuptools supports pyproject.toml, it should be updated for generic backends)?

By Laurie O via Discussions on Python.org at 29Mar2022 05:49:

Some content suggestions:

  • Mention that you need setuptools >= 61.0 for pyproject.toml support (maybe a version for each of the backends?)

Bumped. I don’t know enough about other tools to recommend versions;
suggestions welcome (with terse reasons, maybe).

  • Include references to further reading for package data-files, extension modules, testing, CI

I’ll see what I can dig up. This is not meant to be filled with detail -
being swamped with detail in a bazillion separate documents was what led
me to the conclusion that there’s no concise overview, which this is
meant to address.

I came to this wanting to update my ancient setuptools
setup(lots-of-arguments) incantation to modern approaches. And found
myself bogged down in PEPs which were both specific and vague and all
the other documents. I spent hours in that swamp.

My problem was that there was no big picture: what is required, with
what pieces. And links to the specs for the pieces.

  • Rename “Upload Artifacts” to “Build Artifacts” (or similar): the former to me sounds like an action, not a thing

Ok.

  • The default for build is to build an sdist, then use that to build a wheel. To me that sounds more resilient (passing --wheel builds the wheel directly from source, not testing the sdist)

It is; I build that way myself. I wanted to talk about the source and
built distributions separately though, so the incantation is specific to
the topic. I figure someone setting this up will at the least see the
help text for “build”.

How does this improve the packaging story over Packaging Python
Projects — Python Packaging User Guide?
That seems to provide all the steps you provide, while being more
in-depth and including more steps. Is it that you think it’s too
detailed? Is it because it has mainly setuptools-specific configuration
and examples (I agree: now that setuptools supports pyproject.toml, it
should be updated for generic backends)?

I had a run at the tutorial again after the original post on this topic,
and found it wanting (for me) as before. It does talk a bit about what
it is doing.

It’s ok to bootstrap a single package with a specific tool (setuptools)
in a specific opinionated layout. Which is great for the new packager
with no opinions who just wants their package out the door and on PyPI.

However, I’ve got opinions and my repo is not much like the example. To
make my setup use the modern approach I need to understand what all
the bits are for and where they sit. And that overview is not apparent
to me at the PyPA site.

So this document is supposed to:

  • be short - a layout of the flow to distribute something, describing
    the pieces and their relationships
  • not be a tutorial; there’s a nice tutorial already
  • with some examples but not prescriptive except in the sense of
    prescribing “you need to make some distribution files”

So:

  • it has a point list of the objectives, starting at the author and
    arriving at the end user
  • it goes over each of those points in a little detail after the main
    list, to make them clear
  • it has some references to the places where relevant things are
    specified (PEPs, tools)

My remark above about the PEPs being both specific and vague? 517 and
518 were the ones that particularly gave me that impression. They are
written for people who already understand the larger picture and the
existing ecosystem. I can go to them to find specific information, but
they taught me little as a basis of “what do I need to do?”

This may sound like a litany of complaint, but my core issue is lack of
a concise overview. With an overview I know what needs doing, and what
things do. What the various bits of pyproject.toml are for.

'Soup: This is the one that Kawasaki sent out pictures, that looks so beautiful.
Yanagawa: Yes, everybody says it’s beautiful - but many problems!
'Soup: But you are not part of the design team, you’re just a test rider.
Yanagawa: Yes. I just complain.

Cheers,
Cameron Simpson cs@cskk.id.au

2 Likes

I like this! I think it’s a good overview, avoids getting into details (which as you say, is what you want if you’re just looking for a feel for “how everything works”) and covers the main points well. I’d love to see this as part of the packaging user guide. I’m not directly involved in maintaining that document myself, so it’s for others to ultimately approve it, but it definitely has my +1.

There’s a few terminology points where your wording seems slightly unusual to me (as someone used to the packaging ecosystem)[1]. But nothing significant, and if this does get incorporated into the PUG, terminology can be tweaked as needed.


  1. Nothing specific, just a general feeling that the document wasn’t written by a “native speaker of packaging” :wink:

3 Likes

Overall, it looks very helpful. I’ve noticed a handful of specific points, but haven’t gotten around to writing it up yet, sorry.

I also noticed some specific terminology points of confusion, specifically around being clear when you mean project vs import package vs. distribution package, which are all quite different things, as well as a few smaller nits, e.g. using “build backends” and “fields” vs “keys” vs. “options”, but that’s not too hard for us to help clean up. The PyPA glossary as well as the PEP 639 Terminology section might be of help here.

By C.A.M. Gerlach via Discussions on Python.org at 29Mar2022 21:20:

I also noticed some specific terminology points of confusion,
specifically around being clear when you mean project vs import
package vs. distribution package, which are all quite different
things, as well as a few smaller nits, e.g. using “build backends” and
“fields” vs “keys” vs. “options”, but that’s not too hard for us to
help clean up. The PyPA glossary as well as the PEP 639 Terminology
section might be of help here.

I’d welcome cleanup of these nits. Bearing in mind that the objective is
clarity of the overview; while I definitely do not want wrong
terminology in there, I do want limited verbiage.

Very happy for every technical term’s first use to be a hyperlink to
the glossary and/or specification document.

And as a personal style point, I like abbreviations like “VCS” to always
be written with their “full name (abbrev)” on first use, eg “version
control system (VCS)” and then just the abbreviation thereon. Not that
there is much of that in the doc.

Cheers,
Cameron Simpson cs@cskk.id.au