Request for feedback: packaging.p.o discussion on helping downstream packaging

Just a short note: this week I’ve been working on a new discussion document for packaging.python.org that aims to help upstream Python package maintainers understand downstream packaging better and suggests how they may choose to help with it.

I’d appreciate any feedback, and especially suggestions from other downstreams who may be facing issues different from those my Gentoo (and brief Conda-forge) experience covers. TIA.

The pull request: Start a discussion on helping downstream packaging by mgorny · Pull Request #1791 · pypa/packaging.python.org · GitHub

5 Likes

Looks good, although I’ve only skimmed the document so far.

One thought, more for discussion than as a specific suggestion - would it be possible to give some idea of priorities for the various suggestions? Speaking as a package maintainer, not all of these seem like they would be something I’d necessarily find easy, and having an idea which to focus on would be helpful. It would also emphasise that this isn’t an “all or nothing” proposition, and anything that upstream maintainers can do to help is useful, even if it’s only a small part of the job.

1 Like

Thanks for your suggestions. I guess it would make sense to try to make the priorities clearer. Perhaps some kind of “summary” section that repeats the most important points.

Deciding what to prioritize is going to be harder. My first thought would be to prioritize being able to build without Internet access and to use system dependencies (when possible), since these will otherwise require non-trivial downstream fixes. But that also implies that we’re willing to write patches to implement that. (Perhaps that’s another thing I should mention explicitly.)

Having complete source distributions is a nice thing long-term, though right now we can reasonably work around their absence by using GitHub archives. Of course, all these workarounds add to a huge pile of technical debt: the moment GitHub archives suddenly change contents, we’re swamped with work to update everything (though at least Gentoo mirrors would lighten the immediate impact).

As for testing, it’s something we generally deal with for better or worse. Similarly with backporting stuff when releases are suboptimal.

Could you tell me which of the points you find hard?

The ones that struck me were building release artifacts from the sdist and being able to build without internet access. For most of my smaller projects, the release process is simply running build. That’s simple to do, and easy to incorporate into a publishing workflow. I have little interest in anything more complex, because maintaining CI workflows is of no real interest to me - it’s a chore I must do to publish my project, but not something I want to spend time on.

Apparently, build builds the sdist and then builds the wheel from the sdist, so I guess I’m covered on that one. But I didn’t know that, and more importantly it’s something that build implements, not me. I don’t know if uv build does the same, and it’s not something I’d factor into a decision over whether to switch.

But building without internet access actually is problematic. Both build and uv will download the build backend automatically, and I expect that. I don’t test whether I can build offline, and honestly I’m not sure how I would - put my build backend in a local directory and use something like --no-index --find-links when building? Manually create a build environment and do a non-isolated build? I have little or no interest in dealing with the complexity of either of these.
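
For what it’s worth, a rough sketch of both options (assuming a flit-core backend purely as an example; the first option relies on build’s isolated environment picking up pip’s PIP_* environment variables):

    # Option 1: pre-download the build backend, then point pip at the
    # local directory instead of PyPI:
    pip download --dest ./build-deps flit-core
    PIP_NO_INDEX=1 PIP_FIND_LINKS=./build-deps python -m build

    # Option 2: preinstall the backend and do a non-isolated build:
    pip install flit-core
    python -m build --no-isolation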

As I already said, I like the document. I’m not trying to say that any of the points made are invalid. I just think it’s important to acknowledge that they may not be important to the package maintainer. After all, the maintainer probably didn’t ask for their project to be repackaged - they may be quite happy with users simply using pip install to get it. And for smaller projects, any work to help repackagers could be a significant extra maintenance cost (even if it’s just implementing best practices, many “best practices” are actually quite complex, and can be daunting to a maintainer with little experience in Python packaging).

Maybe in addition to this document, the sections of the packaging user guide relating to building and publishing your package should be updated to recommend practices that support repackagers? If that turns out to be too complex for new users, then maybe we should look at our tooling and processes, rather than suggesting (no matter how gently) that maintainers take on the complexity? The example of build, which implements the “build from sdist” pattern, is a good one here - if all of the tools implement that pattern, there’s no need for that whole item in the document (although you might want a preliminary section that says “use modern, recommended practices from this guide, as they by default do a lot of the work of making life easier for repackagers”).

The ones that struck me were building release artifacts from the sdist […]

Ah, yes, changing the release workflow to fully use the sdist is not going to be trivial. Long term, this is something I think we could implement in cibuildwheel — i.e. an option to build from the sdist rather than from the original git repository — but it’s not critical.

[…] and being able to build without internet access.

Ok, there seems to be some misunderstanding here. What I meant is that the build backend must work without Internet access — it’s fine for the frontend to install the dependencies as usual.

Perhaps I need to clarify that somehow. This isn’t something people normally need to worry about — it’s rather for projects that alter their build process and explicitly start doing things like fetching vendored dependencies or calling npm install ... from inside their backend scripts.
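
To make that concrete, the problem case looks roughly like this (a deliberately simplified anti-pattern with made-up details, not code from any real project):

    # setup.py -- a build step that reaches out to the network from inside
    # the backend, which is what breaks offline distro builds:
    import subprocess
    from setuptools import setup
    from setuptools.command.build_py import build_py

    class BuildWithAssets(build_py):
        def run(self):
            # npm fetches packages from the network at build time:
            subprocess.check_call(["npm", "install"], cwd="frontend")
            super().run()

    setup(cmdclass={"build_py": BuildWithAssets})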

I don’t use cibuildwheel - all my projects are pure Python. So that wouldn’t help me at all.

OK, that seems less problematic. Although I don’t think it’s a constraint that’s likely to be well received. People only do complicated things like this in their build process if they have very specialist needs[1], and they’ve probably already looked at the alternatives and decided that this is the best approach for them. Again, remember that most people don’t actually want to do any of this packaging stuff - they simply want to publish a useful tool. And their goal is almost certainly pip install package, and not getting their package included in a distribution.


  1. And honestly, I’d say that anyone doing this probably isn’t in the target audience for the packaging guide in the first place. ↩︎

1 Like

That was just a general idea. I’m not aware of any commonly used workflow for pure Python packages — I myself also just copy over the same template from my other projects. In the end, it’s just an option to implement — perhaps someone will come up with a nice reusable workflow.
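
For illustration, a minimal reusable workflow along those lines could look something like this (a GitHub Actions sketch assuming PyPI trusted publishing is configured; untested):

    name: publish
    on:
      release:
        types: [published]
    jobs:
      build-and-publish:
        runs-on: ubuntu-latest
        permissions:
          id-token: write  # required for PyPI trusted publishing
        steps:
          - uses: actions/checkout@v4
          # build creates the sdist first and builds the wheel from it:
          - run: pipx run build
          - uses: pypa/gh-action-pypi-publish@release/v1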

The whole point of this discussion is to raise the problem and try to convince people that there are ways to support both use cases. I’m not doing this because I want to tell people what to do — I’m doing it because some people have voiced that they would like to support repackaging better, but they don’t know how. And to have a single source of truth that we could point people to when they’re asking questions.

2 Likes

I’m very supportive of this, and I think the document is a good start in that direction. I’m simply cautioning against making it seem too much like “requirements” rather than “information”.

But I’m +1 on the document, even with the reservations I’ve expressed here.

1 Like

I’ve just pushed an update. I hope it makes it clearer that I’m talking about things like vendored dependencies, and not about stuff fetched by the frontend. I’ve also weakened the part about CI workflows, and indicated that build does that out of the box.

2 Likes

I mostly agree with all that is written there, with these additions:

  1. concerning versions for required packages (be it in requirements.txt or elsewhere): don’t use < in released versions! I understand that it is sometimes necessary to limit a version temporarily until a compatibility bug is resolved, because too many CIs pull from the master branch, but it should always be a temporary measure, with a promise to fix it before releasing a new version.
  2. no-internet-access-in-builds … yes, but often it is enough to mark test cases requiring network access with some kind of mark (e.g., @pytest.mark.network in Add a network pytest mark for tests that use the network by mcepl · Pull Request #1403 · mvantellingen/python-zeep · GitHub; there are ways to do something similar with plain unittest.skipIf and os.environ, as sketched below). Some test cases just really need network access.
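
A minimal sketch of both approaches (the NO_NETWORK environment variable name is just an example, not an established convention):

    # conftest.py -- register a "network" mark and skip those tests on demand:
    import os
    import pytest

    def pytest_configure(config):
        config.addinivalue_line("markers", "network: test needs network access")

    def pytest_collection_modifyitems(config, items):
        if os.environ.get("NO_NETWORK"):
            skip = pytest.mark.skip(reason="network access disabled")
            for item in items:
                if "network" in item.keywords:
                    item.add_marker(skip)

    # test_download.py -- the plain-unittest equivalent:
    import os
    import unittest

    @unittest.skipIf(os.environ.get("NO_NETWORK"), "network access disabled")
    class DownloadTests(unittest.TestCase):
        ...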
1 Like

I think this is a contentious point. I mean, I understand that when your primary distribution channel is PyPI and the tools users use to install your packages don’t involve any testing, then < dependencies are the only way to proactively prevent breakage.

Sure, sometimes pins are unnecessarily tight. Some projects use SemVer but have such a wide API that most of the “breaking” changes don’t really break the particular reverse dependency. These are all valid points.

However, downstreams are in a better position not to pin dependencies, and in the end we can just use sed — so I don’t think it’s fair to force upstreams to do something potentially harmful to PyPI users.

Sure. In Gentoo we have a handful of ebuilds that specifically permit Internet access during testing. However, that’s the exception rather than a rule.

2 Likes

Could we put it in, but frame it as “doing so makes things harder”[1] as opposed to “you shouldn’t do it”?


  1. maybe briefly mention that most system packaging ecosystems have no equivalent of venv or version locking, so tight dependency constraints generally have to be ignored or patched over ↩︎

1 Like

Just FYI, openSUSE (and I think Fedora and Debian as well, not sure) have no such option and all builds (after pulling in all dependencies) are always 100% offline.

Ok, then I’m sorry, but I don’t really know what you’re asking of me here.

Could we do that in a followup iteration? I don’t really want to block the existing version from getting merged on additions.

1 Like

I would say putting an upper limit on dependency versions without a known breakage is more harmful to PyPI users than not. There’s an excellent article on the topic here: Should You Use Upper Bound Version Constraints?

1 Like

Hmm, that’s an argument I didn’t expect to see. Though it’s a double-edged sword. One addition I could make is that not pinning something causes trouble for downstreams, particularly when:

  1. Package X doesn’t pin its dependency on Y.
  2. Package Y breaks API and X isn’t released for some time.
  3. Packages depending on X start adding transitive pins on Y to work around the problem.
2 Likes

This is a very useful documentation page, thanks for writing it!

I’d recommend not rehashing the “upper bounds yes or no” debate, since (a) it’s been debated over and over in many places, and (b) it’s not directly relevant to the main issues for distros. I think there’s rough consensus on “don’t add upper bounds unless you have a very good reason to do so”. Anything more opinionated, like “never add any upper bounds because they’re always harmful”, is impossible to agree on.

One thing that does help distro packagers here: if you add an upper bound, document in a code comment what the reason for it is, and whether it’s expected to be safe for distro packagers to loosen or remove the bound. Something like the sketch below would already do the job.
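
For example, a hypothetical requirements.txt entry (the package name and reason are made up):

    # frobnicate 3.0 removed the legacy API we still depend on.
    # Distro packagers: safe to drop this bound if you patch in the new API.
    frobnicate>=2.1,<3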

2 Likes

Just suggest marking network-requiring tests as such (see the linked PR for one example of what I mean).

I would say “don’t think that you have actually fixed the bug, because it is just a workaround, and hopefully only a temporary one”.