Requires-Python upper limits

It looks to me like there’s both a pragmatic “what do we do given the current state of tools” argument (which is the one @henryiii is making), and a more principled “what’s the design we’d ideally like to have” argument. A few thoughts:

  • The current state of things clearly doesn’t work well. Editing the requires-python definition to forbid upper bounds is one way out. And that way is the least amount of work (this is important).
  • On the other hand, from a design perspective it is a poor long-term choice to remove accurate metadata so that an installer can download an sdist which that metadata said would not work, only for the build system to error out. That error may come after setting up an isolated build env, which can download a lot more data first - for scientific projects this could be hundreds of MBs, and if you rely on the likes of TensorFlow or PyTorch potentially >1 GB.
  • Python is just one of many dependencies. It’s treated differently on PyPI than other build-time or runtime requirements. However, (a) there are tools that can install/control Python versions, and (b) sdists are not only for PyPI - they are also for conda-forge, Homebrew, Linux distros, etc., all of which treat Python the same as other dependencies.
  • Matthias’ proposal for editable metadata is probably the best long-term design (and goes together with @henryiii’s idea 2, “implement upper capping properly”). It’s way too much work to consider only for this python_requires issue, however it’s very valuable for adding caps to other dependencies long-term. For conda-forge this is possible, and many maintainers describe that capability as a life-saver.
  • The current description of requires-python is not “terribly worded”, it’s just the intuitive way of describing a dependency requirement, and it matches what I (and I assume many other maintainers) would assume if I wasn’t familiar with this discussion - supporting the PEP 508 specification language. We recently had a discussion on the Pip issue tracker about why build and runtime dependencies are treated so differently (the former cannot be overridden), and the conclusion there was also that there’s no good reason for that. The design reasoning is similar here; Python is not special enough that upper caps must be forbidden.
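To make the design argument concrete, here is a minimal sketch of an installer checking a capped requires-python value against the running interpreter before it downloads anything. It is a deliberate simplification: it only handles plain numeric versions and the six basic comparison operators, whereas real installers rely on a full PEP 440 implementation. The helper names (`python_is_supported`, `_parse_version`, `_pad`) are hypothetical and exist only for this illustration.

```python
import operator

# Comparison operators supported by this simplified sketch (not full PEP 440).
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def _parse_version(text):
    """Turn '3.10' into (3, 10); only plain numeric releases are handled."""
    return tuple(int(part) for part in text.strip().split("."))


def _pad(version, length):
    """Right-pad a version tuple with zeros so (3, 11) compares like (3, 11, 0)."""
    return version + (0,) * (length - len(version))


def python_is_supported(requires_python, python_version):
    """Hypothetical helper: does python_version satisfy requires_python?"""
    candidate = _parse_version(python_version)
    for clause in requires_python.split(","):
        clause = clause.strip()
        # Try two-character operators first so '>=' is not read as '>'.
        for symbol in sorted(_OPS, key=len, reverse=True):
            if clause.startswith(symbol):
                bound = _parse_version(clause[len(symbol):])
                width = max(len(candidate), len(bound))
                if not _OPS[symbol](_pad(candidate, width), _pad(bound, width)):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True
```

With a cap such as `>=3.8,<3.11` in the metadata, a check like this lets a tool skip the sdist for Python 3.11 up front, instead of finding out only after the build environment has been populated.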

On the need for this:

  • On the list of real-world issues that we (scientific package maintainers) have with packaging, this doesn’t rank very high. If the outcome is that we go with erroring out in the build system, we can live with this for some years to come.
  • Our concerns are real though. We’ve always had this metadata info in all release notes “this release supports Python 3.8-3.10”, and users(/packagers) do not read release notes. Improving metadata quality and not downloading a lot of data before erroring out does matter.
  • As already pointed out by a few others, the “you don’t know if it will or won’t work, hence you should not cap” is extremely misguided as the blanket response to caps. Package authors should default to no caps in the vast majority of cases, but there are valid reasons to add caps on any dependency (as also laid out in @henryiii’s excellent recent blog post). For packages like NumPy and SciPy we are sure things will break with future Python versions, so a cap is valid. Note that we do think about this carefully - for example for NumPy 1.21.2, released before Python 3.10rc1, we already set the cap to <3.11 because we planned to upload wheels later on, after Python became ABI-stable and we had our wheel build infra updated.
  • It’s also worth pointing out that this is not just about NumPy and SciPy. The way sdists are treated in general by install tools isn’t great, which is causing other projects to not upload sdists at all. For example, take what are probably the three largest and most actively developed Python projects (several dozens to several hundreds of full-time engineers): TensorFlow, PyTorch and RAPIDS. The latter has given up on PyPI completely, and the former two do not upload sdists, because they are too problematic (failed installs highly likely) - which is a shame, because sdists have significant value for archival and code-flow-to-packagers reasons. This requires-python issue is not a main driver for not having those sdists, but it does show how problematic it is to try installing sdists that aren’t going to work.

On locking install tools:

  • Poetry and PDM clearly have usability issues here.
  • The Poetry/PDM behavior, and the resulting flow of packages to PyPI with unnecessary caps seems to drive most of the opposition to adding any caps at all. This is understandable, but it’d be much better to push those tools to stop doing that rather than to continue pushing back on all caps.

This isn’t true. The transition mechanism I had in mind for SciPy is to upload a new sdist for the last release which was missing the upper cap in requires-python to error out in setup.py with a clear error message. That’d be equivalent to what @henryiii is advocating for (modulo it doesn’t solve the immediate issue with locking solvers), and that then becomes irrelevant once the final design is implemented in install tools.
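A rough sketch of what such a setup.py guard could look like is below. The version bounds and the function name `check_python_version` are illustrative assumptions, not SciPy's actual code; the idea is only that the check runs before any expensive build step and fails with a readable message.

```python
import sys

# Assumed supported range for a hypothetical release (illustrative values only).
MIN_PYTHON = (3, 8)
MAX_PYTHON_EXCLUSIVE = (3, 11)


def check_python_version(version_info=None):
    """Raise a clear error if the given (or running) Python is unsupported."""
    if version_info is None:
        version_info = sys.version_info
    current = tuple(version_info[:2])
    if not (MIN_PYTHON <= current < MAX_PYTHON_EXCLUSIVE):
        raise RuntimeError(
            "This release supports Python >={}.{},<{}.{}; you are running "
            "Python {}.{}. Please install a newer release of this package "
            "or use a supported Python version.".format(
                *MIN_PYTHON, *MAX_PYTHON_EXCLUSIVE, *current
            )
        )


# In a real setup.py this would be called at the top of the file,
# before setup() is invoked:
#     check_python_version()
```

This does nothing for locking solvers, which only read metadata, but it does turn a confusing compile failure into a single actionable message.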

@henryiii is correct that this is a much more important wish/problem. It’s a little orthogonal though, as I hope my first points on pragmatism vs. good long-term design made clear.

@pf_moore that’d be great, and it is what you originally proposed in “Speculative: --only-binary by default?” (pypa/pip issue #9140). It’s a significant amount of work, and it’s still not clear to me that it has enough buy-in from install tool maintainers (?). I already replied on the issue after you asked about potential funding: “If it looks like there will be buy in for this idea from the relevant maintainers/parties, I’d be happy to lead the obtain-funding part.” I’m still happy to do that, and confident I can actually obtain that funding in 2022. I’m not quite prepared to do the significant amount of work of getting all the buy-in we need before arranging funding though, or to arrange funding and then not get it done because of lack of consensus. So this is a bit of a chicken-and-egg problem. Someone within the PyPA who understands what’s needed and has connections to the relevant parties would be better placed to do this initial alignment (if a smaller amount of funded time would help there, please let me know - that’s easier to arrange).

Second thought on this: it must be the default. Any user-level opt-in switch (e.g. writing in your docs pip install --only-binary=:all: scipy) is useless, because users don’t read docs - and when you have O(20 million) users, that’ll be a lot of bug reports and wasted time.

Third thought on this and on capping in general: about half of all Python users are scientific / data science users now. These users are not developers - they are scientists and engineers first, and programming is a tool to do their actual job. Expecting them to figure out how to fix up their install commands after a new release of some dependency has broken their pip install some_pkg is a poor idea. The prevalent attitude to caps around here is “don’t add them, when it breaks just fix it”. This just plain doesn’t work for these users. And unfortunately these users do sometimes(/regularly) work in places with outdated (or non-Linux) HPC systems, and may therefore need to build from an sdist. So building from sdists needs to be reliable - I wish we could rely on “only build from source if you’re an expert”, but we can’t.
