tools like Poetry and uv, which have chosen not to follow the standard.
So far as I know, Poetry does follow the standard; uv is unique in choosing not to.
Poetry used to insert an upper bound by default, but has not done this for some time. Again, so far as I know, there is nowadays no reason to expect that non-expert users would accidentally include an upper bound on their Requires-Python.
I don’t know that anything much more is needed than
Some authoritative documentation / convincing blog post explaining expectations around how this field should be used
Persuade uv to follow the rules
Then allow the ecosystem to self-correct: users who encounter problems with unwanted upper bounds simply report those problems (linking the authoritative documentation) and/or submit pull requests.
Thanks for that clarification, and apologies to the Poetry team for perpetuating the idea that they still did this.
We have a problem because we can’t change historical metadata, so anything that was done in the past remains to haunt us, but it’s good to know we’re not continuing to make the issue worse.
I think that it would be good if pip defaulted to --only-binary but also it is really a separate issue. There can be many reasons for not providing binaries for any given platform such as file size limits on PyPI, build times in CI, availability of platforms in CI (e.g. intel Macs are being dropped now) and so on. For many users running pip install foo the presence or absence of a suitable wheel is going to be a good way of deciding which version to choose but we need this metadata in other situations as well.
Checking which wheels are provided is not at all the same question as “which Python versions and implementations is this release compatible with when building from source”. This latter question is what matters everywhere else apart from when doing pip install foo to get the wheel from PyPI. Any downstream packagers in conda, linux distros etc don’t care about the wheels at all and want to know which Python versions it could be reasonable to build the package for.
In python-flint there is requires-python = ">= 3.11" but I have just checked and the main branch builds and runs fine with CPython 3.8 (if you manually edit requires-python). It likely also works with older CPythons but 3.8 is the oldest that uv is offering me right now so I haven’t tested anything earlier. The minimum Python version is set to 3.11 because I want pip to backtrack on 3.10 and get an older python-flint wheel rather than trying to build from the sdist (building would almost certainly fail for anyone who does not know what they are doing). This means that requires-python is inaccurate and this issue would be solved if --only-binary was pip’s default.
The reason python-flint works with older CPython versions is that there just is not much need to break compatibility with older versions (in python-flint’s case). There is however a very regular need to make changes to fix issues with new versions of the dependencies.
The Cython 3.0 release broke the build of all previous versions of python-flint. Those previous versions had no cython < 3 constraint and so all will now fail to build unless someone manually constrains the Cython version and the cryptic build failure error messages give no indication that constraining the Cython version is what they should do nor tell them what version to constrain to. Then the exact same thing happened with Cython 3.1. With Cython 3.2 (not yet released) I can already see that there will be some test failures but at least the build is not yet broken.
Another core dependency is FLINT whose versions 3.0, 3.1, 3.2 and 3.3 have all broken the build of python-flint. Each time only small changes are needed in python-flint but it always means that all previous python-flint versions are broken if building with any newer version of its dependencies than what was tested at the time of python-flint’s own release.
I would say that CPython is probably the most well behaved of python-flint’s dependencies and sometimes it is possible that there can be a new minor release of CPython that does not completely break the build of python-flint. My general expectation though is that some changes are needed for each release of CPython much like each release of FLINT and Cython. The big one for CPython was 3.12 which removed distutils and hence NumPy removed numpy.distutils which is what python-flint was using. That was a huge break and the fix (migrating to meson) was not at all trivial.
The conclusion is that python-flint now has both upper and lower version constraints on all dependencies (except Python) with an option to bypass version checks at build time. The requires-python lower bound is completely wrong but is needed because pip does not use --only-binary by default. There is no requires-python upper bound but that is also wrong because I have no expectation that future CPython versions will be okay without any changes in python-flint itself. There is also nothing that constrains different Python implementations and I can say that PyPy works but I have no idea about GraalPython for example.
The fact is that there are upper bound version constraints. You don’t know what they are when releasing except to say “this is what we tested” but sometimes that is quite close to “this is what works”. The difference between pure Python packages and packages with C extensions is that with a C compiler it only takes one thing to make the build fail and then from a user perspective the package is completely unusable. For pure Python packages it is still the case that new Python versions break things but the breakage is just less noticeable and I’m not sure that is actually better.
I think it’s very important to understand what a universal solver is: both uv’s high level interface and poetry are universal solvers. That means they solve for a range of Pythons (given by project.requires-python), and they have to resolve for every possible Python version. So if they see a single library with <4, they require the user to also specify <4, regardless of where the author falls on this issue. That’s because the input metadata is tied to the universal solver resolution, when in fact, they are two different values, it just happens they are both Python ranges. To the best of my knowledge, this is still true, though because uv ignores upper bounds it’s not an issue for uv, and if the lower bound doesn’t match (which is also an issue!), uv does have a way to override it by specifying “environments” that each contain a reduced range (and recentish versions auto split on some things). It looks like Poetry allows you to separate them now, which helps a lot! Just try >=3.9 and require numpy to test; see if it back solves an old source version on 3.14. Universal solvers are surprisingly complex, I’d recommend fully understanding them!
The second problem is clarifying what package authors expect of an upper bound. If you set <3.14 on pycompile, what do you expect to have happen? It’s not going to magically cause it to work on 3.14, so it can only be an error message that you are after. You almost never want this to affect the solve! But it’s solver metadata, the whole point of this field is to affect the solve, not to produce an error message. So if you’ve released a version in the past without a limit, or with a higher limit (like <4), then most solvers will (correctly, according to solver algorithms) back solve and give you the older version, which will both circumvent the original intent of a better error message, as well as generally cause worse error messages, unnecessary compiles, waste time, etc. If you’ve always had upper bounds (remember you can’t edit metadata after the release!), then instead you might get very close to the original intent of a better error message, though in my experience most tools seem to say something like “there’s no compatible package”, though that could be improved on the tooling side to be better at explaining why. But this requires historically accurate metadata. It might be possible for projects that never support a new version, like numpy and numba if they had started like this, but it can’t be fixed after the fact, which makes it pretty useless. If you yank every version before the first one with accurate upper bounds, I think this mostly works today. Again, though, this is only changing the error message and maybe elevating the failure to compile time vs. runtime. Though simply placing a check in your build system for this does pretty much the same thing in most cases. In scikit-build-core, you can do:
In setuptools, you can do a version check in setup.py and fail there. Etc.
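A minimal sketch of the kind of setup.py version check mentioned above. The cutoff of 3.14 and the names `FIRST_UNSUPPORTED` and `check_python` are made up for illustration; a real project would choose its own limit and wording.

```python
import sys

# Hypothetical limit for illustration: the first Python version known NOT to work.
FIRST_UNSUPPORTED = (3, 14)


def check_python(version_info=None, first_unsupported=FIRST_UNSUPPORTED):
    """Raise SystemExit with a readable message if the interpreter is too new."""
    if version_info is None:
        version_info = sys.version_info
    current = tuple(version_info[:2])
    if current >= first_unsupported:
        raise SystemExit(
            "This release does not support Python {}.{}; it is known to fail "
            "on {}.{} and later.".format(
                current[0], current[1], first_unsupported[0], first_unsupported[1]
            )
        )
    return current


# In a real setup.py, check_python() would be called before setup(...).
```

Because the check runs at build time, it only changes the error a user sees; it does not take part in the solve at all.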
Now I should mention that no one asking for this wants universal solvers to backtrack on this, which is what they would do. You’d have to specify a special meaning here; it’s not the theoretically correct answer to the solve. I believe this means you’d have to throw out a lot of the existing solver theory and implement custom algorithms to handle this “monotonic” as a special case, or maybe you could just manipulate the metadata based on the most recent one to apply this upper bound across all previous versions. And inevitably you’d have some maintainer complaining here that they want the old behavior because they did it “right” and they actually want the back solve for some reason.
For what it’s worth, I think the back solve is problematic for package resolution too. If I get reports that numpy 2 is not supported right after it is released, then make a quick patch release requiring numpy<2 to fix the problem while I work on a solution, resolvers may back solve to the previous version of my library that didn’t have the limit if someone else requires numpy>=2, instead of reporting an error.
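To make the numpy back solve concrete, here is a toy sketch. The library name `mylib`, its two releases, and the "newest compatible release wins" rule are all simplifying assumptions; real resolvers are far more involved.

```python
# Hypothetical release history for a library called "mylib" (illustrative only).
# Each entry: (mylib version, exclusive cap on numpy's major version, or None).
MYLIB_RELEASES = [
    ((1, 0, 0), None),  # original release, no cap on numpy
    ((1, 0, 1), 2),     # quick patch release declaring numpy<2
]


def pick_mylib(numpy_major):
    """Return the newest mylib release whose numpy cap admits numpy_major."""
    newest_first = sorted(MYLIB_RELEASES, key=lambda r: r[0], reverse=True)
    for version, numpy_cap in newest_first:
        if numpy_cap is None or numpy_major < numpy_cap:
            return version
    return None


# With numpy 1.x the patch release is chosen, as intended.
# With numpy 2.x (because something else requires numpy>=2) the resolver
# silently falls back to 1.0.0, the release the cap was meant to protect against.
```

Under these assumptions `pick_mylib(2)` returns the uncapped 1.0.0 rather than an error, which is the behaviour described above.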
I think we’d have to clearly have a recommended standard behavior, and it would have to include considerations for universal solvers, which we currently don’t have. And any new field would have to be aware of the two maintainer expectations: guaranteed compatibility and “I’ve tested with this” compatibility (which also includes “I’ve set upper bounds on everything but I don’t actually know what will really fail”). Things like locking solvers (and using your package with lots of others) break if you constrain too tightly. But things like applications tend to work pretty well with upper limits, because they are more isolated. Some sort of new “known working versions” field might help with this. I think un-editable metadata like requires-python and dependencies should only record known failures, not “hasn’t been tested on this yet” limits.
Also, the focus of this thread was to try to make something usable out of requires-python upper limits. Currently they can’t be used: packages that set them partway through their lifespan inevitably have to roll them back (see numba, numpy). From my experience, uv has been much better than Poetry at resolving when there’s an upper limit involved.
Currently upper bounds don’t work, packages that implement them have to roll them back (at least if they have enough users to notice!), and different tools treat them differently. So I think some of the options to “fix” it would be possible to use.
For example, let’s say that the recommendation was that solvers ignore the upper bound, and that installers error if an upper bound fails. Then packages like numpy could start using the upper bounds, and older versions, whether or not they declared upper bounds, would be fine, since this is not allowed to affect the solve.
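A rough sketch of that split, assuming made-up release data and a toy "newest release wins" solver: the solve looks only at lower bounds, and the upper bound is checked afterwards purely to produce an error.

```python
# Hypothetical releases: (version, lower bound inclusive, upper bound exclusive or None).
RELEASES = [
    ((1, 0), (3, 8), None),
    ((2, 0), (3, 9), (3, 14)),
]


def solve(python):
    """Pick the newest release using only the lower bound (upper bound ignored)."""
    newest_first = sorted(RELEASES, key=lambda r: r[0], reverse=True)
    for version, lower, upper in newest_first:
        if python >= lower:
            return version, upper
    return None


def install(python):
    """Install step: error out on a failed upper bound instead of back solving."""
    picked = solve(python)
    if picked is None:
        raise LookupError("no release supports Python {}.{}".format(*python))
    version, upper = picked
    if upper is not None and python >= upper:
        raise RuntimeError(
            "Release {}.{} declares Python <{}.{}; refusing to install on "
            "Python {}.{}".format(*(version + upper + python))
        )
    return version
```

On Python 3.14 this raises an error naming release 2.0 instead of quietly falling back to the uncapped 1.0.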
Or a newer version’s upper bound could apply to all older versions. This is a really bad case for solvers, because introducing a new version, even if you don’t use it, can completely change the solve. But it too could be done, since the previous versions’ upper caps no longer matter once there is an upper cap.
Not fully up to speed on the entirety of this thread, but if you’re referring to @mikeshardmind’s suggestion from above, I’d imagine the prior upper cap for older packages would still be respected as long as it’s lower than the new upper cap.
So say we have foo
In 1.0.0 it has an unspecified upper bound
In 2.0.0 it has an upper bound <3.9
In 3.0.0 it has a poor upper bound of <4
In 4.0.0 it has an upper bound of <3.11
The logic would be that, if a newer package has a lower upper bound than an older package, then that older package implicitly also has that same new upper bound. However, if an older package has an upper bound that is lower than the newer package’s upper bound, it retains its old upper bound.
So, following that logic, the implicit upper bounds for the different versions of foo would be:
In 1.0.0: <3.9
In 2.0.0: <3.9
In 3.0.0: <3.11
In 4.0.0: <3.11
In other words, it shouldn’t completely change the solves if the packager is doing something reasonable when specifying incremental changes in upper bounds. That said, the solver has more work to do now in managing implicit states for each package/version.
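The rule above amounts to a running minimum taken from the newest release backwards. A small sketch using the foo example; the tuple encoding of versions and the name `implicit_upper_bounds` are just illustrative choices.

```python
import math

# The foo example from above; None means no declared upper bound.
DECLARED = [
    ("1.0.0", None),
    ("2.0.0", (3, 9)),
    ("3.0.0", (4, 0)),
    ("4.0.0", (3, 11)),
]

UNBOUNDED = (math.inf,)


def implicit_upper_bounds(declared):
    """Map each release to its effective exclusive upper bound.

    `declared` is ordered oldest to newest. Each release gets the tightest
    bound among its own and those of every newer release.
    """
    effective = {}
    tightest_so_far = UNBOUNDED
    for version, bound in reversed(declared):
        own = bound if bound is not None else UNBOUNDED
        tightest_so_far = min(own, tightest_so_far)
        effective[version] = tightest_so_far
    return effective


bounds = implicit_upper_bounds(DECLARED)
# 1.0.0 and 2.0.0 end up at <3.9; 3.0.0 and 4.0.0 end up at <3.11.
```

Note that publishing 4.0.0 changes the effective bound of 3.0.0, which is the extra implicit state a solver would have to track.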
Not sure how that actually fits with the remainder of this issue, but just thought to clarify as I had the same thoughts as @mikeshardmind for that particular point.
Thanks for the detailed explanation, although I’ll be honest, I’m still not 100% sure I understand the problem.
From what you’re saying, it sounds more like your issue is with building from sdist, as that won’t be possible for most users. With --only-binary, you could control what users can use just by using wheel tags. Is that correct?
IMO, we need better control over when sdists can be used - not just --only-binary, but something that the project can set. Because pure Python code that’s only published as a sdist is usable anywhere, and blocking it with the same flag that says that the user can’t compile C code is a problem. That’s a separate discussion, though.
Similarly, the fact that installers backtrack to ancient versions of a project just to find one that doesn’t have a Python version limit isn’t a problem with version limits, it’s a problem with unconstrained backtracking. Maybe what we need is some sort of project-level metadata[1] which allows a project to specify what project versions installers are allowed to backtrack to? That puts control back into the hands of the project.
I’m not sure there’s a consistent view of what a “universal solver” is across all tools[2]. And there’s been very little input into the standards process (outside of the lockfile PEP, where their input was explicitly requested) from the developers of universal solvers, so it’s hard to have a good view of what they need.
I don’t think that’s necessarily true - it’s a UI decision to pick the Python versions to solve for like that, surely?
Yes, if there’s a library somewhere in the dependency tree that requires Python < 4, then it’s impossible to find a solve for Python 4.1. But that’s just what version specifiers mean - you can’t magically produce a solve when a component doesn’t work (or at least claims not to work) for a particular Python version. It’s just as true that a constraint of Python > 2 stops the resolver finding a solve for Python 2.
I have no idea what you mean by this. The input metadata constrains what versions can be solved for. The versions the solve is calculated for are presumably whatever the user asks for. Yes, if the user asks for too broad a range, there’s a UI question of how the solver tells the user what’s wrong (“Solve failed” is a much worse error than “Solve cannot include Python 3.6, constraint python_version>=3.7 was added automatically”, but the former is far easier to implement). But it’s all UI, rather than being about interoperability standards.
I agree that the way universal solvers work means that upper version limits like < 4 are “viral”, in the sense that once they have been encountered, they propagate through the solve all the way to the final result. But again, that’s just bad data having unwanted effects - ignoring the bad data, or wishing it away, doesn’t actually fix anything.
What’s “pycompile” here? If I say I want to build an environment (or lockfile) for Python < 3.14, then yes, I absolutely do want it to affect the solve. I want the latest version that works with the Python versions I’m trying to support. Now “< 3.14” is too broad, because essentially nothing supports Python 2.6 (or Python 1.4!) these days, so there is no universal solve for that request.
Maybe the point here is that when requesting a universal solve, users shouldn’t be using specifiers, because by default they are too broad. Maybe a universal solve should take as input a Python version range, or an explicit list of versions. Something like "3.10 - 3.13", or "3.9, 3.11, 3.12". This goes back to your point “they are two different values” - but I think the problem comes from using specifiers to state the target versions for a solve, not from how specifiers are used as constraints in metadata. That’s a UI question, but when we’re being asked to update standards for something which can be fixed with a UI change, I think it’s fair to push back.
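As a sketch of what taking an explicit list of target versions could look like, the following checks each requested version against the collected constraints and reports which packages rule it out, rather than just saying the solve failed. The package names and bounds are invented for the example.

```python
# Hypothetical metadata gathered from a dependency tree:
# (package, lower bound inclusive, upper bound exclusive or None).
CONSTRAINTS = [
    ("foo", (3, 9), None),
    ("bar", (3, 10), (3, 14)),
]


def split_targets(targets, constraints=CONSTRAINTS):
    """Return (solvable targets, {excluded target: packages that exclude it})."""
    solvable = []
    excluded = {}
    for target in targets:
        blockers = [
            name
            for name, lower, upper in constraints
            if target < lower or (upper is not None and target >= upper)
        ]
        if blockers:
            excluded[target] = blockers
        else:
            solvable.append(target)
    return solvable, excluded


# Targets given as an explicit list, in the spirit of "3.9, 3.11, 3.12".
ok, dropped = split_targets([(3, 9), (3, 11), (3, 12)])
```

Here 3.9 is dropped with "bar" named as the reason, while 3.11 and 3.12 remain solvable, so the tool can explain the narrowing instead of failing outright.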
Correct. And my “supported version metadata” suggestion above was intended to do basically this without needing to manually yank every release.
You say it’s “only” changing the error message. Why “only”? I thought that was all we wanted anyway - what else could it do, if not just give the user a better error message, rather than an obscure compile failure?
This is probably the thing I find most difficult to understand about this discussion. If that works, then (a) why aren’t people doing that, and (b) why are we having this discussion at all?
I don’t know if you failed to quote some context here, but what are you referring to with “monotonic as a special case”? And I’m not aware of what “solver theory” you’re thinking of. The way pip solves for an install isn’t that complex or sensitive (the mechanisms behind what it does are complex, but the basic idea is easy enough). Maybe you’re still talking about universal solves here, but no-one has really explained yet what the problem is for them, and what I have heard makes it seem more like a UI issue than a fundamental algorithmic difficulty.
I also don’t want to over-focus on universal solvers here. The overwhelming majority of package installs are installing into a single, well-defined target environment, or at best creating an environment-specific lockfile that will be used to install into a precise environment configuration. While we have to consider the needs of universal lockfiles, we should optimise the UX for the single environment case, as that’s what matters most here.
At this point, I think the discussion is getting too abstract. Can I suggest that someone (maybe you, maybe someone else) propose a concrete change, and explain why it’s needed[3] and how it will help? Not quite a PEP (I don’t think we have enough clarity yet for a PEP) but at least a clear proposal. Otherwise the discussion will continue to get nowhere.
My experience with universal solvers is limited, coming mostly from what was brought up in the lockfile discussions ↩︎
In particular, can we have some worked examples of failures - not just “numba had problems”, but exactly what problems, and why those problems were caused by correct use of the existing features. And not problems that result from pip not using --only-binary, or from inaccurate metadata in older packages - we know about those issues, and they aren’t specific to version limits. ↩︎
Possibly just because there is a more obvious and attractive looking[1] alternative in requires-python? I would be interested in hearing from anyone who can’t do as Henry suggests.
(TBH, I’m not actually sure where I stand on this topic. I think overall I agree with the proposal even though I disagree with a lot of the claims it makes, saying that upper limits don’t achieve their intention or are to be blamed for backtracking or environment solvers picking unexpected versions due to some mismatch in arbitrary < '4.0'/< '3.next' guesses about future compatibility[2].)
This is already the case with other metadata and future uploads. Importantly, in this case, this only removes compatibility that was already missing, so the change to solves is likely a fix (hey, we realized python 3.16 (hypothetical version nobody should actually know breaks them right now) breaks us, older versions are more than likely going to be broken too).
If someone needs a solve not to change, they should be producing a lock file for consistent use afterward. Other things can change solves than this, including yanking, deletions, or new wheels rather than just sdist, with different requirements (because of static linking no longer requiring something, may result in a different solve).
I can put the idea I expressed before about it being a temporal concern into actual specification language, and cover implications for both universal solvers and normal “I’m just installing into one well-defined environment” this weekend if you think this is something we can actually move towards solving if there’s a specific plan on the table to be discussed. I wasn’t flippantly suggesting it, but I also am not sure there’s enough alignment in what different people expect here based on other comments. It seems that people expect things that cannot ever be true right now, like deterministic solving without locking, when solver algorithms are not specified.
Treating upper bounds on python requires as implicitly applying to past versions to treat it as a temporal concern should reduce the work solvers have to do. This being required to be strictly monotonically increasing (as I originally put forward) isn’t necessary, but there’s more work if it isn’t something solvers are at least allowed to choose to assume, and it only applies in the absence of an upper cap on older versions (which is what @seakros went into, and which could improve some, but not all, behaviors for universal solvers).
I think right now if I were to put this into specification language, I would say that solvers are allowed to use their choice of those two behaviors, and that for consistency, package authors should avoid when possible intentionally supporting a lower cap in new versions of their library (non-monotonically increasing upper bound on Requires-Python).
I don’t have a strong opinion on your specific proposal (beyond appreciating the fact that it is a concrete proposal) but like you say, I don’t know how much chance there is of getting consensus. People’s expectations and desired outcomes seem a bit too vague at the moment to judge what will be supported.
I should probably split my request into two parts - the first is for one or more concrete proposals like this, and the second is for someone to give some clear examples of actual problem cases that aren’t caused either by the fact that --only-binary isn’t the default, or by the fact that old (and incorrect) metadata is immutable. To be clear, I think your proposal is actually targeting cases caused by old metadata being immutable - so while it’s still interesting, it feels to me like a tactical fix for one aspect of the “immutable metadata” issue, rather than a long term improvement to the ecosystem’s handling of version limits.
I think some way to amend metadata (append only, further restriction on solves only?) would remove most of the problems that currently exist here, and yes, my proposal is essentially a workaround for this not existing already. Still, I consider it somewhat superior to requiring people to go back and amend the metadata of 1000s of prior releases whenever a Python version has a breaking change they are affected by, provided we can agree that treating this temporally makes sense.
Package level ranges of some sort would be less work for authors while allowing it to be expressed by authors directly rather than using implicit assumptions like this, but even those might not be easy to express without actual per-version amendments. As an example of the case not covered by the assumption, I wonder how many packages actually have something like:
version 27: supports python 3.8-3.13 (inclusive)
version 28: supports python 3.8-3.12 (inclusive)
where for some reason version 28 is the one broken by something in 3.13, while 27 was fine.
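One way to find out how common that case is would be to scan a project's release history for places where the supported ceiling drops from one release to the next. A small sketch, with the "last supported Python" values supplied by hand for illustration:

```python
def non_monotonic_caps(releases):
    """Return (older, newer) pairs where the newer release supports less.

    `releases` is a list of (version label, last supported Python as a tuple),
    ordered oldest to newest.
    """
    drops = []
    for (old_label, old_cap), (new_label, new_cap) in zip(releases, releases[1:]):
        if new_cap < old_cap:
            drops.append((old_label, new_label))
    return drops


# The example above: version 27 supports up to 3.13, version 28 only up to 3.12.
history = [("27", (3, 13)), ("28", (3, 12))]
found = non_monotonic_caps(history)
```

For this history the function reports the pair ("27", "28"), which is exactly the situation where applying the newer cap to older releases would wrongly exclude version 27 on 3.13.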
Yes, although it should be only about what happens by default. The option to pip install foo and build from sdist is definitely used but it is much better to have users who want that opt in rather than expecting everyone else to opt out. In other words --only-binary is the better default at least for some packages.
Yes, and actually it is not just a case of control but at least just storing the information for future use and repackaging. The fact that the release is known to require CPython >= 3.11 and has only been tested up to 3.14 is useful information that should be in the metadata separately from whether it is known that any particular later Python version is breaking. Even just if the build fails it would be useful output from pip to see “this version of foo was only tested with CPython 3.14 and so may not be compatible with FuturePython 3.25”. Imagine someone trying to build a current version of something but in 10 years time and how are they supposed to have any idea what is a reasonable version of things like Python to use if there are no version caps?
It is not just about Python versions though but also implementations, platforms e.g. “not compatible with PyPy or Windows on ARM”. Look at the oldest-supported-numpy linked by Antoine above. Those constraints are mostly just describing available wheels for different platforms but I can imagine that similar constraints would exist for whether building from sdist is compatible as well i.e. some changes were actually needed to make NumPy compatible with each different OS, architecture, implementation etc. There is perhaps some NumPy version that is the first to be compatible with Windows on ARM. Perhaps also in future there will be a NumPy version that is the last to be compatible with say Intel Macs and the support code for that platform will be dropped meaning that it won’t even build from sdist any more. Then actually pip backtracking to ancient versions and building an sdist might be the right thing to do for someone on an older Mac in say 10 years time.
For many projects the implicit packaging assumption of compatibility for untested combinations of Python versions, implementations, architectures and so on does not typically hold but there isn’t a good way to communicate what the compatible combinations are even just for informational purposes.
For the first part, Oscar already pointed out that --only-binary is a separate issue. For the funding part, since you brought that up here, let me clarify the status: we did iterate on a detailed plan and effort/cost estimate for implementation + rollout + communication & responding to issues. However, after getting that to a good place, the conversation changed to “build consensus within the packaging community, …., if we find that there’s simply no way we can make this change, then that will have a fundamental impact on the rest of the funded work”.
That’s a lot harder to find funding for, and even harder to staff - you can’t just go find a solid packaging engineer with good communication skills and an interest in the topic. Most likely it’d get stuck, since getting anywhere near consensus on this forum on something with a significant backwards compatibility impact is essentially impossible. So while I’d still like to see this happen, it’s honestly not very high on my list of packaging topics to try to get funding or engineering time for (that could change of course if some form of consensus or a better idea emerges).
I assume you mean here the backwards compatibility impact of pip making --only-binary the default for all packages. There is an alternative approach which is to make it so that projects could add some metadata that installers could use to see that for this particular project it is better not to try to build without an opt-in:
[project]
dont-build-me-by-default = true
Then the backwards compatibility question is handled by each project. You would only need to get consensus within NumPy to add the flag and could consider at a project level whether it is worth breaking anyone who currently uses pip install numpy to build from source. Projects that currently depend on pip et al building from sdist for other reasons could continue to do so with no compatibility break.
That depends how big the changes are to achieve an improvement over the status quo. For example, that metadata could participate in the solve by default in the future (to illustrate, not a concrete proposal).
Wouldn’t pip then try to fetch and build an older version that doesn’t have that flag?
The problem for both this and Requires-Python is that there is no project-level metadata, so projects cannot “go back in time” and set metadata that applies to older versions as well.
A new flag could be defined with some instructions for how installers should handle it and those could say that by default installers backtracking from the given version should assume that the flag applies to previous versions as well if unspecified.
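A sketch of how an installer might apply that inheritance rule while backtracking. Everything here is hypothetical: the flag is the made-up dont-build-me-by-default from above, and the data layout and function name are invented for illustration.

```python
def may_build_from_sdist(flags, index, user_opted_in=False):
    """Decide whether the release at `index` may be built from its sdist.

    `flags` lists the hypothetical dont-build-me-by-default value for each
    release, oldest first: True, False, or None if the release predates the
    flag. A release with no value inherits from the nearest newer release
    that has one.
    """
    if user_opted_in:
        return True
    for flag in flags[index:]:
        if flag is not None:
            return not flag
    # No release ever set the flag: keep today's behaviour and allow the build.
    return True


# Two old releases without the flag, then a new release that sets it.
flags = [None, None, True]
```

With these flags, backtracking to the oldest release still refuses to build by default, which addresses the concern that pip would simply fetch and build a version that lacks the flag.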
I think classifiers are too imprecise. Compare the oldest-supported-numpy constraints with this:
More precise classifiers could be defined but I don’t think it is a good idea to try to retrofit a precise specification onto something that has historically not been precise.