Right. A package author could add some metadata that by default prevents building from sdist.
I think most of the times when the idea has come up, the concept has been to add a new field in the wheel metadata that points to a verifiable source archive download location, without having to post an sdist to the index server.
For example:
Source-Archive: the primary download URL for a source archive that can be used to rebuild this artifact. The URL uses the same syntax as the URL portion of a direct URL reference in a version specifier. Unlike direct URL references (where hashes are optional), Source-Archive references MUST include at least one hash value for verification purposes.
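To make the hash requirement concrete, here is a minimal sketch of how a redistributor's tool might check such a reference. It assumes the hash travels as a `#<algorithm>=<digest>` URL fragment, as in direct URL references. The function names, the allowed algorithm set, and the example URL are all hypothetical, since no such field exists in wheel metadata today.

```python
import hashlib
from urllib.parse import urlsplit

# Assumption: only a few strong, fixed-length algorithms would be accepted.
ALLOWED_ALGORITHMS = {"sha256", "sha384", "sha512"}


def parse_source_archive(value):
    """Split a hypothetical Source-Archive value into (url, algorithm, digest)."""
    parts = urlsplit(value)
    algorithm, sep, digest = parts.fragment.partition("=")
    if not sep or not digest:
        raise ValueError("Source-Archive needs a '#<algorithm>=<digest>' fragment")
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
    # Drop the fragment so the remaining URL can be fetched as-is.
    return parts._replace(fragment="").geturl(), algorithm, digest.lower()


def verify_archive(data, algorithm, expected):
    """Return True when the downloaded bytes match the declared digest."""
    return hashlib.new(algorithm, data).hexdigest() == expected
```

A tool would download the URL returned by `parse_source_archive` and refuse to use the archive unless `verify_archive` returns True.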
That way redistributors would still have a way to automatically retrieve the corresponding sources for a PyPI project, but installers wouldn't fall back to trying to build from source. Projects that are pure Python or just need a C compiler can keep publishing sdists, while projects with more complicated build dependencies could start setting the new field in their wheel artifacts.
If you were to build an actual sdist for such a project then it would also have that metadata field set, but presumably most projects that set the field wouldn't be publishing sdists to PyPI.
Back in my Debuntu developer days, there were advocates for doing away with sdists as the source of truth [1] for the code of a package, arguing that with the pervasiveness of git hosting services, you really just needed a URL+commitSHA.
I wasn't a fan of that idea then, and I'm still leery of that approach, because it can be difficult to determine how to turn a repository commit into an sdist. You can't just, e.g., look to the GitHub releases page, download the tarball you find there, and assume that its contents would be what you'd find in that package's sdist. And it might not be trivial to figure out what to run to turn that tarball into an sdist, if you even could do it.
There's also the philosophical question as to whether PyPI should be the canonical repository for sdists.
Maybe the Source-Archive idea isn't mutually exclusive with non-buildable sdists, and maybe for the (presumably minority of) cases where this situation arises it might not even matter.
[1] pun intended!
I was about to ask "Why provide an sdist if you don't intend for it to be built automatically?", but then realised you had already answered that: an sdist indicates that the default Python ecosystem build processes will work, while a mere source archive may be built with anything (e.g. maybe it's actually a C/C++ project that happens to publish Python API bindings).
Still useful, but not mutually exclusive with discouraging fallbacks to sdist builds for a project on an index server.
The latter still doesn't feel like release metadata to me, though; it feels like a per-project index server setting with a few potential tiers:
- automatic: default state, sdist build is an automatic fallback if no wheel is available
- discouraged: sdist link is published normally, but with a new HTML attribute indicating automated fallbacks to source builds are not recommended. Clients aware of the new flag would allow opting out of discouraged source builds (or require opting in to them)
- manual: sdist link is hidden from the simple repository API even if an sdist is uploaded (there should either be a new index page defined that still shows everything, including hidden sdists, or else the hidden sdists should be present, just using a link format that installers won't recognise as a valid sdist link by default)
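The three tiers can be sketched as an installer-side decision. This is only an illustration under stated assumptions: the enum, the function name, and the `user_opted_in` flag are hypothetical, and the sketch assumes the installer already knows whether a compatible wheel exists.

```python
from enum import Enum


class SdistPolicy(Enum):
    """Hypothetical per-project index setting mirroring the tiers above."""

    AUTOMATIC = "automatic"
    DISCOURAGED = "discouraged"
    MANUAL = "manual"


def may_build_from_sdist(policy, has_compatible_wheel, user_opted_in=False):
    """Decide whether an installer should fall back to building the sdist."""
    if has_compatible_wheel:
        return False  # no fallback is needed when a wheel can be installed
    if policy is SdistPolicy.AUTOMATIC:
        return True  # default state: build automatically
    if policy is SdistPolicy.DISCOURAGED:
        return user_opted_in  # assumes clients require an explicit opt-in
    return False  # MANUAL: the installer never sees a usable sdist link
```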
Preventing unwanted attempts to build sdists is relevant
VCS are not as reliable and stable as sdist tarballs.
- GitHub tarballs are incompatible with dynamic version providers such as setuptools-scm. They need either a git clone or an sdist with a PKG-INFO file. Even worse, setuptools-scm needs enough of the git history to construct a version from the last tag, so you often end up with a full clone instead of a shallow clone. setuptools-scm even warns you:
Make sure you're either building from a fully intact git repository or PyPI tarballs. Most other sources (such as GitHub's tarballs, a git checkout without the .git folder) don't contain the necessary metadata and will not work.
- Git commits and tags are not immutable. Tags can be moved to a different commit and commits can be detached with a force push. GitHub aggressively purges dangling commits. There have been cases where an OSS project has purposefully destroyed its entire git history to go closed source, or removed the entire project completely. For PyPI, there are ready-to-use solutions for mirroring.
- GitHub is not a permanent, immutable hosting provider. User accounts can be removed or deactivated for various reasons. Repositories are removed or their history rewritten. GitHub releases can go away. GitHub can be forced to remove a repo for legal reasons, or may decide to remove a repository because it contains malware.
I'm worried about projects no longer pushing sdists to PyPI. Source dists are useful for all sorts of purposes like rebuilding with different settings, patching bugs, debugging problems with platlib extensions, or simply reading the code. Don't get me started on how problematic binary-only artifacts are for security.
Therefore I'm a big +1 for this effort. Once PyPI and pip have a way to signal opt-in of sdist build, then PyPI, build, twine, and other tools should nudge projects to upload sdists again.
Surely if the sdist fails to build, that's a pretty good signal?
Do we need to publish a build backend that says "this project should not be built directly, visit its documentation for instructions" and fails, so that projects can "signal" this by referencing it from their pyproject.toml?
(And if showing that on PyPI is important, add a classifier.)
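For illustration, such a backend could be very small. The sketch below implements the two mandatory PEP 517 hooks and makes both fail with the same message. The module contents, the exception name, and the documentation URL are hypothetical placeholders, not an existing package.

```python
"""Hypothetical PEP 517 backend that refuses to build."""

HELP_URL = "https://example.org/docs/install.html"  # placeholder, not a real page

MESSAGE = (
    "This project should not be built directly.\n"
    f"Visit its documentation for instructions: {HELP_URL}"
)


class DirectBuildRefused(RuntimeError):
    """Raised by every hook, carrying MESSAGE."""


def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    # Mandatory PEP 517 hook; a working backend would return the wheel's filename.
    raise DirectBuildRefused(MESSAGE)


def build_sdist(sdist_directory, config_settings=None):
    # Mandatory PEP 517 hook; a working backend would return the sdist's filename.
    raise DirectBuildRefused(MESSAGE)
```

A project would presumably point at a module like this through the build-backend key of the [build-system] table in pyproject.toml. How a given frontend presents the resulting exception is up to that frontend.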
In practical terms, no. Pip[1] has very generic reporting of build errors (because we have no idea what output the build backend might have produced) and a significant number of users typically report build errors to pip, ignoring the "this is not a problem with pip" message that we add.
Having some sort of metadata would allow installers to report something explicit like "This project does not support automatic building from source".
[1] and probably uv, although I haven't checked
Right, but I'm proposing the entire build output be a message saying where to get help, rather than an obscure "include not found" message in the middle of 10,000 lines of cc commands.
You'll never entirely get rid of those reports. Best you can do is detect and automatically respond to the "this is not a problem with pip" message if(!) they copy-paste the logs in.
Clever, but doesn't this cause problems for the devs? I.e. wouldn't they need a way for the pyproject.toml that they use locally (or which a CI system uses) to differ from the one that ends up in the sdist? Otherwise, wouldn't their own build attempt to use the intentionally-failing backend?
Maybe Pip should suppress the backend output by default, and in its own advice suggest re-running to get debug info that should go to the package maintainers?
Exactly. Saying "this is not a problem with pip" does not tell the user what they can do, and the rest of the output, with traceback inside traceback and mountains of build output, is tedious to sift through even for someone who understands how it all works. Here's a simple example:
$ pip install python-flint==0.2.0
Collecting python-flint==0.2.0
Downloading python-flint-0.2.0.tar.gz (107 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 107.3/107.3 kB 1.2 MB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-qhgq7e5y/python-flint_0ac5a8dffa404eb4a3cda8e947cfa68c/setup.py", line 7, in <module>
from numpy.distutils.system_info import default_include_dirs, default_lib_dirs
ModuleNotFoundError: No module named 'numpy.distutils'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
[notice] A new release of pip is available: 24.0 -> 24.1
[notice] To update, run: python3.12 -m pip install --upgrade pip
No relevant information is provided in that output for the user to know what they should do.
Projects should be able to opt-out of pip building sdists by default and at the same time provide some information so that pip can give a useful error message. The error message could be like:
$ pip install python-flint ...
python-flint 0.6.0 does not provide binary wheels for your platform. Wheels are provided for
- cp312-win_amd64
- cp311-win_amd64
- cp312-manylinux_x86_64
- …
Your platform is cp312t-linux_x86_64.
NOTE: You are attempting to install python-flint into a free-threading build of CPython but python-flint 0.6.0 does not provide any binaries for a free-threading build. It is possible that python-flint 0.6.0 is not compatible with free-threading or it might just be that python-flint does not yet provide binaries for this platform.
The source distribution for python-flint 0.6.0 indicates that pip should not attempt to build from source by default because external requirements are needed for the build:
Toolchains needed:
- C (std=C11) compiler toolchain
External libraries needed:
- GMP (C library)
- MPFR (C library)
- FLINT >= 3.0.0 (C library)
If you believe you have the external requirements then you can ask pip to attempt to build from source by running
$ pip install --no-binary python-flint python-flint
For more information about installing python-flint see:
https://python-flint.example.org/docs/install.html
ERROR: Unable to install python-flint.
The output should end with a link to information provided by the project being installed. The project documentation can explain why wheels are provided for some platforms and not others and what someone would need to do to be able to build from source if they want to.
It should also be possible for pip to provide some helpful information automatically like "python-flint has no wheels for CPython 3.13" or "python-flint has no wheels for PyPy" etc.
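As a rough sketch of that automatic hint, the wheel filenames listed for a release already carry the interpreter tags. The code below parses them by hand using the standard wheel filename layout. Both function names are hypothetical, and a real installer would more likely rely on the packaging library than on string splitting.

```python
def wheel_tags(filename):
    """Return (python, abi, platform) tags from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # the build tag is optional
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    python, abi, platform = parts[-3:]
    return python, abi, platform


def explain_missing_wheel(project, filenames, python_tag):
    """Return a one-line explanation, or None if some wheel targets python_tag."""
    available = set()
    for name in filenames:
        python, _abi, _platform = wheel_tags(name)
        available.update(python.split("."))  # compressed tags, e.g. "py2.py3"
    if python_tag in available:
        return None
    listing = ", ".join(sorted(available))
    return f"{project} has no wheels for {python_tag}; wheels exist for: {listing}"
```

This only compares the interpreter tag. Matching ABI and platform tags properly takes more care than this sketch shows.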
This could all be achieved today, with one small tweak:
If you believe you have the external requirements then you can ask pip to attempt to build from source by running
$ export PYTHON_FLINT_BUILD_SDIST=1
$ pip install --no-binary python-flint python-flint
Then whatever build backend/script is being used just has to look for the environment variable to decide whether to print this nice message or not.
Users will not mind this at all (other than the non-portable export syntax). So while it would be nice if the entire ecosystem (pip, PyPI, and build backends) added support for it, it's entirely possible to just do it.
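A minimal sketch of that check, assuming the build script wraps whatever hook does the real work. The variable name comes from the example above; the help text and the helper names are hypothetical.

```python
import os

ENV_VAR = "PYTHON_FLINT_BUILD_SDIST"  # variable name from the example above

HELP_TEXT = (
    "python-flint needs external libraries to build from source.\n"
    f"Set {ENV_VAR}=1 to attempt the build anyway."
)


def build_requested(environ=None):
    """Return True when the user has explicitly opted in to a source build."""
    environ = os.environ if environ is None else environ
    return environ.get(ENV_VAR) == "1"


def guarded(hook):
    """Wrap a build hook so it exits with HELP_TEXT unless the opt-in is set."""

    def wrapper(*args, **kwargs):
        if not build_requested():
            raise SystemExit(HELP_TEXT)
        return hook(*args, **kwargs)

    return wrapper
```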
Then whatever build backend/script is being used just has to look for the environment variable to decide whether to print this nice message or not.
It should not be necessary to set an environment variable when building normally e.g. for local development or for downstream distro packagers etc. Anyone who manually downloads the sdist and asks pip to install from it should get the build-from-source automatically without needing to set an environment variable. Likewise other build frontends like python -m build should work without setting an environment variable.
The problem is the fact that the common end user invocation pip install python-flint should not default to attempting to build nontrivial projects. The error message from pip is all about build failure when it should really be about explaining to the user that there is no wheel for their platform.
I can see that, but I can also see that a pyproject.toml is a signal that an sdist is meant to be built. Then the issues are not failing quickly enough, and not failing clearly enough, both of which again can be handled by the build backend checking for what it needs before it tries to build.
I still think the best approach here is going to be to get the semantics of pip's default right (my best idea: "only consider an sdist when there are no wheels at all for the selected release"). That doesn't require any direct cooperation between pip and the original package, and can be implemented unilaterally by pip at any time (when someone has the time available).
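Read literally, that rule could be sketched as follows. This is not pip's actual candidate selection: the function name is hypothetical, and the compatibility check is passed in as a callable so the sketch stays self-contained.

```python
def candidates_for_release(files, is_compatible):
    """Pick installable files for one release under the proposed default."""
    wheels = [name for name in files if name.endswith(".whl")]
    if wheels:
        # Wheels exist for this release: never fall back to the sdist,
        # even if none of them match the current platform.
        return [name for name in wheels if is_compatible(name)]
    # No wheels at all: the sdist is the only way to install this release.
    return [name for name in files if name.endswith((".tar.gz", ".zip"))]
```

An empty result for a release that does have wheels is the case where the installer could report "no wheel for your platform" instead of attempting a build.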
It can also be ignored by build, and if you want different behaviours between two spec-compliant tools, then you don't want to be updating the spec.
I can see that, but I can also see that a pyproject.toml is a signal that an sdist is meant to be built.
The sdist is always meant to be built. That is its primary purpose!
Your suggestion for the environment variable amounts to releasing source code that deliberately fails to build by default which is really a very odd thing to do.
The problem here is that pip is primarily intended to be an installer rather than a builder and the vast majority of users use it for installing rather than building. It falls back on building when attempting an install but does not have the capability to satisfy the external build and runtime requirements that are needed for many of the projects that users commonly want to install. This fallback behaviour is not appropriate for an installer when the build has a high chance of failing.
It can also be ignored by build, and if you want different behaviours between two spec-compliant tools, then you don't want to be updating the spec.
I'm not sure that any spec actually needs to be updated for this although PEP 725 would be useful for error messages and could potentially provide a basis for pip to refuse building by default.
Can the "pip don't build me" flag not just go in trove classifiers or something?
I still think the best approach here is going to be to get the semantics of pip's default right (my best idea: "only consider an sdist when there are no wheels at all for the selected release").
I agree. I presume that the reason this hasn't happened yet is because it is not a backwards compatible change for pip and so this would mean pip taking responsibility for much downstream breakage.
An opt-in per-project flag limits the scope of breakage and allows the project being installed to take responsibility. The project is in a better position to judge whether pip's current attempt to build from sdist is most likely not a good idea. Many projects already have judged this and do not supply sdists because that is currently the only way for them to prevent pip trying to build which is not good.
I presume that the reason this hasn't happened yet is because it is not a backwards compatible change for pip and so this would mean pip taking responsibility for much downstream breakage.
Precisely this, yes. If you want to follow pip's progress on this, you should look at Speculative: --only-binary by default? · Issue #9140 · pypa/pip · GitHub (which includes discussion of per-project opt-in and many other ideas).
But there's also the fact that we're now in a world where "the installer behaviour" doesn't just mean "pip". I'm more than happy to improve pip's UI, and that's something that should be discussed on the pip tracker, not here. But if people want uniform behaviour, or an approach that package maintainers can rely on, then we need a standard that all installers (pip and uv at least, currently) are expected to follow.
I think it's probably important to separate out:
1. Discussion of extra metadata to allow projects to signal "don't auto-build wheels from the sdist when asked to install". That's the core topic of this thread.
2. A possible common UI that we'd like "all installers" to have. That's a nice discussion to have, but ultimately it's not likely to go anywhere in the short term, as we don't really have an enforceable concept of "tool UI standards" right now.
3. Discussions on actual tool UI design, which should be happening on the pip and uv trackers. In practice, these will probably be more effective than (2), as uv tends to follow pip's UI[1].
[1] but not always the other way round
I can also see that a pyproject.toml is a signal that an sdist is meant to be built.
I don't really see it that way, no (assuming you mean "built automatically by Pip because wheel installation wasn't possible"). A big part of the point is to minimize reliance on setup.py to the things where it's actually necessary. Maintainers who want to avoid users building locally should still be able to benefit from giving project metadata in TOML format instead of in keyword arguments to a setup call, and having a clean way to document what their build backend is (even if they locally just invoke it explicitly instead of expecting the standard tooling to look it up). Explicit is better than implicit.
While this sounds perfectly reasonable to me as a forever "shell" cli user, do recall that (some) people provision their environment through helper tools inside their IDE where they may just be clicking on things to initiate an install. Of course PyCharm and VS Code will find a way to support whatever direction things go - eventually - but it's just one more bit of burden.
A big part of the point is to minimize reliance on setup.py to the things where it's actually necessary.
It was to remove the assumption that setup.py would be the build script, such that pip (essentially) could support backends other than setuptools. It was entirely motivated by frontends (a concept invented by PEP 517), not just to make it easier for maintainers to avoid setup.py.
The critical point is that you can build your own project without a pyproject.toml.[1] It's not inherently part of the development process - it's inherently part of the install process (including "I'm building a wheel now because I plan to install it later").
do recall that (some) people provision their environment through helper tools inside their IDE where they may just be clicking on things to initiate an install.
I recall. I maintained one of these for years, including developing the actual functionality you're referring to, and handling all the issues raised by users that were fundamentally unrelated to the IDE but belonged to the underlying package (and as bad as open source users can be, enterprises who genuinely do have a multi-million support contract with your employer can also be less than fun to deal with).
[1] Provided your build backend doesn't keep its own data in there, which is in no way a requirement for them, and was not the original intent.
Right, but I'm proposing the entire build output be a message saying where to get help, rather than an obscure "include not found" message in the middle of 10,000 lines of cc commands.
A proposal (i.e., a PR) to change pip so that a failed sdist build simply said "Unable to build a wheel for xxx - use -v to see the build output" would certainly be possible. Whether it would be accepted, I don't know. Too many people use pip as a build tool for me to imagine that this wouldn't be a huge compatibility break.
But if you (or anyone else) think it would be a useful improvement, go for it. Output/error reporting is something we know pip can do better on.
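To give a feel for the behaviour being floated, here is a small sketch of a frontend that captures the build subprocess output and only shows it on request. It is not how pip is implemented; `run_build` and its parameters are hypothetical.

```python
import subprocess
import sys


def run_build(cmd, name, verbose=False):
    """Run a build command; return (succeeded, message for the user)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return True, ""
    message = f"Unable to build a wheel for {name} - use -v to see the build output"
    if verbose:
        # Only in verbose mode is the backend's own output passed through.
        message += "\n" + result.stdout + result.stderr
    return False, message
```

For example, `run_build([sys.executable, "-m", "build", "--wheel"], "xxx")` would hide the compiler noise unless verbose=True is passed, assuming the build package is installed.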