Provide a way to signal an sdist isn't meant to be built?

barry · June 27, 2024, 2:33am

Right. A package author could add some metadata that by default prevents building from sdist.

ncoghlan · June 27, 2024, 4:20am

I think most of the times when the idea has come up, the concept has been to add a new field in the wheel metadata that points to a verifiable source archive download location, without having to post an sdist to the index server.

For example:

Source-Archive: the primary download URL for a source archive that can be used to rebuild this artifact. The URL uses the same syntax as the URL portion of a direct URL reference in a version specifier. Unlike direct URL references (where hashes are optional), Source-Archive references MUST include at least one hash value for verification purposes.
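As a rough sketch of how a redistributor's tooling might consume such a field: the code below assumes the hash travels in the URL fragment (`#sha256=<hex>`), as it does for direct URL references, and that only a few common algorithms are accepted. `Source-Archive` is only a proposal in this thread, and the function names and example URL are hypothetical.

```python
import hashlib
from urllib.parse import parse_qsl, urlsplit

# Assumption: the set of accepted algorithms is illustrative, not specified anywhere.
ACCEPTED_ALGORITHMS = {"sha256", "sha384", "sha512"}


def parse_source_archive(value: str) -> tuple[str, dict[str, str]]:
    """Split a hypothetical Source-Archive value into (url, {algorithm: hexdigest})."""
    parts = urlsplit(value)
    # Hashes ride in the fragment, e.g. "#sha256=<hex>".
    hashes = {
        name: digest.lower()
        for name, digest in parse_qsl(parts.fragment)
        if name in ACCEPTED_ALGORITHMS
    }
    if not hashes:
        # Unlike direct URL references, the proposal makes a hash mandatory.
        raise ValueError("Source-Archive must include at least one hash value")
    return parts._replace(fragment="").geturl(), hashes


def verify_archive(data: bytes, hashes: dict[str, str]) -> bool:
    """Return True only if every declared digest matches the downloaded bytes."""
    return all(
        hashlib.new(name, data).hexdigest() == digest
        for name, digest in hashes.items()
    )
```

A caller would download the bytes from the returned URL itself and refuse to use them unless `verify_archive` returns True.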

That way redistributors would still have a way to automatically retrieve the corresponding sources for a PyPI project, but installers wouldn’t fall back to trying to build from source. Projects that are pure Python or just need a C compiler can keep publishing sdists; projects with more complicated build dependencies could start setting the new field in their wheel artifacts.

If you were to build an actual sdist for such a project then it would also have that metadata field set, but presumably most projects that set the field wouldn’t be publishing sdists to PyPI.

barry · June 27, 2024, 9:22pm

Back in my Debuntu developer days, there were advocates for doing away with sdists as the source of truth ^[1] for the code of a package, arguing that with the pervasiveness of git hosting services, you really just needed a URL+commitSHA.

I wasn’t a fan of that idea then, and I’m still leery of that approach because it can sometimes be difficult to determine how to turn a repository commit into an sdist. You can’t just, e.g., look to the GitHub releases page, download the targz you find there and assume that its contents would be what you’d find in that package’s sdist. And it might not be trivial to figure out what to run to turn that targz into an sdist, if you even could do it.

There’s also the philosophical question as to whether PyPI should be the canonical repository for sdists.

Maybe the Source-Archive idea isn’t mutually exclusive with non-buildable sdists, and maybe for the (presumably minority of) cases where this situation arises it might not even matter.

[1] pun intended!

ncoghlan · June 27, 2024, 11:38pm

I was about to ask “Why provide an sdist if you don’t intend for it to be built automatically?”, but then realised you had already answered that: an sdist indicates that the default Python ecosystem build processes will work, while a mere source archive may be built with anything (e.g. maybe it’s actually a C/C++ project that happens to publish Python API bindings).

Still useful, but not mutually exclusive with discouraging fallbacks to sdist builds for a project on an index server.

The latter still doesn’t feel like release metadata to me, though, it feels like a per-project index server setting with a few potential tiers:

automatic: default state, sdist build is an automatic fallback if no wheel is available
discouraged: sdist link is published normally, but with a new HTML attribute indicating automated fallbacks to source builds are not recommended. Clients aware of the new flag would allow opting out of discouraged source builds (or require opting in to them)
manual: sdist link is hidden from the simple repository API even if an sdist is uploaded (there should either be a new index page defined that still shows everything, including hidden sdists, or else the hidden sdists should be present, just using a link format that installers won’t recognise as a valid sdist link by default)
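The three tiers could be read by an installer roughly as follows. This is a minimal sketch, not an existing index or pip feature: the enum, the function, and the choice that "discouraged" requires an explicit opt-in (rather than allowing an opt-out) are all assumptions made for illustration.

```python
from enum import Enum


class SdistPolicy(Enum):
    """Hypothetical per-project index setting, mirroring the tiers above."""

    AUTOMATIC = "automatic"
    DISCOURAGED = "discouraged"
    MANUAL = "manual"


def may_build_from_sdist(
    policy: SdistPolicy, *, wheel_available: bool, user_opted_in: bool
) -> bool:
    """Decide whether an index-driven install may fall back to a source build."""
    if wheel_available:
        # A compatible wheel always wins; no fallback is needed.
        return False
    if policy is SdistPolicy.AUTOMATIC:
        return True
    if policy is SdistPolicy.DISCOURAGED:
        # Assumption: this sketch picks the "require opting in" variant.
        return user_opted_in
    # MANUAL: the sdist link is hidden from the simple API, so an
    # index-driven install never sees it.
    return False
```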

dstufft · June 27, 2024, 11:51pm

Preventing unwanted attempts to build sdists is relevant

tiran · June 28, 2024, 6:48am

VCS are not as reliable and stable as sdist tarballs.

GitHub tarballs are incompatible with dynamic version providers such as setuptools-scm. They either need a git clone or an sdist with a PKG-INFO file. Even worse, setuptools-scm needs enough of the git history to construct a version from the last tag. You often end up with a full clone instead of a shallow clone. setuptools-scm even warns you:

Make sure you’re either building from a fully intact git repository or PyPI tarballs. Most other sources (such as GitHub’s tarballs, a git checkout without the .git folder) don’t contain the necessary metadata and will not work.

Git commits and tags are not immutable. Tags can be moved to a different commit and commits can be detached with force push. GitHub aggressively purges dangling commits. There have been cases where an OSS project has purposefully destroyed its entire git history to go closed source. Or removed the entire project completely. For PyPI, there are ready-to-use solutions for mirroring.
GitHub is not a permanent, immutable hosting provider. User accounts can be removed or deactivated for various reasons. Repositories are removed or their history rewritten. GitHub releases can go away. GitHub can be forced to remove a repo for legal reasons, or may decide to remove a repository because it contains malware.

I’m worried about the fact that projects stop pushing sdists to PyPI. Source dists are useful for all sorts of purposes like rebuilding with different settings, patching bugs, debugging problems with platlib extensions, or simply reading the code. Don’t get me started on how problematic binary-only artifacts are for security.

Therefore I’m a big +1 for this effort. Once PyPI and pip have a way to signal opt-in to sdist builds, then PyPI, build, twine, and other tools should nudge projects to upload sdists again.

steve.dower · June 28, 2024, 8:09am

Surely if the sdist fails to build, that’s a pretty good signal?

Do we need to publish a build backend that says “this project should not be built directly, visit its documentation for instructions” and fails, so that projects can “signal” this by referencing it from their pyproject.toml?

(And if showing that on PyPI is important, add a classifier.)
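A backend along those lines could be very small. The sketch below implements the two mandatory PEP 517 hooks and makes both fail with a pointer to the documentation; the module name `refuse_backend`, the exception class, and the message text are made up for illustration, and no such package is known to exist.

```python
"""refuse_backend.py: a hypothetical build backend that declines to build."""

MESSAGE = (
    "This project should not be built directly.\n"
    "Visit its documentation for build instructions."
)


class BuildRefused(RuntimeError):
    """Raised by every hook so the frontend reports one short message."""


def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    # Mandatory PEP 517 hook; a normal backend would return the wheel's filename.
    raise BuildRefused(MESSAGE)


def build_sdist(sdist_directory, config_settings=None):
    # Mandatory PEP 517 hook; a normal backend would return the sdist's filename.
    raise BuildRefused(MESSAGE)
```

A project would opt in by naming this module as the `build-backend` in the `[build-system]` table of its pyproject.toml. Note that the project's own build would then hit the same refusal.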

pf_moore · June 28, 2024, 9:21am

In practical terms, no. Pip^[1] has very generic reporting of build errors (because we have no idea what output the build backend might have produced) and a significant number of users typically report build errors to pip, ignoring the “this is not a problem with pip” message that we add.

Having some sort of metadata would allow installers to report something explicit like “This project does not support automatic building from source”.

[1] and probably uv, although I haven’t checked

steve.dower · June 28, 2024, 9:25am

Right, but I’m proposing the entire build output be a message saying where to get help, rather than an obscure “include not found” message in the middle of 10,000 lines of cc commands.

You’ll never entirely get rid of those reports. Best you can do is detect and automatically respond to the “this is not a problem with pip” message if(!) they copy-paste the logs in.

kknechtel · June 28, 2024, 11:20am

Clever, but doesn’t this cause problems for the devs? I.e. wouldn’t they need a way for the pyproject.toml that they use locally (or which a CI system uses) to differ from the one that ends up in the sdist? Otherwise, wouldn’t their own build attempt to use the intentionally-failing backend?

Maybe Pip should suppress the backend output by default, and in its own advice suggest re-running to get debug info that should go to the package maintainers?

oscarbenjamin · June 28, 2024, 11:20am

Exactly. Saying “this is not a problem with pip” does not tell the user what they can do and the rest of the output with traceback inside traceback and mountains of build output is tedious to sift through even for someone who understands how it all works. Here’s a simple example:

$ pip install python-flint==0.2.0
Collecting python-flint==0.2.0
  Downloading python-flint-0.2.0.tar.gz (107 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 107.3/107.3 kB 1.2 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-qhgq7e5y/python-flint_0ac5a8dffa404eb4a3cda8e947cfa68c/setup.py", line 7, in <module>
          from numpy.distutils.system_info import default_include_dirs, default_lib_dirs
      ModuleNotFoundError: No module named 'numpy.distutils'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

[notice] A new release of pip is available: 24.0 -> 24.1
[notice] To update, run: python3.12 -m pip install --upgrade pip

No relevant information is provided in that output for the user to know what they should do.

Projects should be able to opt-out of pip building sdists by default and at the same time provide some information so that pip can give a useful error message. The error message could be like:

$ pip install python-flint
...
python-flint 0.6.0 does not provide binary wheels for your platform. Wheels are provided for

cp312-win_amd64

cp311-win_amd64

cp312-manylinux_x86_64

…

Your platform is cp312t-linux_x86_64.

NOTE: You are attempting to install python-flint into a free-threading build of CPython but python-flint 0.6.0 does not provide any binaries for a free-threading build. It is possible that python-flint 0.6.0 is not compatible with free-threading or it might just be that python-flint does not yet provide binaries for this platform.

The source distribution for python-flint 0.6.0 indicates that pip should not attempt to build from source by default because external requirements are needed for the build:

Toolchains needed:

C (std=C11) compiler toolchain

External libraries needed:

GMP (C library)

MPFR (C library)

FLINT >= 3.0.0 (C library)

If you believe you have the external requirements then you can ask pip to attempt to build from source by running
$ pip install --no-binary python-flint python-flint
For more information about installing python-flint see:

https://python-flint.example.org/docs/install.html

ERROR: Unable to install python-flint.

The output should end with a link to information provided by the project being installed. The project documentation can explain why wheels are provided for some platforms and not others and what someone would need to do to be able to build from source if they want to.

It should also be possible for pip to provide some helpful information automatically like “python-flint has no wheels for CPython 3.13” or “python-flint has no wheels for PyPy” etc.
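The "has no wheels for CPython 3.13" kind of message can be derived from wheel filenames alone. Here is a simplified sketch using only the standard library: it reads the python, ABI and platform tags from the end of each filename and compares the python tag with the running interpreter's. Real compatibility checks are more involved (stable ABI, `py3-none-any` wheels, platform tags) and are normally done with `packaging.tags`. The function names and message wording are hypothetical.

```python
def wheel_tags(filename: str) -> set[tuple[str, str, str]]:
    """Expand the (python, abi, platform) tags encoded in a wheel filename."""
    stem = filename.removesuffix(".whl")
    # The last three dash-separated fields are the tags; each may be a
    # dot-separated compressed set such as "py2.py3".
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return {
        (py, abi, plat)
        for py in python_tag.split(".")
        for abi in abi_tag.split(".")
        for plat in platform_tag.split(".")
    }


def explain_missing_wheel(project: str, filenames: list[str], interpreter: str):
    """Return a one-line hint, or None if some wheel targets this interpreter."""
    offered = sorted(
        {py for name in filenames if name.endswith(".whl") for py, _, _ in wheel_tags(name)}
    )
    if interpreter in offered:
        return None
    return (
        f"{project} has no wheels for {interpreter}; "
        f"wheels are provided for: {', '.join(offered)}"
    )
```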

steve.dower · June 28, 2024, 11:38am

This could all be achieved today, with one small tweak:

Oscar Benjamin:

If you believe you have the external requirements then you can ask pip to attempt to build from source by running
$ export PYTHON_FLINT_BUILD_SDIST=1
$ pip install --no-binary python-flint python-flint

Then whatever build backend/script is being used just has to look for the environment variable to decide whether to print this nice message or not.
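The environment-variable check described here might look like the following in a build script or in-tree backend. It is only a sketch: the variable name comes from the example above, while the function names, the exception type, and the message text are assumptions.

```python
import os

OPT_IN_VAR = "PYTHON_FLINT_BUILD_SDIST"  # name taken from the example above


def build_allowed(environ=os.environ) -> bool:
    """True only when the user has explicitly asked for a source build."""
    return environ.get(OPT_IN_VAR) == "1"


def guard_build(environ=os.environ) -> None:
    """Call at the top of the build script, before any compilation starts."""
    if not build_allowed(environ):
        raise RuntimeError(
            "Building from source needs external requirements.\n"
            f"If you believe you have them, set {OPT_IN_VAR}=1 and retry."
        )
```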

Users will not mind this at all (other than the non-portable export syntax). So while it would be nice if the entire ecosystem (pip, PyPI, and build backends) added support for it, it’s entirely possible to just do it.

oscarbenjamin · June 28, 2024, 11:48am

It should not be necessary to set an environment variable when building normally e.g. for local development or for downstream distro packagers etc. Anyone who manually downloads the sdist and asks pip to install from it should get the build-from-source automatically without needing to set an environment variable. Likewise other build frontends like python -m build should work without setting an environment variable.

The problem is the fact that the common end user invocation pip install python-flint should not default to attempting to build nontrivial projects. The error message from pip is all about build failure when it should really be about explaining to the user that there is no wheel for their platform.

steve.dower · June 28, 2024, 12:16pm

I can see that, but I can also see that a pyproject.toml is a signal that an sdist is meant to be built. Then the issues are not failing quickly enough, and not failing clearly enough, both of which again can be handled by the build backend checking for what it needs before it tries to build.

I still think the best approach here is going to be to get the semantics of pip’s default right (my best idea: “only consider an sdist when there are no wheels at all for the selected release”). That doesn’t require any direct cooperation between pip and the original package, and can be implemented unilaterally by pip at any time (when someone has the time available).
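That rule ("only consider an sdist when there are no wheels at all for the selected release") can be written down in a few lines. The sketch below is not pip's actual resolver logic; `candidate_files` and the `is_compatible` callback are hypothetical names, and file types are recognised by extension only.

```python
from typing import Callable


def candidate_files(
    release_files: list[str], is_compatible: Callable[[str], bool]
) -> list[str]:
    """Pick installable files for one release under the proposed default."""
    wheels = [name for name in release_files if name.endswith(".whl")]
    sdists = [name for name in release_files if name.endswith((".tar.gz", ".zip"))]
    usable = [name for name in wheels if is_compatible(name)]
    if usable:
        return usable
    if wheels:
        # Wheels exist but none fit this platform: report that instead of
        # silently attempting a source build.
        return []
    # No wheels at all for this release: the sdist is the only option.
    return sdists
```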

It can also be ignored by build, and if you want different behaviours between two spec-compliant tools, then you don’t want to be updating the spec.

oscarbenjamin · June 28, 2024, 12:49pm

The sdist is always meant to be built. That is its primary purpose!

Your suggestion for the environment variable amounts to releasing source code that deliberately fails to build by default which is really a very odd thing to do.

The problem here is that pip is primarily intended to be an installer rather than a builder and the vast majority of users use it for installing rather than building. It falls back on building when attempting an install but does not have the capability to satisfy the external build and runtime requirements that are needed for many of the projects that users commonly want to install. This fallback behaviour is not appropriate for an installer when the build has a high chance of failing.

I’m not sure that any spec actually needs to be updated for this although PEP 725 would be useful for error messages and could potentially provide a basis for pip to refuse building by default.

Can the “pip don’t build me” flag not just go in trove classifiers or something?

I agree. I presume that the reason this hasn’t happened yet is because it is not a backwards compatible change for pip and so this would mean pip taking responsibility for much downstream breakage.

An opt-in per-project flag limits the scope of breakage and allows the project being installed to take responsibility. The project is in a better position to judge whether pip’s current attempt to build from sdist is most likely not a good idea. Many projects already have judged this and do not supply sdists because that is currently the only way for them to prevent pip trying to build which is not good.

pf_moore · June 28, 2024, 1:29pm

Precisely this, yes. If you want to follow pip’s progress on this, you should look at Speculative: --only-binary by default? · Issue #9140 · pypa/pip · GitHub (which includes discussion of per-project opt-in and many other ideas).

But there’s also the fact that we’re now in a world where “the installer behaviour” doesn’t just mean “pip”. I’m more than happy to improve pip’s UI, and that’s something that should be discussed on the pip tracker, not here. But if people want uniform behaviour, or an approach that package maintainers can rely on, then we need a standard that all installers (pip and uv at least, currently) are expected to follow.

I think it’s probably important to separate out:

1. Discussion of extra metadata to allow projects to signal “don’t auto-build wheels from the sdist when asked to install”. That’s the core topic of this thread.
2. A possible common UI that we’d like “all installers” to have. That’s a nice discussion to have, but ultimately it’s not likely to go anywhere in the short term, as we don’t really have an enforceable concept of “tool UI standards” right now.
3. Discussions on actual tool UI design, which should be happening on the pip and uv trackers. In practice, these will probably be more effective than (2), as uv tends to follow pip’s UI^[1].

[1] but not always the other way round

kknechtel · June 28, 2024, 1:34pm

I don’t really see it that way, no (assuming you mean “built automatically by Pip because wheel installation wasn’t possible”). A big part of the point is to minimize reliance on setup.py to the things where it’s actually necessary. Maintainers who want to avoid users building locally should still be able to benefit from giving project metadata in TOML format instead of in keyword arguments to a setup call, and having a clean way to document what their build backend is (even if they locally just invoke it explicitly instead of expecting the standard tooling to look it up). Explicit is better than implicit.

mwichmann · June 28, 2024, 1:44pm

While this sounds perfectly reasonable to me as a forever “shell” cli user, do recall that (some) people provision their environment through helper tools inside their IDE where they may just be clicking on things to initiate an install. Of course PyCharm and VS Code will find a way to support whatever direction things go - eventually - but it’s just one more bit of burden.

steve.dower · June 28, 2024, 2:16pm

It was to remove the assumption that setup.py would be the build script, such that pip (essentially) could support backends other than setuptools. It was entirely motivated by frontends (a concept invented by PEP 517), not just to make it easier for maintainers to avoid setup.py.

The critical point is that you can build your own project without a pyproject.toml.^[1] It’s not inherently part of the development process - it’s inherently part of the install process (including “I’m building a wheel now because I plan to install it later”).

I recall. I maintained one of these for years, including developing the actual functionality you’re referring to, and handling all the issues raised by users that were fundamentally unrelated to the IDE but belonged to the underlying package (and as bad as open source users can be, enterprises who genuinely do have a multi-million support contract with your employer can also be less than fun to deal with).

[1] Provided your build backend doesn’t keep its own data in there, which is in no way a requirement for them, and was not the original intent.

pf_moore · June 28, 2024, 6:29pm

A proposal (i.e., a PR) to change pip so that a failed sdist build simply said “Unable to build a wheel for xxx - use -v to see the build output” would certainly be possible. Whether it would be accepted, I don’t know. Too many people use pip as a build tool for me to imagine that this wouldn’t be a huge compatibility break.

But if you (or anyone else) think it would be a useful improvement, go for it. Output/error reporting is something we know pip can do better on.
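For anyone picking this up, the quieter failure report could be prototyped outside pip first. The sketch below is not pip code: `run_build` is a hypothetical helper that captures a build command's output and only passes it through when verbose mode is requested.

```python
import subprocess
import sys


def run_build(cmd: list[str], project: str, verbose: bool = False) -> tuple[bool, str]:
    """Run a build command; return (succeeded, text to show the user)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return True, f"Built a wheel for {project}"
    summary = f"Unable to build a wheel for {project} - use -v to see the build output"
    if verbose:
        # Only in verbose mode is the backend's own output passed through.
        summary += "\n" + result.stdout + result.stderr
    return False, summary


if __name__ == "__main__":
    # Stand-in for a failing build: prints noise, then exits non-zero.
    failing = [sys.executable, "-c", "import sys; print('cc noise'); sys.exit(1)"]
    print(run_build(failing, "xxx")[1])
```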