Provide a way to signal an sdist isn't meant to be built?

@oscarbenjamin agreed. However, I wonder if we aren’t putting the “is this hard to build” burden on the wrong participant. The more I think about it, the more I’m coming around to the earlier suggestion that PEP 725 is a key part of the solution. Ultimately the package author can’t really know whether the package is hard to build, because they don’t know who the consumer of the sdist is. It might be hard to build for me, but easy for you, because you’re motivated enough to provide all the necessary requirements for building. Therefore, it’s the end consumer of the sdist (and by proxy, their tool) that decides.

A solution scenario then looks like:

  • PEP 725 to express the external dependencies needed to build the package. The package author, who writes the metadata, should have a very clear picture of how to fill this in.
  • An installer tool that has an “only install wheels” flag, either on by default or as an option.
  • A new piece of sdist metadata, supplied either by the package author or by the tool turning a source repo into an sdist, that says “this sdist is pure Python and needs no additional dependencies (other than Python) to build a none-any wheel from the sdist”. This covers the use case of uploaders of pure Python packages only providing an sdist, and gives a “build the sdist” exception [1] for the “only install wheels” default.

Installer tools can also implement heuristics for building sdist-only releases that don’t include the PEP 725 or “I’m a pure Python package” metadata. Some of those heuristics have been described here, but that’s a tool UX decision. We can make strong recommendations but not requirements.

Installer tools can also provide options to explicitly override specific package corner cases, but that’s a tool UX decision that also should be strongly recommended but not mandated.
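To make the scenario above a bit more concrete, here is a minimal sketch of the installer-side decision. Everything in it is hypothetical: the `Sdist` fields, the `pure_python` marker, and `should_build` are names made up for illustration, not pip's API or any accepted metadata field.

```python
from dataclasses import dataclass, field


@dataclass
class Sdist:
    # Hypothetical view of sdist metadata; none of these fields exist today.
    name: str
    pure_python: bool = False  # the proposed "I'm a pure Python package" marker
    external_build_deps: list[str] = field(default_factory=list)  # PEP 725-style declarations


def should_build(sdist: Sdist, only_install_wheels: bool = True,
                 overrides: frozenset[str] = frozenset()) -> bool:
    """Decide whether an installer would attempt to build this sdist."""
    if sdist.name in overrides:
        return True  # explicit per-package user override
    if not only_install_wheels:
        return True  # the user opted in to building in general
    # "build the sdist" exception for declared pure-Python packages
    return sdist.pure_python and not sdist.external_build_deps
```

With the “only install wheels” default, only an sdist that declares itself pure Python (and lists no external dependencies) would be built; anything else needs an opt-in or a per-package override.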

Does that get us everything we need?


  [1] with a near certain chance of success

2 Likes

This is indeed an important point. For example, I routinely have Visual C++ installed on my system. So for me, “needs a C compiler” isn’t a blocker. But (for example), “needs the GMP library” is a blocker. Expecting anyone other than me to make that distinction is unreasonable. Conversely, though, a system that says everything that’s not pure Python is “hard to build” makes things harder than they need to be for me.

Getting the UI for this right (so that I can set a config somewhere that says “I’m OK with needing a C compiler” and not be repeatedly asked about it) will be non-trivial. But that’s a UI issue that tools can work on once the framework is in place. What matters is getting the foundation set up.
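As a rough sketch of what such a config could boil down to, assuming the sdist declares its external build requirements as a set of labels: the labels `c-compiler` and `gmp` below are placeholders chosen for illustration, not PEP 725 identifiers, and `missing_requirements` is a hypothetical helper.

```python
def missing_requirements(declared: set[str], accepted: set[str]) -> set[str]:
    """Return the declared build requirements the user has not accepted."""
    return declared - accepted


# Hypothetical user config: "I'm OK with needing a C compiler".
accepted = {"c-compiler"}

# Only a C compiler is needed: nothing is missing, so a build can be attempted.
print(missing_requirements({"c-compiler"}, accepted))

# GMP is also needed: the installer can name the blocker instead of building.
print(missing_requirements({"c-compiler", "gmp"}, accepted))
```

The point is only that the decision stays with the consumer: the package states facts, and the user's own config decides which of those facts are blockers.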

7 Likes

At the cost of a bit of a delay, I bet package authors find out pretty quickly - they’ll get a flood of issue reports like “I can’t install your package on Windows, here, look at this hundred-line mess it gave me.”

Counter-example: as one of the authors of PyArrow, I can definitely say that our package is hard to build. I agree that there’s a large gray zone, but for some packages like ours the answer is clear-cut.

4 Likes

Is that because it’s difficult to assemble all the right external build dependencies, or something else? Or to put it more concretely, can that difficulty be captured in PEP 725 or not?

Let me remind you of the prior stated opinion of one eminent expert on this topic:

Going with PEP 725 is fine but it basically means not doing anything about this now and leaving it unresolved for an indeterminate time.

I also don’t think that PEP 725 really answers the problem as to whether building should happen. I always have C compilers and various other libraries installed but I usually do not want pip install foo to start building nontrivial things. It seems very reasonable to me that I should pass a flag like --spend-ages-building-stuff-that-will-likely-fail if that is actually what I want (although perhaps a shorter name can be found).

Hmm, who let my evil twin near my keyboard? :rofl:

4 Likes

Both because there are many external build dependencies (some optional) and because the build chain itself is non-trivial (with a separate C++ build step and a number of options that can be tweaked). We maintain CMake presets and Docker setups, but still.

The reason to do this at a project level, rather than a version level, is that current tools provide interfaces to choose whether wheels are used at a project level (--no-binary and --only-binary; --prefer-binary is solver wide). A project level hint on the index about what the state of the project is would mean clients could avoid scanning every release to eventually error out. The starting set of possible hints I can think of as useful would be “unhinted” (i.e. use client defaults), “only-binary-recommended” (i.e. don’t try to build from source unless specified by user) and “build-recommended” (i.e. this project should use the latest version, even if no wheel is available).

Automatically setting “build-recommended” on sdist-only projects whose latest release is more than 5 years old (on a rolling schedule) would let pip switch to using --prefer-binary, which seems to be a long term plan but one that hasn’t made much progress (due to the breakage which “build-recommended” would avoid).
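A minimal sketch of how a client might consume the three hints, assuming an explicit user choice always wins. `ProjectHint` and `allow_source_build` are hypothetical names; no index exposes such a field today.

```python
from enum import Enum
from typing import Optional


class ProjectHint(Enum):
    UNHINTED = "unhinted"
    ONLY_BINARY_RECOMMENDED = "only-binary-recommended"
    BUILD_RECOMMENDED = "build-recommended"


def allow_source_build(hint: ProjectHint, client_default: bool,
                       user_choice: Optional[bool] = None) -> bool:
    """Return True if the client may build this project from source."""
    if user_choice is not None:
        return user_choice  # an explicit user setting always wins
    if hint is ProjectHint.ONLY_BINARY_RECOMMENDED:
        return False
    if hint is ProjectHint.BUILD_RECOMMENDED:
        return True
    return client_default  # unhinted: fall back to the client's default
```

Because the hint lives at the project level, the client can make this call once per project instead of scanning every release.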

2 Likes

The 99% case here is that someone does pip install foo without passing any of those flags. It is this situation where the default behaviour should be to check the dont-build-me flag. I don’t think that backtracking should be a response to the dont-build-me flag because the main purpose is to be able to exit with an error saying “no wheels are available for your platform”.

On the other hand if someone does pass any of the *-binary options then that should override checking the dont-build-me flag because it is irrelevant if the user has already explicitly said what they want:

  • If they pass --no-binary then they don’t want to build anyway so no point checking dont-build-me.
  • If they pass --only-binary then they definitely want to build so no point checking dont-build-me.
  • Anyone who passes --prefer-binary rather than --no-binary has accepted the possibility of building so again I think no point checking dont-build-me.

If installers want to implement something more complicated then they can but I don’t think that backtracking to find a version that does not have the dont-build-me flag is a good idea.
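Roughly, the default behaviour described above could look like the sketch below. It deliberately does not model what each *-binary option means; it only captures that passing any of them bypasses the flag. `plan_install` and the returned strings are hypothetical, and dont-build-me is the working name from this thread, not an existing metadata field.

```python
from typing import Optional


def plan_install(has_compatible_wheel: bool, dont_build_me: bool,
                 binary_option: Optional[str] = None) -> str:
    """Return the action an installer would take for one project."""
    if binary_option is not None:
        # Any explicit *-binary option overrides the flag entirely.
        return "follow-user-option"
    if has_compatible_wheel:
        return "install-wheel"
    if dont_build_me:
        # No backtracking to older sdists: fail fast with a clear message.
        return "error: no wheels are available for your platform"
    return "build-sdist"
```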

2 Likes

I’m not so sure --only-binary implies that they always want to build, modulo pure-Python sdists[1] with no wheels.


  [1] however that’s figured out

I think @oscarbenjamin just got --only-binary and --no-binary switched around in his message. Which demonstrates why it’s so important to come up with good names for command line options :wink:

3 Likes

But if there’s no backtracking, then a single missing or failed wheel upload causes much more pain for users of that platform (and bugs filed to maintainers etc.). If an installer is hinted “for this project, use a wheel even if it’s older”, then an older wheel can be used, meaning fewer users hit the missing/failed wheel case. That results in happier users: if they want the latest version, building from source remains an option, and if they don’t care, the older version works for them. As far as I understand, on PyPI project metadata is changeable, whereas distribution metadata (and files) is fixed. Project-level hints could therefore be updated rapidly if an issue occurs (e.g. a CI system fell over and there’s no wheel for platform X yet), rather than relying on a client to choose a particular sdist to disable backtracking.

A hinting system run by the index (with maintainer overrides) also allows for nicer error messages. If the index provides a hint “no sdists for this project” and the client selects --no-binary for that project, the error message can be more specific. Or if there are hints “no musllinux wheels for this project” and “no sdists for this project”, then users on Alpine Linux can get a message “project X has not uploaded a package for your platform, please contact the maintainers of project X”.
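As an illustration of the error-message side, here is a small sketch that assumes the index hands the client a set of hint strings. The hint spellings (`no-sdists`, `no-musllinux-wheels`) and the `explain_failure` helper are invented for this example.

```python
def explain_failure(project: str, platform_tag: str, hints: set[str],
                    no_binary: bool = False) -> str:
    """Turn hypothetical index hints into a more specific error message."""
    no_sdists = "no-sdists" in hints
    no_wheels = f"no-{platform_tag}-wheels" in hints
    if no_sdists and no_binary:
        return f"project {project} publishes no sdists, so --no-binary cannot be satisfied"
    if no_sdists and no_wheels:
        return (f"project {project} has not uploaded a package for your platform, "
                f"please contact the maintainers of project {project}")
    return f"no matching distribution found for project {project}"


print(explain_failure("X", "musllinux", {"no-sdists", "no-musllinux-wheels"}))
```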

1 Like

FWIW, this is technically feasible today purely from the installer’s end and, if we add this, I’d expect that we’d use language that applies significantly less pressure on maintainers to support additional platforms.

4 Likes

Judging by some help requests on this very forum, many users attempt to install libraries from PyPI onto unsupported platforms.

It would be great if the “contact the maintainers” message could be customised (by metadata determined by the package authors?).

E.g. "This platform is not supported. To request support for this platform, please use this link: … " or:

"I’m sorry, Windows is not a supported platform. Due to the nature of this library, there is no intention to provide support for Windows. The library can still be run in WSL, using the steps described here:… "

4 Likes

Wouldn’t said maintainers want to be informed of a failed wheel upload? (I would)

I didn’t say there should be no backtracking to find a wheel. I said that backtracking to find an sdist that does not have the dont-build-me flag would not be a good idea: all old enough sdists won’t have the flag, and if new versions have the flag then most likely the same reasons apply to not building old versions. Installers could perhaps treat the flag like --prefer-binary or --only-binary and apply that kind of logic for backtracking to find a wheel. The backtracking is only useful if it finds a wheel though.
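In other words, something like the sketch below, where backtracking only ever looks for wheels. The input shape (version plus a “has a compatible wheel” boolean, newest first) and the name `newest_with_wheel` are assumptions made for the example.

```python
from typing import Optional


def newest_with_wheel(releases: list[tuple[str, bool]]) -> Optional[str]:
    """Pick the newest release that has a compatible wheel.

    `releases` holds (version, has_compatible_wheel) pairs, newest first.
    """
    for version, has_wheel in releases:
        if has_wheel:
            return version
    return None  # no wheel anywhere: report an error rather than build an old sdist


print(newest_with_wheel([("2.1", False), ("2.0", True), ("1.9", True)]))
```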

Installers could use hints from the index as you describe to make better error messages but pip already has most of that information without any changes needed to the index. The only bit missing is the possibility for project maintainers to add some text that pip could show. There’s no reason though that the text couldn’t be in the sdist so I don’t think changes to the index are necessarily needed to provide those hints.

2 Likes

That’s the primary motivation behind my proposal, actually. (But implementing it does seem like more work than I could handle at the moment, and certainly more than I could expect of others…)

They get a sense in aggregate, but this doesn’t help individual users. And honestly, they probably usually have a pretty good idea of that aggregate sense to begin with.

… And that’s why I had the bit about splitting up the PEP 725 implementation, so there could be human-readable descriptions first and automated attempts to locate dependencies later (much later, probably).

Although, you have a point that there’s a difference between knowing whether you most likely can build something locally, and deciding whether you want to make the attempt. Especially without a good idea in advance of how long it might take.

1 Like

Have we (me included) set up a false dichotomy here? The longer the thread continues, the more the index metadata and artifact metadata ideas feel complementary to me rather than competitive.

Specifically, a variation on @barry’s idea:

  • index server API that reports “has-external-build-dependencies” as a boolean flag on sdists
  • PyPI project level option to set that flag for all published sdists
  • installers take the new flag into account when deciding whether or not to consider sdists as viable artifacts
  • PEP 725 support in artifact metadata
  • index servers set the boolean flag based on the presence of PEP 725 metadata

That way the first two steps could be implemented independently of PEP 725, but still take advantage of it once it is resolved.
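A minimal sketch of how the pieces could fit together, assuming the index computes the flag from either the project-level option or non-empty external dependency declarations in the sdist. The function names and the `declared_external_deps` input are hypothetical, and the dependency labels are placeholders rather than PEP 725 syntax.

```python
def index_flag(project_option: bool, declared_external_deps: list[str]) -> bool:
    """Value the index would report as "has-external-build-dependencies"."""
    return project_option or bool(declared_external_deps)


def sdist_is_viable(flag: bool, user_accepts_external_builds: bool = False) -> bool:
    """Installer side: skip flagged sdists unless the user opted in."""
    return user_accepts_external_builds or not flag


# Project-level option only (works before PEP 725 is resolved).
print(sdist_is_viable(index_flag(True, [])))

# No project option, but the sdist declares external dependencies.
print(sdist_is_viable(index_flag(False, ["c-compiler"])))
```

The installer only ever looks at the boolean, so it does not need to care which of the two sources set it.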

6 Likes

It’s a very interesting idea @ncoghlan. I do like the general idea of thinking about index (project?) and artifact metadata as separate, complementary concepts. The caveat being that IIUC, we don’t really have much in the way of public existing project-level metadata, although the Package database model does define some properties that could be construed as project-level metadata. Also, IIUC [1] there is no API currently to modify project-level data.

None of those things are insurmountable, just more work in getting from where we are today to your first two steps (i.e. short of full 725 support).


  [1] and I probably don’t!

2 Likes

Yanking releases is the main case I am aware of, but I’m not personally aware of the details of how that is implemented on the server side.