Provide a way to signal an sdist isn't meant to be built?

brettcannon · July 3, 2024, 9:13pm

Do we know how many of those are effectively too old to even know about wheels? One issue is that we don’t know how many of those sdist-only projects are purposefully avoiding wheels and how many have maintainers who simply don’t know about them.

And people power to keep such a resource running (e.g. look at what it takes to keep conda-forge running).

Probably, but this discussion has started now and PEP 725 hasn’t been accepted yet.

Which brings us back to what sort of sdists are going to flag themselves as installable? Any non-pure Python sdist? If we take the view that pure Python projects are leaving out wheels due to a lack of knowledge rather than consciously avoiding them, then I don’t see how the flag will help much compared to just raising wheel awareness (which probably requires making sdist installation opt-in).

kknechtel · July 3, 2024, 9:25pm

… I suppose automatically emailing the maintainers, as a one-time thing, is out of the question?

… Although that actually probably would go a long way. Let the community send the notes to maintainers - hmm, I can see why people might raise concerns about that, but.

pf_moore · July 3, 2024, 9:48pm

That’s basically equivalent to having people raise “please provide a wheel” issues on the project’s tracker. Which is something people could do now (but don’t).

mwichmann · July 3, 2024, 10:16pm

For GDPR-covered localities, yes, unless I misunderstand the way you sign up for a PyPI account (it’s been a while, to be honest) - there’s no active acknowledgment that your email address can be used for contact.

oscarbenjamin · July 3, 2024, 10:32pm

I don’t see why we need any sdists to flag themselves as installable. The purpose of the flag is the opposite: we want to flag the common cases of sdists that are very likely not installable. The vast majority of Python packages would not set this flag but some very widely used ones (NumPy etc) would.

Ultimately the best thing would be if the standard end user tool (pip) did not attempt to build sdists by default and if all pure Python projects shipped wheels. The pip issue discussing that has been open for years and is talking about needing funding and community outreach to flip the switch so clearly it is not an easy change to make and perhaps it will never happen.

In the meantime we can have a flag that is opt-in on a per-project basis. That makes it possible to achieve the desired effect of not attempting likely unsuccessful builds for most end users of the most common packages. Importantly the flag is set by the project so it is not pip that breaks anyone’s workflow but rather the project that makes the decision and takes responsibility for it and documents what users should do. That way pip does not need to get funding and do outreach etc to make this work. It means that pip does not need to break the installation of pure Python sdists that actually build just fine or of sdists that are only ever used by people who all have C compilers.

It is important regardless of our discussion here about which projects should build by default that it is the project that decides to add the flag. The project has a much better idea of whether this is a better default for their users than the pip maintainers do. We can debate definitions here but the project will have their own reasons for wanting to add this flag or not.

If we do want to make a more principled definition though then I refer back to my original one:

In other words if the installer cannot install or even check the requirements then the default (for the vast majority of Python users) should be that it does not attempt to build the project.

pf_moore · July 3, 2024, 10:39pm

To be clear, the actual change in pip is trivial. It’s working out how to do it without causing huge breakage that’s hard.

This discussion is simply another iteration round the same question, and honestly, most of what’s coming up has been discussed before. That’s not to say the discussion isn’t useful, just that this isn’t a way of avoiding the work proposed in the pip issue, it’s just doing it (but on a volunteer basis, and without funding time from specialists like UX experts).

groodt · July 3, 2024, 10:50pm

I don’t think we can practically solve this problem through static analysis of the projects. There will always be edge-cases where it’s not just external dependencies or compilers required to build a wheel.

I think it needs alignment from both package author side (who knows what their project needs and what they are willing to officially support/recommend) and end-user package installers like pip and others.

The build_on_install package setting would be ignored when:
--only-binary :all: (or similar scope that matches package is supplied)
--no-binary :all: (or similar scope that matches package is supplied)

The build_on_install package setting would only be considered as an installation candidate when:
--prefer-binary == True and no matching wheels exist AND the package itself is build_on_install = True
--no-binary :build_on_install: (or similar scope that matches package) AND the package itself is build_on_install = True
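One possible reading of the rules above, as a minimal Python sketch. The function name and its parameters are hypothetical, `build_on_install` is only the proposed setting (nothing pip implements today), and the precedence between the rules is an assumption where the post leaves it open.

```python
def sdist_is_candidate(
    *,
    build_on_install: bool,
    has_matching_wheel: bool,
    only_binary: bool = False,
    no_binary: bool = False,
    prefer_binary: bool = False,
    no_binary_build_on_install: bool = False,
) -> bool:
    """Decide whether an sdist may be used as an installation candidate.

    Each keyword stands in for a command-line option whose scope matches
    the package; none of this is real pip behaviour.
    """
    # Explicit user choices win: the package setting is ignored entirely.
    if only_binary:
        return False
    if no_binary:
        return True
    # Otherwise the sdist is only considered if the package opted in.
    if not build_on_install:
        return False
    # Hypothetical "--no-binary :build_on_install:" scope.
    if no_binary_build_on_install:
        return True
    # Fall back to a build only when no matching wheel exists.
    return prefer_binary and not has_matching_wheel
```

With this reading, a package that sets `build_on_install = True` is still installed from a wheel whenever a matching one exists under `--prefer-binary`.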

mikeshardmind · July 3, 2024, 11:11pm

What about something that both signals to pip (or other installers) and which provides a mechanism for the project to present a message to users?

requires_confirmation_to_build=True  # defaults false
build_confirmation_prompt=" ... "  # defaults to empty string, no effect without the above being True

(I’m aware this doesn’t localize well, and may need further iteration or a mechanism for that as well, perhaps this is a table with a default message, and locale strings)

When installing, reaching a package with this setting would show the message and prompt for yes/no (y/n) confirmation.

This could come with a flag to accept confirmations, but I think such a flag should require the name and version of each package whose prompt is being bypassed.

This allows packages to explain to users why they don’t expect pip to build them, possibly provide a better help resource for those encountering it, as well as document (and present to users) what’s required to build the package if that is supported but discouraged.
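A rough sketch of how an installer might act on these two fields. Everything here is hypothetical: `BuildPolicy`, `should_build`, the `accepted` set (standing in for a bypass flag that names each package and version), and the fallback message used when the prompt string is empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildPolicy:
    # Mirrors the two proposed fields and their defaults.
    requires_confirmation_to_build: bool = False
    build_confirmation_prompt: str = ""


def should_build(name, version, policy, *, accepted=frozenset(), ask=input):
    """Return True if the installer may go ahead and build the sdist."""
    # Without the flag, the prompt string has no effect.
    if not policy.requires_confirmation_to_build:
        return True
    # The bypass must name the exact package and version.
    if (name, version) in accepted:
        return True
    # Assumed fallback text when the project supplies no message.
    message = policy.build_confirmation_prompt or f"Build {name} {version} from source?"
    answer = ask(f"{message} [y/n] ")
    return answer.strip().lower() in {"y", "yes"}
```

Passing `ask` as a parameter keeps the sketch testable and leaves the actual UI (terminal prompt, GUI dialog, or none at all) up to the tool.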

kknechtel · July 4, 2024, 12:14am

I feel like part of the problem is deciding what even counts as breakage. We’re all hypothesizing about user experiences, most of which won’t actually be experienced by most of us.

domdfcoding · July 4, 2024, 10:15am

What about the Maintainer-Email core metadata field - isn’t its purpose to allow you to contact the maintainer?

oscarbenjamin · July 4, 2024, 12:01pm

Personally I would find it annoying to answer any interactive prompt and I think that the flag for someone who actually wants to build does not need to be made any more difficult than --build-sdists.

Most commonly I would use this flag in CI jobs where interactive prompts don’t work anyway. There is already enough time spent fiddling with versions in CI jobs so we don’t need to create new unnecessary reasons to do it.

A PEP should not presume any particular UI. The metadata in pyproject.toml can have a flag and a help string but it should not be presumed that those are going to be used for an interactive prompt.

steve.dower · July 4, 2024, 1:11pm

If you can’t provide input (PIP_NO_INPUT=1) then the only reasonable thing for pip to do is to attempt to build. So I’d fully expect (correctly configured) CI to be unaffected.

It’s only really got value for interactive installs, when the user has a chance to back out and try a different path completely. A CI script isn’t going to switch to Anaconda halfway through, but a user might.

oscarbenjamin · July 4, 2024, 1:42pm

Defaulting to “yes” rather than opening an interactive prompt seems reasonable, but in general I would expect pip to do the same thing when run locally or when run in CI (assuming the same OS etc.), so if the default is not to build locally then the default should be not to build in CI as well.

There will sometimes be other possibilities like choosing to install a different version of the project (like --prefer-binary).

You’ll edit the CI script to use Anaconda or add the --build-sdist flag if the CI job fails. In that sense it is not much different from running locally except that it is more tedious to have to edit it on a regular basis just to update version numbers.

aragilar · July 6, 2024, 6:48am

Rather than putting this on a specific sdist, wouldn’t it be better for such metadata to be set on a per-project basis on the index? It seems unlikely that a project desiring this only wants to apply it to specific sdists; rather, it’s an expression that the project requires users to have more than just Python configured and installed in order to build a wheel. This avoids issues with old sdists lacking said metadata. It would also give a potential solution to the “only old sdists for unmaintained package” issue, as such packages could be flagged so that pip et al. could keep building them from source while moving to prefer-binary. Such a list could be created by finding all packages with no releases since 2020 (or some other date) that contain no wheels for any version.
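As an illustration of how such a list might be computed, here is a small sketch over an assumed in-memory listing of a project’s files. The function name, the `(filename, upload_date)` input shape, and reading “since 2020” as a cutoff of 1 January 2020 are all assumptions; a real version would pull this data from the index.

```python
from datetime import date


def sdist_only_and_dormant(files, cutoff=date(2020, 1, 1)):
    """Check one project given (filename, upload_date) pairs for all its files.

    True when no version has ever shipped a wheel and nothing has been
    uploaded on or after the cutoff date.
    """
    files = list(files)
    if not files:
        return False  # nothing to install, so nothing to flag
    has_wheel = any(name.endswith(".whl") for name, _ in files)
    latest_upload = max(uploaded for _, uploaded in files)
    return not has_wheel and latest_upload < cutoff
```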

pitrou · July 6, 2024, 9:14am

Sidenote: ccache (local compilation cache) or sccache (compilation cache that can reside on e.g. S3) is generally a good way of dealing with this.

barry · July 6, 2024, 7:34pm

I kind of agree, but IIUC the only project-level metadata defined and available is the project name.

ncoghlan · July 12, 2024, 4:52pm

I think that’s misinterpreting what @dstufft wrote. I took that comment as meaning that of what PyPI currently displays on the project landing page, only the project name comes from PyPI, everything else (the long description, the classifiers, the project links, the maintainer info, the licensing info, etc) comes from the most recent release (at one point it was specifically “the first uploaded artifact for the most recent release”, but I’m not sure if that part is still true).

PyPI itself can definitely store additional info about releases that isn’t in the release metadata (e.g. that’s how the whole “yanking” concept works, since it is specifically about changing how an index server advertises a set of artifacts rather than being about changing the contents of those artifacts).

That means I’d expect to see any resolution to this issue to have more in common with PEP 592 (Adding “Yank” Support to the Simple API) than it does with PEP 725 (Specifying external dependencies in pyproject.toml), since all metadata based solutions are intrinsically subject to the problem where installers just grab the newest sdist that doesn’t include the new metadata that says not to try implicit builds.

barry · July 12, 2024, 5:31pm

I think the point is that most of the metadata is tied to a published artifact and is actually derived from an artifact. The name, while included in the artifact metadata, is at a level above any specific artifact or distribution. Even though not derived from an artifact I’d say data-yanked is still tied to a specific artifact. The question then is whether a hypothetical dont-build-this-sdist metadata is also tied to an artifact, whether that’s provided as metadata in the artifact or out-of-band like data-yanked. If it’s out-of-band then it could potentially span all distributions released under that project name.

ncoghlan · July 12, 2024, 5:44pm

The situation here is stronger than that, though: metadata that is tied to the uploaded artifacts will not solve the problem that needs to be solved (giving installers a hint that attempting to build from source is almost certainly doomed to failure).

It isn’t a design decision that needs to be made, as a decision where only one of the two options is viable is no decision at all.

(I suppose you could argue that publishing a release with the new metadata field and then yanking all previous releases that lack the new metadata would be a solution, but that feels like it would be really stretching the meaning of “viable solution”)

oscarbenjamin · July 12, 2024, 6:08pm

I don’t see why this would not solve the problem. There are two steps in resolving a requirement for a distribution:

1. Select a version of the distribution.
2. Select a release artifact for the given version.

The dont-build-me flag applies to the second step. It does not need to mean that an installer should backtrack to step 1 and look for a different version. The installer should most likely error out if wheels aren’t available for the selected version and it should not be built from source (especially since it is likely that old versions won’t have the dont-build-me flag even if new versions do).

The obvious reason for wanting to tie this metadata to releases of the project is that there might be reasons to change the flag over time, say because a new version becomes much harder or easier to build than older versions.
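The two-step behaviour described in this post could be sketched roughly as follows. The `Release` type, the `dont_build_me` field, the `allow_build` override, and the exception name are all hypothetical, and `has_wheel` is assumed to mean “a wheel compatible with this platform exists”.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    version: tuple[int, ...]
    has_wheel: bool              # assumed: a compatible wheel exists
    dont_build_me: bool = False  # hypothetical per-release flag


class NoInstallableArtifact(Exception):
    """Raised instead of silently falling back to another version."""


def resolve(releases, *, allow_build=False):
    # Step 1: select a version (simplified here to "the newest").
    chosen = max(releases, key=lambda r: r.version)
    # Step 2: select an artifact for that version only.
    if chosen.has_wheel:
        return chosen.version, "wheel"
    if chosen.dont_build_me and not allow_build:
        # No backtracking to step 1: older releases may simply predate
        # the flag, so falling back to them would defeat its purpose.
        raise NoInstallableArtifact(f"no wheel for {chosen.version}")
    return chosen.version, "sdist"
```

Because the check happens only after the version is fixed, an older release without the flag is never picked up as a silent fallback.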