Provide a way to signal an sdist isn't meant to be built?

For these cases it is fine that pip attempts the build. If pip is called like pip install . or pip install dist/foo-1.0.tar.gz then attempting the build, and failing with reams of build output if necessary, is the right thing to do. If there were a “don’t build me by default” flag then it should apply to sdists downloaded from PyPI or another index rather than to explicitly provided local sources.

I also think that it would be reasonably okay for pip to break these cases in a future release. The problem with disabling sdist builds by default is that there are many pure Python projects that can build and install just fine from sdist and so don’t bother to publish any wheel. I ran into one just now:

If pip did not build sdists by default then pip install pydy would break, although various exceptions have been discussed, like “if the project has no wheels at all”, that could allow that case to continue working.

3 Likes

My comment was in the context (if you follow back about 4 or 5 messages) of suppressing the build output “because people who just want to install don’t need to see the build output, and probably aren’t in a position to make use of it”.

So your comment simply reinforces my point, that suppressing the build output in favour of a more intelligible message isn’t going to be acceptable to everyone :slightly_smiling_face:

Correct. This has all been discussed on the pip tracker issue I linked to above:

… which I know you’re aware of as you’re involved on that issue, but others following along here would be advised to read the background in that issue to avoid repeating discussions that have already taken place.

1 Like

Perhaps this is the [mis]belief that’s causing so much tension here, then? Along with people believing pip to be the tool that they use to build their project, rather than the one their users will use to install it.

Maybe I’m just building unusually complicated stuff, but my own development process never touches the pyproject.toml, and only uses pip to set up the dev environment (once, and half the time I just .pth-link other directories in by hand because it’s so much faster and more convenient). The pyproject.toml to me is just an artifact for the repo and sdist so people can discover how to build from source conveniently, but it would be such a pain for inner-loop development to rely on it.

On the basis of this proposed solution, I’m inclined to say a PEP is needed and we should do it this way. Maybe a pre-build-warning field with a message to print and wait for user confirmation[1].

But then, I’m obviously in the minority who believes the pyproject.toml build section is for instructing installation tools what to do, while everyone else thinks it’s meant for the publisher :upside_down_face:


  1. Unless they requested a non-interactive install, of course. ↩︎

1 Like

The pyproject.toml build section is clearly for telling build frontends how to build the project. You need to include it so that downstream distributors, cibuildwheel etc. can build the project. The question is whether it makes sense for an installer to act as a build frontend; ideally it would not, but there is a legacy of build-to-install behaviour that makes it difficult to move to that model wholesale.

It is up to the installer how they want to handle the situation. The PEP would just provide a way for the sdist to signal that it might not be a good idea to build.

I imagine a very short PEP with the substantive points being:

  • The [build-system] section of pyproject.toml MAY include a key has_external_requirements = true/false. Tools MAY treat the absence of the key as meaning unknown rather than true/false and MAY handle that case differently.
  • It is not specified whether has_external_requirements = true refers to build requirements or runtime requirements, but the implication is that installing the other explicitly listed build and/or runtime requirements is not typically sufficient to ensure that the project can be built and will work at runtime.
  • An installer that might otherwise build a distribution from sdist MAY choose to build or not build from source based on the presence and/or value of the has_external_requirements key.
  • An installer that chooses not to build MAY choose a different version of the distribution or MAY choose to exit with an error.
  • Tools MAY provide options to control this behaviour, such as an --always-build option that would ask the tool to always build regardless of the has_external_requirements key.
  • The [urls] section of pyproject.toml MAY include an install URL, which should be a URL to a page that gives instructions for installing the project, and a build URL, which gives instructions for building the project.
  • Installation and build tools MAY present the install and build URLs as information to the user for example if the tool decides not to build and/or install the distribution or if building the distribution otherwise fails.
  • Possibly an external_requirements_description key could provide a longer help text for users that a tool might want to show besides just the URLs?
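Pulling the bullet points together, a pyproject.toml using these keys might look like the sketch below. None of these keys are standardized; the key names, the use of [project.urls] for the install/build URLs, and the example URLs are all assumptions taken from the proposal above:

```toml
# Hypothetical sketch -- has_external_requirements,
# external_requirements_description, and the install/build URL keys
# are proposed here, not part of any accepted spec.
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
has_external_requirements = true
external_requirements_description = """
Building from source requires a Fortran compiler and BLAS/LAPACK,
which pip cannot install for you. See the build URL for details.
"""

[project.urls]
install = "https://example.org/myproject/install.html"  # hypothetical
build = "https://example.org/myproject/build.html"      # hypothetical
```

Under the MAY clauses above, an installer reading this could skip the build, pick a different version, or fail with a message pointing at the install URL.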

I imagine from this then that pip could document its intention not to build a distribution if it has has_external_requirements = true but that it would also have an option to override that. I don’t think that the PEP can really mandate what the behaviour of pip or other comparable tools should be beyond the MAY clauses above.

3 Likes

I don’t think we can call any answer “ideal” at this stage - we’ve already heard from both sides that their side is ideal.

Part of that legacy is the ABI of a particular CPython runtime. Nobody is even seriously contemplating stabilising that across all runtimes,[1] which means while build-to-install might be legacy, it’s still the only way to guarantee compatibility, so it’s not going to go away in any timeframe we can plan for.

The way to avoid it is to use a package repository where everything is built consistently; those all use tools other than pip, and if one did use pip then it would have wheels for everything and this discussion wouldn’t apply. So long as we’re in the realm of having to match arbitrary ABIs at install time, compile-on-install will be here.

I prefer something that’s an open-ended string, which can start as “will be displayed to the user” but could easily become structured information later on.

Or at the very least clearly linking the setting to the intended action. “Has external requirements” doesn’t directly lead to “avoid building” - that takes a (small) leap of logic. An avoid_automatic_build setting wouldn’t have that leap, so it’s easier for tools and readers to infer what to do. (Not as useful as a human-readable list of things they might need, still.)


  1. By which I mean all non-Windows and *basically* non-macOS runtimes. ↩︎

1 Like

It doesn’t directly lead to avoid building because in many situations it does not mean avoid building. In a context where the external requirements are detected or arranged to be available the build could be expected to succeed and building might be the right thing to do. Python tools like pip cannot detect the external requirements and cannot make them available so in that context having external requirements means not being able to do what is needed to ensure that the build succeeds. Tools like conda, brew etc can arrange for the external requirements to be satisfied and therefore don’t have the same reason to avoid building.

That being said I would be fine with avoid_automatic_build as a name. I doubt anyone will understand what it means without looking it up either way and the practical definition will be more like “this flag stops pip/uv/poetry/… from trying to build the project”.

A longer human readable string can also be provided. There are pros and cons of using a URL rather than a readable string. The information at the URL can be updated with latest installation instructions after any given release e.g. the instructions might say “don’t use version 1.2” which would potentially not be mentioned in the help text for building version 1.2. On the other hand a URL might become invalid over time for older releases. Probably both a URL and a help string together is best.

1 Like

Yes, to both things!

That seems like a “code[1] smell”.

I agree. Is there any research indicating what the majority of Python users think about this? We’re all in unique situations where we have deep, extensive knowledge about how things work… and don’t.

I’m with @steve.dower that something more direct would be better, and I’d prefer something that favors a positive statement over a double negative. Something like autobuild_sdist = true|false|unknown with unknown as the default.

I also think there should be a free-form text field that can be used as an informational output when the installer chooses not to build from sdist. autobuild_sdist_warning perhaps. Or as @oscarbenjamin suggests, a [urls] entry would be better.

An important use case is pure Python packages where there is no wheel. Those are safe to build, but if the package maintainer isn’t building wheels, they likely also won’t set this field in their pyproject.toml. An installer wouldn’t know that this unknown value should default to true, but it’s possible that a build backend can deduce this and set autobuild_sdist = true in the sdist metadata.

I wonder if a global --always-build is right though. The use case I’m thinking of is the sdist-stub technique for downloading an appropriate wheel from an external index as the “build” step for an sdist-only distribution. Until we have a generalized solution for that, we need to continue to allow that. So maybe --always-build takes similar arguments to pip’s --only-binary argument[2].


  1. process? ↩︎

  2. or hmm, these feel like very related concepts, so standardizing on that experience, or leaving it up to the installer tools to decide might also make sense ↩︎

1 Like

That leads to a double negative. avoid_automatic_build=false is not the most intuitive way to say “it’s okay to build this automatically”!

1 Like

Well, I use build (pyproject-build, pedantically). But can you blame them?

Pip is the one thing that has provided many years of stability. Setuptools has been a moving target (admittedly for very good reasons, and it’s actually kind of a shame that it has to be burdened with so much backwards-compatibility cruft) the whole time, what with the deprecation of running setup.py directly, the removal of stdlib distutils, having the _distutils_hack in the first place, etc.

Pip gets used to build projects, because it can, and nothing else is obvious. It can, because it can (at least try to) build sdists. It builds sdists because it has to, otherwise everyone complains that half the ecosystem went up in a puff of smoke.

Nothing else is obvious because it doesn’t work with a clearly advertised flow. Setuptools used to be the thing when running setup.py manually was expected. build should be the thing, but how exactly are devs meant to find out it exists? I know because of the interest I took in the Packaging forum here and all the time I spent trawling through the PyPA website. Similarly for a lot of third-party stuff, FWIW.

Do you not ever:

  • Determine that the project needs a dependency that wasn’t previously listed, or no longer needs one?
  • Design a new console entry point for something that didn’t have one?
  • Bump the version number, or change trove classifiers?
  • Re-license a project?
  • Add authors or maintainers?

Maybe you have tools to handle all those things. I don’t want to be dependent on tools like that. At least, I want to be able to verify the changes they made, and understand the git diff for my pyproject.toml. Or maybe you don’t think of those as development tasks. I certainly do.

On the flip side: my projects are almost always pure Python. I’m going to make a none-any wheel, and I’m going to do that locally, and it’s going to rely on the contents of pyproject.toml. If I make an sdist, it’s because build can do it easily anyway, not to distribute the code - I’m going to put it on GitHub anyway. I understand that wheels are still faster and the most basic tooling makes it trivial for me to make that wheel.

So why would I ever think of pyproject.toml as part of the installation process? For me, it clearly isn’t. If my well-meaning users ever touch an sdist, something has gone seriously wrong with PyPI.

pyproject.toml, IMX, doesn’t become part of the “inner loop” just because it’s part of the build process. It represents important, but relatively unusual changes to configuration. I haven’t found it painful at all. (And really, quite a bit of thought went into the choice of TOML as a format, yes?)

3 Likes

Better than trying to keep a PYTHONPATH variable up to date. A .pth in a venv just gives you extra search paths for that venv, and when you’re trying to cross-reference projects within a monorepo, it’s as easy as it gets!
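As a rough sketch of what .pth-linking does (hypothetical temp paths; site.addsitedir processes .pth files in a directory the same way the real site-packages is processed at interpreter startup):

```python
import os
import site
import sys
import tempfile

# Hypothetical layout: a project checkout somewhere on disk, and a
# directory standing in for a venv's site-packages.
project = tempfile.mkdtemp(prefix="myproject-")
sitedir = tempfile.mkdtemp(prefix="site-packages-")

# A .pth file is just a list of extra directories, one per line.
with open(os.path.join(sitedir, "myproject.pth"), "w") as f:
    f.write(project + "\n")

# Scan the directory for .pth files and add the paths they name,
# as happens for the real site-packages at startup.
site.addsitedir(sitedir)

print(project in sys.path)  # prints True: the checkout is importable
```

Dropping the .pth file into a venv’s site-packages by hand achieves the same effect without reinstalling anything.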

Yeah, they think it’s too complicated and want a single tool that does their workflow perfectly, with no consideration for anyone else’s :wink:

Omitting the option would be the way. There’s nothing practical anyone can do with an “unknown” here, and the “avoid” already implies that it’s non-binding.

A readable string can contain a URL - humans know what to do when they see a URL.

Doing a quick scan of some intro docs, both flit and hatch advertise their own frontend primarily[1], setuptools suggests python -m build and Meson leads with pip install . and then recommends -m build.

So you could argue that there’s no clear preference between “you’re allowed to invoke your build backend directly when developing” vs. “you must use an independent front-end”, but it’s a real stretch to say that nothing else is obvious or advertised.


  1. As does pymsbuild, but that’s mine, so I won’t count it. ↩︎

While developing I’ll install dependencies separately from updating my package metadata.

Console entry points are a convenience for other users - I use python -m near exclusively, and always support it first in my own code.

Version numbers get picked up at build time (usually from the Git tag, since all my releases happen in CI).

Classifiers just live in my package metadata - I can edit them whenever I want, but I don’t have to reinstall or rebuild my project as a result. Similarly for licenses and maintainers.

It so happens in my case that these don’t live in the pyproject.toml, but they could and it wouldn’t change my workflow.

My point is that when I check out my own code to develop, I’m not running any tools that read from pyproject.toml. I’ll create a venv myself (always a custom name, usually with the Python version in it and often with the platform), install from a dev-requirements file, and get to work. None of this process requires anything that’s in the pyproject.toml.

I really think you’re misunderstanding my point. Hopefully already clarified above, but I have no issue at all with the file format, structure, or contents. And yes, I remember how much thought/discussion went into choosing the format - I was part of it :wink: I’m pretty sure I advocated for ways to make it more useful for the inner loop[1] but we chose not to succumb to feature creep.

What it doesn’t do is make my “check out code, start editing it” any easier. Not one bit. So I’m never going to look at the file and think “that’s for my benefit”, when it provides no benefit - it’s clearly intended for other people to do the simplest possible build of my code.


  1. Though TBH, I don’t remember if that was before or after it was all done. It’s certainly happened since then. ↩︎

1 Like

Ah, a monorepo! Yep, been there and we don’t really have a great story for that case. At $job-1 I looked into that, had my own pile of hacks, and was hopeful that hatch would eventually solve it[1].


  1. I still am, but in $job TBH I don’t have an immediate need for monorepo support ↩︎

It is not always right which is why I said “such as” and then presented the simplest possible version of how an option could look.

In the case of pip there are already many options for controlling this, like --no-binary, --prefer-binary and --only-binary, and each of these can be per project or global. What most users would be better served by is “build if the build is likely to succeed”, but pip can’t provide that right now because there is no way to know.

Project maintainers want to pass that information to pip if it can make use of it and pip maintainers want to get the information from projects if possible. Once the information is available pip can make use of it to improve UX for end users. It is inevitable though that pip will have some sort of --retain-old-behaviour flag if it does so.

Exactly what options an installer should provide is up to the installer, so as to make something useful and preserve compatibility where needed. The purpose of the proposal here is that the installer has some way to get the information to distinguish “build likely to succeed” from “build not likely to succeed”. From there it is up to the tool maintainers to decide how to make use of that, and up to the project maintainers to provide that information if it seems useful to do so.

1 Like

I don’t understand. pyproject.toml is my package metadata.

Ah, well. I don’t want other people to build my code (to the extent that “building” ought to mean anything in the first place); I want it to be built for them. As far as I can tell, that’s my responsibility. And all the reasonable ways I can think of to do it in 2024 are PEP 517/518/621 compliant. :slight_smile:

I guess we do just inhabit different worlds, then. Fair enough.

1 Like

I think what’s being suggested here is a flag that says, “this sdist is self-contained and only requires a compiler”, not the package maintainer making a judgement call on whether you should build the sdist yourself. If so, self-contained expresses the intent better. But this does still leave out what sort of compiler is expected if you want a simple e.g. C or Rust extension to be flagged as buildable. For instance, I think having compilers for those languages is reasonable, but not for e.g. Fortran, and I bet some disagree on Rust. I think any PEP for this is going to need to be clear on the expectations, otherwise someone out there is going to think their package is easy to build because it works on their Ubuntu install without considering Fedora.

And if I’m wrong about that and you don’t mean that, then this feels like a pure-python flag which seems less useful since pure Python wheels are the easiest to create.

… And yet PyPI is full of examples of people not doing it anyway, despite the advantages.

Maybe there’s some way to fix that, like, with automation of some sort? Is it a question of raw computing resources?

Wouldn’t this be better represented by PEP 725, where the specific requirements can be enumerated? self-contained feels too unclear to me: e.g. C is probably fine, given most C codebases seem to stick to C89 plus things that MSVC supports (and those using newer versions are probably not interested in supporting Windows anyway, but how do users find this out?); C++ less so, as people are more likely to be using newer features; Fortran isn’t commonly installed, but once installed it’s in a similar position to C; and Rust projects (unless they’re only using std) are going to be very dependent on the MSRV of the whole dependency tree (and one minor change could take the project from being widely buildable to not).

attempt_build_on_install = true/false

I would also exclude anything that requires a C compiler. I think most end users don’t have a C compiler especially on Windows. Tools like pip have no way to install a C compiler and cannot even check for one. The backend could check but then that is complicating things compared to just a flag that hints at not building the project. A user who wants to compile C code as part of install can use the --build-anyway flag to enable it.

Ultimately though projects will decide whether or not to add the autobuild_sdist or whatever flag so whatever definitions we make here it will end up being a matter of project preference based on whether they think that building is reasonable for their target users.

2 Likes

The problem with autobuild_sdist and attempt_build_on_install is that they sound like binding instructions, as if the installer MUST attempt the build. I like the soft “avoid” in avoid_build_by_default, or otherwise the fact that has_external_requirements gives a reason for maybe not building but does not precisely state that building will or won’t happen.

Perhaps you can soften it by adding _hint at the end like:

build_sdist_hint = true/false