PEP-517 - definition of isolated build environments breaks sysconfig contract

It seems the PEP only mandates, for isolated build environments, that PATH must contain an entry pointing to the scripts location of the build dependencies.

This specification is sub-optimal for two reasons:

  • it at no point mandates that this entry be the first entry on PATH (so shutil.which can potentially pick up a different version of an executable than the one the build frontend installed, if an earlier PATH entry also contains it)
  • the sysconfig.get_path('scripts') contract is now broken. This path should point to where the scripts for the current Python environment are installed, but it still returns the calling Python's scripts folder, not the isolated build environment's scripts folder.
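The first point can be sketched with a quick check (hypothetical; which executable wins depends entirely on PATH order, and `pip` is used only as an example script name):

```python
import os
import shutil
import sysconfig

# Where the current environment says console scripts live:
scripts_dir = sysconfig.get_path("scripts")

# Where a PATH-based lookup actually resolves a script (first PATH hit wins):
resolved = shutil.which("pip")

# If an earlier PATH entry also contains `pip`, the two locations disagree:
if resolved is not None and os.path.dirname(resolved) != scripts_dir:
    print("shutil.which() resolved outside", scripts_dir)
```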

I think we should amend the PEP to fix both these issues.

Simple test:

pyproject.toml:

[build-system]
requires = ["setuptools >= 41"]
build-backend = 'setuptools.build_meta'

Code executed during the build (e.g. in setup.py):

import sysconfig
raise RuntimeError(sysconfig.get_path('scripts'))
# or
import os
raise RuntimeError(os.environ['PATH'])

Shell:

virtualenv --clear --pip 21.1.3 venv
./venv/bin/pip install .

Output:

...
      raise RuntimeError(sysconfig.get_path('scripts'))
  RuntimeError: /Users/bgabor8/tmp/demo/venv/bin

     raise RuntimeError(os.environ['PATH'])
  RuntimeError: /private/var/folders/jh/kx1127wj32l8c3s97131rhmr0000gp/T/pip-build-env-3e4b4lsh/overlay/bin:/private/var/folders/jh/kx1127wj32l8c3s97131rhmr0000gp/T/pip-build-env-3e4b4lsh/normal/bin:/Users/bgabor8/.pyenv/bin:

IMHO we should fix the PEP and also amend pip to respect it.

Procedurally, this should be raised as a new PEP proposing a change to the spec. As the build backend hook spec isn't currently documented under the PyPA specifications section of the Python Packaging User Guide, that PEP should also cover moving the spec under that page and making the new location canonical.

See here for the process details, and note that the PEP process doesn’t allow for changes to PEPs once they have been marked Final.

(Disclaimer: I would completely agree if someone were to say that the above process is too much overhead for a relatively small change to the spec. I have no problem if someone is interested in starting a PyPA governance discussion to change the process, I’m only describing the current process as I understand it).

Thanks, @pf_moore, but for now I’m interested in consensus before I end up writing a PEP that gets rejected. Until we agree this should happen there’s no point in arguing over how it should happen from a governance POV.

1 Like

No, this can’t be fixed, as that would prevent backends from using native commands. A very simple use case would be invoking gcc.
The isolated environment path should come first though, which seems to happen in your example. If the package/build system is correctly designed, it will not try to use scripts from the source interpreter, as all of the ones it tries to use should be shadowed by the dependencies installed in the environment.

Can you show an example case of this?

In my backend I want to use a tool that uses sysconfig to invoke its dependencies. In the normal virtual environment case, you use sysconfig to get the script of a dependency. Why should this not work within build backends? And if isolated build environments violate this contract, perhaps we should provide some mechanism to detect such environments.

Just make the changes you are requesting and try building a native module with setuptools or something like that.

What do you mean exactly? If you want to limit yourself to the sysconfig scripts directory, you can do os.path.join(sysconfig.get_path('scripts'), 'my-script'). Perhaps a helper, sysconfig.get_script, could be added?
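Such a helper doesn't exist in the stdlib today; a minimal sketch of what a hypothetical sysconfig.get_script could look like (the name and semantics are assumptions):

```python
import os
import sysconfig

def get_script(name):
    """Hypothetical helper: absolute path a console script `name` would
    have in the current environment's scripts directory (the script is
    not guaranteed to actually exist there)."""
    return os.path.join(sysconfig.get_path("scripts"), name)
```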

I am fine with changing the spec to require the build system to only invoke Python entry points from the scripts folder, while allowing native dependencies to be invoked from outside it.
Since the target interpreter runs in a subprocess, we could possibly intercept subprocess invocations and bail out if one tries to reach for an external entry-point script. I think this could be implemented with LD_PRELOAD on Linux. But this enforcement should not be mandatory; it would be provided where possible, and when runners want it.
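On the Python side (as opposed to LD_PRELOAD), a runner could get best-effort interception with an audit hook; a sketch, assuming the frontend controls interpreter startup, with deliberately simplified policy:

```python
import os
import sys
import sysconfig

SCRIPTS_DIR = sysconfig.get_path("scripts")

def deny_external_scripts(event, args):
    # "subprocess.Popen" audit events carry (executable, args, cwd, env).
    if event != "subprocess.Popen" or args[0] is None:
        return
    path = os.fspath(args[0])
    # Only police absolute paths; bare names resolved via PATH and the
    # interpreter itself are left alone in this sketch.
    if (os.path.isabs(path)
            and not path.startswith(SCRIPTS_DIR)
            and path != sys.executable):
        raise RuntimeError(f"blocked external entry point: {path}")

sys.addaudithook(deny_external_scripts)
```

Note that audit hooks cannot be removed once added, so a real runner would install this only for the build subprocess's interpreter.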

Currently, as PEP-517 defines and as pip implements it, this returns both the target Python interpreter's path and the isolated build environment's scripts path. Because the script is part of a build dependency, invoking it via the target interpreter's path will fail.

Can you give a more precise description of the problem? I’m not sure I understand what the issue is here that you’re trying to solve (apart from the theoretical point about what sysconfig is documented to provide).

What is that tool exactly, and what dependencies is it invoking?

I don’t have specifics because most of the code is private, so you’ll have to make do with a hypothetical.

Imagine someone wraps pydantic's Code Generation tooling to customize how Python classes are generated from JSON schemas during a package build. This means the wrapper tool depends on datamodel_code_generator. It can only access datamodel_code_generator via its console script, because many tools (pip included) do not offer direct module access. The canonical way for a package to use another package's console script is os.path.join(sysconfig.get_path('scripts'), 'my-script'). However, that code will fail because sysconfig.get_path('scripts') does not actually return the location of the isolated build environment's scripts folder. And it cannot rely on PATH, because the tool also needs to work when invoked explicitly from the CLI, in which case the user controls PATH.

A possible workaround for your case would be to invoke it as [sys.executable, "-m", "wrapper_tool"].
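Concretely, that pattern looks like the following; the stdlib's json.tool stands in for the hypothetical wrapper_tool so the snippet is runnable:

```python
import subprocess
import sys

# Run a dependency's CLI through the running interpreter rather than PATH;
# this always resolves to the same installation you would get via import.
result = subprocess.run(
    [sys.executable, "-m", "json.tool"],
    input='{"a": 1}',
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)  # the pretty-printed JSON
```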

Maybe that’s actually the Right Answer as well? There are plenty of everyday python deployments where scripts aren’t necessarily all in one place. E.g., if you’re using a distro python without a venv, and use pip install --user, then you’ll have some script entry point in the distro’s bin path, and some in the user’s bin path. I always understood sysconfig.get_path('scripts') to be a hint for installers where to put scripts, not a rule saying that you can never put scripts anywhere else.

If you don’t trust $PATH because you want to find the script correctly even when the user has messed up their $PATH, then it sounds like what you really want is to make sure that the script you find matches the one you’d find if you did an import. And sys.executable -m is a simple/elegant way to get exactly that.

2 Likes

Why not just invoke the tool using python -m? That will work, surely? I’ve never heard anyone claim before that the canonical way to invoke a script from a dependency is via sysconfig the way you suggest.

My concern here is that if we tighten the constraints on the build environment, we get ever closer to mandating that tools have to use a full virtualenv. I assume that we had good reasons at the time for not mandating a virtual environment gets used (I don’t recall what those were, though, so that is just an assumption¹). But if we’re going to continue to discover that backends are making assumptions about the environment that aren’t guaranteed by the current PEP, and our response is to add extra guarantees to the spec, rather than to say that the backend can’t make that assumption, then I think we should just bite the bullet and say that we do require a full virtualenv.

¹ One thing that comes to mind is that we don’t want frontends to have to bundle virtualenv. But if we’re OK to ignore Python 2 now, that concern isn’t valid any more as frontends can use the stdlib venv².
² Assuming we don’t care about installations like debian where venv isn’t supported in a minimal installation.

-m is not a console script. Many tools provide console script entry points without supporting -m. I know how I can change the tools to make this work; however, I don’t think the answer here should be that we don’t support transitive console scripts.

OK, in which case my opinion is that either we leave the PEP as it stands, or we mandate a full virtualenv.

I really don’t like the idea of incrementally adding constraints here. I can’t see enough people being interested enough to work through the implications to get a meaningful consensus. And I have no appetite at all for the possibility of having to do that repeatedly. If backends expect to be able to assume that the build environment is a full virtualenv (something that I don’t think has been demonstrated yet, but is certainly possible) then let’s mandate that. Otherwise, the existing guarantees are pretty clear and well-defined, so let’s stick with those, and backends can do what they need to to work with what the spec provides.

As @njs says, using the sysconfig “scripts” directory seems like it’s a fragile approach anyway. It looks like it works for your situation, but I doubt it’s quite as reliable as a general technique.

I understand that’s a big change, but I quite like the idea and am already doing that in Hatch v1 (unreleased) for all builds.

1 Like

For what it’s worth, the build tool and tox also do the same: full virtual environment creation, all the time.

Just to be clear, my comments do not constitute support from pip for using a virtual environment. I’m pretty sure that pip has changed the environment build code for performance reasons, so switching to a venv would be a regression there. Plus, we’re not likely to bundle virtualenv, so we’d be relying on the stdlib venv, and having pip break on Debian because they make venv a separate install is likely to be a showstopper (much as I dislike letting Debian’s non-standard policies dictate our decisions).

But this is not my area of expertise with pip, so I’ll let one of the other pip maintainers clarify further.

Ah, sorry! I misunderstood your point.

A build frontend SHOULD, by default, create an isolated environment for each build, containing only the standard library and any explicitly requested build-dependencies

Well, one could argue that pip’s implementation is not a fully isolated environment, as it does not handle the scripts directory properly. Though the spec is not explicit about that.

I’d treat this as a pip issue and ask the upstream if it could be improved.

I personally do not think it’s worth the trouble to write a further PEP updating the recommendations, as they are only recommendations anyway, but if you think it’s worth it, go for it :+1:

IIRC I’m the one who wrote that language. I think I was just trying to thread the needle between providing useful guarantees to build backends without overly constraining build frontends? If each tool gets to choose how to set up the environment, it leaves more room for experimentation, optimization (venvs are definitely not the most efficient way to set up a temporary environment!), workarounds (like pip not wanting to depend on virtualenv), etc. If build backends need more guarantees, then so be it, but I do think that flexibility has value.

We do support transitive console scripts, though, using the standard mechanism for finding console scripts: $PATH. I get that for your particular situation you don’t want to rely on $PATH because you’re concerned about other, potentially misconfigured environments that have broken $PATH setups. But it means that we already have multiple different ways to support this that do work in general with different trade-offs. (I guess you could also use importlib to look up the entrypoint and invoke it directly?)

So you’re not asking for console scripts to be supported; you’re asking for them to be supported in yet another way, one that would also be broken in some cases (like my --user example) and has its own costs (like potentially making pip install slower for everyone). Maybe it’s worth it, but I’d like to see a more fleshed-out argument that acknowledges those trade-offs.

5 Likes

Surely this cannot be the standard. PATH is only set by the virtual environment activator. However, one can use virtual environments without activating them, and one can use the OS Python. In neither case will using PATH to discover Python scripts be successful :smiley: and you might find a different version than the one installed into the currently running Python interpreter.

The --user is a valid concern. Not sure what’s a good solution for it, but I consider it a niche use case. Whenever I’ve used --user it just caused issues (because it always conflicts eventually with globally installed packages), so personally I’d be happy to deprecate that flag and mark it not supported. 99% of the time people do global installs and use virtual environments, which is my main target here.

This is not true: pip can use the same caching logic virtualenv does to avoid the extra cost. Also, it doesn’t have to create a virtual environment; it just needs to patch sysconfig.get_path to work as expected for scripts via the sitecustomize.py it already uses.
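A sketch of what such a sitecustomize.py patch might look like (hypothetical, not pip's actual code; the isolated scripts path would be substituted by the frontend):

```python
# Hypothetical sitecustomize.py dropped into the isolated build environment.
import sysconfig

# The frontend would write the real isolated scripts directory here:
_ISOLATED_SCRIPTS = "/tmp/pip-build-env-xyz/overlay/bin"

_orig_get_path = sysconfig.get_path

def _patched_get_path(name, *args, **kwargs):
    # Redirect only the 'scripts' path; every other path is untouched.
    if name == "scripts":
        return _ISOLATED_SCRIPTS
    return _orig_get_path(name, *args, **kwargs)

sysconfig.get_path = _patched_get_path
```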

Where does that statistic come from? Every Python user I’ve asked has a use for user-level installs

1 Like