It seems the PEP only mandates, for build environments, that PATH must contain an entry pointing to the build dependencies' scripts location.
This specification is sub-optimal for two reasons:
- it at no point mandates that this entry be the first entry on PATH (so shutil.which can potentially pick up a different version than the one the build frontend installed, if an earlier entry on PATH contains that executable);
- the sysconfig.get_path('scripts') contract is now broken: that path should point to where the scripts for the Python environment are installed, but it still returns the calling Python's scripts folder, not the isolated build environment's.
I think we should amend the PEP to fix both these issues.
Procedurally, this should be raised as a new PEP proposing a change to the spec. As the build backend hook spec isn't currently documented under PyPA specifications in the Python Packaging User Guide, that PEP should also cover moving the spec under that page and making the new location canonical.
See here for the process details, and note that the PEP process doesn't allow for changes to PEPs once they have been marked Final.
(Disclaimer: I would completely agree if someone were to say that the above process is too much overhead for a relatively small change to the spec. I have no problem if someone is interested in starting a PyPA governance discussion to change the process; I'm only describing the current process as I understand it.)
Thanks, @pf_moore, but for now I'm interested in reaching consensus before I end up writing a PEP that gets rejected. Until we agree this should happen, there's no point in arguing over how it should happen from a governance POV.
No, this can't be fixed, as that would prevent backends from using native commands. A very simple use case would be invoking gcc.
The isolated environment path should come first on PATH, though, which seems to happen in your example. If the package/build system is correctly designed, it will not try to use scripts from the source interpreter, as all of the ones it tries to use should be shadowed by the dependencies installed in the environment.
In my backend I want to use a tool that uses sysconfig to invoke its dependencies. In a normal virtual environment, to get a dependency's script you use sysconfig. Why should this not work within build backends? And if isolated build environments violate this contract, perhaps we should provide some mechanism to detect such environments.
Just make the changes you are requesting and try building a native module with setuptools or something like that.
What do you mean exactly? If you want to limit yourself to the sysconfig scripts directory, you can do os.path.join(sysconfig.get_path('scripts'), 'my-script'). Perhaps a helper, sysconfig.get_script, could be added?
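A minimal sketch of such a helper (sysconfig.get_script is hypothetical and does not exist in the stdlib; 'my-script' is a made-up name):

```python
import os
import sysconfig

def get_script(name):
    """Hypothetical helper: the path where a console script named
    `name` would live, according to sysconfig's 'scripts' directory."""
    return os.path.join(sysconfig.get_path("scripts"), name)

print(get_script("my-script"))
```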
I am fine with changing the spec to require the build system to invoke only Python entry-point scripts from the scripts folder, while allowing native dependencies to be invoked from outside it.
Since the target interpreter runs in a subprocess, we could possibly intercept subprocess invocations and bail out if the backend tries to reach for an external entry-point script. I think this could be implemented with LD_PRELOAD on Linux. But this enforcement should not be mandatory; it would be provided where possible, and when runners want it.
Currently, as PEP 517 defines it and as pip implements it, this returns the scripts path of the target Python interpreter rather than the isolated build environment's scripts path. Because the script is part of a build dependency, invoking that path will fail.
Can you give a more precise description of the problem? I'm not sure I understand what the issue is here that you're trying to solve (apart from the theoretical point about what sysconfig is documented to provide).
What is that tool exactly, and what dependencies is it invoking?
I don't have specifics because most of the code is private, so you'll have to make do with a hypothetical.
Imagine someone wraps Code Generation - pydantic to customize the generation of Python classes from JSON schemas during a package build. This means the wrapper tool depends on datamodel_code_generator. It can only access datamodel_code_generator via its console script, because many tools (pip included) do not offer direct module access. The canonical way for a package to use another package's console script is to do os.path.join(sysconfig.get_path('scripts'), 'my-script'). However, that code will fail because sysconfig.get_path('scripts') does not actually return the location of the isolated build environment's scripts folder. And it cannot really rely on PATH, because the tool also needs to work when invoked explicitly from the CLI, in which case the user controls PATH.
A possible workaround for your case would be to invoke it as [sys.executable, "-m", "wrapper_tool"].
Maybe that's actually the Right Answer as well? There are plenty of everyday Python deployments where scripts aren't necessarily all in one place. E.g., if you're using a distro Python without a venv and use pip install --user, then you'll have some script entry points in the distro's bin path and some in the user's bin path. I always understood sysconfig.get_path('scripts') to be a hint for installers about where to put scripts, not a rule saying that you can never put scripts anywhere else.
If you don't trust $PATH because you want to find the script correctly even when the user has messed up their $PATH, then it sounds like what you really want is to make sure that the script you find matches the one you'd find if you did an import. And sys.executable -m is a simple, elegant way to get exactly that.
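The sys.executable -m approach above can be sketched as follows; the stdlib module json.tool stands in for the hypothetical wrapper_tool, since the real tool is private:

```python
import subprocess
import sys

# Invoke a module with the same interpreter that is currently running, so
# the code executed matches what `import` in this process would resolve,
# regardless of how $PATH is set up.
result = subprocess.run(
    [sys.executable, "-m", "json.tool"],  # json.tool stands in for wrapper_tool
    input='{"a": 1}',
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```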
Why not just invoke the tool using python -m? That will work, surely? I've never heard anyone claim before that the canonical way to invoke a script from a dependency is via sysconfig the way you suggest.
My concern here is that if we tighten the constraints on the build environment, we get ever closer to mandating that tools have to use a full virtualenv. I assume that we had good reasons at the time for not mandating that a virtual environment be used (I don't recall what those were, though, so that is just an assumption¹). But if we're going to continue to discover that backends are making assumptions about the environment that aren't guaranteed by the current PEP, and our response is to add extra guarantees to the spec rather than to say that the backend can't make that assumption, then I think we should just bite the bullet and say that we do require a full virtualenv.
¹ One thing that comes to mind is that we don't want frontends to have to bundle virtualenv. But if we're OK with ignoring Python 2 now, that concern isn't valid any more, as frontends can use the stdlib venv².
² Assuming we don't care about installations like Debian where venv isn't supported in a minimal installation.
-m is not a console script. Many tools provide console-script entry points without a -m equivalent. I know how I can change the tools to make this work, but I don't think the answer here should be that we don't support transitive console scripts.
OK, in which case my opinion is that either we leave the PEP as it stands, or we mandate a full virtualenv.
I really don't like the idea of incrementally adding constraints here. I can't see enough people being interested enough to work through the implications to get a meaningful consensus, and I have no appetite at all for the possibility of having to do that repeatedly. If backends expect to be able to assume that the build environment is a full virtualenv (something that I don't think has been demonstrated yet, but is certainly possible), then let's mandate that. Otherwise, the existing guarantees are pretty clear and well-defined, so let's stick with those, and backends can do what they need to in order to work with what the spec provides.
As @njs says, using the sysconfig 'scripts' directory seems like a fragile approach anyway. It looks like it works for your situation, but I doubt it's quite as reliable as a general technique.
Just to be clear, my comments do not constitute support from pip for using a virtual environment. I'm pretty sure that pip has changed the environment build code for performance reasons, so switching to a venv would be a regression there. Plus, we're not likely to bundle virtualenv, so we'd be relying on the stdlib venv, and having pip break on Debian because they make venv a separate install is likely to be a showstopper (much as I dislike letting Debian's non-standard policies dictate our decisions).
But this is not my area of expertise with pip, so I'll let one of the other pip maintainers clarify further.
A build frontend SHOULD, by default, create an isolated environment for each build, containing only the standard library and any explicitly requested build-dependencies
Well, one could argue that pip's implementation is not a fully isolated environment, as it does not handle the scripts directory properly. Though, the spec is not explicit about that.
I'd treat this as a pip issue and ask upstream if it could be improved.
I personally do not think it's worth the trouble to write a further PEP updating the recommendations, as they are only recommendations anyway, but if you think it's worth it, go for it.
IIRC I'm the one who wrote that language. I think I was just trying to thread the needle between providing useful guarantees to build backends and not overly constraining build frontends. If each tool gets to choose how to set up the environment, it leaves more room for experimentation, optimization (venvs are definitely not the most efficient way to set up a temporary environment!), workarounds (like pip not wanting to depend on virtualenv), etc. If build backends need more guarantees, then so be it, but I do think that flexibility has value.
We do support transitive console scripts, though, using the standard mechanism for finding console scripts: $PATH. I get that for your particular situation you don't want to rely on $PATH because you're concerned about other, potentially misconfigured environments that have broken $PATH setups. But it means that we already have multiple different ways to support this that do work in general, with different trade-offs. (I guess you could also use importlib to look up the entry point and invoke it directly?)
So you're not asking for console scripts to be supported; you're asking for them to be supported in yet another way, one that would also be broken in some cases (like my --user example) and has its own costs (like potentially making pip install slower for everyone). Maybe it's worth it, but I'd like to see a more fleshed-out argument that acknowledges those trade-offs.
Surely this cannot be the standard. PATH is only set by the virtual environment activator; however, one can use virtual environments without activating them, and one can use the OS Python. In neither case will using PATH to discover Python scripts succeed, and you might find a different version than the one installed into the currently running Python interpreter.
The --user case is a valid concern. I'm not sure what a good solution for it is, but I consider it a niche use case. Whenever I've used --user it has just caused issues (because it always conflicts eventually with globally installed packages), so personally I'd be happy to deprecate that flag and mark it unsupported. 99% of the time people do global installs and use virtual environments, which is my main target here.
This is not true: pip can use the same caching logic virtualenv does to avoid incurring the extra cost. Also, it doesn't have to create a virtual environment; it just needs to patch sysconfig.get_path to work as expected for scripts via the sitecustomize.py it already uses.
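A rough sketch of what such a sitecustomize.py patch might look like; the scripts directory shown is a made-up placeholder that the build frontend would substitute with the real isolated environment path:

```python
# Hypothetical sitecustomize.py dropped into the isolated build environment.
# It redirects sysconfig's 'scripts' path to the build environment's own
# scripts directory, leaving every other path untouched.
import sysconfig

_BUILD_ENV_SCRIPTS = "/tmp/build-env/bin"  # placeholder; frontend fills this in

_orig_get_path = sysconfig.get_path

def _patched_get_path(name, *args, **kwargs):
    if name == "scripts":
        return _BUILD_ENV_SCRIPTS
    return _orig_get_path(name, *args, **kwargs)

sysconfig.get_path = _patched_get_path
```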