PEP 582 - Python local packages directory

@steve.dower – My conclusion from all the discussion around PEP 582 is that there simply isn’t any way we can hack the core python executable to handle these workflow cases well, given all the constraints on how python has to work. OTOH, if you switch to the pipenv/poetry strategy of treating the python executable as a low-level detail, and the user always interacts with their project and the interpreter through some higher-level launcher, then we can provide a much better experience, with far fewer compromises. And it doesn’t matter how exactly the launcher sets up the environment, it could use venv, envvars, conda, whatever, you’re free to pick based on technical merit because the setup details are hidden from the user.

2 Likes

I wonder whether it would be a good idea to use a wheel-like naming system rather than simply using the Python version. When a package is imported, multiple places would be checked to find the best match to import, much as pip finds the best wheel on PyPI to download. This would make the import process more complicated (and maybe slower?), but I think it could be necessary.

Anecdotally, I also face a similar problem with Pipenv: it constantly tries to use the wrong venv when I switch between WSL and Win32 Python.

Core CPython doesn’t know about wheel tags, which is why it would be based on the import suffix Nick and I each mentioned earlier. It doesn’t fully solve the problem though - go find my post where I talk about the frequency of getting ImportError.

This is because the paths are different, so it thinks you need a different venv. And, err, since you do actually need a different venv between the two Python runtimes you are using, this behaviour is correct.

Since WSL can launch Windows executables directly, you could make sure that you never use the Linux Python and it ought to be okay.

FWIW, provided you had pure-Python packages only (which is not uncommon, as odd as it seems to us), PEP 582 (without the version/tagged directory) would have prevented this confusion for you.

I’m more optimistic about the potential benefits of auto-activation myself, but I do think that to be successful any core support needs to be designed around the notion that folks will expect to be able to apply their existing application dependency management techniques to the new “auto activated” layout. That should only require a few new pieces:

  • an appropriately scoped relative name for the auto-activated path to make it easy for interpreters to do the right thing when multiple interpreters are used with the same project directory, or if a project directory is shared across different platforms by way of a USB key or network drive (otherwise it will be too easy for folks to get themselves into trouble they don’t know how to debug). (This is similar to PEP 425 compatibility tags in some respects, but simpler in one critical way: we don’t really care if the tag is excessively specific, since we’re not trying to define a broadly shareable distribution format, we’re only aiming for “Sharing a project directory between platforms and implementations should not result in a broken auto-activated venv”)
  • yet more complication of the sys.path[0] calculation to account for environment auto-activation (i.e. moving the venv calculations far earlier in the interpreter startup sequence, rather than leaving them until site activation)
  • a blunt-instrument override of the sys.path[0] calculation to handle any cases that the auto-activation can’t cope with (i.e. the suggested --mainpath PATH/--nomainpath interpreter options)
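As a sketch of what the first bullet's "appropriately scoped relative name" might look like, the tag could be derived entirely from the running interpreter. The format and names below are illustrative assumptions, not part of any proposal; the point is that an over-specific tag is harmless, since the only requirement is that two different runtimes never pick the same directory:

```python
import sys
import sysconfig

# Hypothetical naming scheme (not from the PEP): combine implementation,
# version, and platform. Over-specific is fine here, since the goal is
# only that two different runtimes never auto-activate the same venv.
tag = "{}-{}.{}-{}".format(
    sys.implementation.name,
    sys.version_info.major,
    sys.version_info.minor,
    sysconfig.get_platform(),
)
print(tag)
```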

(For anyone inclined to say “But pipenv and poetry are super new, why should we care about building new interpreter level features around their needs?”, it’s critical to remember that while those particular projects are still relatively new, the workflows they support have evolved over the last several years, in other projects like virtualenvwrapper, pew, etc. The new tools simply aim to wrap up existing practices in ways that make them more approachable to new developers looking to adopt those techniques for their own projects. By aiming to make transitioning to and from this pair of tools easy, we’ll also be making things easier for everyone else still using their own homegrown approaches to project dependency management)

So my concern with the PEP as currently structured is that it defines a new installation layout that differs from the one generated by venv for no good reason:

[ncoghlan@localhost ~]$ python3 -m venv --without-pip /tmp/example_venv
[ncoghlan@localhost ~]$ tree /tmp/example_venv/
/tmp/example_venv/
├── bin
│   ├── activate
│   ├── activate.csh
│   ├── activate.fish
│   ├── python -> python3
│   └── python3 -> /usr/bin/python3
├── include
├── lib
│   └── python3.7
│       └── site-packages
├── lib64 -> lib
└── pyvenv.cfg

6 directories, 6 files

If PEP 582 instead said that the scoped auto-activated subtrees were each just normal virtual environments, and focused solely on when and how interpreters should auto-activate them, then the adaptation to existing tools would be something on the order of ln -s __pyvenv__/[TAG_FOR_DEFAULT_RUNTIME]/ .venv, such that the in-tree default environment was an alias for a particular auto-activated venv.
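To make that symlink idea concrete, here is a minimal sketch in Python. The `__pyvenv__` directory name and the runtime tag are illustrative only (neither is defined by the PEP or any tool); each tagged subtree is a perfectly ordinary venv, and `.venv` is just an alias for the one matching the default runtime:

```python
import os
import tempfile
import venv

# Sketch of the "each auto-activated subtree is just a normal venv" idea.
# The __pyvenv__ directory and the tag below are made up for illustration.
project = tempfile.mkdtemp()
tag = "cpython-3.7-linux-x86_64"
env_dir = os.path.join(project, "__pyvenv__", tag)
venv.create(env_dir, with_pip=False)  # an ordinary venv, nothing special

# ".venv" becomes an alias for the tagged venv of the default runtime,
# which is roughly all existing tools would need to adapt to.
os.symlink(env_dir, os.path.join(project, ".venv"))
```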

All the complicated installer side questions then acquire an easy answer “It’s a venv, just like any other. It’s just in a special place where newer interpreter runtimes will auto-activate it if no other environment is active, and where directory and zipfile execution with those newer runtimes will always activate it”.

Edit: thinking about that a bit further, a few potentially curly runtime questions acquire simpler “It’s a venv” style answers as well (like what the various site methods should return at runtime).

The inconsistency of venv is a good reason to define something :slight_smile: On Windows the layout is basically just “Lib, Scripts, pyvenv.cfg”, and Scripts contains special executables that are better avoided if possible.

Think of this more like __pypackages__ == site-packages.
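In sys.path terms, that equivalence amounts to something like the following sketch. The helper name is made up, and the `__pypackages__/3.7/lib` path follows the draft's layout; in the actual proposal this logic would live inside interpreter startup, not in user code:

```python
import os
import sys

def add_pypackages(base, version="3.7"):
    # Prepend a project-local __pypackages__ lib directory, much the way
    # site-packages is added today. Hypothetical helper for illustration;
    # the PEP puts this logic in the interpreter itself.
    pkg_dir = os.path.join(base, "__pypackages__", version, "lib")
    if os.path.isdir(pkg_dir) and pkg_dir not in sys.path:
        sys.path.insert(0, pkg_dir)
    return pkg_dir
```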

To make this a more concrete example, say you have numpy in a non-specific subdirectory. When you move this to a different OS, you get an ImportError when the tagged shared libraries aren’t found.

Compare this to an interpreter-specific subdirectory, which when moved to another OS gives you… ImportError.

Now do the same with Django or requests. In the non-specific directory, they work fine. In your proposal, you have a broken project, which is the thing you’re trying to avoid.

Either way, the failure case is identical and the fix is identical (install more packages). But by trying to fix this preemptively, you don’t allow tooling or packagers any way to solve it themselves.

For example, someone may just take multiple builds of numpy and merge the extension modules into the one directory. Now they have a cross-platform cross-version package that will work. Maybe this could be automated one day? Maybe this could be the default one day? I don’t know. But assuming up front that only we can resolve this and we have to do it now will prevent any possibility of these kinds of developments.

The critical difference from a venv is that a venv is tightly tied to a specific installation of Python. There is no need for that here - we’re just talking about Python libraries, which for the most part are not tied to a specific install, and those that are don’t need to be babied. They’re built by smart people who come up with great ideas frequently. We should set this up so they have ways and opportunities to improve their users experience themselves, not by having to come beg from the core runtime.

So I stand by the untagged install directory, and I’m back to not even having the version number that’s mentioned in the current PEP text. The failure case and the fix is the same in both scenarios, but by complicating it now we will actively harm future innovation in this space.

I know, and that’s the model I think needs to be rejected, because it’s the model that creates all the unanswered questions on the installer side (and a few on the runtime side). It’s a completely unnecessary, self-inflicted, complication in the PEP, and it should be dropped from the proposal.

My question is, why should we care about building new interpreter-level features around their needs, when they aren’t asking for new features, and indeed already have a solution that works better than this proposal? Auto-activation is always going to have quirky limitations (never put your scripts in a subdirectory! don’t use tools like ipython/black/pytest/pylint!), won’t support as broad a range of python versions, and isn’t appropriate as a foundation for non-toy workflow tools (since those all center around some kind of pin metadata, not just a raw environment on disk – this also means you can’t upgrade from a pure PEP 582 setup to something like pipenv/poetry, even if the environment layout is compatible.)

What problem are you trying to solve here?

The installers want to answer those questions for themselves.

The only questions you’ve asked about the runtime have been irrelevant (this just appends to PYTHONPATH, which has well understood semantics).

It’s the core of the proposal - make libraries just a directory of code, rather than an executable thing with boilerplate in it. It’s not an “unnecessary, self-inflicted, complication” - that’s how most people describe venv when they first encounter it!

I think these are unfair representations (while being technically accurate statements).

If certain tools don’t work with a workflow, that doesn’t mean the workflow has limitations, it means you need to use the tools correctly.

If something new doesn’t work with old versions (which is totally normal, IIRC?) then it’s a reason to upgrade. (And all you have to do is set PYTHONPATH to the directory to use it with older versions, so not even as hard as creating a venv.)
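The PYTHONPATH fallback mentioned above is easy to check. This sketch spawns a fresh interpreter with PYTHONPATH pointed at a package directory (the `__pypackages__/3.7/lib` path mirrors the draft's layout and is purely illustrative) and confirms the directory lands on sys.path:

```python
import os
import subprocess
import sys
import tempfile

project = tempfile.mkdtemp()
pkg_dir = os.path.join(project, "__pypackages__", "3.7", "lib")
os.makedirs(pkg_dir)

# On an interpreter with no auto-activation, exporting PYTHONPATH puts
# the same directory onto sys.path with well-understood semantics.
env = dict(os.environ, PYTHONPATH=pkg_dir)
out = subprocess.check_output(
    [sys.executable, "-c",
     "import os, sys; print(os.environ['PYTHONPATH'] in sys.path)"],
    env=env, text=True,
)
print(out.strip())
```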

There are plenty of non-toy workflows that could use this. Every existing workflow has been built up to work around limitations of what already exists, which is why they all work in certain ways. New workflows will be enabled by this, and tools will either adapt to support it or new tools will rise up.

No, I’m not trying to avoid ImportError. I’m trying to make it so that multiple interpreters across different platforms can share the proposed auto-activation directory without stepping on each other’s toes.

Say I have a USB key that I share between multiple Linux and Windows machines. If I’m running on Windows, I’ll be in the Windows venv, and have the Windows version of everything. If I’m running on 64-bit Linux, I’ll be in the 64-bit Linux venv, and have the 64-bit Linux version of everything. If I’m running on 32-bit Linux, I’ll be in the 32-bit Linux venv, and have the 32-bit Linux version of everything.

It’s less efficient than it could be (since I’ll have multiple copies of dependencies that could theoretically be shared), but the purpose isn’t to optimise for anything, it’s to ensure the cleanest possible failure modes, and to avoid the creation of franken-environments that are impossible for either end users or open source project maintainers to debug.

Having the auto-activated environments be extremely specific doesn’t prevent anything because virtual environments and the import system already support arbitrary environment chaining. It’s exposed directly as a feature in pew: GitHub - pew-org/pew: A tool to manage multiple virtual environments written in pure python

So if someone comes up with a super-clever cross-platform cross-version package, then installation tool developers can agree on a scheme for installing shared cross-environment dependencies somewhere inside the new directory, and interpreter implementations don’t need to care, since it would be the responsibility of the tools creating the virtual environments to correctly cross-link them to the shared one. But in the meantime, per-interpreter environments would be cleanly separated by default, avoiding one of the larger pain points we currently encounter with the “one blessed target environment per project” model in pipenv.

PEP 582, as currently written, is at cross-purposes with the past experience and the future direction of the Python Packaging Authority. It can be brought into agreement with those, and I don’t think the changes I’ve proposed to do so are especially radical ones. But I unfortunately wouldn’t be able to support it if it were presented in its current form. That would be a shame, since I like the idea in principle; it’s only some of the technical details I’m suggesting be revised, since the current draft makes things more complicated than they need to be by attempting to wish complexity away, rather than acknowledging the complexity, accepting it, and dealing with it by leaning on established tools and techniques.

1 Like

Yeah, the text is still a very early draft. We haven’t presented it anywhere yet, not even for feedback (except to a few people directly). This thread was started by someone else.

1 Like

I’d love to hear more about this. Maybe you could jump into the myriad of threads on this topic and share?

1 Like

Also known as a pure-Python package? Those have historically been the default, with platform-specific ones the exception.

We already have an agreed way to separate the interpreter-specific parts using import tags. Why do we need to now separate the non-interpreter specific parts as well? (Maybe we could support import tags on packages like on extension modules and separate at that point?)
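For reference, the interpreter-specific separation mentioned here is already visible in the suffixes the import system probes. The exact tags vary by build and platform, but the shape is the same everywhere:

```python
import importlib.machinery

# Extension modules already carry interpreter/platform tags in their
# filenames (e.g. foo.cpython-37m-x86_64-linux-gnu.so), so incompatible
# builds can coexist in one directory; pure-Python sources stay untagged.
print(importlib.machinery.EXTENSION_SUFFIXES)
print(importlib.machinery.SOURCE_SUFFIXES)
```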

Anyone shipping different pure Python libraries for platforms or interpreters rather than having it switch at runtime is already doing something we don’t agree with. Why do we need to support that?

Umm… I think you’re basing a lot on whatever @dstufft might have said when the original design was being discussed. For myself, I most definitely don’t want to have to work out how pip should handle all of this stuff with no guidance from the PEP, and with no opportunity to influence the design of the feature based on what installers require.

And if you’re going to say that we do have the chance to influence the design, then I’d say that for pip I’d expect that Nick’s “make it look like a virtual environment” proposal would be a lot easier for pip to handle. So count that as my attempt to influence the design, if you will :wink:

Furthermore, I’m still not convinced that the whole feature is sufficiently workable in practice (all the questions about how it will break if you change directory, how will script wrappers work, etc) that I’d commit to implementing support for it in pip - and without installer support, I feel like it’ll be way too hard to use for the target audience of relatively new users. Zipped applications and basic executable directories have seen limited popularity, for basically the same reason - no installer support. Tools like shiv might be making inroads into that, but do we really want another distribution approach with insufficient tooling support?

So put me down as being strongly in favour of the PEP clearly specifying how installers should work with the new layout, and not leaving it to the tools to work out for themselves.

1 Like

This is one of the big reasons I’m in favour of making these behave just like an in-tree venv and incorporating the --mainpath PATH option into the proposal. That way, even if the auto-activation only works implicitly from the project’s base directory, there’d be two ways to resolve the issue:

  • explicitly activate the virtual environment, taking implicit activation out of the question entirely
  • pass --mainpath ../../ (adjust as needed) in order to tell the interpreter where to look for the venvs to auto-activate
3 Likes

BTW, an interesting technical wrinkle you might not be aware of: you actually can’t share .pyc files between different systems, even if the bytecode is compatible, because .pyc files bake in the absolute path to the .py file, and having the wrong path breaks things like pytest and tracebacks. (I noticed this sharing a pure-Python package directory between Linux and Windows – Python was happy to re-use the Linux .pycs on Windows and vice-versa, but then stuff blew up.)
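The baked-in path is easy to see directly: compile a module, then unmarshal the code object straight out of the .pyc and inspect its co_filename. (This sketch assumes CPython 3.7+, where the .pyc header is 16 bytes; earlier versions used a 12-byte header.)

```python
import marshal
import os
import py_compile
import tempfile

# Compile a throwaway module and read its code object back out of the .pyc.
src_dir = tempfile.mkdtemp()
src = os.path.join(src_dir, "mod.py")
with open(src, "w") as f:
    f.write("x = 1\n")

pyc = py_compile.compile(src)
with open(pyc, "rb") as f:
    f.read(16)  # skip the .pyc header (magic, flags, mtime, source size)
    code = marshal.load(f)

# The absolute source path travels with the bytecode, so a relocated
# .pyc reports the old location in tracebacks and to tools like pytest.
print(code.co_filename == src)
```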

A nice feature in an environment-managing workflow tool would be for it to automagically detect when the environment had been relocated and fix things up as appropriate, before running anything in the environment. Probably not the most important feature, but nice. Not really doable in the auto-activation approach though.

I think the fundamental disagreement between us is that I think we need to figure out what workflows we want, and then design the best tools to enable those workflows, while you prefer a more bottom up approach where we invent lots of cool tools and then leave users to figure out how to cobble them together into something that works for them. Of course you always want to leave room for people to adapt and innovate, but IMO the biggest complaints from users all come down to how our tools are all mechanism, no policy. That’s why virtualenv is confusing – you have to understand the whole model of how it works before you can do anything with it. But I don’t see any evidence that PEP 582 will actually be simpler in practice, because it’s also all mechanism, no policy.

If users have to know about these limitations, and these workarounds, then IMO the proposal has failed. The only reason we’re even talking about this is the hope that it’s simpler to use than virtualenvs.

I don’t know what Nick had in mind with this comment, but I know what I would consider to be the key point of the future plans for the PyPA, which is to strongly focus on interoperability standards that allow tools to build from a solid foundation up.

In that context, we don’t yet have a standard for “the layout of a Python environment”, but “Like a virtual environment (or a normal Python installation)” is the current de facto standard. PEP 582 is essentially trying to modify that baseline [1], but it’s doing so not by starting with “let’s standardise what we have, and then extend it for a new use case”, but by simply adding more complexity onto an already complex, non-standardised structure. That would of necessity result in adding yet more adhoc, implementation-specific code to tools like pip, making the eventual job of having to standardise the layout even harder.

So, from a “PyPA direction” point of view, I’d request that this PEP be replaced by one that first standardises the current “installed package environment” layout (something like the never-completed PEP 262), and only then extends it with a new local packages structure, if needed.

[1] You’re trying to modify the existing layout precisely because you need tool support from pip etc. If you were to say that it was a purely runtime layout, and users would have to manually copy files into the __pypackages__ directory, then I’d be OK with ignoring this proposal (except to vote against it because I think it’ll die without tool support :wink:)

Agreed (sorry for the low-content post, but there’s no way to “like” a part of a post in Discourse).

Not enough guidance on “best practice” (i.e., too much mechanism, not enough policy) is one of the major complaints we see in the packaging environment, and it’s why “opinionated guides” are often so popular.

Because environment markers allow the dependency resolution tree to arbitrarily change between platforms and Python versions, and there’s no guarantee that all projects will work correctly if platform-specific dependencies for other platforms are present in a franken-environment.

As a simple example: if a library is able to successfully import pywin32, they may interpret that as sufficient indication that they should use their Windows-specific paths without checking the running platform directly. Now, it could easily be said that such a project has a bug, and they should fix their bug, but the fact is that with out-of-tree virtual environments, system wide installations, and per-user installations, doing something like that will probably work fine in practice, since projects typically aren’t even going to try to install pywin32 as a dependency when running anywhere other than Windows.

The way that virtual environments deal with this problem is to associate each virtual environment with a specific interpreter, and then avoid sharing them between interpreters, and only copy virtual environments between compatible platforms. I’d like PEP 582 to adopt those same recommended practices.

I’m going to use post 100 in this thread to withdraw my efforts for this proposal (I’ll leave it to the other authors to withdraw the PEP itself if they want to).

Clearly the packaging status quo is too entrenched to ever shift, so I’ll spend my time on more fruitful ideas.

1 Like