PEP 582 - Python local packages directory

That’s for the pip developers to figure out, but I’d expect that if the directory is there they’d use it, and probably have an option to create it in the cwd. And if your skeleton tools create it, or your git clone contains an empty one, then it’ll be the default.

1 Like


A bit confused about how the threading works – this is in response to Steve Dower’s comment that every teacher he’s talked to loved the idea:

Except this one :frowning:

My main issue is that, as a matter of philosophy, I don’t like teaching with methods that are different from what students will need to use in real work.

And “hiding” what’s going on is also a no-no for me. Like it or not, folks will likely need to understand how Python packages work, and probably virtual environments too.

Making environment management easier for everyone, in real production use, is great – making “another way to do it” that is mostly suitable for teaching, not so much.

2 Likes

This whole thing makes me nervous – it really seems to be shifting responsibility to the wrong place. Aside from the nervousness, I haven’t thought it out enough to see what real problems it might cause, but a few considerations:

  1. Isolated environments are a really key development tool, but maybe not SO necessary for teaching, etc. I have had NO problems with having students who are just getting started ignore virtual environments – there’s no reason NOT to have Twisted, Django, and Flask all installed in the main Python install. They do not need to manage multiple projects, maybe in multiple versions, all with different dependencies.

  2. Tying the environment to a particular working dir seems problematic. I can see how it would work well for web frameworks in particular, but it could be problematic for other projects that don’t have such a clearly defined single working dir – and as a rule, I’m not sure the working dir SHOULD be where the code lives!

  3. What about other Python systems / environment managers? I’m a big Conda fan, and I’m wondering how this is going to work with conda Python and conda environments – I’m guessing it could get really ugly…

Me neither. I think this is at least as viable as venv for “real work” (the fundamental difference is that it’s based on the .py file you start running rather than environment variables). The cases it’s not a good fit for are when you have multiple venvs for a single project.

Sorry, are you trying to hide stuff from students or not? :slight_smile:

It’s tied by the script that you launch. If you launch Python without a script file, then it uses the current working directory, just as it does today (except you also get packages that are installed in a specific subdirectory, as well as the actual cwd).

Great question! I should have chatted with the conda team today when I was hanging out with them, but we had other topics :slight_smile:

In general though, conda is for managing complete environments. They do make an effort, but ultimately have so much going on that you need to modify environment variables just to launch the environment (whether Python, R, Java etc.) and so I expect they’ll continue to do that. They also put a complete copy of Python in the environment, which is a big hammer to solve the isolation problem with, but it’s also the most effective.

What the proposal here adds is the ability to have an isolated packages directory (isolated from others, rather than from system packages) without having to make a full copy of Python or set environment variables. The PEP does need clarification to reflect the things that aren’t totally clear, and hopefully we’ll get a chance to do that before PyCon, but it really is simpler than what we have today.

I wonder how many of the advanced concerns could be addressed by having a single (optional) env variable to specify the directory containing __pypackages__ (defaulting to the script directory if not set)?

Then you could still “activate” environments if you really want, and also use them from scripts outside the project, but we don’t have to symlink/copy Python binaries around, install many copies of pip or mess with PATH.
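
As a rough sketch of how that might look (the PYTHONPYPACKAGES name and the helper below are made up for illustration, not part of PEP 582):

import os
from pathlib import Path

def pypackages_root(script_path):
    # Use the override directory if the variable is set; otherwise fall
    # back to the directory containing the script being run.
    override = os.environ.get("PYTHONPYPACKAGES")  # hypothetical variable name
    if override:
        return Path(override) / "__pypackages__"
    return Path(script_path).resolve().parent / "__pypackages__"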

I have a PEP 582 question related to scripts in project subdirectories.

The main example workflow it discusses is a project directory foo with a __pypackages__ inside.

If you invoke the Python interpreter from (directly) inside the project directory or invoke a script myscript.py (directly) inside the directory, __pypackages__ will be found.

With projects though, it seems common to have scripts in directories other than the project root (e.g. scripts/myscript.py). For these cases, my understanding is that running the script wouldn’t find __pypackages__. If this is the behavior, it seems like it would limit the usefulness of the feature even in beginner scenarios. For example, as soon as someone moved a script into a subdirectory, it would stop working.

It seems like there are at least a couple of possible solutions. One would be to use __pypackages__ in the current directory when running a script, even when the script is in a descendant directory.

Another would be to walk up the filesystem tree looking for __pypackages__ (just as the PEP describes for the os.py marker file), regardless of the current working directory. This possibility was mentioned a couple times above very briefly, but no one said why the behavior was rejected.

I don’t see these options discussed in the PEP, nor the scenario of what to do with scripts in project subdirectories. Addressing it seems like it would add to the feature’s usefulness.
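
For concreteness, here’s a rough sketch of the “walk up the tree” variant – purely an illustration of the idea, not what the PEP specifies (the version-keyed lib layout is borrowed from the PEP’s draft):

import sys
from pathlib import Path

def find_pypackages(start):
    # Check the starting directory and each of its parents for a
    # version-keyed __pypackages__ directory; stop at the first match.
    version = "{}.{}".format(*sys.version_info[:2])
    start = Path(start).resolve()
    for directory in [start, *start.parents]:
        candidate = directory / "__pypackages__" / version / "lib"
        if candidate.is_dir():
            return candidate
    return None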

2 Likes

They’re both rejected for security reasons, but that’s probably not clear in the current text (to be clear, we haven’t submitted this PEP for consideration yet, it was just found and brought up).

Searching up one level to allow for “Scripts” directories may be feasible. But we didn’t want to enable new ways to inject malicious packages by modifying other parts of the file system.

I’m sorry this is not very related to the technical details we are talking about here.

I am not sure if PEP 582 is the solution, but I welcome any attempt to make Python more friendly to teachers and students. I don’t think we are a small subset of users. You don’t need to look to China – it’s happening in every classroom, including in Australia.

There is a reason most school teachers use IDLE and Python Turtle: they just work out of the box. I hope this convenience can be extended to other packages.

Why is it so hard?

I just spent several hours this year helping every student install Python on their laptop (mostly Windows, but some Macs). I plan to use either arcade or pygame (pgzero), but I can’t see how I am going to make it work without spending another several hours.

The problem is not that we are teaching beginners – that is not such a big deal. The real problem is that schools are usually required to have a firewall, and there are a lot of legal restrictions on using IT.

Common issues:

Pip will not work normally.

I usually need to use --proxy if allowed. Then you have to show students how to work with the command line. I end up just doing it myself.

The internet is often slow.
I tested pip install arcade with --proxy and it did not work because it needed to download a lot of files (maybe I can look into the timeout setting to fix it). The bandwidth at school is usually shared by 500 to 1500 students, and we are moving more and more subjects online. It is no surprise the internet is slow when you need it.

We have a lot of students who know how to download and install Python. Also, it’s available on the Microsoft Store now, so it’s even easier for students. However, I’m not sure why the paths to python and pip are not added to PATH by default, so I had to fix it one by one. :frowning: This is crazy.

There have been some successes. Blender is very easy to install, and I can use its Python. Renpy is another easy-to-use tool (built with Pygame) – easy to install, update, and run games with.

Renpy https://www.renpy.org/latest.html

2 Likes

I’d like to raise a potential edge case that might be a problem, and likely an increasingly common one: users with multiple installations of the same version of Python. This is actually a common setup for Windows users who use WSL, Microsoft’s Linux-on-Windows solution, as you could have both the Windows and Linux builds of a given Python version installed on the same machine. The currently implied support for multiple versions would not be able to separate these, and could create problems if users pip install a Windows binary package through PowerShell and then try to run a script in Bash from the same directory, causing the Linux build of Python to try to use Windows Python packages.

I think the number of users with this kind of setup is only going to increase as time goes on.

I raised this issue on python-dev, but someone pointed me here. I didn’t know there was also this forum for discussion. I don’t know if bringing it up here now was the right procedure or not; I’m just trying to find the right people to raise my concern to, because I’m really worried about overlooked Windows users getting super confused if this moves forward!

3 Likes

Don’t worry about this, I’m all too aware that I don’t know macOS and Linux well enough to get it right there :slight_smile:

There are enough different points being brought up here that I’m just going to make an omnibus post of my own :slight_smile:

A caution on ‘python -m’ and the current working directory

The way “python -m” searches for modules to run currently matches the way Windows searches for executables: it searches the current directory first.

There’s a solid case to be made that it should instead behave more like *nix executable searches, and require a particular prefix to find local packages, and to ignore global packages when asked to run a local one. Specifically, just as you’d write ./some_local_script.py to run a Python script from the current directory, you’d also have to write python -m .some_local_module to run a module from that directory (there’d need to be a transitional deprecation period, but it’s an entirely feasible change to make, and would mean that python -m pip would never implicitly run code from the current directory, while python -m .myapp would never inadvertently run a system package).

(See https://bugs.python.org/issue33053#msg314192 for the context where this came up)

So PEP 582 needs to be designed in such a way that it still makes sense even in a world where python -m has been changed to work that way, rather than the way it works today. In particular, actually making this change would be specifically designed to eliminate the "pip.py in the current directory" problem, so if PEP 582 could reintroduce the problem, then that needs to be accounted for.

https://bugs.python.org/issue13475 is also worth reading, since it may make sense to include the --mainpath idea in PEP 582, as that would address the question of running scripts for a subdirectory while telling the interpreter to look for __pypackages__ somewhere else.

Replacing sys.path[0] may be better than supplementing it

If the PEP 582 packages directory exists, it could be a good idea to cut out the regular sys.path[0] calculation entirely. That way if you create a requests.py file to experiment with the requests module, it will just work, rather than your choice of script name keeping you from importing the real requests module.
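
To make the shadowing problem concrete, here’s a small demonstration of today’s behaviour (assuming requests is installed and a requests.py experiment file sits next to the script):

import sys

# Today sys.path[0] is the directory containing the script being run,
# so a local requests.py shadows the installed requests package.
print(sys.path[0])

import requests
print(requests.__file__)  # the local requests.py, not the one in site-packages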

Defining a pseudo-venv equivalent is problematic

Environment markers give projects a lot of flexibility in choosing which dependencies to install on which platforms. I believe the most commonly used ones are the following (a short evaluation example follows the list):

  • Python version dependent markers (e.g. installing extra backport packages on old versions)
  • Operating system dependent markers (e.g. installing extra deps only on Windows)
  • CPU architecture dependent markers (e.g. an optional C accelerator module that is only precompiled for some architectures)
  • Python implementation dependent markers (e.g. an optional C accelerator module that is only used on CPython and not on PyPy)
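
As a concrete illustration of what one of these markers looks like when evaluated (this uses the third-party packaging library, assumed to be installed):

from packaging.markers import Marker

# Evaluated against the running interpreter's environment: True only on
# Windows running a Python older than 3.8.
marker = Marker('sys_platform == "win32" and python_version < "3.8"')
print(marker.evaluate())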

Now, it may be that __pypackages__ isn’t intended to be portable between different operating systems and CPU architectures. In that case, PEP 582 should specify a metadata file format that installers can use to record the environment marker values in effect when each subdirectory was last updated (pipenv already does something like this, by capturing the active environment marker information when generating a lock file), and which interpreters can potentially use to emit a warning when the __pypackages__ metadata for a given version doesn’t match the active runtime.

If we don’t do this, then we’re setting users up for some incredibly difficult debugging sessions when they try to share a __pypackages__ directory between different environments (whether by copying it, by putting it on a USB key, or by putting it on a shared network drive).
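
A minimal sketch of the kind of record an installer could drop into the tree (the file name and schema here are made up, and a __pypackages__ directory is assumed to already exist):

import json
import platform
import sys

# Capture the environment marker values in effect when __pypackages__
# was last updated, so interpreters can warn about a mismatch later.
markers = {
    "python_version": "{}.{}".format(*sys.version_info[:2]),
    "implementation_name": sys.implementation.name,
    "sys_platform": sys.platform,
    "platform_machine": platform.machine(),
}

with open("__pypackages__/environment.json", "w") as f:
    json.dump(markers, f, indent=2)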

The “Yet another way to do it” concern is real

I’m really sympathetic to the goals of the PEP, and in particular the aspect where __pypackages__ is something an interpreter implementation picks up automatically at runtime, rather than requiring an external activation script.

At the same time, we already have a multitude of slightly different ways of people setting up virtual environments (or not) and very few tools or packages get fully tested across all of them:

  • global site-packages for the specific Python installation
  • per-user site-packages (potentially shared across Python implementations, but not versions)
  • “just a directory of packages” (whether bundled with something like zipapp or shiv or not)
  • virtualenv out-of-tree virtual environment
  • venv out-of-tree virtual environment
  • virtualenv in-tree virtual environment
  • venv in-tree virtual environment
  • pipx-style dedicated virtual environment (mostly similar to any other out of tree venv, but also with some aspects of a per-user site-packages install when it comes to entry points)

So I think the big alternative that PEP 582 needs to consider is the prospect of a __pyvenv__/<import_cache_tag>/ auto-activated virtual environment, as that shouldn’t require significant implementation work on the part of packaging and development tool authors - just a bit of additional naming logic along with their existing support for the “venv in-tree virtual environment” case. (And separating things out by import cache tag will help ensure that any Python installations using a particular implicit venv are sufficiently compatible with each other for that to work correctly)

In addition to creating less new development and testing work for installation and development tool authors, this approach also has the benefit of automatically defining how script installation will be handled (installed into the venv’s bin directory), and interoperating nicely with per-project venv managers like pipenv and poetry (both of which support having the managed venv inside the tree rather than outside it).

Whether or not to use the global site-packages is also covered, since virtual environments already have ways to specify that, and they also have access to *.pth files if you want to daisy chain environments together the way that pew does.

In other words, “It’s an in-tree venv, just with a special name that interpreters recognise automatically” means that the PEP genuinely doesn’t need to specify anything new on the installer side (rather than handwaving those details away as it does for the new tree layout); it only needs to specify which parts of the specially named in-tree venv directories interpreters will automatically pay attention to (in particular, their site-packages directories).
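
As a rough sketch of the extra naming logic an interpreter would need (sys.implementation.cache_tag gives values like cpython-38; the __pyvenv__ name is the proposal above, and the helper itself is just an illustration):

import sys
from pathlib import Path

def implicit_venv_site_packages(project_dir):
    # Look for an auto-activated in-tree venv keyed by import cache tag,
    # e.g. __pyvenv__/cpython-38/, and return its site-packages directory.
    venv = Path(project_dir) / "__pyvenv__" / sys.implementation.cache_tag
    if sys.platform == "win32":
        site_packages = venv / "Lib" / "site-packages"
    else:
        version = "python{}.{}".format(*sys.version_info[:2])
        site_packages = venv / "lib" / version / "site-packages"
    return site_packages if site_packages.is_dir() else None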

Doing things this way would also address the concern Chris Barker raised regarding potentially starting students down a path that doesn’t lead them smoothly into the world of full-fledged virtual environment management that professional Python developers are already using.

The fact that the internal layout of the venv is more complicated than the proposal in the PEP doesn’t need to be a big deal - both installers and interpreters already know how to manage venv layouts that match the expected layout of a full Python installation for a given platform, and we don’t expect anyone to be crafting these things by hand.

2 Likes

I mean, this is basically what it specifies. The main difference is that you run the regular python binary, not something that might be a copy, a symlink, or a redirect, and that may or may not be the same version as the one whose standard library it’s about to use.

The choice of subdirectory name isn’t that difficult (either we make people match pip and Python versions all the time, or let them get away with it most of the time – a mismatch results in ImportError in both cases, so it’s just a question of frequency).

Unfortunately, none of these suggestions handle the scripts case totally transparently, which is going to be the biggest pain. Again, it’s a matter of how frequently users have to do something extra (e.g. activate) to set up their system terminal properly. For venv, it’s all the time, while for this it would be only when necessary.

All the rest of the details fall out naturally, I believe. Then it’s just a matter of explaining it clearly enough that we don’t all have to take turns misunderstanding it in future :wink:

No, an in-tree venv isn’t what PEP 582 currently specifies at all – what it currently specifies is a variation on zipapp (where the dependencies are in a subdirectory rather than directly alongside the main module), which means it is incompatible with the way tools like pipenv and poetry currently work, and hence doesn’t provide a smooth migration path in either direction without a non-trivial amount of additional work on the part of the tooling developers.

It also means that such a tree would only be usable with interpreters that supported the new auto-activation semantics - users wouldn’t have the flexibility to choose to instead use explicit activation or shebang-driven activation the way they would if the auto-activated layout was consistent with the layout of a full virtual environment.

By contrast, if it’s specified as a full import-cache-tag-separated virtual environment, then we not only significantly improve the interoperability with existing techniques, we also open the door to future enhancements in app bundling, whereby naive tools could make multi-interpreter bundles just by including the entire pyvenv subtree, while cleverer ones could either do a deduplication pass for common dependencies after the fact, or else use “editable” installs to incorporate a separate dependency vendoring directory without duplicating the contents.

I don’t know if you’ve looked at the internals of venv recently, but it’s literally just an arbitrary name with a Lib folder in it and a script to set PATH to override python[.exe]. It doesn’t do any separation by version or platform unless you do it manually, and I already suggested we could allow a way to specify a custom name through an environment variable (PYTHONPATH would already work fine for this, but if you want a synonym for it then sure, whatever).

I don’t see any reason why we should optimize for minimal impact on recently released tools (are either of these out of beta yet?) rather than optimizing for the user scenarios. Since when has this ever been an argument? The tools will continue to work, and in fact will be able to make use of both approaches since this PEP extends sys.path without overriding it.

Multi-interpreter is a bad idea for this scenario. Just demand (or provide) the interpreter version that your application needs. (Or merge the trees where the extension modules use the import tag and you get this today without any changes.)

I wonder if you’ve got a specific scenario in mind that I haven’t considered? Maybe you could spell out something concrete or point to the examples of it in the wild?

I appreciate all the work that Steve and Kushal and others have put into this proposal, and especially for identifying this pain point and sparking all the discussion about how to solve it.

But I’ve thought about this long and hard for many months now, and I think I should say more clearly than I have previously: I’m pretty much convinced that this whole approach is irredeemably flawed. If I was its BDFL-delegate, I would probably be marking PEP 582 “rejected” now, to free up oxygen for discussing better solutions.

The pipenv/poetry style of approaching this is just strictly better in every way, and the last thing Python packaging needs is even more ways to do it that only solve half the problem.

3 Likes

Just to be clear, you don’t think there’s a world where Python doesn’t rely on environment variables to locate installed packages? (Sorry for the double negative, but inverting them both doesn’t have the same meaning - “Python will always rely on env vars to locate packages”)

I don’t understand the question. Does Python rely on environment variables to locate installed packages now? Is relying on environment variables to locate installed packages good or bad, and why?

Yep, right now it does, unless you launch it with a full path in argv[0] (the default on Windows, and apparently not elsewhere), which means you have to configure your terminal or process launcher in order to find your venv, and hence terminal management becomes an essential topic out of necessity. I jumped through quite a few hoops recently to make Windows work properly here (given the new Store restrictions on loading DLLs from outside the app).

If we could do the same thing without environment “activation” or symlinks, and also without a well-known directory name, I’d be all for it. But of those three, I think the well-known directory name has every advantage other than not being implemented yet, particularly when you treat “people who are not yet familiar with shell variables/symlinks” as a significant target audience.

I think a Python launcher is a better approach.

Having multiple Python installations is (sadly) very common, even for new Python users.

Activating a venv solved the “which python command is used?” problem.
But PEP 582 doesn’t solve this problem.
The user must remember “which python did I use when I created this project?” and use it correctly. That is too difficult.

Some users use pyenv (not pyvenv), some use direnv, some use pipenv, and some manually activate a venv.
This is one of the major sources of confusion for users.

A Python launcher that executes pip or python, with commands like pipenv’s, would be the best approach. For example (pyx is a dummy name for the launcher):

$ pyx init --python=/path/to/python  # creates project local venv
$ pyx install requests  # runs __pypackages__/bin/pip install
$ pyx exec command  # runs __pypackages__/bin/command
etc...

With this tool, PEP 582 can be used even for Python 3.6.

I would like this launcher to be implemented in a fast language (C, C++, or Rust).
pipenv (implemented in Python) and pyenv (implemented in shell script) are so slow…