PEP 582 - Python local packages directory

cs01 · March 1, 2019, 1:52am

I came across this PEP a few weeks ago and thought it was really interesting. I wrote a blog post about it and a proof of concept called pythonloc. Overall I like the idea, but I think some minor implementation changes can have a big effect on how useful and widely adopted this feature becomes. I wanted to share some of my feedback here.

Falling back to `site-packages` reduces usefulness of `__pypackages__`

If I am developing an app or library and I want to test it, I need to know exactly which packages are available. For example, if I blow away __pypackages__, then run pip install . and try to run my code, I cannot guarantee all my dependencies came from that pip install .; some could have been in the fallback location of site-packages.

So because I am not 100% certain which libraries are being used, I basically can’t use it for development. Almost everyone, including tools like Pipenv and poetry, will still have to use virtual environments to guarantee deterministic package resolution.

I suggest only searching __pypackages__ if that directory is found. If that is done, then many more people can stop using virtual environments if they choose to.

Namespacing of packages in `__pypackages__` does not include OS

One of the really nice things about this is that the directory structure can be copied directly and run on different machines. However, if code is copied from a Windows machine to a Linux server and run there, it may fail (with cryptic error messages) due to OS differences. I suggest namespacing on OS as well as Python version, such as __pypackages__/windows/3.6/....
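A minimal sketch of how such a namespaced directory could be computed, assuming the layout `__pypackages__/<os>/<major.minor>`. The function name `namespaced_pypackages` and the use of `platform.system()` for the OS component are illustrative assumptions, not anything the PEP specifies.

```python
import platform
import sys
from pathlib import Path


def namespaced_pypackages(project_root, os_name=None, version=None):
    """Return <root>/__pypackages__/<os>/<major.minor> (hypothetical layout)."""
    if os_name is None:
        # Assumption: the lowercased platform name is used as the OS component.
        os_name = platform.system().lower()
    if version is None:
        version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return Path(project_root) / "__pypackages__" / os_name / version


# Explicit arguments reproduce the example path from the post.
print(namespaced_pypackages(".", os_name="windows", version="3.6"))
```

This only distinguishes operating systems by name; it says nothing about OS version or architecture.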

Running binaries or “scripts”

When installing a package with entry points, they get buried somewhere in __pypackages__. The node/JavaScript world has a similar problem, and they solved it with a tool called npx which

executes <command> either from a local node_modules/.bin, or from a central cache, installing any packages needed in order for <command> to run.

I updated pipx to work similarly (i.e. pipx run flake8, or to only search __pypackages__ and not a temporary installation, pipx run --pypackages flake8). pipx doesn’t have to be the only solution, but a simple program that can determine the expected local bin/ dir can fill this function. For example, poetry or Pipenv could have this functionality added.
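As a rough illustration of "a simple program that can determine the expected local bin/ dir", here is a sketch that looks a script up in a local directory. The helper name `find_local_script` is hypothetical, and the layout `__pypackages__/<version>/lib/bin` is an assumption; the PEP does not pin down where scripts are installed.

```python
import os
import sys
from pathlib import Path


def find_local_script(name, project_root=".", version=None):
    """Return the path to `name` in the local bin dir, or None if absent."""
    if version is None:
        version = f"{sys.version_info.major}.{sys.version_info.minor}"
    # Assumed layout; adjust if the installer places scripts elsewhere.
    bin_dir = Path(project_root) / "__pypackages__" / version / "lib" / "bin"
    candidate = bin_dir / name
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return candidate
    return None
```

A wrapper could then pass the returned path to `subprocess.run` together with the user's arguments, or report that the script is not installed locally when the result is `None`.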

Installing/Uninstalling from `__pypackages__`

The PEP is a little hand-wavey on how this will work and mentions pip adapting to it.

After doing a fresh check out the source code, a tool like pip can be used to install the required dependencies directly into this directory.

In another example scenario, a trainer of a Python class can say “Today we are going to learn how to use Twisted! To start, please checkout our example project, go to that directory, and then run python3 -m pip install twisted.”

In theory, this sounds great. pip will be modified to install to __pypackages__ (by default?). And if a user wants to install to site-packages or their user dir, they can use appropriate flags. However, this is a big change in behavior for pip. Are pip maintainers on board with this? The adoption of the __pypackages__ convention depends not only on Python running from that directory but on how easy it is to manipulate packages in it too.

Freezability

Along the lines of the last section, being able to create a lockfile of everything in __pypackages__ would be incredibly useful to the community. I’m not sure this is possible with current tooling, but this use case should definitely be considered.
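One possible way to list what is installed in a single directory is sketched below. It assumes Python 3.8 or later (for the standard-library `importlib.metadata`) and that packages were installed with their `.dist-info` metadata; the function name `freeze_directory` is made up for this example, and the output is a pin list rather than a full lockfile with hashes.

```python
from importlib import metadata


def freeze_directory(packages_dir):
    """Return sorted 'name==version' lines for distributions in packages_dir."""
    lines = set()
    # Restrict discovery to the given directory instead of all of sys.path.
    for dist in metadata.distributions(path=[str(packages_dir)]):
        lines.add(f"{dist.metadata['Name']}=={dist.version}")
    return sorted(lines, key=str.lower)
```

Calling it with the local library directory and writing the result to a file, one line per entry, gives a requirements-style list of exactly what that directory contains.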

pf_moore · March 1, 2019, 9:27am

(BTW, there’s a typo in your subject - it’s PEP 582, not 528)

Personally, no. I like the idea in principle, but in practice, as you say, it seems like a pretty major change in behaviour and something I’d expect to be thrashed out in far more detail before assuming it’ll “just happen”.

Having said that, @dstufft is one of the authors, so I assume he was on board with the assumptions being made here, and can probably clarify better than I can.

brettcannon · March 1, 2019, 7:56pm

PEP 582 for a convenient link.
@kushaldas @steve.dower @dstufft @ncoghlan so they know about this topic.

When the idea of this PEP was being discussed at the 2018 dev sprints this was a contentious topic (this is more of an FYI as I fell on the “ignore site-packages” side).

But how far do you take this? For instance, that doesn’t cover OS version like wheels do, so do you actually namespace it by the platform tag of the most strict package that you installed? And if you do that, then you have now made wheel tags part of Python’s stdlib, which they currently are not so that they can evolve independently of CPython’s release cycle.

So I personally disagree with making this a deployment solution.

steve.dower · March 1, 2019, 8:22pm

Unsurprisingly, I have a lot to say, more than I’m going to type on my phone

The idea with site-packages was that pip would ignore it for the purposes of dependency resolution, but Python would allow importing from it even when local packages are found (in large part to avoid having to install pip yet again).

I don’t like characterising this as “node_modules for Python” as people immediately assume heavily nested dependencies. This is far more like venv without activate scripts.

Donald specifically requested we not specify pip changes in the PEP, so we will need him to chime in with his thoughts. But there is an assumption that other tools will need changes for this to all work properly - ecosystem-wide problems can’t be solved unilaterally (except by inventing a new ecosystem).

bernatgabor · March 2, 2019, 5:58pm

I’m not sure I love the whole concept of this PEP. My main concern is that it would help in a very small subset of problems, and once one tries to generalize, one gets to something like:

project
├─ __pypackages__
├─ a
│   └─ __pypackages__
└─ b
    └── __pypackages__

It will be really hard to manage. Why do we need to activate envs? One can always just do

python -m venv env
env/bin/pip install whatever

The only benefit is saving on typing the env/bin/pip part, which can be easily hidden/automated behind countless shell aliases.

steve.dower · March 2, 2019, 6:37pm

One of those very very small subsets is teaching first-time programmers how to use libraries, or providing them with helper modules that they can use without having to see. If you also want to teach them “countless shell aliases” as a prerequisite, be my guest.

I don’t think it’s a coincidence that every single teacher who has seen this idea absolutely loves it.

bernatgabor · March 2, 2019, 6:53pm

I’m not debating that this is great for newcomers. However, it feels like yet another way of doing things for the sake of newcomers only. My doubt is more along the lines of: do we need first-class support in the interpreter to achieve this? Or is it something we can do with a tool that reuses existing mechanisms inside the interpreter (i.e. set PYTHONPATH, shim commands to the __pypackages__ folder if it exists)?

steve.dower · March 2, 2019, 7:00pm

For the most part, it can be done by telling students “python” is spelled “pythonloc” sometimes. Though making that available on arbitrary machines is its own challenge.

And for something so simple, why not put it in the core runtime? If there is no __pypackages__ folder in the directory of the file you’re launching, it’s as if it just isn’t supported and you’re on your own, just like today. This doesn’t make venv redundant, it just raises the bar before you need it.

You may not have heard, but there are going to be literally millions of students starting with Python over the next year or two as China, France and other countries start mandating it in schools. Every point of friction for both teachers and students will make them regret that choice, so I would love to get ahead of the problems we already know about.

steve.dower · March 2, 2019, 7:04pm

Also, the schools aren’t going to get Python from the core team - they’ll get it from companies willing to support them. So we’ll see more hybrid not-quite-Python versions being used in order to fix or work around things that are too complex. None of the feedback will make it to the core team - we’re seen as too hostile toward companies for them to ever consider that we might appreciate their feedback (or, gasp, contributions).

So Python becomes known to students by another name with another interface and they wonder why “real Python developers” have such complicated tools.

bernatgabor · March 2, 2019, 7:15pm

I suppose then it’s alright as long as we advertise it as something for beginners/one venv driven use cases.

Falling back to site-packages reduces usefulness of __pypackages__

Granted, if we do not fall back we MUST install pip again. However, who creates this __pypackages__ directory, if not pip? If it is pip, it shouldn’t be that hard to install pip on first use by matching the host version. I assume pip will have some --here flag that creates the folder, no? Then again, if we must, let’s include the global site-packages but remove the local (user) site-packages. We aim it at students, as I understand. In universities, users mostly can’t change the global site-packages, but can easily install packages into their user site-packages. I would try to minimize the problems caused by something being installed in the user site-packages.

Namespacing of packages in __pypackages__ does not include OS

Agreed, the scope is not to create deployable packages. Furthermore, many packages are platform-specific (numpy, e.g.), so this dependency should be very much present in the folder structure.

Running binaries or “scripts”

But Python already has -m; shouldn’t we push using python -m flake8 instead of pipx or something similar?

Installing/Uninstalling from __pypackages__

By default? I would say that’s out of the question for pip. More likely a flag would be needed to turn on the __pypackages__ behaviour, for backwards compatibility.

Freezability

I think what you’re describing is possible with current tooling. If we fall back, though, reproducibility is in question. It would be hard for a tool to check whether something gets satisfied from the global or the local location, especially for dependencies that are only required at runtime.
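For module names that are known up front, a check along these lines is possible; the hard part remains imports that only happen at runtime. This is a sketch using the standard `importlib.util.find_spec`, with a hypothetical helper name `resolves_from`, and it only reports where the import system would currently find a top-level module.

```python
import importlib.util
from pathlib import Path


def resolves_from(module_name, directory):
    """Return True if module_name would be imported from inside directory."""
    spec = importlib.util.find_spec(module_name)
    # Built-in, frozen, and namespace modules have no usable file origin.
    if spec is None or not spec.origin or spec.origin in ("built-in", "frozen"):
        return False
    origin = Path(spec.origin).resolve()
    return Path(directory).resolve() in origin.parents
```

Running this over a project's declared dependencies would flag any that are being satisfied from outside the local directory, but it cannot see names that are only computed or imported dynamically.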

cs01 · March 5, 2019, 12:21pm

Agreed that only the most strict version makes sense to use, and that doing that isn’t a practical solution. So I revoke my suggestion.

I understand, and I see your point about not having to re-install. But there are cases where it’s preferable to raise an error if a package isn’t found locally (and is installed in site-packages).

For example, I think a really desirable workflow is to run something like pip install pip.lock --here then run python app.py and be certain that it’s only looking in __pypackages__.

This would result in hermetic, repeatable package resolution and eliminate “runs on my machine” type of issues, which is currently a large use-case for Virtual Environments (and Pipenv/poetry).

Since not searching site-packages would be a (surprising) change from current behavior, maybe a flag could be added to python such as --pypackages, --here-only, etc. where Python only searches the current working dir and the appropriate __pypackages__ directory.
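To make the "local only" idea concrete, here is a sketch of how a search path could be filtered so that site directories are dropped and the local directory comes first. No such flag exists in Python today; the function names `local_only_path` and `current_site_dirs` are hypothetical, and `site.getsitepackages()` is assumed to be available (it may be missing inside some older virtualenv-created environments).

```python
import site
from pathlib import Path


def local_only_path(search_path, site_dirs, local_dir):
    """Drop every site directory from search_path and put local_dir first."""
    excluded = {Path(p).resolve() for p in site_dirs}
    kept = [p for p in search_path if Path(p).resolve() not in excluded]
    return [str(local_dir)] + [p for p in kept if p != str(local_dir)]


def current_site_dirs():
    """Global and per-user site-packages directories for this interpreter."""
    dirs = list(site.getsitepackages())
    dirs.append(site.getusersitepackages())
    return dirs


# Possible use at startup (not executed here):
#   sys.path[:] = local_only_path(sys.path, current_site_dirs(), local_dir)
```

The standard library entries are left in place, so only third-party packages are restricted to the local directory.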

On that note, @bernatgabor listed out a sample directory tree with nested __pypackages__ folders. My understanding is that would not be the case, is that correct? Also, venv has the --system-site-packages option. Hopefully this PEP can have similar flexibility.

The -m only works for a single entry point, and that’s only if the developer included a __main__.py file. But there could be many, and there could be scripts installed by dependent packages. For example, jupyter installs no scripts of its own, but has many dependencies which install executable files to __pypackages__/3.6/lib/bin, for example:

easy_install             jsonschema               jupyter-migrate          jupyter-serverextension
easy_install-3.6         jupyter                  jupyter-nbconvert        jupyter-troubleshoot
iptest                   jupyter-bundlerextension jupyter-nbextension      jupyter-trust
iptest3                  jupyter-console          jupyter-notebook         pygmentize
ipython                  jupyter-kernel           jupyter-qtconsole
ipython3                 jupyter-kernelspec       jupyter-run

uranusjr · March 5, 2019, 12:55pm

This can be easily (IMHO) solved by submodules, e.g. python -m jupyter.console. For Jupyter specifically, however, the real issue is (again, IMHO) that neither site-packages nor __pypackages__ is a good way to deploy it (tying back to the app deployment story topic).

pf_moore · March 5, 2019, 1:38pm

Agreed. Or using console scripts installed in a “proper” packages directory. Or using some other existing application deployment method.

If you’re installing something in __pypackages__, then I’d assume it’s in support of the Python script(s) in the directory that contains __pypackages__, and so you would not be interested in any executables that may be installed - your interface to the installed packages will be from your Python script using the package’s python API.

I don’t think it’s a big enough deal to want to explicitly remove the console scripts from the installation, but equally I think that anyone running a console script from __pypackages__ is doing something wrong…

steve.dower · March 5, 2019, 4:03pm

Yeah, the console scripts issue is probably the thorniest part, but simultaneously I’m becoming more convinced that the majority of incoming Python users do not want to be tied to the console and are only doing it because we force them into it. So as tools learn about -m, and packages learn to support it, it will become less of an issue.

And if console users have to run some special command to update their PATH to avoid typing “python -m” before their commands (maybe we can call it… activate) but every other use case works without relying on environment variables, I think that’s a great place to end up.

steve.dower · March 5, 2019, 4:05pm

I thought Jupyter already had subcommands for these, and the long names were legacy? So it’s python -m jupyter console or python -m jupyter notebook.

Agreed that the real problem is that Jupyter is more of a self-contained app than a library though. Just pointing out a valid and generally well understood pattern that already exists for multiple entry points.

brettcannon · March 5, 2019, 9:28pm

Correct.

Possibly not because the current solution is literally just a default sys.path entry when the interpreter runs while the latter adds more complication as it now becomes a configuration option somehow.

bernatgabor · March 5, 2019, 10:15pm

Why would that not be the case, what would stop users doing it?

steve.dower · March 5, 2019, 10:43pm

(I assume you mean what’s to stop people nesting __pypackages__ folders down the hierarchy?)

What stops them is the logic of how they work:

python /path/to/my/script.py

This adds /path/to/my/__pypackages__/<version> to sys.path if it exists, and then runs as normal. That’s it. There’s no logic anywhere to recursively search for them or to add it to any subpackages.

(Edit: Add the <version> part of the path, which I personally still don’t really care for, but since it seems more likely to prevent issues than to cause new ones I’ll concede the point.)
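The lookup described above can be sketched in a few lines. This is an approximation of the behaviour as described in this post, not the reference implementation; the function names are invented, and placing the directory at the front of the path is an assumption since the post does not say where it goes.

```python
import sys
from pathlib import Path


def local_packages_dir(script_path, version=None):
    """Return <script dir>/__pypackages__/<version> if it exists, else None."""
    if version is None:
        version = f"{sys.version_info.major}.{sys.version_info.minor}"
    candidate = Path(script_path).resolve().parent / "__pypackages__" / version
    return candidate if candidate.is_dir() else None


def extend_path_for(script_path, search_path, version=None):
    """Return a copy of search_path with the local directory added, if any."""
    local = local_packages_dir(script_path, version)
    if local is None:
        return list(search_path)
    # Assumption: the local directory is searched before everything else.
    return [str(local)] + list(search_path)
```

Only the directory containing the launched script is checked, so a `__pypackages__` folder in a parent or sibling directory is never picked up.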

pradyunsg · March 9, 2019, 8:39pm

I’m on board for this idea. I’ve been on both sides of the classroom – learning and teaching – and this would be great.

My main concern currently is fleshing this out and how this would look on pip’s side of things. I think @dstufft has likely spent time thinking about (at least) the latter, so I look forward to hearing from him on this front.

cs01 · March 11, 2019, 4:19pm

What if one wants to run a specific version of pytest/tox/black/flake8/mypy, etc. on the code? You’d need to use a virtual environment. Is it wrong to not want to create one?

Running these entry points is a desirable feature in a similar community; JavaScript/node has npx, which they/I really love. The massively popular babel, among others, recommends its use in their docs. Whether you think it is a good idea or not for Python, it’s likely going to be one of the most brought up questions if PEP 582 is accepted.

PEP 582 was discussed on Python Bytes #117. They were really receptive to the idea, but the first thing they mentioned as “missing” was the entry points. Another interesting thought they brought up was along the lines of “what if the content of __pypackages__ was the same as a virtual environment, and if you wanted, you could activate/deactivate it?”

What would they use instead?

I love console entry points (probably no surprise since I authored pipx). But they really do fill a need for many. In many ways, it’s the closest thing Python has to distributing an executable binary. I made pipx because I was trying to distribute a standalone application, and all my README instructions ended up being so cumbersome to non-python people: create a virtual environment, pip install, okay NOW you can run a simple command from the terminal.

Sort of a meta point that has been on my mind is the approach for making these changes. The packaging space is big and complex, with numerous use-cases. It seems like this discussion is revolving around personal preferences and intuition about user behavior. Seems like a (semi-)exhaustive list of workflows people need, from user to expert, laid out somewhere would be helpful (to me at least). We could then discuss which of those workflows and demographics should be addressed by this PEP. Right now I can’t really tell who we’re targeting or which workflows we’re targeting.
