PEP 582 - Python local packages directory

I don’t maintain conda, but @jezdez does (and has been hanging around here recently). I can also carry this to conda-forge/core for feedback, but I’ve gotten few-to-no responses in the past (example).

AFAICT, respecting stuff installed into __pypackages__ would be surmountable (like conda already respects existing pip-installed packages), but regarding this section of the spec:

I can almost guarantee that conda will never install anything into __pypackages__, but will keep installing into its respective environment $PREFIX (unless we get something like PEP 704 / PyBi, i.e. a guaranteed location for a (named) environment root path that contains /lib, etc.).

3 Likes

A couple notes:

  1. I saw way back in this thread that every instructor of newbies consulted liked this idea. Better change that to “most” – I’m an instructor of newbies, and I don’t like it. After all, there are already multiple ways to handle “environments” and dependencies – adding one more? How does that simplify things? I do agree that starting out with virtual environments before you get to anything else is NOT GOOD for newbies (I did try that once) – but simply using the base environment is perfectly fine for newbies – I get very far into my class before I ask them to install anything other than standard tools, like IPython and pytest. And they are not maintaining four applications, in production and development versions – they are working on a small amount of simple code – plain old pip install works absolutely fine.

  2. I’m not a conda dev, but I am a heavy, long-time user, and I don’t think this is compatible with conda at all [*]. You could ignore it with conda, and it would probably work within a conda environment, but it would create a serious and confusing mess. That doesn’t mean that CPython shouldn’t do it, but it’ll only create more confusion – the conda world will need to say “don’t use __pypackages__ with a conda environment”. And then all those nifty newbie-friendly tutorials that do use it will create a lot of confusion. And what about other not-specifically-Python package managers?

[*] Why can’t conda do this? Because conda is not a Python package manager – it is a general package manager. It manages Python itself, and any number of other libraries, etc. A conda environment is a self-contained directory of everything you need, and it is generally stored in the conda dir tree, not anywhere else among the user’s code. So it would make no sense for conda to install anything inside a __pypackages__ dir, and using a __pypackages__ dir with a conda environment would get ugly because it would only work with a particular conda environment, so now you’ve got TWO collections of packages to keep in sync – yeach!

7 Likes

This part I will update to clarify the questions you asked. Installers should not create the __pypackages__ directory itself, but only the directories inside it, as required by sysconfig.

Currently it uses the default CPython sysconfig values. I don’t want to specify how installers should install to that location (@dstufft asked me to make sure not to write anything explaining how pip or any other installer should work); that is outside of the PEP’s scope.

In the current implementation the virtual environment takes precedence; I will add an explanation of this to the PEP.

2 Likes

I’m sorry, but I am a pip maintainer just as much as @dstufft, and even though you’ve used this argument a few times, I do want the PEP to cover at least some implementation details for installers.

In my view, you can’t simply demand in a PEP that installers implement something, without any indication of how. In this case, it’s not so much about the actual coding of the change, but rather about where the installer gets the data from to do the install. An installer can’t simply guess what it’s expected to do with data from the wheel, it needs to be given that information by the Python interpreter. So it needs to know how to get that data - i.e., how to call sysconfig correctly.
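For illustration, here is roughly the kind of lookup I mean – a minimal sketch assuming the PEP defined a dedicated sysconfig install scheme (the “pypackages” scheme name and the example paths are my invention, not anything the PEP or CPython currently provides):

```python
import sysconfig

# Hypothetical: assumes PEP 582 registered a "pypackages" install scheme
# with sysconfig. No such scheme exists today; this only sketches the kind
# of interface an installer could code against.
paths = sysconfig.get_paths(scheme="pypackages")

# The installer would then lay out files according to these paths instead
# of hard-coding the __pypackages__ structure itself.
print(paths["purelib"])  # e.g. .../__pypackages__/3.11/lib  (illustrative)
print(paths["scripts"])  # e.g. .../__pypackages__/3.11/bin  (illustrative)
```

With something like that specified, a future layout change becomes an interpreter-side detail rather than something every installer has to track by hand.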

It might be relevant here that PDM does not implement PEP 582. I would say this is a direct consequence of the PEP not giving sufficiently clear details of how to determine the layout to use.

1 Like

If that’s the case, the PEP needs to acknowledge this, as it fundamentally undermines the idea that the proposal will make things easier for beginners (many of whom will want to use conda).

If the PEP cannot say that all installers must implement this, where does that leave it? If it’s optional, and pip, like conda, chooses not to implement the installer part, is what remains of any use?

3 Likes

I feel like we’re circling the “deciding the UX of our default tooling isn’t appropriate for our design documents” problem, a nerve I also hit with PEP 704.

And, I would disagree strongly. :slight_smile:

The layout has changed in the PEP (PEP 582: Updates to internal directory structure (#2750) · python/peps@949dc03 · GitHub), which is what put the two out of sync – not a lack of clarity in the PEP.

1 Like

I think this is in itself a problem. If we acknowledge that you can’t long-term just install things into e.g. ~/.local, then eventually users will need to learn about venvs, and how to clean up their local packages. This just kicks the problem down the road.

To be clear, I’m not saying that teaching them like this is wrong; at the end of the day there are only so many hours between sunrise and sunset. But, arguing against “another” environment approach in favour of not using them at all seems like a worse compromise. With PEP 582, one could very easily explain that everything gets installed there, and never mention the system packages or venvs; it’s a safer default.

3 Likes

Maybe so. The PEP is firmly stating that installers “must” install to __pypackages__ by default when present. That could indeed be construed as “deciding the UX”. I suspect @kushaldas might have reservations, though, as having __pypackages__ used by default is pretty much the point of this PEP at the moment.

On the other hand, what I’m concerned about is the PEP specifying enough core interpreter machinery to let clients (such as installers) implement the layout without needing to write their own implementation of the spec. That’s definitely not a UX matter IMO.

Whoops, sorry. I missed that detail. However, what I was trying to say was that if the PEP said that installers can read the layout from sysconfig in such-and-such a way, PDM could have used sysconfig and been protected against such a change (…although because the PEP isn’t implemented in sysconfig yet, that’s not actually an option, but hopefully my point is still clear).

On reflection, this statement comes across as more combative than I intended, and as such isn’t helpful. My apologies.

What I was trying to say was that this PEP reads a little like an uncomfortable mix of a core Python PEP (a description of a new feature that will be implemented - usually by the PEP author - in a specific CPython release) and a packaging interoperability standard (a requirement that all packaging tools work in a common way, to ensure that they work together).

Without the packaging standard side, the core feature is fine, but may lack sufficient value. I’m essentially neutral on whether the core feature goes in.

On the packaging side, though, @pradyunsg is right, this is getting terribly close to the question of whether standards should dictate the UX for individual tools, which is not something they have traditionally done. Furthermore, packaging standards don’t have the force of “you must do X” (there are usually multiple tools, each with volunteer maintainers and their own priorities), but rather “if you do X, you must do it this way”.

The packaging side of PEP 582 is difficult because:

  1. It specifies behaviour which installer maintainers may well have opinions about, and the maintainers of a particular installer may not be in agreement (e.g., pip) or may even disagree with the proposal (e.g., conda). So consensus is hard because there’s no clear view of what “the pip maintainers think”[1].
  2. It leaves a lot of the work of designing the implementation details to the tool maintainers, and doesn’t even offer a firm level of support from the Python stdlib, in terms of interfaces and APIs guaranteed to be present. As a result, the implementation could be a significant effort, and could easily get lost behind other, simpler and equally high priority tasks (particularly in small volunteer teams).
  3. It targets a specific Python version, which doesn’t necessarily match with the release schedules or the resource constraints of packaging tools.

If the PEP came with a reference implementation for pip, in the form of a PR, then a lot of the difficulties here could be bypassed – implementation resource is reduced to review and acceptance (or rejection!) of an existing PR, support of the PEP could be a matter of tool maintainers agreeing a consensus view for their tool “offline” as part of PR review, and timescales could be handled by targeting the PR at a particular release. But without a PR, there’s a lot riding on a fairly unspecified amount of work happening in a timely manner, and I think the PEP needs to be written to cover the possibility that this simply doesn’t happen.

But that doesn’t answer my question. And given the responses that have come up from people familiar with conda, the PEP really should note that conda is unlikely to implement this PEP, and address how the arguments change given that (for example) all three of the bullet points in the motivation section will remain unsolved if we accept that people could turn up for a course with either conda or the standard distribution of Python installed.

Maybe the simple answer here is to drop the whole installer side of the PEP and simply concentrate on the core change to how sys.path is calculated. The arguments are weaker without the installer side, certainly, but they remain valid if you phrase them in terms of “if installers choose to support a __pypackages__ install method, …” And once the core feature is in place, installers (and other tools like PDM and pipx, or even pip) can experiment with UX without needing people to install and invoke the current support wrapper. Timescales are longer, and more uncertain, but that’s pretty normal in the packaging world, unfortunately :wink:


  1. And I don’t know about the others, but I don’t like airing our policy disagreements in a general forum. ↩︎

2 Likes

I think this is in itself a problem. If we acknowledge that you can’t long-term just install things into e.g. ~/.local, then eventually users will need to learn about venvs, and how to clean up their local packages. This just kicks the problem down the road.

My core point is that there is not absolute consensus that this proposal would be better for teaching newbies.

But yes, it IS “kicking the can down the road”, which is just fine – I’m trying to teach folks about, e.g., what an iterable is – it’s just fine to use plain old Python, and not get into (yet) the whole issue of dependencies and deployment and what all else.

eventually users will need to learn about venvs …

Well, maybe – in fact, not everyone learning Python is going to be developing production systems – I’d argue that most aren’t [*]. And even if they are, they may end up needing to use conda environments, or who knows what other solution. I know I was very productive for years before I started using environments – and I never did use virtualenv – I’m heavily dependent on conda environments because for the kind of work I do, it’s the best solution.

I think this is an issue with a lot of the packaging, etc. systems and documentation – the folks developing and documenting them are, pretty much by definition, thinking about, and worrying about, and documenting what systems developers need. But not everyone is a systems developer.

Python is still very, very useful as a “scripting language” – it can, and I think should, be taught to newbies with a focus on the language, rather than on the systems surrounding it.

Anyway, I agree this PEP is probably a bit easier than the existing environment systems, but having yet another way to do something similar seems to me to be adding, rather than removing, complication.

[*] I teach for the Univ. WA continuing education program – over the years, I would say it is absolutely a minority of my students taking our Python Certificate program that are going to (in the short term, who knows in the long term?) be developing systems that need environments. Some want to automate what they used to do by hand in Excel, some want to do sys admin scripts, some just want to learn what “coding” is all about, some want to do data analysis, etc, etc.

4 Likes

Which use-cases are “advanced”? The PEP draft only mentions scripts, but are there any other features that are missing? I don’t think it’s a good way forward to label this PEP “beginners-only” – while it is understandable that it does not do everything venvs do, many non-beginners will find the feature set good enough for themselves. The PEP should list the most important things missing to help people make an informed decision.

Also, the latest draft still mentions keeping system site-packages in sys.path:

This PEP proposes to add a new step in this process. If a __pypackages__ directory is found in the current working directory, then it will be included in sys.path after the current working directory and just before the system site-packages.
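For concreteness, that would mean an ordering roughly like the following (the paths are illustrative examples of my own, not the PEP’s exact values):

```python
import sys

# Illustrative sys.path for `python script.py` run from /home/user/project
# under the quoted rule, on a hypothetical 3.11 install:
#
#   /home/user/project                            <- current working directory
#   /home/user/project/__pypackages__/3.11/lib    <- new PEP 582 entry
#   /usr/lib/python3.11                           <- standard library
#   /usr/lib/python3.11/site-packages             <- system site-packages
print("\n".join(sys.path))
```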

I’m worried that this behaviour will severely degrade the user experience with PEP 582, because mixing system site-packages and PyPI-sourced packages doesn’t always go well. The system package might be in an older version, or can be modified by the distribution[1] in unexpected ways. While installing something, pip/the installer might decide the site-packages version is fine[2], and thus lead to more confusion, especially for beginners, and especially in cases where there is a large group of people with diverse systems trying to do the same thing (e.g. university classes or PyCon workshops).


  1. Hi, Debian! ↩︎

  2. And if the existing version matches the requested range, that’s pip’s default behaviour, unless --upgrade-strategy=eager is passed. ↩︎

Tone is hard to gauge in these contexts, so let me explicitly assert that mine is friendly!

I agree that we can’t know that yet, but I do think that if students don’t currently learn about env management, then PEP 582 being a default won’t change that. But, it will be a longer-term solution that saves from “oh, my env is broken and I don’t know why” problems.

I’m not so sure. I don’t know when you switched to these tools, but I would wager that it’s harder to avoid these days given just how many dependencies everyone uses in their packages; conflicts are easier to run into.

I could see a world in which PEP 582 is treated as .venv, but I don’t know enough about the motivations for separating the two at this point.

Over the years, among the folks I taught [1], not everyone developed production systems. But, after learning the basics of the syntax, all of them loved to play around with the various modules which are not part of the standard library. Once upon a time we had to install those modules from OS packages and then later directly from PyPI via pip. And almost no one used conda (except 2 or 3 people in my whole teaching career).

You have to remember that our community is vast; just because I or you don’t see something as a problem does not mean others in the community don’t face a similar problem.

I am sure the authors of every data science/ML project will disagree with that; they are not writing their code or documentation for operating system developers.

I am sure we can have multiple papers written by folks around the world on “What is Python?” or “How to teach a programming language the best way?”. There is no single solution that fits everyone.


  1. I have taught Python since 2006, in colleges/universities/schools (and to corporate folks) in different parts of the world. ↩︎

2 Likes

I have worked for several companies, and none of them used conda.
I have also met a lot of data scientists who never used conda.
I’m not trying to say no one uses conda – everyone uses the best tool for their job, and if you are happy with conda, it seems like this PEP won’t matter much to you.

I have the feeling that if this PEP gets approved, it could be one of the most popular and most used features from day one… The popularity of PDM is partly because of this PEP, there are 300 messages in this discussion, and there are a lot of blog posts about this as well.

I’ve made my point, so only a couple comments:
“Tone is hard to gauge in these contexts, so let me explicitly assert that mine is friendly!”

Mine too – I was responding to a post WAY back in this thread that indicated that there was consensus among folks that teach Python to newbies about this – I’m just saying it’s not so simple.

Of course, and that’s not what this thread is about, so I’ll stop here.

I said “systems developer”, not “operating systems developer” – I can’t imagine anyone is using Python to develop an operating system.
These aren’t well-defined terms, but what I mean is “complex systems with a large number of interdependencies” – e.g. web services, etc. On the other end are single-file scripts – and there is a LOT in between.

But I AM talking about “data science/ML” folks – and THEY use conda a great deal – yet they are not the target for much of the packaging community’s work. In fact, conda was developed in direct response to the core Python community (I think Guido himself) essentially saying: we’re not going to solve your problems, you should probably develop your own solutions. (Conda: Myths and Misconceptions | Pythonic Perambulations)

And a lot of data science folks are using JupyterHub, and Notebooks, and all that – I’m not sure how this would all fit into this PEP, but I suspect not well.

Maybe that’s neither here nor there for this discussion, but it would be nice if the interaction with other packaging systems and workflows were kept in mind.

2 Likes

I figure I’ll toss my hat into the ring here.

Wearing my “Average Everyday Python Enjoyer” hat, this PEP fixes some of the gripes I’ve had with packaging for quite some time. I see a lot of discussion about how this is a “replacement” for virtualenvs.

How I see it is that __pypackages__ is a fix for a problem that virtualenvs happened to cover, but by sheer accident, as well as a fix for problems that virtualenvs create in the process.

Take a typical PyTorch/GPU compute project: there’s at least one or two versions of CUDA somewhere, which have weird and sometimes awkward configuration. These libraries are finicky to configure on Windows on the best of days, troublesome to configure under Linux on the worst of days, and overall come with a high degree of frustration. Anything related to TensorFlow, for instance, will inevitably be full of configuration that is best done system-wide.

If I use a virtualenv, what I get inside that virtualenv is “Python stdlib + whatever the virtualenv provides.” This means that I can’t share an installation-wide configuration of these libraries, which are often patched by my OS vendor. Virtualenv also means that my brew-managed Python regularly falls out of alignment with what virtualenv wants.

If I’ve read the thread right (with… three years of history), one of the uncertainties is “where do you look for __pypackages__/.../?” My take (roughly sketched in code after this list):

  • Adjacent to the module that __main__ is in
  • Adjacent to any .(git|svn|mercurial|cvs|etc) directory
  • Adjacent to a pyproject.toml
  • Search upward until you find a .(git|svn|etc) directory or $HOME or /.
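To make that concrete, here is a rough sketch of the search I’m imagining – illustrative only, since PEP 582 as written only checks the current working directory (or the script’s directory), and the marker set below is my own guess:

```python
from __future__ import annotations

from pathlib import Path

# Sketch of the upward search proposed above. NOT what PEP 582 specifies;
# the marker names and stop conditions are illustrative assumptions.
MARKERS = {".git", ".svn", ".hg", "pyproject.toml"}

def find_pypackages(start: Path) -> Path | None:
    for directory in (start, *start.parents):
        candidate = directory / "__pypackages__"
        if candidate.is_dir():
            return candidate
        # Stop at a repository/project root marker, or at $HOME.
        if any((directory / marker).exists() for marker in MARKERS):
            return None
        if directory == Path.home():
            return None
    return None
```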

Putting on my “I work in a corporate environment” hat: at $dayjob, I go through contortions to set up the Python path in our build and development tools since we don’t use pip (for handwave reasons). Having this would actively simplify our workflow since it would create a cleaner mechanism for handling package-local dependencies, as well as make cross-version unit testing much easier (a situation we currently have to spin up a bunch of virtualenvs to handle safely).

This would absolutely be a boon to those who use PyTorch and other ML libraries in a corporate environment since it gives a chance for the distribution to control the GPU/etc specific bits and for developers to use the libraries they need without having to go through the process of setting up a virtualenv and redoing work. ML and NLP are full of mutually exclusive libraries and things that just don’t work well together.

3 Likes

This does a really good job of explaining one of the major reasons I don’t think PEP 582 is a good idea: AFAICT there’s just no way a regular python foo.py invocation can search all those places before starting. The python command is the lowest-level way to invoke the interpreter, so it needs to be fast and predictable and can’t make major backwards-incompatible changes. This is problematic on all three axes – more directories to search adds startup latency, and suddenly directories in far-away places could start affecting commands that they never affected before.

That’s why PEP 582 only searches the current working directory or script directory. But it’s kind of a compromise that makes no-one happy? It’s not what PEP 582 users really want, but it’s the most they can get.

OTOH the problem here is trying to use the same python command to handle both the low-level and the high-level cases at once with a single UI. If you use some kind of front-end launcher for the high-level interactive use cases, then the conflict goes away and you can make it as smart as you want.

9 Likes

I am really confused by this – as I understand it, the CUDA libs, and all that configuration, are not Python – they are “system” libraries. And THAT is exactly the problem that conda was designed to address – this sure sounds like what conda environments are for.

And if your environment doesn’t use pip anyway, then keeping a conda channel for your systems would only make things easier.

Can you really put all that CUDA stuff just in the Python packages?

Hmm – looking at the PyTorch site, they seem to offer both conda and pip packages, so there is a way.

Yes, it’s been this way for a while now. Nvidia made it even easier and ships wheels with CUDA: nvidia-cuda-runtime-cu11 · PyPI

I build systems that support ML Platforms / Data Science at $dayjob. We use torch, tensorflow, sklearn and all the usual suspects, and we happily use pip and don’t use conda because it doesn’t interact well with the rest of our systems and tools. We only need a Python package index and a way to manage and install these Python packages, so that’s why we use PyPI mirrors and pip.

So as folks have said before, the Python community is large, and it’s not as simple as saying that “data science people should use conda”.

3 Likes

Nor can you say “only data science people should use conda” :slight_smile:

One other issue is the security implications. I think I saw this in a different context (apologies if I am repeating something that was said here previously), but to be secure, you need to search upward but stop if you hit a directory owned by someone other than the current user. Otherwise an attacker can add a __pypackages__ somewhere above you and inject libraries into your search path.
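Purely as an illustration of what that check entails (POSIX-only – os.getuid() and st_uid don’t apply on Windows), every step of the walk needs a stat call and an ownership comparison before it can be trusted:

```python
import os
from pathlib import Path

# Illustrative only: an upward search that refuses to cross a directory
# owned by another user. Windows would need a different ownership check.
def safe_find_pypackages(start: Path) -> "Path | None":
    uid = os.getuid()
    for directory in (start, *start.parents):
        if directory.stat().st_uid != uid:
            return None  # owned by someone else – stop, don't trust it
        candidate = directory / "__pypackages__"
        if candidate.is_dir():
            return candidate
    return None
```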

That sort of checking is definitely not something we want to have to do in the startup code…

1 Like