Why doesn't pip write installed packages to pyproject.toml?

I skimmed PEP 735: Dependency Groups in pyproject.toml and could not find a reference to this: the discussion reminded me that pip still cannot write dependencies to pyproject.toml. This is a very welcome PEP, but I can’t help thinking it would best serve developers if pip could write the dependencies into each group itself.
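For reference, this is roughly what declaring dependency groups in pyproject.toml looks like under the PEP (a sketch based on my reading of it; the group names and packages are just examples):

[dependency-groups]
test = ["pytest", "coverage"]
docs = ["sphinx"]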

@baggiponte Are you inquiring about something like a pip add command?

Pip isn’t a project manager, it’s an installer. If you want a tool to manage writing your pyproject.toml for you, you want a project manager like PDM, Hatch or Poetry. I assume those tools will add capabilities to manage dependency groups if PEP 735 gets accepted.

4 Likes

Thanks for the prompt replies!

@baggiponte Are you inquiring about something like a pip add command?

I guess so. What do you think?

Pip isn’t a project manager, it’s an installer. If you want a tool to manage writing your pyproject.toml for you, you want a project manager like PDM, Hatch or Poetry. I assume those tools will add capabilities to manage dependency groups if PEP 735 gets accepted.

I understand. I actually use PDM regularly and I am positive they would support PEP 735. It’s just odd to see Python proposing standards but not adopting tools that use them.

I’ve been a Python developer for four years now (not much, compared to a lot of people in this forum) and I have always struggled to introduce modern packaging practices where I worked because of this. Most of the people I have worked with are used to pip install and pip freeze (or writing random dependencies by hand in requirements.txt files): pyproject.toml feels like just a nuisance when it’s yet another file to copy-paste and fill in by hand. It’s not much better than a requirements.txt and a setup.py.

Honestly, it’s hard to blame them. I lose them when I tell them they can’t use pip the way they always have and that they need to learn yet another tool (or even more than one). The problem is that this happens even in the biggest companies. Even OSS projects from NVIDIA don’t adopt pyproject.toml and instead copy-paste the dependencies coming out of a pip freeze. I submitted a PR to fix this, but deferred choosing a project manager for the reasons above. xformers by Meta is definitely popular in the deep learning/LLM space, but it is a pain to install because its setup.py imports torch and, without a pyproject.toml, there’s no way to list build dependencies.
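For illustration, the missing piece is roughly a [build-system] table in pyproject.toml; a minimal sketch, assuming a setuptools-based build that needs torch available at build time (the backend here is just one possible choice):

[build-system]
requires = ["setuptools", "torch"]
build-backend = "setuptools.build_meta"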

3 Likes

There must be a misunderstanding here… All the packaging PEPs that have been accepted are also implemented in mainstream tools (tools under the PyPA umbrella and outside of it). It is a (strict) requirement for all packaging PEPs that there be guarantees of adoption in mainstream tools, and even (prototype) implementations ready, before the candidate PEPs get accepted.

2 Likes

Me, right now, with the current state of things: I would say no. That is out of scope for pip as it currently is. If the suggested pip add some-library is meant to add some-library to the project.dependencies key of pyproject.toml, then no, because pip does not deal with modifying this kind of data; pip is only a consumer of such data.
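To be concrete, the data a hypothetical pip add some-library would have to edit is roughly this (a sketch; the project name and version are made up):

[project]
name = "my-project"  # hypothetical project
version = "0.1.0"
dependencies = [
  "some-library",
]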

Is that what you had in mind or something else?

Maybe there is some misunderstanding as to what pyproject.toml is and contains. The primary goal of pyproject.toml is to contain metadata (and configuration settings) for one specific library (or application) during its development, test, and build phases.

pyproject.toml is not meant to contain information about a (virtual) environment. Some development workflow tools kind of blur the lines here, in that using their add command changes both the content of pyproject.toml and the content of the development environment. But we should not get confused about this: it is an environment meant to be used for the development and testing of the library (or application); it is not meant for actual use (deployment in production).

1 Like

Thanks for the reply. Sorry, I was not clear enough: I only meant to talk about the development/test/build phase, not production - for this, pip + requirements can do a lot.

The point I would like to make is about the development phase. The people I have worked with over the past years develop libraries and applications with the same tools they are used to deploying code with. It’s currently hard to argue in favour of pyproject.toml: at least, they say, you can write two lines of code to parse the dependencies from a requirements.txt file into a setup.py.

I understand your point (pip add would also imply creating a pyproject.toml if one is not there), and I understand that the maintainers/the PyPA/the community agreed to keep pip a CLI installer, leaving users a broader set of tools to choose from.

1 Like

If I recall correctly, with setuptools as the build backend you can write pyproject.toml so that the dependencies are loaded from an external file, abstract-dependencies.txt for example.

Probably something like this (untested, I am not 100% sure of the notation):

[project]
# ...
dynamic = ["dependencies"]

[tool.setuptools.dynamic]
dependencies = {file = ["abstract-dependencies.txt"]}

See setuptools documentation about “Dynamic Metadata”.

When the content of abstract-dependencies.txt changes, you should re-run python -m pip install --editable . and the current environment should be updated accordingly. So it is two steps instead of one, I guess, but otherwise it seems pretty straightforward.

The above is not something I recommend; it is better to have packaging metadata be non-dynamic whenever possible. Also, I always worry about developers being confused about the role of the dependency list in package metadata (abstract dependencies, for example requests>=2) and the role of requirements.txt (typically, concrete dependencies, such as an exact pin like requests==2.31.0).

We have recently updated a lot of pages of the Python Packaging User Guide. In your case, some of those pages may be of interest.

Honestly, I do want to extend pip to cover additional parts of the workflow.

However, the reality is that the pip project does not have the development resources invested in it to do so. There’s a huge amount of work needed to maintain pip’s current functionality and deal with incoming user requests (bugs, features, discussions, etc.), and the volunteer maintainers and contributors are already not keeping up with it, or barely keeping up with it (depending on how generous you want to be).

There isn’t really much developer availability to pay down existing technical and social debt, which the maintainers view as a prerequisite to adding additional features: if we can’t keep up with maintaining existing features, it only gets worse when we add more. IMO, this is a reasonably big piece of why pip isn’t really gaining massive new user-workflow features.

11 Likes

Perhaps it’s time to consider bundling a tool with the CPython installer that does support this kind of package/environment manager workflow?

This would impose a lot of constraints on such a tool, and perhaps no existing tool wants to, or is capable of, meeting these constraints. But if the PyPA made it clear they were interested and outlined the core requirements and constraints such a tool would need to meet, it would give developers of such tools the opportunity to assess feasibility and express interest.

2 Likes

Personally, I’m not as sure that this is the direction I want pip to go in. Given that a significant number of tools (in particular, the existing workflow tools) use pip in a subprocess to install packages, I think there’s a genuine need for something that acts as a (relatively!) streamlined package installer, and doesn’t do workflow (i.e., it doesn’t compete with the tools that want to embed it).

For now, at least, that tool is pip - and I think we should be cautious about changing the scope of pip without ensuring that there’s still an acceptable “embeddable installer” option.

This would have to be proposed as a PEP to CPython itself, similar to how pip was added in PEP 453. The CPython core devs (and the steering council in particular) would have to agree to this.

It’s also worth noting that the proposal to include pip was explicitly to “bootstrap” the packaging ecosystem: pip was only proposed as a way to allow people to install whatever tool(s) they actually wanted to use, rather than as the tool we expect everyone to use. Adding a new workflow tool therefore changes the arguments for having pip in the stdlib, and that’s something else that would need to be covered in any PEP that proposed a new tool. Specifically, if tool X is added to the stdlib as the official solution, there’s no longer any need for pip to be in the stdlib, except as a dependency; but would it be acceptable to ship a tool with dependencies in the stdlib[1]?

Clearly the ecosystem has changed since PEP 453, and it’s perfectly reasonable to suggest that what’s in the stdlib changes to reflect that. And the PyPA (or longer term, the packaging council) is in a good position to set the parameters for such a proposal. Although I doubt there’s a realistic possibility of anyone (council, PyPA or community) being able to come to a decision on which workflow tool is going to be blessed as the “official answer”. We’ve had way too many unproductive discussions on this in the recent past for me to think there’s anything even remotely like a consensus.

Personally, I still think that the idea of having a simple installer in the stdlib, that people can use to install their workflow tool of choice, is the right approach[2], but I’m aware that this view is relatively uncommon these days.


  1. Pip is constrained in a lot of ways by the fact that it can’t have dependencies. ↩︎

  2. In fact, given that pip is now available as a standalone zipapp, it’s entirely reasonable to suggest removing pip from the stdlib, as the problem is now just one of distribution, not of bootstrapping. ↩︎

6 Likes

Nowadays I wonder if it would make sense to have pipx instead of pip to bootstrap things.

1 Like

Pipx uses pip behind the scenes. So it’s more pipx “on top of” pip rather than “instead of”.

Both pipx and pip are available as standalone zipapp distributions, so a Python installation without pip could be bootstrapped by downloading either or both of those two files, and using them to bootstrap into a full development environment.

The problem with bootstrapping from a separately available zipapp is that it’s a non-trivial process getting from “install Python” to “ready to go”.

4 Likes

I’m not sure if it’s uncommon or the opinion of a “silent plurality” (or majority). It’s the current status quo, so most people who are happy with it probably don’t even check in on this forum, the pip issue tracker, etc.


Bundling pipx, or a bootstrapping process a la “ensurepip”, is a really interesting idea.

I was recently suggesting the use of a Python CLI tool to some Go developers and was struck by how much simpler installation would be to explain (in a platform agnostic way) if we could assume that everyone has pipx.

It may seem like it only saves one step, but it does so at a crucial point in a new user’s experience.


As for the idea which spawned this thread, having pip take on the ability to manipulate pyproject.toml, I can’t say I’m enthusiastic about it.

IMO we should be making that data easy to read and write by hand, so I don’t see much benefit in a command which does the same work as a trivial edit.

I also haven’t loved the experience I’ve had using poetry add, which does this exact work. Not because poetry has got the interface wrong or anything, but mostly because I have to know how to read (and therefore write) the file contents anyway, while it’s very much optional for me to know the command’s flags and semantics. So I always forget what the command can do in order to produce the file contents I want, whereas I never forget how to do it in vim.

Perspectives on this vary, but I wanted to share mine to note that even if we agree that pip should gain new capabilities, I’m not sure this particular one is a “slam dunk” in terms of value added.

5 Likes

I don’t think that’s how most users see the situation though.

With pip bundled with the official installer and pypi.org displaying pip install package-name on every package page, the impression is that pip is the blessed official tool to use for managing 3rd party packages, not merely a simple way to bootstrap to your favorite tool of choice.

To quickly address several of your points here:

  • I agree this would lead to deprecating pip if the secondary tool provided sufficient replacement functionality as well as advanced package management workflows
  • This tool would also presumably have to vendor its dependencies to be bundled with the CPython installer, like pip does right now, so I don’t see the dependencies being an issue
  • I agree, we could have long discussions and get nowhere. I would say that, firstly, there has to be some appetite on the CPython side to bundle such a tool, and secondly, there has to be at least one tool willing to go through the requirements to be bundled. Unless both points become true, it isn’t worth spending energy on discussing this.
2 Likes

To follow up on these two.

  1. Is anyone aware of a tool other than pip which is willing to switch to a “pure Python only, vendor everything” policy?
  2. As one CPython developer, I’ll say that I don’t have the appetite to get CPython sucked into the “which tool is best” controversies this would involve. To get a more definitive answer you’d need to ask the SC, though.
6 Likes

I haven’t analyzed their dependency chain, but aren’t poetry, pdm, and hatch all pure Python?

I don’t think any of them currently have a need for a vendoring policy. The point I’m trying to make (and perhaps communicating it poorly) is that if there were a set of requirements PyPA laid out, then maintainers of such tools could decide if they wanted to meet those requirements and be bundled or not.

As such tools have no incentive to vendor right now, why would it be expected for them to already do it? If the incentive changed, then so might the tools.

I agree picking a “which is best” would be a terrible idea. But also, I doubt any tool currently exists that would both meet a set of requirements the PyPA would be happy with and would want to get pulled into the strict constraints that it imposes.

Without putting requirements out there, no tool will ever consider meeting them, as they would be nonexistent. And developers will continue to complain about Python’s “terrible, fractured packaging ecosystem” and “why can’t there be a Python equivalent to what Rust has with cargo?”.

1 Like

That is an interesting idea. If I understand right, you’re basically saying that there would be a sort of checklist and if tools could be “certified” as meeting those criteria, they would then be bundled with Python? So it would be possible for a range of competing tools to meet those criteria and then all those competing tools would be bundled with Python?

If that (inclusion of competing tools) isn’t allowed, then this idea still requires some kind of “choose the best” decision. If it is allowed, I’m not sure it would really fix anything, since people would still have to choose among various tools; it’s just that those tools would all be installed along with Python instead of having to be installed separately. I suppose it might help a bit by at least setting a bar that bundled tools have to meet (i.e., not choosing which tool is best but certifying some set of tools as “good enough”), although in practice I think that function is already performed by community recommendations and so on.

Either way, I think this would represent a major shift in how Python is distributed.

I understand this sentiment, but I also think that it’s ultimately incompatible with the goal of solving the fragmentation problem of Python packaging tooling. As long as the tools that are bundled with Python leave out large chunks of functionality that people want, while the ones that aren’t bundled compete and none is clearly endorsed by Python, users will feel confused and irritated. It doesn’t exactly have to be choosing the “best” one but I think a choice does need to be communicated about which tool(s) are recommended. (An intermediate way to handle this without bundling would be in the docs.)

1 Like

I beg to differ: that feels like the opposite of the conclusion reached by the 8000+ respondents to the packaging survey. Yes, there has to have been some selection bias in who answered, and pip + requirements + venv just works™ in prod. But I wouldn’t mistake this for support of the “status quo”.

I don’t think this is just a matter of making it trivial, though I would say [pdm,poetry] add provides added value, since it writes the minimum supported versions directly (how would I get those otherwise?). But maybe it’s me who has got dependency constraints all wrong (not ironically).
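For example, here is a sketch of the kind of entry such an add command might record in a [project]-style pyproject.toml (the exact constraint style and lower bound depend on the tool and its configuration; the version number is purely illustrative):

[project]
dependencies = [
  "requests>=2.31.0",  # lower bound chosen by the tool at add time (illustrative)
]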

While I agree this might not be a “slam dunk”, I think some “packaging sugar” would not hurt either. After 4 years I still see people using one huge conda environment with packages from conda + pip who literally say they don’t feel like creating, activating and deactivating a virtual environment for each project. This is the same kind of inertia that prevents moving from copy-pasted setup.py files to pyproject.toml-based setups. And it’s not only the average user: it’s big companies like NVIDIA, HuggingFace and Meta putting out libraries written this way.


Going back to the original topic (PEP 735, but this could apply to PEP 725 too): I am afraid adoption will be slowed down by the fact that people will have to know these options exist. My guess is that the average user is far more likely to discover them by running pdm/poetry add --help than by deliberately going to the PyPA page to read the specs. No matter the solution, pip is likely the most important tool for helping to spread it.

1 Like

It’s possible I’m misremembering, but I think the results of the survey were that people wanted “one right way to do it” and “an official, Python-endorsed workflow tool”.

But should it be distributed with the language itself? I don’t think that’s obviously correct.

I’m sure many of the survey respondents would like it if a tool like hatch (to pick one out of a hat) shipped with CPython. But are they reckoning with the fact that this ties its version, and therefore its supported features, to the CPython release cycle? I’m definitely a bit concerned about that.

I bet that a lot of engineers would choose pip bootstrapping, as it exists today, over having to stay compatible with a four-year-old version of hatch.

Even in this case I think the ease that these tools provide is deceptive. They’re finding versions based on package constraints, which can’t tell the full story.

If I write down that my library uses requests>=2, then I really ought to test against 2.0.0 to make sure I’m not relying upon a behavior added in 2.1.0, etc.
Adding a dependency is a process, which includes testing and reading changelogs.
I sort of think writing the thing into a file is the easy part.

I’m not saying that these tools are bad. They do something, and they might make it faster and easier to execute that process of evaluating and adding dependencies. But is this the “killer feature” which pip needs? Ehhh… I just can’t see it, personally, as being all that useful.

Here I actually disagree more strongly and directly. I think you’re conflating different problems which might have the same origin, and proposing a solution which won’t have much effect.

Managing things in a monolithic environment is a bad personal habit. Anyone who is doing that is doing something which might not do them any harm today, but which could break and won’t serve them well in the future. It’s limited pretty much to a class of developers doing solo work or work with a very limited peer group, not meant for distribution.
These users cannot see the harm that their bad habit sets them up for. You can try to explain it, but it’s typically hard to convince people to change their habits for some theoretical bad future. Every day they spend without the rickety thing falling apart is evidence that they were right to ignore you. (When you meet someone like this, just save your effort for later. You may be helping them fix their situation in a month and be frustrated that they didn’t listen, but at least you didn’t waste much effort trying to convince them. Natural consequences – it’s how some people have to learn.)

So my claim is that this is a different case of people being resistant to change, not to be conflated with resistance to pyproject.toml.

I’m glad you mentioned copy-paste, because I think that’s one of the main drivers here. Think about it from the perspective of an engineer who has used setup.py files for 10+ years across dozens of projects, and isn’t hostile towards the new packaging work but just doesn’t care at all. What they want is for all of their 50+ projects to look as similar (nay, identical!) as possible. Really what they need to be convinced of is that it’s okay for their many projects to start to diverge, as they won’t all be trivial to move to pyproject.toml and it won’t happen all at once anyway.
It’s also resistance to change, but it comes from a very different place.


I watched the Linux world grapple with a similar migration from sysv to systemd, only a few years back.
sysv was a script-based system, like setup.py, and very well entrenched. systemd is primarily declarative, like pyproject.toml. (Oh, and we can draw a parallel between upstart and setup.cfg if people like!)

There was a lot of griping. I think that migration process got a lot more blowback than pyproject.toml ever could, for a variety of reasons.

But eventually, after several years, the dust settled. You can still write sysv scripts and shim them into systemd, but nobody really does that for new software because there’s little reason to do so.

Now, the parallel is obviously not perfect, but there was something I observed with systemd which I think we’ve now reached with pyproject.toml. There’s an inflection point at which the new way ceases to be “new” and simply becomes “new, relative to the old way”. At that point, it’s no longer an early adopter thing, and people start really using the new way of doing things.

And there are always late adopters. Eventually there will be only a small number of active Python projects with non-declarative metadata left on the Internet, and their numbers will keep dwindling.

It hasn’t been long enough, as measured in years, for us to be worrying about there still being resistance to the newer way of doing things. That’s normal and to be expected.

Oh, adoption will definitely be slow! Not least because we’ll be asking people to “rewrite a requirements file which already works fine”.

But I don’t think we’re looking for a sudden and huge adoption of this new feature. Once we have it, we can start to build on top of it. The early adopters are the most relevant in the short term – they’ll help us build out a good vocabulary of ways in which the feature is useful, which we can then use to convince more people to join the fun.

Here I have to respectfully disagree. There’s one more topic for PEP 735 to address before I can get started, but here’s the (non-definitive) list of projects which I want to engage with before submitting the PEP:

  • pip
  • tox
  • nox
  • hatch
  • pip-tools
  • pdm
  • poetry
  • setuptools
  • flit

Which of these will drive adoption? My money is on tox/nox/hatch. But those tools will probably not want to support this until pip does. So pip is definitely relevant to how this rolls out; I just don’t see it as the tool which will “spread the word”.

2 Likes