There’s another difference between pip and conda that I find notable but keep forgetting to mention in these threads. When you install or remove a package, pip only calculates the “forward” dependencies (i.e., what is needed to make the package you specify work), but conda also considers the “backward” dependencies (i.e., when you do what you asked for, will anything else stop working). Conda tracks the entire environment state, but pip only considers the packages you’re asking it to deal with.
For instance, if package A depends on package B, and you install A, both pip and conda will install A and B. But if you do pip uninstall B, pip will do it, leaving A in a broken state. Conda won’t allow this; if you do conda remove B, it will show you the plan of what it’s going to remove, which will include removing A, because it knows that A depends on B and thus won’t let you break the environment by removing B without also removing A.
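To make the “backward” dependency idea concrete, here is a toy sketch in Python. It is not how conda’s solver is actually implemented; the package names and the REQUIRES table are made up for illustration. It just inverts a small dependency graph and collects everything that would break if the target were removed.

```python
from collections import deque

# Toy environment (hypothetical): each installed package maps to the
# packages it requires. A, B, C, D are placeholders, not real projects.
REQUIRES = {
    "A": {"B"},
    "B": set(),
    "C": {"A"},
    "D": set(),
}


def removal_plan(requires, target):
    """Return the set of packages that must go if `target` is removed."""
    # Invert the graph: for each package, record who depends on it.
    dependents = {name: set() for name in requires}
    for name, deps in requires.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(name)

    # Walk the reverse edges breadth-first, starting from the target.
    plan, queue = {target}, deque([target])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in plan:
                plan.add(parent)
                queue.append(parent)
    return plan


print(sorted(removal_plan(REQUIRES, "B")))  # ['A', 'B', 'C']
print(sorted(removal_plan(REQUIRES, "D")))  # ['D']
```

A tool that only looks “forward” never builds the inverted graph, so it has no way to notice that removing B strands A (and, through A, C).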
The same also applies when changing package versions. If for whatever reason you upgrade or downgrade a package, conda will force you to update versions of other packages so that the overall dependency graph is consistent. That is, if you upgrade package B (the dependency) to 6.5 but the version of package A that you have installed has specified it depends on B<6.0, conda will make you either upgrade A (if there is a newer version that can handle the newer version of B) or remove it.
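The version case can be sketched the same way. This is a deliberately simplified model, assuming plain dotted numeric versions and only “less than” upper bounds; the INSTALLED and UPPER_BOUNDS tables are hypothetical, and real solvers handle far richer constraints.

```python
def parse(version):
    """Turn '6.5' into (6, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Hypothetical environment state: installed versions, plus the upper
# bounds each package declares on its dependencies ("B<6.0" style).
INSTALLED = {"A": "1.2", "B": "5.4"}
UPPER_BOUNDS = {"A": {"B": "6.0"}}  # A requires B < 6.0


def broken_by(installed, upper_bounds, package, new_version):
    """List installed packages whose declared bound the change violates."""
    broken = []
    for name, bounds in upper_bounds.items():
        if name not in installed:
            continue
        limit = bounds.get(package)
        # The bound is exclusive, so reaching the limit is a violation.
        if limit is not None and parse(new_version) >= parse(limit):
            broken.append(name)
    return sorted(broken)


print(broken_by(INSTALLED, UPPER_BOUNDS, "B", "6.5"))  # ['A']
print(broken_by(INSTALLED, UPPER_BOUNDS, "B", "5.9"))  # []
```

A non-empty result is the point at which an environment manager has to either find a compatible version of A or propose removing it, rather than silently proceeding.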
Few will be surprised to hear that I think the conda way is better.
In my experience a lot of frustration that people have with Python package management comes from getting into a situation where they try upgrading, downgrading, or removing packages for some reason (e.g., to match versions available on a server they want to use), and then find that in doing so they’ve put the environment into a broken state by getting the dependencies out of sync. Ensuring correctness of the entire dependency graph on every package change operation prevents this; the only way you can break the environment is if a package itself is broken or its metadata is incorrect.
There are a couple of downsides. One is that you have to resolve the full dependency graph on every change, which has become notorious as a problem with plain conda: if you have accumulated a lot of packages in an environment, trying to install or upgrade one could take a long time because conda has to make sure it can find a way to do what you asked without breaking some other package that may be quite far away in the dependency chain. This has been most egregious in the case where what you’re asking it to do isn’t possible (e.g., because of conflicting dependencies), as then conda can go on wild goose chases trying all manner of odd versions of things, trying to give you what you asked for. With mamba and the libmamba solver, this problem seems to be mostly alleviated.
Depending on how you look at it, another downside is that it becomes harder to do a “manual override” and force the installation of a package whose declared dependencies are incompatible with a given environment state. Conda essentially treats the package metadata as gospel and makes it very hard or impossible to say “I know this package says it won’t work with the versions I have installed, but trust me, it will, go ahead and install it anyway”. Overall, though, this seems to me a case where it makes more sense to go with the approach that will help more people. Most users should stop if they encounter that kind of situation, and fall back to another option (e.g., downgrading all involved packages to older versions that are known to work together), rather than try to force an install.
Most importantly, I think a baseline user expectation, one that’s worth catering to, is “whenever I ask the manager to do something, and it finishes and tells me it did it, the environment should be in a working state”. In other words, it’s not just a package manager, it’s an environment manager. It manages the environment as a whole and makes sure that at all times other than in the middle of an operation, the environment is in a consistent, working state. Pip doesn’t satisfy this criterion; when you tell it “uninstall numpy” it just tells you “okay”, and doesn’t tell you that in doing that you broke pandas.
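The information needed to give that warning is already sitting in the installed package metadata. As a rough sketch (not something pip does on uninstall), the standard library’s importlib.metadata can list which installed distributions declare a dependency on a given package. The helper names here are my own, and the requirement parsing is deliberately crude: it ignores version specifiers and environment markers, and skips anything that looks like an optional extra.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Extract a normalized project name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        return None
    # Treat runs of '-', '_' and '.' as equivalent and ignore case.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def installed_dependents(target):
    """Names of installed distributions that declare `target` as a dependency."""
    wanted = requirement_name(target)
    found = set()
    for dist in metadata.distributions():
        for requirement in dist.requires or []:
            # Crude assumption: skip requirements gated behind an extra.
            if "extra ==" in requirement:
                continue
            if requirement_name(requirement) == wanted:
                name = dist.metadata.get("Name")
                if name:
                    found.add(name)
    return sorted(found)


# What would be left dangling if numpy were uninstalled from this environment?
print(installed_dependents("numpy"))
```

In an environment with pandas installed, I would expect pandas to show up in that list, which is exactly the information that an “uninstall numpy” request could surface before doing anything.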
I was curious about Poetry so I took a look. What I found is basically that this situation highlights that Poetry is primarily not a package manager, nor an environment manager, but a project manager. Poetry won’t even let you install any packages at all without having a pyproject.toml. Whenever you install something, it not only installs it in the environment but updates the pyproject.toml to list it as a dependency. If you poetry add pandas and then try to poetry remove numpy, it will say it can’t even find numpy, because numpy isn’t part of the declared dependencies of your project. So in this sense Poetry also manages the environment as a whole, and goes beyond that in keeping the environment in sync with a project file like pyproject.toml.
Personally I’m not as much of a fan of that, as I like to keep a few “general purpose” environments to play around in, rather than having to start with a named project. I also tend to develop by incrementally building on a small core that arises from that playing around, and I prefer if I can do almost all of the development before thinking at all about packaging the results for distribution. If I had to create a dummy project called something like “sandbox” that wouldn’t be the end of the world. As I understand it, though, Poetry really wants everything in the project to be in a single directory subtree, which would preclude having a single environment that’s used for multiple experimental nascent “projects”.
What’s most interesting to me here is the difference between package management (pip), environment management (conda), and project management (Poetry). My preference is for environment management: ensure the environment is in a consistent state, and layer project management on top of that, rather than requiring a one-to-one mapping between projects and environments. Ideally, though, that project management would be done by a tool that is integrated with the environment management tool (e.g., something like conda createproject blah).