I’ve been thinking about this today, and trying to work out the entire set of “primary” commands that we would expect a unified tool to handle. I’m choosing to focus primarily on the idea of a unified tool because I think if we can thread the needle on mashing multiple personas into a single tool, then we’re better off than with two tools. I think two tools is a good fallback if we can’t find a way to mash the personas into a single tool.
So to start off with, I’m assuming that there are two major personas at play here. There may be other minor personas, but they’re likely to be a subset of these two major ones:
- I am a consumer of libraries and want to use a tool to install and manage my local environment.
- I am a producer of packages and want to use a tool to build, publish, and manage my project.
I am purposely excluding the topic of environment management, even though it may ultimately be part of the “ideal” tool. I’m trying to keep this discussion scoped more narrowly if we can help it, and I also think this ideal tool needs to support installing into arbitrary environments anyway (otherwise we’re eliminating a huge portion of our user base), so at worst it can be added on later.
So given that, a unified tool would need to, at a minimum:
- Allow installation and uninstallation of projects into an environment.
- This includes commands for repeat deployments into multiple environments.
- Introspect the current state of installed packages, and get more information about them.
- Produce a local directory of artifacts for offline usage.
- This includes downloading from PyPI, producing wheels, and producing sdists. Basically an analogue of “install, but save to a local directory”.
- Allow upload of existing artifacts.
- This could also possibly include command-line support for things like yanking a file, managing a two-phase release, etc. that we might add in the future.
- Build artifacts such as an sdist from a directory, or a wheel from an sdist.
I think that’s it for what an MVP of such a tool would look like?
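To make the shape of that MVP a bit more concrete, here’s a purely illustrative sketch of how those commands might hang off a single CLI. The tool name `foo`, the subcommand spellings, and the flags are all placeholders I’m inventing for illustration, not a proposal:

```python
# Purely illustrative: a rough sketch of how the MVP commands above might map
# onto a single CLI. Every name here ("foo", the subcommand spellings, the
# flags) is a placeholder, not a concrete proposal.
import argparse


def build_cli() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="foo")
    commands = parser.add_subparsers(dest="command", required=True)

    # Consumer persona: manage what's installed in an environment.
    install = commands.add_parser("install", help="install projects into an environment")
    install.add_argument("requirements", nargs="+")
    commands.add_parser("uninstall", help="remove installed projects")
    commands.add_parser("list", help="introspect the installed packages")
    commands.add_parser("show", help="get more information about an installed package")

    # "Install, but save to a local directory" for offline usage.
    download = commands.add_parser("download", help="save artifacts to a local directory")
    download.add_argument("--dest", default="./artifacts")

    # Producer persona: build and publish.
    build = commands.add_parser("build", help="build an sdist or wheel from a source tree")
    build.add_argument("kind", choices=["sdist", "wheel"])
    upload = commands.add_parser("upload", help="upload existing artifacts to an index")
    upload.add_argument("files", nargs="+")

    return parser


if __name__ == "__main__":
    print(build_cli().parse_args())
```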
There are also additional features that one could imagine it having, such as:
- Enable a “binary only” install mode ala pipx.
- Ability to bootstrap a project, potentially from a user supplied template ala cookiecutter.
- More commands to manage the development lifecycle, such as a `lint` command, a `test` command, or a `format` command that would paper over differences between linters, test harnesses, or formatting tools.
- This could really be expanded quite far: things like building documentation, running benchmarks, etc.
- A generic “run a command” feature ala tox or `npm run`.
- A way to run a command from a project in development, ala `cargo run`.
Please note that I’m not suggesting this tool should gain support for those features, just that those are the kinds of features that exist in this space already (both within Python and without), and so are something we might possibly want to add at some point; it makes sense to consider how they’d fit in if we did add them.
That’s a lot of things, and I’m sure I’ve missed some possible things, but taking this list at face value, I see a few “concept collisions” where multiple personas are going to want different features that have a lot of overlap:
- The ability to create a local directory of artifacts, and the ability to build an arbitrary project into artifacts.
- This is the one we all know: the `pip wheel` versus “build a wheel” case; it would also apply to a hypothetical sdist command.
- The default sort of “install” and “uninstall” commands, versus a binary-only, pipx-style install.
- The different meanings of “run a command” (generic tox/npm-run style, versus “run a binary in this existing package”, versus pipx-style “run a command in a temporary, throwaway install”).
I don’t really see much overlap otherwise, and of the overlap I do see, all but one of the cases exist in the set of “maybe it’d be cool, possibly?” features, and they’re mostly not overlap that crosses personas, but different ways of functioning within the same persona.
The OTHER major overlap I see is one of configuration. We currently have 3 wholly distinct APIs for different tasks: package install (`/simple/`), search (the XML-RPC search function), and upload (a YOLO HTTP form from a decade ago). There are also some projects that are using the JSON API for these tasks as well, I think. Unfortunately at the API level there’s nothing tying these APIs together, so pip search requires you to manually specify one of them, pip install requires another, and twine requires yet another. However, I don’t think this is a serious problem long term, because it should be pretty easy to solve by switching to configuring some sort of top-level endpoint that then points to the repository API, the search API, the upload API, etc., which would unify the configuration for users.
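To illustrate what I mean by a top-level endpoint: the idea is roughly that a user configures one URL and the tool discovers the per-task APIs from it. This is hypothetical only; no such document exists on PyPI today, and the URL, keys, and layout below are all made up:

```python
# Hypothetical sketch of the "single top-level endpoint" idea: one configured URL
# returns a descriptor that points at the repository, search, and upload APIs.
# The URL, keys, and layout are invented for illustration; PyPI has no such
# document today.
import json
from urllib.request import urlopen


def discover_services(endpoint_url: str) -> dict:
    """Fetch the (hypothetical) descriptor and return the per-task API URLs."""
    with urlopen(endpoint_url) as response:
        descriptor = json.load(response)
    return {
        "repository": descriptor["apis"]["repository"],  # e.g. the /simple/ index
        "search": descriptor["apis"]["search"],
        "upload": descriptor["apis"]["upload"],
    }


# A user would configure just the one endpoint, e.g.:
#   services = discover_services("https://example.invalid/packaging-services.json")
# and install/search/upload would each route through the URL it returns.
```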
So, given that a single tool makes it a lot easier for people less steeped in packaging lore who just want to get something done without thinking too hard, and that the ONLY real overlap seems to be in how to produce different artifacts, either for a local offline mirror or for distribution of a single project on PyPI, I think it makes sense to focus on producing a unified tool.
I think that the local offline mirror versus build artifact collision is not a very hard needle to thread. The easiest thing to do would be to just do something like `foo build wheel` and `foo build sdist`, and then the mirroring commands can be whatever. It would be great if there was a sub command for them too, so something like `foo mirror as-is` that just downloads artifacts as they are, `foo mirror sdist` that only downloads and/or produces sdists, and `foo mirror wheel` that produces wheels. However, I’m not super happy with the name “mirror” (or even really “as-is”) and I’m drawing a blank on better terminology, so it’s possible it’d make sense to keep them as top level commands like `download` (or `fetch`), `wheel`, and `sdist`.
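As a quick sketch of how those two families could be spelled as nested subcommands (again, `foo`, “mirror”, and “as-is” are just the provisional names from this post, not settled terminology):

```python
# Sketch of the build-versus-mirror split as nested subcommands; the names are
# the provisional ones discussed above, not settled terminology.
import argparse

parser = argparse.ArgumentParser(prog="foo")
commands = parser.add_subparsers(dest="command", required=True)

build = commands.add_parser("build", help="build artifacts for a single project")
build_kind = build.add_subparsers(dest="kind", required=True)
build_kind.add_parser("wheel", help="build a wheel")
build_kind.add_parser("sdist", help="build an sdist")

mirror = commands.add_parser("mirror", help="populate a local directory of artifacts")
mirror_kind = mirror.add_subparsers(dest="kind", required=True)
mirror_kind.add_parser("as-is", help="download artifacts exactly as published")
mirror_kind.add_parser("sdist", help="download and/or produce only sdists")
mirror_kind.add_parser("wheel", help="produce wheels for everything")

args = parser.parse_args()  # e.g. "foo build sdist" or "foo mirror wheel"
```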
To further build on the assumptions I’ve made in this post, I also think it’s entirely possible to have “pip” be this tool, and I don’t think we need to further complicate the landscape by adding yet another tool. I think the only sort of backwards compatibility concern we’d have is if we ultimately decide to move the “local offline mirror” functionality into a sub command, but that isn’t required and if we did that, we could simply handle it as a normal part of our deprecation process.
Some other random thoughts about things people have said:
- I’m not super worried about the size of pip or “bloating” the functionality of pip. Yes, we should carefully consider each feature we add and whether it’s something we should be adding or if it would be better off as a separate thing, but I think that’s true of all projects, and I think that this provides a much slicker UX for end users.
- Specifically related to the size of pip, I think there are a lot of neat things we could do with our vendoring to try and reduce the total size of an installed pip, such as tree shaking. We could also make it less painful (and solve other issues!) by threading the needle on making it possible for pip to target an environment instead of having to be run in an environment, so that everyone doesn’t have to have 237 copies of pip on every machine. These are harder and longer-term projects than just saying “no” to new features, but I think they’re a better path forward ultimately.
- No matter what, we should strive to move as much of pip’s functionality as we can into libraries (possibly ones that have their own CLI, ideally through `python -m` to make them more of an “advanced” feature) so that people can reuse this functionality in their own tools, or can even opt for more advanced usage where they want to use the underlying tooling directly and have more overall control.
- The primary constraints that living in pip adds to how we solve this are roughly:
- No GPL code (can’t ship it in Python itself).
- Must at least have a pure Python fallback.
- We have to be able to vendor it, or treat it as wholly optional.
- I think there is real risk if we tell everyone to use “yet another tool” and I think it is substantially easier to communicate new pip features, versus communicating entirely new projects. Particularly since one of the single biggest complaints I’ve seen from users is that it’s too hard to figure out which tool is the “right” tool, and that what the “right” tool is seems to change every time they look.
- I think arbitrary plugin-based commands for pip are probably a bad idea, and certainly not required for any of this. At work, people can produce their own commands now and just call them `pip-foo`. There’s barely any difference between `pip-foo` and `pip foo`, and it makes it clearer what is actually produced by pip, and what is a third party project.
- This maybe does not solve the larger “elephant/platypus” problem mentioned above, but I think it does move the UX needle drastically and does so with minimal churn. It doesn’t lock us out of solving that problem at a later date (either within pip, or even with the aforementioned wrapper tool). If anything, I think adding this to pip makes it easier to solve that problem later, because it would feel a lot worse to me to introduce yet another packaging tool to solve the platypus problem if we had just introduced yet another project to solve the “unified tool” problem.