Building distributions and drawing the Platypus

I've been thinking about this today, trying to work out the entire set of "primary" commands we would expect a unified tool to handle. I'm choosing to focus on the idea of a unified tool because I think if we can thread the needle on mashing multiple personas into a single tool, then we're better off than with two tools. Two tools is a good fallback if we can't find a way to mash the personas into one.

So to start off, I'm assuming that there are two major personas we have to serve here. There may be other minor personas, but they're likely to be subsets of these two:

  • I am a consumer of libraries and want to use a tool to install and manage my local environment.
  • I am a producer of packages and want to use a tool to build, publish, and manage my project.

I am purposely excluding the topic of environment management, even though it may ultimately be part of the "ideal" tool. I'm trying to keep this discussion scoped more narrowly if we can help it, and I also think this ideal tool needs to support installing into arbitrary environments anyway (otherwise we're eliminating a huge portion of our user base), so at worst it can be added on later.

So given that, a unified tool would, at a minimum, need to:

  • Allow installation and uninstallation of projects into an environment.
    • This includes commands for repeat deployments into multiple environments.
  • Introspect the current state of installed packages, and get more information about them.
  • Produce a local directory of artifacts for offline usage.
    • This includes downloading from PyPI, producing wheels, and producing sdists. Basically an analogue of "install, but save to a local directory".
  • Allow upload of existing artifacts.
    • This could also possibly include command-line support for things like yanking a file, managing a two-phase release, etc., that we might add in the future.
  • Build artifacts such as an sdist from a directory, or a wheel from an sdist.

I think that's it for what an MVP of such a tool would look like?

There are also additional features that one could imagine it having, such as:

  • Enable a "binary only" install mode à la pipx.
  • Ability to bootstrap a project, potentially from a user-supplied template à la cookiecutter.
  • More commands to manage the development lifecycle, such as a lint command, a test command, or a format command that would paper over differences between linters, test harnesses, or formatting tools.
    • This could really be expanded quite far: things like building documentation, running benchmarks, etc.
  • A generic "run a command" feature à la tox or npm run.
  • A way to run a command from a project in development, à la cargo run.

Please note that I'm not suggesting this tool should gain support for those features, just that those are the kinds of features that already exist in this space (both within Python and without) and so are something we might possibly want to add at some point, so it makes sense to consider how they'd fit in if we did add them.

That's a lot of things, and I'm sure I've missed some possibilities, but taking this list at face value, I see a few "concept collisions" where multiple personas are going to want different features that have a lot of overlap:

  • The ability to create a local directory of artifacts, and the ability to build an arbitrary project into artifacts.
    • This is the one we all know: the pip wheel versus "build a wheel" case. It would also apply to a hypothetical sdist command.
  • The default sort of "install" and "uninstall" commands, versus a binary-only pipx-style install.
  • The different meanings of “run a command” (generic tox/npm run style, versus “run a binary in this existing package”, versus pipx style “run a command in a temporary and thrown away install”).

I don't really see much overlap otherwise, and of the overlap I do see, all but one of the cases exist in the set of "maybe it'd be cool, possibly?" features, and they're mostly not overlaps that cross personas, but different ways of functioning within the same persona.

The OTHER major overlap I see is one of configuration. We currently have three wholly distinct APIs for different tasks: package install (/simple/), search (the XML-RPC search function), and upload (a YOLO HTTP form from a decade ago). There are also some projects using the JSON API for these tasks, I think. Unfortunately, at the API level there's nothing tying these APIs together, so pip search requires you to manually specify one of them, pip install requires another, and twine requires yet another. However, I don't think this is a serious problem long term, because it should be pretty easy to solve by switching to configuring some sort of top-level endpoint that then points to the repository API, the search API, the upload API, etc., which would unify the configuration for users.
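
As a sketch of what that unified configuration could look like: a single configured URL serving a small document that points at each service. Everything here (the services.json name, the field names, the URL) is hypothetical; no such discovery document exists today:

```python
import json
from urllib.request import urlopen

def discover_services(endpoint: str) -> dict:
    """Fetch a hypothetical top-level document describing a repository's services."""
    with urlopen(endpoint) as response:
        return json.load(response)

# Every name below is illustrative, not a real or proposed standard.
services = discover_services("https://example.org/services.json")
repository_api = services["repository"]  # e.g. the /simple/ index URL
search_api = services["search"]          # e.g. a search endpoint URL
upload_api = services["upload"]          # e.g. an upload endpoint URL
```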

So given that a single tool makes it a lot easier for people less steeped in packaging lore who just want to get something done and not think too hard, and that the ONLY real overlap seems to be in how we produce different artifacts (for a local offline mirror versus for distributing a single project on PyPI), I think it makes sense to focus on producing a unified tool.

I think the local-offline-mirror versus build-artifact collision is not a very hard needle to thread. The easiest thing would be to do something like foo build wheel and foo build sdist; the mirroring commands can then be whatever. It would be great if there were a subcommand for them too: something like foo mirror as-is that just downloads artifacts as-is, foo mirror sdist that only downloads and/or produces sdists, and foo mirror wheel that produces wheels. However, I'm not super happy with the name "mirror" (or even "as-is"), and I'm drawing a blank on better terminology, so it's possible it'd make sense to keep them as top-level commands like download (or fetch), wheel, and sdist.
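
To make the shape of that concrete, here is a rough argparse sketch of the hypothetical layout; foo, mirror, as-is, and --dest are all placeholder names from the discussion above, not a real or proposed interface:

```python
import argparse

parser = argparse.ArgumentParser(prog="foo")
subparsers = parser.add_subparsers(dest="command", required=True)

# foo build {wheel,sdist} PROJECT
build = subparsers.add_parser("build", help="build an artifact from a project")
build.add_argument("artifact", choices=["wheel", "sdist"])
build.add_argument("project")

# foo mirror {as-is,wheel,sdist} PROJECT [--dest DIR]
mirror = subparsers.add_parser("mirror", help="populate a local artifact directory")
mirror.add_argument("mode", choices=["as-is", "wheel", "sdist"])
mirror.add_argument("project")
mirror.add_argument("--dest", default="./artifacts")

args = parser.parse_args(["mirror", "sdist", "somepackage"])
print(args.command, args.mode, args.project, args.dest)
```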

To further build on the assumptions I've made in this post, I also think it's entirely possible to have "pip" be this tool, and I don't think we need to further complicate the landscape by adding yet another tool. I think the only sort of backwards compatibility concern we'd have is if we ultimately decide to move the "local offline mirror" functionality into a subcommand, but that isn't required, and if we did it, we could simply handle it as a normal part of our deprecation process.

Some other random thoughts about things people have said:

  • I'm not super worried about the size of pip or "bloating" the functionality of pip. Yes, we should carefully consider each feature we add and whether it's something we should be adding or whether it would be better off as a separate thing, but I think that's true of all projects, and I think this provides a much slicker UX for end users.
    • Specifically on the size of pip, I think there are a lot of neat things we could do with our vendoring to reduce the total size of an installed pip, such as tree shaking. We could also make it less painful (and solve other issues!) by threading the needle on making it possible for pip to target an environment instead of having to be run in an environment, so that everyone doesn't have to have 237 copies of pip on every machine. These are harder and longer-term projects than just saying "no" to new features, but I think they're ultimately a better path forward.
  • No matter what, we should strive to move as much of pip's functionality as we can into libraries (possibly ones that have their own CLI, ideally through python -m to make them more of an "advanced" feature; a minimal sketch of that pattern follows this list) so that people can reuse this functionality in their own tools, or can even opt for more advanced usage where they want to use the underlying tooling directly and have more overall control.
  • The primary constraints that living in pip adds to how we solve this project are roughly:
    • No GPL code (can’t ship it in Python itself).
    • Must at least have a pure Python fallback.
    • We have to be able to vendor it, or treat it as wholly optional.
  • I think there is real risk if we tell everyone to use “yet another tool” and I think it is substantially easier to communicate new pip features, versus communicating entirely new projects. Particularly since one of the single biggest complaints I’ve seen from users is that it’s too hard to figure out which tool is the “right” tool, and that what the “right” tool is seems to change every time they look.
  • I think arbitrary plugin-based commands for pip are probably a bad idea, and certainly not required for any of this. At work, people can produce their own commands now and just call them pip-foo. There's barely any difference between pip-foo and pip foo, and it makes it clearer what is actually produced by pip, and what is a third-party project.
  • This maybe does not solve the larger “elephant/platypus” problem as mentioned above, but I think it does move the UX needle drastically and does so with minimal churn. It doesn’t lock us out of solving that problem at a later date (either within pip, or even with the aforementioned wrapper tool). If anything I think adding this to pip makes it easier to solve this problem at a later date, because it would feel a lot worse to me to introduce yet another packaging tool to solve the platypus problem if we had just introduced yet another project to solve the “unified tool” problem.
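
On the python -m point above: the standard mechanism for giving a package its own python -m entry point is a __main__.py module. A minimal sketch, with mypackage as a placeholder name:

```python
# mypackage/__main__.py
# Lets "python -m mypackage ..." act as the library's "advanced" CLI while
# the library itself stays importable and reusable by other tools.
import sys

def main(argv=None):
    argv = sys.argv[1:] if argv is None else argv
    print(f"mypackage called with arguments: {argv}")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```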

My brain went to "fetch" before I finished reading that sentence, as "download" doesn't suggest a potential transformation as much as "fetch" does (which I realize is a stretch, but "download" very specifically means "get me those bytes" versus "fetch me that thing somehow"). Otherwise "get" is another option.

And honestly, it would help alleviate pressure on pip itself. Distributing the load across packages lets pip focus on UX, backwards compatibility where necessary, and simply being the "face" of Python packaging. Everything else should be in a package that implements a spec.

But a key theme here is getting everything into packages with a programmatic API based on a spec, so that some tool could rely on it. That will require outlining the key steps in each feature to identify where a spec is missing and where a backing package with a programmatic API doesn't exist (I have such an outline for downloading a wheel from PyPI, which is what helped motivate my work on packaging.tags; a small sketch of that building block follows). From there we can either focus on a specific feature to get going first, or let everyone loose on all of them, and as packages get implemented we can light up features in pip when possible.
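
As one building block from that outline, packaging.tags answers the "which wheels are compatible with this interpreter?" question; a small sketch of how a tool might use it:

```python
from packaging.tags import sys_tags

# sys_tags() yields the tags the running interpreter supports, ordered
# from most to least preferred.
supported = {str(tag) for tag in sys_tags()}

def wheel_tag_is_supported(tag: str) -> bool:
    """Check one interpreter-abi-platform tag string from a wheel filename."""
    return tag in supported

print(wheel_tag_is_supported("py3-none-any"))  # True on any Python 3
```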

But that requires consensus that this is the direction we want to go. :slight_smile: I’ll also say that if we do go this way it might help motivate people to contribute if these new packages can be Python 3-only (I will fully admit that adding packaging.tags to packaging almost didn’t happen because it took so much personal effort to motivate myself to port the code to Python 2 and to compromise/ruin the API accordingly).


@pradyunsg thank you for starting this conversation.

Could you or others clarify what timeframe we need/want to make this decision in? Is this something we really oughta figure out in the next 2-3 weeks, or more like 2-3 months, or is there a solid, necessary intermediate step we can work on in the next 4-5 months while gathering data to make this decision?

I ask because this is partially a matter of user experience, and if we want to make a product decision in a somewhat rigorous way, we ought to take the time to do the kinds of things @nlhkabu has done for Warehouse – interview a sample of users (including some who don’t hang out in packaging discussions), watch them work, make maps of their mental models and user journeys, and make recommendations for the product and for documentation.

The Packaging Working Group is pursuing funding for several projects, including getting UX help with our command-line packaging and distribution tools. We have some different options we’re juggling that have different potential timescales. Thus my question.


is there a solid, necessary intermediate step we can work on in the next 4-5 months while gathering data to make this decision?

I’m inclined to say this one.

I was aware of the exploration for funding for UX work, and details on the user behavior and experience would definitely help better inform this discussion. However, I didn’t want to block us from exploring our options while waiting on that, since it’d probably be helpful for whoever comes in to have more context, as made available in this discussion.

At least one intermediate (but not strictly necessary) step is ongoing – the cleanup/refactor of the build logic in pip. I’m definitely working on that for now, regardless of what we decide here, and I imagine any major-ish changes to the status quo will likely depend on that being “reasonably done”.


I'm gonna take the liberty of saying: having dedicated libraries for sub-tasks (like building packages, interacting with an index, getting information on distributions, etc.) is something we have consensus on.

AFAIK, we already have a fair amount of work ongoing / done on that front:

  • @cjerdonek and @uranusjr have worked together a fair bit on improving pip’s index interaction code, and my understanding is, the motivation for starting that work was to share that code at some point.
  • @techalchemy, @uranusjr and I have had multiple conversations about the resolver situation, and we all agree that sharing as much code as possible is the way forward here.
  • importlib.metadata goes toward simplifying getting information about distributions, decoupling it from pkg_resources (a quick example follows this list).
  • @brettcannon’s work on packaging.tags.
  • I'm currently working on pip's build logic, since it's (still!) intertwined with the resolution logic and has become an annoying part of the codebase since PEP 517; as a side effect, this work would make it easier to split it out of pip.
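
A quick illustration of that importlib.metadata point (in the standard library since Python 3.8, with importlib-metadata as the backport):

```python
from importlib.metadata import metadata, requires, version

print(version("pip"))              # the installed version string
print(metadata("pip")["Summary"])  # any field from the METADATA file
print(requires("pip"))             # declared dependencies, or None
```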

If anyone wants to discuss more about (the current effort of) moving code out into dedicated libraries that can be shared easily, please start a new topic.

I created "Figuring out what is missing from dedicated packages for supporting downloading and installing a wheel from PyPI" to handle the "download a wheel from an index/PyPI for a known package and version and install it" case.


I think so too. Does anyone have any other functionality they view as critical here (or view anything here as non-critical)?

I feel like, whatever this tool ends up looking like, pipx-style scripts/environments should be omitted from it for a while.

pipx does what it does, well. I don’t think there’s much merit in replacing it or wrapping it. And if we still do, IMO it should get namespaced behind “foo script install” or something like that, to avoid the concept collisions.

That also simplifies our life around the install/uninstall and run collision issues.

Good call! I was slightly scared that this discussion would go that way and be difficult to bring back. Thank you! :slightly_smiling_face:

That’s a very different can of worms and we can/should deal with it later, as Donald suggested.

I have a dumb proposal for dealing with this: rename pip wheel → pip wheelhouse, if we go down the road of pip build. (No, I’m not advocating for a pip build command, just saying that this is doable IMO)

IMO it communicates what's happening better than the current name.

I notice that this isn’t really blocked by anything. If someone wants to help figure this out, they can start a new topic to initiate that discussion.

Seconded. I was writing the same thing when Donald posted… so yay? \o/


I’d like to re-check whether others agree with Pradyun, that we can gather data in the next 4-5 months while working on foundational improvements, and that we don’t need to make this decision right away.

I :heart: the 4-5 month post, but if you’re after a more explicit “yes”, here it is. :slight_smile: Otherwise you are probably after a poll to get concrete responses.

A thing I failed to note here earlier: poetry does serve as a do-it-all tool in the current ecosystem, with not-insignificant adoption (~1/4 of pipenv's, based on PyPI download stats).

A lot of the broader CLI design and UX choices it makes make sense to me, even though I disagree with some technical details of the functionality it provides.

@sumanah Can you clarify what the “this decision” is referring to? I wasn’t 100% sure because the original question is after a number of comments touching on different things.

Sorry @cjerdonek! I meant @pradyunsg’s original question:

and the 3.5 options he offers in the original post in the thread.


I agree that we should figure out the right UX here. And @pradyunsg’s options seem about right.

However, I think there's another (less UX-focused) aspect to consider: whether we put the build logic into "the tool", or do what we've been trying to do in other areas, which is to make a reusable library and then simply call that library from the "official" tool. That would probably mean putting work into making pep517 more robust and complete, and in particular making it the canonical place where "setting up a build environment" logic is implemented.
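
For reference, the pep517 library already wraps the PEP 517 backend hooks programmatically; here is a minimal sketch assuming its hook-caller API (the project path and backend name are illustrative):

```python
from pep517.wrappers import Pep517HookCaller

# Point the hook caller at a source tree and its build backend (normally
# read from pyproject.toml rather than hard-coded like this).
hooks = Pep517HookCaller("path/to/project", build_backend="setuptools.build_meta")

# The hooks run in a subprocess, but creating and provisioning an isolated
# build environment (the PEP 518 part) is still the caller's job; that is
# exactly the piece being discussed here.
print(hooks.get_requires_for_build_wheel())
wheel_filename = hooks.build_wheel("dist/")
```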

This I disagree with. If you’re doing <whatever tool> build foo, you could quite easily be building a wheel for foo to be used across multiple machines, possibly via multiple installers. So it 100% shouldn’t matter what installer is used to set up the build environment.

What is important (and I think this is what you were intending) is that the user should have an easy way to configure the options needed for that installer to run, and ideally those options should be automatically picked up from the config options that the user’s “normal” installer uses.

That may mean standardising an "installer configuration" format, or it may mean that the build tool needs a UX to say "use this installer for the build environment". That's up for discussion. (And yes, I do anticipate the possibility that if we make pip the build tool, someone will want to do something like pip build foo --isolated-env-installer=poetry :slight_smile:)

Maybe, but it would need a lot of work. I've already given up on using that library for tox. In its current form I find it makes caching and reusing build environments way too hard, and I'm not sure we can truly make it work for everyone; it's mainly targeted at CLI usage and at pip's use case.

Fair enough. I’ve put forward my perspective on this in a different thread (Build Environments for PEP 517 - #2 by pradyunsg) since this discussion is definitely OT for this thread, given how it has evolved.

Fair point - but would you agree that the logic involved is complex enough that we need some form of library to handle it? Or is your view that tools should decide to what extent they handle build isolation for themselves, and implement it internally? PEP 517 itself describes build isolation as something that tools SHOULD implement, not as a requirement, so there’s definitely a case for making it per-tool.

Since @dstufft has already sketched the semantics (which is the harder part) for transforming pip into a universal package manager, today I have tried to think about the syntax (which is the simpler part) and the various command inputs and steps, based on his thoughts:

So given that, a unified tool would, at a minimum, need to:

  • Allow installation and uninstallation of projects into an environment.
    • This includes commands for repeat deployments into multiple environments.
  • Introspect the current state of installed packages, and get more information about them.
  • Produce a local directory of artifacts for offline usage.
    • This includes downloading from PyPI, producing wheels, and producing sdists. Basically an analogue of "install, but save to a local directory".
  • Allow upload of existing artifacts.
    • This could also possibly include command-line support for things like yanking a file, managing a two-phase release, etc., that we might add in the future.
  • Build artifacts such as an sdist from a directory, or a wheel from an sdist.

I think that's it for what an MVP of such a tool would look like?

There are also additional features that one could imagine it having, such as:

  • Enable a "binary only" install mode à la pipx.
  • Ability to bootstrap a project, potentially from a user-supplied template à la cookiecutter.
  • More commands to manage the development lifecycle, such as a lint command, a test command, or a format command that would paper over differences between linters, test harnesses, or formatting tools.
    • This could really be expanded quite far: things like building documentation, running benchmarks, etc.
  • A generic "run a command" feature à la tox or npm run.
  • A way to run a command from a project in development, à la cargo run.

For the moment I have only considered the minimal commands of his first list. Here is my attempt.

Installing

$ pip install {project}
  • wheel → download → wheel + wheel dependencies → install
  • wheel → download → wheel + wheel or sdist dependencies → build → wheel + wheel dependencies → install
  • sdist → build → wheel → download → wheel + wheel dependencies → install
  • sdist → build → wheel → download → wheel + wheel or sdist dependencies → build → wheel + wheel dependencies → install
  • repo → build → sdist → build → wheel → download → wheel + wheel dependencies → install
  • repo → build → sdist → build → wheel → download → wheel + wheel or sdist dependencies → build → wheel + wheel dependencies → install
  • download → wheel + wheel dependencies → install
  • download → wheel or sdist + wheel or sdist dependencies → build → wheel + wheel dependencies → install
$ pip uninstall {project}

Inspecting

$ pip inspect {project}

Fetching

$ pip fetch --wheel {project}
  • download → wheel
  • download → sdist → build → wheel
$ pip fetch --sdist {project}
  • download → sdist
$ pip fetch --any {project}
  • download → wheel or sdist
$ pip fetch --wheel --dependencies {project}
  • wheel → download → wheel + wheel dependencies
  • wheel → download → wheel + wheel or sdist dependencies → build → wheel + wheel dependencies
  • sdist → build → wheel → download → wheel + wheel dependencies
  • sdist → build → wheel → download → wheel + wheel or sdist dependencies → build → wheel + wheel dependencies
  • repo → build → sdist → build → wheel → download → wheel + wheel dependencies
  • repo → build → sdist → build → wheel → download → wheel + wheel or sdist dependencies → build → wheel + wheel dependencies
  • download → wheel + wheel dependencies
  • download → wheel or sdist + wheel or sdist dependencies → build → wheel + wheel dependencies
$ pip fetch --sdist --dependencies {project}
  • sdist → download → sdist + sdist dependencies
  • repo → build → sdist → download → sdist + sdist dependencies
  • download → sdist + sdist dependencies
$ pip fetch --any --dependencies {project}
  • wheel → download → wheel + wheel or sdist dependencies
  • sdist → download → sdist + wheel or sdist dependencies
  • repo → build → sdist → download → sdist + wheel or sdist dependencies
  • download → wheel or sdist + wheel or sdist dependencies

Publishing

$ pip publish --wheel {project}
  • wheel → publish
  • sdist → build → wheel → publish
  • repo → build → sdist → build → wheel → publish
$ pip publish --sdist {project}
  • sdist → publish
  • repo → build → sdist → publish

Building

$ pip build --wheel {project}
  • sdist → build → wheel
  • repo → build → sdist → build → wheel
$ pip build --sdist {project}
  • repo → build → sdist

Notes. — For all the above commands:

  • The --wheel option is the default.
  • The --wheel and --sdist options can be combined.

All the above commands can build except the inspecting command.
All the above commands interact with the network except the inspecting and building commands.
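
A simplified sketch of the structure behind these chains: each starting artifact kind implies a fixed sequence of build steps before the final action. This collapses the wheel-versus-sdist dependency branches above into a single lookup, and elides downloading and dependencies entirely; it is purely illustrative:

```python
# The build steps needed to get from each starting artifact kind to a wheel.
STEPS_TO_WHEEL = {
    "wheel": [],
    "sdist": ["build wheel"],
    "repo": ["build sdist", "build wheel"],
}

def plan(artifact_kind: str, goal: str) -> list:
    """Chain of transformations for a single artifact (dependencies elided)."""
    return STEPS_TO_WHEEL[artifact_kind] + [goal]

print(plan("sdist", "install"))  # ['build wheel', 'install']
print(plan("repo", "publish"))   # ['build sdist', 'build wheel', 'publish']
```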

Most of the commands look fine at first glance, although I suspect different ideas will appear when (and only when) this actually gets implemented and used in the wild. One particular part I don't really understand is inspect, though; I didn't see it mentioned up-thread, and there's no explanation of what it does.

Also, linking back to "When you kick the packaging hornet's nest on Twitter, the hornets seem to want an opinionated, KISS solution", there are some concerns (mine included) about implementing the Platypus in pip. There are multiple reasons, including 1. the current design and implementation deviate too much from the proposal, and 2. most pip contributors do not want it to fit the proposed role.


Edit: This thread also kind of overlaps with "Developing a single tool for building/developing projects" (which was split from the Twitter hornet-nest thread).


I was again wondering where exactly we are right now toward this Platypus thing, and ended up drawing this graph:

Blocks marked green are components we already have; the others are yet to be standardised (they have non-official competing solutions, or haven't been built at all). All blocks besides pip install and PEP 518 are clearly needed only by either package development or application development, which is a strong indication to me that we probably want two tools (or one tool with two distinct aspects): one for people releasing packages, and one for people installing them (package developers probably need both for their development workflow, but wouldn't be using them at the same time).

Some personal thoughts:

  • There is a lot of talk about a new manifest and lock file format (i.e. declaring dependencies in pyproject.toml), but those alone won't solve the problem, and can wait until other things are solidified.
  • We are close to a universal package development frontend à la Flit's command line interface. The only essential missing part is editable installs; the others (e.g. incremental builds, integrating with external build tools) are all backend-only and can be improved incrementally.
  • Interpreter discovery (how to find Python interpreters on the host system) is a vastly underspecified space, especially on POSIX. There are multiple efforts now, including PEP 397 (py.exe on Windows), @brettcannon's Python launcher, and virtualenv 20.0's Python discovery, but at some point some universal rules are needed for components to interoperate (a naive sketch follows this list).
  • Given the relatively steep learning curve of virtual environments, the tool probably needs to hide the implementation behind some abstractions. Interpreter discovery from the previous point would help, but it still leaves the topic of how to manage multiple environments. I've been experimenting with stuff in this area, but more interest is needed.
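
To illustrate how underspecified this is, here is a deliberately naive POSIX-only sketch; real implementations (the launchers and virtualenv's discovery) must also handle version ordering, architectures, shims, and the Windows registry:

```python
import os
import re

def find_interpreters() -> dict:
    """Naively scan PATH for pythonX / pythonX.Y executables."""
    pattern = re.compile(r"^python\d(\.\d+)?$")
    found = {}
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        try:
            entries = os.listdir(directory)
        except OSError:
            continue  # PATH entries may be missing or unreadable
        for name in entries:
            path = os.path.join(directory, name)
            if pattern.match(name) and name not in found and os.access(path, os.X_OK):
                found[name] = path
    return found

print(find_interpreters())  # e.g. {'python3': '/usr/bin/python3', ...}
```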

Edit (2020-02-19): I added a new branch, tool management, which is basically to standardise (and improve if necessary) what pipx currently does. I feel most people already agree it is a good way to install Python applications (e.g. Black) if we continue to deploy them on PyPI, so what's left is not much different from the virtual environment management thing (described in the next few messages), so that tools can interop instead of relying on pipx (or whatever we standardise it into) to support every possible use case.


Can you expand on this? Not sure I follow.