Developing a single tool for building/developing projects

bernatgabor · November 2, 2019, 12:43am

Technically speaking poetry is this single tool. The other path is the Linux mantra of having one tool that does one thing only: this is the pip, twine, setuptools/flit route. The annual cadence is troublesome for non universal wheel packages only, and that tends to get complicated enough to need complicated solutions. Conda forge might help with that though IMHO.

emmatyping · November 2, 2019, 1:04am

I think it is also critical to point out that in languages that do things this way (npm is actually a good example of a tool that presents a unified interface for packaging that has been around for a while), you can use one tool to package most things. So flit would need to get support for building C extensions or such if it were to go this route. I think poetry is likely closer to the ideal of what most people want (save for that it is new, and so has all of the drawbacks you’ve listed).

Re the unix way, while it is nice to have dedicated tools, it adds a lot of overhead, and increases the odds of things breaking to have everything split up across a half dozen projects. (Where do I report issues to if I don’t know what project causes the issue?)

pf_moore · November 2, 2019, 9:26am

While this seems to be true, poetry seems to (as far as I can tell) exist in its own independent ecosystem. There’s very little participation that I’ve seen from poetry users on the packaging forums, so it’s hard for those of us involved with the more traditional packaging tools to get a sense of it. I’d like to know more about poetry, but I don’t currently have a project where I could use it, so it’s hard to get a start on learning about it.

The “one tool” idea does seem very prevalent these days, and there’s a lot of pressure on pip to be that “one tool”, as well. But I’m not sure it’s the right role for pip, so I’d love to get a better sense of what alternatives there are (I know about pipenv, for example, but that seems to address a slightly different problem - application development rather than library development).

uranusjr · November 3, 2019, 9:13am

I’ve been having thoughts recently about a lot of things mentioned in this thread, so forgive me for laying them out somewhat independently of the other posts here.

Easier configuration is sorely needed

I’m surprised nobody has mentioned MANIFEST.in yet. It was definitely the single most confusing thing when I started, and to this day I still need to look up a lot of things to include files in my package.

Editable installs

I used to be +1 on this, but more problems seem to arise the more I think about it. We all agree install -e can never be perfect. It also (at least in setuptools’s implementation) behaves differently enough from a regular install to trip newcomers into releasing broken packages (MANIFEST.in mumble mumble).

And then there are extension packages. The main virtue of install -e . IMO is that you don’t need to run an additional command to sync source changes into the environment. But extensions don’t build themselves, so you still need to run one additional command (setup.py build_ext or whatever) to see the changes. And at this point you’d do as well using a regular install instead.

So now we have two workflows. Pure Python packages want -e . so they only need to run one command (e.g. python -m mypackage) instead of two, but extension packages are stuck with two commands either way (build + run).

The conclusion I drew is that instead of trying to make -e . work, it could be easier to come up with a way to wrap the (build+)install+run process, so everyone can do their thing in one command, and also avoid the editable thing altogether.
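As a rough illustration of that idea (a sketch only, not an existing tool), a wrapper could chain a regular install and the run step behind one call. The names plan and install_and_run are made up for this example, and it assumes that a plain pip install of the project directory is an acceptable stand-in for the build + install step:

```python
import subprocess
import sys


def plan(project_dir, module, args=()):
    """Return the commands for one (build+)install+run cycle.

    Hypothetical helper: a regular (non-editable) ``pip install`` of the
    project directory, followed by ``python -m <module>``.
    """
    install = [sys.executable, "-m", "pip", "install", project_dir]
    run = [sys.executable, "-m", module, *args]
    return [install, run]


def install_and_run(project_dir, module, args=(), runner=subprocess.run):
    """Run every planned command in order, stopping at the first failure."""
    for command in plan(project_dir, module, args):
        runner(command, check=True)
```

Calling install_and_run(".", "mypackage") would then be the single command for both pure Python and extension packages, at the cost of reinstalling on every run.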

`pip` is in a weird place

A popular opinion among Python users is that more commands should be stuffed into pip, but that is quite awkward from pip’s perspective. pip is designed to operate inside an environment and to manipulate the things in the environment it resides in, so everything not directly related to that particular task feels bolted on and out of place. In other words, pip is not really an equivalent of e.g. NPM and Cargo, which it often gets compared to, but only of one component of them. Recent efforts to improve pip also go in the other direction, moving things that are not directly coupled out of pip (see PEP 517).

An all-in-one developer tool is still needed

It is just easier for users to need to learn one single tool. That does not mean we need to develop that tool in a single project, but PyPA needs to be able to recommend something that ties all the tools together. Flit is doing a lot of good things, but AFAICT there are conceptual differences (e.g. support for src/) that prevent it from being PyPA’s best choice. PyPA has most pieces (the exception is setuptools configuration through pyproject.toml). I’m tempted to start building a tool to wrap them all, if that sounds like a good idea.

pf_moore · November 3, 2019, 9:52am

I very much agree with this whole paragraph. There’s a lot of conflicting pressures on pip, and it’s suffering from a bad case of trying to be all things to everyone (in my view). To the extent that I’ve seriously considered building a completely new build frontend that ditches all of pip’s complexity and focuses on doing one job (installing packages) really well. With pip focusing on pushing as much core functionality as possible into standalone packages, this seems like a viable possibility.

Whether I’ll ever have the time to do more than think about such an option, I’ve no idea.

Maybe…? My biggest reservation would be that I don’t think pip is a good tool to wrap, in practice. It’s big, complex, and vendors a lot of stuff that a wrapper would also need, so there would be a lot of duplication¹. My advice would be to build small tools that individually do bits of what pip does (depending on other projects without vendoring, which makes them more composable, but not self-installable) and then build your wrapper based on those rather than on pip itself.

But maybe practicality beats purity here, and wrapping pip as it stands has the advantage of being practical, if a bit more clumsy.

¹ I know that “disk space is cheap”, so vendoring shouldn’t be that important, but working in an environment with real-time virus scanners, heavily proxied network connections, etc, the size of pip is a real issue for me. Time to download pip, and then copy all of its files in place, can be measured in minutes in the worst case, and is typically over 30 seconds. Even pip’s caching doesn’t help that much, unfortunately.

steve.dower · November 3, 2019, 3:37pm

This says to me we ought to be making editable installs better at a different level.

Fundamentally, it’s a workflow thing. I don’t think it belongs in pip or PEP 517, because it’s only going to be used while developing the app. At that point, you should know which backend you’re using and it can provide a direct option (to configure/enable it, including setting up automatic rebuild of extension modules, which again, is part of the backend and not pip).

The problem here is that pip is too official. Whether the project intended it or not, it is now in a monopoly position that makes it incredibly difficult to gain traction with another implementation (unless you rely on the cult of personality, which is one of very few things that can overcome being the official standard).

We have literally seen people propose features for pip that could go elsewhere just because getting into pip is the only way to get any use.

I’ve proposed some myself! (I see teams at my work regularly investigate the Python community and figure out our norms. The results are often telling… but pip definitely stands out as something everyone contorts themselves through even when the alternatives may be better)

I don’t have a solution for this, but I think it’s important to be aware that by default, the world expects pip to be sufficient for all their scenarios.

pf_moore · November 3, 2019, 10:49pm

I think it’s also important to be aware that this is a huge problem for pip as well. We’re under severe pressure to cater for every workflow and scenario, regardless of how far from our core purpose they are, or how much catering for them might damage pip’s supportability.

We regularly get requests for new functionality that can easily be implemented as a couple of lines of shell script, or writing a custom script that uses libraries like pkg_resources directly.

I think this is too pessimistic. I agree that pip is a monopoly, in a way that blocks competition and stifles innovation, but I think this can be changed. The problem at the moment is that any attempt to address any part of the packaging landscape gets measured against “it doesn’t do everything pip does”. That’s not because “everything has to do what pip does”, but because no-one has properly defined limited but useful subsets of pip’s functionality. Some examples (possibly bad ones, these are just off the top of my head):

Install packages, but only from PyPI or other indexes, and only wheels. That would likely be entirely suitable as a “bootstrap” installer, and as such would be a much lighter weight default for shipping with Python and virtualenv, for example. It would also solve the problem of installers needing to vendor all their dependencies to avoid bootstrapping issues.
Build wheels from local source trees. We already have pep517.build offering this type of functionality, and it’s getting some use (sufficient that the fact that it was never intended as a “real” tool but only a prototype, is becoming an issue).
Create an editable install of a local source tree.
List currently installed packages in various forms.
Check PyPI (or other indexes) for newer versions of installed packages.

By defining these as specific use cases, we can:

Offer them as specs for new tools to work to, basically pre-packaged use cases if you like.
Have them as “building blocks” that people can use to build higher level tools or workflows from, with a level of security in the sense that they know that these are fully supported scenarios.
Provide “sample implementations” right now, in the form of specific pip invocations which reproduce them. That means that people don’t have to wait for new tools to be written to work in terms of these building blocks.
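To make the “sample implementations” point concrete, here is one possible mapping from the use cases listed above to pip invocations. This is a sketch: the use-case names and the pip_command helper are invented for illustration, and the chosen flags are only one reasonable reading of each use case (for instance, --only-binary :all: restricts pip to wheels but does not by itself restrict where they come from).

```python
import sys

# Run pip through the current interpreter so the right environment is used.
PIP = [sys.executable, "-m", "pip"]

# Hypothetical use-case names -> (pip arguments, whether a target is needed).
RECIPES = {
    "install-wheel-from-index": (["install", "--only-binary", ":all:"], True),
    "build-wheel": (["wheel", "--no-deps", "--wheel-dir", "dist"], True),
    "editable-install": (["install", "--editable"], True),
    "list-installed": (["list", "--format", "json"], False),
    "check-outdated": (["list", "--outdated"], False),
}


def pip_command(use_case, target=None):
    """Build the pip command line that reproduces one building block."""
    args, needs_target = RECIPES[use_case]
    if not needs_target:
        return PIP + args
    if target is None:
        raise ValueError(f"{use_case} needs a package name or source tree")
    return PIP + args + [target]
```

A higher level tool could start from a table like this and later swap individual entries for dedicated implementations without changing its own interface.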

By framing the problem like this, it becomes more of an interoperability problem - we’re defining interfaces that tools and workflows can use reliably, while still leaving open the possibility of different implementations.

There’s no doubt this would be a lot of work, and it would need effort from people who are already stretched way too thinly. But it is a viable approach (IMO), and probably one of the few ways we’d manage to gradually move to a more open toolset (as opposed to the “big bang” switch to some new cool thing that probably would only happen via the sort of explosion of popularity that you describe as “cult of personality” based change).

njs · November 3, 2019, 10:53pm

I think this kind of “all-in-one python developer workflow” tool is definitely a productive direction to go in. (I think this is what @dholth dubbed “the platypus”?) I think it would be awesome if someone would take this seriously. I volunteer to beta test :-).

This is reminding me of this previous discussion:

brettcannon · November 4, 2019, 7:53pm

I think part of the problem here is both pip and wheel have taken the “no API” stance which makes building out equivalent tools a massive slog to re-implement what they already do. I have been maintaining a list of what is required just to install a wheel from PyPI because I would like to make a bare-bones competitor to pip that has an API and there are still some potential gaps from the perspective of trying to reuse other projects for the relevant functionality (some of which I’ve tried to plug, e.g. packaging.tags):

Check if package is already installed (spec / importlib-metadata)
Check local wheel cache (? / ?; how pip does it)
Choose appropriate file from PyPI/index
1. Process the list of files (Simple repository spec / ?; PyPI JSON API / ?)
2. Calculate best-fitting wheel (spec / packaging.tags)
Download the wheel
Cache the wheel locally (? / ?; see local cache check for potential details)
Install the wheel
1. Choose which directory to install the files to (e.g. user versus system/virtual environment; ? / ?)
2. Install the files (spec / distlib.wheel)
3. Record the installation (spec / ?)
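A few of the steps in the list above can be sketched with the standard library alone. This is a minimal illustration, not how pip does it: is_installed, wheel_tags, best_wheel and install_dir are hypothetical names, and the list of supported tags is passed in by the caller (in practice something like packaging.tags would supply it, ordered from most to least preferred).

```python
import site
import sysconfig
from importlib import metadata
from itertools import product


def is_installed(name):
    """Step 'check if package is already installed', via importlib.metadata."""
    try:
        metadata.version(name)
    except metadata.PackageNotFoundError:
        return False
    return True


def wheel_tags(filename):
    """Expand the compressed tag set at the end of a wheel filename."""
    stem = filename[: -len(".whl")]
    pythons, abis, platforms = stem.split("-")[-3:]
    return set(product(pythons.split("."), abis.split("."), platforms.split(".")))


def best_wheel(filenames, supported):
    """Step 'calculate best-fitting wheel'.

    ``supported`` is a list of (python, abi, platform) tuples, most
    preferred first. Returns None when no wheel is compatible.
    """
    rank = {tag: index for index, tag in enumerate(supported)}
    best, best_rank = None, None
    for filename in filenames:
        if not filename.endswith(".whl"):
            continue  # sdists and other files are out of scope here
        ranks = [rank[tag] for tag in wheel_tags(filename) if tag in rank]
        if ranks and (best_rank is None or min(ranks) < best_rank):
            best, best_rank = filename, min(ranks)
    return best


def install_dir(user=False):
    """Step 'choose which directory to install the files to' (simplified)."""
    if user:
        return site.getusersitepackages()
    return sysconfig.get_paths()["purelib"]
```

The remaining steps (index parsing, download, caching, writing files and the installation record) are exactly the parts where the list shows “?” for a reusable library.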

And this doesn’t even touch reading an sdist that uses pyproject.toml to produce a wheel which is what a build tool would need to handle.

Although this is all very messy in the face of editable installs and the fact that people have the concept of sdists because it means build tools need to know how to invoke an install tool and vice-versa. And I suspect this is why people keep saying “why doesn’t pip just do this?”

So for me, to solve this universal tool idea I think we probably need to figure out exactly what that tool needs to do and whether we have packages that provide that functionality through an API. Once we have that then we can start putting a nice CLI in front of those packages that let people do things with a single tool.

uranusjr · November 4, 2019, 10:18pm

There were some efforts cleaning up pip’s implementation around here. I believe it is in a state that a new implementation can copy LinkCollector with minimal change (to replace the use of Link).

There’s pep517. Last time I checked it is not in a production-ready state, but the work would be valuable for a pip competitor.

I know at least several people have expressed interest in working on a “pip competitor”, but all efforts stop short since there are just not enough resources to put into them. pip is well over the scale of what pure volunteer work can handle, and pure volunteer work is just not enough to gain steam competing with existing solutions.

sdispater · November 5, 2019, 8:54am

Author of Poetry here!

Like some of you mentioned, Poetry is the closest thing we currently have to a single tool for managing Python projects, from scaffolding to publishing. That’s the main reason people use it. And that’s the main reason I built it.

Poetry tries to alleviate the pain some people have with the current ecosystem. It does this by pretty much making a clean cut of the existing tooling (except pip which is used for the installation part even though it will likely change).

Based on the feedback I received on Poetry here are the reasons people like it:

A single tool to manage Python projects (applications or libraries) which is on par with what exists in other languages
An intuitive Command Line Interface (inspired mainly by Cargo and Composer)
A working dependency resolver
Environment management

A lot of work has been put into it and it’s gaining traction so if it can pave the way to have a somewhat official tool it would be great.

I am willing to help with that (or if people are interested in helping out with improving Poetry to build on it, they are welcome).

I haven’t participated until now since I wanted to stabilize Poetry, and build the tool I always wanted to have, but now that Poetry is close to its 1.0 release I think I can contribute a little more to the discussions.

Like I said above, when I started Poetry I didn’t want to make it an agglomerate of existing tools. It was mainly due to the fact that I wanted an intuitive and unified CLI for it. However, I based my work on the existing PEPs so everything generated by Poetry is compliant with the existing ecosystem. So, Poetry does not live in its own independent ecosystem, otherwise it wouldn’t be used as much. That being said, I can’t deny that sometimes Poetry implemented its own features (like caret or union requirements) that are not part of any PEP but are useful. That’s one of the advantages of building a tool from scratch: you can free yourself from some of the existing constraints.

It’s true that I haven’t participated much in the discussions up until now. One of the reasons I mentioned was that I was focusing on building Poetry and making it relevant. I was pretty new to this whole packaging thing when I started Poetry. I knew and I was using the existing tools but that was pretty much it. I didn’t, and I still don’t, have the accumulated knowledge that some people here have so I didn’t think that I had anything relevant to add to the discussion except my vision of what an intuitive, single tool should look like.

Now, if you have questions about Poetry and some of the decisions I made, feel free to ask. I’d gladly answer.

pitrou · November 5, 2019, 9:00am

What has happened with distlib? It was once heralded (by @ncoghlan in particular) as the future basis for ground-up distutils / setuptools replacements.

(while “told you so” isn’t a very interesting statement, I’ll point out that I was skeptical at the time, for good reason it seems)

dholth · November 5, 2019, 4:53pm

@sdispater sorry we couldn’t get poetry on the Platypus sticker! We thought about putting in a sheaf of paper but I guess we ran out of room.

There is already an API for making “pip install -e .” work with pyproject.toml but it entails having a command-line-compatible setup.py to do the work.

brettcannon · November 5, 2019, 7:29pm

It’s still being developed by @vsajip, but its scope is huge (e.g. Vinay maintains a back-end that mirrors metadata to his site).

brettcannon · November 5, 2019, 7:33pm

I think the first question for me, @sdispater, is where did the PEPs fall short for you? I really appreciate that you tried to stick to the PEPs so that poetry’s workflow could do what you wanted but still produce output the rest of the community could work with! So now the question is where did the PEPs come up short so we can consider filling in those gaps?

vsajip · November 5, 2019, 7:42pm

I maintain distlib, but there’s no feature development going on currently. Some of the PEPs it was based on became moribund (as before in packaging, there are many opinions and it is hard to get consensus) and the window of time I had to contribute mostly closed. I don’t think the scope of it is particularly big - the mirrored metadata is mainly to investigate/support better dependency resolution - which works reasonably well, as far as it goes, but wider adoption would depend on e.g. pip wanting to use it.

ncoghlan · November 6, 2019, 4:42am

For folks wanting to see the scope of a true “all in one” tool, the one we reference from the end of https://packaging.python.org/tutorials/managing-dependencies/ is hatch, since that covers project templating, release management, etc.

The main problem with “all in one” tool proposals is that they can only have one default behaviour, and that’s OK if you’re able to get in early into a relatively new ecosystem (as folks that don’t like the defaults will self-select out of the ecosystem entirely), but mostly doesn’t work for getting adoption in an established ecosystem.

pip was only able to pull it off because it was trying to replace a specific existing tool (easy_install) that had defaults that were a long way from what most people wanted, and the switching costs for that usage model had been deliberately kept low.

By contrast, most of the tools we have now are explicitly making different core design assumptions. For example:

hatch: provides answers for everything Olek wanted an answer to when designing a project management process
pipenv: dependency management that doesn’t assume the project itself is a Python component, but does mostly assume one target environment per repo
poetry: dependency management that does assume the project itself is a Python component
pip-tools: a lower level DIY dependency management toolkit that doesn’t make any more workflow or target environment assumptions than pip itself does

sdispater · November 6, 2019, 8:23am

If we exclude the shortcomings of PEP-517 regarding editable installs since it’s a well known subject and a tricky one, I think the biggest limitation I found was PEP-508 (https://www.python.org/dev/peps/pep-0508). From the start, I wanted Poetry to provide “or” or “union” constraints for dependency specification, for instance requests==2.20.* || requests==2.22.*. This is not possible in the current ecosystem so it’s discouraged since it would lead to metadata not compatible with the existing tooling. Where it can be used without too much of an issue is when declaring the python compatibility of a project. Basically, you can declare python = "~2.7 || ^3.4" and Poetry will format it to an understandable constraint: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*.
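The Python example works because a union of ranges can sometimes be rewritten as a single “and” of clauses with exclusions. A small sketch of that equivalence follows, under stated assumptions: ~2.7 is read as >=2.7,<2.8 and ^3.4 as >=3.4,<4.0, versions are reduced to (major, minor) tuples, and both function names are invented for this example rather than taken from Poetry.

```python
def poetry_union(version):
    """'~2.7 || ^3.4', read as (>=2.7,<2.8) OR (>=3.4,<4.0)."""
    return (2, 7) <= version < (2, 8) or (3, 4) <= version < (4, 0)


def flattened(version):
    """'>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*': every clause must hold."""
    excluded = {(3, 0), (3, 1), (3, 2), (3, 3)}
    return version >= (2, 7) and version not in excluded


# Python releases from 2.6 through 3.8, as (major, minor) tuples.
RELEASED = [(2, 6), (2, 7)] + [(3, minor) for minor in range(0, 9)]

# The comparison is only made for these releases, not for all versions.
agree = all(poetry_union(v) == flattened(v) for v in RELEASED)
```

No such rewrite exists for something like requests==2.20.* || requests==2.22.* without listing exclusions for every other release, which is why the union form does not map cleanly onto existing metadata.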

sdispater · November 6, 2019, 8:30am

@ncoghlan

We had this discussion before and I mentioned back then that while the scaffolding part of Poetry (the new command) and the packaging part (the build command) assume a somewhat standard package structure (even though it can be adapted to more complex structures via the packages property in the pyproject.toml file), the rest of the commands do not and you can manage the dependencies of any Python project, be it a library or an application, with it.

I am really curious to know why you keep thinking that Poetry is not suitable for applications since that’s something that I mentioned from the start.

brettcannon · November 6, 2019, 6:21pm

Do you want to start a separate topic to have a discussion about this to see if sentiments have changed?
