Python packaging documentation feedback and discussion

steve.dower · March 14, 2023, 11:51pm

Each file has a source location, a destination name (partially inferred from its nesting level in the package definition, optionally from the source name), and additional metadata that determines how it gets processed (e.g. does it contribute to incremental rebuild analysis? Does it need to be included in the sdist? In wheels? And a much longer list of things that apply to native code compilation and wouldn’t ever be standardised).

Right, hence my earlier suggestions about making it easier to compare than the current list of packages on packaging.python.org. We don’t have to pick a winner to make the choice easier, and I think we can push most of the work onto the packages (if they even care about justifying/advertising themselves).

rgommers · March 15, 2023, 12:10am

No, it really doesn’t work at all, it’d be pointless duplication. For one, users may be using the Meson CLI directly for dev work, at which point pyproject.toml is ignored - the info therefore must already be in the meson.build files, also for .py files.

And how do you know that you’d want all .py files? The original question/idea here doesn’t seem to make a difference between selecting what goes into an sdist and into a wheel. Those aren’t the same thing.

BrenBarn · March 15, 2023, 1:54am

I think that is quite far from perfect. As I mentioned in another post, if the tabs are meant as examples, the tutorial needs to explicitly say that. Otherwise it looks like it is offering an exhaustive list of options.

Even so, I still do not see any reason to show more than one at this particular point in this particular tutorial. A tutorial on “how to package a simple Python project” does not need to show multiple backends. It should decide on the one that is best for that purpose — perhaps including simplicity, breadth of adoption, and ease of transitioning to more complicated projects — and show that one. There can be a separate tutorial about why someone might need others.

If you’re saying that you think the tutorial is fine because no one raised an issue on the issue tracker, I think that is not a justifiable conclusion. Many people may have given up. Others may have understood after spending more time than they ought to have. If we’re lucky, some asked in some external forum (like this), or even on this very website.

Now, admittedly not all of that can be traced directly to this tutorial, but there is no doubt in my mind that there is widespread sentiment that Python packaging has too many ways to do too many things and it’s hard to navigate them all. I’m sympathetic to the view that that situation is in some sense a consequence of Python’s success and the wide range of ways it’s used. But confronting users with multiple backends in a situation where they don’t need them only exacerbates the problem. In a tutorial that only needs one build backend, there is no need to say anything about any others except “click this link if you want to learn more”. There should be one and only one obvious way to do it.

henryiii · March 15, 2023, 3:38am

Neither of these are referring to the tutorial. One of those is someone trying to figure out what to go to from setup.py, obviously a more advanced user than the tutorial targets, and they are listing things like Poetry, which certainly they didn’t get from this page! That doesn’t even support PEP 621 yet. Some of their questions would be answered by reading this page, and the others would be helped by the comparison guide we’ve been talking about.

The other one is referencing a different page, “flow”, which sounds like more of a guide-style page, and they are confused by it. That’s actually what we are talking about adding more of. They also mention it shows 7 different tools - this tutorial shows 4 and tells you to just use hatchling.

Both cases are looking at different pages, yet instead of working on those pages, we are arguing about a simple selector box that simply reflects the current diversity of the ecosystem. One we worked very hard to cultivate. Do you really want “only one way to do it”? We had that. It was distutils/setuptools. It was a mess. Setuptools was stagnant, and couldn’t improve without breaking everyone. Numpy had over 13,000 lines of code related to patching this for building. There was no innovation for years. Many use cases weren’t well covered and required extensive workarounds. Adding competition here encouraged innovation - including in setuptools. It’s really hard to move an existing packaging tool forward without breaking things. It’s much easier to innovate if you allow competitors - and then everyone may start adopting the best ideas. The other languages with “one blessed” way to do things almost always are much newer languages like Rust and Go, and their tooling takes a lot of inspiration from things that didn’t exist when Python was born. And I still don’t know the difference between npm and yarn. Eventually, we might end up with everyone using one tool because it’s the best and covers most use cases, and the other tools just start recommending that tool. With standardization like PEP 621, it’s easy to switch!

By the way, this sort of sentiment well predates the change to this page - I’ve seen it for at least a year before the page was changed. And that sort of “what should I use” question almost always mentions Poetry (which is the second largest backend in use, behind setuptools, IIRC), which means the person expressing it is not just confused by this little selection box.

So we need to pick the “one way” to plot and tell all other plotting libraries to stop? Or one web framework? Or one ML library? The “one way to do it” refers to the language, not the ecosystem, and packaging is part of the ecosystem. The standards around packaging, like PEP 517 and PEP 621 are the equivalent of the “one way”, but the backends and frontends are like packages on PyPI - actually, they literally are packages on PyPI. Python’s never claimed to have some sort of ruling authority that picked the “one package” to do something.

Let’s work on improving the aging pages, the pages that compare libraries, the more advanced guides. The tab box was discussed for over six months, we made a decision, and I haven’t seen the (then predicted!) torrent of problems, complaints, and confusion that was supposed to ensue. It does clearly say “this tutorial uses hatchling”, and that’s what the issues seem to show people are doing.

This is going off track (IMO) for this discussion, which was not originally supposed to be about removing that tabbed box from the docs. I’m getting repetitive, so I’ll stop.

BrenBarn · March 15, 2023, 3:53am

If the tutorial says this as the first sentence:

This tutorial walks you through how to package a simple Python project.

Then yes, in that tutorial there should only be one way^[1].

None of what you have said changes my opinion on that matter. Whether we actually want one way to do things overall is a different question (to which I personally answer yes), but that’s not the issue here. There is no benefit to mentioning anything in this tutorial except what people specifically need to accomplish the tutorial’s stated purpose. Other things can go elsewhere.

[1] or any choices that the user must make must be fully explained in that tutorial

cameron · March 15, 2023, 4:56am

As the person cited, the end result of that post was this:
https://packaging.python.org/en/latest/flow/
which is intended as the conceptual publication flow so that people can decide what tools they want to use in context. I hoped it would be read before the tutorial, at least potentially.

Any individual tutorial/howto will inherently either use a particular tool as its exemplar or be very complicated.

I hoped the “flow” doc would let you end up with a mental model of how a package gets from an author to an end user, and which bits of that flow you’re intending to implement. Then you can pick a tool which fits how you’re going to approach things (eg easy flit(?) or flexible setuptools). The first tool you pick needn’t be where you end up in the long run.

oscarbenjamin · March 15, 2023, 10:08am

I’m not sure what you mean by the “original question/idea” here but if you mean me suggesting that there should be a standard way to specify in pyproject.toml which files to include then I meant for it to be able to handle the things that are already handled by flit, poetry, pdm etc just in a uniform way. So firstly it wasn’t intended to cover cases where someone is using meson or pymsbuild etc but rather to be something simple for pure Python projects. Of course it’s interesting to know how it could compare for a meson-based build but that’s not really the intended target.

@steve.dower might find that packaging pure Python packages is not something he does very often but the majority of Python packages made are pure Python and it represents the 90% easy case. That’s why we already have so many competing tools that only really support that case.

I think all of flit, poetry, pdm, hatch do allow specifying whether files should go in the sdist or wheel. They all just have different ways of writing include, exclude etc in the config file to make a specification of source-destination mappings and a couple more bits of information.

They differ mostly in exactly how the config options are spelled and which bits of information are specified implicitly or by default and exactly how they handle VCS ignore files. I suspect that they probably also differ in more subtle ways meaning that switching between them would be nontrivial even though they are basically all doing the same thing.
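To make the "source-destination mapping plus a couple more bits of information" idea concrete, here is a minimal sketch of what a uniform include/exclude specification could look like once parsed. The key names, the `select_files` and `wheel_destination` helpers, and the glob semantics are all hypothetical illustrations, not the actual behaviour of flit, poetry, pdm, or hatch, which each handle patterns and VCS ignore files in their own way.

```python
from fnmatch import fnmatchcase

# Hypothetical, backend-neutral spelling of a file-selection config.
# None of these key names are taken from flit, poetry, pdm, or hatch.
SELECTION = {
    "sdist": {"include": ["src/*", "tests/*", "README.md"], "exclude": ["*.pyc"]},
    "wheel": {"include": ["src/*"], "exclude": ["*.pyc", "src/pkg/_dev/*"]},
}


def select_files(paths, rules):
    """Return the paths matching any include pattern and no exclude pattern."""
    return [
        p for p in paths
        if any(fnmatchcase(p, pat) for pat in rules["include"])
        and not any(fnmatchcase(p, pat) for pat in rules["exclude"])
    ]


def wheel_destination(path, prefix="src/"):
    """Map a source path to its name inside the wheel by dropping the prefix."""
    return path[len(prefix):] if path.startswith(prefix) else path


PROJECT_FILES = [
    "README.md",
    "src/pkg/__init__.py",
    "src/pkg/core.py",
    "src/pkg/_dev/scratch.py",
    "src/pkg/__pycache__/core.cpython-311.pyc",
    "tests/test_core.py",
]

# The sdist and the wheel are selected independently: they are not the same thing.
sdist_files = select_files(PROJECT_FILES, SELECTION["sdist"])
wheel_files = {
    p: wheel_destination(p)
    for p in select_files(PROJECT_FILES, SELECTION["wheel"])
}
```

Even in this toy version the subtle differences show up quickly: whether `*` crosses directory separators, and whether excludes beat includes, are exactly the kind of defaults that would need to be agreed on.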

rgommers · March 15, 2023, 11:54am

Okay, so my 2c here is:

This kind of interface alignment where possible is good and a useful way to spend energy. Making the UX of tools that offer the same concepts more uniform makes those tools easier to learn, and hence this benefits users.
This is not PEP material. There are no consumers of this information beyond the individual backends themselves, it’s limited to a subset of (simpler) backends, and the user is using one at a time only. It’s irrelevant from an interoperability point of view.
Therefore, I suggest to simply go do it: figure out for the pure Python backends what the commonality is, where they differ, and what the preferred way of spelling should be where there’s a difference. Then go to the individual backend authors and get a few of them to agree, then roll it out.

pf_moore · March 15, 2023, 12:45pm

I’ll add one qualification here. The standards reserve all namespaces except [tool] for future standards, and say that individual tools must use their own section under [tool]. So backends can agree a common convention under [tool.<backend_name>], but they can’t share config directly.

Simply having a common agreement seems like a reasonable compromise to me, though.
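As a rough illustration of that compromise, the sketch below assumes a hypothetical convention in which each participating backend reads the same key names from its own `[tool.<backend_name>]` table. The backend names (`backend_a`, `backend_b`, `backend_c`), the key names, and the `read_file_selection` helper are invented for the example; the dict stands in for a parsed `pyproject.toml`.

```python
# Hypothetical shared convention: every participating backend reads the same
# key names ("include"/"exclude") from its own [tool.<backend_name>] table.
COMMON_KEYS = ("include", "exclude")


def read_file_selection(pyproject, backend_name):
    """Extract the agreed keys from tool.<backend_name> of a parsed pyproject.toml."""
    table = pyproject.get("tool", {}).get(backend_name, {})
    return {key: list(table.get(key, [])) for key in COMMON_KEYS}


# Stand-in for a parsed pyproject.toml (a nested dict, as a TOML parser returns).
pyproject = {
    "build-system": {"requires": ["backend_a"], "build-backend": "backend_a.build"},
    "tool": {
        "backend_a": {"include": ["src/*"], "exclude": ["*.pyc"]},
        "backend_b": {"include": ["src/*"]},
    },
}

selection_a = read_file_selection(pyproject, "backend_a")
selection_b = read_file_selection(pyproject, "backend_b")
selection_c = read_file_selection(pyproject, "backend_c")  # no table at all
```

Switching backends under such a convention would mean renaming the table rather than rewriting its contents, which is the practical benefit of agreeing on spelling without sharing a namespace.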

brettcannon · March 15, 2023, 8:57pm

Unless the PyPA as a group decide to have a single tool (which is a separate topic), we chose a while ago to not make packaging.python.org express a singular, definitive opinion on tooling. So unless there’s some desired clarification to make things read better, I don’t think worrying about the selector boxes on that tutorial page is worth discussing anymore.

I personally would rather talk about what things at packaging.python.org are outdated and need to just go, what needs a refresh, and what’s missing, and all without projecting a stance the PyPA does not generally hold.

BrenBarn · March 15, 2023, 10:23pm

There is, and I have said it explicitly multiple times already in this thread.

Note I am not (right now) suggesting that PyPA express any opinion on tooling as a whole. I’m simply suggesting that in an individual tutorial which is supposed to show people exactly how to do a specific thing, it should show them one way to do that thing.

By doing so, the tutorial would not be endorsing or favoring whatever backend it uses — no more than it currently endorses or favors the MIT license by including that as the example license to use in the package. It is just an example of a way to do it. In order for the tutorial to successfully walk the user through how to do it, some simplifying choices have to be made for the purposes of that tutorial, and those choices should be made, not dodged, for the purpose of that tutorial.

cameron · March 15, 2023, 11:30pm

As a newcomer to this thread, it seems to me that you want the basic
tutorial to a) describe exactly one way to package something simple,
b) describe it so prescriptively as to seem to be the one way, and c)
not endorse or favour a particular backend.

To me, those objectives seem inconsistent.

I’ve just looked at the current tutorial, and it’s been improved since I
last visited: it shows a few alternatives, first a Windows vs UNIX
install command and then an example [build-system] clause with
alternatives for 4 popular backends.

From that point onward there are only UNIX/Windows command line
alternatives and it looks like there’s no backend-tool-specific stuff at
all.
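For readers who have not seen the tutorial, the sketch below shows roughly what that choice amounts to. The backend names and `build-backend` strings are my recollection of what the four tabs contain, not a copy of the tutorial, and the `build_system_clause` helper is purely illustrative.

```python
# Assumed contents of the four tabs; check the tutorial for the exact
# spelling and any version pins before relying on these.
BUILD_SYSTEMS = {
    "hatchling": {"requires": ["hatchling"], "build-backend": "hatchling.build"},
    "setuptools": {"requires": ["setuptools"], "build-backend": "setuptools.build_meta"},
    "flit": {"requires": ["flit_core"], "build-backend": "flit_core.buildapi"},
    "pdm": {"requires": ["pdm-backend"], "build-backend": "pdm.backend"},
}


def build_system_clause(backend):
    """Render the [build-system] table for one backend as TOML text."""
    table = BUILD_SYSTEMS[backend]
    requires = ", ".join('"{}"'.format(req) for req in table["requires"])
    return '[build-system]\nrequires = [{}]\nbuild-backend = "{}"\n'.format(
        requires, table["build-backend"]
    )


# Swapping backends changes only these two values in pyproject.toml.
clause = build_system_clause("hatchling")
```
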

I think (correct me if I’m wrong) that you want the tutorial to not
burden the user with the pick-a-tool choice. Certainly, that can be a
hard choice to make as a newcomer to packaging, and I for one have
indecision problems when faced with several choices when I don’t think I
know enough.

But I’m against making the tutorial use only one backend. The choice of
backend is so simple that I’m in favour of the “here are 4 possible
backends” example.

What I do think is lacking is a preamble paragraph up the top which
says something along the lines of:

 This tutorial shows how one can package a simple project.

 There are several available packaging tools, and this example 
 chooses one in the `pyproject.toml` configuration file; suitable 
 clauses for 4 common tools are presented and it does not matter much 
 which backend is chosen for this example.

My intent here is to clarify that there are choices presented in the
tutorial but to head off the difficult “pick the best choice”
indecision conflict this can raise in a reader.

The difficulty in my mind of offering exactly one tool is that it
implicitly favours/endorses that tool. I’d much rather it was clear
that there are several choices and that for a simple project it doesn’t
matter which is chosen, and that that choice is easily changed just by
fiddling the config clause.

Cheers,
Cameron Simpson cs@cskk.id.au

BrenBarn · March 15, 2023, 11:52pm

Perhaps I have confused things by mixing two goals.

In general, I am advocating for a greater move toward B (although not all the way there) as a philosophy across the entire packaging landscape.

For this tutorial specifically, I am advocating for A and C as an interim measure.

CAM-Gerlach · March 16, 2023, 12:03pm

Just to give a bit of historical context here, the reluctance for the PyPA/PyPUG to endorse a single all-in-one packaging tool to rule them all is more than just theoretical—it previously tried exactly that, with pipenv…which, to make a long story short, resulted in a huge controversy and backlash. Of course, some of the reasons that blew up were unique to that particular situation and the dramatis personae, but it’s not easily forgotten by either the PyPA folks or the community, and casts a long shadow over any future such proposal.

lwasser · March 16, 2023, 3:11pm

it sounds to me like there are some community challenges with what PyPA can and can’t do here and i think that is good to recognize!

I wonder if a few things might be helpful (just my two cents:) )

What if PyPA as a group considered the feedback and created a scope for their guide that was clearly articulated somewhere online (or maybe it’s part of Pradyun’s newsletter idea). the scope could clarify the goals of the guide, the audience what it is and what it is not.

Maybe PyPA focuses on the technical implementations of packaging, translating (very difficult to digest but very important) PEPs into clearer language so the community is clear about which standards are important to follow and which are just opinions.

you could even highlight on those pages (if you wish) which tools adhere to the standards as you learn about them - might be too much just an idea.

I think it’s important to track back to @brettcannon original statement about updating the current content as that would be a wonderful first step here along with some communication around the scope and audience of the guide right now.

And then to @BrenBarn and others points here. i do think it’s clear that some subset of the community would love a tutorial or an opinionated guide. so let’s make note of that being important. maybe we (pyOpenSci) can fill that role as ONE example for scientists that anyone else is welcome to follow! and we will follow all of the PyPA standards.

Other excellent tutorials may also arise in the future as the ecosystem evolves and can be linked to.

Perhaps you have some criteria about what you link to - a few ideas there:

it needs to be maintained / updated over time
it needs to be community developed / reviewed (or something like that)
it needs to follow x,y,z pep standards defined in the PyPA guide such as how to add metadata to the pyproject.toml …
others?

and so that type of guidance allows you to create a process for how the documentation evolves. Publish that criteria as well!

just a thought. i’d love to see this conversation lead to some tangible updates to current content as i think others have suggested here including @henryiii and @pf_moore …

my two cents

steve.dower · March 16, 2023, 3:49pm

Big +1 to this idea. Workflow recommendations for specific audiences would be perfect, and should be encouraged and endorsed (and then, we just have to make sure that the PyPA standards don’t constrain those recommendations to the point where it hurts users).

sinoroc · March 16, 2023, 4:02pm

That seems like a reasonable way forward to me as well. The content of Python Packaging User Guide could be re-focused on the standard specifications, and the general concepts of Python packaging, while trying to avoid being opinionated and naming specific tools unless necessary. Some pages might need to be deleted or their content cut down. The guide could then link to some of those more opinionated guides (after loose review). I could see it work.

brettcannon · March 16, 2023, 9:38pm

Assuming this is the general direction that people are comfortable with (I know I am), I think a key topic at the packaging summit at PyCon US is going to be how to drive this work and make sure packaging.python.org has enough support to see it through and continue to keep its content fresh.

BrenBarn · March 18, 2023, 4:44am

I do think the resources that you’ve linked to in this thread are moving in a good direction, and it would be awesome to see more of that. I think such resources become immeasurably more useful if they are prominently placed as part of the documentation on python.org.

It seems to me there is a bit of a catch-22 in what many participants in these threads (especially some of the more knowledgeable “insiders”) think about the mission of PyPA and/or the Python distribution (by which I mean the stdlib and things like pip that, although not part of the stdlib, have some kind of official status with regard to Python). On the one hand, I do not see much uptake on the idea of dramatically expanding the scope of what Python provides out of the box. On the other hand, I also see reluctance to have python.org include documentation that is not about the tools that Python provides out of the box. This is the sentiment of “the documentation for tool X documents tool X, it’s not supposed to tell anyone how to decide whether to use that tool”.

The problem is that this kind of “how do I decide” documentation is exactly what is missing for many users. People can search the internet for something like “how to install a Python library” and find various web pages mentioning pip, conda, poetry, and who knows what else. But it’s hard to find anything telling them how to winnow that down to get something that will work for their needs. (And a lot of what they can find is junky articles with clickbait headlines like “Why You Should Be Using X NOW!!!” that don’t really provide a cogent analysis of the choices involved.)

In addition, in some of these discussions there have been blurred lines between what I see as three different dimensions of “packaging tools”, one of which has subcategories:

1. Installing packages
2. Managing environments
3. Distributing packages
   a. packages that are pure Python (even if they depend on non-pure-Python packages)
   b. packages that actually require some non-Python compile/build operation as part of their own build process

My sense is that #3b is the hairiest one, and a lot of discussion has focused on that. But I think this is also the one that affects the smallest number of users. So I would hope that, even if we can’t all agree on recommendations for an all-encompassing build tool, we could still make more progress on concrete recommendations for items 1, 2, and 3a above.

This would certainly be better than nothing. I sort of see the options like this:

Plan A: Python provides a tool that meets users’ needs^[1]
Plan B: Python doesn’t provide a tool that meets users’ needs, but Python’s official docs do help people find and use third-party tools that meet their needs. (This could include directing them to third-party docs, but in my view it would still mean that the official docs at least include some kind of “feature matrix” or flowchart type thing that guides people to the tool that is best for them, rather than just saying “Here, you go ahead and visit all these links and decide for yourself”.)
Plan C: Python doesn’t provide a tool that meets users’ needs, and the official docs don’t tell them how to find and use tools that will meet their needs, but at least the official docs tell them up front “We’re not going to help you find and use a tool that meets your needs.”
Plan D: Python doesn’t provide a tool that meets users’ needs, and doesn’t tell them how to find and use something that will meet their needs, and doesn’t even tell them that it’s not going to do that, but just tells them how to use the tools that Python does provide, whether those meet their needs or not.

We could then crosstab this with the dimensions of packaging I mentioned above (i.e., the “tool” in Plans A-D could be a package installer, or an environment manager, or a build tool).

From my perspective, the current Python packaging documentation is at a level of basically Plan D+ on all dimensions. There are some references to other tools sprinkled in the docs, mostly buried in unobtrusive places. There are almost no acknowledgements that the documentation is only telling people one way to do things, despite the fact that many people choose to use third-party tools^[2] because the ones Python provides aren’t good enough (especially, I think, in dimensions #1 and #2).

What you’re suggesting sounds like upgrading everything to Plan C+? I’m not clear on which parts you’re proposing to include in the docs vs. just have linked as an external resource.

I think that would still leave a huge number of users feeling disappointed, though. I’ve not given up hope that we can shoot for at least “Plan 1B+2B+3C” — that is, official documentation that fully incorporates information and recommendations about third-party package installers and environment managers, and at least acknowledges its limitations with regard to build tools.^[3]

Finally, although I’ve mentioned this before, in my view the most important thing about the docs, by orders of magnitude, is the extent to which resources are reachable in a clearly signposted “drill-down” way from docs.python.org. Having a link to some third-party tool shoved at the bottom of an inner doc page, with little or no explanation of what it does or why to use it, is not the same thing as having big text on the top-level “How to install packages” page that says “This page is about X. You may be interested in Y or Z for the following reasons. . . [actual explanation/guide here].”

So all in all I think @lwasser’s idea is a step in the right direction, although not as big a step as I would hope we’d be able to take.

[1] I don’t insist it has to meet every single person’s needs. I’m envisioning a 90%-confidence-interval situation.
[2] of which it seems poetry and conda are the most widespread
[3] I mean, I haven’t given up on Plan A either, but that’s just because I’m a dreamer.

lwasser · March 20, 2023, 3:07pm

@BrenBarn i very much agree with you! i think users do want a unified approach. and i think most of us WANT to look to PyPA. Opinionated content on the Python Packaging User Guide would hold much more weight than on pyOpenSci’s. My thoughts FWIW:

community work reaching consensus is hard. especially when there is no defined decision making process that already has community buy-in and when there are so many tool options (and opinions)
it feels to me that PyPA has been in a really difficult position for some time where push back from the community makes it difficult to present a single authoritative approach (i can’t speak for PyPA that is just what i’m seeing communicated in threads)
PyPA wants to support the community and has excellent intentions. They worked hard on pulling together what is there now and did experience push back when they tried to be more opinionated. it’s not trivial to pull together technical content especially when it’s in the “public” eye! but it’s easier to critique such content.

Given the above: can we move the needle in the right direction to get started to create momentum rather than continual circular conversations / arguments etc?

and once that starts happening is it possible to ALSO

collect more information to better understand what the community needs (new tools, better tools or just more work put into and more support for - thinking bus factor - for existing tools). Thinking about pradyun’s blog etc etc.
get the broader community (including all of us here) on the same page surrounding this shared vision of an authoritative approach knowing that some may not LOVE the end workflow but most will 1000% appreciate it? (maybe we can or at least we can chip away at this big goal?)
create a decision making process that the community buys into to get to a more unified place.

so my suggestion comes from a place of -

Let’s move in a productive direction and make sure it’s the right direction.
Let’s take positive direction and movement as a win.
Let’s then build on top of it towards a much bigger goal in line with what you are speaking to!

Let’s move the needle away from the statement “python packaging is a mess” which propagates more confusion.

Let’s move towards “we are working as a community to unify and clarify python packaging”.

Again just my two cents i would never pretend to speak FOR anyone here i’m just translating what i’m reading. And hoping this conversation can have a productive outcome! i’m super optimistic that we can get there together.