Python Packaging Strategy Discussion - Part 2

smm · February 2, 2023, 11:59am

Summary of discussions:

We asked the Packaging community for feedback on the proposed survey questions
The Python Packaging Survey was conducted in Sept-Oct. The results are summarized here.
We asked maintainers/contributors where they would like to discuss the Packaging Strategy. The most popular option was Discourse.
The first part of strategy discussion was about unification of Packaging tools

Discussion cue:

In the Packaging survey, users indicated that they were looking for more support.

To illustrate this, survey respondent 1 said:

“Many new python developers do not understand the difference between packaging a library, and shipping an application. Those who come from java backgrounds expect python packaging to be like java, and for libraries to freeze 100% of their dependencies. The community would benefit from richer examples by packaging case, showing how people maintain one library with poetry, another with pipx, now an application doing CI might use pip freeze in addition to its docker requirements. Few people understand how to update their dependencies, and manage ones they no longer need. Python tutorials for applications should demonstrate packaging and reuse, including updates and lifecycle.”

Survey respondent 2 said:

“The error messages, sometimes they are too ambiguous to beginner end users”

Survey respondent 3 said:

“Clear communication of a “correct” or “recommended” workflow for packaging. I appreciate the complexity of the issue, but it would help to present ONE approach as the standard, and relegate others for situations where the former doesn’t work. Furthermore, maintaining this standard workflow for an extended amount of time (i.e., several years) would be very helpful.”

Survey respondent 4 said:

“A standardized tool to manage dependencies and package lifecycle. Poetry aims to be one, but AFAIK is not an official standard.”

Survey respondent 5 said:

“I don’t get to use Python that much in my work, however, the times that I have used it I’ve had to relearn “how to install X properly” every time.”

Survey respondent 6 said:

“Although the packaging documentation is great, it could be improved by giving “shortcuts” like how packages /modules should be named, how the structure of a package should be, etc… with clear examples”

This could be technical support such as “better deployment of dependencies when installing projects” or “better system for managing dependencies”. It could be defining common use cases and workflows for them. It could also be improving the support we offer on top of user documentation in the form of guided workshops, video tutorials. How can we better support our users?

Rules:

The cues listed above are only suggestions. Please add your thoughts or questions below.
This discussion is open to any active/passive PyPA/non-PyPA tool maintainer/contributor
When posting for the first time in this thread, please indicate which tool(s) you are representing. This is for my benefit as I would like to gauge how many tools have been represented in the discussion. If you have participated earlier in the strategy discussion, please feel free to ignore this.
The discussion for this post will be open until Feb 10, 2023

brettcannon · February 3, 2023, 12:51am

That’s extremely broad as there are a ton of varying scenarios around code deployment.

Wouldn’t that fall under the whole “unified UX” discussion?

I believe this falls under the UX discussion.

This probably requires deciding what we want packaging.python.org to be beyond a repository of packaging standards?

pradyunsg · February 3, 2023, 8:47am

While I do see the value of this sort of targeted user support… IMO our current contributor base does not have the bandwidth, interest, and (possibly) skillset to do these.

I think we’d be better served by focusing on better UX than providing targeted user support or doing workshops. Those are aspects that perhaps we can/should engage with new contributors/other community members on. Things like better error messaging, clearer guidance associated with the errors, settling on a default workflow etc. are all UX-level concerns which will cover most of the responses here.

Also, packaging.python.org is already not limited to being a repository of packaging standards – it does make recommendations today, as well as provide guidance and tutorials. It is difficult to navigate today to get to answers, but that’s a separate tractable problem.

encukou · February 3, 2023, 10:00am

IMO, workshops and targeted user support (like office hours) are a great way to determine where UX can be improved. Don’t do it to help users in general: you’ll help too few of them to make a difference there – though of course you should make it as useful as possible for the lucky few that can attend. Then see where you struggled to explain and where they struggled to follow.

pf_moore · February 3, 2023, 10:24am

Agreed, but as @pradyunsg noted, this isn’t something that our contributor base is likely to pick up on (anyone interested in doing this probably already is, the rest of us aren’t likely to do so just because someone says it’s a good idea).

Much like with the previous discussion, we can talk endlessly about the problem, but not much will happen unless someone steps up and says they can organise something and deliver people able and willing to work on it. And in that case, they can say what they plan on doing, and the rest of us are mostly just going to be saying whether we like the idea or not.

Organisationally, the PyPA has no power to dictate anything - and that’s by design (see PEP 609 and the PyPA goals). Discussions like this tend mostly to demonstrate that there’s no uniform view on direction among PyPA members (let alone among non-PyPA projects like conda and poetry). Given this, I’m genuinely not sure what the purpose of these strategy discussions is meant to be. It’s sharing the survey results, which is good, but what “strategy” is coming from it? There wasn’t much consensus on the previous discussion, so does that mean we have no strategy? Or will someone propose a strategy, in which case without a change in PyPA governance, what difference will that make? (Even with a change in governance, I don’t see anyone imposing a particular direction on packaging projects - there’s too much history of independence for that to happen any time soon).

I’d like to see improvements in the packaging ecosystem, and I certainly don’t think our current approach is ideal. But are these strategy discussions likely to deliver anything better, or are they just taking energy and bandwidth away from the people working on making progress?

pradyunsg · February 3, 2023, 2:07pm

@pf_moore’s latest comment is the 3rd strike [1] for me writing down what I think the whole point of these strategy discussions is. Wrote that in a separate topic; in case folks wanna opine on that.

[1] That’s a baseball reference.

dstufft · February 3, 2023, 2:26pm

I would note that it’s kind of a finicky situation though, it sort of makes recommendations but it’s weighed down by the desire not to “pick winners”, so I do think an open question is whether the packaging.p.o should get more opinionated, to give it the power to be a better source of guides for new users.

Unfortunately, the PyPA doesn’t really have a mechanism to make that decision, and we’re unlikely to come to a unanimous agreement on such a decision.

steve.dower · February 3, 2023, 2:54pm

Could we add a page where we let various tools provide their own blurb, and let them be as opinionated about themselves as they like? Advertising, basically, and marked as such, but still under the control of (let’s call it) “PyPA Consensus” for which projects get a space.

pradyunsg · February 3, 2023, 2:58pm

https://packaging.python.org/en/latest/key_projects/

We have such a page already. By and large, tooling authors don’t care about PyPA documentation. Beyond the fact that this page isn’t useful for marketing, I also don’t see an incentive structure for any project to want to be listed in a long list of projects.

dstufft · February 3, 2023, 3:02pm

Besides what @pradyunsg said, from what I’ve seen the contention tends to come from anything where we appear to pick a “winner” or a “default” option.

There was a large discussion around the Packaging Projects tutorial. Most of the contention came from the fact that the implementation of that wanted to select a backend to use as the default, because all of those backends are competing with each other, and there was a feeling that whichever project got selected would gain some kind of advantage in attracting users.

pradyunsg · February 3, 2023, 3:03pm

And, that did happen (for better or worse) – Hatch does get recommended on the basis of being the default choice there; at least a point in its favor.

steve.dower · February 3, 2023, 3:14pm

Yeah, I followed that discussion, and think it landed at about the best place possible. It’s the gentlest possible “winner” - as a reader I certainly wouldn’t take it as an endorsement, which I guess is why I always feel a bit like “backend devs should just get over it” when it comes up.

In any case, as I think we very nearly agreed in part 1, build backends aren’t the real problem. The Managing Application Dependencies — Python Packaging User Guide page is more important here, and I think the feedback we have points toward this workflow being too important to let projects fight over it, and the various discussions show that we don’t have any real consensus around what this workflow is meant to be, who it’s for, and what problems it should solve.

PythonCHB · February 4, 2023, 2:02am

I’m sorry if this is not the right thread for this, but I’m not sure where else to put it. So here goes:

I think one challenge with Python packaging is folks understanding what a “package” is in the first place, and how or why one might make one.

It could be a dir with a __init__.py
It could be something you download from PyPI
It could be something you built yourself with setuptools or whatever, in order to keep just-for-me-code organized and available.
probably some other things as well.

I’ve seen the term “distribution” used for that thing you get from PyPI, too (and then there’s “application” – another confusion, often applications are packages as well!)

Once those terms are clearly defined, another issue I see is that most of the documentation (and tooling) is very focused on ultimately getting a package (distribution?) up on PyPI – which is understandable, because that’s a significant problem, and the problem that the folks writing the tools and docs are trying to solve for themselves. However, the result is that newbies end up thinking (and doing) one of two things:

hmm, I’m not trying to share my code with the world, I don’t have a use for packaging
or
OK, I’m going to make a package of my code, and put it up on PyPi – even though they don’t need to share it with the world, and it’s not ready even if it will be some day.

The result is that you have a lot of folks with a jumble of python files scattered about their systems, and also a lot of half-baked packages on PyPi.

I’m not sure I have any solutions in mind, though I made my little contribution with a lightning talk at SciPy a few years back:

http://pythonchb.github.io/PythonTopics/where_to_put_your_code.html

That is now out of date (PRs accepted!), but I think the ideas still hold true.

Anyway – what to do? My suggestions are:

Try to come up with some clearer definitions for “package” and “distribution”, and maybe even coin a new term or two? Then make sure those terms are used consistently in the PyPA docs – hopefully the community will follow.
Add some docs to the PyPA ones talking about simple packages that you aren’t (at least not yet) going to distribute.

Yes, I’m willing to contribute, and, of course anyone can re-use any of what i’ve written if it’s useful.

CAM-Gerlach · February 4, 2023, 4:10am

In short, the colloquial term “package” refers to two different things: import packages, i.e. a directory of files with an __init__.py, and distribution packages, which are the artifacts you upload to PyPI (Conda packages are also distribution packages). A distribution package may contain one or more import packages, with names that may or may not match the name of the distribution package, and one or more distribution packages may correspond to a single top-level import package. Additionally, Python projects (Numpy, Pandas, xarray, etc.) can also be colloquially (but imprecisely) referred to as “packages”.

The PyPA Packaging Guide site has the official PyPA Glossary which defines and disambiguates them, and the various tutorials, guides and specs on there try to be mostly consistent with using them. However, while this has been up there for many years, this unfortunately hasn’t seemed to alleviate the wider community confusion.

ofek · February 4, 2023, 4:36am

This might be slightly off-topic but I thought we were referring to the pre-distribution package set/what one works on as simply a project. The docs for Hatch do that at least.

edit: project is actually in the glossary that was linked: Glossary — Python Packaging User Guide

pf_moore · February 4, 2023, 11:15am

While true, this misses the other point of confusion @PythonCHB mentions, which is that people think (based on what the various docs say) either that everything should be a package (and so struggle with stuff like uploading that isn’t actually relevant to them) or that packages aren’t relevant to them (and so get nothing from the docs that nearly all assume you’re writing a distribution package).

I tend to use “project” for any work that someone may be doing with Python. It’s not universally accepted usage, though, so it does cause some confusion. For example, a data analysis is a project in my eyes, even though it’s likely just a notebook and/or a collection of scripts. Or an automation script is a project, even though it’s just a single file of Python. On the other end of the scale, a standalone application, written to be packaged using PyInstaller and not via console entry point scripts, is still a project (and needs consideration of aspects like distribution, but the typical “distribution package” answers aren’t necessarily appropriate).

We have documentation on managing distribution packages (as @PythonCHB says, probably because that’s what the documentation authors need themselves) but very little that’s suitable for other sorts of project. In fact I’d go further and say that a lot of people in the packaging community might even unconsciously think of these other types of project as “not in scope for packaging”.

Genuinely, I’m not convinced that people think that someone writing a 10-line standalone Python script is “doing packaging”. And yet, it may not be complicated, but concepts like “metadata saying what Python version is supported” apply, and in many cases dependency metadata would be useful. Is it necessary to insist that this simple 10-line script must become a full-blown distribution package, just to get the packaging community to help with the question of “how do I remember what version of requests this script needs?”
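To make the "10-line script with metadata" idea concrete, here is a rough sketch of a standalone script that records its own requirements and checks them at startup. The constants `REQUIRES_PYTHON` and `REQUIRES` are a hypothetical convention invented for this illustration, not a packaging standard, and the check only tests whether a distribution is installed, not which version.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Hypothetical, script-local "metadata" (not a packaging standard).
REQUIRES_PYTHON = (3, 8)
REQUIRES = ["requests"]


def missing_requirements(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


def check_environment():
    """Collect human-readable problems with the current environment."""
    problems = []
    if sys.version_info < REQUIRES_PYTHON:
        wanted = ".".join(str(part) for part in REQUIRES_PYTHON)
        problems.append(f"Python {wanted} or newer is required")
    problems.extend(
        f"missing distribution: {name}" for name in missing_requirements(REQUIRES)
    )
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem, file=sys.stderr)
```

This keeps the information with the script, but no tool other than the script itself can read it, which is arguably the gap being pointed out here.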

PythonCHB · February 4, 2023, 4:15pm

First: I apologise for not looking harder for the Glossary before posting – I actually found it soon after posting – so the first part of my post is moot.

Hmm – that one specifically says a project: “… is intended to be packaged into a Distribution”.

Which makes sense in that context – but then what do we call what colloquially would be called “project” – i.e.

It helps to have some term for that.

And while a “project” (anyone else have another term?) can be pretty much anything, I think a use case of “collection of functions for my own use” needs a term. For me “package” is the right term, which brings up my issue. Maybe “library”, which is not in the Glossary, though that will get confused with a C library, as in a dll or .so or … (Naming things is hard!)

As for metadata, specifically requirements, to go with a single script, or couple of scripts, I personally use a requirements.txt file (or, in my case, generally a conda_requirements.txt file, but the same idea). Maybe pyproject.toml is the way to do that now – though it seems pretty heavyweight if you don’t plan on sharing with the world. Perhaps documentation of a “minimal” version would help here.

Though, AFAICT, you can’t tell pip to install “just the requirements” for a pyproject.toml, unless I’m missing something.

Now that I think about it – the original target for pyproject.toml was build systems, but PyPA docs now say “for packaging-related tools to consume.” – so this is a reasonable use case.

CAM-Gerlach · February 5, 2023, 12:28am

Yeah, that’s what I was referring to above. This is sometimes colloquially referred to as a package, but at least per the PyPA glossary and our terminological conventions, that usage is at best imprecise, and at worst incorrect. By the same token, the official term for the result of a wheel installer is an “installed project”, rather than an “installed package”, since once the artifact is installed it is no longer considered a package.

Yup, that’s true. My post was more focused on the part where @PythonCHB was asking for an official, canonical definition of these terms. To drive that point further, it seems to me that most (though not all) of the docs there (and much of the packaging world in general) assume you’re writing a library, rather than an application, for which the packaging and distribution strategy can often be widely different.

At least for (distribution) packaging purposes, which the PyPUG is concerned with, arguably the simplest practical definition is simply a directory with a pyproject.toml in it, mirroring the definition of an sdist. It’s not perfect, as it doesn’t cover legacy implicit setup.py projects, nor projects that first generate their pyproject.toml as part of the build process, or more exotic niche cases, but it’s fairly close.
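That working definition is simple enough to write down as a check. This is only a sketch of the heuristic as stated; the function name `looks_like_project` is invented here, and by design it misses the legacy setup.py-only and generated-pyproject.toml cases noted above.

```python
import tempfile
from pathlib import Path


def looks_like_project(directory) -> bool:
    """Heuristic: a directory is a project if it contains a pyproject.toml.

    Deliberately ignores legacy setup.py-only layouts and projects that
    generate pyproject.toml during their build.
    """
    return (Path(directory) / "pyproject.toml").is_file()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(looks_like_project(tmp))  # empty directory
        (Path(tmp) / "pyproject.toml").write_text("[project]\nname = 'demo'\n")
        print(looks_like_project(tmp))  # now has a pyproject.toml
```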

There’s certainly projects that fall “below” that, but I wouldn’t say that they fall within the domain of “packaging”, at least in the sense I perceive it to be understood by most of those involved. But perhaps it should?

PythonCHB · February 5, 2023, 3:28am

I thought that was an installed “distribution” – at least that’s the terminology I’ve seen when talking about getting version numbers via importlib.metadata.version

Docstring:
Get the version string for the named package.

:param distribution_name: The name of the distribution package to query.
:return: The version string for the package as defined in the package's
    "Version" metadata key.

Though in that short docstring, it’s called both a package and a distribution (and a “distribution package”) – but nowhere is it called a “project”.
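For reference, a short sketch of how these lookups behave in practice: `importlib.metadata` is keyed by the distribution name, which may differ from the name you import. The helper `installed_distribution` and the names passed to it are illustrative; the output depends on the environment.

```python
from importlib.metadata import PackageNotFoundError, distribution


def installed_distribution(distribution_name):
    """Return (name, version) for an installed distribution, or None.

    The lookup uses the distribution name (e.g. "PyYAML"), not the
    import name (e.g. "yaml").
    """
    try:
        dist = distribution(distribution_name)
    except PackageNotFoundError:
        return None
    return dist.metadata["Name"], dist.version


if __name__ == "__main__":
    # Results depend on what is installed in the current environment.
    for name in ("pip", "PyYAML", "no-such-distribution"):
        print(name, "->", installed_distribution(name))
```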

I’m a bit confused as to why ‘project’ was defined that way – particularly after installation – it is clearly a python package (i.e. something you can import) then. Yes, it could be a single module, or it could be more than one package, but still …

pf_moore · February 5, 2023, 8:13am

IMO it should, and by limiting our discussions to just distribution packages, we are (a) failing to meet the needs of a significant group of users and (b) harming the usability of our tools (workflow tools that only work when pyproject.toml is present, for example).