Python Packaging Strategy Discussion - Part 2

Summary of discussions:

Discussion cue:

In the Packaging survey, users indicated that they were looking for more support.

To illustrate this, survey respondent 1 said:

"Many new python developers do not understand the difference between packaging a library, and shipping an application. Those who come from java backgrounds expect python packaging to be like java, and for libraries to freeze 100% of their dependencies. The community would benefit from richer examples by packaging case, showing how people maintain one library with poetry, another with pipx, now an application doing CI might use pip freeze in addition to its docker requirements. Few people understand how to update their dependencies, and manage ones they no longer need. Python tutorials for applications should demonstrate packaging and reuse, including updates and lifecycle."

Survey respondent 2 said:

"The error messages, sometimes they are too ambiguous to beginner end users"

Survey respondent 3 said:

"Clear communication of a 'correct' or 'recommended' workflow for packaging. I appreciate the complexity of the issue, but it would help to present ONE approach as the standard, and relegate others for situations where the former doesn't work. Furthermore, maintaining this standard workflow for an extended amount of time (i.e., several years) would be very helpful."

Survey respondent 4 said:

"A standardized tool to manage dependencies and package lifecycle. Poetry aims to be one, but AFAIK is not an official standard."

Survey respondent 5 said:

"I don't get to use Python that much in my work, however, the times that I have used it I've had to relearn 'how to install X properly' every time."

Survey respondent 6 said:

"Although the packaging documentation is great, it could be improved by giving 'shortcuts' like how packages/modules should be named, how the structure of a package should be, etc… with clear examples"

This could be technical support such as "better deployment of dependencies when installing projects" or "better system for managing dependencies". It could be defining common use cases and workflows for them. It could also be improving the support we offer on top of user documentation, in the form of guided workshops or video tutorials. How can we better support our users?

Rules:

  • The cues listed above are only suggestions. Please add your thoughts or questions below.
  • This discussion is open to any active/passive PyPA/non PyPA tool maintainer/contributor
  • When posting for the first time in this thread, please indicate which tool(s) you are representing. This is for my benefit as I would like to gauge how many tools have been represented in the discussion. If you have participated earlier in the strategy discussion, please feel free to ignore this.
  • The discussion for this post will be open until Feb 10, 2023

That's extremely broad, as there are a ton of varying scenarios around code deployment.

Wouldn't that fall under the whole "unified UX" discussion?

I believe this falls under the UX discussion.

This probably requires deciding what we want packaging.python.org to be beyond a repository of packaging standards?


While I do see the value of this sort of targeted user support… IMO our current contributor base does not have the bandwidth, interest, and (possibly) skillset to do these. :slight_smile:

I think we'd be better served by focusing on better UX than providing targeted user support or doing workshops. Those are aspects that perhaps we can/should engage new contributors/other community members on. Things like better error messaging, clearer guidance associated with the errors, settling on a default workflow, etc. are all UX-level concerns which will cover most of the responses here.

Also, packaging.python.org is already not limited to being a repository of packaging standards – it does make recommendations today, as well as provide guidance and tutorials. It is difficult to navigate today to get to answers, but that's a separate tractable problem.


IMO, workshops and targeted user support (like office hours) are a great way to determine where UX can be improved. Don't do it to help users in general: you'll help too few of them to make a difference there – though of course you should make it as useful as possible for the lucky few that can attend. Then see where you struggled to explain and where they struggled to follow.


Agreed, but as @pradyunsg noted, this isn't something that our contributor base is likely to pick up on (anyone interested in doing this probably already is; the rest of us aren't likely to do so just because someone says it's a good idea).

Much like with the previous discussion, we can talk endlessly about the problem, but not much will happen unless someone steps up and says they can organise something and deliver people able and willing to work on it. And in that case, they can say what they plan on doing, and the rest of us are mostly just going to be saying whether we like the idea or not.

Organisationally, the PyPA has no power to dictate anything – and that's by design (see PEP 609 and the PyPA goals). Discussions like this tend mostly to demonstrate that there's no uniform view on direction among PyPA members (let alone among non-PyPA projects like conda and poetry). Given this, I'm genuinely not sure what the purpose of these strategy discussions is meant to be. It's sharing the survey results, which is good, but what "strategy" is coming from it? There wasn't much consensus on the previous discussion, so does that mean we have no strategy? Or will someone propose a strategy, in which case, without a change in PyPA governance, what difference will that make? (Even with a change in governance, I don't see anyone imposing a particular direction on packaging projects – there's too much history of independence for that to happen any time soon.)

I'd like to see improvements in the packaging ecosystem, and I certainly don't think our current approach is ideal. But are these strategy discussions likely to deliver anything better, or are they just taking energy and bandwidth away from the people working on making progress?


@pf_moore's latest comment is the 3rd strike[1] for me to write down what I think the whole point of these strategy discussions is. I wrote that up in a separate topic, in case folks wanna opine on it.


  1. That's a baseball reference.


I would note that it's kind of a finicky situation, though: it sort of makes recommendations, but it's weighed down by the desire not to "pick winners". So I do think an open question is whether packaging.p.o should get more opinionated, to give it the power to be a better source of guides for new users.

Unfortunately, the PyPA doesn't really have a mechanism to make that decision, and we're unlikely to come to a unanimous agreement on such a decision.

Could we add a page where we let various tools provide their own blurb, and let them be as opinionated about themselves as they like? Advertising, basically, and marked as such, but still under the control of (let's call it) "PyPA Consensus" for which projects get a space.

https://packaging.python.org/en/latest/key_projects/

We have such a page already. By and large, tooling authors don't care about PyPA documentation. Beyond the fact that this page isn't useful for marketing, I also don't see an incentive structure for any project to want to be listed in a long list of projects.


Besides what @pradyunsg said, from what I've seen the contention tends to come from anything where we appear to pick a "winner" or a "default" option.

There was a large discussion around the Packaging Projects tutorial. Most of the contention came from the fact that its implementation wanted to select a build backend to use as the default. All of those backends are competing with each other, and there was a feeling that whichever project got selected would gain some kind of advantage in attracting users.

And that did happen (for better or worse) – Hatch does get recommended on the basis of being the default choice there; at least it's a point in its favor.

Yeah, I followed that discussion, and think it landed at about the best place possible. It's the gentlest possible "winner" – as a reader I certainly wouldn't take it as an endorsement, which I guess is why I always feel a bit like "backend devs should just get over it" when it comes up.

In any case, as I think we very nearly agreed in part 1, build backends aren't the real problem. The "Managing Application Dependencies – Python Packaging User Guide" page is more important here, and I think the feedback we have points toward this workflow being too important to let projects fight over it, and the various discussions show that we don't have any real consensus around what this workflow is meant to be, who it's for, and what problems it should solve.


I'm sorry if this is not the right thread for this, but I'm not sure where else to put it. So here goes:

I think one challenge with Python packaging is folks understanding what a "package" is in the first place, and how or why one might make one.

  • It could be a dir with an __init__.py
  • It could be something you download from PyPI
  • It could be something you built yourself with setuptools or whatever, in order to keep just-for-me code organized and available.
  • Probably some other things as well.
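To make the first bullet concrete, here is a minimal Python sketch (the package name `my_helpers` is hypothetical) showing that a directory containing an `__init__.py` is already an importable package once its parent directory is on `sys.path`, with no build step or PyPI upload involved:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_import_package(root: Path, name: str) -> Path:
    """Create a minimal import package: a directory holding an __init__.py."""
    pkg_dir = root / name
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("GREETING = 'hello from an import package'\n")
    return pkg_dir


with tempfile.TemporaryDirectory() as tmp:
    make_import_package(Path(tmp), "my_helpers")  # hypothetical package name
    sys.path.insert(0, tmp)        # make the temp dir importable, this session only
    importlib.invalidate_caches()  # recommended after creating modules at runtime
    try:
        my_helpers = importlib.import_module("my_helpers")
        print(my_helpers.GREETING)
    finally:
        sys.path.remove(tmp)       # undo the sys.path change
```

Nothing here involves a distribution package: there is no metadata, no version and no installer, which is exactly the gap between the first and second bullets.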

I've seen the term "distribution" used for that thing you get from PyPI, too (and then there's "application" – another confusion; often applications are packages as well!)

Once those terms are clearly defined, another issue I see is that most of the documentation (and tooling) is very focused on ultimately getting a package (distribution?) up on PyPI – which is understandable, because that's a significant problem, and the problem that the folks writing the tools and docs are trying to solve for themselves. However, the result is that newbies end up thinking (and doing) one of two things:

  1. Hmm, I'm not trying to share my code with the world, so I don't have a use for packaging
    or
  2. OK, I'm going to make a package of my code and put it up on PyPI – even though they don't need to share it with the world, and it's not ready even if it will be some day.

The result is that you have a lot of folks with a jumble of Python files scattered about their systems, and also a lot of half-baked packages on PyPI.

Iā€™m not sure I have any solutions in mind, though I made my little contribution with a lightning talk at SciPy a few years back:

http://pythonchb.github.io/PythonTopics/where_to_put_your_code.html

That is now out of date (PRs accepted!), but I think the ideas still hold true.

Anyway – what to do? My suggestions are:

  • Try to come up with some clearer definitions for "package" and "distribution", and maybe even coin a new term or two? Then make sure those terms are used consistently in the PyPA docs – hopefully the community will follow.

  • Add some docs to the PyPA ones talking about simple packages that you aren't (at least not yet) going to distribute.

Yes, I'm willing to contribute, and, of course, anyone can re-use any of what I've written if it's useful.


In short, the colloquial term "package" refers to two different things: import packages, i.e. a directory of files with an __init__.py, and distribution packages, which are the artifacts you upload to PyPI (Conda packages are also distribution packages). A distribution package may contain one or more import packages, with names that may or may not match the name of the distribution package, and one or more distribution packages may correspond to a single top-level import package. Additionally, Python projects (NumPy, pandas, xarray, etc.) are also colloquially, but imprecisely, referred to as "packages".

The PyPA Packaging Guide site has the official PyPA Glossary, which defines and disambiguates them, and the various tutorials, guides and specs on there try to be mostly consistent in using them. However, while this has been up there for many years, it unfortunately hasn't seemed to alleviate the wider community confusion.

This might be slightly off-topic, but I thought we were referring to the pre-distribution package set (i.e. what one works on) as simply a project. The docs for Hatch do that, at least.

Edit: "project" is actually in the glossary that was linked (Glossary – Python Packaging User Guide).

While true, this misses the other point of confusion @PythonCHB mentions, which is that people think (based on what the various docs say) either that everything should be a package (and so struggle with stuff like uploading that isn't actually relevant to them) or that packages aren't relevant to them (and so get nothing from the docs, which nearly all assume you're writing a distribution package).

I tend to use "project" for any work that someone may be doing with Python. It's not universally accepted usage, though, so it does cause some confusion. For example, a data analysis is a project in my eyes, even though it's likely just a notebook and/or a collection of scripts. Or an automation script is a project, even though it's just a single file of Python. On the other end of the scale, a standalone application, written to be packaged using PyInstaller and not via console entry point scripts, is still a project (and needs consideration of aspects like distribution, but the typical "distribution package" answers aren't necessarily appropriate).

We have documentation on managing distribution packages (as @PythonCHB says, probably because that's what the documentation authors need themselves) but very little that's suitable for other sorts of project. In fact I'd go further and say that a lot of people in the packaging community might even unconsciously think of these other types of project as "not in scope for packaging".

Genuinely, I'm not convinced that people think that someone writing a 10-line standalone Python script is "doing packaging". And yet, it may not be complicated, but concepts like "metadata saying what Python version is supported" apply, and in many cases dependency metadata would be useful. Is it necessary to insist that this simple 10-line script must become a full-blown distribution package, just to get the packaging community to help with the question of "how do I remember what version of requests this script needs?"


First: I apologise for not looking harder for the Glossary before posting – I actually found it soon after posting – so the first part of my post is moot.

Hmm – that one specifically says a project "… is intended to be packaged into a Distribution".

Which makes sense in that context – but then what do we call what colloquially would be called a "project" – i.e.

It helps to have some term for that.

And while a "project" (anyone else have another term?) can be pretty much anything, I think the use case of "collection of functions for my own use" needs a term – and for me "package" is the right term, which brings up my issue :frowning: – maybe "library", which is not in the Glossary? Though that will get confused with a C library, as in a .dll or .so or… (Naming things is hard!)

As for metadata, specifically requirements, to go with a single script or a couple of scripts, I personally use a requirements.txt file (or, in my case, generally a conda_requirements.txt file, but the same idea). Maybe pyproject.toml is the way to do that now – though it seems pretty heavyweight if you don't plan on sharing with the world; perhaps documentation of a "minimal" version would help here.

Though, AFAICT, you can't tell pip to install "just the requirements" from a pyproject.toml, unless I'm missing something.

Now that I think about it – the original target for pyproject.toml was build systems, but the PyPA docs now say it is "for packaging-related tools to consume", so this is a reasonable use case.

Yeah, that's what I was referring to above: this is sometimes colloquially referred to as a package, but at least per the PyPA glossary and our terminological conventions, that usage is at best imprecise, and at worst incorrect. By the same token, the official term for the result of a wheel installer is an "installed project", rather than an "installed package", since once the artifact is installed it is no longer considered a package.

Yup, that's true; my post was more focused on the part where @PythonCHB was asking for an official, canonical definition of these terms. To drive that point further, it seems to me that most (though not all) of the docs there (and much of the packaging world in general) assume you're writing a library rather than an application, for which the packaging and distribution strategy can often be very different.

At least for (distribution) packaging purposes, which the PyPUG is concerned with, arguably the simplest practical definition is a directory with a pyproject.toml in it, mirroring the definition of an sdist. It's not perfect, as it doesn't cover legacy implicit setup.py projects, projects that first generate their pyproject.toml as part of the build process, or more exotic niche cases, but it's fairly close.
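Taking that working definition literally, a tool could locate the project a file belongs to by walking up the directory tree until it finds a `pyproject.toml`. A minimal sketch (the function name is hypothetical, and it deliberately ignores the legacy `setup.py`-only and generated-`pyproject.toml` cases just mentioned):

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` containing pyproject.toml.

    Uses only the "directory with a pyproject.toml in it" definition;
    legacy setup.py-only projects are not detected. Returns None if none is found.
    """
    start = start.resolve()
    if start.is_file():
        start = start.parent  # given a file: begin the search in its directory
    for candidate in (start, *start.parents):
        if (candidate / "pyproject.toml").is_file():
            return candidate
    return None


if __name__ == "__main__":
    print(find_project_root(Path.cwd()))
```

A lone script with no `pyproject.toml` anywhere above it simply gets `None`, which is the "falls below" category discussed next.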

There are certainly projects that fall "below" that, but I wouldn't say that they fall within the domain of "packaging", at least in the sense I perceive it to be understood by most of those involved. But perhaps it should?

I thought that was an installed "distribution" – at least that's the terminology I've seen when talking about getting version numbers via importlib.metadata.version

Docstring:
Get the version string for the named package.

:param distribution_name: The name of the distribution package to query.
:return: The version string for the package as defined in the package's
    "Version" metadata key.

Though in that short docstring, it's called both a package and a distribution (and a "distribution package"), but nowhere is it called a "project".
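For reference, the `distribution_name` in that docstring is the name you would pass to `pip install`, which is not necessarily the name you import. A minimal sketch of looking up a version by distribution name and handling the not-installed case (the helper name is hypothetical):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution_name: str) -> Optional[str]:
    """Version of an installed distribution package, looked up by its
    distribution name (what you `pip install`), or None if not installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version("pip"))
    print(installed_version("not-a-real-distribution-xyz"))  # None
```

So `import yaml` works after `pip install PyYAML`, but the version lookup goes through `installed_version("PyYAML")`, not `"yaml"`.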

I'm a bit confused as to why "project" was defined that way – particularly after installation, when it is clearly a Python package (i.e. something you can import). Yes, it could be a single module, or it could be more than one package, but still…

IMO it should, and by limiting our discussions to just distribution packages, we are (a) failing to meet the needs of a significant group of users and (b) harming the usability of our tools (workflow tools that only work when a pyproject.toml is present, for example).