Python packaging documentation feedback and discussion

steve.dower · March 20, 2023, 3:39pm

Yeah, it’s this position that makes me so positive towards people doing their own opinionated distros of Python (including the tools, docs, and support that will be best for their users).

I’ve challenged the position before, but most of the core CPython team only want to do CPython. Even tools like the py.exe launcher are considered “outside” the core. And the primary deliverable is a bundle of source files - the pre-built distros are conveniences, not the product.

It’s all designed to allow someone else to put together a complete package for users to use. Realistically, Anaconda and ActiveState are the only ones who come close to achieving this vision, but the licenses allow anyone to enter into the space if they choose.

So do bear in mind that, short of a drastic change in the intentions of the CPython team (via the Steering Council, and probably requiring a governance change), none of this is going to happen at that level. And the people who would need to be convinced aren’t even reading these discussions.

Efforts like PyOpenSci are far more in line with what everyone wants. If it went as far as having a pre-packaged distro with its own user docs, it would be totally fine to tell people to install that rather than the python.org packages or the Linux distro packages.

The problem is still marketing, ultimately. But convincing the web site maintainers to link to your reliable, long-standing distro is far easier than convincing volunteer maintainers to switch to maintaining something that they don’t want to do.

pf_moore · March 20, 2023, 3:57pm

One further point I would add is that I think we should try to keep the scope carefully focused. It’s all too easy to take “packaging” as a blanket term to cover everything to do with developing with Python in all its forms. While there’s a clear need in the user community for clarity and guidance in many areas, I think we need to be careful not to assume the packaging documentation (and more specifically the PyPA) has to cover all of that.

To make this more concrete, which of the following are “in scope” for the packaging documentation?

Specifying package metadata and build processes
Distribution of libraries for public use
Sharing libraries privately between individuals/teams
Creating standalone applications in Python
Sharing Python scripts with their dependencies, between people with Python installed
Project structure - for libraries, applications, data science, etc (many different use cases here)
Tools and processes for testing
Tools and processes for documentation
Python environment management
Non-python environment management (shared libraries, etc)
Task runners
Workflow, whether automated or manual
Continuous integration and deployment
Deployment of the Python runtime itself

All of these have at one stage or another been brought up in “packaging” discussions, but there’s clearly^[1] no chance of covering all of that in any practical timescale. And there’s a lot of that which frankly isn’t the job of the packaging community. So I think it’s important that we try to come to some consensus on what is in scope, and then agree to restrict discussion to that scope.

For what it’s worth, this question of scope is just as much of an issue for any opinionated distribution or documentation - and there’s no reason that every such distro/document should make the same choices (for example, conda includes environment management, but as far as I know ActiveState doesn’t).

[1] At least, it’s clear to me

steve.dower · March 20, 2023, 4:34pm

I’m not as familiar with ActiveState, but they had their PyPM, which is being retired in favour of a higher-level manager that is used for all their languages and includes the runtimes. The newer one at least seems to cover environment management.

[Edit: sorry, forgot to link to an interesting page for their new tool]

pf_moore · March 20, 2023, 5:08pm

Fair enough. Nevertheless, it’s still not a given (IMO) that a distro/document has to encompass environment management.

brettcannon · March 21, 2023, 11:02pm

Because no one asked for my opinion …

Specifying package metadata and build processes
Distribution of libraries for public use
Sharing libraries privately between individuals/teams
Creating standalone applications in Python
Sharing Python scripts with their dependencies, between people with Python installed
Project structure - for libraries, applications, data science, etc (many different use cases here)
Tools and processes for testing
Tools and processes for documentation
Python environment management
Non-python environment management (shared libraries, etc)
Task runners
Workflow, whether automated or manual
Continuous integration and deployment
Deployment of the Python runtime itself

So for me it’s whatever has to do with packaging (which I know sounds obvious, but I view environments as existing to manage packages, so they fall under this purview).

lwasser · March 23, 2023, 4:06pm

so based on this list of upvoted items…

would it be valuable to do a bit of inventory of the existing pypa content in a high level outline form to help identify gaps or areas that need updates, etc prior to the packaging summit in april?
and then creating a high level plan that is perhaps implemented by tackling one bullet item in the list above at a time?

for example the first - package metadata and build process would be great to tackle. there are so many questions now about moving from setup.py to pyproject.toml, etc.

pf_moore · March 23, 2023, 4:19pm

I think so, yes. I’d avoid environment management, personally, as I think that’s probably the most controversial item (I can see why @brettcannon included it, but it very quickly leads into areas that he didn’t select, such as task runners and workflow).

While I’m on the subject of (not) asking for your opinion, do you think the scope of the PyPA should be wider than the scope of the packaging docs, or do you think the two should be the same? Personally, I think there’s enough confusion already, so we should keep them the same for clarity, if nothing else.

I’m aware that puts workflow tools like hatch in a somewhat awkward position, as a result of being part of the PyPA while technically not being entirely in the PyPA’s scope. I think we can resolve that, though, if we need to.

And regarding the question of environment management I hinted at above, my view is that the machinery of virtual environments is the responsibility of the core devs (because it’s implemented in the core)^[1], and the PyPA/packaging docs only cover tools for managing environments. Does that match your view?

[1] And that includes the (implied) statement that virtual environments are the official means of creating isolated environments in Python.

abravalheri · March 23, 2023, 5:43pm

I agree with this, environment management sounds out of scope.
When I think about PyPA I think of “building and installing packages”.
When I think about environment management, my mind goes immediately to tools like tox and nox.

I understand that we have tools like hatch nowadays in PyPA but I imagine that this is due to the fact that hatch is “tagging along” with hatchling.

BrenBarn · March 24, 2023, 4:40am

Paul Moore:

One further point I would add is that I think we should try to keep the scope carefully focused. It’s all too easy to take “packaging” as a blanket term to cover everything to do with developing with Python in all its forms. While there’s a clear need in the user community for clarity and guidance in many areas, I think we need to be careful not to assume the packaging documentation (and more specifically the PyPA) has to cover all of that.

To make this more concrete, which of the following are “in scope” for the packaging documentation?

Specifying package metadata and build processes
Distribution of libraries for public use
Sharing libraries privately between individuals/teams
Creating standalone applications in Python
Sharing Python scripts with their dependencies, between people with Python installed
Project structure - for libraries, applications, data science, etc (many different use cases here)
Tools and processes for testing
Tools and processes for documentation
Python environment management
Non-python environment management (shared libraries, etc)
Task runners
Workflow, whether automated or manual
Continuous integration and deployment
Deployment of the Python runtime itself

As I mentioned before, I think that, by and large, this is the wrong way to slice it. It is better to think about what people want to do and then tell them everything they need to do to do that. Some of that may not even be packaging stuff; we still need to tell them how to do it.

More and more I question what value PyPA qua PyPA is really adding to anyone’s experience. The topic is not packaging; it is the use of Python. Packaging tasks are an important part of using Python; those tasks should be comprehensively covered in the Python documentation.

As a specific example, I think that environment management is absolutely essential and needs to be considered in tandem with packaging. Every package that is installed is installed into an environment; you can’t use packages effectively without at least having some awareness of where you’re installing them to…
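A minimal sketch of what "awareness of where you're installing to" can look like in practice, using only the standard library. The function name `describe_environment` is made up for illustration, and the check assumes a `venv`-style environment; `--user` installs and other install schemes put packages elsewhere.

```python
import sys
import sysconfig


def describe_environment():
    """Return basic facts about the environment this interpreter runs in."""
    return {
        # Root of the active environment (the venv directory inside a venv).
        "prefix": sys.prefix,
        # Root of the base installation the environment was created from.
        "base_prefix": sys.base_prefix,
        # The two prefixes differ when running inside a venv-style environment.
        "in_virtual_env": sys.prefix != sys.base_prefix,
        # Directory for pure-Python packages under the default install scheme.
        "site_packages": sysconfig.get_paths()["purelib"],
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this once with the system interpreter and once from inside an activated virtual environment shows the two different install locations.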

Similar to the discussion of “greenfield” exploration on another thread, I think that beginning by identifying some narrow scope risks a large waste of effort, like the joke about the drunk who dropped his keys in the alley but is looking for them near the lamppost because the light’s better there. The problem with the existing documentation is precisely its fragmentation and lack of comprehensive coverage of relevant user tasks. Creating additional documentation that doesn’t fix that problem will not just not help — it will actually make things worse, by adding yet another layer of partial coverage to the landscape.

pitrou · March 24, 2023, 9:14am

If I look at this diagram, I conclude “PDM seems the most flexible, so let’s just use PDM every time. Why should I bother with other less capable tools?”.

In the end, the situation is befuddling: arguably there is choice because of differing needs, but nevertheless one tool is able to replace all others?

(also, the backend/frontend distinction is frankly not very user-friendly)

lwasser · March 27, 2023, 3:45pm

i’ve reviewed all of the most common tools - the tools from the pypa survey were my basis for most common - we know that might be a bit skewed given the volume of conda users.

I plan to publish some content on each as well, just in blog posts so people can see. the BIGGEST difference in the front end tools PDM, HATCH, poetry is how easy the UI is to use (documentation quality and the UI itself). Then each has a set of features that is slightly different. For instance hatch has nox-like functionality & supports matrix environment builds for local testing. pdm doesn’t, but pdm supports other back ends and conda and other environments out of the box without a plugin.

from the science user perspective, of all of the tools right now PDM is the only one that supports using other build back ends. This is super valuable for anyone who has a more complex build because many of the core science packages are moving to meson-python. and PDM does work with meson-python.

PDM and hatch both provide documented support for c/c++ extensions, but i believe the way to implement this may be a tad different (don’t take my word for that last statement, however: i haven’t tried it yet, just reiterating verbal feedback i’ve gotten).

Generally for most users creating pure python packages who aren’t too concerned about the exact structure of their SDist and wheels, the backends for flit, pdm, hatch and poetry are all fairly similar. the big difference seems to be that hatch and pdm have documented support for c extensions; poetry doesn’t document such support. and all of them have slight differences in what is included in the sdist by default and how you specify what’s included. Finally flit has a lot FEWER features compared to pdm and hatch; it really just builds and publishes a package to pypi.

this is why i do think it’s confusing for a user to see multiple back end choices on a page. if you are more advanced you really understand why you might pick one over the other. however most users won’t see a big difference using flit core, pdm-build, poetry or hatchling UNLESS they actually unpack their distribution files and look at the contents carefully.
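For anyone who does want to look inside, here is a rough sketch of listing the files in an sdist and a wheel with the standard library only. The helper name `list_distribution_files` and the `dist/` directory are assumptions for illustration; it relies only on a wheel being a zip archive and an sdist being a gzipped tarball.

```python
import tarfile
import zipfile
from pathlib import Path


def list_distribution_files(path):
    """Return the sorted file names stored in an sdist (.tar.gz) or wheel (.whl)."""
    path = Path(path)
    if path.suffix == ".whl":
        # A wheel is a zip archive; skip any explicit directory entries.
        with zipfile.ZipFile(path) as archive:
            return sorted(n for n in archive.namelist() if not n.endswith("/"))
    if path.name.endswith(".tar.gz"):
        # An sdist is a gzipped tarball; keep regular files only.
        with tarfile.open(path, "r:gz") as archive:
            return sorted(m.name for m in archive.getmembers() if m.isfile())
    raise ValueError(f"not an sdist or wheel: {path}")


if __name__ == "__main__":
    # Assumes the archives were already built into ./dist by some frontend.
    for dist in sorted(Path("dist").glob("*")):
        if dist.suffix == ".whl" or dist.name.endswith(".tar.gz"):
            print(dist.name)
            for name in list_distribution_files(dist):
                print("   ", name)
```

Building the same project with two different backends and comparing the two listings is one way to see the default-inclusion differences mentioned above.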

our content is live now - there is a table here that broadly overviews features. i go into more detail for each tool below (note i did ask the tool maintainers to review this content for accuracy and they did so with wonderful helpful feedback!): Python Package Build Tools, in the pyOpenSci Python Packaging Guide

i hope this helps. i do think there are some great tool options out there. we just need more documentation that guides users regarding where to start if they don’t want to become a packaging expert. i think that is what is important. (just my two cents)

lwasser · March 27, 2023, 3:46pm

also please again note these are just my thoughts on the situation. i’m also learning a lot about this community as pyOpenSci evolves and our ONLY goal is to help here while supporting the scientific community!

lwasser · March 27, 2023, 3:49pm

@BrenBarn what do you think about considering a starting place with fewer topics, given it would be really hard to start tackling everything at once? maybe an outline with more topics that can be covered as a plan? I only ask this because it took me … about a solid 3 months to get that packaging chapter sorted out and it’s just a high level overview. and feedback was still coming in after i “closed” the review. my worry is if we as a community tackle too much at once, nothing will get done. there are so many things to get right here. just a thought…

sinoroc · March 27, 2023, 4:39pm

@lwasser Thanks for this effort. On a first skim, it looks very helpful.

A first remark, before doing a deeper dive, I wonder if flit needs to be considered at all. My understanding is that flit’s main purpose is to be used for bootstrapping, because it has zero dependencies. It has a very limited feature set, which as far as I understood is very unlikely to grow, and some hard to navigate corner cases (if I recall correctly, it behaves differently when used from its own CLI or as a pure build back-end). My feeling is that it would be a disservice to suggest it as a candidate for build back-end.

Hopefully, others will correct me if my understanding of flit is wrong.

CAM-Gerlach · March 27, 2023, 4:42pm

@takluyver can explain it better, I’m sure, but as I understand it Flit’s primary designed use case is to make packaging as simple as possible for users with pure-Python packages who just want a tool that does the right thing with a minimum of options and complexity, particularly scientists and other non-full-time programmers. The design tradeoffs, such as a limited feature set, opinionated choices and not handling less-common corner cases, are all specifically oriented around that goal.

sinoroc · March 27, 2023, 5:01pm

@CAM-Gerlach Thanks. I need to reconsider flit’s position in my mental model. And the guide even has a “Why you might not want to use Flit” section, so it’s all good.

abravalheri · March 27, 2023, 5:11pm

Hi @lwasser, thank you very much for sharing this.
Could you please correct the following sections:

setuptools will build a project without a name or version if you are not using a pyproject.toml file to store metadata.

Setuptools also will include all of the files in your package repository if you do not explicitly tell it to exclude files using a MANIFEST.in file

Setuptools can derive name and version from multiple files that store metadata (pyproject.toml, setup.cfg and setup.py).
The second bullet point is not precise. Setuptools has a set of files that will be included by default and that is not equal to all the files in the repository. We do recommend using setuptools-scm though, and the outcome of that should be more or less equivalent to what hatch and others do…

lwasser · March 27, 2023, 6:28pm

@abravalheri happy to make changes! many thanks for noting this.

Rather than diverting the convo here - would you be open to creating an issue in our repo here: GitHub - pyOpenSci/python-package-guide: A guide to scientific python package recommendations curated by pyOpenSci? i have a few questions for you about this and can show you what happened in a package i created as well

@sinoroc yup - i think a nice working case for flit also is people making small modules for personal use. it’s lightweight and quick to get up and running. Although, i wish it could make a directory structure for someone if they wanted to start from scratch too (minor and maybe out of scope). here is a blog post that i intend to publish soon about Flit - Learn About Using Flit for Python Packaging - HackMD just so you can see the interface.

what would be really cool is if some of these tools could just build upon each other, extending functionality for various groups of users rather than rebuilding similar functionality that is slightly different (if that makes sense). i don’t have a good solution, this is just an observation.

sinoroc · March 27, 2023, 6:35pm

Maybe the feedback on pyOpenSci’s packaging guide should be split into its own thread?

CAM-Gerlach · March 27, 2023, 6:40pm

I can split it from Python packaging documentation feedback and discussion - #71 by lwasser onward if people agree (there were some posts on that topic much earlier, but they were much more intermixed with other discussion).