Python packaging documentation feedback and discussion

takluyver · March 27, 2023, 6:43pm

Thanks @CAM-Gerlach ! Truth be told, I don’t exactly know how much prominence Flit should have now. There are shinier tools like Poetry/Hatch that can do everything Flit can do and more.

I think someone might like Flit if:

- They explicitly don’t want their packaging tool to set up an environment for them, e.g. because they’re doing that at a different layer.
  - Of course, using tools with environment management doesn’t force you to use those features. But if you’re not buying in to those tools all the way, it’s a bit of extra mental effort to figure out which bits you want and which you don’t.
- They’re comfortable with traditional setup.py style packaging, and want to keep the same conceptual model while moving away from setuptools.
  - You can also use pyproject.toml metadata with setuptools, but setuptools has extra complexity because of all the other stuff it supports.
- They want a packaging tool that they can understand very well, and thus one that’s as simple and predictable as practical.
  - The predictability is somewhat undermined by flit build using information from your VCS, but we’re planning to move away from that.

I don’t have a good sense whether this describes 10% of likely readers or 0.1%, though.

(Within the packaging world, I also think there’s a role for Flit as a kind of reference implementation - by being roughly the simplest thing that could be used for a reasonable fraction of cases, it can be useful for e.g. trying to implement proposed new standards. But this doesn’t matter to typical users.)

abravalheri · March 27, 2023, 6:47pm

I think this refers back to the monolith vs lego discussion.

There are great tools out there that combine really well with “backend-only” or “slim” approaches like setuptools or flit (respectively).

I can cite PyScaffold for setuptools (because I help to maintain it), but flit probably also has great cookiecutters out there.

In my experience maximalist approaches may seem great at first glance, but they also restrict you. If you need to change something you may have to modify the maximalist tool itself. A project generated by a tool that integrates things like nox/tox for project management, on the other hand, will allow you to quickly inspect all the “task” definitions, learn and modify them to fulfil your specific needs.

CAM-Gerlach · March 27, 2023, 7:14pm

I don’t want to drag things further off-topic, but just FWIW, Flit’s model has seemed to be very attractive (both in my estimation, and from talking with colleagues) for the 90% of PIs, GRAs, engineers and data analysts in the scientific world for which Python is just a means to an end. They don’t have much training or experience in programming, let alone packaging, beyond whatever their advisor/colleague/local guru (like me) taught them, and they just want to be able to share their work quickly, easily and robustly while having to make as few non-trivial choices between different options as possible. The latter IMO is the single biggest frustration I hear about Python packaging from scientists, from brand new students all the way to my advisor (the most knowledgeable person about Python and packaging in our department).

As I understand it, Flit’s primary motivation is to address exactly that user story. IMO the only real reason it hasn’t picked up more steam among that crowd is just a tremendous amount of inertia and cargo-culting, especially among those sorts of user groups; workflow processes are handed down from person to person much more as there is less understanding of why things work, and (particularly as everyone has a story of how they messed something up in Python) a lot more fear of breaking what works, plus many confusing options to choose from and little time or motivation to spend looking into them.

Bringing things back on topic, to that end, this really speaks to the value of an authoritative guide briefly summarizing the main options and the users they work best for.

lwasser · March 27, 2023, 7:30pm

let’s stay on track. i welcome issues to our repo on our guide or discussion in a separate thread too.

I wonder if starting an outline of current content on pypa and aligning with current needs would be a great start to moving things forward. especially if the (initial) focus is going to be on translating the peps and standards into usable information that the community understands as a starting place.

also identifying how technical your readers need to be to understand the content would be very helpful before doing any writing IMHO.

BrenBarn · March 27, 2023, 10:06pm

I can certainly see your point. To me the way to slice it would be to divide up the user tasks/goals and provide an end-to-end recommendation for each “kind of user”. The way I think about these dimensions is sort of like:

1. Doesn’t care about creating packages for other people to install, but wants to install and use packages that other people have created.
2. Wants to be able to share code so others can use it, but may not care about putting it in a publicly-accessible repository like PyPI or conda-forge (e.g., they want to create a file they can email to someone).
3. Wants to publish a pure-Python package or application.
4. Wants to publish a pure-Python package or application that depends on non-pure-Python packages.
5. Wants to publish a package or application that requires some kind of compilation step as part of its own build process (e.g., a C extension).
6. Wants to publish a package or application that requires some extra-complicated compilation steps? Maybe this is things that have multiple separate pieces with their own separate compilation steps that have to be integrated?

In my mind, level 4 there is the sweet spot as far as Python documentation, because that’s the most complex level at which a person can operate without having to know any other language than Python. I also think it’s where the userbase expands a great deal, as it’s where you get things like “everyone who uses stuff that depends on numpy”.

For most of those levels I think just about everything on @pf_moore’s list of topics is in scope. (In particular, I think that thinking about environment management is relevant even for level 1, so that should not be considered out of scope.) The degree of detail or “opinionation” may vary from one topic to another though.

One wrinkle is that often users are at one level but are on a trajectory to reach “higher” levels, so we want to avoid too much recommendation of tools that are “simpler for beginners” but that may have to be left behind for more complex tasks.

The bigger worry for me is that some of the tasks on my list may not have clear answers due to gaps in existing tooling or docs. For instance, if someone wants to distribute an application (rather than a library) my impression is the paths to choose from are not so straightforward. Those are cases where we actually need new options, not just new signposts; but even so, signposts may be helpful in navigating the existing options.

brettcannon · March 27, 2023, 10:27pm

Yep, I agree with that.

Yep, it does!

pitrou · April 2, 2023, 7:13pm

If that is correct, then it sounds like a bad idea to advocate Flit. Once their needs exceed what Flit offers, users will have to migrate to a different utility with different conventions.

CAM-Gerlach · April 2, 2023, 7:41pm

That is a potential drawback of that design choice, certainly, but it doesn’t necessarily make it a “bad idea” overall for users in its target audience. The above presupposes that most or all Flit users’ needs will inevitably exceed what Flit has to offer, which in practice is unlikely to be the case: most modern pure-Python packages do fit within its constraints, particularly those by the userbase it is geared toward, and most of them will never end up needing anything more. Those that do need more often know it from the start (e.g. extension module support, building binaries in other languages, etc).

Additionally, by and large the most likely constraint to be exceeded (particularly in Flit’s target demographic, scientific users) is support for building extension modules, incorporating external binaries, etc., which is something that most similar backends (Hatch, Poetry, PDM, etc) offer limited or no native support for anyway. Particularly in the scientific domain, the likely better choice for that is to switch to a backend specifically tailored for that use case like Meson-Python or Scikit-Build.

Finally, if they do have to move to a non-specialized backend, by design Flit relies on Pyproject metadata ([project] table), follows the relevant packaging standards and conventions (VCS-based include/exclude, etc), and has relatively minimal backend-specific configuration, so the author can keep most of what they’ve already set up and not have to relearn it, and mostly focus on the new things they need the new backend for.
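To make the portability point concrete, here is a rough sketch (not taken from any real project) of a pyproject.toml rendered for two backends. The project name, version and dependency are made up for illustration, and a real project may still need a few backend-specific settings; the point is only that the standardized [project] table stays the same and the [build-system] table is what names the backend.

```python
# Hypothetical project metadata. The [project] table is standardized,
# so it reads the same no matter which backend builds the project.
PROJECT_TABLE = '''\
[project]
name = "example-analysis"
version = "0.1.0"
description = "Helper functions for a lab's data analysis"
requires-python = ">=3.8"
dependencies = ["numpy"]
'''

# Only the [build-system] table names the backend.
BUILD_SYSTEMS = {
    "flit": '''\
[build-system]
requires = ["flit_core>=3.2,<4"]
build-backend = "flit_core.buildapi"
''',
    "hatchling": '''\
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
''',
}


def render_pyproject(backend: str) -> str:
    """Return the text of a pyproject.toml for the chosen backend."""
    return BUILD_SYSTEMS[backend] + "\n" + PROJECT_TABLE


# Compare the two files line by line to see what a migration would touch.
flit_lines = render_pyproject("flit").splitlines()
hatch_lines = render_pyproject("hatchling").splitlines()
changed = [(old, new) for old, new in zip(flit_lines, hatch_lines) if old != new]

for old, new in changed:
    print(f"- {old}\n+ {new}")
```

In this toy case only the two lines inside [build-system] differ between the files.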

Maybe we do want to split this backend comparison discussion off to a new thread?

jagerber · April 2, 2023, 8:34pm

I have a question that I’ve been too scared to ask, but I bet a lot of other people have the same question:

What is “building” and why do I care about it?

Basically the packaging flow documentation has a ton of stuff about building and build tools. I have a rough idea what building is but I don’t know exactly what it is along the chain from “me writing code” to “that code being used somewhere else”.

The reason I don’t want to select a build tool is because I don’t want to have to learn that much about what “building” means. I have a feeling that there are many others who feel the same way I do.

Also, I know that it’s not necessary to know precisely what build tools are to share code, because I have shared code before but don’t know precisely what build tools are.

jeanas · April 2, 2023, 8:46pm

Amusingly enough, I asked roughly the same question recently: Sdists for pure-Python projects - #3 by jeanas

CAM-Gerlach · April 2, 2023, 9:15pm

That’s a totally valid question, and you’re certainly not the first person to ask it. In fact, it was discussed at some length in the recent thread:

It seems you’ve already read The Packaging Flow document but it’s still not entirely clear, so I’ll try to briefly fill in the gaps.

In Python-specific packaging, “building” most specifically refers to the process that uses the project’s chosen build backend (like setuptools, hatchling, flit-core, poetry-core, etc) that takes a source artifact (typically a source distribution/sdist, or a project source tree, VCS checkout, tarball, etc) as input, and produces a built artifact (typically a wheel) that can be installed on a compatible platform by essentially just unpacking an archive and moving files into place, without needing the build backend or build dependencies installed and without having to execute any dynamic code.
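As a very rough illustration of the “just unpacking an archive and moving files into place” part, here is a toy sketch. It writes a tiny wheel-shaped zip by hand and “installs” it by extracting it into a directory that is then put on the import path. The package name `toy_pkg` and both helper functions are made up for this example; the archive omits the RECORD file a real wheel carries, and a real installer such as pip does considerably more than this.

```python
import sys
import tempfile
import zipfile
from pathlib import Path


def make_toy_wheel(directory: Path) -> Path:
    """Write a minimal wheel-shaped zip archive (illustrative, not spec-complete)."""
    wheel_path = directory / "toy_pkg-0.1.0-py3-none-any.whl"
    with zipfile.ZipFile(wheel_path, "w") as archive:
        # The importable code, stored exactly as it will sit once installed.
        archive.writestr("toy_pkg.py", "def greet():\n    return 'hello from toy_pkg'\n")
        # Static metadata: nothing needs to be executed to read it.
        archive.writestr(
            "toy_pkg-0.1.0.dist-info/METADATA",
            "Metadata-Version: 2.1\nName: toy-pkg\nVersion: 0.1.0\n",
        )
        archive.writestr(
            "toy_pkg-0.1.0.dist-info/WHEEL",
            "Wheel-Version: 1.0\nGenerator: toy-example\n"
            "Root-Is-Purelib: true\nTag: py3-none-any\n",
        )
    return wheel_path


def toy_install(wheel_path: Path, target: Path) -> None:
    """'Install' by unpacking only: no backend and no project code is run."""
    with zipfile.ZipFile(wheel_path) as archive:
        archive.extractall(target)


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    wheel = make_toy_wheel(tmp_path)
    site_dir = tmp_path / "fake-site-packages"  # stand-in for site-packages
    site_dir.mkdir()
    toy_install(wheel, site_dir)

    sys.path.insert(0, str(site_dir))
    try:
        import toy_pkg
        message = toy_pkg.greet()
    finally:
        sys.path.remove(str(site_dir))

print(message)
```

Everything the backend had to figure out (which files to include, what the metadata says) is already settled inside the archive, which is why the install step can stay this simple.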

Even with a pure Python project where you don’t need to compile C code to a binary executable, there’s still some work to do to get from a sdist/source tree to a built wheel that can simply be unpacked and moved into place (or, in some cases, albeit unsupported, imported and used directly), with the amount depending on the complexity of the project and the build backend. I list some of them here:

Additionally, “build” is sometimes also used (perhaps less precisely and more oxymoronically) in the context of “building” an sdist from a source tree, which typically involves a more limited set of steps to collect the source files into an archive, generate preliminary metadata and perhaps do some other preprocessing, depending on the backend and user custom configuration. However, given the obvious confusion, I’ve moved away from using that term in this context, and preferred terminology that is clearer, more specific and less potentially confusing (e.g. “generating” a sdist, “constructing” a sdist, etc).
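For a feel of what “generating” an sdist amounts to in the simplest case, here is a hedged sketch that collects a source tree into a `.tar.gz` and adds a preliminary `PKG-INFO` metadata file. The function `make_toy_sdist` and the `toy_pkg` project are invented for illustration; real backends decide which files to include, normalize names, and write fuller metadata.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def make_toy_sdist(source_tree: Path, out_dir: Path, name: str, version: str) -> Path:
    """Collect a source tree into {name}-{version}.tar.gz (illustrative only)."""
    base = f"{name}-{version}"
    sdist_path = out_dir / f"{base}.tar.gz"
    pkg_info = f"Metadata-Version: 2.1\nName: {name}\nVersion: {version}\n".encode()
    with tarfile.open(sdist_path, "w:gz") as archive:
        # Every file goes under a single top-level {name}-{version}/ directory.
        for path in sorted(source_tree.rglob("*")):
            if path.is_file():
                relative = path.relative_to(source_tree).as_posix()
                archive.add(path, arcname=f"{base}/{relative}")
        # Preliminary metadata generated at sdist time.
        info = tarfile.TarInfo(f"{base}/PKG-INFO")
        info.size = len(pkg_info)
        archive.addfile(info, io.BytesIO(pkg_info))
    return sdist_path


with tempfile.TemporaryDirectory() as tmp:
    tree = Path(tmp) / "src-tree"
    tree.mkdir()
    (tree / "pyproject.toml").write_text('[project]\nname = "toy_pkg"\nversion = "0.1.0"\n')
    (tree / "toy_pkg.py").write_text("def greet():\n    return 'hi'\n")

    out = Path(tmp) / "dist"
    out.mkdir()
    sdist = make_toy_sdist(tree, out, "toy_pkg", "0.1.0")

    with tarfile.open(sdist) as archive:
        members = sorted(archive.getnames())

print(members)
```

Unlike a wheel, the result still contains `pyproject.toml`, so whoever receives it needs the build backend to turn it into something installable.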

By analogy, if you’re familiar with the typical workflow on *nix, source tree → sdist is loosely analogous to configure, sdist → wheel like make, and wheel → installed project like make install. Or using the old-style implicit setup.py-based workflow, the build step is when your dynamic setup.py would be executed to run the appropriate code within.

pf_moore · April 2, 2023, 9:17pm

That’s a very useful insight, so I’m glad you overcame your fear and asked

In many ways you’re right that you don’t necessarily care about “building”. In its traditional sense, building usually meant compiling a C program (or similar) into its native form, and that doesn’t apply to a language like Python.

But if you think of “building” as “taking your source code, and making it into a thing that you can give to someone else which will let them install it”, does that help? You may well think that just sending them your Python file is enough - and it is, for simpler projects. But once you have a couple of files (maybe one with your main code, and one with helper functions) or you depend on something from PyPI, having something that captures all of that detail - plus other things like a project name and version, so “the system” can upgrade or uninstall it, etc - is pretty helpful. “Building” is the process of making that shareable thing.

That’s oversimplified a lot, but does that help you understand what “building” means in the context of Python?

BrenBarn · April 2, 2023, 9:19pm

Thank you for saying this.

In my view it is not your fault that you don’t know that; it’s the fault of the packaging documentation, because it’s inadequate and incorrectly structured. It’s why in my earlier post in this thread I tried to break things down in terms of user goals like “I want to do X”. I hope others see the value in doing so as well, and radically rethinking the documentation to foreground tasks that typical users want to do (e.g., “make it so someone else can use my code”) rather than internal terminology like “build” and “sdist”.

CAM-Gerlach · April 2, 2023, 9:43pm

More on the topic of this thread, the Packaging Flow document you referenced was in fact originally written by another packaging user much like you who wasn’t quite sure how all the pieces fit together and why certain things were necessary, and with the help of the community wrote up their findings to guide others. Perhaps it should, then, at least briefly explain why a project needs to be “built” to be shared with others and what that means in the context of a Python project?

Right, that would be a how-to guide (which is certainly one area that could be updated and improved on the PyPUG), whereas the user was reading an explanation, which has the primary goal of explaining the concepts involved to further user understanding: answering the question of “why” rather than “how”.

While already partially organized in the Diataxis style, perhaps this explanation should be moved later or under the Explanations/Discussions section so newer users start off with the tutorial, so they first learn just the basics they need to know to accomplish what they likely want to do, without being overwhelmed with concepts that may not be important to them at this point.

On the other hand, as I understand it the thinking was to give users a high-level overview of how everything fits together, so once they proceed to the tutorial they aren’t just rotely repeating steps without really understanding what is going on, but instead have a sense of the overall picture. They are going to run into these concepts anyway once they do (or at least these terms, given the names of things in pyproject.toml), and it seems users are curious (and potentially get confused and uncertain) about how everything fits together, which is the goal of that document to explain. Perhaps instead it can be improved to help clear up that uncertainty, per my suggestion above.

lwasser · April 3, 2023, 2:34pm

hey @jagerber does this page help at all to clarify? The Python Package Source and Wheel Distributions — pyOpenSci Python Packaging Guide. i wonder, if it doesn’t, whether we might want to add some language around that specific question.

This exact question relates to my comment:

“there is a level of expected background knowledge associated with how most if not all of the online resources on packaging carry”.

but this question is SUCH a good one!! it is one that i and i’m guessing MANY MANY others have asked and have struggled to find a simple, clear answer to. for me finding the answer involved talking to many people and reading a bunch of material scattered around the internet (some of which was dated).

bryevdv · April 3, 2023, 5:27pm

In retrospect, the word “build” seems to have very heavyweight connotations, e.g. invoking separate toolchains for compilation or linking or transpiling or whatever. This is certainly the case for me. I guess if I had my wish, these steps would be split up with different names and different docs for different audiences.

The simple step of “assembling a wheel” would get a much less intimidating name like “pack”, with “build” reserved for cases where all that extra work is needed. Then pure-python projects would only have to see and learn and know about how to “pack” a wheel, and could ignore the mountain of overlapping and semi-conflicting tools for all the different use-cases that involve (what I would consider) an actual “build”. [1]

[1] every project has the same “pack” step, but some also have a “build” step earlier

EpicWink · April 3, 2023, 9:14pm

To be fair, that definition is coming from your preconceptions with compiled languages (for example, and especially, C/C++). A JavaScript dev might think it’s to use Webpack to generate a distributable, while a new Dev may think a small team of monkeys folds cardboard into a box and throws some packing foam inside.

bryevdv · April 3, 2023, 9:36pm

My actual experience here comes from needing to build a JS library from Typescript sources, using all the separate toolchain that implies. I did not mean to get hung up on the name “pack” in particular. I only meant to highlight that:

- Every project (more or less, now) needs to “assemble a wheel”
- Not every project needs to “do toolchain stuff” (whatever that might mean)

So my conclusion is that splitting out just the simple “assemble a wheel” part as a concept would allow the folks who don’t need to “do toolchain stuff” (i.e. pure python packages) a quick process/docs/tools offramp that avoids a lot of complexity that doesn’t apply to them at all.

jagerber · April 5, 2023, 4:02am

Thank you all for your responses to “What is ‘building’?”. They’re all helpful. I come out of the past few posts feeling VERY sympathetic to what @bryevdv is saying about splitting out different terms for the different steps. I understand that, if my goal is to share code I’ve written with someone else, packaging my code into a wheel and giving them that wheel is a very good way to do this. So this is something I care about. But it sounds like in some cases there might be more to “building” than packing code into a wheel, and in some cases “building” may not involve packing code into a wheel at all. So yeah, it sounds like (1) the term “build” is perhaps a little overloaded, or at least general enough as to not be useful for non-experts and (2) what I really care about is not knowing how to build code, but how to pack python code into a wheel. When I’m more advanced maybe I’ll care more about building more generally (like if I’m making a python package with non-python components).

A follow up question: what is the name of the process where a built artifact (typically a wheel) is installed (unpacked?) onto a platform (typically resulting in some files populating a folder in the site-packages directory…)? What is the name of the tool that does this? I guess the answer is installation/installer?

@lwasser I had a look at the link you sent. I think it’s the best explanation of “what is an sdist” and “what is a wheel” that I’ve seen so far. I do think an explanation of “what is building” would be helpful. There were a few sentences here and there on that page I found confusing. Where would be a good place to give that sort of sentence-level feedback on that document? Sort of unrelatedly, I did find one comment on a pyopensci blog post you wrote that I’m more and more agreeing with “Python packaging is not bad. It’s just not well documented.” So I’m becoming excited about this thread and its possible results.

Back to the topic: I feel like what I’m seeking is an explanation in the reverse order of what I’ve seen. That is, I want to know what pip (or some other installer?) looks for when it gets a wheel (or some other “built artifact”) and what it does with the information that it finds. THEN I want to know one or multiple ways to create that artifact so that the installer can do its job. Right now I feel the description is “so you want to share your code? Well, you can start with one of 6 different options…” but since I don’t know my end goal (other than I want to be able to import my code) I don’t have any context for these different options.
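As one possible starting point for that “reverse order” view, here is a small sketch of the kinds of information an installer can read from a wheel without running any project code: the tags in the filename, and the static `METADATA` and `WHEEL` files in the `.dist-info` directory. The wheel here is a hand-made toy, `inspect_wheel` is a hypothetical helper, and the filename parsing is deliberately naive; this is not how pip is implemented, and a real installer also checks tag compatibility, resolves dependencies, and uses the `RECORD` file.

```python
import io
import zipfile
from email.parser import Parser


def build_toy_wheel_bytes() -> bytes:
    """Create a tiny wheel-shaped archive in memory (illustrative only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("toy_pkg.py", "VALUE = 42\n")
        archive.writestr(
            "toy_pkg-0.1.0.dist-info/METADATA",
            "Metadata-Version: 2.1\nName: toy-pkg\nVersion: 0.1.0\n"
            "Requires-Dist: numpy\n",
        )
        archive.writestr(
            "toy_pkg-0.1.0.dist-info/WHEEL",
            "Wheel-Version: 1.0\nGenerator: toy-example\n"
            "Root-Is-Purelib: true\nTag: py3-none-any\n",
        )
    return buffer.getvalue()


def inspect_wheel(filename: str, data: bytes) -> dict:
    """Collect what can be learned from a wheel without executing anything."""
    # Filename layout: {name}-{version}-{python tag}-{abi tag}-{platform tag}.whl
    # (naive split; an optional build tag is ignored here).
    parts = filename[: -len(".whl")].split("-")
    name, version = parts[0], parts[1]
    tags = tuple(parts[-3:])

    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        dist_info = f"{name}-{version}.dist-info"
        # Both files use an email-header-like "Key: value" format.
        metadata = Parser().parsestr(archive.read(f"{dist_info}/METADATA").decode("utf-8"))
        wheel_info = Parser().parsestr(archive.read(f"{dist_info}/WHEEL").decode("utf-8"))
        files = archive.namelist()

    return {
        "name": metadata["Name"],
        "version": metadata["Version"],
        "requires": metadata.get_all("Requires-Dist") or [],
        "tags": tags,
        "purelib": wheel_info["Root-Is-Purelib"] == "true",
        "files": files,
    }


info = inspect_wheel("toy_pkg-0.1.0-py3-none-any.whl", build_toy_wheel_bytes())
for key, value in info.items():
    print(f"{key}: {value}")
```

Seen from this end, the job of any of the build tools is simply to produce an archive with these pieces in these places.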

lwasser · April 5, 2023, 2:25pm

@jagerber many thanks for this feedback. I did open an issue about being more specific about what building is here - We need to answer the question - what is "building" a package · Issue #74 · pyOpenSci/python-package-guide · GitHub feel free to add to that issue and open any other issues that you have regarding what would be helpful to folks such as yourself in terms of documentation

i’m also working on some draft graphics that will hopefully tie together the entire process of creating and building a package as it relates to the various tools (these are just drafts). we welcome your feedback in our repo via issues (to avoid distracting from this convo here !)

So this would be setuptools/build + twine for example.