Python packaging documentation feedback and discussion

That’s a totally valid question, and you’re certainly not the first person to ask it. In fact, it was discussed at some length in the recent thread:

It seems you’ve already read The Packaging Flow document but it’s still not entirely clear, so I’ll try to briefly fill in the gaps.

In Python-specific packaging, “building” most specifically refers to the process in which the project’s chosen build backend (like setuptools, hatchling, flit-core, poetry-core, etc.) takes a source artifact (typically a source distribution/sdist, but also a project source tree, VCS checkout, tarball, etc.) as input and produces a built artifact (typically a wheel). That built artifact can be installed on a compatible platform by essentially just unpacking an archive and moving files into place, without needing the build backend or build dependencies installed and without having to execute any dynamic code.
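To make the “just unpacking an archive” part concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the names `toy_pkg`, `make_toy_wheel` and `unpack` are made up, and the archive leaves out the `WHEEL` and `RECORD` files a real wheel carries, so it is wheel-shaped rather than a valid wheel. A real installer such as pip does more than this (for example, checking compatibility tags).

```python
import io
import tempfile
import zipfile
from pathlib import Path


def make_toy_wheel() -> bytes:
    """Return the bytes of a toy, wheel-shaped zip archive (hypothetical, not spec-complete)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The importable code, laid out exactly as it should land on disk.
        archive.writestr("toy_pkg/__init__.py", "GREETING = 'hello'\n")
        # Static metadata, already generated at build time.
        archive.writestr(
            "toy_pkg-1.0.dist-info/METADATA",
            "Metadata-Version: 2.1\nName: toy-pkg\nVersion: 1.0\n",
        )
    return buffer.getvalue()


def unpack(wheel_bytes: bytes, target: Path) -> list[str]:
    """Extract every file into target and return the sorted archive paths."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())


with tempfile.TemporaryDirectory() as tmp:
    # No build backend and no project code is executed here, only file copying.
    print(unpack(make_toy_wheel(), Path(tmp)))
```

The point of the sketch is that nothing from the project runs during `unpack`; all the work that needed a backend happened before the archive existed.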

Even with a pure Python project where you don’t need to compile C code to a binary executable, there’s still some work to do to get from an sdist/source tree to a built wheel that can simply be unpacked and moved into place (or, in some cases, albeit unsupported ones, imported and used directly). The amount of work depends on the complexity of the project and the build backend. I list some of those steps here:

Additionally, “build” is sometimes also used (perhaps less precisely and more oxymoronically) in the context of “building” an sdist from a source tree, which typically involves a more limited set of steps: collecting the source files into an archive, generating preliminary metadata and perhaps doing some other preprocessing, depending on the backend and the user’s custom configuration. However, given the obvious confusion, I’ve moved away from using that term in this context and now prefer terminology that is clearer and more specific (e.g. “generating” an sdist, “constructing” an sdist, etc).
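As a rough illustration of “collecting the source files into an archive”, the sketch below packs a few in-memory files into a gzipped tarball with a single top-level `{name}-{version}` directory, which is the usual sdist layout. The helper names (`make_toy_sdist`, `list_members`) and the file contents are hypothetical, and a real backend decides which files to include and writes proper `PKG-INFO` metadata.

```python
import io
import tarfile


def make_toy_sdist(files: dict[str, str], name: str = "toy_pkg", version: str = "1.0") -> bytes:
    """Pack text files into a .tar.gz under a '{name}-{version}/' root directory."""
    root = f"{name}-{version}"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for relative_path, text in sorted(files.items()):
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=f"{root}/{relative_path}")
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def list_members(sdist_bytes: bytes) -> list[str]:
    """Return the member paths of the archive, in the order they were added."""
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as archive:
        return archive.getnames()


toy_files = {
    "PKG-INFO": "Metadata-Version: 2.1\nName: toy-pkg\nVersion: 1.0\n",
    "pyproject.toml": "[build-system]\nrequires = []\n",
    "src/toy_pkg/__init__.py": "GREETING = 'hello'\n",
}
print(list_members(make_toy_sdist(toy_files)))
```

Note that the source layout (`src/…`) is preserved as-is here; turning it into an installable layout is left to the later wheel step.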

By analogy, if you’re familiar with the typical workflow on *nix, source tree → sdist is loosely analogous to configure, sdist → wheel like make, and wheel → installed project like make install. Or using the old-style implicit setup.py-based workflow, the build step is when your dynamic setup.py would be executed to run the appropriate code within.

That’s a very useful insight, so I’m glad you overcame your fear and asked :slightly_smiling_face:

In many ways you’re right that you don’t necessarily care about “building”. In its traditional sense, building usually meant compiling a C program (or similar) into its native form, and that doesn’t apply to a language like Python.

But if you think of “building” as “taking your source code, and making it into a thing that you can give to someone else which will let them install it”, does that help? You may well think that just sending them your Python file is enough - and it is, for simpler projects. But once you have a couple of files (maybe one with your main code, and one with helper functions) or you depend on something from PyPI, having something that captures all of that detail - plus other things like a project name and version, so “the system” can upgrade or uninstall it, etc - is pretty helpful. “Building” is the process of making that shareable thing.

That’s oversimplified a lot, but does that help you understand what “building” means in the context of Python?
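If it helps to see what “captures all of that detail” can look like, here is a small sketch. A built distribution carries a `METADATA` file made of email-style headers, so the standard library’s `email` module can read it. The metadata text and the `summarize` helper below are invented for illustration; real files usually contain many more fields.

```python
from email import message_from_string

# Hypothetical metadata for a made-up project; real files have more fields.
METADATA_TEXT = """\
Metadata-Version: 2.1
Name: toy-pkg
Version: 1.0
Requires-Dist: requests
Requires-Dist: rich>=13
"""


def summarize(metadata_text: str) -> dict:
    """Pull out the name, version and declared dependencies."""
    message = message_from_string(metadata_text)
    return {
        "name": message["Name"],
        "version": message["Version"],
        # A field may repeat, so collect every occurrence.
        "requires": message.get_all("Requires-Dist", []),
    }


print(summarize(METADATA_TEXT))
```

This is the kind of information that lets “the system” know what is installed, at which version, and what else it needs.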

Thank you for saying this.

In my view it is not your fault that you don’t know that; it’s the fault of the packaging documentation, because it’s inadequate and incorrectly structured. It’s why in my earlier post in this thread I tried to break things down in terms of user goals like “I want to do X”. I hope others see the value in doing so as well, and radically rethinking the documentation to foreground tasks that typical users want to do (e.g., “make it so someone else can use my code”) rather than internal terminology like “build” and “sdist”.

More on the topic of this thread, the Packaging Flow document you referenced was in fact originally written by another packaging user much like you who wasn’t quite sure how all the pieces fit together and why certain things were necessary, and with the help of the community wrote up their findings to guide others. Perhaps it should, then, at least briefly explain why a project needs to be “built” to be shared with others and what that means in the context of a Python project?

Right, that would be a how-to guide (which certainly is one area that could be updated and improved on the PyPUG), whereas the user was reading an explanation, which has the primary goal of explaining the concepts involved to further user understanding, answering the question of “why” rather than “how”.

While already partially organized in the Diataxis style, perhaps this explanation should be moved later or under the Explanations/Discussions section so newer users start off with the tutorial, so they first learn just the basics they need to know to accomplish what they likely want to do, without being overwhelmed with concepts that may not be important to them at this point.

On the other hand, as I understand it the thinking was to give users a high-level overview of how everything fits together, so once they proceed to the tutorial they aren’t just rotely repeating steps without really understanding what is going on, but instead have a sense of the overall picture. They are going to run into these concepts anyway once they do (or at least these terms, given the names of things in pyproject.toml), and it seems users are curious (and potentially get confused and uncertain) about how everything fits together, which is what that document aims to explain. Perhaps instead it can be improved to help clear up that uncertainty, per my suggestion above.

hey @jagerber does this page help at all to clarify? The Python Package Source and Wheel Distributions — pyOpenSci Python Packaging Guide i wonder if it doesn’t we might want to add some language around that specific question.

This exact question relates to my comment:

“there is a level of expected background knowledge associated with how most if not all of the online resources on packaging carry”.

but this question is SUCH a good one!! it is one that i and i’m guessing MANY MANY others have asked and have struggled to find a simple, clear answer to. for me finding the answer involved talking to many people and reading a bunch of material scattered around the internet (some of which was dated).

In retrospect, the word “build” seems to have very heavyweight connotations, e.g. invoking separate toolchains for compilation or linking or transpiling or whatever. This is certainly the case for me. I guess if I had my wish, these steps would be split up with different names and different docs for different audiences.

The simple step of “assembling a wheel” would get a much less intimidating name like “pack”, with “build” reserved for cases where all that extra work is needed. Then pure-python projects would only have to see and learn and know about how to “pack” a wheel, and could ignore the mountain of overlapping and semi-conflicting tools for all the different use-cases that involve (what I would consider) an actual “build”. [1]


  1. every project has the same “pack” step, but some also have a “build” step earlier

To be fair, that definition is coming from your preconceptions with compiled languages (for example, and especially, C/C++). A JavaScript dev might think it’s to use Webpack to generate a distributable, while a new Dev may think a small team of monkeys folds cardboard into a box and throws some packing foam inside.

My actual experience here comes from needing to build a JS library from Typescript sources, using all the separate toolchain that implies. I did not mean to get hung up on the name “pack” in particular. I only meant to highlight that:

  • Every project (more or less, now) needs to “assemble a wheel”
  • Not every project needs to “do toolchain stuff” (whatever that might mean)

So my conclusion is that splitting out just the simple “assemble a wheel” part as a concept would allow the folks who don’t need to “do toolchain stuff” (i.e. pure python packages) a quick process/docs/tools offramp that avoids a lot of complexity that doesn’t apply to them at all.

Thank you all for your responses to “What is ‘building’?”. They’re all helpful. I come out of the past few posts feeling VERY sympathetic to what @bryevdv is saying about splitting out different terms for the different steps. I understand that, if my goal is to share code I’ve written with someone else, packaging my code into a wheel and giving them that wheel is a very good way to do this. So this is something I care about. But it sounds like in some cases there might be more to “building” than packing code into a wheel, and in some cases “building” may not involve packing code into a wheel at all. So yeah, it sounds like (1) the term “build” is perhaps a little overloaded, or at least general enough as to not be useful for non-experts and (2) what I really care about is not knowing how to build code, but how to pack python code into a wheel. When I’m more advanced maybe I’ll care more about building more generally, like if I’m making a python package with non-python components.

A follow up question: what is the name of the process where a built artifact (typically a wheel) is installed (unpacked?) onto a platform (typically resulting in some files populating a folder in the site-packages directory…)? What is the name of the tool that does this? I guess the answer is installation/installer?
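On the “where do the files end up” part of that question, a small sketch may help. The standard library’s `sysconfig` module reports the directories the current interpreter uses for installed packages: `purelib` for pure-Python code and `platlib` for platform-specific code (both typically point at a `site-packages` directory). The helper name `install_locations` is made up for this example.

```python
import sysconfig


def install_locations() -> dict[str, str]:
    """Return the directories this interpreter uses for installed packages."""
    paths = sysconfig.get_paths()
    return {"purelib": paths["purelib"], "platlib": paths["platlib"]}


for kind, location in install_locations().items():
    print(f"{kind}: {location}")
```

The output depends on your machine and on whether a virtual environment is active, so no particular path is shown here.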

@lwasser I had a look at the link you sent. I think it’s the best explanation of “what is an sdist” and “what is a wheel” that I’ve seen so far. I do think an explanation of “what is building” would be helpful. There were a few sentences here and there on that page I found confusing. Where would be a good place to give that sort of sentence-level feedback on that document? Sort of unrelatedly, I did find one comment on a pyopensci blog post you wrote that I’m more and more agreeing with “Python packaging is not bad. It’s just not well documented.” So I’m becoming excited about this thread and its possible results.

Back to the topic: I feel like what I’m seeking is an explanation in the reverse order of what I’ve seen. That is, I want to know what pip (or some other installer?) looks for when it gets a wheel (or some other “built artifact”) and what it does with the information that it finds. THEN I want to know one or more ways to create that artifact so that the installer can do its job. Right now I feel the description is “so you want to share your code? Well, you can start with one of 6 different options…” but since I don’t know my end goal (other than I want to be able to import my code) I don’t have any context for these different options.
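One small piece of “what an installer looks at” can be sketched directly: the wheel’s file name. It follows the pattern `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`, and an installer uses those tags to decide whether the wheel is compatible with the current interpreter and platform. The `WheelName` type and `split_wheel_filename` helper below are hypothetical names for a simplified parser; it does no validation or normalization, which real tools do.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def split_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components (simplified)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)


print(split_wheel_filename("toy_pkg-1.0-py3-none-any.whl"))
```

For a pure-Python project the tags are typically `py3-none-any`, meaning any Python 3 interpreter, no ABI requirement, and any platform.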

@jagerber many thanks for this feedback. I did open an issue about being more specific about what building is here: We need to answer the question - what is "building" a package · Issue #74 · pyOpenSci/python-package-guide · GitHub. feel free to add to that issue and open any other issues that you have regarding what would be helpful to folks such as yourself in terms of documentation :slight_smile:

i’m also working on some draft graphics that will hopefully tie together the entire process of creating and building a package as it relates to the various tools (these are just drafts). we welcome your feedback in our repo via issues (to avoid distracting from this convo here!)

So this would be setuptools/build + twine for example.

I recently came across this article on common packaging mistakes and pitfalls, which I found really helpful.

I then went back and read the PyPA docs on using setuptools with pyproject.toml, to which I’m relatively new, and it made more sense.

The confusion and frustration surrounding Python packaging, in my view, is partly due to all this jargon that has been built up over the years: everyone inventing or using their own terminology to refer to what are essentially the same things. So you have “package”, “egg”, “distribution”, “source distribution”, “wheel” (a word which I hate), etc. and who knows what else. These words don’t refer to different concepts, but the same. If we want standardisation of tools then we should also standardise on terminology across all published documentation, and encourage developers to do the same.

Finally, I’d like to say something positive, on PDM - from recent experience I’ve found this to be probably the best of the current crop of new-style packaging tools. The documentation is quite nice, although a bit sketchy on some details. There’s a useful conversion tool to convert a setup.py to a pyproject.toml - it needed some manual work to complete, but in the end it’s working nicely, and I’ve been able to get rid of setup.py entirely from one particular project I’m working on.

Uh, no – eggs, source distributions and wheels are three different distribution formats and definitely don’t mean the same thing. A “package” is a conceptual “bag” of reusable Python code, and a “distribution” is AFAIK a concrete file/folder containing the package, in one of the above mentioned formats.

With that being said, I agree that there is a lot of confusing terminology around. My personal pet peeve is the use of PEP numbers. For example, pip wheel --use-pep517. I for one can never remember which is which between PEP 517 and PEP 518, and I think it’s Greek to Joe User…

Agreed, and my personal apologies for that particular flag, which I implemented :slightly_smiling_face: In my defense, I would say that a lot of the concepts that we use PEP numbers for simply don’t have good “natural” names - and of course, inventing terms simply exacerbates the “confusing terminology” problem. I know that --use-pep517 was very much a case of “I can’t think of anything better”, for example…

We’ve moved towards “pyproject” as a term instead of “pep517”, reflecting the fact that it generally refers to projects that use pyproject.toml. IMO it’s still not ideal, but it’s better than the PEP number. Although we almost certainly won’t change the --use-pep517 flag in pip, there’s not much point as we’re progressing towards a point where the legacy behaviour will be removed, and --use-pep517 will be the default (and only!) behaviour. So you will be rid of it relatively soon :slightly_smiling_face:

Personally I am at least thankful only one of these is actually valid; there are definitely much worse flag names out there…

Let’s link the glossary from the Python Packaging User Guide, it is in quite good shape these days:

https://packaging.python.org/en/latest/glossary/

Err, good shape? As far as I can tell, the glossary hasn’t been touched much in years and many parts are half a decade or more out of date (a lot of it is stuck in the legacy setup.py build pre-pyproject (PEP 517/518) era); I made a PR to update it as part of a bunch of other changes (adding the PEP 517/518/660 to the packaging specs), but it was way too big to be reviewed and accepted, which was a mistake on my part, and I was pretty burnt out after spending most of a week working on all that so I haven’t gotten back to it yet. For now I’ve done some partial updates in PEP 639 where relevant to that PEP, and hope to get the time to split off and refactor just the glossary updates in a single atomic PR and submit that at some point soon.

@CAM-Gerlach Ah, you might be right. I did not read every word of the page. I mostly looked at the terms that were mentioned (eggs, source distributions and wheels) and some other things that I know trip up a lot of users (most importantly distribution package vs. import package). All in all, it seems quite okay from my point of view. I did not spot any big issue that would be terribly misleading to a new user, but for sure there is quite a bunch of things to fix. Do you have a link to your old PR? I guess it is this one:

Different distribution formats, yes, but distributions all the same - code artifacts you build to share with others. I was making a point about the distribution concept quite generally, not talking about formats.

If that were the case, why is there so much confusion and frustration over Python packaging?

Of course, I do not have an answer.

I do not know how well the Python Packaging User Guide is referenced (ranked) in search engines. But for sure there is an incredible amount of outdated, misguided, or plainly incorrect advice all over the internet (I probably produced some) that we can’t fix.

This glossary is for sure better than a lot of what I regularly see in questions and answers on StackOverflow (where I am mostly active). So I guess that is what I meant.