Python packaging documentation feedback and discussion

I recently came across this article on common packaging mistakes and pitfalls, which I found really helpful.

I then went back and read the PyPA docs on using setuptools with pyproject.toml, to which I’m relatively new, and it made more sense.

The confusion and frustration surrounding Python packaging, in my view, is partly due to all this jargon that has been built up over the years: everyone inventing or using their own terminology to refer to what are essentially the same things. So you have “package”, “egg”, “distribution”, “source distribution”, “wheel” (a word which I hate), etc. and who knows what else. These words don’t refer to different concepts, but the same. If we want standardisation of tools then we should also standardise on terminology across all published documentation, and encourage developers to do the same.

Finally, I’d like to say something positive, on PDM - from recent experience I’ve found this to be probably the best of the current crop of new-style packaging tools. The documentation is quite nice, although a bit sketchy on some details. There’s a useful conversion tool to convert a setup.py to a pyproject.toml - it needed some manual work to complete, but in the end it’s working nicely, and I’ve been able to get rid of setup.py entirely from one particular project I’m working on.

Uh, no – eggs, source distributions and wheels are three different distribution formats and definitely don’t mean the same thing. A “package” is a conceptual “bag” of reusable Python code, and a “distribution” is AFAIK a concrete file/folder containing the package, in one of the above mentioned formats.

With that being said, I agree that there is a lot of confusing terminology around. My personal pet peeve is the use of PEP numbers. For example, pip wheel --use-pep517. I for one can never remember which is which between PEP 517 and PEP 518, and I think it’s Greek to Joe User…

Agreed, and my personal apologies for that particular flag, which I implemented :slightly_smiling_face: In my defense, I would say that a lot of the concepts that we use PEP numbers for simply don’t have good “natural” names - and of course, inventing terms simply exacerbates the “confusing terminology” problem. I know that --use-pep517 was very much a case of “I can’t think of anything better”, for example…

We’ve moved towards “pyproject” as a term instead of “pep517”, reflecting the fact that it generally refers to projects that use pyproject.toml. IMO it’s still not ideal, but it’s better than the PEP number. We almost certainly won’t change the --use-pep517 flag in pip, though; there’s not much point, as we’re progressing towards a point where the legacy behaviour will be removed and --use-pep517 will be the default (and only!) behaviour. So you will be rid of it relatively soon :slightly_smiling_face:

Personally, I am at least thankful only one of these is actually valid; there are definitely much worse flag names out there…

Let’s link the glossary from the Python Packaging User Guide, it is in quite good shape these days:

https://packaging.python.org/en/latest/glossary/

Err, good shape? As far as I can tell, the glossary hasn’t been touched much in years, and many parts are half a decade or more out of date (a lot of it is stuck in the legacy setup.py build era, before pyproject and PEP 517/518). I made a PR to update it as part of a bunch of other changes (adding PEPs 517/518/660 to the packaging specs), but it was way too big to be reviewed and accepted, which was a mistake on my part, and I was pretty burnt out after spending most of a week working on all that, so I haven’t gotten back to it yet. For now I’ve done some partial updates in PEP 639 where relevant to that PEP, and I hope to find the time to split off and refactor just the glossary updates into a single atomic PR and submit that at some point soon.

@CAM-Gerlach Ah, you might be right. I did not read every word of the page. I mostly looked at the terms that were mentioned (eggs, source distributions and wheels) and some other things that I know trip up a lot of users (most importantly distribution package vs. import package). All in all, it seems quite okay from my point of view. I did not spot any big issue that would be terribly misleading to a new user, but for sure there is quite a bunch of things to fix. Do you have a link to your old PR? I guess it is this one:

Different distribution formats, yes, but distributions all the same - code artifacts you build to share with others. I was making a point about the distribution concept quite generally, not talking about formats.

If that were the case, why is there so much confusion and frustration over Python packaging?

Of course, I do not have an answer.

I do not know how well the Python Packaging User Guide is referenced (ranked) in search engines. But for sure there is an incredible amount of outdated, misguided, or plainly incorrect advice all over the internet (I probably produced some) that we can’t fix.

This glossary is for sure better than a lot of what I regularly see in questions and answers on StackOverflow (where I am mostly active). So I guess that is what I meant.

Some examples from the terms mentioned above of things that are obsolete or widely agreed to be bad practices nowadays:

Sdist: A distribution format (usually generated using python setup.py sdist)

Directly executing setup.py has been deprecated and strongly discouraged for many years now, with python -m build as the current recommended replacement; it is also one of the most common bits of obsolete advice I still see floating around the internet.
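As a rough sketch of what the replacement looks like when scripted, the snippet below assembles a `python -m build` invocation. It assumes the `build` project is installed in the current environment (for example via `pip install build`); the helper names `build_command` and `run_build` are hypothetical.

```python
import subprocess
import sys


def build_command(project_dir: str = ".", outdir: str = "dist") -> list[str]:
    """Assemble the argv for building an sdist and a wheel with `build`."""
    return [
        sys.executable, "-m", "build",
        "--sdist", "--wheel",  # request both distribution formats
        "--outdir", outdir,    # where the built artifacts are written
        project_dir,           # source tree containing pyproject.toml
    ]


def run_build(project_dir: str = ".") -> None:
    """Run the build; raises CalledProcessError if the build fails."""
    subprocess.run(build_command(project_dir), check=True)


# Show the command line that would replace `python setup.py sdist bdist_wheel`.
print(" ".join(build_command()))
```

Calling `run_build()` from a project root would then leave the sdist and wheel in `dist/`, without setup.py ever being executed directly by the user.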

Egg: A Built Distribution format introduced by setuptools, which is being replaced by Wheel.

Eggs have long since been replaced by wheel as of 5+ years ago, well before I even learned Python, and PyPI will (finally) be removing support in the next month or so after a long deprecation period.

Wheel: A Built Distribution format introduced by an official standard specification, which is intended to replace the Egg format. Wheel is currently supported by pip.

Similar to the previous, it isn’t merely intended, it has replaced eggs for modern usage as of many years ago, and is supported by every current tool that intends to work with PyPI packages directly (as opposed to redistributors, etc.).

Yup, that’s the one, though before pushing that I ended up trimming off to a separate branch a bunch of other glossary fixes and updates I’d made locally but weren’t sufficiently relevant to be included in that PR, so there’s actually a good bit more than that. I ended up incorporating some of those changes into the revisions to PEP 639’s terminology section, where relevant to that PEP, as well as a number of others (related to terminology around “pyproject builds”, “pyproject metadata”, “pyproject backend/frontend”, replacing terminology referencing PEP numbers, etc), though I haven’t quite finished with that either and hope to open a PR for that soon.

Right, then if you’re referring to the concept of a distribution package generally, you would use the term “distribution package”. If you’re referring to a specific distribution format, or a distribution artifact in that format, you’d refer to it by its appropriate name (currently, just sdist and wheel when it comes to PyPI). The terms “package” and “distribution” alone to refer to distribution packages are discouraged, as both are ambiguous and can mean multiple different things that can easily be confused—“import package” for the former, and “Python distribution” (Anaconda, WinPython, Mambaforge, etc) or “Linux distribution” (Debian, Fedora, Ubuntu), etc. for the latter. And “eggs” are thoroughly obsolete and can be disregarded.

Of course, not all packaging documentation is careful in following this guidance constantly, but that’s the goal we’re working towards, and we’d be glad to have your help in doing so if you spot instances of ambiguity!

6. Modules — Python 3.12.1 documentation does not use the term “import package”.

I wonder if it would not be less futile to switch to a different term there (e.g. “folder module”?) There’s no obvious reason this concept has to be confusingly called anything related to packages instead of modules. The Python language itself doesn’t seem to use the word package anywhere for this.

Then you could get rid of the confusing word “distribution” for package completely. Just use “package” like the rest of the programming world already does, as does most of the Python world in practice. It’s not PyDPA and PyDPI. :slight_smile:

Right, though that documentation doesn’t discuss distribution packaging at all; there, “package” is only ever defined and used to mean an import package. It’s certainly arguable that the unambiguous term should be used there as well. However, the reader will presumably go through that tutorial while first learning how to program in Python, well before they get to sharing their code with others through distribution packaging or are even taught what that is, so it seems to me less potentially confusing there than in various areas of, say, the packaging.python.org site, never mind third-party blogs and articles.

In hindsight, given how Python and the world have evolved in the meantime, it certainly might have been a better idea to use terminology like “[file/directory] module” rather than repurposing the word “package” to mean a directory of modules that can be imported (as opposed to the distribution-oriented container around them).

However, I’m not really sure how realistic it would be to change it at this point, given it is a long-established term fairly deeply embedded in all kinds of names and parts of the language, import system, stdlib and distribution packaging infrastructure, and the fact that the unambiguous term “distribution package” would continue to need to be used for at least the length of the (5-10+ year) transition period, in order to remain unambiguous, until the revised usage is widely used and accepted.

Considering we’re still weaning people off direct invocation of setup.py for building 5+ years from that being deprecated, and probably will be for years to come, I’m not optimistic, and such name changes also consume valuable and highly limited churn budget already strained in recent years, with modern packaging standards only now finally starting to stabilize.

I’m not sure what you mean, sorry. Just in the Python language/stdlib and its official documentation alone, “package” in the context of “import package” (with a few exceptions, core Python does not deal with distribution packaging nearly at all—make of that what you will) is used, for example:

So, I’m a little confused what you’re trying to say, sorry.
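As a couple of illustrations of my own choosing (not an exhaustive list), the standard library's import machinery distinguishes packages from plain modules in its own API. A minimal sketch, where the helper name `is_import_package` is hypothetical:

```python
import importlib.util
import pkgutil


def is_import_package(name: str) -> bool:
    """True if `name` resolves to a package, i.e. a module that can contain submodules."""
    spec = importlib.util.find_spec(name)
    # Packages have submodule_search_locations set; plain modules have None.
    return spec is not None and spec.submodule_search_locations is not None


print(is_import_package("json"))      # json is a directory of modules in CPython
print(is_import_package("textwrap"))  # textwrap is a single-file module
print(pkgutil.ModuleInfo._fields)     # the tuple includes an "ispkg" field
```

Even the name `pkgutil` and its `ispkg` field use "package" in the import sense.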

I assume it would be almost impossible, but still more realistic than convincing the entire world to use a weird term (distribution) for a very common concept (package). :slight_smile:

The docs can be changed, the library names deprecated and renamed. It would be a humongous amount of work. (But possible?) With luck, after a very long time (5-10+ years) the confusion might start to fade away.