Removing setup.cfg and setup.py from the packaging tutorial

sbaack · June 13, 2022, 2:42pm

To add: If I understand correctly, the version of the backend tool you have installed locally via pip (or maybe pipx) is ignored when you call python -m build or python -m pip install -e ., because it will always download the latest version anyway? Then maybe mention that too. This way, users understand that they can completely ignore the fact that their venv includes an outdated version of setuptools by default (at the moment at least) because even if they specify setuptools as their backend, python -m build will download the latest version of it?

fungi · June 13, 2022, 3:02pm

Well, it doesn’t necessarily install the latest versions of the
backends, since your build-system.requires can include version
specifiers as part of its syntax. It will rely on pip’s solver
though, which yes will default to pulling the latest version if not
explicitly instructed to do otherwise by configuration.

CAM-Gerlach · June 13, 2022, 3:28pm

Perhaps @henryiii can add a line or two to the pyproject.toml section in his PR stating something like this?

Funny enough, I was just debugging some issues installing Spyder in editable mode that appear to be related to this. If importlib.metadata is available (Py 3.8+) or importlib_metadata is installed, the legacy console_script wrapper currently installed by setup.py develop uses them; only if neither is found does it fall back to pkg_resources (and thus require a runtime dep on Setuptools).

However, it still does some legacy stuff that causes issues, specifically setting __requires__ which seems to be the cause of an error later when we import pkg_resources if our deps are constrained too tightly for the development environment (which should be fixed very soon once I finally get rid of Spyder’s runtime dependencies on pkg_resources and replace it with packaging for version parsing and importlib_metadata for entry points).

Specifically, the fallback order is:

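try:
    # Python 3.8+: standard library
    from importlib.metadata import distribution
except ImportError:
    try:
        # Older Pythons: backport installed from PyPI
        from importlib_metadata import distribution
    except ImportError:
        # Last resort: needs Setuptools at runtime
        from pkg_resources import load_entry_point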
+100

Clarifying in the packaging guide that the main pain points people are worried about are no longer the case would likely go a long way to help that, at least in the case of folks like @sbaack

Yes, unless it affects frontend caching, but that’s an implementation detail.

That’s really an implementation detail of the frontend, so I’m not sure if its worth mentioning to beginners, aside from making very clear that the build deps are installed automatically in an isolated env and the user’s current working env has nothing directly to do with it. All that’s guaranteed is that the backend dependency versions will satisfy the constraints specified in build-system.requires.

bhrutledge · June 14, 2022, 12:26pm

I agree something like this would be useful, along with clarifying the distinction between frontend and backend tools. However, I don’t think it’s on @henryiii to add those; he’s put a lot of work into this already. As a maintainer of PyPUG, I think the main blocker for merge is the aforementioned issue related to package naming, which has a proposed resolution, but is waiting for someone to implement it. So far, neither @henryiii nor I have taken the time to do that. If/when I get to it, I might implement some of the content suggestions here, but they could also be implemented in subsequent PRs.

brettcannon · June 14, 2022, 10:39pm

At this point is there something fundamental preventing the PRs from being merged? My read of the latest replies is that they revolve around adding more clarity in some sections and not a fundamental disagreement in the approach. As such, would it make sense to get the PRs merged and then have people submit PRs to suggest clarifications as appropriate?

bhrutledge · June 14, 2022, 11:04pm

Yep, as I noted in my previous comment (though maybe not clearly enough), and in the PR description. TL;DR: The metadata name needs to match the directory name in order for flit and hatchling to work as expected. @henryiii has indicated he can make that change when he gets back from his travels. I might be able to do it this weekend.

brettcannon · June 14, 2022, 11:10pm

Sorry about that! Swamped at work on top of a bit of COVID brain fog probably is not helping with my reading comprehension.

sbaack · June 15, 2022, 8:30am

I would say it’s worth mentioning somewhere that the local setuptools installation in new venvs created with python -m venv is only there for legacy reasons, but completely ignored by python -m build (even if you specify setuptools as your build backend). Because this was very confusing to me. Maybe with a separate info box or something

pf_moore · June 15, 2022, 9:19am

I’m actually inclined to consider that as a limitation of those two backends, and I wouldn’t want to suggest in the tutorial that the standards require you to follow that rule. There has been some pretty heated debate in the past over people wanting tools to preserve the “official” name of a project rather than forcing a normalised name (“Django” vs “django” being a common example) and IMO the tutorial should stay out of that debate.

By all means only use examples where the display name and the normalised name are the same, just don’t suggest that’s required. (And if people report issues along the lines of this, frame the responses as “yes, that’s a limitation of your particular backend” rather than as “you shouldn’t do that”).

Also, I should say that while I do support the idea of distinguishing display (project) name and normalised (install) name, I’m not insisting on it - if someone wants to propose that we require normalised forms everywhere, that’s fine (I’ll argue against it, but I’ll accept whatever consensus arises).
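
For readers unfamiliar with the distinction, the normalised form is the one defined in PEP 503: runs of "-", "_" and "." collapse to a single "-" and the result is lowercased. A minimal sketch (the example names are arbitrary):

```python
import re


def normalize(name: str) -> str:
    """Normalise a project name following the rule given in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


display_names = ["Django", "django", "Flask_SQLAlchemy", "zope.interface"]
normalized = [normalize(name) for name in display_names]
print(normalized)
```

Both "Django" and "django" map to the same normalised name, which is why the display name can be preserved without affecting what gets installed.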

layday · June 15, 2022, 12:26pm

flit and hatchling allow you to specify the package/module name independently of the project name as do all the backends on that list. These two backends do also allow you to omit the package/module name if it’s identical to the raw project name whether it be normalised or unnormalised, as long as it’s a valid Python identifier. In my very subjective opinion, this is a bad idea and the tutorial should favour explicit configuration.
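
As a rough illustration of the "valid Python identifier" condition only, here is a hypothetical helper. It is an assumption for discussion purposes and does not reproduce flit's or hatchling's actual discovery logic:

```python
import keyword
from typing import Optional


def implied_import_name(project_name: str) -> Optional[str]:
    """Hypothetical check: can the raw project name double as the import name?"""
    if project_name.isidentifier() and not keyword.iskeyword(project_name):
        return project_name
    # Names such as "example-package" cannot be imported as written,
    # so the import name would have to be configured explicitly.
    return None
```

With this check, "example_package" could be left implicit, while "example-package" could not.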

bhrutledge · June 15, 2022, 1:01pm

I think the issue is that explicit configuration of the package name is backend-specific, e.g. via a [tool.hatch] section. That conflicts with the desire to keep the tutorial backend-agnostic.

pf_moore · June 15, 2022, 1:49pm

Conversely, letting the reader assume that the backend will always default to using the project name as the import name also conflicts with the intention of keeping things backend agnostic.

Maybe say something like:

You need to tell the backend where your project files are. That has to be done in a backend-specific manner. However, some backends (including flit and hatch, which we are using here) will default to assuming your project consists of a single import package, named the same as the project. We will assume you are using such a backend here - if not, or if you need to use a different name for the project and the import package, please refer to your backend’s individual documentation for how to do this.

pf_moore · June 15, 2022, 5:42pm

One thing I would say, though - I don’t think we should push the “all backends are interchangeable” message too hard. There are good reasons why we have different backends - they offer different trade-offs, different philosophies, and different “extras”. And I think we should be explaining to new users that deciding which backend they prefer is a choice they should be making. The point of the standards is to make that choice (relatively) painless, not to remove it.

And “how do I specify what files should be in my distribution” is one of the distinguishing features that people might care about.

mwichmann · June 15, 2022, 10:52pm

I know this has been quite a long thread, so sorry for a dive in from outside…

For a tutorial, write it in a way that reflects what are considered best practices. It’s fine for that to be somewhat opinionated: that is, a tutorial should show a good way to do something, it doesn’t have to prove it’s the absolute best way given a multitude of different choices, but it should be a good way that works for everyone. Don’t mention that there are 16x choices if you use these different implementations/options/configs, etc. - that’s not the job of a tutorial. That belongs in a different kind of doc.

I don’t think we should push the “all backends are interchangeable”

IMO, a tutorial should show a way to do something that’s agnostic - that is, for the steps shown, all backends would indeed be interchangeable. In a tutorial, there’s no need to go deeper into variations from different backends.

ofek · June 16, 2022, 1:29am

Hatch supports that. The issue before Henry fixed it today was that essentially the project name was foo while the package directory was bar.

sbaack · June 16, 2022, 8:35am

I agree with @mwichmann here. Choice is great for experienced users, but for beginners, choice can be a burden. Or in other words, I would say that deciding which backend is preferred is a choice users should be making after they’re familiar with creating simple, pure Python packages like the example project created in the tutorial. I think the differences between backends are negligible in this case? It’s totally fine to encourage users to explore the differences of the backends at the very end of the tutorial though.

BTW, is there a good, up to date comparison between the backends in cases where I want to make a pyproject.toml-only project like the one in the tutorial AND I want to use build and twine instead of backend-specific frontends? What exactly are the differences in this scenario and when do they matter? I know that setuptools has lots of legacy stuff and doesn’t support editable installs without setup.py at the moment, and that PDM has PEP 582 support. But beyond that?

abravalheri · June 16, 2022, 7:57pm

Speaking about my personal feelings as a person that recently became a contributor and maintainer of setuptools (disclaimer: I don’t speak for the project, the opinions here do not reflect the ideas of the setuptools team as a whole or the other maintainers/contributors).

I support adding a tutorial focused on pyproject.toml-only (however maybe it would be nice to leave the old docs accessible somewhere too?)

What I would not be comfortable with, is the tutorial clearly stating that a specific backend solution is “recommended” or “recommended for beginners” (as currently implemented in the PR), over the others, for the following reasons:

A page under the “python.org” domain brings with it some feeling of “official”. If this webpage recommends one solution over the other it will inevitably change the weights of the choices.
Saying that one solution is “recommended for beginners” implicitly says that the others are “not recommended for beginners”. We have been putting a lot of effort lately into setuptools with the specific purpose of making it better for beginners (the documentation improves all the time, the configuration using pyproject.toml was super simplified to detect most of the things automatically…). I have been instructing colleagues to start packaging projects with setuptools and they seem to be very happy. Seeing something that implies that setuptools is not recommended for beginners makes me sad.
It is not clear what parameters are used to define a solution as “recommended” or “not recommended”. What are the requirements for a solution to achieve this status?
Recommendations change with time… It is not clear if the content of the tutorial can change the recommendation in the future and what would be the process to do that.

Please note that I fully support changing the tutorial to favour interoperability. I understand that setuptools is not the only backend available nowadays and that we should make the tutorial reflect that. But I strongly disagree with emphasizing one solution over the others (which includes having a tab with a specific backend configuration open by default).

Right now I understand that setuptools might be more complicated for beginners if we focus on pyproject.toml metadata only, since it does require an extra file for the “editable” parts[1], but that is bound to change.

I also understand, as pointed out previously in this discussion, that there is a lot of content out there about setuptools and old ways of doing stuff. However I don’t think we should penalize one tool over the others because of webpages that the tool developers have no control over (in a few years any tool could be the subject of a lot of “legacy” content spread around) … We are committed to updating the setuptools docs, but we don’t have control over the internet…

In my opinion the tutorial should simply list the backends (without using tabs to hide them, since it may induce choice) and abstain from influencing users (I am thinking about the weight that a webpage under the “python.org” domain has in the user’s opinions…)

[1] Although this does not seem to be relevant for this tutorial.

dstufft · June 16, 2022, 11:06pm

I think that as a general principle we should attempt not to prioritize one reasonable build backend over another.

I also think that as a general principle, we should aim to produce documentation that is as easy as possible for the intended audience to understand and achieve the goal behind that documentation.

When we’re talking about documentation whose intended audience is beginners and some choice has to be made, these two principles are in conflict with each other, and we have to compromise between them in some fashion. This is because anytime you ask a beginner to make a choice, that requires teaching them the differences between those choices, and the consequences of those choices. That’s a pretty big ask of a beginner, and in a lot of cases they’ll just end up picking one at random out of the list[1].

Essentially, a beginner is in a horrible position to make the choice that we’d be asking them to make. They don’t even understand how to produce a Python package at this point, much less understand the subtle differences that would lead them to choose one backend over another. The people authoring this guide are in a much better position to simplify things, to help the beginner get something packaged, before introducing them to the added complexity of having to pick a backend.

I think that the solution in the PR represents a pretty reasonable compromise. It shows that there is a choice to be made, it picks one of the fairly reasonable choices, and it makes it easy for people using this guide to switch to using something else. I don’t think there is a reasonable solution that avoids making this choice for the user in some way[2] without also adding too much cognitive overhead for the beginner.

I also don’t think it’s entirely fair to describe pointing out how much misleading or outdated information exists about a specific backend as “penalizing” that backend. Given that I believe we have to make a choice, and that there are in fact choices that are better in some minor way, we should use any relevant information to aid in that choice. While it certainly isn’t your fault, or the fault of anyone on the setuptools team, that doesn’t change the reality that those existing docs/etc exist as possible points of confusion for someone new[3].

I do think it is understandable for backend authors/maintainers to be put off by this trade-off, particularly if theirs is not the one being selected as the default choice. Unfortunately, I don’t see a way to avoid this trade-off that doesn’t hinder the people this guide is targeting even more.

[1] This ends up going down into a never ending cycle. If you make a list, then you can say that the ordering of objects within that list prioritizes one project over another, as it does for ballots, which is one of the reasons why many modern voting systems use randomized ballot order.
[2] For instance, I could imagine a change that randomizes which backend is selected and/or the ordering of the tabs, and stores that as a cookie. That has its own problems though, such as confusing users who open the guide on different computers, and warrants its own discussion as an iterative improvement in a later PR.
[3] And to be clear, it’s not like these things go completely against setuptools. One positive thing for setuptools is that the lion’s share of packages out there use it today and (as far as I can tell) it is by far the most flexible backend, which means users are less likely to need to throw away their existing backend and switch to setuptools if they want to do anything particularly interesting like build against a Rust project.

abravalheri · June 17, 2022, 7:18am

Thank you very much Donald for sharing these ideas. I understand your points, but I think that there are different (better) ways of reconciling these two distinct objectives.

(Sorry for the massive essay that follows. As before these are personal opinions and I don’t represent the setuptools team)

I don’t think that picking one randomly out of the list is a bad outcome. In fact, a random choice would promote the healthiest growth of the ecosystem. Having different people try out different tools would encourage friends/colleagues/co-workers to discuss these different tools and exchange information/impressions/anecdotes, and as a result, over time, users interested in Python packaging would develop a better and more informed understanding of the ecosystem.

This might be the best way of promoting the diversity and openness that the PyPA standards have been striving for.

Considering these two objectives:

The second objective does not require explicitly stating that one tool is “recommended” (and implicitly implying that the others are not).

Given the tutorial (and the interoperability promoted by PEP 621), any choice of tool will produce a text that is easy to follow when packaging a first project.

I understand that making a choice and explicitly telling the users to follow that choice, is something that will make the tutorial easy to follow. But then, as previously pointed out, justifying this choice as “random” (or not justifying at all) is something that can be done.

I would not like to separate these two discussions, because they are intrinsically connected. Right now the problem pointed out is that the tutorial needs to make a choice to make it easier to follow. Having a “recommended” tool, is something that contradicts another goal (i.e. not prioritize one tool over the others). A choice justified as “random” solves the problem while not contradicting a second goal.

If we agree that a random choice is a good way forward, we can start by rewording the existing PR to: (a) justify the choice as “randomly selected” (or something similar) instead of “recommended” and (b) expand the tabs into a list, to avoid hiding. Then a follow up PR could make the choice dynamic. I understand that the second PR might be a lot of trouble, but if we choose this path, I volunteer to investigate how to add a dynamic random choice and provide the follow up PR.

Regarding the problem of the same user opening the guide on two different computers: the only thing that would change is 2 lines in the example configuration, and the exact content of these two lines is irrelevant to following the tutorial. If the user starts following the tutorial on one computer, stops, and then keeps going on a different computer, they should be able to follow it without re-doing any configuration. Moreover, by stating that the choice is “random” and that everything works, they would not be compelled to change any files that they might already have produced on their computers.

This is a very fair comment, but we should use the same lens for analysing any other possible choice. For example, there is another aspect that may imply that some choices are better in some minor ways than others: promoting any backend that is closely intertwined with a full solution that brings project management features will inevitably lead users to explore that tool’s documentation, and then they are in a position where they have to understand a lot of new concepts at once (this can be made worse if the backend is not documented on its own, but instead depends on the documentation of a different tool).

In the PR review I pointed out some doubts that may appear with the current choice, and its maintainer kindly pointed me toward a better understanding. However, this does not change the fact that beginners might have the same doubts, and that recommending a tool that does not “have a life of its own” will lead users towards a complex and more difficult to grasp project-management solution.

(Please note here that I don’t think we should promote setuptools in the tutorial either… I think different solutions have different advantages and disadvantages for beginners and different people will put different weights when deciding. That is why random choice sounds like a good compromise.)

bhrutledge · June 17, 2022, 1:12pm

As a maintainer of PyPUG, and someone who got started in Python packaging via this tutorial, I strongly disagree that a random choice is a good way forward. As folks have noted, a naive implementation will be surprising to readers who come back to the tutorial. It will also make it harder for maintainers and others to offer support. Addressing those concerns will add burden to PyPUG contributors and maintainers, and I think even a naive implementation is non-trivial complexity.

In short (and at the risk of oversimplifying): simple is better than complex.

The current PR preview says:

this tutorial recommends Hatchling for its simplicity and speed, but it will work identically with setuptools, Flit, PDM, and others that support the [project] table for metadata.

That feels like a relatively benign recommendation to me, but I’m happy to make it more neutral, e.g. “uses” or “defaults to” instead of “recommends”. However, I think some indication of why a default was selected is important.