Removing setup.cfg and setup.py from the packaging tutorial

domdfcoding · June 17, 2022, 1:28pm

Could the first tab not be a backend itself, but rather instructions telling the user to choose a tab to view the backend-specific instructions? The tab could also explain what a backend is and what the [build-system] table is for. That way there’s no explicit recommendation, but also no randomness when reloading the page.

pf_moore · June 17, 2022, 1:45pm

I think @dstufft is right here - we need to remember that the target audience (newcomers) simply don’t have the background to make a choice, and we’re doing them a disservice by forcing them to.

As long as we’re explicit that we’re just picking one of a number of equally acceptable backends, and that the instructions will work just as well with any of the others, then I think we’ve done the best we can. Let’s not let the perfect be the enemy of the good here.

dstufft · June 17, 2022, 4:47pm

I think rewording this to be neutral is a pretty easy win to remove some of the sense that this guide is prioritizing one backend over another. I’m hardly an expert in what is best for beginners, so I don’t have a good sense of whether we actually need to justify the choice at all, or whether we could say something like:

this tutorial uses Hatchling by default, but it will work identically with setuptools, Flit, PDM, and others that support the [project] table for metadata.

We still end up with the inherent bias in selecting any default, but we remove any sense of a value judgement in why something was picked. We could maybe even add something like “Feel free to select another backend”.
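To make the “works identically” point concrete, here is a minimal sketch that renders one pyproject.toml per backend from a single shared [project] table, so only the [build-system] table differs. The Hatchling, setuptools and Flit values follow the ones used elsewhere in this thread; the `setuptools>=61.0` pin and the PDM entry (`pdm-backend` / `pdm.backend`) are assumptions added for illustration, and the project name is hypothetical.

```python
# Shared metadata: identical for every backend that supports the [project] table.
PROJECT_TABLE = """\
[project]
name = "example-package"
version = "0.1"
description = "Example project"
dependencies = []
"""

# Only these two values change when switching backends.
# The PDM entry and the setuptools version pin are assumptions, not from the thread.
BUILD_SYSTEMS = {
    "hatchling": (["hatchling"], "hatchling.build"),
    "setuptools": (["setuptools>=61.0"], "setuptools.build_meta"),
    "flit": (["flit_core >=3.2,<4"], "flit_core.buildapi"),
    "pdm": (["pdm-backend"], "pdm.backend"),
}


def render_pyproject(backend_key: str) -> str:
    """Return pyproject.toml text for the chosen backend."""
    requires, backend = BUILD_SYSTEMS[backend_key]
    requires_toml = "[" + ", ".join(f'"{item}"' for item in requires) + "]"
    return (
        "[build-system]\n"
        f"requires = {requires_toml}\n"
        f'build-backend = "{backend}"\n'
        "\n" + PROJECT_TABLE
    )


rendered = {key: render_pyproject(key) for key in BUILD_SYSTEMS}
print(rendered["flit"])
```

Every rendered file ends with the same [project] table, which is what makes switching backends later a two-line change.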

abravalheri · June 17, 2022, 8:55pm

If the consensus is that choosing a default is necessary and that a random selection is not suitable, both changes proposed by Donald seem indeed appropriate.

On a side note: if Hatchling is going forward as the default selection for the backend in the tutorial, it would be good to update its documentation.

Right now, if users look for further information on Hatchling, they will have a hard time finding documentation. All the links seem to point to Hatch, and it is not self-evident whether everything in Hatch’s documentation applies to Hatchling (especially because all the configuration seems to point to the [tool.hatch] table). When I was following the docs, I was lucky enough to ask the maintainer directly about my doubts and to be pointed in the right direction, but beginners might not have the same luck.

On a second side note: @bhrutledge, are there any objective criteria for selecting the default backend for the tutorial? (I think this is useful information to have as a maintainer, because it is something that all the backends can aim to improve on.)

bhrutledge · June 17, 2022, 10:24pm

Beyond general sentiment from a number of folks in this thread and the PR, I was swayed by @henryiii’s timing and output tests.

I had a similar experience and concern, but I’m setting that aside for now in favor of the consensus and what feels like a net improvement. The tutorial could link to the Hatch docs that @ofek has pointed out.

ofek · June 17, 2022, 11:31pm

So others have context, I responded to that here: https://github.com/pypa/packaging.python.org/pull/1031#discussion_r896815686

Yes:

CAM-Gerlach · June 18, 2022, 3:03am

For the purposes of the tutorial, Flit is arguably even more ideal, given that it’s even faster on both metrics, gives the same output, and has beginner-friendliness, maximal simplicity, and minimal effort to follow PyPA standards and recommendations as its overriding design goals:

Make the easy things easy and the hard things possible is an old motto from the Perl community. Flit is entirely focused on the easy things part of that, and leaves the hard things up to other tools.

Which for the purposes of the tutorial is ideal, and users could fairly easily switch to a more powerful backend later if they end up needing it, thanks to PEP 621. But @takluyver didn’t seem comfortable with it being the default, and in any case it seems that ship has already sailed.

domdfcoding · June 18, 2022, 6:07am

Do you or Henry know which project was used for those benchmarks? I’d like to repeat the test with some other backends.

abravalheri · June 18, 2022, 6:13am

Thank you very much @bhrutledge, that is very good information.

That would be a very good thing to help in the development of backends.

craigf · June 18, 2022, 9:15am

Ok, thanks. So in the Creating pyproject.toml section I think it would be helpful to add a Tip/Note saying something like, ‘Some backend systems include the ability to do frontend tasks such as building and publishing. Refer to their documentation for details.’

Also, I just re-read the tutorial and see that hatchling isn’t referenced anywhere else in this tutorial, so there is actually no “default” backend; in fact all four options are simply given in parallel in the tabs.

Additionally, I agree with other comments about needing a guide to choosing a backend, and it would be helpful to reference that from here.

So, I suggest this simplified and neutral wording:
pyproject.toml tells frontend build tools (like pip and build) which backend tool to use to create distribution packages for your project. You can choose from a number of backends. This tutorial illustrates configuration for Hatchling, setuptools, Flit, and PDM. To learn more about the features of these and other backends see [this guide].
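As a rough illustration of what “telling the frontend which backend to use” amounts to, the build-backend value is an import reference of the form `module.path` or `module.path:object.path`. The sketch below resolves such a string to a Python object. It is not how pip or build are actually implemented (real frontends call the backend’s hooks in a subprocess inside an isolated environment); the function name `load_backend` is hypothetical, and stdlib modules stand in for a real backend so the snippet runs without installing anything.

```python
import importlib
import json


def load_backend(build_backend: str):
    """Resolve a 'module.path[:object.path]' reference to a Python object."""
    module_path, _, object_path = build_backend.partition(":")
    backend = importlib.import_module(module_path)
    # Walk the optional dotted object path after the colon.
    for attribute in filter(None, object_path.split(".")):
        backend = getattr(backend, attribute)
    return backend


# Stand-ins for a real value such as "hatchling.build" or "flit_core.buildapi".
module_backend = load_backend("json")
object_backend = load_backend("json:decoder.JSONDecoder")
print(module_backend.__name__, object_backend.__name__)
```

With a backend installed, the same call on its build-backend string would return the module (or object) exposing hooks such as `build_wheel` and `build_sdist`.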

craigf · June 18, 2022, 9:19am

I just noticed this in the tutorial: "See [PEP 517] and [PEP 518] for background and details."

I would venture to suggest that references to PEPs be avoided in tutorials. As I saw someone else write recently, they are technical documents, not user guides.

pf_moore · June 18, 2022, 10:15am

For what it’s worth, I got nerdsniped and I put together a simple benchmark harness:

from logging import INFO, basicConfig, getLogger
from pathlib import Path
from tempfile import TemporaryDirectory

from build import ProjectBuilder
from build.env import IsolatedEnvBuilder

basicConfig(level=INFO, format="%(asctime)s:%(name)s:%(levelname)s:%(message)s")
log = getLogger("main")


PROJECT = {
    "pr/__init__.py": "",
}

def pyproject_toml(name, version, requires, backend):
    return f"""\
[build-system]
requires = {requires!r}
build-backend = "{backend}"

[project]
name = "{name}"
version = "{version}"
description = "Example project"
classifiers = ["License :: OSI Approved :: MIT License"]
dependencies = []
"""

def build_project(target: Path, pyproject, project: dict[str, str]=PROJECT):
    for name, content in project.items():
        name = target / name
        name.parent.mkdir(parents=True, exist_ok=True)
        name.write_text(content, encoding="utf-8")
    (target / "pyproject.toml").write_text(pyproject, encoding="utf-8")

with TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    project = tmp / "project"
    # flit - 695-834
    # pyproject = pyproject_toml("pr", "0.1", ["flit_core >=3.2,<4"], "flit_core.buildapi")
    # hatchling - 430-642
    # pyproject = pyproject_toml("pr", "0.1", ["hatchling"], "hatchling.build")
    # setuptools - 27,302-28,783
    pyproject = pyproject_toml("pr", "0.1", ["setuptools"], "setuptools.build_meta")
    build_project(project, pyproject)
    distribution = "wheel"
    outdir = tmp

    log.info("Creating builder")
    builder = ProjectBuilder(project)
    log.info("Creating environment")
    with IsolatedEnvBuilder() as env:
        builder.python_executable = env.executable
        builder.scripts_dir = env.scripts_dir
        log.info("Installing build dependencies")
        env.install(builder.build_system_requires)
        log.info("Installing extra dependencies")
        env.install(builder.get_requires_for_build(distribution))
        log.info("Building")
        # Write the wheel into the temporary output directory rather than the current directory.
        builder.build(distribution, outdir, {})
        log.info("Completed")

The comments give the start and end log timestamps (in milliseconds) for just the “Building” step. Flit took 139ms, Hatchling took 212ms and setuptools took 1.481s.

I kept the project trivial, so this is just measuring the basic overhead of doing the build, not the IO cost of copying a bunch of files.

Take with a pinch of salt, your mileage may vary, etc etc…
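For anyone checking the arithmetic, here is a small sketch that derives the quoted durations from the ranges in the harness comments. It assumes those ranges are the start and end log timestamps in milliseconds, which is consistent with the figures quoted above.

```python
# Start/end log timestamps in milliseconds, copied from the harness comments.
BUILD_STEP_MS = {
    "flit": (695, 834),
    "hatchling": (430, 642),
    "setuptools": (27_302, 28_783),
}

# Duration of the build step for each backend.
elapsed = {name: end - start for name, (start, end) in BUILD_STEP_MS.items()}
fastest = min(elapsed.values())

for name, duration in sorted(elapsed.items(), key=lambda item: item[1]):
    print(f"{name:<10} {duration:>5} ms  ({duration / fastest:.1f}x the fastest)")
```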

pf_moore · June 18, 2022, 11:46am

I should probably also point out that the time to build a wheel is swamped by the time to set up the environment and install the backend, so in all honesty, the time it takes the backend to do its job is irrelevant…

ofek · June 18, 2022, 2:15pm

True except for packaging an app or something that contains thousands of files or more

pf_moore · June 18, 2022, 3:48pm

The cost of putting those files into a zip and calculating hashes (for RECORD) should be relatively constant across backends. But of course feel free to modify the benchmark to test that (it should be relatively easy to do, just change the PROJECT variable).
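A minimal sketch of that modification: generate a PROJECT mapping with thousands of small modules and pass it to `build_project` from the harness above. The helper name `make_project`, the module naming scheme and the file count are all made up for illustration.

```python
def make_project(n_files: int, package: str = "pr") -> dict[str, str]:
    """Return a {relative path: file content} mapping with n_files extra modules."""
    project = {f"{package}/__init__.py": ""}
    for index in range(n_files):
        # Tiny, distinct module bodies so every file has its own hash.
        project[f"{package}/mod_{index:05d}.py"] = f"VALUE = {index}\n"
    return project


# Drop-in replacement for the PROJECT variable in the harness.
PROJECT = make_project(5000)
print(len(PROJECT), "files")
```

Running the harness once with the trivial project and once with this one would separate the fixed per-build overhead from the per-file cost.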

ofek · June 18, 2022, 4:26pm

I was just speaking to ^, that sometimes packaging files will be much slower than setting up the environment.

CAM-Gerlach · June 19, 2022, 12:29am

Would, then, the time needed to install each backend and its respective deps be valuable, as it would vary between backends? Given that @henryiii’s testing did report some pretty non-trivial time differences, and that initial environment creation (before installation of any backend-specific dependencies) depends on the frontend, not the backend, is the backend install time where those differences are coming from?

JDLH · June 19, 2022, 1:57am

Hi folks! I just arrived at this very interesting thread. Someone suggested that I give my thoughts, as a developer of Python apps who until now has been pretty naive about all aspects of Python packaging.

1: I support PR #1031 by @henryiii, basically in the spirit of taking incremental steps and iterating for improvement. Let’s resist the temptation to let the infinite possible amendments block approval of all improvement.

2: The biggest surprise I had about PEP 517’s architecture is that its scope is only the creation of wheels and source distributions from Python source code, and that “build” refers only to this set of outputs and inputs.

I am used to a very broad meaning for “build”, where there are lots of different output artifacts and lots of creation steps. Think of what “build” means to a C-language project with autotools: the creation of configuration files, makefiles, object files, libraries, applications, user documentation, developer documentation, etc.

The official Overview of Packaging for Python lists more than 15 classes of deliverable code artifacts, and doesn’t even get into documentation production. But “build”, to PEP 517 and pyproject.toml, apparently means only two of those artifact classes.

I suggest that a future edit to the explanation or specification of the PEP 517 and pyproject.toml architecture be specific that their scope is limited to these artifact classes, and is not more broad.

3: This tutorial refers readers to “PEP 517 and PEP 518 for background and details”. I agree with @craigf above that we should avoid this. I think the alternative is to improve the PyPA specifications so that they completely specify the architecture formerly known as "PEP 517 and pyproject.toml", without sending the reader off to read PEPs, and then refer to those specifications instead. I imagine an Explanation (in the Diátaxis framework sense) of the architecture would also be useful, and the tutorial could refer to it also. But if we have to refer Tutorial and Guide readers to PEPs, our documentation is not complete. (To be clear: citing PEPs as authorities from Specifications and Explanations is fine.)

4: +100 to the people saying that the voice of packaging.python.org should be neutral with respect to tools. I thought the Managing Application Dependencies page pushed pipenv and requests a bit, though it did list alternatives at the end. Related: documentation pages for tools should look clearly different from the Guide. (I recall a place where a Python Packaging Guide page linked me to a tool’s page, and the formatting of the tool’s page was so similar to the Guide that I did not realise I had changed realms. I felt a bit manipulated. Sadly, I cannot find that link now.)

5: The various pages in the Guide seem to date from different eras of packaging tools: some are from the setup.py era, some are from the setuptools/setup.cfg era, while some are from the pyproject.toml era. It would help to note on the older-era pages that they do refer to older ways of packaging, and point to the more recent alternatives if any.

If anyone can point me to good writing projects for a packaging novice, please do. I like writing the docs that would have helped me as I was learning. There is a lot of good information already in the Python Packaging Guide, but there is also lots of room for improvement.

henryiii · June 19, 2022, 2:01am

I used pipx run cookiecutter gh:scikit-hep/cookie. I made temporary directories and made each one in that temporary directory so I could keep the names the same between them. It’s a package with __init__.py and py.typed in a src directory. For the “error” run, I made the same “mistake” in each package to test the error output.

FYI, you would normally want to add the SDist creation time too, since build makes an SDist and then builds the wheel from it by default.

henryiii · June 19, 2022, 2:11am

Flit’s PEP 517 backend is intentionally limited. It does not do Flit’s SCM-based file inclusion, doesn’t exclude files (like __pycache__) by default, and has no way to specify recursive excludes, so you can’t even generally exclude things like that if you want to. Flit’s author was unhappy about PEP 517 being able to build SDists, and really doesn’t seem to want to support it fully.

If you need something truly minimal (Flit is bootstrappable and zero-dependency), and as long as you make sure never to use flit to make SDists (since using it could cause you to under-list source contents and make valid SDists but not support PEP 517 SDist production), it’s usable, but probably not ideal for normal users. Hatchling is intended to be usable by simple and complex packages, and has first-class PEP 517 support.
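Whichever backend is used, one way to see whether stray files such as __pycache__ ended up in an SDist is to list the archive’s members. The sketch below does that with the standard library only. It builds a tiny in-memory archive as a stand-in for a real SDist, and the helper names and the set of “unwanted” patterns are assumptions chosen for illustration.

```python
import io
import tarfile

UNWANTED_PARTS = {"__pycache__"}
UNWANTED_SUFFIXES = (".pyc", ".pyo")


def unwanted_members(sdist: tarfile.TarFile) -> list[str]:
    """Return archive member names that look like build or cache debris."""
    found = []
    for name in sdist.getnames():
        if UNWANTED_PARTS.intersection(name.split("/")) or name.endswith(UNWANTED_SUFFIXES):
            found.append(name)
    return found


def make_demo_sdist() -> bytes:
    """Build a small .tar.gz in memory that mimics an SDist with a stray cache file."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name in (
            "pr-0.1/pyproject.toml",
            "pr-0.1/pr/__init__.py",
            "pr-0.1/pr/__pycache__/__init__.cpython-310.pyc",
        ):
            archive.addfile(tarfile.TarInfo(name), io.BytesIO(b""))
    return buffer.getvalue()


with tarfile.open(fileobj=io.BytesIO(make_demo_sdist()), mode="r:gz") as sdist:
    found = unwanted_members(sdist)
print(found)
```

For a real project, opening the built archive with `tarfile.open(path)` (for example a file under `dist/`, path hypothetical) and passing it to `unwanted_members` gives the same check.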