User experience with porting off setup.py

jeanas · November 4, 2023, 12:24pm

I don’t know if I’m qualified for this, but if you think so, count me as interested.

For me, this thread was a realization moment about the outdated and confusing state of much of packaging.python.org. I already have a number of PRs being prepared.

pf_moore · November 4, 2023, 12:29pm

As I meant to imply by the footnote, I don’t plan on organising this myself. I’d be a terrible leader (or even member) of such a group.

oscarbenjamin · November 4, 2023, 12:54pm

Any migration around packaging/building a library is difficult because it is hard to know how exactly downstream users/packagers are using the existing build scripts. A setup.py can do many different things and has a large array of configuration options that can be passed on the command line. It is difficult to know what possible python setup.py ... invocations are being used so it is difficult to anticipate whether or not any new configuration still satisfies all downstream use cases. It is also basically impossible to test this without just putting the sdist on PyPI and waiting for feedback at which point it is too late to go back and change the metadata.

There are two steps in a migration from setup.py to pyproject.toml. The first step is just adding pyproject.toml, specifying the setuptools backend and then moving some static metadata from either setup.py or setup.cfg to pyproject.toml but keeping all of the build code in place in setup.py. In principle this is straightforward but there are many potential complications that could arise and it is basically impossible for a library author to anticipate them all (see @henryiii’s comment here).

Actually removing the setup.py might be easy but the premise that everything from setup.py can be moved to pyproject.toml amounts to saying that the build configuration for the package can be specified completely statically which is not the case for many projects. A complicated setup.py typically tries to handle things like different compiler options and building in different ways and so on and so there are some things that are inherently not “static” or at least that cannot be easily expressed statically with the options available.

I don’t think that this is a particularly complicated case but python-flint’s setup.py currently has this:

import os
import sys

if sys.version_info < (3, 12):
    from distutils.core import setup
    from distutils.extension import Extension
    from numpy.distutils.system_info import default_include_dirs, default_lib_dirs
    from distutils.sysconfig import get_config_vars
else:
    from setuptools import setup
    from setuptools.extension import Extension
    from sysconfig import get_config_vars
    default_include_dirs = []
    default_lib_dirs = []


libraries = ["flint"]


if sys.platform == 'win32':
    #
    # This is used in CI to build wheels with mingw64
    #
    if os.getenv('PYTHON_FLINT_MINGW64'):
        includedir = os.path.join(os.path.dirname(__file__), '.local', 'include')
        librarydir1 = os.path.join(os.path.dirname(__file__), '.local', 'bin')
        librarydir2 = os.path.join(os.path.dirname(__file__), '.local', 'lib')
        librarydirs = [librarydir1, librarydir2]
        default_include_dirs += [includedir]
        default_lib_dirs += librarydirs
        # Add gcc to the PATH in GitHub Actions when this setup.py is called by
        # cibuildwheel.
        os.environ['PATH'] += r';C:\msys64\mingw64\bin'
        libraries += ["mpfr", "gmp"]
    elif os.getenv('PYTHON_FLINT_MINGW64_TMP'):
        # This would be used to build under Windows against these libraries if
        # they have been installed somewhere other than .local
        libraries += ["mpfr", "gmp"]
    else:
        # For the MSVC toolchain link with mpir instead of gmp
        libraries += ["mpir", "mpfr", "pthreads"]
else:
    libraries = ["flint"]
    (opt,) = get_config_vars('OPT')
    os.environ['OPT'] = " ".join(flag for flag in opt.split() if flag != '-Wstrict-prototypes')


define_macros = []
compiler_directives = {
    'language_level': 3,
    'binding': False,
}


# Enable coverage tracing
if os.getenv('PYTHON_FLINT_COVERAGE'):
    define_macros.append(('CYTHON_TRACE', 1))
    compiler_directives['linetrace'] = True

Some of this is used for building wheels in CI and some of it is used for development work and some of it is used by downstream packagers like conda. The conda packages are built with MSVC on Windows and link to the MPIR library instead of GMP. The PyPI wheels are built with MinGW64 on Windows instead. There is even a CI script that dynamically generates a setup.cfg file to persuade cibuildwheel to use the MinGW compiler:

flintlib/python-flint/blob/33e5485bda16339eb334c816639e22e57c4658e2/bin/cibw_before_all_windows.sh#L8-L13

#
# Make a setup.cfg to specify compiling with mingw64 (even though it says
# mingw32...)
#
echo '[build]' > setup.cfg
echo 'compiler = mingw32' >> setup.cfg

I am not at all happy with the way that all of this is configured with setuptools and cibuildwheel but somehow all of these different cases need to be handled. I just don’t see how to express all of this logic in pyproject.toml and I am sure that any change here would break something downstream.
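This does not remove the need for setup.py, but as a minimal sketch, the platform branching from the excerpt above could be isolated into pure functions so that each downstream case can at least be checked before a release. The function names select_libraries and strip_flag are hypothetical and are not part of python-flint; the include and library directory handling is left out.

```python
def select_libraries(platform, environ):
    """Return the libraries to link for a given sys.platform and environment.

    Mirrors only the library-selection branches of the setup.py excerpt.
    """
    libraries = ["flint"]
    if platform == "win32":
        if environ.get("PYTHON_FLINT_MINGW64") or environ.get("PYTHON_FLINT_MINGW64_TMP"):
            # MinGW64 builds (for example the PyPI wheels) link against GMP.
            libraries += ["mpfr", "gmp"]
        else:
            # MSVC builds (for example conda) link against MPIR instead of GMP.
            libraries += ["mpir", "mpfr", "pthreads"]
    return libraries


def strip_flag(opt, unwanted="-Wstrict-prototypes"):
    """Remove one compiler flag from a space-separated OPT string."""
    return " ".join(flag for flag in opt.split() if flag != unwanted)


print(select_libraries("win32", {"PYTHON_FLINT_MINGW64": "1"}))
print(select_libraries("win32", {}))
print(strip_flag("-O2 -Wstrict-prototypes -g"))
```

setup.py would then call select_libraries(sys.platform, os.environ), which keeps the behaviour dynamic but makes each supported combination explicit.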

pf_moore · November 4, 2023, 1:28pm

(Disclaimer: I don’t use setuptools for anything even remotely this complex, this is just “stuff I’ve picked up from others”)

As has been noted elsewhere, and is clearly a significant point of confusion for a lot of people, there’s no need to move any of this out of setup.py. I think this is one of the biggest pain points for people wanting to “modernise setuptools” - they see too much misinformation and feel forced to do more than they need to.

The rest of your points are important and give a good perspective on the sort of “real world” issues we need to provide support for. And I’m probably over-simplifying this point as well, and there’s still a bunch of stuff in there that “you don’t need to move everything to pyproject.toml” misses.

Reflecting on this thread, I think that a major issue here is that the packaging community, when PEP 517 opened up the freedom to develop new backends, enthusiastically embraced that ability and collectively abandoned setuptools to handle their transition to the newer standards more or less on their own. They’ve done a really good job (modern setuptools is a very different beast than pre-PEP 517 setuptools) but unsurprisingly, they struggled with resourcing - and documentation, publicity and messaging are some of the first things to suffer in a situation like that. They are also the place where community support can be the most effective, but the community was off elsewhere promoting flit, hatchling, etc., and developing scikit-build, meson-python and so on.

If nothing else, it’s good to see this thread focus community interest back onto improving information and user support around setuptools as a modern build backend.

willingc · November 4, 2023, 3:07pm

I’m sorry that I wasn’t clearer in my earlier post. I used Data Science and Science as examples of growth areas over the past decade. As a member of those communities, I definitely didn’t wish to imply that these communities are the cause of problems. The point that I was awkwardly trying to make was that the tools evolved to meet the needs of users and improve the user experience.

@pitrou, you hit on a good point that “does not seem to address novel use cases” is a challenge. Having a process to improve that would be a reasonable goal.

henryiii · November 4, 2023, 4:06pm

A few quick points (since I’m on mobile):

You can’t build extension modules with setuptools without a setup.py. It’s the only way to add them. Same for customizing commands. Or doing logic at build time other than a small predefined set. And that’s fine, setup.py is and always will be the dynamic build file for setuptools.

If you can move everything to pyproject.toml, then you can delete the setup.py. Otherwise it stays around. Both are fine. In fact, we don’t really care if you move static config or not (though it makes it easier to analyze, say by GitHub’s dependency graph, etc). From the packaging standpoint, you’ve done what we asked you to do when you add a three line pyproject.toml.
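For reference, here is a minimal sketch of the three-line file being described, written out from Python so that it can be checked. The helper name write_minimal_pyproject is hypothetical, and the unpinned "setuptools" requirement is an assumption; a real project might specify a minimum version.

```python
import tempfile
from pathlib import Path

# The three lines: a table header, the build requirements, and the backend.
MINIMAL_PYPROJECT = (
    "[build-system]\n"
    'requires = ["setuptools"]\n'
    'build-backend = "setuptools.build_meta"\n'
)


def write_minimal_pyproject(project_dir):
    """Write the three-line pyproject.toml into project_dir and return its path."""
    path = Path(project_dir) / "pyproject.toml"
    path.write_text(MINIMAL_PYPROJECT, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    written = write_minimal_pyproject(tmp)
    lines = written.read_text(encoding="utf-8").splitlines()

print(len(lines))
```

The existing setup.py sits next to this file unchanged and continues to carry all of the dynamic build logic.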

There is no built-in packaging solution. Distutils was added in Python 2.0 many years ago, and has been removed in 3.12. There’s nothing special about setuptools anymore; it’s no longer added by default. It was added by default in the past, largely because it modified the built-in distutils and made it usable. It turns out it’s really hard to ship a packaging solution that you can’t update regularly. Setuptools now actually forces the use of its own third-party copy of distutils.

This is great, because it means that packages can now choose the build backend that best suits them, and can no longer say that they are trying to avoid adding a dependency, as every backend is now a dependency. Like Poetry? Use it! (I don’t since they don’t support PEP 621 yet, PDM is better IMO) For compiled packages, scikit-build-core and meson-python have been revolutionary. It was fun sitting with people at the SciPy sprints this year and converting 800+ line setup.py’s into <20 line CMakeLists.txt and a simple PEP 621 pyproject.toml. And now they support more platforms, like WASM, that they didn’t before. IMO, that’s the path forward for compiled projects - setuptools/distutils really wasn’t designed for complex C++ builds. That’s why NumPy had 13,000 LoC dedicated to building before they moved to meson-python in 1.26.

If you have a pure Python package, and you can perform your configuration entirely statically, which is a huge number of packages, then it really doesn’t matter which one you pick. Hatchling is faster, smaller, and simpler than setuptools, and provides better error messages. Setuptools has a lot of legacy to deal with, including still internally being structured as a distutils plugin, while hatchling doesn’t. But if you like setuptools better, use that. You should just not feel pressured to use it because it is the “default” or “built-in”. Flit-core has 0 dependencies and is 10-20x less code than setuptools, but has a much harder time getting file include/exclude right without some configuration. These are all absolutely fine choices, and it’s kind of the point that packages can select whatever they want, whatever works best for them, and everybody is free to innovate and make packaging better. (And there is a lot of room for improvement!)

Check your favorite packages or dependencies; many of them have already moved to something modern. Of the 50 or so packages I help maintain, about 3 still use setuptools, and most of the others use hatchling or scikit-build-core. All of them have pyproject.toml files, of course. Many of the PyPA projects use flit-core or hatchling.

hlovatt · November 4, 2023, 8:27pm

I think this is a major problem. Most people, the utterly overwhelming majority, have simple build requirements and just want to get the job done. These people are the audience the standard library should address both in terms of documentation and pre-installed code.

Instead the documentation, like @henryiii’s comment, says there are twenty options, go investigate them all. Life’s too short. This approach of “here is a list of things you should go study” is what spurred this thread in the first place.

The packaging experience, as it stands now, has gone backwards for the vast majority. Is there any chance of the packaging community actually agreeing on a pre-installed and documented option?

If not, I think this mess, and it is a mess, will unfortunately continue and packaging will remain under-resourced and fragmented. The issue is: why would anyone put their time in to fix issues and documentation, when it will be a never-ending task of yet more options?

sinoroc · November 4, 2023, 9:27pm

Personally, I doubt that is the case. My feeling is rather that up until 10-ish years ago, the vast majority did not care at all about things like packaging, reproducibility, having local-dev-stage-prod environments, dependency confusion attacks, CI/CD pipelines, GPU optimized dependencies, central package repositories, and all kinds of other related things. Sure some ecosystems have had good handling of some of these things for a while and Python probably was too slow to catch up on some of these things, so what? Let’s also not forget that Python’s popularity is also a relatively new thing. The typical Python user has changed a lot and very quickly over the past 10 years, these “new” users have different expectations and skills than “older” users, the ecosystem needs time to adapt.

I guess one way to make it happen is if someone comes up with a full plan from A to Z including implementation, documentation, migration stories, clear support from the communities, approval from the core Python team ^[1], evangelism, and of course willpower and stamina to make it all happen. No such someone has appeared yet ^[2]. And this discussion has happened more than once in the past year alone. The one-true-tool has been suggested multiple times, and anyone is free to pitch it and build it. It hasn’t happened yet, but we are getting closer.

From my point of view, as things are right now and as they seem to be moving, working on standards is the most productive thing one can do (second being documentation). This makes it easy to swap pieces of the packaging toolchain around depending on which kind of project you are working on (or maybe which community you belong to), and I like it. There are a couple of important missing pieces as standards go that I believe could be game changers and get Python packaging to a comfortable situation: lockfiles and something like PEP 711.

[1] For the pre-installation. As far as I know, core’s current position is clearly that packaging should be its own independent thing, and they want to be involved only when strictly necessary.

[2] Since PEP 517, September 2015 when this slow escape from distutils/setuptools started.

bryevdv · November 4, 2023, 9:36pm

FWIW I actually think it would be great for there to be built-in stdlib support for simple, pure-python cases, and very specifically and intentionally only those cases. By design it should be completely incapable of even contemplating scenarios that require “building” anything (C, C++, Fortran, WASM, TypeScript, whatever), or that require more than trivial declarative configuration. The thing is, that would cover a lot of ground, wouldn’t it? And very importantly: make it so that the complex cases only need to be discussed, debated, collaborated on, and coordinated around by the much smaller set of folks who actually have those more complex needs.

pf_moore · November 4, 2023, 10:21pm

A stdlib solution would have serious problems keeping up with standards. If we (for example) had added such a backend to the stdlib for Python 3.12, and then approved PEP 639 to allow specifying license metadata, the first version of that backend that could handle that would be in Python 3.13. That’s a huge delay in adopting functionality compared to a 3rd party backend.

Also, how would users of Python 3.11 and earlier use that backend?

Kwpolska · November 4, 2023, 11:01pm

What if it wasn’t part of stdlib itself, but it was a package shipped with Python, like pip is nowadays? If someone’s life would be materially improved by license metadata, or any other new standard, they could upgrade their copy of simplebuildbackend ^[1]. Everyone else can stay on the default version and it would probably still work (ignoring the new metadata). Users of Python 3.11 (and possibly even older) can install it using pip.

[1] The bigger problem is, of course, designating the one true simple build backend, or developing it from scratch.

davidism · November 4, 2023, 11:42pm

We are so far off topic now that I don’t even know what to do with this thread besides close it. This thread has served its purpose in providing a detailed user story, as well as reminding maintainers that there’s a lot to work on (they already knew). If you have specific suggestions about specific issues, make specific threads to discuss them and take specific action on them. But keep in mind that the refrain continues to be “not enough people are available to do all the things asked for on top of all the things that already need attention.”

The best way to improve the situation is to stick with it long term. That means joining the contributors of the project(s) you’re interested in, helping with the existing issues that need clearing, gaining trust among the existing maintainers, suggesting a detailed plan for something new that you care about, gaining support for it, implementing it, and sticking with it to support it for years after that. Yeah, that’s a lot. Yeah, that requires long term commitment and building relationships. No, one person doesn’t have to do everything. But maintaining these tools and satisfying the entire user base is an extremely large and difficult job. Step up and help directly.