Sdists for pure-Python projects

There are several steps required to transform a raw source tree into something that can actually be copied into place by an installer:

  • locating the actual modules/packages
  • constructing the manifest, processing the core metadata
  • performing various checks, constructing the entrypoints
  • moving everything into the correct subdirectories to be prepared for installation
  • generating the rest of the metadata required by the built distribution output format (which may or may not be a wheel).
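A few of the steps above can be sketched with nothing but the standard library. The helper names (find_regular_packages, core_metadata, entry_points_txt, staging_layout) are hypothetical, not any build backend's API. The sketch assumes a src/ layout with regular packages only (a rough stand-in for what setuptools' find_packages() does, ignoring namespace packages), emits only a tiny subset of the core metadata fields, and skips the checks and the format-specific files such as a wheel's WHEEL and RECORD.

```python
import configparser
import io
import tempfile
from pathlib import Path


def find_regular_packages(src_root: Path) -> list[str]:
    """Return dotted names of regular packages (directories with __init__.py).

    Hypothetical helper: does not descend into directories that are not
    packages themselves, and ignores namespace packages.
    """
    packages = []
    for init in src_root.rglob("__init__.py"):
        parts = init.parent.relative_to(src_root).parts
        if not parts:
            continue  # an __init__.py directly in src_root is not a package
        # Every ancestor between src_root and this directory must be a package too.
        if all((src_root.joinpath(*parts[:i]) / "__init__.py").is_file()
               for i in range(1, len(parts))):
            packages.append(".".join(parts))
    return sorted(packages)


def core_metadata(name: str, version: str, requires: tuple[str, ...] = ()) -> str:
    """Render a minimal core metadata file (METADATA / PKG-INFO); only a few fields."""
    lines = ["Metadata-Version: 2.1", f"Name: {name}", f"Version: {version}"]
    lines += [f"Requires-Dist: {req}" for req in requires]
    return "\n".join(lines) + "\n"


def entry_points_txt(console_scripts: dict[str, str]) -> str:
    """Render an entry_points.txt containing a [console_scripts] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep script names case-sensitive
    parser["console_scripts"] = console_scripts
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


def staging_layout(src_root: Path, packages: list[str]) -> dict[str, Path]:
    """Map archive paths (e.g. 'mypkg/sub/mod.py') to the source files to copy."""
    layout = {}
    for pkg in packages:
        pkg_dir = src_root.joinpath(*pkg.split("."))
        for module in sorted(pkg_dir.glob("*.py")):
            layout[f"{pkg.replace('.', '/')}/{module.name}"] = module
    return layout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        # notapkg/ has no __init__.py, so notapkg/helper is not picked up.
        for rel in ("mypkg/__init__.py", "mypkg/cli.py", "notapkg/helper/__init__.py"):
            (src / rel).parent.mkdir(parents=True, exist_ok=True)
            (src / rel).write_text("")
        pkgs = find_regular_packages(src)
        print(pkgs)
        print(sorted(staging_layout(src, pkgs)))
        print(core_metadata("mypkg", "1.0"), end="")
        print(entry_points_txt({"mytool": "mypkg.cli:main"}), end="")
```

Each helper is a point where a real backend applies its own rules (which directories count as packages, which metadata is static or dynamic, where files land).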

While things have gotten simpler with many modern build backends, historically (and still at present for many projects) this can involve any number of arbitrary dynamic transformations of the source tree into the packaged sdist, and the sdist into the built wheel (or other built/installable output). This could include:

  • fixing the version (setuptools-scm, many backends, etc.)
  • generating or transforming the source files
  • moving things around
  • including arbitrary data
  • dynamically constructing the metadata
  • constraining the deps more tightly
  • Etc.
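To make "fixing the version" concrete: tools like setuptools-scm derive the version from VCS metadata at build time and record it in the sdist, so later builds from the sdist don't need the repository. The sketch below does a simplified version of that by hand. It is not setuptools-scm's API or its exact versioning scheme; the vX.Y.Z tag convention, the bump-the-patch rule and the function names are all assumptions.

```python
import re
from pathlib import Path

# Hypothetical tag convention "vMAJOR.MINOR.PATCH", as reported by
# `git describe --tags --long`, e.g. "v1.2.3-4-gabc1234".
_DESCRIBE = re.compile(
    r"v?(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"-(?P<distance>\d+)-g(?P<node>[0-9a-f]+)"
)


def version_from_describe(describe: str) -> str:
    """Turn `git describe --tags --long` output into a PEP 440 version string."""
    m = _DESCRIBE.fullmatch(describe.strip())
    if m is None:
        raise ValueError(f"unrecognised describe output: {describe!r}")
    major, minor, patch, distance = (
        int(m[g]) for g in ("major", "minor", "patch", "distance")
    )
    if distance == 0:
        return f"{major}.{minor}.{patch}"  # building exactly at the tag
    # Commits past the tag: guess the next patch release, mark it as a dev
    # release, and keep the commit hash as a local version label.
    return f"{major}.{minor}.{patch + 1}.dev{distance}+g{m['node']}"


def freeze_version(package_dir: Path, version: str) -> Path:
    """Write a hypothetical _version.py so the sdist no longer needs git history."""
    target = package_dir / "_version.py"
    target.write_text(f'__version__ = "{version}"\n', encoding="utf-8")
    return target


if __name__ == "__main__":
    print(version_from_describe("v1.2.3-0-gabc1234"))
    print(version_from_describe("v1.2.3-4-gabc1234"))
```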

All of these things can be independent of building binary extension modules, and all of them can potentially vary based on how the build is invoked and configured, and on the desired built output format. Historically there were many more such formats than there are today (bdist_msi, bdist_rpm, etc.), but even now third-party distributors often can and do make different choices than the particular ones made for the project’s own PyPI wheel.

Pure-Python code isn’t compiled all the way down to machine code in a binary executable image, but it does need to be tokenized, parsed to an AST and compiled to bytecode before the interpreter can execute it. That bytecode is cached to disk as .pyc files (typically at install time, or else on first import), so subsequent imports skip recompilation, which non-trivially improves import time.
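The compile step can be observed directly with standard-library modules (ast, dis, py_compile, importlib.util, marshal). The sketch below parses and compiles a small source string, disassembles the resulting bytecode, then byte-compiles a throwaway module and reads the cached .pyc back. The helper names are made up for illustration, and the 16-byte header layout assumes Python 3.7 or later (PEP 552).

```python
import ast
import dis
import importlib.util
import marshal
import py_compile
import tempfile
from pathlib import Path

SOURCE = "def add(a, b):\n    return a + b\n"


def compile_source(source: str, filename: str = "<example>"):
    """Source text -> AST -> code object; import does the equivalent in one compile()."""
    tree = ast.parse(source, filename)      # tokenizing and parsing happen here
    return compile(tree, filename, "exec")  # AST -> bytecode


def write_and_load_pyc(source: str):
    """Byte-compile a temporary module and load the cached code object back."""
    with tempfile.TemporaryDirectory() as tmp:
        module = Path(tmp) / "example.py"
        module.write_text(source, encoding="utf-8")
        # Writes __pycache__/example.<tag>.pyc, the same path the import system checks.
        pyc_path = Path(py_compile.compile(str(module), doraise=True))
        data = pyc_path.read_bytes()
    # Header: 4-byte magic number, 4-byte flags, 8 bytes of mtime+size or a
    # source hash; the marshalled code object follows.
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("pyc was written by a different Python version")
    return pyc_path.name, marshal.loads(data[16:])


if __name__ == "__main__":
    code = compile_source(SOURCE)
    add_code = next(c for c in code.co_consts if hasattr(c, "co_code"))
    dis.dis(add_code)  # the bytecode instructions for add()
    name, cached = write_and_load_pyc(SOURCE)
    namespace = {}
    exec(cached, namespace)
    print(name, namespace["add"](2, 3))
```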

As @pf_moore mentioned, nearly all distributors’ tooling is set up to build from sdists rather than wheels, because they have their own built distribution formats. Nowadays that often (but far from always) involves building a wheel and extracting its contents, but even then it is done in a specifically controlled/customized build environment that ensures the tooling works reliably. Furthermore, downstream redistributors typically need the tests to ensure their packages work properly, the docs to bundle in their installers, plus other metadata, config files and assets from the source (e.g. .desktop files), and the like.
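Since a wheel is a zip archive with a defined layout, the "build a wheel and extract its contents" path is conceptually simple. In practice some distributions' build scripts run something like python -m build --wheel --no-isolation followed by python -m installer --destdir=STAGING_DIR dist/*.whl, which also handles scripts, RECORD and bytecode. The sketch below shows only the core unpacking idea with zipfile; unpack_wheel and make_toy_wheel are hypothetical helpers, and the toy archive is not a complete, valid wheel.

```python
import tempfile
import zipfile
from pathlib import Path, PurePosixPath


def unpack_wheel(wheel_path: Path, staging_root: Path) -> tuple[list[str], list[str]]:
    """Extract a wheel (a zip archive) into staging_root.

    Returns (payload files, .dist-info files) so a packaging script can treat
    the package contents and the wheel metadata differently.
    """
    payload, dist_info = [], []
    with zipfile.ZipFile(wheel_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            member = PurePosixPath(info.filename)
            # Refuse entries that would land outside the staging root.
            if member.is_absolute() or ".." in member.parts:
                raise ValueError(f"unsafe path in wheel: {info.filename}")
            zf.extract(info, staging_root)
            is_meta = member.parts[0].endswith(".dist-info")
            (dist_info if is_meta else payload).append(info.filename)
    return sorted(payload), sorted(dist_info)


def make_toy_wheel(path: Path) -> None:
    """Write a tiny wheel-shaped zip for the demo (incomplete: no RECORD)."""
    files = {
        "demo/__init__.py": "VALUE = 42\n",
        "demo-1.0.dist-info/METADATA": "Metadata-Version: 2.1\nName: demo\nVersion: 1.0\n",
        "demo-1.0.dist-info/WHEEL": "Wheel-Version: 1.0\nRoot-Is-Purelib: true\nTag: py3-none-any\n",
    }
    with zipfile.ZipFile(path, "w") as zf:
        for name, text in files.items():
            zf.writestr(name, text)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        wheel = Path(tmp) / "demo-1.0-py3-none-any.whl"
        make_toy_wheel(wheel)
        print(unpack_wheel(wheel, Path(tmp) / "staging"))
```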

Besides serving redistributors, the sdist provides a canonical, checksummed (and potentially signed), complete and buildable source archive for that version of the project. It is independent of the platform, Python version and binary format, contains the complete project source and metadata, can be built to any supported format as required, and can be built under conditions the user controls. And if, say, a bug, compatibility issue or limitation in wheel, build, the project’s build backend, etc. causes something to go wrong in specific cases or future versions, or causes required files (licenses, etc.) to be omitted from the built distributions, those distributions can be rebuilt from source as needed.
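Checking a downloaded sdist against its published digest is the same comparison pip's hash-checking mode (--require-hashes) performs; PyPI publishes a SHA-256 digest for each uploaded file. A minimal sketch, assuming you already have the expected hex digest from a trusted source; the function names and the example file name are hypothetical, and signature verification is out of scope.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 of a file, read in chunks so large archives aren't loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sdist(path: Path, expected_sha256: str) -> None:
    """Raise ValueError unless the file matches the published SHA-256 digest."""
    actual = sha256_of(path)
    if actual != expected_sha256.strip().lower():
        raise ValueError(f"{path.name}: expected {expected_sha256}, got {actual}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sdist = Path(tmp) / "example-1.0.tar.gz"  # hypothetical file name
        sdist.write_bytes(b"abc")  # stand-in contents, not a real archive
        verify_sdist(sdist, "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")
        print("digest OK")
```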
