I am inexperienced with building and distributing (to e.g. PyPI) Python packages. I have written a Python package consisting of multiple modules, but one module belonging to the package must be “made” from a template.
I think I have the choice between A) making the module when the distributed package is installed (at the installation site, that is), and B) making the module before generating the package distribution.
For distribution I am prepared to author a suitable pyproject.toml and will probably use Setuptools as the build backend.
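From what I have gathered so far, I would start from a minimal pyproject.toml roughly like this (the name and version are just placeholders):

```toml
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "my-package"        # placeholder
version = "0.1.0"          # placeholder
requires-python = ">=3.9"
```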
Is there a Pythonic way to approach this? Is one of the two choices mentioned preferable?
I hope my description makes sense and that I am asking in the right place.
As for why I need a module that needs “making”: I used to have the module in the traditional form, ready for use by Python, and it featured creation of a number of types dynamically. But the type checker (MyPy) was “blind” to these, and since I very much wanted to rely heavily on type checking and reap the benefits, I re-wrote the module to be trivially generated on demand, featuring a lot of repetition for the type checker’s sake. I traded dynamic types in for additional validation of the program, paying for it with a pre-processing (“making”) step.
The usual separation is between sdist (source distribution) and wheel. You usually want the sdist to be as close to your original sources as possible (so in this case, ungenerated), and the wheel should be ready to install with no further steps. The sdist should include everything it needs to convert into a wheel (that is, to do the generation) later without going back to the original sources or downloading anything.
Any build backend will run slightly different steps for python -m build --sdist compared to python -m build --wheel, and for the latter one you also know which version and platform it’s for (which probably doesn’t matter in your case).
But all of that is just convention. If it makes sense to do things differently, then you can. The limitation is that a wheel can’t run arbitrary code when it’s installed, and the user might not be able to modify it after it’s been installed. So really, your flexibility is to generate the file before creating the sdist, or to have the sdist->wheel step do the generation. (Exactly how that looks is going to depend on the build backend you choose.)
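With setuptools as the backend, for instance, one way to do the generation in the sdist->wheel step is a small setup.py that only overrides the build_py command. This is just a sketch, and the template and module paths are invented:

```python
# setup.py -- kept only for the custom build step; all metadata stays in pyproject.toml
from pathlib import Path

from setuptools import setup
from setuptools.command.build_py import build_py


class build_py_with_codegen(build_py):
    """Render the templated module before the regular build_py copies sources."""

    def run(self):
        template = Path("templates/_generated.py.in").read_text()
        # Real template rendering would go here; a plain copy keeps the sketch short.
        Path("src/mypackage/_generated.py").write_text(template)
        super().run()


setup(cmdclass={"build_py": build_py_with_codegen})
```

The template itself then has to be shipped in the sdist (e.g. via MANIFEST.in), otherwise the sdist->wheel step has nothing to generate from.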
The Python packaging terminology and model can be a bit counter-intuitive for those coming from more traditional C-based (make/autotools) projects. There are effectively two kinds of packages:

Source distribution (sdist) tarballs are like running make dist and can then be used by Python package management tools to automatically build your software on the end user’s system before installing the result. These effectively have the full flexibility of running whatever you want during the build step.

Pre-built “wheel” packages are more like doing make and then make install into a temporary location and zipping that up, so that package management tools merely need to unpack them directly onto the user’s system. Wheel distribution does not support running arbitrary routines at install time.

There has been a steady trend in recent years toward pre-built wheel packages, because they’re easy to audit and users don’t have to worry about them running unexpected payloads as a part of installing them. The downside is that they’re highly dependent on the details of the target environment, which you mostly need to be able to predict (and perhaps cross-build for or embed copies of additional libraries/dependencies to support).
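Concretely, for a pure-Python project both kinds usually come out of a single build run (the file names below are only illustrative):

```console
$ python -m build        # builds the sdist, then a wheel from that sdist
$ ls dist/
mypackage-0.1.0.tar.gz             # sdist: the "make dist" analogue
mypackage-0.1.0-py3-none-any.whl   # wheel: pre-built, just unpacked on install
```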
From my point of view, the key thing with the sdist is that it MUST be cross-platform, so when creating the sdist I pre-compile, pre-generate, and do as much preliminary work as possible while remaining cross-platform.
Thank you for all the useful information for me to take in.
Is there, or should there be, a conceptual distinction between what tends to be offered on e.g. GitHub vs. what the sdist package would contain? I mean, I observe that a lot of very popular Python software “lives” as GitHub projects, for all manner of reasons (ease of contribution, etc.), and I plan to do the same, but how would or could what is stored in an “sdist” differ from what is otherwise stored in the corresponding repository on GitHub?
For example, one might use the repository somehow to build the source distribution, and the source distribution can then be used to build the “wheel”. Is that a sensible or typical approach? Alternatively, if one should strive to keep the source distribution as close as possible to the original source code, then won’t the source distribution necessarily match the repository contents?
In the sdist you typically wouldn’t have files like .gitignore, or the configuration files of linters, formatters, and so on: things that are not needed to build the wheels. There is debate over whether the sdist should contain the tests and the docs. And as I said, from my point of view, if there are things that you can build before creating the sdist, then you should put them in built form in the sdist.
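With setuptools, for example, a MANIFEST.in is the usual way to adjust what extra files go into (or stay out of) the sdist; the file names here are only illustrative:

```
# MANIFEST.in -- sdist contents (setuptools)
include README.md
recursive-include templates *.py.in   # needed if generation happens during the build
exclude .gitignore
prune .github
global-exclude *.py[cod]
```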
Opinions on this topic abound, and there is no one “correct” opinion, just arguments for and against certain choices.

Some projects see their version control systems, e.g. Git (GitHub is merely a place to host such repositories after all, and far from the only place at that), as the official way they distribute source code to downstream consumers, while others view it as an implementation detail of their developer workflow.

As such, some projects want any sdist tarballs to be as close as possible to what they serve up from version control, or may even want to avoid publishing sdists entirely, while others add files to their sdists which aren’t present in version control or are generated from version control metadata, and prefer downstream consumers treat that as their official source distribution.

Many projects also may not have strong opinions and just do something in between those extremes, perhaps not actually caring about distributing source and at best seeing it as only a means to an end (not all that uncommon with languages like Python, where the distinction between “source code” and “script” is often a fuzzy one).
Quite often there’s very little difference (it may be helpful to know that sdists existed well before Git, let alone GitHub, was ever conceived, which is why they are quite similar here). However, a lot of users prefer to build from source, and like to treat what’s on PyPI as the “true” source, rather than having to inspect each package to find out where its repository is. So even if it’s a plain copy, it can still be very important and useful.
But it is possible and largely acceptable to preprocess your code before creating the sdist, if there is a need for code generation that can’t be easily done by someone trying to build from source (for example, I don’t include Cython in this, because it’s generally better to specify it as a build-time dependency so that whoever is installing gets an up-to-date version). But sometimes there is value in processing parts of your code first.
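If you go that route, the “make before the sdist” step can be as small as a script you run before python -m build --sdist. Purely as an illustration (every path and name below is invented):

```python
# generate.py -- hypothetical pre-sdist step: expand a template into a normal module
from pathlib import Path

template = Path("templates/shapes.py.in").read_text()

# Repetitive but type-checker-friendly: one concrete class per entry,
# instead of creating the types dynamically at import time.
rendered = template.replace(
    "{{CLASSES}}",
    "\n\n".join(
        f"class {name}:\n    kind = {name.lower()!r}"
        for name in ("Circle", "Square", "Triangle")
    ),
)

Path("src/mypackage/shapes.py").write_text(rendered)
```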
One important difference is that an sdist has a version, while your Git repository does not. So at the very least, creating an sdist officially says “this set of sources is for this version of my project”. It also should include a pyproject.toml so that tools know how to build it - this is not required for a Git repo[1] - and either a PKG-INFO file or pyproject.toml sections providing Python-specific metadata.
I’m sure a bunch of people will now angrily tell me that it is, but it’s not required ↩︎
I’d recommend you try development workflow tools like Hatch, PDM, and Poetry. Create a project with each and build the sdist and the wheel. Compare the contents of the source tree vs. the sdist vs. the wheel. They have sensible defaults and usually do the “right thing” (as others have written, what the “right thing” is might differ from dev to dev, team to team, and project to project, and you would likely notice some differences in the behavior of those dev workflow tools).
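Each of those tools has a one-command build, and the resulting archives are easy to peek into (the project name below is a placeholder):

```console
$ hatch build      # or: pdm build / poetry build
$ tar -tzf dist/myproject-0.1.0.tar.gz             # what went into the sdist
$ unzip -l dist/myproject-0.1.0-py3-none-any.whl   # what went into the wheel
```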