Dynamic versions in editable installations

Some thoughts come to mind in catching up on this thread (thanks for the ping @CAM-Gerlach!).

First off, the stdlib documentation for importlib.metadata has room for a lot of improvement. It shouldn’t be called “Using importlib.metadata”. The subject is probably complex enough for a Diataxis-like approach, with much clearer separation between how-to and reference guides. It also needs to be much more precise in its language, defining the basic terminology and objects involved, etc. For example, what does “package” refer to in this sentence, an importable package/module, or a distribution?

Let’s say you wanted to get the version string for a package you’ve installed using pip .

I don’t even think “distribution” is defined in the docs.

It seems to me that maybe we need a higher level API which helps with this distinction. It’s too easy to not fully understand the differences between importable packages and distribution packages, and it’s also easy to get misled when maybe the majority of your use of importlib.metadata.version() gives you what you want, then you get an error, and now you don’t trust the API so you fall back to package.__version__. It would be subtle and take some thinking to come up with the right abstraction and API.
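To make the consumer-side distinction concrete, here’s a minimal sketch (the distribution name queried is hypothetical) of asking importlib.metadata for a distribution’s version and handling the case where no matching distribution is installed:

```python
from importlib.metadata import version, PackageNotFoundError

def distribution_version(name):
    """Return the installed distribution's version string, or None.

    Note this is keyed on the *distribution* name, which may differ
    from the import name -- the exact confusion discussed above.
    """
    try:
        return version(name)
    except PackageNotFoundError:
        return None

print(distribution_version("surely-not-installed-xyz"))  # prints None
```

This is the point where users, after hitting `PackageNotFoundError` for an import name that doesn’t match the distribution name, lose trust in the API and fall back to `package.__version__`.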

Using __version__ certainly has substantial value in specific scenarios, such as troubleshooting, development/editable installs, and quick interactive use.

I think we need to keep in mind there are two use cases for __version__. One is on the producer side (more below) and one is on the consumer side (i.e. inspecting and acting on `import foo; foo.__version__`). My recommendations above about writing better docs and an API are for the consumer side.

This is a totally understandable confusion, given the lack of definition in the stdlib docs as mentioned above.

I do this, and then pull this into the distribution version via my pyproject.toml:

[project]
dynamic = ['version']
[tool.pdm]
version = {from = 'src/public/__init__.py'}

I mostly like this, except that I do sometimes forget to bump __init__.py. I also sometimes forget to git tag my releases, so I’m not sure that approach would be any better :stuck_out_tongue_winking_eye:

PEP 396 was written in a different era so it shouldn’t be revived. But maybe analyzing the problem in terms of API consumers vs. library producers would lead to a better understanding of what, if anything, still needs to be done to make the overall experience easier to understand and use.


Do you mean the concept or the exact format and wording?

For sure the wording, but also the focus. For example, the difference between the distribution version number and the module version number isn’t clearly spelled out. The use of setup() is outdated. The difference between a consumer version number and a producer version number isn’t described. And, I think most importantly, what are version numbers for? The “Deriving” section is also quite outdated.

The actual specification section describing __version__ still makes sense to me.

Pinging @jaraco as primary maintainer of importlib_metadata

Indeed—the current content is basically a series of short how-to guides with a dash of tutorial mixed in; there appears to be little to no true “Reference” content (per Diataxis) and no real API documentation at all, which seems to be a pretty serious omission.

The importlib_metadata RTD docs do contain a basic API reference as well as a pkg_resources migration guide, in addition to the “Using importlib.metadata” guide mirrored in the stdlib reference. At least the former could be moved into the stdlib (preferably for all maintained Python versions, given it was declared non-provisional in 3.10), either as one of two top-level sections in the existing document (cleanly separating the tutorial/how-to material from the reference), or a separate one.

Yup, there’s already enough confusion surrounding import vs. distribution packages (this thread being but one example), especially in a module for which the distinction is critical. “Distribution” should always be used to mean “distribution package”, “import packages” should be called such to avoid ambiguity, and a clear and explicit distinction should be drawn between the two.

At the very least, it can explicitly link to the existing canonical PyPA glossary definition which describes the difference, and also make explicit note of how this affects the returned results in ways many users may not expect, both at the top and in the docs for the packages_distributions function, for which the distinction is fundamental to the function’s purpose.

I find having an explicit release checklist with easily-copyable, step-by-step commands to execute each release, or automating the process with e.g. a GitHub Actions or workflow tool, to be a great solution to this—at least for me, it’s hard to see now how I could unintentionally make a release without either of these, since setting and tagging the released version is central to what it means to me to make a release, more so than just building and uploading some artifacts.


Yep, we have blue set up to automatically release new versions on git tag and that seems like the best approach. I just haven’t yet set things up similarly for other packages I host on GitLab.


The main reason the importlib_metadata docs include API docs and the importlib.metadata docs do not is because, to the best of my knowledge, CPython doesn’t have the autodoc extension, so it can’t utilize the API docs that are already authored in the code and published in the importlib_metadata docs. I’m uninterested in manually copying the docs from one source to another and then maintaining consistency. If someone could figure out how to add autodoc to CPython, it would be trivial to reflect the API. In lieu of that, the best the user gets is a link to the source where they can read the API from source. If someone is interested in authoring and maintaining separate API docs, that might be acceptable.

These seem like good contributions. If you can make them at importlib_metadata, they’ll get merged into CPython as well.

I find that having such a checklist is a maintenance burden in itself and doesn’t scale to dozens of packages. Having automation might help, although if part of the release process is to manually edit files and commit them to the repository, it becomes difficult to generalize that concept, so most fall back to manual processes.

At least for projects based on jaraco/skeleton, which employ a single-sourced version, it’s impossible to have a version mismatch. If tagged with a version, a checkout of that revision, and every version of that package downstream, has that version. It doesn’t matter whether it was tagged and pushed from a checkout or tagged in GitHub - creating the tag is the release trigger. I’d still like to figure out how to manage better release-notes automation and bump inference (what’s the next version, based on what changes were made).
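For reference, the tag-as-single-source approach can be sketched in pyproject.toml with setuptools_scm (a sketch under the assumption of a setuptools backend; the project name is hypothetical and this is not necessarily identical to what jaraco/skeleton does):

```toml
[build-system]
requires = ["setuptools>=61", "setuptools_scm[toml]>=6.2"]
build-backend = "setuptools.build_meta"

[project]
name = "my-pkg"          # hypothetical name
dynamic = ["version"]    # version is derived from the git tag, not a file

[tool.setuptools_scm]
# No version is hand-edited in the source tree; tagging v1.2.3 *is* the
# release, so a checkout of the tag and the built artifacts always agree.
```

With this shape there is no `__init__.py` bump to forget, which addresses the failure mode mentioned earlier in the thread.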

I know a lot of people are generally allergic to using pbr (on PyPI), but it has a mechanism for inferring your next likely (lowest possible) version number from structured commit message footers and formulating PEP 440 dev versions based on that, assuming the project approximately follows SemVer anyway. I’m certainly not suggesting you use it, but you could look at the implementation it has and crib some ideas from there if they suit your needs.

In OpenStack, this is basically how all our release automation works too. We take it a step further and have automated tagging as well. A maintainer pushes a review (like a pull request) to add some metadata to a central releases repo, which the release managers review, and once that’s approved and merged it kicks off automation which pushes Git tags signed with the release key. The tags getting pushed fires an event which triggers building of release artifacts that then get uploaded automatically to PyPI. Rube Goldberg would be proud.

Oh, I also meant to mention, we use reno (on PyPI) for maintaining release notes. It has mechanisms to inspect your branch history in order to work out which subsequent tags a release note belongs to.

Hmm, I never knew those docs existed. I guess it’s your decision as maintainer, but I feel that because importlib.metadata is presented as the canonical version, with importlib_metadata being the backport for older Python versions[1], it is reasonable for people to expect to find full information in the stdlib docs.

If, as you say, technical restrictions make that difficult to do, then would it be possible to at the very least note in the stdlib documentation that API documentation is held in the module docstrings, and can be referenced either via the built-in Python help() command or in the importlib_metadata documentation (with a link to that documentation)? I’d be happy to submit a PR for the (stdlib) docs if that’s acceptable.
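To be concrete about what “held in the module docstrings” means for a reader today, the existing API reference is already reachable interactively, for example:

```python
import importlib.metadata

# The API reference the stdlib docs currently lack largely exists in the
# module, class, and function docstrings, browsable via built-in help():
help(importlib.metadata.version)
```

A note in the stdlib docs pointing at this (plus a link to the rendered importlib_metadata docs) would cost little and help a lot.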

That would both help people to get access to the information they need, and at least partially address the complaints that the importlib.metadata docs are unhelpful.


  1. If even that’s not true, then that’s something else that should be communicated better. ↩︎

I have been wanting this for a long time; I can do it if there isn’t a better solution and you are okay with it.


I hope I didn’t come across as excessively critical above—re-reading what I wrote now in context, it does sound rather harsher than I intended.

Ah, I see, thanks. As I seem to recall, the reasons are not just technical in nature but also historical, though I’m not so sure of the history there myself. I’ve brought this to the attention of the rest of the docs team folks to see if they have some input on this. Like you, I’d hate to duplicate effort on this, but if there are reasons not to use Sphinx autodoc here, I’d be willing to help @FFY00 with that effort, complementing his subject-matter expertise with technical writing experience.

In the meantime, if you are okay with adding importlib_metadata to your own intersphinx mapping (or your skeleton, as appropriate), and CPython did the same when its docs were updated from importlib_metadata’s, I could submit a PR to turn the various references to the functions discussed within using.rst into intersphinx links, which would make both your docs and the CPython version much more useful without requiring any extra effort when syncing the file contents, since they would work identically in both places.
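For reference, such a mapping is a small addition to Sphinx’s conf.py; a sketch (the URL is the project’s Read the Docs location as I understand it, so treat it as an assumption):

```python
# Sphinx conf.py fragment: enable cross-project references so that roles
# like :func:`packages_distributions` resolve against the external inventory.
extensions = ["sphinx.ext.intersphinx"]

intersphinx_mapping = {
    # key is the prefix used in references; (base URL, inventory file),
    # where None means "use objects.inv at the base URL"
    "importlib_metadata": (
        "https://importlib-metadata.readthedocs.io/en/latest/",
        None,
    ),
}
```

Since intersphinx falls back gracefully when a target isn’t found locally, the same using.rst source can render in both projects.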

Sure, happy to help. I’ve gone ahead and submitted an issue, python/importlib-metadata#398 and a PR, python/importlib-metadata#399 to do so.

Sure, which is why I also suggested automation such as GitHub Actions (e.g. triggered on tags, similar to what you describe). Of course, it is much better to have a standardized checklist than a completely ad hoc release process in which omitting a crucial step is a non-extraordinary occurrence, which was the case I was responding to (and which also seemed to be fairly small-scale).

There already is a See Also section at the top of the stdlib docs linking to the importlib-metadata docs. I opened another issue, python/importlib_metadata#400, and PR, python/importlib_metadata#401, to note specifically that the API docs can be found there, at least for the time being.

Hopefully there is, and I’ve asked the Python docs community for suggestions, but if not I’d be willing to help as well on the technical writing/docs side.


It’s difficult to describe the exact situation, but I’d not characterize importlib_metadata as simply a backport. It’s also a forward port. It presents new features and functionality quickly and then ports those into CPython at its slower development pace, so importlib_metadata (and _resources) is frequently ahead of any version available in Python.

That might be the right thing to do. The importlib.resources docs do point directly to the importlib_resources docs to give discoverability of that additional content. The importlib.metadata docs are more in sync with importlib_metadata docs, though, so it may or may not make sense.

To see what files are kept in sync, see the cpython branch of importlib_metadata.

Just to mention, as I noted on the PR making them, the SC-proposed changes in Python’s stdlib module policy could have some impact on the situation for these two packages.

In fact, there already is such a link, and I’ve opened a PR to update it to explicitly mention the API reference being there for the time being (which is quite an important portion missing for now, unless/until we figure out what to do about it), as well as proposed some ideas for a longer-term solution.

Are you saying there is a link from the stdlib documentation for importlib.metadata (which is here) to the importlib_metadata documentation (here)? Because if you are, I can’t see it - can you tell me exactly where I should be looking?

Assuming I’ve just missed it, I’d still argue it’s not very obvious :slightly_smiling_face:

Also, it’s pretty hard to determine what features are in which Python version. The importlib-metadata PyPI page has a table matching Python version and importlib-metadata version, but as far as I can see, there’s no way to read the documentation for those older versions (readthedocs only has “stable” and “latest”).

Ideally, if we cannot make the stdlib docs complete, we should at least link from each Python version’s stdlib docs, to a stable URL for the specific version of importlib-metadata relevant for that Python version. Otherwise, it’s an exercise in guesswork and trying things out to use importlib.metadata in code that has to support multiple Python versions. (Although given that the library was provisional until 3.10, that’s to be expected for older versions to a certain extent - but we should avoid it becoming an issue in newer versions).


Yes—it looks like it was only added in the current 3.12+ docs (that I linked), not the 3.11 docs (that you linked), which explains why one of us saw it and one of us didn’t :slight_smile: I assume it’s worth backporting to the older branches (at least as far as we can, presumably to 3.10, which IIRC contains more or less the current non-provisional API). That might be most conveniently done in the PR which syncs the changes I’ve made in the linked PRs, which relate to it and should also be backport-worthy.

Yup, that’s definitely a concern, and one I believe I also brought up somewhere (though apparently not on this specific thread, as far as I can easily find). In addition, while the “using” page in the CPython version does have versionadded/versionchanged notes, the importlib_metadata version does not, nor do the importlib_metadata API docs contain this information (which is admittedly less critical for projects using the PyPI version, since they can just check the changelog and set their minimum requirement accordingly).

That would be one significant motivation for adding API docs on the CPython side one way or another (starting with 3.10, the earliest non-provisional version still in bugfix support, at least, and going forward from there).

Yeah—since 3.10 is the last version still in bugfix, now would be the time to do so.

To be clear, I’d love to see the importlib.metadata docs improved to include full reference materials including API details - at the moment, as the title says, it’s basically just a usage guide. But I don’t have the time to offer to work on that, so I’m in no position to make demands (and as @jaraco is the maintainer, it’s ultimately his call).

Also, while we’re discussing this, what’s up with the importlib.resources documentation? It’s spread across 3 sections, which look like they could have been subsections of a main importlib.resources section. Is that just an error in putting the structure at the wrong level, or is there something more fundamental?

If we can’t come up with a better solution, @FFY00 offered above to take the lead on adding and maintaining the API docs in CPython’s repo, and I offered to assist on the technical writing and docs side of things.

I can’t speak for the actual intent, of course, but the Deprecated Functions section is actually part of the same document as the main importlib.resources module; it only appears to be a standalone top-level document because the top-level heading uses - for its underline instead of the standard double = (or even single =), which looks somewhat anomalous (particularly since the title is not standalone). I would think it would make sense to simply move the title to the correct level, following the docs style guide, which would fix that issue—I can propose a PR unless there’s a clear reason for the current status quo. By contrast, importlib.resources.abc is a submodule, which by the general convention of the docs usually justifies its own top-level section (two other submodules, simple and readers, appear to be undocumented).

Interestingly, for importlib.resources, the CPython library reference contains the API docs, whereas the using and migration guides are on the importlib_resources site (with no duplication between them), while for importlib.metadata, the CPython docs instead contain the Using guide (which would perhaps be better off split into a Tutorial, How-To and/or Explanation) and the importlib_metadata site contains the API docs and Migration Guide alongside an upstream mirror of the Using guide. The former certainly seems like the more appropriate situation, though I’m sure both could benefit from improvements.

Maybe @jaraco has some further insight?

Some of these points were discussed in Better docs for importlib.resources · Issue #93610 · python/cpython · GitHub and Improving the documentation · Issue #240 · python/importlib_resources · GitHub.


This discussion deviated somewhat from the title.

Is there interest in writing a PEP for always-fresh metadata in editable installs?

Since editable installs are no longer purely implementation-defined hacks as of PEP 660, it sticks out as a flaw that they have static metadata that doesn’t match what the build backend would generate.

In the PEP, the description for prepare_metadata_for_build_editable contains:

The hook MAY also create other files inside this directory, and a build frontend MUST preserve, but otherwise ignore, such files; the intention here is that in cases where the metadata depends on build-time decisions, the build backend may need to record these decisions in some convenient format for re-use by the actual wheel-building step.

One option to solve this could be that a file stored there, defined in a follow-up PEP, could e.g. define “volatile” metadata fields that have to be re-calculated on access, as well as some runtime hook that would provide the metadata. E.g. the file could look like this:

# TOML mapping with the same keys as `pyproject.toml`’s `project` table
# values are references to string variables or callables returning strings
version = 'my_pkg:get_version'

Of course that’s just a first idea, maybe y’all have much better ones!
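To make the “reference to a callable” shape concrete, a frontend-side resolver for such 'module:attr' strings might look like the following. This is entirely hypothetical, purely to illustrate how the proposed hook could be consumed; the reference syntax mirrors entry-point object references:

```python
import importlib

def resolve_reference(ref):
    """Resolve a 'module:attr' string (as in version = 'my_pkg:get_version')
    to the object it names. A frontend could call the result to obtain
    fresh metadata at access time instead of trusting the static value."""
    module_name, _, attr_path = ref.partition(":")
    obj = importlib.import_module(module_name)
    for part in filter(None, attr_path.split(".")):
        obj = getattr(obj, part)
    return obj

# Using a stdlib stand-in reference, since my_pkg is hypothetical:
func = resolve_reference("os.path:join")
print(callable(func))  # prints True
```

Whether the hook returns a string directly or a callable, and who is responsible for invoking it (the installer, the metadata reader, or an import-time shim), would all be questions for the follow-up PEP.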
