Please make `package.__version__` go away!

On reading dynamic-versioning-at-build-time I came across the packaging guide’s suggested approach to keeping the package’s .__version__ attribute in sync with the version specified in pyproject.toml/setup.cfg/setup.py. To me, this raises the question: why are we still encouraging setting __version__ at all?

PEP 396 was rejected because __version__ is superseded. Nobody should be setting __version__ because nobody should be consuming __version__ – they should all be using importlib.metadata.version().
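
For anyone who hasn’t used it, the call is a one-liner (a minimal sketch; "requests" is just a stand-in for any installed distribution):

import importlib.metadata

# Look up the installed distribution's version from its metadata
print(importlib.metadata.version("requests"))  # e.g. '2.31.0'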

This misplaced need to set a __version__ leads packages to a vast array of overcomplicated and breakable concoctions such as:

  • Making setup.py write the version into package/version.py then have package/__init__.py import it from package/version.py
  • Making setup.py write a package/version.txt and usually creating at least one release that was defunct because the .gitignore-ed version.txt was omitted from the .whls
  • Trying to make setup.py import the version from the package and causing at least most of the runtime dependencies to become build time dependencies
  • Various regex-driven tools that bump both versions at the same time (e.g. bump2version)
  • The approach prescribed in the packaging docs of having setup.py go regex-fishing through package/__init__.py

These are the ones that I am aware of. There will be more. In most cases, I noticed them because they caused their package to break (all for one string that I almost never care about).

Alternatively, many packages have opted to use:

__version__ = importlib.metadata.version(__name__)

(where the abstraction of using __name__ instead of a hard-coded string is technically a bug, since version() expects a distribution name rather than a package name). This is an improvement in that it gets rid of all the build-time acrobatics, but it is still an anti-pattern. Either a dependent project cares about the version, in which case there’s no benefit to its not just using importlib.metadata.version() to query it directly, or the version is not needed, in which case it’s just a waste of startup time.
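
For completeness, a less buggy form of that idiom hard-codes the distribution name and guards against running from an uninstalled checkout – a sketch, with "mypkg-dist" as a placeholder distribution name:

# mypkg/__init__.py – "mypkg-dist" is a placeholder distribution name
from importlib.metadata import PackageNotFoundError, version

try:
    __version__ = version("mypkg-dist")  # distribution name, not import name
except PackageNotFoundError:
    # Not installed, e.g. running from a source checkout
    __version__ = "unknown"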

Assuming that it’s not just me who is frustrated by this, could we, instead of suggesting ways to single-source the version field, remind people that importlib.metadata.version() exists, that it makes __version__ redundant, and advise them not to set it? And then, as developers, could we practice what we preach and make a point of not setting __version__ for any new projects we release?

9 Likes

__version__ is guaranteed to be tied to an import package, and specifically the import package you are actually using right now.

importlib.metadata.version() has to refer to a distribution package, and AFAIK you can’t guarantee that it corresponds to the import package you get.

As long as it’s possible for these two to diverge because of conflicting packages, I am personally reluctant to agree that removing __version__ is a good idea. Its existence allows a quick check that what you are importing is actually what you are expecting to import.

A practical example where I have encountered this is a package where I am a co-maintainer. Its import name has always been lark, but we haven’t always had that as the PyPI name, so it used to be lark-parser. Now we have lark, but we haven’t taken lark-parser offline, so as to not break downstream packages. A few times we have gotten confusing bug reports, and asking the reporter to execute import lark; print(lark.__version__) is a quick and easy way to check for this issue.
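
For what it’s worth, here is a sketch of the check we effectively walk reporters through, querying both distributions at once:

import importlib.metadata

import lark

# Which copy did we actually import, and what do the two distributions claim?
print("imported:", lark.__version__, "from", lark.__file__)
for dist in ("lark", "lark-parser"):
    try:
        print(dist, "->", importlib.metadata.version(dist))
    except importlib.metadata.PackageNotFoundError:
        print(dist, "-> not installed")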

10 Likes

This also assumes that people are always installing the library. Sometimes libraries get vendored for various reasons, and library authors accepting bug reports for these still want to know what version was vendored, and may want runtime access to it when generating exceptions or user-facing errors.

7 Likes

Modern versions of setuptools support reading versions from arbitrary attributes and it’s trivial, so I wouldn’t call it “overcomplicated and breakable concoction”: Configuring setuptools using setup.cfg files - setuptools 71.0.1.post20240718 documentation
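
For reference, that looks roughly like this in pyproject.toml ("mypkg" is a placeholder):

[project]
name = "mypkg"
dynamic = ["version"]

[tool.setuptools.dynamic]
version = {attr = "mypkg.__version__"}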

flit-core supports reading __version__ from the package as well.

It trivially provides a clean, single source for a working version, whether you’re importing an installed package, running from a checkout, or using the package unpacked/copied anywhere. And it’s definitely less “complicated” (and faster) than reading the version from metadata – and unlike that approach, it works in the latter scenarios without employing “overcomplicated and breakable concoctions” to make it work when the package isn’t installed.

4 Likes

I tend to use this. But if import time is important to you (for example, for a CLI that needs quick start-up time), you can save some 34 ms by using a hardcoded __version__ string.

Yep, this is one option – let the end user call importlib.metadata.version() if and when they need it. It is less ergonomic than __version__, and a CLI might want to provide a --version flag, especially for users who are less familiar with Python and things like pip freeze | grep <name>.
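
A sketch of what that can look like without paying the metadata cost on every run ("my-cli" is a placeholder distribution name):

import argparse

def main() -> None:
    parser = argparse.ArgumentParser(prog="my-cli")
    parser.add_argument("--version", action="store_true", help="print version and exit")
    args = parser.parse_args()
    if args.version:
        import importlib.metadata  # deferred: normal runs never pay for this import
        print(importlib.metadata.version("my-cli"))
        return
    # ... normal CLI work goes here ...

if __name__ == "__main__":
    main()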

4 Likes

I thought most of importlib.metadata scales with the number of installed packages? (Essentially performing a linear search of your installed *.dist-info directories.) So it may be worse than 34 ms.

2 Likes

I assume that’s just the import overhead (on their machine), though I believe it’s improved in 3.13.

34 ms sounds enormous, how did you get such a number?

Here is an example comparison (Python 3.10, local SSD, inside a conda environment):

>>> import numpy as np
>>> import pyarrow as pa

>>> %timeit np.__version__
30.2 ns ± 0.0661 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> %timeit importlib.metadata.version('numpy')
335 μs ± 6.32 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

>>> %timeit pa.__version__
28.6 ns ± 0.113 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> %timeit importlib.metadata.version('pyarrow')
225 μs ± 753 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

importlib.metadata.version() is still considerably slower than attribute lookup, but not so much as to make a difference for CLI startup. You shouldn’t call it repeatedly, though: cache the result in a global variable instead, or perhaps wrap it in an lru_cache.
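
e.g. a small sketch of the cached form:

from functools import lru_cache
from importlib.metadata import version

@lru_cache(maxsize=None)
def dist_version(name: str) -> str:
    # First call reads metadata from disk; later calls are a cache hit
    return version(name)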

Running this with Python 3.13.0b3:

python -X importtime -c "from norwegianblue import __version__" 2> import.log && tuna import.log

Then hardcoding __version__ = "0.17.1.dev1" here, removing the import and comparing the total import times (of the second runs).

1 Like

Your timings use timeit which is a poor measure of anything that involves any kind of caching. I don’t know whether importlib caches these things but I would guess something like that is happening if your time shows it being 0.2 milliseconds.

I just compared total process time with the time command in the shell for an empty file and one that uses importlib.metadata to check a version. By my measure, importing importlib.metadata and calling version() adds about 20 milliseconds to the total process time. It takes 20 milliseconds to run an empty file and 40 milliseconds to run one that calls importlib.metadata.version("numpy"):

$ time python importlib_test.py
python importlib_test.py  0.04s user 0.02s system 45% cpu 0.136 total
$ time python empty.py
python empty.py  0.02s user 0.02s system 35% cpu 0.121 total

I haven’t tried to be more precise but the timing difference is reproducible across repeated runs.
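
For reference, the version-checking file is presumably just something along these lines:

# importlib_test.py
import importlib.metadata

importlib.metadata.version("numpy")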

2 Likes

Well, not exactly – it was rejected because the core devs thought it best to defer to the PyPA (or, more correctly, the folks focusing on packaging standards). I interpreted that as meaning that the core devs didn’t want to impose something on the packaging folks – not that they thought __version__ was necessarily a bad idea.

It does say: "The packaging ecosystem has changed significantly in the intervening years since this PEP was first written, and APIs such as importlib.metadata.version() [11] provide for a much better experience."

So I suppose it is endorsing that – but whether importlib.metadata.version() really provides a better experience or not is debatable – as we can see in this thread, it’s not necessarily better.

The fact is that when a PEP is rejected, it’s rejected, none of us take much time arguing about exactly why it was rejected, and whether that’s consensus or not.

Frankly, when Barry rejected it, I thought of trying to revive it, with some modification, but didn’t have the energy. In fact, it was rejected in response to me asking that it be finally ruled on – it had been languishing for ages.

The PEP was also complicated by the fact that it was making suggestions for the standard library as well – which were more controversial.

Anyway – for now:

Frankly, performance is my least concern, but the facts are:

  1. A LOT of packages currently use __version__, and have for years (decades?)

  2. importlib.metadata.version() is not a “better experience” for many use-cases[1].

  3. We have the tools – it’s just not that hard to provide a __version__ attribute, either by hard-coding it or by calling importlib.metadata.version().

So why not provide that easy and simple API [*] – and one that has a slightly different, but important meaning, as mentioned in this thread: a_module.__version__ is the version of the actual module that was imported – whether it has a different name than the distribution, whether it was installed as a “proper” package, etc.

So what I’d like to see as an official recommendation[2]:

If you have a version attribute in a module, it should be called __version__. We really don’t need VERSION and who knows what else out there as well.

If you provide a __version__ attribute, it must be in sync with what importlib.metadata.version() would provide, if it’s functional.

It is highly recommended that you use the tools to single-source the version and to keep everything in sync.
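
And if startup time is the worry, here is a sketch of one way to provide the attribute lazily, via PEP 562’s module-level __getattr__ ("mypkg-dist" is a placeholder distribution name):

# mypkg/__init__.py – "mypkg-dist" is a placeholder distribution name
def __getattr__(name):
    if name == "__version__":
        from importlib.metadata import version
        # Only computed when somebody actually asks for __version__
        return version("mypkg-dist")
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")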

[1] I really fail to see how:

import some_package
import importlib.metadata

print(importlib.metadata.version('some_package'))

is a “better experience” than:

import some_package
print(some_package.__version__)

never mind:

In [13]: import numpy as np

In [14]: import importlib.metadata

In [15]: importlib.metadata.version('np')
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
File ~/miniforge3/envs/py3/lib/python3.12/importlib/metadata/__init__.py:397, in Distribution.from_name(cls, name)
    396 try:
--> 397     return next(cls.discover(name=name))
    398 except StopIteration:

StopIteration: 

During handling of the above exception, another exception occurred:

PackageNotFoundError                      Traceback (most recent call last)
Cell In[15], line 1
----> 1 importlib.metadata.version('np')

File ~/miniforge3/envs/py3/lib/python3.12/importlib/metadata/__init__.py:889, in version(distribution_name)
    882 def version(distribution_name):
    883     """Get the version string for the named package.
    884 
    885     :param distribution_name: The name of the distribution package to query.
    886     :return: The version string for the package as defined in the package's
    887         "Version" metadata key.
    888     """
--> 889     return distribution(distribution_name).version

File ~/miniforge3/envs/py3/lib/python3.12/importlib/metadata/__init__.py:862, in distribution(distribution_name)
    856 def distribution(distribution_name):
    857     """Get the ``Distribution`` instance for the named package.
    858 
    859     :param distribution_name: The name of the distribution package as a string.
    860     :return: A ``Distribution`` instance (or subclass thereof).
    861     """
--> 862     return Distribution.from_name(distribution_name)

File ~/miniforge3/envs/py3/lib/python3.12/importlib/metadata/__init__.py:399, in Distribution.from_name(cls, name)
    397     return next(cls.discover(name=name))
    398 except StopIteration:
--> 399     raise PackageNotFoundError(name)

PackageNotFoundError: No package metadata was found for np

In [16]: importlib.metadata.version('numpy')
Out[16]: '2.0.0'

NOTE: I use the IPython example to make the point that interactive use IS an important use case.

Or maybe worse:

import bs4
import importlib.metadata

print(importlib.metadata.version('bs4'))

OOPS:

PackageNotFoundError: No package metadata was found for bs4

What the heck is the thing called ??? Oh yeah:

>>> importlib.metadata.version('beautifulsoup4')
'4.12.3'
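
(Aside: on Python 3.10+, importlib.metadata.packages_distributions() can answer the “what is it called” question programmatically – a sketch:)

import importlib.metadata

# Map import names to the distribution(s) that provide them (Python 3.10+)
print(importlib.metadata.packages_distributions().get("bs4"))  # e.g. ['beautifulsoup4']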

The key point is: we should make things easy and intuitive for all users: system developers, interactive data analysts, quick-script writers, etc.

AND we should provide the tools for package builders to do that easily which, for the most part, is done.

[2] If I were king, I’d say __version__ should always be provided (at the top level import) – but I think I’ve lost that battle before it starts :slight_smile:

6 Likes

By “if it’s functional” do you mean something like “If importlib would be reporting on the same copy of the module that was actually imported”?

Asking for pip show lark lark-parser (or the importlib.metadata.version() equivalent if you’re worried about xkcd-1987) would answer that question better since it’s explicit about which variant is installed.

Modern versions of setuptools support reading versions from arbitrary attributes

It is an improvement. One thing I really don’t like about it, though, is that the canonical definition is no longer pyproject.toml, so we throw away cross-build-system machine readability and, as far as I see it, all the good stuff that PEP 621 – Storing project metadata in pyproject.toml | peps.python.org could have provided. AFAIK, setuptools doesn’t expose any get_project_declared_version() functionality, so the world of tools reading package metadata is now in a worse position than it was before python setup.py --version was deprecated.

You only need to query the version if --version is actually requested. Normal usage doesn’t need to know.

In my tests, almost all of the added time is in importing importlib.metadata itself. The lookup time is so small I can’t even see it beneath the noise floor.

Fair enough. It was what the PEP said at least (and it certainly resonated with how I felt).

The user did have to figure out the distribution-vs-package name mismatch once in order to pip install the package, so I wouldn’t give them too little credit. If this is a problem, then I think we need to push towards a policy of package_name == distribution_name (which I’d be in favour of – it amazes me that no one has taken the opportunity to create a malicious package and upload it to PyPI under the name dateutil), since that mismatch is the real cause of confusion.

I’d also hope that promoting importlib.metadata.version() might implicitly raise awareness of what else is in that package, so that I see fewer cases of people calling pip freeze or pip show in a subprocess.

Well, that could be explained as: importlib.metadata is new; older packages couldn’t lean on something that didn’t exist, and now that it does, removing __version__ would be a breaking change (and, being an attribute, not one that is easy to deprecate); new projects then just copy the status quo.


It’s already looking like I picked a losing battle here but, as a less drastic request, could we at least tuck these not-so-good ways somewhere out of sight and make the tool.setuptools.dynamic.version.attr = "package.__version__" technique look like less of a footnote? At least that one can only fail at build time, so unless the packager is using sdists, a bug can’t get onto PyPI.

What I was thinking of is that the module may not be installed as a proper package at all, and thus importlib wouldn’t report anything. But I suppose yes, that’s another interpretation too. Not well worded – we’d need to find better words if we want to make this “official” :slight_smile:

All this is tied a bit to the perennial confusion between “package” and “distribution” – the words are too often confused. The importlib.metadata.version docstring isn’t bad:

Signature: importlib.metadata.version(distribution_name)
Docstring:
Get the version string for the named package.

:param distribution_name: The name of the distribution package to query.
:return: The version string for the package as defined in the package's "Version" metadata key.
File:      ~/miniforge3/envs/py3/lib/python3.12/importlib/metadata/__init__.py
Type:      function

Though maybe “Get the version string for the named package” should be replaced with “Get the version string for the named distribution”,
and
‘as defined in the package’s “Version” metadata key’
with
‘as defined in the distribution’s “Version” metadata key’

Now that I’ve said that, maybe that’s how we could be more clear about why we might want both __version__ and importlib.metadata.version:

Something like:

A package (or module)'s __version__ attribute provides the version of the imported package.

importlib.metadata.version provides the version of an installed distribution.
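
In code, the distinction reads like this (a sketch reusing the lark example from earlier in the thread):

import importlib.metadata

import lark

print(lark.__version__)                    # version of the package actually imported
print(importlib.metadata.version("lark"))  # version of the installed distribution
# The two can disagree if the import resolved to a vendored or shadowed copy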

3 Likes

I put in a PR for that page a while back, and it got so tangled up in the discussion of what the “best” way to do it was (i.e. this discussion) that it never got merged. And I can’t find the PR now – closed due to being stale?

NOTE: the page has been updated since then, but it still has the “wrong” way at the top – in my PR I had a section at the bottom, “no longer recommended approaches”. At least then, I thought it was good to keep them documented; there’s a lot of old code out there. But maybe it’s time to simply delete it.

Personally, I think the standard for merging a doc change should be more:

“is this better than what was there?”

than

“Is this the best we can do?”

– but what can you do?

And I’d think that we’d all agree now that the
“Read the file in setup.py and get the version.”

approach that’s at the top of that page is NOT recommended these days :frowning:

I’d hope a new PR would be accepted…

4 Likes

I like this, and it’s mostly the case, but I think it’s impossible to enforce it, for multiple reasons:

  1. Distributions need to have a unique name on PyPI, and no one is regulating that namespace. For example:

I have a package[*] I’ve been maintaining for years – it’s a pain in the #%$# to build (at least on Windows and Mac), so it wasn’t very helpful to put a source-only dist on PyPI. But I did make it available via conda-forge.

It turns out someone is already using the py-gd name on PyPI (for a small package that doesn’t seem to be maintained) – so I can’t use the import name as the distribution name :frowning:

The unmanaged namespace of PyPI is a big ol’ pain, but it’s what we’ve got. Oh well.

  2. A distribution can install more than one importable package.

and probably more …

[*] py_gd GitHub - NOAA-ORR-ERD/py_gd: python wrappers for libgd graphics drawing lib

Ooh, now there’s another if-I-were-king wish. I vividly remember questioning my sanity when I couldn’t get pip install pkg_resources to work. Then over the next few years, people kept coming to me with the same confusion. To this day, I still don’t understand why it was part of setuptools.

Well, Python documentation folks are generally aligned with the Diátaxis model (I’m not sure if there’s a precise statement of commitment, but then the model itself kind of discourages too much precision, in favor of just making progress)… and an important concept is to keep making small incremental improvements. So I’d sure hope we don’t get hung up on “finding the best before making an update” over making those improvements. So I, at least, agree with Chris on what the standard should be for doc improvements…

3 Likes

Contribute to this guide - Python Packaging User Guide says:

The project aspires to follow the Diátaxis process for creating quality documentation.

Style guide says for the main Python docs:

Python’s documentation strives to follow the Diátaxis framework.