Dynamic versions in editable installations

pf_moore · April 25, 2022, 9:45am

The (importable) package name and the project name need not be the same - for example, pkg_resources is made available by the project setuptools. So you should be checking the version of setuptools, not of pkg_resources. And yes, there’s no automatic way of getting the project name from the importable file, so it needs to be hard coded. But what’s wrong with that? I’m assuming this is ad hoc code investigating why something has gone wrong, so hard coding is fine, surely?

Do whatever suits you, though.

abravalheri · April 25, 2022, 10:08am

Reading the BPOs, it would seem that __spec__.name is a suitable replacement, wouldn’t it? (Maybe it does not work 100%, but for installed packages being invoked via console_scripts entry-points or via python -m it should work…)

pf_moore · April 25, 2022, 10:23am

>>> import importlib.metadata
>>> import pkg_resources
>>> pkg_resources.__spec__.name
'pkg_resources'
>>> importlib.metadata.version('pkg_resources')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\importlib\metadata\__init__.py", line 984, in version
    return distribution(distribution_name).version
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\importlib\metadata\__init__.py", line 957, in distribution
    return Distribution.from_name(distribution_name)
  File "C:\Users\Gustav\AppData\Local\Programs\Python\Python310\lib\importlib\metadata\__init__.py", line 548, in from_name
    raise PackageNotFoundError(name)
importlib.metadata.PackageNotFoundError: No package metadata was found for pkg_resources

abravalheri · April 25, 2022, 10:31am

Ah, OK, I understand now what you mean, sorry for the noise. Thank you very much Paul.

CAM-Gerlach · April 25, 2022, 10:48am

Sure, nothing wrong with that at all, particularly for this use case—I was mostly just pointing that out as a sidenote on the code in the OP and another caveat to note when relying on such, and should have mentioned the important point you raise that the code also breaks if the name of the distribution package and import package were not identical.

That gets the name of the import package, I’d presume, not the distribution package (as @pf_moore points out).

takluyver · April 26, 2022, 4:27pm

A few notes on __version__ from my perspective:

- As @blink1073 pointed out, Flit (optionally) looks for this to get the version when it’s building a package.
- I think the pkg.__version__ convention is still pretty strong in the scientific Python world. We tend to be fairly slow moving with these things.
- I use editable installs quite a bit, so I still see significant value in having a version number from the code I imported, not from whenever I did an editable install.
  - Editable installs are kind of a hack, and have major limitations. But when the limitations don’t apply, they’re a wonderfully convenient hack.
  - Edit: It also means you can have a version number for code you run or import without installing it as a Python distribution at all - small scripts, modules placed next to the script or made available by PYTHONPATH or sys.path manipulations. Sometimes this is done for bad reasons, but these are valid things to do in various situations, and Python supports them.
- I dislike packages getting their __version__ dynamically, whether it’s from package metadata or running git describe. And I dislike code that does different things depending on if it thinks it’s installed or in their source directory. I agree with @pf_moore that if you’re providing a __version__ attribute, it should be a constant string in the code. There are tools to help keep it up to date (I tend to use bump2version, mostly out of habit).

So I’m ambivalent about getting rid of them. I dislike slow imports as importlib.metadata scans all installed distributions, but I’d be a bit sad to lose meaningful information about editably installed packages.
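A constant `__version__` string has the side benefit that tools can read it without importing the package. The following is a minimal sketch of that idea using only the standard library `ast` module; it is not Flit's actual implementation, and the helper name `read_static_version` is made up for illustration.

```python
import ast


def read_static_version(source):
    """Return the value of a top-level `__version__ = "..."` assignment, or None.

    Only constant string assignments are recognised; anything computed at
    import time is deliberately ignored.
    """
    tree = ast.parse(source)
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == "__version__":
                value = node.value
                if isinstance(value, ast.Constant) and isinstance(value.value, str):
                    return value.value
    return None


print(read_static_version('__version__ = "1.4.2"\n'))
```

A dynamically computed `__version__` (for example one built from a `git describe` call) is reported as `None` here, which is one concrete sense in which a constant string is easier for tooling to work with.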

tacaswell · June 21, 2022, 4:20pm

I agree with @takluyver that this pattern is very well established in the scientific Python world, but I would go further: I think we should not remove them.

If I am checking the version of a module in code it is almost always either to debug a broken environment (which, as @CAM-Gerlach points out above, may be a very broken mix of system packages, conda, and pip), in which case I do not trust the packaging system, or to do version gating to work around a known bug / missing feature, in which case I have the module in hand and would like to just ask it its version.

A thing you can do with the __version__ convention (or anything that operates on the already imported module) is:

import sys


import matplotlib
import pandas
import scipy
import requests
# ...

for pkg, version in sorted(
    {
        k: getattr(v, "__version__", "N/A")
        for k, v in sys.modules.items()
        if "." not in k
    }.items()
):
    # could be clever and filter things from the standard library here
    print(f"{pkg:<25} {version}")

and get a really good diagnostic / watermarking tool. If I am reading this thread correctly there is no generic way to write this with importlib? Using pip list or similar is not quite the same as it is not automatically filtered to what is actually imported. It seems a shame to lose an easy way to get this information.
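The "could be clever and filter things from the standard library" comment can be sketched as follows. This assumes Python 3.10 or newer for `sys.stdlib_module_names` (older versions simply get no filtering), and unlike the loop above it skips modules without a `__version__` string instead of printing "N/A". The function name `imported_versions` is only illustrative.

```python
import sys


def imported_versions(include_stdlib=False):
    """Map top-level imported module names to their `__version__`, if any."""
    # sys.stdlib_module_names was added in Python 3.10; fall back to no filtering.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    report = {}
    # Copy the items first so later imports cannot mutate the dict mid-loop.
    for name, module in list(sys.modules.items()):
        if "." in name:
            continue
        if not include_stdlib and name in stdlib:
            continue
        version = getattr(module, "__version__", None)
        if isinstance(version, str):
            report[name] = version
    return report


for name, version in sorted(imported_versions().items()):
    print(f"{name:<25} {version}")
```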

One way to deal with the version metadata for a develop-installed package is to put in a sentinel that means “this is installed” and satisfies any version checks from the point of view of other packages. I think this solves the problem that the metadata the package management system knows should be static, but acknowledges that if you have installed a package in “develop” mode you probably also want to override any (old) information other packages think they know about what versions they support. This can be particularly useful for maintainers of a library who are trying to test bug fixes against something that depends on the library but has put an upper cap on the version support due to the bug they are trying to fix.

One place I disagree with @takluyver is that I am a fan of projects that use git describe to sort out their version string when installed in a “develop” mode, but when “installed” the version should be pre-resolved down to a string as part of the installation/build process.

I have always understood the git tag to be the ground truth for what a given release “is”, with everything else (wheels, sdist, conda packages, .deb, .rpm, …) being strictly derived artifacts. With that view, develop-installed versions correctly (and dynamically) reporting their version (with git sha!) make a lot of sense to me.
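A rough sketch of the "ask git in develop mode, otherwise use a baked-in string" pattern, under a few assumptions: the `git` flags shown are one reasonable choice rather than anything prescribed in this thread, the helper name `describe_or_fallback` is hypothetical, and the raw `git describe` output is returned as-is (it is generally not a PEP 440 version, so a real project would normalise it).

```python
import subprocess


def describe_or_fallback(source_dir, fallback):
    """Return `git describe` output for source_dir, or fallback if unavailable."""
    try:
        completed = subprocess.run(
            ["git", "describe", "--tags", "--dirty", "--always"],
            cwd=source_dir,
            check=True,
            capture_output=True,
            text=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git missing, directory missing, or not inside a usable repository.
        return fallback
    return completed.stdout.strip() or fallback
```

A package could call something like this from a source checkout and ship the fallback string (written at build time) in wheels and sdists, so installed copies never shell out to git.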

fungi · June 21, 2022, 4:59pm

I may be misunderstanding what you’re suggesting, but I rely on
importlib.metadata.distributions() with Python 3.8 and newer to
provide a list of distribution objects for installed packages, and
then access their metadata via the provided attrs such as version.

There is also a way to do the same with pkg_resources on older
Python (i.e. 3.7).

That said, I can’t recall trying to inspect editable/develop
installs in such a manner.
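For reference, a small sketch of that approach on Python 3.8+. The function name `installed_versions` is illustrative; the `.get()` calls are a defensive choice for environments with incomplete metadata, and the listing covers everything installed, not only what has been imported.

```python
import importlib.metadata


def installed_versions():
    """Map distribution names to versions for everything visible on sys.path."""
    versions = {}
    for dist in importlib.metadata.distributions():
        metadata = dist.metadata
        # dist.version is shorthand for metadata["Version"]; .get() tolerates
        # broken installs where a field is missing.
        name = metadata.get("Name")
        version = metadata.get("Version")
        if name and version:
            # Keep the first match, which should follow sys.path precedence.
            versions.setdefault(name, version)
    return versions


for name, version in sorted(installed_versions().items()):
    print(f"{name:<25} {version}")
```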

rgommers · June 21, 2022, 5:18pm

Another thing you can do is a simple evaluation during debugging, which is very useful and I for one end up doing a lot:

>>> np.__version__
'1.24.0.dev0+291.g2c5f407cf6'
>>> import importlib.metadata
>>> importlib.metadata.version("numpy")
'1.22.3'

importlib is wrong here; it doesn’t seem to understand PYTHONPATH. I don’t care too much that it’s less reliable here, since it’s completely unergonomic anyway for interactive use.
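When the two sources disagree like this, it can help to look at them side by side together with the file the module was actually loaded from. A minimal sketch, with a made-up helper name (`version_report`) and the distribution name passed in by hand because it cannot be derived reliably from the module:

```python
import importlib.metadata


def version_report(module, dist_name):
    """Collect both version sources for one imported module."""
    try:
        metadata_version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        metadata_version = None
    return {
        "module_file": getattr(module, "__file__", None),
        "attr_version": getattr(module, "__version__", None),
        "metadata_version": metadata_version,
    }
```

For the session above this would be called as `version_report(np, "numpy")`; a `module_file` outside the directory holding the `.dist-info` is a strong hint that the metadata describes a different copy of the package.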

Agreed

brettcannon · June 21, 2022, 9:07pm

Depends on what you mean by generic:

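import importlib.metadata
import sys

# Snapshot the names: metadata lookups can trigger imports that mutate sys.modules.
for name in list(sys.modules):
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        continue
    else:
        print(f"{name:<25} {version}")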

brettcannon · June 21, 2022, 9:09pm

Please open an issue if it’s wrong as it should be based on sys.path and finding the appropriate .dist-info directory.

tacaswell · June 21, 2022, 10:22pm

Depends on what you mean by generic:

Fair, but fixing the name inconsistencies between the import and the package name is fundamentally a social (and likely intractable) problem, whereas adding a __version__ string to the top level module (seems like it) is a very tractable technical problem.

The importlib approach just seems very roundabout. I have a reference to the thing I want the version of in my hand, I want to ask it what its version is! Even if the PYTHONPATH issue @rgommers identified was fixed, there would still be issues with it being mutated between the import and the importlib call.

I think relying solely on importlib to get package versions requires assuming both that the only way to get an importable package is via an installation process and that the metadata associated with that installation is right. This thread started by talking about how to deal with editable installs, where we are sure that the second assumption is not correct, and as @takluyver noted there are other ways to get something importable (including editable installs, $PYTHONPATH hacking, (too) clever import hooks, just making a module object and putting it in sys.modules, or other exotic things (I have heard rumors of institutions that keep their code in a database and import foo does a query, gets the source, and builds the module)), so the first is not true in general either. You might say “Do not do those things!” which is a fair position, but those things still need a way to carry a version around.

Please forgive my ignorance, but is there a way for a package to update its metadata at import time? That could also be a way out of this, as all of the dynamic modules could be required to register themselves. It would also make sense to me to extend version so that

import foo
import importlib.metadata

# proposed behaviour: accept the module object itself, not a distribution name
print(importlib.metadata.version(foo))

works as expected (maybe by first trying foo.__version__ internally).

CAM-Gerlach · June 22, 2022, 7:23am

The missing piece is mapping import packages/modules to distribution packages, which might have different names and are not necessarily a 1:1 relationship. However, in importlib.metadata in Python 3.10, and in any recent importlib_metadata version, you can use packages_distributions() to get this mapping.

Or, much better, simply use importlib_metadata on all Python versions; there’s little reason to use the legacy pkg_resources at all anymore.

Just to note, I don’t know the details of your specific case, but there are a substantial number of potential edge cases when mixing pip and conda packages, particularly when involving multiple conda packages that map to the same PyPI distribution package (e.g. -base packages), where the conda, .dist-info and __version__ metadata (i.e. the actual import package) are inconsistent, and the latter is generally the closest source of truth to “what version am I actually importing and running”.

Brett Cannon:

Depends on what you mean by generic:

for name in sys.modules:
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        continue
    else:
        print(f"{name:<25} {version}")

To note, this doesn’t really work as is, since the names in sys.modules are the names of the import packages, while importlib.metadata.version operates on the names of the distribution packages, which could be entirely different (or not even 1:1).

However, packages_distributions() will translate between them, so (namespace packages aside), the following should work:

import importlib.metadata
import sys

# packages_distributions() requires Python 3.10+ (or the importlib_metadata backport)
package_distribution_map = importlib.metadata.packages_distributions()
for import_name in list(sys.modules):
    try:
        dist_name = package_distribution_map[import_name][0]
    except KeyError:
        continue
    else:
        version = importlib.metadata.version(dist_name)
        print(f"{import_name:<25} {version}")
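The "namespace packages aside" caveat comes from taking only the first entry of each list. A variant that keeps every distribution providing an import name could look like the sketch below; the function name `imported_distributions` is illustrative, and on Python older than 3.10 it simply returns an empty mapping rather than falling back to the backport.

```python
import importlib.metadata
import sys


def imported_distributions():
    """Map each imported top-level name to {distribution name: version}."""
    # packages_distributions() needs Python 3.10+; return nothing on older versions.
    get_mapping = getattr(importlib.metadata, "packages_distributions", None)
    if get_mapping is None:
        return {}
    mapping = get_mapping()
    top_level_names = {name.partition(".")[0] for name in list(sys.modules)}
    report = {}
    for import_name in sorted(top_level_names):
        versions = {}
        for dist_name in mapping.get(import_name, ()):
            try:
                versions[dist_name] = importlib.metadata.version(dist_name)
            except importlib.metadata.PackageNotFoundError:
                continue
        if versions:
            report[import_name] = versions
    return report


for import_name, versions in imported_distributions().items():
    print(f"{import_name:<25} {versions}")
```

An import name that maps to more than one distribution then shows up with several entries instead of silently reporting whichever happened to be listed first.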

I’m not sure I follow. This is a purely technical problem, and a solved one: it’s a simple matter of looking up the import name in the dict returned by importlib.metadata.packages_distributions(), which yields the distribution package name.

However, I still agree with keeping a static __version__ attribute, for the reasons mentioned by @takluyver and myself.

Using __version__ certainly has substantial value in specific scenarios, such as troubleshooting, development/editable installs, and quick interactive use. However, when checking the version programmatically in production code, particularly for things like feature-flags, polyfills and the like, using a distribution package’s version from its metadata via importlib.metadata is strongly preferable, since it enforces a number of guarantees that hold for all versions of all packages (minus fixable bugs or an already broken environment):

- It is always present, rather than depending on the whim of the package author.
- It is the version as uploaded to PyPI, installed by pip, required by dependencies, and pinned by lock files (presuming a non-broken environment, otherwise all bets for anything are off anyway), as opposed to an internal string that doesn’t necessarily correspond to anything other tools see.
- It follows a standardized, interoperable format, rather than being an arbitrary string (or even something else).
- It is stored and can be read statically, rather than relying on importing the package and executing code.

Just to note, aside from editable installs, pretty much all those things are not really good ideas and don’t follow modern accepted packaging standards, practices and conventions, so it’s hard to see why nominally proscribing the inclusion of a __version__ attribute would have any effect on those already highly non-standard use cases anyway.

Sure there is; you could, for example, include a small block of code in the package’s __init__.py that rewrites the Version key in the package’s METADATA in .dist-info to whatever is in __version__. But like a guide to bomb-making, I hesitate to provide any further instructions on how to do this lest some poor soul actually try and blow themselves (and much worse, others) up with it.

pf_moore · June 22, 2022, 10:20am

More explicitly, it’s absolutely not supported to modify package metadata in this way, either via code or by hand. Following the broad description here would break the install (by invalidating hashes of installer-controlled files), for example.

It’s no more supported to modify the version metadata than it is to update __init__.py at runtime to change a __version__ = "x.y" line and expect that to work…
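To make the hash point concrete: an installed distribution's RECORD file lists each installed file with a digest (urlsafe base64, padding stripped), and METADATA is one of those files. The sketch below recomputes the digests through `importlib.metadata`; the function name `modified_files` is made up, and treating unreadable files as modified is a choice made for this example.

```python
import base64
import hashlib
import importlib.metadata


def modified_files(dist):
    """Return paths listed in RECORD whose on-disk hash no longer matches."""
    mismatched = []
    for entry in dist.files or []:
        if entry.hash is None:
            # RECORD itself (and some generated files) carry no hash.
            continue
        try:
            data = entry.locate().read_bytes()
        except OSError:
            # Listed but unreadable or missing: report it as changed.
            mismatched.append(str(entry))
            continue
        digest = hashlib.new(entry.hash.mode, data).digest()
        encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
        if encoded != entry.hash.value:
            mismatched.append(str(entry))
    return mismatched
```

Calling `modified_files(importlib.metadata.distribution("some-name"))` on an install whose METADATA was rewritten at import time would list that file, which is the kind of inconsistency an installer or auditing tool may trip over.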

That’s only partially true, IMO. importlib provides machinery to let you build all sorts of ways of getting importable packages. And importlib.metadata extends that machinery to allow you to expose metadata. Unfortunately, the ABCs importlib.metadata relies on aren’t documented, but that’s a problem for core Python to address.

It’s still, of course, possible to do low-level things like creating module objects and stuffing them into sys.modules. That’s why there’s a distinction between “(import) modules/packages” and “distribution packages” - importable things can exist without being associated with distributions.
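A tiny illustration of an importable thing with no distribution behind it. The module name is hypothetical and chosen so that no installed distribution should match it.

```python
import importlib.metadata
import sys
import types

# Hypothetical module name, chosen so that no real distribution matches it.
module = types.ModuleType("adhoc_unpackaged_demo")
module.__version__ = "0.3.1"
sys.modules[module.__name__] = module

import adhoc_unpackaged_demo  # resolved straight from sys.modules

try:
    dist_version = importlib.metadata.version("adhoc_unpackaged_demo")
except importlib.metadata.PackageNotFoundError:
    dist_version = None

print(adhoc_unpackaged_demo.__version__, dist_version)
```

The module-level version is available, while the metadata lookup finds nothing because there is no `.dist-info` to read.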

The conflict here is that both importable packages and distributions can be versioned. And there’s no necessary reason why those versions have to be the same (even though often they are). You seem to be interested in importable package versions, not distribution versions. That’s not something the packaging community has considered in detail, because our focus is on distributions.

Maybe what’s needed here is a separate standard, unrelated to distribution metadata, which defines how to get the version of an importable package. That would be a Python language (informational) standard rather than a packaging one, IMO. It would make sense to me that such a standard should cover such questions as:

The obvious approach would be to allow packages/modules to have a special __version__ attribute. But there should probably be an API to read that attribute, to allow for fallbacks as noted below.
Many people prefer to “single source the version”, so the approach should support existing approaches to set the distribution version in only one place. One of which is to not provide a __version__ attribute in the module…
If a package is a top-level package associated with an installed distribution, the package’s version should be the same as the distribution version. And if the package doesn’t explicitly set __version__, the API should get the distribution version on the user’s behalf.
Some decision should be made about submodules - what happens if you ask for the version of (say) rich.progress? Should it return the version of rich? Is rich.progress even allowed to have a __version__? Before you answer that, consider namespace packages where foo.mod1 and foo.mod2 might be distributed independently.
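Purely as a sketch of what such an API might look like, not an existing or proposed standard: `get_module_version` is a hypothetical name, and the answers it gives to the questions above (submodules defer to the top-level package, ambiguous namespace packages yield `None`) are just one possible set of choices.

```python
import importlib.metadata
import sys


def get_module_version(module):
    """Hypothetical API: an explicit __version__ wins, else ask the distribution."""
    top_level = module.__name__.partition(".")[0]
    # Check the module itself, then (for submodules) its top-level package.
    for candidate in (module, sys.modules.get(top_level)):
        explicit = getattr(candidate, "__version__", None)
        if isinstance(explicit, str):
            return explicit
    # Fall back to distribution metadata; packages_distributions() needs 3.10+.
    get_mapping = getattr(importlib.metadata, "packages_distributions", None)
    candidates = get_mapping().get(top_level, []) if get_mapping else []
    if len(candidates) != 1:
        # Unknown, or a namespace package shared by several distributions.
        return None
    try:
        return importlib.metadata.version(candidates[0])
    except importlib.metadata.PackageNotFoundError:
        return None
```

Under these choices a package that single-sources its version in the distribution metadata needs no `__version__` attribute at all, while scripts and ad hoc modules can still carry one.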

I think the best way of making progress here is for someone to look at independently standardising “versions for importable Python modules”. There was an attempt to do this some years ago in PEP 396, but it was ultimately rejected in favour of distribution versions. Reviewing the discussions on that PEP would likely give some background. I’m also pretty sure there was a more recent discussion on the subject here on Discourse, but I can’t find it right now. A new PEP, explaining that there are different use cases for versions on importable modules as distinct from distributions, and presenting an import module based mechanism, could argue that the conclusion reached for PEP 396 doesn’t reflect the current reality.

Or, of course, we could simply accept the current mostly-OK status quo, and not try to over-engineer something just for the sake of some corner cases.

tacaswell · June 22, 2022, 11:29pm

I agree with this and will try to find some co-authors at SciPy in July (but also will not be in the slightest bit sad if someone does this without me).

Fair, but the consensus at the top of this thread seemed to be "mod.__version__ is a boondoggle from a previous age and should be removed from libraries", not "mod.__version__ solves a different problem and should ...", so I am not sure what the mostly-OK status quo is (or whether everyone agrees on what it is).

Ah, thank you and I apologize for my opinions getting ahead of my knowledge.

ofek · June 23, 2022, 12:22am

If we want to read the room here, I’m strongly in favor of such a proposal. I always ship the version for use at runtime, just like I do for other languages that compile to a single executable.

pf_moore · June 23, 2022, 9:16am

My view (for what it’s worth) is that some people seem to find mod.__version__ useful, and so use it, but it’s only a convention and not everyone chooses to follow it, which is OK. I don’t have a use for module-level versions, so __version__ is not very important for me, although I do tend to set it (to the distribution version), just in case anyone wants it.

The original part of the discussion was about changing the distribution version (in the package metadata) without reinstalling (in editable installs), which I think is a mistake, and which will almost certainly cause more trouble than it solves.

rgommers · June 23, 2022, 2:59pm

Thanks @brettcannon. Done in “importlib.metadata.version can return None” (issue #91216 in python/cpython on GitHub), an already open issue for importlib.metadata.version.

brettcannon · June 23, 2022, 10:23pm

Prior PEP art on this subject is PEP 396 – Module Version Numbers (peps.python.org).

CAM-Gerlach · June 23, 2022, 10:34pm

Pinging @barry in case he’s interested