Please make `package.__version__` go away!

you found it – thanks!

Turns out I dropped the ball at the last comment:
" @ChrisBarker-NOAA Do you still intend to work on this PR? If not, I’d be willing to submit a redux PR (possibly more focused)."

So yes, I’m happy for someone to pick it up – sorry I got burned out and lost track.

1 Like

This is always preferable for situations where import time is costly, notably CLIs (as Hugo mentioned) and serverless applications.

Interesting to see this come up again!

#1276 was my PR from a year ago; the say-no-to-__version__ branch had grown merge conflicts since then, but I’ve updated it after seeing this discussion.

Unfortunately @ChrisBarker-NOAA and I seem to have mutually derailed each other’s PRs, so the terribly outdated “single-sourcing version” page survives another year.

The arguments for keeping around a __version__ attribute, here and on the issue tracker, are weak and unconvincing. Simply removing the page is still the best option currently.

For packages which have historically provided a __version__ attribute, and want to move away from it in a backwards-compatible way, I think the example of attrs is interesting to study. Hynek uses a module __getattr__ hook to provide a __version__ attribute if and when someone asks for it; this way the package does not incur the upfront cost of retrieving a mostly-useless attribute eagerly at import time. As an added bonus, you also have the opportunity at that point to issue a deprecation warning advising users what to do instead of accessing a __version__ attribute (i.e. to access the package metadata).
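For anyone who hasn’t seen the pattern, here is a minimal sketch of that lazy approach (mypackage is a placeholder name; attrs’ actual implementation differs in its details):

```python
# mypackage/__init__.py – a sketch of the PEP 562 module __getattr__
# hook described above. "mypackage" is a placeholder name.
import warnings


def __getattr__(name):
    if name == "__version__":
        # Only pay the metadata-lookup cost if someone actually asks,
        # and nudge them toward the package metadata while we're at it.
        warnings.warn(
            "Accessing mypackage.__version__ is deprecated; use "
            "importlib.metadata.version('mypackage') instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        from importlib.metadata import version

        return version("mypackage")
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```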

3 Likes

no, they are strong and convincing :wink:

Anyway, this doc should be capturing the state of the practice, not providing an opinionated idea of what people should do.

Maybe – better than keeping something so old and outdated.

I think slightly better though is to replace this page with one that essentially says something along the lines of:

"""
It is best practice to specify the version number of your distribution in a single place, rather than trying to keep it in sync in more than one place.

Consult your build system’s docs for how best to do that.
"""
And maybe a set of links to the most common build systems.

I’ll go put a comment on the GitHub PR(s).

4 Likes

@matthewfeickert and I did start drafting a PEP to restore __version__; however, life got in the way and we did not make any additional progress on it.


I think the best argument is that if version information can only come from importlib.metadata.version then we are saying only things that are “installed” can have versions which seems overly limiting.

If someone proposed having a runtime dependency on some other packaging system (e.g. Debian, Arch, or conda), I think that would be roundly rejected. In my view the *.dist-info directories are part of the pip/wheel packaging ecosystem, not part of the runtime (it is my understanding that, except where importlib reaches out to read them, we could purge the dist-info files from disk without any runtime effect), yet the recommended way to get the version at runtime is to rely on them. If you are looking at __version__ in code, that is fundamentally a runtime question, not a packaging question, and we should have a runtime way to answer it that does not depend on the packaging system – just as every packaging system can query the version without depending on the runtime.
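To make the two questions concrete, a sketch (somepackage is a placeholder; e.g. a plain checkout importable via sys.path answers the first question but not the second):

```python
# Two different questions: __version__ asks the module object in hand,
# while importlib.metadata asks the installer's dist-info records.
# "somepackage" is a placeholder name.
from importlib.metadata import PackageNotFoundError, version

import somepackage

# Runtime question: what version of this module am I running?
print(somepackage.__version__)

# Packaging question: what version does the install metadata claim?
try:
    print(version("somepackage"))
except PackageNotFoundError:
    # Importable (e.g. a checkout on sys.path) but never installed,
    # so there is no dist-info to consult.
    print("no dist-info found")
```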


I think it was a mistake to cross the streams to begin with, but I have no expectation that we can go back or re-open that discussion. However, I request that the official line not be actively hostile to __version__ and, at a minimum, leave the door open to us writing a Scientific Python Ecosystem Coordination (SPEC) document so that we can have __version__ as a standard in the scientific python ecosystem (where it is already a de facto standard).

7 Likes

Out of curiosity, can anyone else think of a sane way to get the version of an arbitrary project that uses a version attr, without building it? I maintain a tool which hinges on being able to read a project’s metadata from its project configuration files, so PEP 621 felt like a dream come true until I saw the escape hatch. If it’s your own project then you can just cheat and roll a quick grep, but doing it generally seems to require reimplementing whatever setuptools does to map project.version to project/__init__.py, where the path is affected by src layouts and the maze of sources.include/find options. I was able to (begrudgingly) add support for setuptools-scm, since at least they provide an equivalent public API, but setuptools has never liked being used from Python.
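To be fair, the static case is trivial – it’s the dynamic escape hatch that forces you into build-backend internals. A sketch of the split (tomllib requires Python 3.11+):

```python
# Reading a statically declared PEP 621 version is easy; a dynamic one
# is not, because only the build backend knows how to resolve it.
import tomllib

with open("pyproject.toml", "rb") as f:
    project = tomllib.load(f)["project"]

if "version" in project:
    # Static metadata: machine readable, no build step required.
    print(project["version"])
elif "version" in project.get("dynamic", []):
    # The escape hatch: e.g. setuptools may map a version attr onto
    # project/__init__.py, with the path depending on layout options.
    print("dynamic version: ask the build backend")
```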

I think the best argument is that if version information can only come from importlib.metadata.version then we are saying only things that are “installed” can have versions which seems overly limiting.

When is a collection of Python code deemed formal enough to warrant proper released versioning (as opposed to just running stuff straight out of git), but not formal enough to be given the (IMO more important) benefits of sys.path discoverability, easy one-line pip install commands, and handling of dependencies? To me, there’s no middle ground between a few ad-hoc Python scripts and a proper installable package (well, there is, but I do my best to avoid it).

Thanks – really great points, reminded me of a (maybe not very good) analogy:

When I want to know what version of a command line utility I’ve got, I type:

git --version

or

clang --version

or

whatever --version

I certainly could “simply” use my packaging system to check for me – but wait, which one? yum? apt? brew? And what if a command is installed as part of a larger package whose name I don’t know?
Yes, Python is more tightly coupled to its packaging system than *nix is, but the idea is similar.

And it’s a common use case (for me, anyway) to do

python -c "import this; print(this.__version__)"

In the end there are two perfectly reasonable questions to ask:

  1. what version of a given distribution is properly installed in this Python instance?

  2. what version of this module am I running right now?

Sure – most of the time, those are the same question, but not always.

The folks that are heavily involved with PyPA are the folks most interested in packaging, and their experience and focus is, of course, on what I might call “proper” packages:

Managed in a VCS, published to PyPI, distributed to a wide range of users, etc…

However, I think there is also a bit of a focus on a certain class of end users as well – folks building “systems”, like web services and the like, rather than, say, scripting, or data analysis, or hacking at the REPL, or other more “casual” uses of Python.

And that’s great, because those are the hard problems and really need standardization.

But there are a LOT of users out there that are not doing either of those things – Python and its tools should be friendly to those users as well.

Which is what I don’t understand – some of us are saying that __version__ is useful and helpful for us – and we’re being told:

no – you shouldn’t do that – you should use this more robust, but more awkward, thing instead. Oh, and you should deprecate all the existing use cases as well.

Really?

I do understand why folks don’t think __version__ is the best solution to the broader use-case – what I don’t understand is the hostility to it. As I pointed out above:

a) __version__ is currently used by a lot of projects, and has been for years.
b) we have the tools[*] to generate a __version__ attribute and have the distribution properly versioned with a single source of truth.

So why are folks so opposed to __version__? I just don’t get it. What does it cost anyone?

8 Likes

I spend most of my coding life in the middle ground :slight_smile:

Which may be why we have different opinions on this topic – that middle ground is pretty much where __version__ makes the most sense.

But there’s a middle ground you don’t seem to be considering – the folks writing “a few ad-hoc Python scripts” but using proper packages from PyPI.

Those folks may want to simply type a_module.__version__ and be done.

3 Likes

This is going to be context dependent. It will depend on the sophistication of the person writing the code, whether they have a CI/CD pipeline (which cuts multiple ways), how many users of the code there are, what packaging environment you are using, … . Even if you are “just running out of git”, you still want the ability to get a sensible version! Editable installs are (and always have been) core to my personal workflow and to that of many of the scientists I support [1]. In cases where you are running data analysis it is very common to care exactly which commit the results were generated with (e.g. to sort out whether things are before or after a bug was introduced/fixed), but doing a “release” on every commit is not really sensible.

As I’ve argued elsewhere (Should sdists include docs and tests? - #120 by tacaswell), the “Truth” of versioning is in your VCS; everything else (.debs, .rpms, wheels, sdists, files dumped in site-packages, …) is a strictly derived artifact [2], and how the version is made available to mod.__version__ can be up to the package. Using dist-info is certainly one (common) way, but it should not be the only way.


My (and, putting words in his mouth, @ChrisBarker-NOAA’s) point is that we should not write in stone where the threshold for opting into a process that uses dist-info should be, nor what “deserves” a version based on how the module was made importable.

Fundamentally, at runtime you do not care (and in my view should not care) how the module you are using was installed nor what one (of the possibly multiple overlapping) packaging ecosystems thinks the version should be, you care what the version of the module object you have in hand is.


  1. we have made some choices that make editable installs hard, and it has almost led to pitchforks ↩︎

  2. In this view, putting the version in pyproject.toml is the first offender of multiple sources of truth, and setuptools_scm is the correct way to handle embedding version strings in build artifacts. ↩︎

5 Likes

In the end there are two perfectly reasonable questions to ask:

  1. what version of a given distribution is properly installed in this Python instance?

  2. what version of this module am I running right now?

Sure – most of the time, those are the same question, but not always.

One example of when they aren’t the same is when working with editable installs. When I install using pip install -e /path/to/my/package and later check out a different version, pip show still shows the version that was checked out when I called pip install -e. In such cases, __version__ is great.

1 Like

Version numbers for editable installs do not make a lot of sense to me. If the source code changes, the version number needs to change. Releases have versions. The editable install is usually a temporary state between two released versions, or an unreleased ad-hoc experiment, and does not have a well-defined version number at all.

If you use an editable install in production, or deploy with a git pull, you’re going to need much more than a __version__ attribute to help you on the straight and narrow.

This is a bit hyperbolic. As far as my position goes, the packaging.python.org guide should not recommend adding a __version__ attribute, nor demonstrate any build-backend techniques to keep attributes in sync with metadata. Tools such as pyscaffold should not auto-generate templates that put a __version__ attribute into source.

As for deprecating existing uses, that was prefaced with “For packages which have historically provided a __version__ attribute, and want to move away from it…”.

If an author/maintainer wants a version attribute and finds it useful, go ahead and have one, but I’d be against a PEP attempting to standardize that practice.

It’s pretty widely done to produce a synthetic version for in-progress work, to avoid confusion with released versions and other in-progress versions. I think even setuptools has support for this, although I may be misremembering where I’ve seen implementations… e.g. algorithms like “if you’re at a tag that looks like a version, use it; else backtrack to such a tag and add to it information taken from the date, commit hash, whatever makes sense”.
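A rough sketch of that algorithm on top of git describe (setuptools-scm implements this properly, e.g. it also bumps the version so the dev release sorts after the tag; this assumes version-like tags such as 1.2.3):

```python
# A toy "synthetic version": use the tag if we're exactly on one,
# otherwise combine tag, distance, and commit hash into a PEP 440
# dev release with a local version label.
import subprocess


def synthetic_version() -> str:
    # e.g. "1.2.3" at a tag, or "1.2.3-5-gabc1234" five commits past it
    out = subprocess.run(
        ["git", "describe", "--tags"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    if "-" not in out:
        return out  # exactly at a release tag
    tag, distance, ghash = out.rsplit("-", 2)
    return f"{tag}.dev{distance}+{ghash}"  # e.g. "1.2.3.dev5+gabc1234"
```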

4 Likes

Who says that anyone is using editable installs in production?

I use editable installs for development, but they still need to function correctly, which means that different modules need to be able to check each other’s versions. Here SymPy checks the version of an optional dependency, python-flint, using __version__. Everything is expected to break at runtime if SymPy tries to use python-flint when the versions don’t match the tested ranges.
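The general shape of that gate is easy to express against __version__ (a hedged sketch, not SymPy’s actual code; the bound below is made up for illustration):

```python
# Gate an optional dependency on the version range that is actually
# tested. The bound is illustrative, not SymPy's real one.
try:
    import flint  # python-flint's import name
except ImportError:
    HAVE_FLINT = False
else:
    _ver = tuple(int(p) for p in flint.__version__.split(".")[:2])
    # Older versions are known-incompatible; newer ones are allowed
    # because there is a commitment to keep testing against them.
    HAVE_FLINT = _ver >= (0, 6)
```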

If I am debugging something and have editable installs of both SymPy and python-flint and I am bisecting one of them or trying different versions of one or both then how is that supposed to work with importlib.metadata?

9 Likes

I’m surprised that editable installs are a factor here. If you edit an editable install then it’s no longer any version, so both importlib.metadata.version() and .__version__ are misinformation. Likewise if you git checkout any commit that isn’t tagged and released. Your version is at best your commit hash. That also goes for projects that aren’t fully fledged pip install-able projects but still want to be versioned.

So why are folks so opposed to __version__? I just don’t get it. What does it cost anyone?

To be clear, I’m not opposed to __version__ itself (and yes, that contradicts the title of this thread). What I oppose is:

  • Developers thinking that __version__ is anything more than legacy/convention and that they must set it whether they find it valuable or not
  • The wide range of wrong ways currently used to handle the duplication. I don’t like version.attr (mostly because it makes the pyproject.toml no longer machine readable, but that’s because I’m in the very small minority of people who maintain something that has to read metadata from arbitrary project sources), but it works, so if every __version__ enthusiast adopted it then I’d be happy (the configuration in question is sketched below)
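For reference, the version.attr escape hatch mentioned above looks roughly like this (mypackage is a placeholder name):

```toml
# pyproject.toml – setuptools' dynamic-version escape hatch: the
# version is declared dynamic and sourced from mypackage.__version__
# at build time, keeping a single source of truth in the source tree.
[project]
name = "mypackage"
dynamic = ["version"]

[tool.setuptools.dynamic]
version = {attr = "mypackage.__version__"}
```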

I’m not fond of there being such a thing as a module version, but as long as it’s careful not to confuse users with the idea of there being two versions, it can stay.

Honestly, if I were ever bisecting python-flint to find which commit broke sympy, I’d say that python-flint really needs some better testing. This shouldn’t be seen as a regular use case.

All libraries have bugs. Maintainers will bisect cpython itself to find bugs at times; no level of testing prevents that. This becomes more true the more widely used a library is. I’ve encountered bugs that no one had reported, present for several years in libraries with millions of installs, and the maintainers agreed with the report.

So yes, I find a library-level version quite helpful for editable installs/internal usage, and support it as a good convention.

6 Likes

Speaking as a maintainer of both projects I can say that the reason for checking __version__ against that particular range of versions is because those are the version combinations that will be tested for compatibility. It is known that older versions will not work. It is not known that future versions will not work but there is a commitment to test those particular version combinations going forwards.

Regardless, though, I might be bisecting sympy to find a bug in sympy that is unrelated to python-flint. As I go forwards and backwards through the sympy versions, I will need it to enable/disable usage of python-flint to ensure that at least import sympy still works.

2 Likes


I think this is the key to the core of the conflict, as I very much disagree with this.

Using PEP 440 it is possible to generate a valid version string for every commit in a VCS (X.Y.Z.devN+gHHH is valid, as is X.Y.Z.devN+dirty) [1]. These releases are not “final releases” per the terminology of the PyPA and not something we want downstream packagers to distribute (e.g. conda, linux distros, pypi, …), but they are well-defined “versions” nonetheless [2], and quite useful.
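These forms parse cleanly with the packaging library, for example:

```python
# Dev releases with local version labels are valid PEP 440 versions,
# even though they are not "final releases".
from packaging.version import Version

for s in ("3.9.0", "3.9.0.dev42+g1a2b3c4", "3.9.0.dev42+dirty"):
    v = Version(s)
    print(f"{s}: dev={v.is_devrelease}, local={v.local}")
```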

One example of leveraging this is the nightly wheels we maintain in the scientific python space, so that we and our downstream packages can run CI against development versions [3]. Looking at the wheels for Matplotlib, we are making good use of all of PEP 440.

In the case of the nightly wheels we have gone through a build process and can generate a static version string, but for local development it is orders of magnitude more pleasant to work with editable installs. With editable installs __version__ (which can be computed at runtime) will be correct, but importlib.metadata may not be. For example, with Matplotlib we now use meson for our build system (which automatically recompiles c-extensions as needed), so you only have to run pip install -e . once, and then any commit you change to [4] “just works” and has the correct matplotlib.__version__. However, the version according to pip/importlib is whatever it happened to be when you initially ran the install [5]. This may not be a use case everyone has, but if you have it, this functionality is critical!

Developers thinking that __version__ is anything more than legacy/convention and must set it whether they find it valuable or not

We also have decades of teaching our users to look at mod.__version__ to get the version of a package. Dropping the usage of __version__ [6] will require a massive reeducation campaign and code churn for (to me) nominal benefit.

The wide range of wrong ways currently used to handle the duplication.

Coming at this a different way, with my project-maintainer hat on: there is unavoidable duplication of the version information. The “Truth” is someplace between the VCS (tags), a static file in the source, and a social construct, which is quite reasonably project-dependent. That information then gets projected into __version__ (so our users have runtime access to the version of the code they are running), into the filename of the sdist, and into dist-info. The version is then further reproduced into the binary-packaging ecosystems’ metadata (wheels, conda, debian, fedora/rhel, macports, nix, …) in whatever way each ecosystem encodes it. Each of these copies is important, each serves a different need, and each has different stakeholders/caretakers. I’m not sure they even are collapsible (for both technical and social reasons), let alone whether they should all be collapsed. Once you accept you are going to have N copies, going to N-1 is not very compelling.


  1. per Version specifiers - Python Packaging User Guide “Identifying hash information may also be included in local version labels.” ↩︎

  2. unless there is an effort to remove dev, post, and local from the version strings that I am not aware of. ↩︎

  3. You might say, “well, that is just a sign you need more testing”, but bugs (and intentional breaking changes) happen in all projects. For example, like @mdrissi noted I end up bisecting CPython about once a month to figure out why some downstream package broke on the main branch. ↩︎

  4. to be pedantic, any commit after we change the build system to meson. ↩︎

  5. my understanding is that this is still not technically possible to fix, as there is a hard-coded string in dist-info/METADATA, but I have not been fully following those discussions. If this is possible now, can someone point me to it so we can fix Matplotlib? ↩︎

  6. assuming for argument we can fix the version that importlib/pip reports. ↩︎

8 Likes

“legacy” and “convention” are different – it is very clearly “legacy”, but I would argue that it is still very much a “convention”, and some of us think it is a useful convention that should maybe even be further codified.

Sure – but that’s what that single-source page, written so long ago, was for. And why I want to keep (an updated version of) it. But tools can (and do) solve that problem.

Then you don’t like any dynamic attributes in a pyproject.toml, yes? And what about pulling the version from VCS, which some folks think is the “true right way” to do it?

Going to pyproject.toml was an attempt to get away from the free-form setup.py approach to a fully declarative approach, but maybe that can’t be done :frowning:

Hmm – going really off-topic here – but maybe it’s time to introduce another step in the build process. We already “first build an sdist, then build the wheel from that”.

So maybe there should be a “first build a fully declarative pyproject file, then build the sdist from that” step (or put a generated file in the sdist?).

2 Likes

This makes a tremendous amount of sense to me.