Generating PEP-440 compliant development releases containing githash

jimmywan · December 2, 2021, 8:01pm

tl;dr
I want to generate releases tagged with a githash to ease rapid shared development. Think shared development of editable installs without having to rewrite all the tooling. Surely I’m not the only person who has ever wanted to do this. Are there any valid alternatives while remaining compliant with PEP-440?

I think I can solve my problem by only attaching a githash to versions that already carry one of the supported suffixes (dev, alpha, etc.), knowing full well that I lose the ability to publish releases at that level, sans local version, that can be reliably retrieved later.

Is this the only/best way to accomplish my goal?

Problem Statement

I’ve encountered the need to internally distribute development releases of libraries in order to ease developer workflows on interim work. Anyone who has come from, say, the Java world is likely very familiar with the concept of snapshot releases. For anyone familiar with git, it should also be readily obvious how a single linearly increasing version scheme is problematic when there is a need for independent but concurrent in-progress work.
For example, two different branches that each need a “dev” release, without knowledge of each other, will always choose the exact same “next” version when using the usual schemes made available via, say, bumpversion.

I was initially able to get things working using something like “0.0.0.dev-c2bf818a7a6492524e3e74f78cd770ebe01d9ee0” under poetry, but that violates the version scheme dictated by PEP-440, so I tried to use local version identifiers.

That solution works when using poetry and “will install” using pip, but pip will also happily satisfy an “==0.2.0” requirement by installing a “local version”. Note that poetry does not, which is likely in violation of PEP-440.

> pip install my-lib==0.2.0
...
Successfully installed my-lib-0.2.0+dev.df6283508e018618914d269432c7ad3fc4aed1c5

Having now read the fine print carefully, I realize that the use of local version identifiers for this purpose does not match the stated purpose of local versions and results in incorrectly installed dependencies.
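The matching behaviour described above can be sketched in a few lines. This is a simplified model of the PEP-440 rule for `==`, not pip's actual implementation: it skips version normalization entirely, and the helper names `split_local` and `exact_match` are made up for illustration.

```python
from typing import Optional, Tuple


def split_local(version: str) -> Tuple[str, Optional[str]]:
    """Split 'public+local' into its two parts (local is None if absent)."""
    public, sep, local = version.partition("+")
    return public, (local if sep else None)


def exact_match(specified: str, candidate: str) -> bool:
    """Simplified model of `==` matching with local version labels.

    If the specifier has no local label, the candidate's local label is
    ignored. If the specifier has one, both parts must match exactly.
    """
    spec_public, spec_local = split_local(specified)
    cand_public, cand_local = split_local(candidate)
    if spec_local is None:
        return spec_public == cand_public
    return (spec_public, spec_local) == (cand_public, cand_local)


# A bare "==0.2.0" accepts a candidate that carries a local label.
print(exact_match("0.2.0", "0.2.0+dev.df62835"))
# A specifier with its own local label only accepts that exact label.
print(exact_match("0.2.0+dev.6d606f3", "0.2.0+dev.df62835"))
```

Under this model the first call is a match and the second is not, which mirrors the pip output shown in this post.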

I did at least discover that pip seems to happily respect specified local versions.

❯ pip install 'my-lib==0.2.0+dev.6d606f3753d83df8fa6a8a080f3c07375f4beab0' | tail -1
Successfully installed my-lib-0.2.0+dev.6d606f3753d83df8fa6a8a080f3c07375f4beab0
❯ pip install 'my-lib==0.2.0+dev.df6283508e018618914d269432c7ad3fc4aed1c5' | tail -1
Successfully installed my-lib-0.2.0+dev.df6283508e018618914d269432c7ad3fc4aed1c5

Is there any way for me to safely create/use githash based releases in a pip-safe manner other than my stated example of choosing an arbitrary set of semantic versions which can be safely discarded?

Misc notes:

My same approach seems to work fine with say Java/JS (pnpm, yarn, etc). The opinionated restrictions that forced me to try local versions only seem to be present in Python, as local versions appear to be the only place that I can inject a githash.
My examples all use long githashes, but if it makes a difference we were already thinking about switching to short githashes.
pip itself does not appear to have any way to ignore local versions (flags on install). Probably for good reason.
As mentioned above, poetry does not auto-select local versions. While helpful to me, it’s probably in violation of PEP-440.
I’m not aware of any way to explicitly tag my githash-based local version as being a prerelease version.
I’ve specifically avoided linear releases due to their problematic nature during concurrent development and the associated tooling headaches. I’ve done this in the past where the solution was to just keep running bumpversion until it succeeds, and it’s not great. I specifically wanted to avoid having to deal with extraneous bumpversion conflicts which is why my githash based approach updates the version in-place and never commits those specific version changes to source control. As it’s githash based, there’s no reason to commit bumpversion’s changes.
I’ve specifically avoided direct linking to source control due to problematic tool support. Getting local tooling with say Docker as well as all of our CI/CD to work with source-control based dependencies requires a lot of extraneous credentials management that I’d really like to avoid, such as injecting ssh keys, setting up ssh agents, etc.
I know this can be solved via a cleaner and more orderly progression of fully tested/specced library changes down the entire transitive dependency tree, but let’s face it, we live in the real world and development progress can get messy.
Circumventing this via editable installs can be an absolute nightmare. Sometimes you just really want to share a new version of a transitive dependency without forcing someone to do an error prone 35 step install using git checkout and pip install -e.

fungi · December 2, 2021, 8:24pm

The way pbr addresses this is to add a
separate package metadata file where it can store the Git commit ID,
and then use devNNN releases precalculating the next possible
release it would be a prerelease of. It’s able to do this by
assuming strict adherence to the Semantic Versioning specification,
and allows developers to provide hints in commit message footers for
whether a particular commit implies a minor or major version
increase will be coming.

The general challenge you’re going to run into is that most folks
want “versions” which always increase, but if you have pre-release
development going on in multiple topic branches for example, it’s
not always clear when one dev “version” is earlier or later than
another. PBR takes a simplistic approach of simply counting the
number of commits since the last Git tag (I expect setuptools-scm
does similarly), but different development branches can easily have
differing commit histories resulting in misleading relationships
between higher and lower dev numbers.

You could, of course, attempt to serialize the commit id or some
abbreviation of it into an integer for a dev version suffix, but the
ordering problem will become far more pronounced as there is now no
clear sequence at all.
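A minimal sketch of that last idea shows why the ordering breaks down. The helper name `hash_to_dev_number` and the 8-digit truncation are assumptions for illustration, and the two commit IDs are simply ones that appear earlier in this thread; which of them was committed first is not known, so the "first"/"second" labels are hypothetical.

```python
def hash_to_dev_number(commit_id: str, digits: int = 8) -> int:
    """Interpret the first `digits` hex characters of a commit ID as an integer."""
    return int(commit_id[:digits], 16)


# Hypothetical history: assume `first` was committed before `second`.
first = "c2bf818a7a6492524e3e74f78cd770ebe01d9ee0"
second = "6d606f3753d83df8fa6a8a080f3c07375f4beab0"

first_version = f"0.2.0.dev{hash_to_dev_number(first)}"
second_version = f"0.2.0.dev{hash_to_dev_number(second)}"
print(first_version, second_version)

# The "earlier" commit gets the larger dev number, so any tool that sorts
# these versions would treat it as the newer one.
print(hash_to_dev_number(first) > hash_to_dev_number(second))
```

The integers are valid dev segments, but they sort by hash value rather than by history, so the resulting order is effectively random.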

jimmywan · December 2, 2021, 8:34pm

The intention here is that opt-in is always explicit to a specific version. As such, if there’s no implicit automatic opt-in, there’s no need for such ordering to exist.
These are typically short-lived use cases where ordering is largely irrelevant and usage comes in the form of explicitly linked draft PRs.

Absolutely, that’s why I chose the git-based approach of not even trying to order them. If you need to discern ordering you can just go look at the trees in source control and figure it out for yourself there.

By forcing these changes onto a linear scheme, you’re implying an ordering of changes that is likely to be incorrect or misleading.

fungi · December 2, 2021, 9:13pm

Ordering can be useful given appropriate constraints. For example,
many projects publish sdists and wheels of the heads of public
(development or stable) branches. If the history up to the point
where that value was calculated is immutable, it’s fairly safe to
assume that 1.2.3.0dev456 comes before 1.2.3.0dev789 as long as your
reference is the packages from that authority. If, however, you’re
generating versions from your local topic branch, comparing them to
the published package versions is not guaranteed to provide the same
ordering as your local history may not match the authoritative one.

It’s this precise use case which caused the PBR authors years ago to
abandon attempts to encode the commit ID in PEP-440 versions, and
instead store it in separate metadata so that packages can still
query and report it at runtime for debugging purposes (a command was
also included to list the known commit IDs for all installed Python
packages to ease inspection and cataloguing of the installed
environment).

EpicWink · December 2, 2021, 11:39pm

TLDR: check out what setuptools-scm does, and consider hosting the packages on an internal HTTP server

setuptools-scm does something similar, with adding the number of commits as a .dev release and also appending the Git short SHA and date, for example: 0.5.6.dev2+g6de2635.d20211201.
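To make the pieces of that example string concrete, here is a small sketch that splits it into its parts. The regular expression only covers this one shape (release, `.dev` count, `+g` short SHA, optional `.d` date); it is not setuptools-scm's own parser, and the names `SCM_STYLE` and `parse_scm_style` are made up for illustration.

```python
import re

# Matches strings shaped like the example above, e.g. 0.5.6.dev2+g6de2635.d20211201
SCM_STYLE = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)"   # public release segment, e.g. 0.5.6
    r"\.dev(?P<distance>\d+)"        # dev segment: number of commits
    r"\+g(?P<node>[0-9a-f]+)"        # local label: 'g' + short Git SHA
    r"(?:\.d(?P<date>\d{8}))?$"      # optional local label: 'd' + YYYYMMDD
)


def parse_scm_style(version: str) -> dict:
    """Return the named parts of an scm-style version, or raise ValueError."""
    match = SCM_STYLE.match(version)
    if match is None:
        raise ValueError(f"not an scm-style dev version: {version!r}")
    return match.groupdict()


print(parse_scm_style("0.5.6.dev2+g6de2635.d20211201"))
```

Everything before the `+` is the public version that installers compare and order; the SHA and date live in the local version label.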

Pip shouldn’t install pre-releases (a/b/rc) unless a pre-release version is specified (eg pip install 'foo>=1.1a0'), and I think development releases act the same way (although I haven’t tested).

The recommendation is for development releases to not be distributed via a package index, so Pip won’t have to resolve a development version. I recommend serving the packages using a simple HTTP server (internal or with basic auth), so Pip’s resolution can be bypassed.

jimmywan · December 6, 2021, 5:10pm

Thanks, I was not aware of setuptools-scm. I will check that out. I’m hesitant to create yet another package repository due to operational complexity, but will keep that solution in our back pocket.

I think the problem here is that we allowed the flexibility to attach the githash-based local version on minor/patch releases. My understanding is that PEP-440 differentiates between prerelease/postrelease/standard versions based upon the version itself as stated here.
At the end of the day, it’s more of my RTFM problem wrt PEP-440 than a pip problem, per se.

So going back to my original statement, I think the following adjustment to my approach takes care of the pre-release issue:

“I think I can solve my problem by only attaching a githash to versions that already carry one of the supported suffixes (dev, alpha, etc.), knowing full well that I lose the ability to publish releases at that level, sans local version, that can be reliably retrieved later.”
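One way this adjusted approach could look in practice is sketched below. The fixed `.dev0` segment, the `g` prefix on the hash (borrowed from the setuptools-scm example earlier in the thread), the 7-character short hash, and the function name `githash_dev_version` are all assumptions for illustration. The regular expression is a deliberately narrow sanity check, not the full PEP-440 grammar.

```python
import re

# Narrow check for "<release>.dev<N>+<local>" only; not the full PEP-440 grammar.
_DEV_WITH_LOCAL = re.compile(r"^\d+(\.\d+)*\.dev\d+\+[a-z0-9]+(\.[a-z0-9]+)*$")


def githash_dev_version(next_release: str, commit_id: str, short: int = 7) -> str:
    """Build '<next_release>.dev0+g<short hash>' for an unreleased commit."""
    version = f"{next_release}.dev0+g{commit_id[:short].lower()}"
    if not _DEV_WITH_LOCAL.match(version):
        raise ValueError(f"unexpected version shape: {version!r}")
    return version


print(githash_dev_version("0.2.0", "df6283508e018618914d269432c7ad3fc4aed1c5"))
# 0.2.0.dev0+gdf62835
```

Because the public part is now `0.2.0.dev0` rather than `0.2.0`, a plain `==0.2.0` requirement should no longer match these builds, while pinning the full string including the hash still selects one specific build.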

adriangb · July 20, 2022, 9:58pm

One interesting idea to get ordered unique dev builds is to use the current timestamp as the monotonically increasing value that is unique to each CI run. I suppose one could even get the timestamp for the commit itself, making it uniquely tied to the commit but also ordered in a sensible way.
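A rough sketch of the commit-timestamp variant follows. It assumes the timestamp is obtained separately, for example from the output of `git log -1 --format=%ct`, which prints the committer date as a Unix timestamp. The function names and the choice to use raw epoch seconds as the dev number are illustrative, not something proposed in detail in this thread.

```python
def timestamp_dev_version(next_release: str, commit_epoch_seconds: int) -> str:
    """Build '<next_release>.dev<epoch seconds>' from a commit timestamp."""
    if commit_epoch_seconds < 0:
        raise ValueError("timestamp must be non-negative")
    return f"{next_release}.dev{commit_epoch_seconds}"


def dev_number(version: str) -> int:
    """Extract the integer after '.dev' so two builds can be compared."""
    return int(version.rsplit(".dev", 1)[1])


# Hypothetical timestamps for two commits made a minute apart.
earlier = timestamp_dev_version("0.2.0", 1638475260)
later = timestamp_dev_version("0.2.0", 1638475320)
print(earlier, later)
print(dev_number(earlier) < dev_number(later))
```

Two caveats apply to this sketch: commits created within the same second would collide, and commit dates can be set by the committer, so the ordering is only as trustworthy as the timestamps themselves.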