Lock files, again (but this time w/ sdists!)

a-reich · February 24, 2024, 5:37pm

It may be informative to consider the conda-lock project for comparison? (I understand the ecosystems are different so it’s not apples to apples, I don’t mean to start another digression about that).
It seems like the design and use case is more similar to the current proposal - users give the locker a set of platforms and each distribution entry in the lockfile is tagged with which platform it’s locked for. The design is intended for the “benefit of acting as an external pre-solve for conda as the lockfiles it generates results in the conda solver not being invoked when installing”.

charliermarsh · February 24, 2024, 5:40pm

I agree with this in principle and I also want something like this, but isn’t a “Python platform” way more than an OS and architecture? Even if the goal is to do something that covers 90%+ of common cases, and to accept that life will be less straightforward for the remaining 10%, I think it is still more difficult than as described here.

It’s possible that I’m misunderstanding something about the proposal, so let me focus on a concrete example. Let’s say that your dependencies in the lockfile are exactly: dependencies = ["build"] . Very straightforward, nothing unusual, extremely popular package. However, build itself has a dependency that looks like this: 'importlib-metadata >= 4.6; python_full_version < "3.10.2"'. (This is real, not contrived.)

What does the resulting lockfile look like?

Are there separate entries for python_full_version < "3.10.2" and python_full_version >= "3.10.2"? How does the resolver know to perform separate resolutions for those two cases? How do installers know which to choose, since there wouldn’t be any difference in the tags IIUC?

(Asking very earnestly, apologies if I’m misunderstanding.)

radoering · February 24, 2024, 5:43pm

Although Poetry currently does a resolution restricted to the locked package versions when installing, I believe it would be possible for a universal resolver like Poetry’s to add the resulting marker conditions for each locked package/version to the lock file. (Marker conditions for two versions of the same package will be mutually exclusive.) Then, the installer only has to evaluate the marker condition of each locked package/version to decide if it’s relevant for the target environment. It only has to choose the best of the locked distribution files of the relevant locked package versions. In other words, I believe it is possible for a universal resolver to create a lock file so that the installer does not have to do a resolution, only evaluate marker conditions and choose one of the locked distribution files. That’s my vision of an environment-independent lock file.
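
A minimal sketch of the install step described above, assuming each locked package/version carries its resulting marker condition. Everything here is hypothetical: the entry layout, the version numbers, and the use of plain Python predicates in place of real marker strings. The point is only that the installer filters entries and never resolves.

```python
def version_tuple(text):
    """Turn '3.10.2' into (3, 10, 2) for a simple numeric comparison."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical lock entries; versions are placeholders, not a real resolution.
LOCKED = [
    {"name": "build", "version": "1.0.3", "applies": lambda env: True},
    {
        "name": "importlib-metadata",
        "version": "7.0.1",
        # Stands in for the marker: python_full_version < "3.10.2"
        "applies": lambda env: version_tuple(env["python_full_version"]) < (3, 10, 2),
    },
    {
        "name": "colorama",
        "version": "0.4.6",
        # Stands in for the marker: os_name == "nt"
        "applies": lambda env: env["os_name"] == "nt",
    },
]


def select(locked, env):
    """Keep only the locked entries whose condition holds for this environment."""
    return [(e["name"], e["version"]) for e in locked if e["applies"](env)]


print(select(LOCKED, {"python_full_version": "3.12.1", "os_name": "posix"}))
print(select(LOCKED, {"python_full_version": "3.9.18", "os_name": "nt"}))
```

A real tool would store marker strings and evaluate them with a marker parser rather than lambdas; the lambdas just keep the sketch dependency-free.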

OK, I forgot dependency groups in the previous paragraph. But I think, it’s the same. We don’t do it but it might be possible for a Poetry-like resolver to track (and lock) which groups require a locked package.

I absolutely agree. We might export the standard format for other tools if the demand is high enough but probably will not use it for ourselves.

pf_moore · February 24, 2024, 5:48pm

… and you have no interest in submitting the poetry format for standardisation, to avoid having to support 2 formats?

pf_moore · February 24, 2024, 5:53pm

I’d suggest that a lockfile (as I understand the term) would be for a specific Python version. It seems weird to me to imagine insisting that we have to use Sphinx 7.2.6, but we can use whatever Python version we want…

(Of course, that means I’m expecting most lockfiles in the proposed format to have requires-python pinned to a specific version, which isn’t a constraint I’d considered until now).

Not at all, your questions are insightful and helpful. Keep them coming

cemici · February 24, 2024, 6:05pm

As this thread has established, the proposal here is not suitable as a replacement for the poetry.lock. I do not see how submitting the poetry format for standardisation would reduce any support burden in poetry.

(Conceivably it would have other benefits eg that poetry and pdm lockfiles would become interchangeable, though I do not see a demand for that…)

poetry export is a plugin that is capable (except for bugs) of exporting a fully marker-annotated requirements.txt. It could make sense for the format proposed here to become a target for poetry export; it does not work for the format proposed here to be used in place of poetry.lock.

ofek · February 24, 2024, 6:52pm

The more I think about this design the more I like it and appreciate the fact that there is no resolution at the point of installation thereby also allowing for experimentation for resolvers. Before I forget I would like to note two things that resolvers could experiment with:

For such resolvers that choose to use target triples it would be awesome to copy what Zig does and allow appending a specific version of glibc for Linux e.g. x86_64-unknown-linux-gnu.X.Y which then could be translated appropriately to manylinux tags that would be locked.
Resolvers could come up with their own UX for overrides e.g. a particular dependency at a particular version should use a particular fork in Git. Projects such as PDM, UV, and Poetry already have such configuration so they would not have to change or adopt any standardized config that they may deem inferior to their ideas.
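
To make the first idea concrete, here is a rough sketch of how a resolver might translate such a target string into manylinux tags. The function name, the accepted input shape, and the glibc 2.5 floor are all assumptions for illustration; real tools apply per-architecture floors and extra compatibility checks.

```python
def manylinux_tags(target, oldest_minor=5):
    """Expand 'x86_64-unknown-linux-gnu.2.28' into manylinux tags, newest first.

    `oldest_minor` is an assumed floor (glibc 2.5), not a rule from the thread.
    """
    triple, _, glibc = target.partition(".")
    arch, _vendor, system, abi = triple.split("-")
    if system != "linux" or abi != "gnu":
        raise ValueError(f"not a glibc Linux target: {target!r}")
    major, minor = (int(part) for part in glibc.split("."))
    # A wheel built against an older glibc also runs on a newer one, so every
    # minor version from the requested one down to the floor is acceptable.
    return [f"manylinux_{major}_{m}_{arch}" for m in range(minor, oldest_minor - 1, -1)]


tags = manylinux_tags("x86_64-unknown-linux-gnu.2.28")
print(tags[0], "...", tags[-1])
```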

groodt · February 24, 2024, 6:57pm

The way I’m now thinking about this proposal is that it replaces the pip-tools style of locked requirements.txt. It improves on them because it addresses some shortcomings in that it can support multiple environments and has more useful metadata.

It would also be a more robust replacement for anyone using “pip freeze” as a mechanism to reproduce an environment.

radoering · February 24, 2024, 7:16pm

Good question. I haven’t thought about it yet.

The first question is: Should I propose the current format, which requires re-resolving, or should I propose what I described as “my vision”? Standardizing the current format makes the transition to “my vision” more difficult so I probably wouldn’t want it to be a standard. Since “my vision” cannot be written by any tool at the moment and I only believe it is (not sure it really is) possible for Poetry, it may turn out that it is impractical for any resolver to create this format. That’s not a good starting point for standardization either.

Further, only universal/environment-independent resolvers will be able to create such a format. I don’t think that resolvers that solve for a specific environment (afaik pip) will be able to create this format. How shall they calculate resulting marker conditions if they only evaluate marker conditions by inserting values?

All in all, I think it’s too early considering my limited resources.

In case uv decides to take this path and is willing to try what I described to make re-resolving unnecessary, they might be earlier in a position to propose such a format as standard.

pf_moore · February 24, 2024, 7:40pm

Fair enough. My key point was that unless someone actually describes the use case that Poetry-style locking solves but Brett’s proposal doesn’t, I don’t see a way forward other than “propose an alternative standard (i.e. standardise Poetry’s approach)”. But from what you say, it looks like Poetry’s approach isn’t really in a position to be standardised, which is fine.

I’d still love it if someone could clearly explain at least one use case that needs whatever it is that Poetry does, which Brett’s proposal doesn’t. At the moment I feel like everyone’s being expected to “just know” what lockfiles are for…

groodt · February 24, 2024, 8:09pm

I think the use case would be something along the lines of:
“generate universal constraints on any platform, install from universal constraints on any platform”

So in this scenario, somebody on Linux can produce a lockfile for somebody on Windows, macos, and any environment not specified ahead of time.

An important distinction is that it may still fail to install on an exotic platform. (Late evaluation)

The proposal (eager evaluation) would appear to fail-fast and refuse to produce a lockfile for any target environments that it wouldn’t be possible to install into at some later point. It gives you stronger guarantees in some sense. For example, if you know you are targeting some “cloud lambda” runtime, and you produce a lockfile for some “cloud lambda” runtime ahead of time, you can broadly send the lockfile to “cloud lambda” runtime and it could fetch and install and run (ignoring a whole suite of sdist or network issues) and you’d have greater guarantees it would work.

cemici · February 24, 2024, 8:25pm

And the way in which the current proposal does not work for that use case is that it requires the resolver to enumerate a combinatorially large number of possible environments in the lock file.

eg consider the build example from earlier: among other things its requirements include:

  'colorama; os_name == "nt"',
  'importlib-metadata >= 4.6; python_full_version < "3.10.2"',
  'tomli >= 1.1.0; python_version < "3.11"',

which already splits python versions into three ranges, and os name into two categories.

A “universal” lockfile for anything depending on build is in a six-way world before it has any other dependencies at all. It does not take very much more variation before this becomes unusable.
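
The six-way count can be checked with a short enumeration. This is only an illustration of the arithmetic: the sample Python versions are arbitrary representatives of each range, "posix" stands in for every non-"nt" value, and `extras_for` hard-codes the three markers quoted above rather than parsing them.

```python
from itertools import product

# One representative Python version per range implied by the two version markers.
PYTHON_SAMPLES = {
    "< 3.10.2": (3, 9, 0),
    ">= 3.10.2, < 3.11": (3, 10, 5),
    ">= 3.11": (3, 12, 0),
}
OS_NAMES = ["nt", "posix"]


def extras_for(py, os_name):
    """Conditional dependencies of build that apply in this environment."""
    deps = []
    if os_name == "nt":
        deps.append("colorama")
    if py < (3, 10, 2):          # python_full_version < "3.10.2"
        deps.append("importlib-metadata")
    if py[:2] < (3, 11):         # python_version < "3.11"
        deps.append("tomli")
    return deps


worlds = {
    (label, os_name): extras_for(py, os_name)
    for (label, py), os_name in product(PYTHON_SAMPLES.items(), OS_NAMES)
}
for key, deps in worlds.items():
    print(key, deps)
print(len(worlds), "environments,", len({tuple(d) for d in worlds.values()}), "distinct dependency sets")
```

Every further independent two-way marker in the tree doubles the count again, which is where the combinatorial growth comes from.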

groodt · February 24, 2024, 8:30pm

Right, but I think the proposal wouldn’t attempt to enumerate them by sniffing the markers.

I believe the proposal starts by declaring a fixed number of target environments and then produces lockfiles for those.

Installing into environments not defined in the lockfile is “unsupported” / all bets are off. It may indeed work, but the lockfile won’t guarantee that it will.
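
As a sketch of that behaviour, assuming a lock that is just a mapping from declared targets to the exact files chosen for them (the key shape, file names, and exception type are invented for illustration, not taken from the proposal):

```python
# Hypothetical in-memory lock: each declared target maps to the files locked for it.
LOCK = {
    ("linux", "x86_64", "3.12"): ["example_pkg-1.0-cp312-cp312-manylinux_2_28_x86_64.whl"],
    ("win32", "AMD64", "3.12"): ["example_pkg-1.0-cp312-cp312-win_amd64.whl"],
}


class UnsupportedEnvironment(LookupError):
    """Raised when the lock was never resolved for the requested target."""


def files_for(lock, target):
    """Return the locked files for a declared target; refuse anything else."""
    try:
        return lock[target]
    except KeyError:
        raise UnsupportedEnvironment(f"lock file has no entry for {target}") from None


print(files_for(LOCK, ("linux", "x86_64", "3.12")))
try:
    files_for(LOCK, ("darwin", "arm64", "3.12"))
except UnsupportedEnvironment as exc:
    print("refused:", exc)
```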

cemici · February 24, 2024, 8:31pm

sure, but that is saying the same thing from the other end. The result of locking only for a handful of known environments is that you do not have a universal lock file - which is what poetry is trying to achieve.

BrenBarn · February 24, 2024, 8:32pm

I agree with both these comments. To me this seems like the most important thing. Even if the proposal is separate from any concrete tools, it would be useful to see something like “The intent is that with this proposal, we can create tools that will allow a user to do X”.

As usual, I actually think that it is important to make a decision on this matter (or at least clarify it somewhat) before we get too deep into the discussion of any particular proposal. There is no point debating the technical details of a proposal unless, at the conceptual level, we think it can support the use cases that users currently feel are lacking.

If I understand right, the main point of contention is whether the lockfile specifies exactly what will be installed, or only specifies (as @sirosen called it) a “boundary” around what may be installed. What use cases would push users toward wanting one of these behaviors versus the other?

When I’m thinking about using lock files I’m mostly thinking about things like deploying a web app or distributing a desktop app. The goal is to set up a “known working” environment. From this perspective I feel like a “strict” lockfile makes more sense, because if the platform where I’m going to install the lockfile differs enough that the package set needs to be adjusted somehow, then it’s no longer a known-working environment.

If a lockfile format is created and finds its way into user-facing tools, my hunch is that users are going to be more irritated if the “install from this lockfile” operation seems to succeed, but then the software doesn’t actually work, than if the install errors out even though it actually could have worked (e.g., by ignoring certain constraints). In the latter case, there is always the fallback of trying to do a “regular” install (i.e., with install-time resolving). That again suggests to me that it makes more sense to err on the side of strictness, and prefer false-negatives (“this says it won’t work but I can make it work”) to false-positives (“this says it will work but it doesn’t”).

I quoted the footnote here because I think there is a subtle but important difference between what it says and what the main text of your comment says. In the main text, you’re speaking from the perspective of someone writing an installer, and you assume the lockfile you’re handed will work on this environment. But in the footnote it’s the user who knows the lockfile won’t work. These perspectives are quite different. The beliefs of users about whether the lock file will work are likely to depend on much more than just what tool produced it, and in fact, in many cases the user may have little or no understanding of what tool produced it or how. Instead they will often be seeing instructions on some project’s website that say “to install this project, type installfromlockfile mylockfile.lock”, and it is on that basis that they will believe that it will work.

My point here is just that, although I agree users will probably not try to install a lockfile that they know won’t work, there may be many cases in which they try to install a lockfile that we know won’t work (or that the project author or installer author knows won’t work). I think this is relevant if we think about the robustness or failure modes of the lockfile, along the lines I described above.

So I agree with your point that maybe markers and tags aren’t so relevant. In the end what I see as the point of lockfiles is providing a sort of anchor or lever that allows software authors to provide a greater degree of confidence to software users: “if you use this tool to install this lockfile, and the operation succeeds, you will get a working version of my software”. The usefulness of that guarantee depends on the tradeoff between how many caveats it has to have (i.e., does the user have to use a specific install tool, what happens if the operation fails) and how much can be guaranteed, and I think we should have that tradeoff in mind when thinking about the design.

groodt · February 24, 2024, 8:38pm

Absolutely. We’re in full agreement as far as I can see. That’s why I think poetry agreeing not to support this format (perhaps beyond exporting into the format) makes a lot of sense.

I guess another way to frame the differences would be:

target environments specified ahead of time (this proposal)
target environments unspecified / universal (poetry)

pf_moore · February 24, 2024, 9:03pm

(Note: at least 3 other messages came in while I was writing this. Apologies if some of the comments are therefore a little out of date).

I’m still confused. Let me try and turn that into something that’s as specific as I’d like.

I’m writing some code in Python, and in order to run it, I need a number of libraries. For my development, I’m using a requirements.txt file to install the libraries into a virtual environment, but I don’t fix the versions of my dependencies - as long as my code works, I’m good.

Now, I want to share my code with my colleagues. In order to ensure they get the same results that I get, I want to make sure they are using the same software that I’m using, so I feed my requirements.txt file into a “locker” which spits out a Python lock file. I can now share my scripts, plus that lock file, and my colleagues can use an installer to create an environment containing exactly the same software as I have been using, and do their own analysis and get the same results.

(I’ll note at this point that this use case implies that I want my colleagues to use the same Python version as me, as well. I’m not sure I can think of a realistic scenario where I’d care about them using the same versions of my dependencies, but not use the same Python version).

Brett’s proposal seems to cover this case just fine.

Now let’s suppose that one of my colleagues is using a Mac, and I’m using Windows. OK, that might be an issue. I don’t have access to a Mac. Luckily, though, my locker allows me to say lock --platform=mac and it creates a lock file for the Mac.

This is still supported by Brett’s proposal, assuming that cross-platform resolving is possible. The locker needs to be able to resolve for a Mac while running on a Windows system. And that’s certainly possible (with the usual caveats about sdists and build-time code execution), because you can construct your own marker and tag set, and run a resolve using that. But it’s just a problem for the locker, not for any other part of the chain.
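
A rough sketch of "construct your own marker set", using only the standard library. The variable names follow the environment markers defined in PEP 508; the override values describing an Apple Silicon Mac and the helper names are assumptions, and only a subset of the marker variables is shown. The Python version is deliberately left as the locker's own, matching the "same Python version as me" scenario.

```python
import os
import platform
import sys


def current_environment():
    """A subset of the PEP 508 marker variables for the running interpreter."""
    return {
        "os_name": os.name,
        "sys_platform": sys.platform,
        "platform_system": platform.system(),
        "platform_machine": platform.machine(),
        "python_version": ".".join(platform.python_version_tuple()[:2]),
        "python_full_version": platform.python_version(),
    }


# Assumed values for the target machine; a locker would look these up from
# whatever `--platform=mac` means in its own configuration.
MAC_OVERRIDES = {
    "os_name": "posix",
    "sys_platform": "darwin",
    "platform_system": "Darwin",
    "platform_machine": "arm64",
}


def target_environment(overrides):
    """Start from the local environment and swap in the target's platform values."""
    env = current_environment()
    env.update(overrides)
    return env


print(target_environment(MAC_OVERRIDES))
```

A resolver would then evaluate every dependency marker against this dictionary instead of the real local environment, and pair it with a matching set of wheel tags.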

I’m still not seeing why it’s of any advantage to leave the target computer to do the resolve here.

I don’t follow this at all. Isn’t the whole point to be able to know that you can send the lockfile to the target environment and be sure that it will run? That’s why you want to precompute the exact wheels (or sdists if you must) that you want the target to download and install. If you’re not doing that, why not just send a requirements file (or a standardised equivalent)?

Note that I’m very definitely not seeing lockfiles as just “standardised requirements files”, and if anyone is thinking that, then we have a very different understanding of what locking is - after all, this is a perfectly valid requirements file:

rich
click >= 7.0

But there’s no way I’d consider that as a “lock file”!
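
One crude way to express the difference, as a sketch only: a requirements file may leave versions open, while anything lock-like pins every entry exactly. The helper below is hypothetical and deliberately simplistic (no hashes, URLs, or include directives), and the pinned versions in the second call are arbitrary examples.

```python
def is_fully_pinned(lines):
    """Return True only if every requirement line uses an exact '==' pin."""
    for line in lines:
        # Drop comments and environment markers, keep the name/specifier part.
        requirement = line.split("#", 1)[0].split(";", 1)[0].strip()
        if not requirement:
            continue
        if "==" not in requirement or "*" in requirement:
            return False
    return True


print(is_fully_pinned(["rich", "click >= 7.0"]))
print(is_fully_pinned(["rich==13.7.0", "click==8.1.7"]))
```

Exact pins are necessary but not sufficient for locking in the sense discussed here; a lock file would also record which files to download.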

I’m really not seeing why people are insisting that there’s a need for the installer to do a second resolve here. That seems to me like it’s the step that lockfiles are designed to eliminate. If it’s not, then what benefit is using a lockfile giving (over a requirements file^[1], to be concrete)?

I’m sorry if I’m being particularly dense here. I’m trying to understand the problem you’re describing, because I don’t want to end up approving a standard that doesn’t address the needs people have. But I’m not going to reject the proposal based on a scenario that no-one can describe clearly enough to show where the proposal fails!

Thank you for being explicit. So what you’re saying is that you want to be able to request a lockfile that’s intended to be usable on any platform, without needing to explicitly state what platforms you’re targeting? OK, I can see that would be a problematic situation - as you say, it’s a combinatorial explosion.

But is it realistic? If, as @ofek suggested, lockers default to supporting the major platforms, and you cover a sensible set of python versions (at worst, 5 versions if you limit yourself to supported minor versions, 2 if you omit security fix only releases), then it’s entirely manageable. And I’m still struggling to think of a scenario where I’d use a lockfile and I couldn’t say something like “you need to use the same Python version as I used”, so that’s just one Python version. And in my scenario, as I described it above, adding a new lock target when people ask for an additional platform to be supported seems like it’s also a perfectly fair way of handling that situation.

Your “universal” lockfile sounds more like a base set of requirements, combined with some form of “snapshot” of the state of the relevant package index(es) to me. Which is an interesting, and potentially useful, idea - but not at all what I’d describe as “locking”.

But the point here isn’t to make judgements on scenarios as described. Thank you (and Greg) for the explanation of the scenario. I think Brett has already said he plans on considering this model out of scope, and based on what I’ve seen here that seems like a fair decision to me. I’m happy to hear of other scenarios if you think I’m still missing something important.

[1] insert “or a standardised equivalent” boilerplate here

groodt · February 24, 2024, 9:29pm

You’re not being dense at all! I believe you’ve got it understood now. Your scenario with the colleagues is pretty common (including even in day to day open source where contributors contribute from different platforms).

You’re absolutely correct that Brett’s proposal doesn’t prevent that style of work. But doing so does require making a few fairly large assumptions:

The target platforms that lockers would target have broad coverage (80%+ or something). I’m sure it’s easy to find data on this.
Lockers have sufficient static metadata and heuristics and hacks to produce lockfiles successfully for common target platforms and popular projects (this is improving a lot, I don’t know how it can be measured, outside perhaps grabbing open source requirements.in or pyproject.toml and trying to lock them)

I think it’s fair to say that the level of poetry support could be “supports export into pylock.toml when target environments are specified”

The poetry file is more “Dynamic” in nature, while this proposal is more static and more explicit.

The poetry file is broadly a superset in many ways.

e.g.

Poetry can go from tool.poetry.dependencies → poetry.lock → pylock.toml

Mousebender (and other tools) can go from project.dependencies → pylock.toml

ntessore · February 24, 2024, 9:31pm

I agree with everything that is being said here, but I just wanted to add a point: By far the most common use I have got out of lockfiles is that I myself want to replicate my exact results years down the line, when my environments are long gone.

cemici · February 24, 2024, 9:34pm

I think it is very reasonable to view poetry.lock as not really a lock file. It is more like a mini-pypi, from which it promises that a solution can be derived at install time.

“Is it realistic?” is a confusing question: it is not only realistic but actually real!

The reproducibility that this brings is certainly valuable: my CI pipeline tomorrow will behave the same way that it did today. But perhaps more valuable is the promise that poetry makes when it writes the lockfile: as a package developer, this is a machine-checked assertion that my package will indeed be installable everywhere.

But having said earlier that “what poetry does” was a digression… I see I am again falling into that.