I think this is mostly interesting as a data point for my earlier question: how come no existing tooling has found it necessary or worthwhile to implement file-level locking?
This is probably an implementation-specific question, but I thought it might be worth asking to avoid ambiguity. From the proposal’s perspective, are there any special considerations for tooling when a pylock.toml contains an entry that has been yanked or deleted from PyPI (or subsequently re-uploaded with the same version)?
I agree with Armin completely; as I said before, please do not include these two keys.
This is pretty much the anti-goal. I think it’s fine for other tools to have their own formats for the snapshot-of-PyPI file type, but the one described here, with a focus on reproducibility, simply does not exist, and in the spirit of interoperability it should be supported broadly. It’s even fine if it’s not a tool’s preferred/default format; for example, the Poetry maintainer said that they could provide support for installation but output would go through their separate export command.
With the knowledge of the use cases this proposal is trying to support, I’m curious why you think it would only be users of pip that would have such use cases.
There is a significant amount of such tooling that gets implemented every year. You don’t see it because it happens at companies with private code bases that have security teams giving recommendations and enough money to pay engineers to build something such teams would approve as a long-term vision.
I’m not understanding this comment. There are tools in other ecosystems that use archives to distribute packages (zip, bz2, tar.gz, etc.) and lock them file-by-file:

- conda-lock.yml
- yarn.lock
- Gemfile.lock
- package-lock.json
- pdm.lock
- pip-tools does it as well, but using hashes in a requirements.txt file (see the fragment below)
- etc.
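To make the pip-tools case concrete, here is an illustrative requirements.txt fragment using pip’s hash-checking syntax (the digests below are placeholders, not real ones); note that several --hash options may be given per requirement, acting as an allowlist of acceptable files:

```text
# Illustrative fragment only; digests are placeholders.
flask==2.0.1 \
    --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000 \
    --hash=sha256:1111111111111111111111111111111111111111111111111111111111111111
```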
Languages that distribute packages as source, such as Go and Rust, typically lock the source repo and hashes. It’s not necessary to lock individual files in those scenarios because the repository and hashes represent the files.
I might be misunderstanding your comment. Is your comment referring to poetry.lock and how it does not really lock to files? Or something else that I’m misunderstanding?
I believe it would be fine for a lock file to be eventually standardized. I find it weird, however, that in this case the standard runs ahead of an actual implementation. There are a lot of open questions, and there are fundamental disagreements on the scope and purpose. All of this makes it seem odd, to me, to attempt to standardize it at the moment for the use of multiple tools.
There isn’t even a PEP yet, this is just a preliminary discussion to gauge reception. From my perspective it’s been received well thus far and I would assume an official proposal would happen soon at which point everyone’s feedback will be addressed. Additionally, there will be an implementation because that is now a requirement for packaging PEPs.
It may not be well-known to folks unfamiliar with packaging proposals here but this is how the process goes.
I had in mind tooling in the Python ecosystem. I’m happy to agree that file-level locking might or might not be the norm in other parts of the world, but that does not tell us anything about why Python tooling seems to have shown little interest in it.
I think you are wrong, or perhaps misunderstand me, about pdm.lock and pip-tools. AIUI these both lock at the version level and provide a list of hashes; installers may choose any of the corresponding files. This is not locking at the level of a single file.
Partially correct: this choice is intentional, because even if the source of the package is recorded, it may have to be changed for network reasons during actual installation. In that case, this information becomes redundant.
Fortunately, PDM also supports writing this information in the lockfile. By enabling the static_urls and inherit_metadata lock strategies, installers can rely solely on the information in the lockfile to produce a package set with sources to install.
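For concreteness, and assuming PDM’s documented --strategy option for pdm lock, enabling both looks roughly like this:

```sh
# Sketch assuming PDM's `pdm lock --strategy` interface; the strategy
# names are the ones referenced above.
pdm lock --strategy static_urls,inherit_metadata
```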
I was tagged a couple of days ago from the pipenv perspective (CC @oz123). I’ve tried re-reading the thread, but this post has been exploding ever since, which makes it very hard for us slow readers to digest it all. From a pipenv perspective I will note that cross-platform locking continues to be a problem. Some users have requested multiple lock files to support multi-platform environments, but I have typically pushed back on this. Without a standard defining how this would work best, it has felt like projects would end up with two separate development paths that may or may not work out depending on the level of parity between the disparate lock file variants, and I still think this is largely true.
What motivates the desire for multiple lock files is a deficiency in collecting hashes for system-dependent sdists. I don’t fully understand why the sdist has to be built for a hash to be collected, but I think it stems partly from the fact that pipenv was written on top of pip, with the pip resolver written from the perspective of installing dependencies on the system doing the resolving. If the hashes were just of the source’s compressed file, there would still be the problem of sub-dependencies: if the sdist cannot be built on the system where the lock is invoked, it somehow becomes harder to follow the sub-dependencies.
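To illustrate the alternative hinted at above: hashing the sdist archive itself requires no build step at all. A minimal Python sketch (the file name is hypothetical):

```python
import hashlib
from pathlib import Path

def archive_digest(path: Path) -> str:
    """Return the sha256 digest of a distribution archive as downloaded.

    Hashing the compressed file directly needs no build step; what it
    cannot tell you is the sub-dependencies declared inside the sdist.
    """
    return hashlib.sha256(path.read_bytes()).hexdigest()

print(archive_digest(Path("example-1.0.tar.gz")))  # hypothetical sdist
```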
This is an important problem to solve, but I don’t have a clear idea of how multiple lock overrides create a consistent environment across platforms. Markers already kind of have this problem and are often applied as a weak workaround to something that could be better defined. I keep coming back to this: it would be great if the resolver (pip) could resolve a complete chain of dependencies and return hashes without having to build the sdists to get there, though honestly I’m not sure how feasible that is, nor does it solve the problem of having a generic specification for lock files. I think, though, that the goals I am describing, when abstracted, are similar to this specification’s goals.
One thing I’ll note about pipenv is that the meta hash is the hash of the Pipfile (the specifiers) and not of the lock file itself. It has always felt challenging to hash the lock file and place that hash in the lockfile’s _meta, because writing the hash in would itself change the hash of the file. As a consequence, devs sometimes edit the lockfile directly, and it either works or it doesn’t, but the hash validation won’t complain, since it was based on the dependency spec file.
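A simplified sketch of that design choice (not pipenv’s exact algorithm) is hashing the spec file rather than the lock file, which sidesteps the self-referential-hash problem:

```python
import hashlib
import json
from pathlib import Path

# Simplified sketch, not pipenv's actual algorithm: derive the lock
# file's _meta hash from the dependency spec file (Pipfile), so writing
# the hash into the lock file cannot invalidate the hash itself.
spec_hash = hashlib.sha256(Path("Pipfile").read_bytes()).hexdigest()

lockfile = {
    "_meta": {"hash": {"sha256": spec_hash}},
    # ...locked packages would follow here...
}
print(json.dumps(lockfile, indent=4))
```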
Another thing: this spec calls out lock.git, but what about a more generic lock.scm with a sub-key for the type? There are multiple source vaults; pipenv still supports Mercurial (I think), and while Git is the most popular today, it might not be forever.
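Something like the following hypothetical shape is what I have in mind (key names are illustrative, not part of the proposal):

```toml
# Hypothetical sketch only; the proposal currently spells this lock.git.
[lock.scm]
type = "hg"                       # "git", "hg", "svn", ...
url = "https://example.com/repo"
revision = "0123abcd"             # commit / changeset identifier
```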
This is a very useful concrete example, thank you. One thing that would be good to get clarity on is Python version support for the current proposal and how it would be used. Several posts have now stated that a “single Python version” is likely targeted in one environment. That likely meant “a single minor or feature version” (since that is the most common and useful choice, and controlling micro/patch versions is quite difficult); however, that leaves open what tools that produce lock files are supposed to do when a package has a marker like python_full_version < "3.10.2".
If an installer encounters such a lock and the active Python interpreter doesn’t exactly match 3.12.0, is it supposed to error out? And if so, isn’t this very fragile? If you get Python from a distro, the patch version may change with a distro update, which tends not to be controllable. With the setup-python action on GitHub Actions it is controllable, but common practice is to use the minor version only (python-version: '3.12').
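To make the fragility concrete, here is a small sketch using the packaging library (assuming it is available): the marker evaluates against the running interpreter by default, so the result flips across micro releases within the same minor version.

```python
from packaging.markers import Marker

# Evaluates against the running interpreter's environment by default, so
# the outcome differs between, say, Python 3.10.1 and 3.10.2 even though
# both are "the same" minor version.
marker = Marker('python_full_version < "3.10.2"')
print(marker.evaluate())
```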
If this is more in line w/ what people want, then there is a question of where to record these details: at the top of the lock entry, or on the file that has the requirement? The former groups the details together in one place for easy understanding of when a lock entry applies to an environment. The latter helps in understanding which distribution added a restriction. And yes, you could have both if we don’t think it would hurt readability (see the sketch below for both placements).
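Here is a purely illustrative sketch of the two placements (table and key names are hypothetical, not the proposal’s actual spelling):

```toml
# Illustrative only; table and key names are hypothetical.

# Option 1: record the restriction at the top of the lock entry.
[[lock]]
marker = 'python_full_version < "3.10.2"'

# Option 2: record it on the file that introduced the restriction.
[[lock.files]]
name = "example-1.0-py3-none-any.whl"
marker = 'python_full_version < "3.10.2"'
```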
Or I can ditch the whole concept of lock entries and have users specify what lock file they want.
Not necessarily. If I’m working on a Windows app then what I’m locking for is known upfront. A key thing to remember is that just because Python is cross-platform does not mean everything produced w/ it is.
Nope. Requirements files, for instance, are in the same situation. If you want to avoid yanked files then you could write a tool that tries to check for that. And as for deleted files, you should mirror your own copies of files if you’re worried about them disappearing.
I consider every use of pip-tools to be an attempt at this, since their requirements.txt files are not platform-agnostic by design; when they are, it’s typically a happy accident.
I’m not sure I agree w/ this. There are definitely people here asking for a Poetry-style approach to also exist, sure. But that in and of itself doesn’t change “the scope and purpose” of what I am proposing.
Metadata version 2.2 or greater could allow for that.
Because you can’t assume all SCMs are and will be the same.
And my personal experience is that having the different formats can be a pain (e.g. if you have to analyze the lock file you get locked in, and that’s assuming the tool’s file format is considered stable and not an implementation detail).
I think the prototype I shared above w/ Charlie would address this.
I see this the other way round. Given that pip-tools is not aiming at platform-agnostic requirements.txt, isn’t it interesting that they nevertheless do not try to lock to specific files?
I think they could, and probably without a huge amount of difficulty, by putting just a single valid hash for each package in the output. But they don’t.
Perhaps being more general turns out indeed to be a “happy” rather than an “unhappy” accident?
I don’t think this is a particularly impactful point for the proposal either way, and I don’t have any particular conclusion; this sub-discussion probably belongs later anyway, when the PEP spells out its goals and use cases.
I just think it is interesting to note that there does indeed seem to be a gap in the current ecosystem here, and I wonder what, if anything, that tells us.
Another thought: the current lock entry doesn’t have a link between the index and the file URL. This makes reconstructing the software’s identity (which for Package URLs requires the index) a process that requires querying each index in the indexes list. If each lock entry had its own “index” value, then the software identity could be derived from the lock entry alone.
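A hypothetical sketch of what that could look like (the per-entry “index” key and the file URL path are illustrative), with the Package URL it would make derivable:

```toml
# Hypothetical sketch; "index" here is the per-entry key being suggested.
[[lock]]
name = "requests"
version = "2.31.0"
index = "https://pypi.org/simple"
url = "https://files.pythonhosted.org/packages/.../requests-2.31.0-py3-none-any.whl"
# With the index recorded, a Package URL such as
#   pkg:pypi/requests@2.31.0?repository_url=https://pypi.org/simple
# can be rebuilt from the lock entry alone, without querying each index.
```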