I think this is mostly interesting as a data point for my earlier question: how come no existing tooling has found it necessary or worthwhile to implement file-level locking?
This is probably an implementation-specific question, but I thought it might be worth asking to avoid ambiguity. From the proposal’s perspective, are there any special considerations for tooling when a pylock.toml contains an entry that has been yanked or deleted from PyPI (or subsequently re-uploaded with the same version)?
I agree with Armin completely; as I said before, please do not include these two keys.
This is pretty much the anti-goal. I think it’s fine for other tools to have their own formats for the snapshot-of-PyPI file type, but the one described here, with a focus on reproducibility, simply does not exist, and in the spirit of interoperability it should be supported broadly. It’s even fine if it’s not a tool’s preferred/default format; for example, the Poetry maintainer said that they could provide support for installation but output would go through their separate export command.
With the knowledge of the use cases this proposal is trying to support, I’m curious why you think it would only be users of pip that would have such use cases.
There is a significant amount of such tooling that gets implemented every year. You don’t see it because it happens at companies with private code bases that have security teams giving recommendations and enough money to pay engineers to build something such teams would approve as a long-term vision.
I’m not understanding this comment. There are tools in other ecosystems that use archives to distribute packages (zip, bz2, tar.gz, etc.) and lock them file-by-file:

- conda-lock.yml
- yarn.lock
- Gemfile.lock
- package-lock.json
- pdm.lock
- pip-tools does it as well, but using hashes in a requirements.txt file (see the fragment below)
- etc.
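To make the pip-tools case concrete, here is an illustrative requirements.txt fragment using pip’s hash-checking syntax (the digests below are placeholders, not real ones); note that several --hash options may be given per requirement, acting as an allowlist of acceptable files:

```text
# Illustrative fragment only; digests are placeholders.
flask==2.0.1 \
    --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000 \
    --hash=sha256:1111111111111111111111111111111111111111111111111111111111111111
```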
Languages that distribute packages as source, such as Go and Rust, typically lock the source repo and hashes. It’s not necessary to lock individual files in those scenarios because the repository and hashes represent the files.
I might be misunderstanding your comment. Is your comment referring to poetry.lock and how it does not really lock to files? Or something else that I’m misunderstanding?
I believe it would be fine for a lock file to be eventually standardized. I find it weird, however, that in this case the standard runs ahead of an actual implementation. There are a lot of open questions, and there are fundamental disagreements on the scope and purpose. All of this makes it seem odd, to me, to attempt to standardize it at the moment for the use of multiple tools.
There isn’t even a PEP yet, this is just a preliminary discussion to gauge reception. From my perspective it’s been received well thus far and I would assume an official proposal would happen soon at which point everyone’s feedback will be addressed. Additionally, there will be an implementation because that is now a requirement for packaging PEPs.
It may not be well-known to folks unfamiliar with packaging proposals here but this is how the process goes.
I had in mind tooling in the Python ecosystem. I’m happy to agree that file-level locking might or might not be the norm in other parts of the world, but that does not tell us anything about why Python tooling seems to have shown little interest in it.
I think you are wrong, or perhaps misunderstand me, about pdm.lock and pip-tools. AIUI these both lock at the version level and provide a list of hashes; installers may choose any of the corresponding files. This is not locking at the level of a single file.
Partially correct: this choice is intentional, because even if the source of the package is recorded, it may have to be changed for network reasons during actual installation. In that case, this information becomes redundant.
Fortunately, PDM also supports writing this information in the lockfile. By enabling the static_urls and inherit_metadata lock strategies, installers can rely solely on the information in the lockfile to produce a package set with sources to install.
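For concreteness, and assuming PDM’s documented --strategy option for pdm lock, enabling both looks roughly like this:

```sh
# Sketch assuming PDM's `pdm lock --strategy` interface; the strategy
# names are the ones referenced above.
pdm lock --strategy static_urls,inherit_metadata
```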
I was tagged a couple of days ago from the pipenv perspective (CC @oz123). I’ve tried re-reading the thread, but this post has been exploding ever since, which makes it very hard for us slow readers to digest it all. From a pipenv perspective I will note that cross-platform locking continues to be a problem. Some users have requested multiple lock files to support multi-platform environments, but I have typically pushed back on this. Without a standard defining how this would work best, it has felt like projects would end up with two separate development paths that may or may not work out depending on the level of parity between the disparate lock file variants, and I still think this is largely true.
What motivates the desire for multiple lock files is a deficiency in collecting hashes for system-dependent sdists. I don’t fully understand why the sdist has to be built for a hash to be collected, but I think it stems partly from the fact that pipenv was written on top of pip, with the pip resolver written from the perspective of installing dependencies on the system doing the resolving. If the hashes were just of the source’s compressed file, there would still be the problem of sub-dependencies: if the sdist cannot be built on the system where the lock is invoked, it somehow becomes harder to follow the sub-dependencies.
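To illustrate the alternative hinted at above: hashing the sdist archive itself requires no build step at all. A minimal Python sketch (the file name is hypothetical):

```python
import hashlib
from pathlib import Path

def archive_digest(path: Path) -> str:
    """Return the sha256 digest of a distribution archive as downloaded.

    Hashing the compressed file directly needs no build step; what it
    cannot tell you is the sub-dependencies declared inside the sdist.
    """
    return hashlib.sha256(path.read_bytes()).hexdigest()

print(archive_digest(Path("example-1.0.tar.gz")))  # hypothetical sdist
```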
This is an important problem to solve, but I don’t have a clear idea of how multiple lock overrides create a consistent environment across platforms. Markers already kind of have this problem and are often applied as a weak workaround to something that could be better defined. I keep coming back to this: it would be great if the resolver (pip) could resolve a complete chain of dependencies and return hashes without having to build the sdists to get there, though honestly I’m not sure how feasible that is, nor does it solve the problem of having a generic specification for lock files. I think, though, that the goals I am describing, when abstracted, are similar to this specification’s goals.
One thing I’ll note about pipenv is that the meta hash is the hash of the Pipfile (the specifiers) and not of the lock file itself. It has always felt challenging to hash the lock file and place that hash in the lockfile’s _meta, because writing the hash in would itself change the hash of the file. As a consequence, devs sometimes edit the lockfile directly, and it either works or it doesn’t, but the hash validation won’t complain, since it was based on the dependency spec file.
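A simplified sketch of that design choice (not pipenv’s exact algorithm) is hashing the spec file rather than the lock file, which sidesteps the self-referential-hash problem:

```python
import hashlib
import json
from pathlib import Path

# Simplified sketch, not pipenv's actual algorithm: derive the lock
# file's _meta hash from the dependency spec file (Pipfile), so writing
# the hash into the lock file cannot invalidate the hash itself.
spec_hash = hashlib.sha256(Path("Pipfile").read_bytes()).hexdigest()

lockfile = {
    "_meta": {"hash": {"sha256": spec_hash}},
    # ...locked packages would follow here...
}
print(json.dumps(lockfile, indent=4))
```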
Another thing: this spec calls out lock.git, but what about a more generic lock.scm with a sub-key for the type? There are multiple source vaults; pipenv still supports Mercurial (I think), and while Git is the most popular today, it might not be forever.
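Something like the following hypothetical shape is what I have in mind (key names are illustrative, not part of the proposal):

```toml
# Hypothetical sketch only; the proposal currently spells this lock.git.
[lock.scm]
type = "hg"                       # "git", "hg", "svn", ...
url = "https://example.com/repo"
revision = "0123abcd"             # commit / changeset identifier
```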
This is a very useful concrete example, thank you. One thing that would be good to get clarity on is Python version support for the current proposal and how it would be used. Several posts have now stated that a “single Python version” is likely targeted in one environment. That likely meant “a single minor or feature version” (since that is the most common and useful choice, and controlling micro/patch versions is quite difficult); however, that leaves open what tools that produce lock files are supposed to do when a package has a marker like python_full_version < "3.10.2".
If an installer encounters such a lock and the active Python interpreter doesn’t exactly match 3.12.0, is it supposed to error out? And if so, isn’t this very fragile? If you get Python from a distro, the patch version may change with a distro update, which tends not to be controllable. With the setup-python action on GitHub Actions it is controllable, but common practice is to use the minor version only (python-version: '3.12').
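To make the fragility concrete, here is a small sketch using the packaging library (assuming it is available): the marker evaluates against the running interpreter by default, so the result flips across micro releases within the same minor version.

```python
from packaging.markers import Marker

# Evaluates against the running interpreter's environment by default, so
# the outcome differs between, say, Python 3.10.1 and 3.10.2 even though
# both are "the same" minor version.
marker = Marker('python_full_version < "3.10.2"')
print(marker.evaluate())
```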
If this is more in line w/ what people want, then there is a question of where to record these details: at the top of the lock entry, or on the file that has the requirement? The former groups the details together in one place for easy understanding of when a lock entry applies to an environment. The latter helps in understanding which distribution added a restriction. And yes, you could have both if we don’t think it would hurt readability (see the sketch below for both placements).
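Here is a purely illustrative sketch of the two placements (table and key names are hypothetical, not the proposal’s actual spelling):

```toml
# Illustrative only; table and key names are hypothetical.

# Option 1: record the restriction at the top of the lock entry.
[[lock]]
marker = 'python_full_version < "3.10.2"'

# Option 2: record it on the file that introduced the restriction.
[[lock.files]]
name = "example-1.0-py3-none-any.whl"
marker = 'python_full_version < "3.10.2"'
```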
Or I can ditch the whole concept of lock entries and have users specify what lock file they want.
Not necessarily. If I’m working on a Windows app then what I’m locking for is known upfront. A key thing to remember is that just because Python is cross-platform does not mean everything produced w/ it is.
Nope. Requirements files, for instance, are in the same situation. If you want to avoid yanked files then you could write a tool that tries to check for that. And as for deleted files, you should mirror your own copies of files if you’re worried about them disappearing.
I consider every use of pip-tools to be an attempt at this, since their requirements.txt files are not platform-agnostic by design; when they are, it’s typically a happy accident.
I’m not sure I agree w/ this. There are definitely people here asking for a Poetry-style approach to also exist, sure. But that in and of itself doesn’t change “the scope and purpose” of what I am proposing.
Metadata version 2.2 or greater could allow for that.
Because you can’t assume all SCMs are and will be the same.
And my personal experience is that having the different formats can be a pain (e.g. if you have to analyze the lock file you get locked in, and that’s assuming the tool’s file format is considered stable and not an implementation detail).
I think the prototype I shared above w/ Charlie would address this.
I see this the other way round. Given that pip-tools is not aiming at platform-agnostic requirements.txt, isn’t it interesting that they nevertheless do not try to lock to specific files?
I think they could, and probably without a huge amount of difficulty, by putting just a single valid hash for each package in the output. But they don’t.
Perhaps being more general turns out indeed to be a “happy” rather than an “unhappy” accident?
I don’t think this is a particularly impactful point for the proposal either way, and I don’t have any particular conclusion; this sub-discussion probably belongs later anyway, when the PEP spells out its goals and use cases.
I just think it is interesting to note that there does indeed seem to be a gap in the current ecosystem here, and I wonder what, if anything, that tells us.
Another thought: the current lock entry doesn’t have a link between the index and the file URL. This makes reconstructing the software’s identity (which for Package URLs requires the index) a process that requires querying each index in the indexes list. If each lock entry had its own “index” value, then the software identity could be derived from the lock entry alone.
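A hypothetical sketch of what that could look like (the per-entry “index” key and the file URL path are illustrative), with the Package URL it would make derivable:

```toml
# Hypothetical sketch; "index" here is the per-entry key being suggested.
[[lock]]
name = "requests"
version = "2.31.0"
index = "https://pypi.org/simple"
url = "https://files.pythonhosted.org/packages/.../requests-2.31.0-py3-none-any.whl"
# With the index recorded, a Package URL such as
#   pkg:pypi/requests@2.31.0?repository_url=https://pypi.org/simple
# can be rebuilt from the lock entry alone, without querying each index.
```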