Structured, Exchangeable lock file format (requirements.txt 2.0?)

uranusjr · April 23, 2020, 12:57am

One idea I’m having is to have a field for a lock file to declare platform compatibility. But ultimately this can never be guaranteed due to the state of Python packaging; the tool can have all the intent to produce a platform-agnostic dependency tree, but the result can never be theoretically platform independent. So I think the most practical approach to this is to position this as describing what would happen if your intention is applied on this machine, and let tools decide how much they want to extrapolate from the result.

I’d say it should fail by default. We can recommend this in the spec, but honestly I think most users just don’t care about validation enough and ultimately most tools would just grow a --no-validate if hash algorithm support becomes a problem.

Given my primary goal is to handle the every-tool-can-do-this scenario, my suggestion would be to error out if an installer does not support the resolver’s result. And if the different tools differ in how the dependency can be satisfied, the resolver (the tool producing the file) is responsible for describing how each should work, and let the installer (the tool consuming the file) decide which route to choose. For example, numpy can be installed either with pip or Conda, so the resolver can produce something like

{
    "conda": {
        "name": "numpy",
        "source": "anaconda",
        "version": "1.18.1"
    },
    "python": {
        "name": "numpy",
        "source": "pypi",
        "version": "1.18.1"
    },
    "dependencies": {...}
}

to tell the installer it can satisfy this either by using pip or Conda. So the rule of satisfying a dependency should be

All of dependencies must be satisfied.
At least one of the remaining keys can be satisfied.

And the installer should reject the file as unsupported otherwise.
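A minimal sketch of how an installer might apply this rule, not part of any actual proposal. The names `pick_route`, `UnsupportedLockFile`, and the `deps_ok` callback are hypothetical, and because the shape of `dependencies` is elided in the example above, the check for it is left to the caller.

```python
from typing import Any, Callable, Dict, List


class UnsupportedLockFile(Exception):
    """Raised when the installer cannot honour a lock file entry (hypothetical name)."""


def pick_route(
    entry: Dict[str, Any],
    supported: List[str],
    deps_ok: Callable[[Any], bool],
) -> str:
    """Return the key of the route this installer will use for `entry`.

    `supported` lists the routes the installer understands, in order of
    preference. `deps_ok` is an assumed callback that reports whether the
    entry's `dependencies` can all be satisfied.
    """
    # Rule 1: all of `dependencies` must be satisfied.
    if not deps_ok(entry.get("dependencies", {})):
        raise UnsupportedLockFile("dependencies cannot be satisfied")

    # Rule 2: at least one of the remaining keys can be satisfied.
    # The installer, not the resolver, decides which route to take.
    for route in supported:
        if route != "dependencies" and route in entry:
            return route

    raise UnsupportedLockFile("no supported route for this entry")


numpy_entry = {
    "conda": {"name": "numpy", "source": "anaconda", "version": "1.18.1"},
    "python": {"name": "numpy", "source": "pypi", "version": "1.18.1"},
    "dependencies": {},
}

# A pip-like installer only understands the "python" route.
route = pick_route(numpy_entry, ["python"], deps_ok=lambda deps: True)
```

An installer that understands neither `conda` nor `python` gets an `UnsupportedLockFile` error here instead of silently installing something else.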

All these questions really make me think more deeply about the underlying philosophy behind all the choices. I think the main thing I have in mind for this is to separate “populating an environment from user request” into a resolver-installer pipeline, where the installer part is reduced to simply downloading a thing and applying it (and only it) to the environment. All the questions about what things to find, when to find them, and in what situations are answered by the resolver. This may not be how all package manager lock files currently work (which I think is likely also why the term becomes confusing), but it is (IMO) the best way to abstract the process and provide an exchange format for tools to understand.

Another thing I want to mention:

I think that’s a good approach. One other thing I think might be possible is to write it as a lossy export format (what the input file would be if we assume certain architecture variables). This might not always be possible (and Spack can say “nope can’t generate that for this environment”), but I imagine it could be useful when it works (I could very possibly be wrong since I am not familiar with the audience of Spack).

EpicWink · April 23, 2020, 1:44am

It seems agreed that the purpose of the lock file is to enable tools to re-build an environment effectively identical to the original. What if instead the purpose was to simply state what the original environment was?

This would change the mindset of the development of support for the lock file: instead of having the tools worry about the differences between the features of the tools producing and consuming the lock file, you leave that to the developer. The end-user should know more intimately whether pip is good enough to specify their environment, or if more of the environment needs to be specified.

Each tool would put into the lock file as much of the environment as it tracks. The lock file would have to support all kinds of markers, which would make it more of an open-ended standard.

This would remove conditions from the lock-file (such as Python version, platform etc), and push the choice of environment on to the end-user.

I imagine this is what currently happens with Poetry and pipenv

This is solved in the fact that the end-user chooses the tool which has the features they need

uranusjr · April 23, 2020, 2:42am

I don’t. The file I have in mind sits in the middle of the resolver-installer pipeline, and communicates what the resolver thinks the environment should look like for the installer to materialise. What the resolver comes up with is an intent, not an actual environment, so there’s no original to replicate. In other words, the lock file is not derived from an environment, but the environment is derived from a lock file.

What you describe would be more like what I call a freeze file (because it is the idea behind pip freeze). It is indeed useful in some cases, but much less so than a lock file, because a lot of the context would be lost in the installation process.

I can say Poetry definitely does not do this. Pipenv does not want to do this either (what it actually does is another issue). I’ve also been trying to clarify the format is exactly not this (and therefore has a chance to satisfy the tools’ needs), but I think I’ve completely failed again, seeing you take away the entirely opposite message

alanbato · April 23, 2020, 2:49am

Isn’t that our current situation? If a “standardized file” produced by, say, Pipenv can only be read and executed properly by Pipenv, then what’s the difference between that standardized file and the current use of Pipfile.lock?

EpicWink · April 23, 2020, 3:13am

Ahh, you’re right. That is what I mean. Sorry for adding to the confusion.

And I’d have said the format you’re describing is a specification. Again, this stems from my confusion on the naming of “lock file”, which your reply immediately cleared up for me. This topic’s quite long, I’d forgotten about your break-down of spec vs intent vs actualisation.

The benefit is standardising the components of the specification which are common between tools, while the tool-specific segments have no guarantee to be acted upon by other tools. A team of app developers may wish to ensure the platform and Python version of their deployed app, but know that their project is pure Python so the developers can use their platform of choice, for example.

Edit: now that I think about that example, however, it might make more sense to explicitly specify that during dev, a minimum Python and platform feature-set is required, but during production, a specific Python version and platform feature-set is required. External developers using tools which don’t honour these dev requirements may get confused as to why they can’t build and/or use the app, and must resort to opening a ticket

cjolowicz · May 12, 2020, 2:59pm

I would like to contribute two potential requirements for a lock file format to this discussion, which I have not seen mentioned but which seem important to me (apologies in advance if I missed something).

Any lock file format should be designed to be easy to merge by git and similar tools.
It should be possible to define dependency sets which do not include the project itself.

The second point allows us, for example, to have a rich suite of linters with deterministic CI, without constraining the core dependencies of the package. Extras, by contrast, define dependencies on top of the core dependencies. So do the development dependencies of Poetry and (I believe) pipenv.

About the first point: Merge conflicts arise quite frequently with Poetry’s lock files when dependencies change on different branches. With a tool like yarn you also regularly get merge conflicts in the lock file (yarn.lock), but in most cases yarn can then figure out the conflict resolution itself. This is much harder to do with Poetry’s lock file format.

uranusjr · May 12, 2020, 6:47pm

Thanks for raising these issues! I think both of them are very relevant. In fact, one of the reasons I chose to explicitly mandate the JSON formatting options (key ordering and indentation rules) in my proposal above is exactly to produce minimal diff output. I will try to remember to put this in as a discussion point on formats.
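As a rough illustration of why fixed key ordering and indentation help with diffs and merges, here is a sketch using Python’s standard `json` module. The helper name `dump_lock` and the choice of four-space indentation are assumptions for the example, not the rules from the proposal.

```python
import json
from typing import Any, Dict


def dump_lock(data: Dict[str, Any]) -> str:
    """Serialise lock data with sorted keys and fixed indentation.

    Sorting keys means the output does not depend on the order in which
    a tool happened to build the mapping, so unrelated changes do not
    reorder lines and show up as noise in a diff.
    """
    return json.dumps(data, indent=4, sort_keys=True, ensure_ascii=False) + "\n"


# Two tools building the same data in a different order...
first = {"version": "1.18.1", "name": "numpy", "source": "pypi"}
second = {"name": "numpy", "source": "pypi", "version": "1.18.1"}

# ...produce byte-identical output.
same_output = dump_lock(first) == dump_lock(second)
```

With one key per line, a change to a single package also touches only that package’s lines, which is what gives git a chance to merge two branches cleanly.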

As for the second point, I think people omit that (maybe unintentionally) since it is a core characteristic of requirements.txt (compared to, say, Poetry’s pyproject.toml), and a must-have if the goal is to replace the format. I proposed the metapackage feature as an attempt to generalise this use case, so the user can refer to these dependency sets by a name.

lorenanicole · April 6, 2021, 7:35pm

Hi there! I was wondering if there was any new activity on this? I see the PEP 650 but haven’t been able to find anything more in the last 9 months or so on this.

uranusjr · April 7, 2021, 1:11am

There have been some private discussions going on, but I don’t think things have reached the state to be announced publicly yet. Considering how divisive and opinionated this topic is, it’s feared a premature publication may cause the discussion to diverge too much and make it impossible to come to a conclusion (like this thread). So… stay tuned

h-vetinari · August 1, 2021, 10:42am

That PEP @uranusjr was alluding to has arrived: PEP 665: Specifying Installation Requirements for Python Projects
