Structured, Exchangeable lock file format (requirements.txt 2.0?)

I can give you two problems that I live with regularly which impact users (and obviously make my life more difficult :wink:).

One is deploying to a PaaS/serverless platform (this is more general, but this specific scenario is the one I have lived through). If that service wanted to install dependencies on behalf of the user, how are they to do that today? The best they can do is add support for every tool that users may want, or try to force users to a single tool. Obviously a “vendor lock-in” version of forcing users to a specific tool is not exactly a great result when the community has not come to an agreement. And so you might say that perhaps people should bundle their dependencies with their code? OK, how do you do that with Poetry? Pipenv? Pip? You’re once again back to documenting and trying to support users by teaching them how to use their tool to install dependencies for potentially a different platform in order to deploy them to their production system (and this doesn’t have to be cloud-specific; Docker or any other system where your dev OS differs from production plays into this).

Two, how do editors install and manage your dev requirements for you? For VS Code we have to manually add support for every tool where we want to help users install e.g. a linter or formatter. And if we ever add support to help walk users through setting up a development environment we will need to support installing all dev requirements which will once again be tool-specific. And I know Poetry is bumping up against this because we have not gotten around to supporting it fully in the Python extension for VS Code (it’s on our roadmap, BTW).

How is that not a lock file? I’m curious as to what your definition of a lock file is compared to a list of packages to install that are specified to a specific version?

I’ll also note that “Installing dependencies in containerized applications” has been announced as a tool that reads the various lock file formats we have going in the community and tries to abstract them out for orchestration purposes. The fact we need a tool for that I think plays into this discussion. :slight_smile:

I think there’s an important distinction to make here. In Spack we talk about abstract and concrete specifications.

Abstract Specs

An abstract spec is only partially constrained. It has the names of packages you want, maybe some features, versions, compilers, and other preferences. That’s what the developer tells you they “require” to set up the environment.

Concrete Specs

A concrete spec has everything. Lockfiles are concrete. They have the names, hashes, versions, etc. of packages and dependencies, and they can very well be tied to particular environments, platforms, resolution algorithms, etc.

Reproducibility vs. portability

Which one you use depends on how you want an environment to be reproduced. The abstract spec is more portable but less reproducible, because a different resolver or platform can affect what you get. The lockfile lets you produce exactly what you got, but it may not work at all if you change the OS/arch/python version/etc.

I see a use for both of these types of reproducibility. Sometimes you just want the app to be built how it needs to be built for the environment (abstract). But if you want to avoid surprises, and you know you’ll be in the same environment, you want a lockfile to reproduce things exactly. Or maybe several lockfiles, if you deploy to multiple environments, but don’t want churn in any one of them.

Spack environments have an abstract spack.yaml and a concrete, generated spack.lock, described here, and you can set up an environment from either. Both have their uses.

Making a “minimal” spec

If you want to trim this down to a “minimal” specification, I think you really need to define how “concrete” you want the “standard” lockfile to be. What attributes should be included, where are they expected to be valid, do they depend on a particular resolver, etc.

Spack’s format has a lot more that I think you want to handle here – compilers, architectures, flags, build options, etc., and it’s very tied to particular platforms. To be honest, I think that stuff is very much needed when you talk about native dependencies, but if you can rely on a spec like manylinux to provide most of the assumptions, then maybe you can dispense with a lot of it.

For pure Python, I think packages, versions, and options are probably sufficient and useful for a lockfile spec. But maybe the spec should standardize some abstract format (i.e., a better requirements.txt for portability) as well as the lockfile.

I still think there are going to be OS-dependent/resolution-sensitive things in a pure python lockfile (as @dstufft mentioned). So it might be worth saying in the spec when that will happen and when the reproducibility guarantee isn’t cross-platform for pure Python stuff. Or maybe the lockfile should mark parts that are OS-sensitive so that a tool can either require the same OS, or try to re-resolve them (which is quite hard).
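To make the last idea a bit more concrete, here is a minimal sketch of a lockfile that marks OS-sensitive entries. The `os_sensitive` flag, the `LockedPackage` class, and the package pins are all hypothetical, made up for illustration; no existing format defines them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LockedPackage:
    """One concrete entry in a hypothetical lockfile."""
    name: str
    version: str
    os_sensitive: bool = False  # hypothetical flag, not from any real format


def packages_needing_attention(packages, locked_os, current_os):
    """Return the entries that cannot be trusted on current_os.

    If the OS matches the one the lock was produced on, every entry is
    reusable. Otherwise only entries flagged os_sensitive are suspect.
    """
    if locked_os == current_os:
        return []
    return [p for p in packages if p.os_sensitive]


# Illustrative pins only.
lock = [
    LockedPackage("requests", "2.22.0"),
    LockedPackage("colorama", "0.4.3", os_sensitive=True),
]
suspect = packages_needing_attention(lock, locked_os="linux", current_os="win32")
```

A tool could then either refuse to install when `suspect` is non-empty, or hand just those entries back to a resolver.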

This I think is solvable using some list of things to install (more on this later).

This I do not think is solvable. Poetry is almost certainly going to expect that if you add a new dependency, it gets added under the [tool.poetry] section of pyproject.toml. Pipenv is almost certainly going to expect that if you add a new dev dependency, it gets added to Pipfile not to some hypothetical lock file. Unless you just mean “we want a list of dev dependencies”, which is roughly the same thing as the first case, just with a qualifier as to what kind of dependencies you want.

I’m actually struggling to try and put to words what I’m trying to convey here. To my mind, a lock file doesn’t describe a list of things to install, it describes the state of the world at the point the lockfile was created. This means that, given a deterministic resolver, resolving the same set of dependencies will always resolve to the same set.

I’ve carefully worded that, because an important thing here I think is the ability to include things that the resolver might not actually take into account (e.g. extra packages it doesn’t need). Hypothetically, a lock file could contain a complete snapshot of the entirety of PyPI at the time of creation and the end result would still be the same.

Different implementations of a lock file could take this to varying degrees, such as “locking” in specific files it used, or recording the end result of the resolver (such that this lockfile is only valid on a specific platform) or by attempting to exhaustively resolve all the combinations of conditional dependencies to include.

This idea is a little bit strained, because some lock files can be implemented as a list of packages to install if the features of the installer are sufficiently simple, but it makes more sense when you start thinking of more complex installer features. For example, Pipenv will resolve the full set of dependencies, as if you had specified to install the development dependencies, even if you didn’t ask for that, and will reflect all of them into the lock file. I assume that Poetry does something similar.

Part of it, honestly, is that I think to actually be a replacement for Poetry’s lock file, Pipenv’s lockfile, Spack’s lockfile, etc., it has to actually support, through some mechanism, all of the features that each of those tools has. However, as soon as you start to add support for those features, you either mandate that all tools support those features (thus making the lowest common denominator the superset of features) OR you end up in a weird situation where the tools are using the same format on the surface, but uses of that format aren’t actually interchangeable because properly using said format relies on interpreting implementation-specific data inside that format. I think the former is unresolvable (you’re never going to get every tool to agree to the same set of features; if you did, we wouldn’t have multiple tools) and the latter puts users in a really bad place where we claim to have this interoperable standard, but it’s not really interoperable because actually using it requires relying on implementation-specific details. So given that I don’t think a replacement for the various lockfiles is meaningfully possible, this “it’s not a lockfile I swear” is largely an attempt to get the same benefit in the one major use case I can see for an interoperable lockfile (I run a platform and want a way to describe the dependencies you need me to install) by treating that as a distinct artifact.

A lot has happened since I last responded, so forgive me if I miss something and do not respond. Please feel free to point such things out. A lot of the discussions have also been carried out well IMO, so I’ll try to respond mostly to points that seem to still be left open to me.

My intention behind the proposal is solely about the concrete specs. The idea is to make the format represent the result of a resolution process, and be immediately consumable by an installation process to create a fully operable runtime. So it is not intended to be passed into a resolver; the only resolution logic (in some sense) needed would be to process conditional dependencies, i.e. not install certain things on a certain platform, which is specified in the proposal by environment markers. Environment markers have limitations, of course, but I believe it is possible to produce a reasonable declarative system that can describe most scenarios on conditional dependencies.

Honestly this (and other comments you’ve made) makes it seem to me that we’re actually trying to have the same thing, except you don’t agree with it being a lock file. Let’s call it something else then. Quoting myself from a previous message:

This is probably my fault; I’ve been calling the idea a “lock file” (and even named the repo as such), and that likely makes people start on the wrong track right from the beginning.

To be clear, the format is created to solve a problem, and as long as I can get the problem solved, it can be called whatever and be classified as whatever. Lock file, requirements.txt but more structured. It does not matter (to me, at least).

It is definitely not my intent to make the meta packages installer-dependent. The meta dependency thing is a reimagining of the common multi-requirements.txt pattern, e.g. you have a requirements.txt, test-requirements.txt, doc-requirements.txt, etc. There are however downsides to having free-form include syntax (-r in requirements.txt) that I wish to address.

My own mental model to this is actually in trees (graphs? I’m not good at data structures). The dependencies in the project collectively form a tree-like structure, and the meta packages are the nodes near (or at) the root to start the traversal process that collects required dependencies.

It might be easier to think of this like how the core metadata specifies package dependencies. The top-level dependencies key lists everything that could be required by the project, but some of them have an extra = marker. The meta dependency thing declares that extra, and specifies what installing the extra would pull into the dependency. And the "" group lists the dependencies that don’t have an extra = marker.

The ; syntax exists to handle a (rather common) conditional dependency case, also mentioned in the thread:

The listing format would need to distinguish between those different versions of the same package, so the syntax is added to address that problem. The scheme can be anything really, but I figured it’s easier if we have a proposal to begin with than to let people figure out what they should do (or even jump to the incorrect conclusion that this is not possible and the format is doomed :slightly_smiling_face:).

Again, honestly, I think it is entirely in line with what you are describing, at least from how I read it. I genuinely do not get how it seems this way to you. Is it that the name “lock file” makes you think it should represent something you have in mind (that I don’t get), which is entirely different from what I intend the proposal to be?


Note: @dstufft posted a comment while I was writing this. I haven’t read it, but I’ll try to post this first anyway to avoid getting stuck catching up with new messages without responding. And it doesn’t help that I’m already bad at explaining things one at a time.

Maybe it would help if you could outline the rough steps you see an installer taking to install the dependencies declared in a lockfile (I don’t mean the low level stuff like downloading a wheel or something). Particularly around a few features:

The second pattern is reserved to support cases where a Python distribution needs to be specified differently depending on the platform. For example, docutils 0.15 only supports Python 3, while Python 2 support is available as 0.15.post1. This pattern allows the lock file to conditionally use docutils@0 for 0.15, and docutils@1 for 0.15.post1.

How does an installer know whether to use the first docutils entry or the second docutils entry? Is it possible to have platforms where no docutils will be installed? If so do I have to make a third docutils entry that is for an empty platform or something?

A valid normalized name surrounded by a pair of square brackets, i.e. satisfying regular expression ^\[[a-z0-9][-a-z0-9]*\]$ . A dependency using such a key should be a meta-dependency that points to optional direct dependencies of the project, similar to Setuptools’s extras_require entries.
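As a rough illustration of that key rule, the check below applies the quoted regular expression with Python’s `re` module. The function name `is_meta_dependency` is made up for this sketch; the pattern itself is copied from the text above.

```python
import re

# Pattern quoted from the proposal text above.
META_KEY = re.compile(r"^\[[a-z0-9][-a-z0-9]*\]$")


def is_meta_dependency(key):
    """Return True for bracketed keys such as "[dev]".

    The default group "" does not match the pattern and would need to be
    handled separately by a consumer.
    """
    return META_KEY.match(key) is not None
```

Under this rule `"[dev]"` and `"[tests]"` are meta-dependency keys, while `"dev"`, `"[Dev]"`, and `"[-dev]"` are not.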

So I presume something like dev requires in Pipenv would map to a [dev] meta-dependency, and thus something like Poetry could theoretically install it by doing some Poetry incantation to install that extra. However, Poetry supports arbitrary extras, so what if someone added a [tests] meta-dependency, how does Pipenv install from that lockfile? Pip doesn’t really have the concept of specifying extras for a requirements.txt file (you can fake it with several named files), so how does pip install a lockfile with a [dev] and a [tests] extra?

extendable for declaring dependencies from alternative package management systems.

I assume this means that instead of a python key, something could have a conda key, or a deb key or something and it’ll specify something that comes from another system. Given that these keys aren’t standardized, how do you see a tool like pip handling a dependency that has a deb key instead of a python key?

Some other questions:

  • What level of portability do we assume is possible for a lock file? Does a single lockfile work for Windows, macOS, and Linux? If it does, do we assume tooling that currently generates a platform-specific lockfile will adapt to generate a platform-independent one? If it does not, do we assume tooling that generates platform-independent lockfiles will stop doing that? If we leave it up to the generator of the lock file to decide, are we expecting tooling to be able to cope with either situation?
  • Validations can be empty; is a tool allowed to require them? Presumably, since 1 of N is the threshold to declare something valid, if a tool doesn’t support a hash algorithm it should just skip it and move on to the next, but what if it doesn’t support any of the hash algorithms?

Roughly speaking, I’m wondering three major things:

  • How does this actually function?
  • In cases where the feature sets of the involved tooling do not overlap, how do we handle that, particularly when generating from a tool that supports X feature, to installing with one that does not?
  • In cases where the feature sets of the involved tooling do overlap, but their opinions on how to interpret some specific bit of data differ, how do we handle that disparity?

Let’s say Pipenv specifies its default group to use the "" meta dependency, and [dev] for the develop group.

On calling pipenv sync, it starts with the "" meta, and recursively collects dependencies:

lock = read_lock_content(filename)

collected = {}
collect_dependency("", collected)

with the implementation:

from packaging.markers import Marker

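def collect_dependency(key, collected):
    # Skip keys already visited so shared dependencies are walked only once.
    if key in collected:
        return
    # Record the entry under its key, then walk its children.
    collected[key] = current = lock["dependencies"][key]
    for child, marker in current["dependencies"].items():
        # Skip children whose environment marker does not apply here.
        if marker and not Marker(marker).evaluate():
            continue
        collect_dependency(child, collected)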

and install things in the collected dict.

Since dependencies is the result of a resolution, at most one docutils should be collected here, otherwise the resolution should have failed with a conflict. It is also possible no docutils is installed, if none is visited during the collection. This either means it is not needed on this platform (the marker evaluation excludes it or a dependency requiring it), or it belongs to another group not requested here (e.g. dev).
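To see the docutils case play out end to end, here is a self-contained toy run of the same traversal. Everything in it is illustrative: the lock content is invented, the `name;N` key shape is an assumption about how variants are spelled, and `marker_holds` is a tiny stand-in for real environment-marker evaluation that only understands `name == value` (with `python_major` being a made-up variable, not a real marker).

```python
# Toy lock content; each entry maps child keys to an optional marker string.
LOCK = {
    "dependencies": {
        "": {"dependencies": {"sphinx": None}},
        "[dev]": {"dependencies": {"pytest": None}},
        "sphinx": {"dependencies": {
            "docutils;0": "python_major == 3",
            "docutils;1": "python_major == 2",
        }},
        "docutils;0": {"dependencies": {}},
        "docutils;1": {"dependencies": {}},
        "pytest": {"dependencies": {}},
    }
}


def marker_holds(marker, env):
    """Stand-in for real marker evaluation: only handles 'name == value'."""
    name, _, value = marker.partition(" == ")
    return str(env[name]) == value


def collect(key, collected, env):
    """Walk the tree from `key`, skipping children whose marker is false."""
    if key in collected:
        return
    collected[key] = entry = LOCK["dependencies"][key]
    for child, marker in entry["dependencies"].items():
        if marker and not marker_holds(marker, env):
            continue
        collect(child, collected, env)


collected = {}
collect("", collected, {"python_major": 3})
# Drop the meta-packages themselves; what remains is what gets installed.
to_install = sorted(k for k in collected if k and not k.startswith("["))
```

Starting from `""` on the toy "Python 3" environment, only one docutils variant is reached and nothing from `[dev]` is collected, which matches the two cases described above.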

For pipenv sync --dev, both "" and "[dev]" need to be collected. This would still be duplicate-free if the lock file was generated by Pipenv itself, but the implementation can add additional checks to ensure there are no conflicting dependencies (by comparing the part before ;).
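The additional check could look roughly like the sketch below. It assumes keys for variants of the same package share the text before the first `;` (for example `docutils;0` and `docutils;1`); `find_conflicts` is a name invented for this example.

```python
from collections import defaultdict


def find_conflicts(collected_keys):
    """Group keys by the part before ';' and report names seen more than once."""
    variants = defaultdict(set)
    for key in collected_keys:
        name = key.split(";", 1)[0]
        variants[name].add(key)
    return {name: sorted(keys) for name, keys in variants.items() if len(keys) > 1}
```

An installer would call this on the union of everything collected from `""` and `[dev]`, and refuse to proceed if the result is non-empty.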

Pipenv can still install from it, and will simply ignore everything only collectable through the [tests] meta-dependency. In pip’s case, it would need the user to tell it what to install. Here’s an interface I think would work:

pip install -l 'path-to.lock.json'  # This installs the "" meta-package.
pip install -l 'path-to.lock.json[dev]'  # This installs both "" and "[dev]" meta-packages.
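Note that `-l` here is a proposed interface, not an existing pip option. A tool accepting such an argument would need to split the path from the requested groups; a minimal sketch follows. The function name and the comma-separated form (`[dev,tests]`) are assumptions of this example, borrowed from how extras are usually written.

```python
def parse_lock_argument(argument):
    """Split 'path[extra1,extra2]' into (path, list of meta-package keys)."""
    groups = [""]  # the default meta-package is always installed
    path = argument
    if argument.endswith("]") and "[" in argument:
        path, _, extras = argument[:-1].rpartition("[")
        groups += ["[%s]" % e.strip() for e in extras.split(",") if e.strip()]
    return path, groups
```

Each returned key can then be used as a starting point for the dependency collection described earlier.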

I’d say pip should error out without installing anything if any of the dependencies it needs to satisfy contains keys other than python and dependencies. The PEP 517-ish idea @dustin thought of sounds interesting, but I have not thought it through enough to determine whether it would work, or how. I think that would need to be a follow-up extension to the format. I’d also say pip is too low-level and shouldn’t support this interop feature even if it ends up getting specified.

(This post is getting long and I need to leave for now. I promise I’ll come back to the other points when I have more time.)

If the format supports locking per platform and per Python version, then both Poetry and pipenv can use it, but it is up to them to choose whether they actually insert the information for all platforms / Python versions.

Yes, but like with Nix, it could be valuable for your users to be able to consume such a Python lock file. Then, when they use it they would still generate a Spack-specific lock file. There is e.g. a tool, poetry2nix that allows building Poetry projects with Nix. Yes, additional information is needed when using extension modules, but other than that it saves a lot of work that is now handled by Poetry.

It’s a matter of choosing what information is to be contained in it. The more goes in, the more usable it can become for more other tools, but it adds a burden. For an initial exchangeable lock format I suggest not including compiler info and such, but who knows, in a couple of years more people want reproducible environments, then that choice can be revisited.

I will say that other than standardizing the metadata for projects, this locking/environment concept is the last thing on my list for ‘packaging’-related metadata (after this my personal packaging project left is making sure there are libraries to support all of the PEPs). So I’m not expecting a proliferation of 3 lines in a TOML file to solve a ton of problems as I personally can’t think of others worth trying to standardize or are universal enough to want to standardize.

One idea I’m having is to have a field for a lock file to declare platform compatibility. But ultimately this can never be guaranteed due to the state of Python packaging; the tool can have all the intent to produce a platform-agnostic dependency tree, but the result can never be theoretically platform independent. So I think the most practical approach to this is to position this as describing what would happen if your intention is applied on this machine and let tools decide how much they want to extrapolate the result.

I’d say it should fail by default. We can recommend this in the spec, but honestly I think most users just don’t care about validation enough and ultimately most tools would just grow a --no-validate if hash algorithm support becomes a problem.
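Here is one way the “1 of N, fail by default” behaviour could look. This is only a sketch: the shape of `validations` (algorithm name mapped to hex digest), the `SUPPORTED` set, and the exception name are assumptions made for the example.

```python
import hashlib

# Algorithms this hypothetical installer knows how to compute.
SUPPORTED = ("sha256", "sha384", "sha512")


class NoSupportedHash(Exception):
    """Raised when none of the listed algorithms can be computed."""


def validate(data, validations):
    """Return True if any supported digest matches (1-of-N threshold).

    Unsupported algorithms are skipped; if nothing usable remains, fail
    by raising instead of silently accepting the artifact.
    """
    usable = {a: d for a, d in validations.items() if a in SUPPORTED}
    if not usable:
        raise NoSupportedHash(sorted(validations))
    return any(hashlib.new(a, data).hexdigest() == d for a, d in usable.items())
```

In this sketch an empty `validations` mapping also raises, which is one possible answer to the “is it allowed to require it?” question; a `--no-validate` style switch would simply skip the call.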

Given my primary goal is to handle the every-tool-can-do-this scenario, my suggestion would be to error out if an installer does not support the resolver’s result. And if the different tools differ in how the dependency can be satisfied, the resolver (the tool producing the file) is responsible for describing how each should work, and let the installer (the tool consuming the file) decide which route to choose. For example, numpy can be installed either with pip or Conda, so the resolver can produce something like

{
    "conda": {
        "name": "numpy",
        "source": "anaconda",
        "version": "1.18.1"
    },
    "python": {
        "name": "numpy",
        "source": "pypi",
        "version": "1.18.1"
    },
    "dependencies": {...}
}

to tell the installer it can satisfy this either by using pip or Conda. So the rule of satisfying a dependency should be

  • All of dependencies must be satisfied.
  • At least one of the remaining keys can be satisfied.

And the installer should reject the file as unsupported otherwise.
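The second rule can be sketched as below, using the numpy entry from above (with an empty `dependencies` mapping so the example runs). `choose_route` is a hypothetical helper; satisfying the entries under `dependencies` is assumed to be handled by the recursive collection and is not shown.

```python
def choose_route(entry, supported):
    """Pick the first key other than "dependencies" that this installer supports.

    Raises ValueError when none is supported, i.e. the file is rejected.
    """
    routes = [k for k in entry if k != "dependencies"]
    for route in routes:
        if route in supported:
            return route
    raise ValueError("unsupported dependency; needs one of %s" % sorted(routes))


numpy_entry = {
    "conda": {"name": "numpy", "source": "anaconda", "version": "1.18.1"},
    "python": {"name": "numpy", "source": "pypi", "version": "1.18.1"},
    "dependencies": {},
}
```

A pip-like installer would call `choose_route(numpy_entry, {"python"})`, while one that only understands some other key would get the error and stop before installing anything.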


All these questions really make me think more deeply about the underlying philosophy behind all the choices :slightly_smiling_face: I think the main thing I have in mind for this is to separate “populating an environment from user request” into a resolver-installer pipeline, where the installer part is really just reduced to downloading a thing and applying it (and only it) to the environment. All the questions about what things to find, when to find them, and in what situations are answered by the resolver. This may not be how all package manager lock files currently work (which I think is likely also why the term becomes confusing), but it is (IMO) the best way to abstract the process and provide an exchange format for tools to understand.

Another thing I want to mention:

I think that’s a good approach. One other thing I think might be possible is to write it as a lossy export format (what the input file would be if we assume certain architecture variables). This might not always be possible (and Spack can say “nope can’t generate that for this environment”), but I imagine it could be useful when it works (I could very possibly be wrong since I am not familiar with the audience of Spack).

It seems agreed that the purpose of the lock file is to enable tools to re-build an environment effectively identical to the original. What if instead the purpose was to simply state what the original environment was?

This would change the mindset of the development of support for the lock file: instead of having the tools worry about the differences between the features of the tools producing and consuming the lock file, you leave that to the developer. The end-user should know more intimately whether pip is good enough to specify their environment, or if more of the environment needs to be specified.

Each tool would put in to the lock file as much of the environment as they track. The lock file would have to support all kinds of markers, which would make it more of an open-ended standard.

This would remove conditions from the lock-file (such as Python version, platform etc), and push the choice of environment on to the end-user.

I imagine this is what currently happens with Poetry and pipenv

This is solved in the fact that the end-user chooses the tool which has the features they need

I don’t. The file I have in mind sits in the middle of the resolver-installer pipeline, and communicates what the resolver thinks the environment should look like for the installer to materialise. What the resolver comes up with is an intent, not an actual environment, so there’s no original to replicate. In other words, the lock file is not derived from an environment, but the environment is derived from a lock file.

What you describe would be more like what I call a freeze file (because it is the idea behind pip freeze). It is indeed useful in some cases, but much less so than a lock file, because a lot of the context would be lost in the installation process.

I can say Poetry definitely does not do this. Pipenv does not want to do this either (what it actually does is another issue). I’ve also been trying to clarify the format is exactly not this (and therefore has a chance to satisfy the tools’ needs), but I think I’ve completely failed again, seeing you take away the entirely opposite message :disappointed_relieved:

Isn’t that our current situation? If a “standardized file” produced by, say, Pipenv can only be read and executed properly by Pipenv, then what’s the difference between that standardized file and the current use of Pipfile.lock?

Ahh, you’re right. That is what I mean. Sorry for adding to the confusion.

And I’d have said the format you’re describing is a specification. Again, this stems from my confusion on the naming of “lock file”, which your reply immediately cleared up for me. This topic’s quite long, I’d forgotten about your break-down of spec vs intent vs actualisation.

The benefit is standardising the components of the specification which are common between tools, while the tool-specific segments have no guarantee to be acted upon by other tools. A team of app developers may wish to ensure the platform and Python version of their deployed app, but know that their project is pure Python so the developers can use their platform of choice, for example.

Edit: now that I think about that example, however, it might make more sense to explicitly specify that during dev, a minimum Python and platform feature-set is required, but during production, a specific Python version and platform feature-set is required. External developers using tools which don’t honour these dev requirements may get confused as to why they can’t build and/or use the app, and must resort to opening a ticket.

I would like to contribute two potential requirements for a lock file format to this discussion, which I have not seen mentioned but which seem important to me (apologies in advance if I missed something).

  1. Any lock file format should be designed to be easy to merge by git and similar tools.
  2. It should be possible to define dependency sets which do not include the project itself.

The second point allows us, for example, to have a rich suite of linters with deterministic CI, without constraining the core dependencies of the package. Extras, by contrast, define dependencies on top of the core dependencies. So do the development dependencies of Poetry and (I believe) pipenv.

About the first point: Merge conflicts arise quite frequently with Poetry’s lock files when dependencies change on different branches. By contrast, with a tool like yarn you also regularly get merge conflicts in the lock file (yarn.lock), but in most cases yarn can then figure out the conflict resolution itself. This is much harder to do with Poetry’s lock file format.

Thanks for raising these issues! I think both of them are very relevant. In fact, one of the reasons I chose to explicitly mandate the JSON formatting options (key ordering and indentation rules) in my proposal above is exactly to produce minimal diff output. I will try to remember to put this in as a discussion point on formats.
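For illustration, fixed key ordering and indentation are easy to get from Python’s `json` module. The 4-space indent and the helper name `dump_lock` are assumptions for this sketch rather than the values the proposal mandates.

```python
import json


def dump_lock(data):
    """Serialise with sorted keys and fixed indentation for stable diffs."""
    return json.dumps(data, sort_keys=True, indent=4, ensure_ascii=False) + "\n"
```

Because the output no longer depends on insertion order, two branches that add different dependencies each produce small, line-local changes, which is what makes merges tractable for git.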

As for the second point, I think people omit that (maybe unintentionally) since it is a core characteristic of requirements.txt (compared to say Poetry’s pyproject.toml), and a must-have if the goal is to replace the format. I proposed the metapackage feature as an attempt to generalise this use case, so the user can refer to these dependency sets by a name.

Hi there! I was wondering if there was any new activity on this? I see the PEP 650 but haven’t been able to find anything more in the last 9 months or so on this.

There have been some private discussions going on, but I don’t think things have reached the state to be announced publicly yet. Considering how divisive and opinionated this topic is, it’s feared a premature publication may cause the discussion to diverge too much and make it impossible to come to a conclusion (like this thread). So… stay tuned :slightly_smiling_face:

That PEP @uranusjr was alluding to has arrived: PEP 665: Specifying Installation Requirements for Python Projects
