Structured, Exchangeable lock file format (requirements.txt 2.0?)

In an ideal world, yes, but I think we’re already past the point of no return here. The number of users that each individual tool has is non-trivial. I think it’s far more likely that we’ll find a way to standardize a lock file than to agree on a single tool…

I suspect that standardising a lock file format would be easier/quicker than convincing everyone to use a single tool here. (I presume you’d be less comfortable with your own proposal if I told you that we were going to settle on pipenv? :wink:)

But joking aside, yes you do have a point that Python has a somewhat-unique problem in not having a single project management tool of choice. (Although not completely unique - Java and C# among others have no universal tool either as far as I know).

Having various tools have their own native lockfile format, but a mechanism for generating a standard format for deployment, seems like the standard sort of solution for this type of problem. Why would it be unacceptable here? (Tools that wanted to avoid the “generate standard format” step could support the standard natively, but there’s no requirement to do so).

And yet, I think people can be willing to move if there are enough incentives to do so. I have seen it first hand with people moving from pipenv to Poetry. And if we tell them that it’s the path forward, this might ease the transition. And yes, I must admit it’s not trivial at all, given the various workflows people have today due to the lack of real consensus up until now, but that’s something that’s worth the effort.

And we somewhat have one in the form of requirements.txt, don’t we?

Introducing yet another format can lead to even more fragmentation than we already have and it would not solve anything.

I feel we (as the two camps) are talking past each other here. It is most definitely not my intention to force all package managers related to Python to use the same lock file format. Spack (for example) definitely should not use the exact same lock file format as pip; this would be a terrible thing to do in a lot of ways. But that’s not the proposal here. This is probably my fault; I’ve been calling the idea a “lock file” (and even named the repo as such), and that likely makes people start on the wrong track right from the beginning. (I’m avoiding the term in this post from now on; hopefully this helps avoid the wrong preconception.)

To me, a unified dependency description format solves the too-many-formats problem from a different direction. Instead of starting from a package manager and looking at what the format should do, it starts from the common scenario, where a project running on Python needs to install third-party site packages, and works its way back to include all the information required to make this viable. This lines up with the problem @dustin is trying to address; all of the package managers can solve this problem, they all solve it the same way (by installing stuff into site-packages), but they all describe that common solution in vastly different ways. This is also the reason why the proposed structure is very flexible with open fields everywhere; package managers can add whatever they want in their lock files to make things work, but as long as the project is still within the “install stuff into site-packages” boundary, that part can be fulfilled by another package manager. (And if it goes out of bounds, you’ll need specialised tools anyway.)

I think Python is “hurt” by its incredible interoperability here. All the tools listed here are language-specific package managers, but it is incredibly common for people to build a project that leverages components that are not entirely Python. In an ideal world we could all use one package manager everywhere, but it demands incredible design flexibility and development power to even get a sniff of that goal. So in practice tools generally do certain things better by trading away the ability to do other things cleanly (or at all). The result is that groups of people favour tools that are good at different things. Which is still a good thing (since we are unable to build one thing to fit all needs), but that doesn’t mean we can’t find common ground on the common things.

I could get behind a lockfile format with this scope, as at least it would tell me what I need to know to manage things in site-packages. In Spack, we would probably use this to establish constraints and conflicts for pure python packages that we might want to link into a single environment. A spec would certainly help.

I start to worry when we start talking about native components, as this is exactly where we’re trying to provide way more metadata, and where our install model is quite different from wheels. But, if it accomplishes the goal of standardizing all the info I need to reproduce a python/wheel deployment, a standard would be useful because I could at least read the existing format and understand it.

So basically Spack wouldn’t write these lockfiles but maybe we could get some benefit from reading them and including the result in some superset of what is standardized in Python.

This might be a totally insane idea, but what if the “exchangeable lock file format” could optionally specify what installer should be used, sort of like PEP 517/518 specifies build backends? So pip could act as a “universal” installer, and either:

  • just install from some “pip-standard” lock file format, or:
  • install the specified installer and then invoke it against its preferred lock file.

That way:

  • runtime providers only need to concern themselves with a single tool, pip (which they’re already supporting anyways)
  • folks that want to continue using some non-pip installer can continue to do so with no change to their existing workflow
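A minimal sketch of how a front-end might dispatch on an optional installer declaration, along the lines of the idea above. Everything here is hypothetical: the `installer` table, its `requires` and `command` keys, the action labels, and the `plan_install` helper are illustrative names, not an existing pip feature or a PEP.

```python
def plan_install(lock):
    """Return the actions a front-end would take for a parsed lock document.

    `lock` is an already-parsed mapping. The "installer" table and its
    "requires"/"command" keys are assumptions made for this sketch.
    """
    installer = lock.get("installer")
    if installer is None:
        # No installer declared: treat the file as the common format and
        # install from it directly.
        return [("install-from-standard-lock",)]
    return [
        # First make the declared installer available...
        ("install-packages", tuple(installer["requires"])),
        # ...then let it process its own native lock file.
        ("run-installer", tuple(installer["command"])),
    ]


# A document that names its installer, and one that does not.
with_installer = {
    "installer": {"requires": ["poetry"], "command": ["poetry", "install"]}
}
print(plan_install(with_installer))
print(plan_install({}))
```

The function only builds a plan; actually running the steps (and deciding how much to trust the declared command) is left out on purpose.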

I’m going to start off by saying I am supportive of some sort of standard for the same reasons @dustin is: people deploying Python apps currently don’t have a solid standard to rely on for this (and I don’t count requirements.txt as that isn’t a spec, it’s a pip feature). Basically I’m fine with Python packaging working towards standardizing artifacts over specific tools.

We did, and it’s called distutils. :wink: And then people weren’t happy with that and so it became setuptools. And people were still not totally happy and so we ended up with our current situation. It isn’t that people haven’t tried to come up with a single tool for this stuff, it’s just people have yet to naturally gravitate towards a single tool.

Now if we standardize the artifacts tools consume and produce, then people can much more easily start gravitating towards a single tool as movement would be easier than it currently is.

Once again, some would say we did when packaging.python.org started suggesting pipenv. :wink:

Or even more insane, just have people specify the installer and not even attempt the standardized lock file? How would that look/operate? PEP 518-like declaration of what the installer is and a PEP 517-like declaration of what to call? How does this look to e.g. pipx and Heroku or Azure App Service? I guess in both cases:

  1. A temp virtual environment would be created
  2. The installer would be installed
  3. The destination environment would also be created (if necessary)
  4. The installer would install into the target environment.

With a standardized lock file we can skip the first two steps. So I think that if we can come up with a standardized lock file it would be great, but Dustin’s “insane” idea might work as a fallback if we can’t reach consensus.
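The four steps, and the two that a standardized lock file would let a platform skip, can be sketched as a plan builder. This is only an illustration under stated assumptions: `deployment_steps` and the `-bootstrap` directory suffix are made up, the temporary environment is assumed to use the POSIX `bin/python` layout, and the final step is left without a command because how an installer targets another environment is tool-specific.

```python
import sys


def deployment_steps(installer, target_env, standard_lock=False):
    """List (label, command) pairs for the steps above without running them."""
    steps = []
    if not standard_lock:
        tmp_env = target_env + "-bootstrap"  # hypothetical naming choice
        # Step 1: temporary virtual environment for the installer itself.
        steps.append(("create temp venv", [sys.executable, "-m", "venv", tmp_env]))
        # Step 2: install the declared installer into it (POSIX layout assumed).
        tmp_python = tmp_env + "/bin/python"
        steps.append(
            ("install installer", [tmp_python, "-m", "pip", "install", installer])
        )
    # Step 3: the environment the application will actually run in.
    steps.append(("create target venv", [sys.executable, "-m", "venv", target_env]))
    # Step 4: tool-specific, so no command is guessed here.
    steps.append(("install into target", None))
    return steps


for label, command in deployment_steps("poetry", "/srv/app/env"):
    print(label, command)
```

With `standard_lock=True` the plan shrinks to the last two steps, which is the saving described above.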

But they were never meant to provide a consistent experience like modern tools (like the ones I mentioned before) do. And the lack of determinism was the biggest problem in my opinion.

I don’t think introducing yet another format is the way to go and will only lead to even more fragmentation, which is not something we need at the moment.

Well, I have my opinion about pipenv, but it at least proved that people are willing to gravitate and work towards a single tool given enough incentives and if it makes their lives easier. That’s also why Poetry is gaining traction: it provides a consistent workflow for Python projects from start to finish. There is a demand for that and dismissing it won’t lead us anywhere.

I really think most people don’t care if we end up standardizing a lock file (which might not even happen for the reasons I mentioned above), what they want is for things to work, nothing more, the rest is just implementation detail. I value the time of developers (which includes my time) and the lock file is not really the pain point here. The pain point is consistency and ease of use: how do I bootstrap my project? how do I add/remove a dependency to my project? how do I publish my project? That’s what people care about, that’s what I care about.

I don’t think introducing PEP-517-like hooks everywhere will help but it will certainly lead to more clutter, especially if it’s the answer to every attempt at standardizing anything (I remember seeing it in the discussions around the metadata standardization with the proposed idea of a metadata provider).

I understand that I may be in the minority here to hope for a single, standard tool (but extensible like I will do with poetry by introducing a plugin system) but to me it’s the way forward. Maybe I am naive in thinking it’s even remotely possible but I am willing to try anyway.

Very strong +1 on this (but with caveats, see below).

I believe that it would be good if that were the case. However, I don’t think that this is going to be a quick path, and the Python community has been burned in the past by prematurely declaring that we have one standard tool (distutils, pipenv to name a couple of examples). You obviously believe that poetry can be that tool, and hope that it will become so relatively quickly, but there’s definitely still some way to go there. Also, I don’t think interoperability questions go away just because there’s only one key tool at one end of the communication process. (After all, people could still want to write custom scripts to write or process environment specifications).

And in any case, “defined by the one standard tool” is still “implementation defined”, and that’s what we are trying to avoid here.

Some means of describing “this is the environment to deploy” is a requirement that people are hitting now, and “wait for one tool to take over the world” seems like a difficult solution to sell. So for the immediate term, I think this work has value (and once we have a defined format, tools can adopt it saving effort in the future).

And just to be clear, requirements.txt isn’t a usable solution here - it’s tightly tied to pip’s implementation (indexes are specified using pip options!) and underspecified in many ways (offhand, I don’t know what the required encoding for the file would be, for example). Even if people wanted to make requirements.txt the solution, it would still need standardising.
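To illustrate the point about pip options living inside the file, here is a rough sketch that separates option lines from requirement lines. It is deliberately naive and is not pip's actual parser: it ignores line continuations, inline comments, and per-requirement options, and the index URL and package pins in the sample are made up.

```python
def split_requirements(text):
    """Naively split a requirements file into pip options and requirements."""
    options, requirements = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and whole-line comments
        if line.startswith("-"):
            options.append(line)  # e.g. --index-url: only meaningful to pip
        else:
            requirements.append(line)
    return options, requirements


sample = """\
--index-url https://example.invalid/simple
# pinned dependencies
requests==2.22.0
idna==2.8
"""
options, requirements = split_requirements(sample)
print(options)
print(requirements)
```

Any other tool consuming the file has to decide what to do with the first list, which is the interoperability gap being described.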

I get the desire not to over-engineer things, but I think this is a case where there’s a clear need, and a strong requirement for a standard to handle interoperability in a well-defined way. We can debate the specifics of any given format, but we ultimately need to write up and agree a format one way or another.

For whatever it’s worth, I think I agree with @sdispater here. I’ve found this idea offputting from the start and I’ve never really been quite able to find the words to really describe what I don’t like about it. It entirely feels like we’re trying to define something that, to me, feels obviously like it should be an installer specific concern.

Ultimately, I do not think that the lockfile can actually be truly tool independent. Different tools want sufficiently different things that, in order to make the hypothetical lock file actually able to support all of the various use cases, we have to make it so flexible that you’re going to get different results if you install the same lock file with pip versus poetry versus spack versus whatever. If you can get different results, then the standardization is basically useless, because platforms can’t just say “oh hey I understand this lockfile, I don’t need to worry about what installer to use, I can just use my standard one”.

That’s a fair point, and I’m not immediately against it. However, I do think that there are two things we should take into account.

  1. Going back to the title of this issue, the requirements.txt file is a de facto standard right now for how people describe environments. And it’s a rubbish format in all of the ways we’ve objected to in other areas - it’s implementation-defined, under-specified and has weird edge cases. A standardisation effort that replaces the current requirements.txt format with a new, independent format backed by a common library (packaging, probably) seems like 100% a good idea to me. I’m happy to concede that maybe the discussion has ended up over-engineering the solution (maybe as a result of taking too many ideas from pipfile/pipfile.lock and cargo.lock?) But a requirements file format is absolutely fair game for standardisation.
  2. Doing nothing, and hoping that we standardise on a single tool, is not a good way of solving the N * M formats issue. We should have a common transport format for “environment specifications”. Even if the only effect is to convert N * M to N + M, and nothing uses the transport format as native, that’s still a significant interoperability benefit (unless you cling to the idea that one of N or M is going to equal 1).
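The N * M versus N + M arithmetic, as a tiny sketch. The counts of tools and platforms are made up for illustration, and `converters_needed` is a hypothetical helper.

```python
def converters_needed(producers, consumers, shared_format=False):
    """Count the format integrations needed between producers and consumers."""
    if shared_format:
        # Each producer writes the shared format once; each consumer reads it once.
        return producers + consumers
    # Otherwise every consumer must understand every producer's native format.
    return producers * consumers


tools, platforms = 4, 5  # illustrative numbers only
print(converters_needed(tools, platforms))                      # 20
print(converters_needed(tools, platforms, shared_format=True))  # 9
```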

So sure, “standard lock file format that everyone uses” may be an inappropriate goal (I don’t have an opinion on that, really) but "standardising requirements.txt" and “common transport format for describing environments” are both reasonable goals, with real benefits for users.

To be honest, I don’t really even understand the problem that this is trying to solve, putting aside the deficiencies inherent in requirements.txt (because things like being under-specified, weird cases, etc. are not problems that require some common interchange format).

The problem as stated (and honestly the only problem I can think of that this could actually solve) is when someone sets up a project using, say, poetry, and then wants to install it again using pip. I don’t honestly think that’s something we can really meaningfully solve with a lock file without making the file so free form / generic that it isn’t actually usable in that form for anything but the most trivial case.

Here’s an example: Pipenv only supports “things I depend on” and “things I depend on in development”, while poetry supports an arbitrary number of named groupings of sub dependencies (via extras). So I create a lockfile in poetry that has a “tests” extra for my current project. Then I try to install that with Pipenv, and it does… what? Ignores the tests extra? Installs them anyways? Then we have pip, whose “lockfile” support (via requirements.txt) doesn’t support any concept of named groups of dependencies, it just installs everything. What does it do?
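A small sketch of why this is a policy question rather than a parsing question. The group names `main`, `dev`, and `tests`, the `unknown` parameter, and the `flatten_groups` helper are assumptions for illustration; the two output buckets mirror the `default` and `develop` sections of a Pipfile.lock.

```python
def flatten_groups(groups, unknown="ignore"):
    """Map named dependency groups onto a two-bucket model.

    `unknown` says what to do with groups the consumer has no concept of:
    "ignore", "default", or "develop". No choice is obviously correct.
    """
    if unknown not in ("ignore", "default", "develop"):
        raise ValueError("unknown policy: " + unknown)
    buckets = {
        "default": list(groups.get("main", [])),
        "develop": list(groups.get("dev", [])),
    }
    for name, packages in groups.items():
        if name in ("main", "dev") or unknown == "ignore":
            continue
        buckets[unknown].extend(packages)
    return buckets


locked = {"main": ["requests"], "tests": ["pytest"]}
print(flatten_groups(locked))                     # "tests" silently dropped
print(flatten_groups(locked, unknown="develop"))  # "tests" treated as dev
print(flatten_groups(locked, unknown="default"))  # "tests" always installed
```

Three consumers picking three different policies would produce three different environments from the same file.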

Another example (that @sdispater mentioned): the lock file Pipenv generates is tied to the platform it’s currently running on. Pip’s is… well it doesn’t generate a lockfile at all, so sometimes it’s tied to the platform it’s running on (if someone just created it with pip freeze), sometimes it’s not (if they carefully curated a requirements.txt with appropriate environment markers), and it sounds like poetry will resolve the full set of dependencies, independent of what OS is being used, so that the output of the resolver is deterministic even across OSs. So what does poetry do if it tries to consume a Pipenv lockfile that is missing an OS specific dependency?

Fundamentally, this feels like standardization for the sake of standardization to me. Either we pare this hypothetical lock file down so far that anything but a bare bones installer can’t use it, and thus we’re not actually achieving anything, or we leave it so wide open that it’s standardized in name, but in practice the meaning of the contents is so unstandardized that consuming the file is still tool specific anyways.

To me this is sort of like how poetry says it uses the “standard” pyproject.toml file, which means in reality 99% of its use of that file is in the “dump a bunch of implementation specific stuff in this name space” portion, where there is no real meaningful difference between it using the “standard” file and using a completely custom poetry.toml file, except in that the word standard is being used kind of as an implicit marker of “good for some unspecified reason”.

So thinking about this more, I can think of one way to possibly make it half way workable-- which is to give up on the idea that many (if any) of the tools are going to use this as their native lock file, and instead attempt to minimize the things this does, to arrive at some basic common subset that all of the tools can produce as an artifact that platforms like Heroku/Google/etc could potentially consume for creating an environment. I’m struggling to articulate what this would be exactly, but it’s more like an interchange format for a list of things to install, rather than a lock file.

This would… probably look pretty similar to requirements.txt. A list of things to install, some validation around ensuring that you got the correct package (hashes etc), possibly some information about the specific source of packages (likely defaulting to PyPI, etc). Ideally we’d have something a bit more structured than requirements.txt, but that’s just bikeshedding.
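As a rough sketch of what one entry in such a list might carry, assuming nothing beyond what is described above (a thing to install, a hash to validate it, and a source that defaults to PyPI). The `Entry` class, its field names, and the `matches` helper are hypothetical and not taken from any proposal.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One thing to install; the field names are illustrative only."""
    name: str
    version: str
    sha256: str
    index: str = "https://pypi.org/simple"  # default source: PyPI


def matches(entry, payload):
    """Check that downloaded bytes are the artifact the entry pinned."""
    return hashlib.sha256(payload).hexdigest() == entry.sha256


# Stand-in bytes; a real consumer would hash the downloaded wheel or sdist.
artifact = b"not a real wheel"
entry = Entry("demo-package", "1.0", hashlib.sha256(artifact).hexdigest())
print(matches(entry, artifact))
print(matches(entry, b"tampered"))
```

Note there is no open-ended field here: every attribute has one meaning, so two installers reading the same entry cannot legitimately disagree about it.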

What I do not think it would have is “flexibility”. Flexibility here effectively just means implementation defined behavior, which is the antithesis of an exchange format.

Looking at what is mentioned in the lock-file repository by @uranusjr.

  • The “meta” dependency feature probably needs to be removed, it goes beyond listing what should be installed into an environment, into trying to implement an actual lock file that depends on features of the installer.
  • The weird ; syntax, which I don’t even really understand how it’s supposed to work, probably needs to be removed, or it needs more explanation of how it actually functions. How does an installer pick between two different listings for the same dependency?
  • I’m super confused by the "" feature dependency… aren’t the dependencies of the current project listed under the top level "dependency" key? Why would you list them under "" too? If not, then what are the top level keys for?

Actually, most of this entire document seems really like it doesn’t implement the thing I’m trying to suggest at all (which is fair, it wasn’t trying to), and it doesn’t seem to be implementing what @dustin and @brettcannon are looking for either. How is this even envisioned to be working? How would a tool like pip install from this lock file, even in the case that it’s currently designed for? Having gone through and read the entire spec extensively now, I’m even more confused what problem it’s even attempting to solve.

The problem I see as useful to solve (and the reason I’ve started describing it as a “way of describing environments”) is providing a way for something to dump out a spec of an environment that can be used later by possibly a different tool to recreate that same “intended” environment.

That’s something that providers like Heroku currently want to do, and they typically rely on requirements files. Which suck (ill-defined, etc) but are in practical terms something that you can say pip install -r requirements.txt and get something.

But apparently providers are getting asked to support pipenv lockfiles, and then poetry lockfiles. If I were a provider, I’d want Python to decide on a single format. And as an end user, I wouldn’t particularly want Heroku to design it, as then I’d have to redo it when I switched to PythonAnywhere. Etc.

I’m pretty sure that’s the key problem that’s worth solving in this area.

Yes, IMO that was a mistake in hindsight. The attraction of “one config file to rule them all” led us to have a “… and a bunch of non-standardised stuff” get-out clause. Let’s not do that again.

I think this is possibly a problem worth solving, but I don’t think it has anything to do with lock files. We kind of crossed paths while typing our messages up :wink: but that is basically exactly what I meant in my last message. requirements.txt isn’t really a lockfile (although you can kind of force it into a lock file like shape if you squint your eyes and mush it around enough), it’s really kind of exactly what we’re looking for here, just a list of things to install, it’s just kind of weird and funky.

Hypothetically, we could “solve” this problem by specifying requirements.txt and saying that we expect tools like poetry, Pipenv, etc to have some mechanism to reflect out a requirements.txt as some sort of a deployment artifact. We wouldn’t then expect tools like poetry, pipenv, etc to start ingesting requirements.txt or using them in place of Pipfile.lock or poetry.lock.

Note: I’m not suggesting that the right answer is actually just use requirements.txt, but rather that’s sort of a useful way to frame it in my mind. That we’re not actually trying to replace Pipfile.lock or poetry.lock or whatever, but just defining what is effectively just a build artifact in a deployment pipeline.
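To make the “reflect out a deployment artifact” framing concrete, here is a sketch of a tool rendering its already-resolved pins as requirements.txt lines using pip’s `--hash=sha256:` syntax. The `render_requirements` helper, the `(name, version, hashes)` input shape, and the placeholder digests are assumptions; no existing tool’s export code is being described.

```python
def render_requirements(pins):
    """Render (name, version, [sha256 digests]) pins as requirements lines."""
    lines = []
    for name, version, hashes in sorted(pins, key=lambda pin: pin[0]):
        parts = [f"{name}=={version}"]
        parts.extend(f"--hash=sha256:{digest}" for digest in hashes)
        lines.append(" ".join(parts))
    return "\n".join(lines) + "\n"


# Placeholder digests; a real export would carry full sha256 hex strings.
pins = [
    ("requests", "2.22.0", ["aaaa"]),
    ("idna", "2.8", ["bbbb", "cccc"]),
]
print(render_requirements(pins), end="")
```

The export is one-way by design: the tool keeps its native lock file as the source of truth and never reads this output back in.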

I don’t actually think it was a mistake. I think it’s worked fine (other than some issues where people started using it really quickly for black, which collided with a choice in pip to change behavior based on the mere existence of the file, but I don’t think that’s really a failing of the choice to add the tools section). I’m just pointing out that defining a standard that’s ultimately so flexible that things are actually implementation specific doesn’t make interoperability magically happen. Pip does not and will never understand [tool.poetry.*]. That’s fine because we don’t pretend that it will or that calling pyproject.toml a standard means it will.

I can give you two problems that I live with regularly which impact users (and obviously make my life more difficult :wink:).

One is deploying to a PaaS/serverless platform (this is more general, but this specific scenario is the one I have lived through). If that service wanted to install dependencies on behalf of the user, how are they to do that today? The best they can do is add support for every tool that users may want, or try to force users to a single tool. Obviously a “vendor lock-in” version of forcing users to a specific tool is not exactly a great result when the community has not come to an agreement. And so you might say that perhaps people should bundle their dependencies with their code? OK, how do you do that with Poetry? Pipenv? Pip? You’re once again back to documenting and trying to support users by teaching them how to use their tool to install dependencies for potentially a different platform in order to deploy them to their production system (and this doesn’t have to be cloud-specific; Docker or any other system where your dev OS differs from production plays into this).

Two, how do editors install and manage your dev requirements for you? For VS Code we have to manually add support for every tool where we want to help users install e.g. a linter or formatter. And if we ever add support to help walk users through setting up a development environment we will need to support installing all dev requirements which will once again be tool-specific. And I know Poetry is bumping up against this because we have not gotten around to supporting it fully in the Python extension for VS Code (it’s on our roadmap, BTW).

How is that not a lock file? I’m curious as to what your definition of a lock file is compared to a list of packages to install that are specified to a specific version?

I’ll also note that Installing dependencies in containerized applications has been announced as a tool that reads the various lock file formats we have going in the community and tries to abstract them out for orchestration purposes. The fact we need a tool for that I think plays into this discussion. :slight_smile:

I think there’s an important distinction to make here. In Spack we talk about abstract and concrete specifications.

Abstract Specs

An abstract spec is only partially constrained. It has the names of packages you want, maybe some features, versions, compilers, and other preferences. That’s what the developer tells you they “require” to set up the environment.

Concrete Specs

A concrete spec has everything. Lockfiles are concrete. They have the names, hashes, versions, etc. of packages and dependencies, and they can very well be tied to particular environments, platforms, resolution algorithms, etc.

Reproducibility vs. portability

Which one you use depends on how you want an environment to be reproduced. The abstract spec is more portable but less reproducible, because a different resolver or platform can affect what you get. The lockfile lets you produce exactly what you got, but it may not work at all if you change the OS/arch/python version/etc.

I see a use for both of these types of reproducibility. Sometimes you just want the app to be built how it needs to be built for the environment (abstract). But if you want to avoid surprises, and you know you’ll be in the same environment, you want a lockfile to reproduce things exactly. Or maybe several lockfiles, if you deploy to multiple environments, but don’t want churn in any one of them.
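The trade-off can be shown with a toy concretizer. This is not Spack’s algorithm or any real resolver: versions are plain tuples, the abstract spec only expresses minimum versions, and `concretize` is a hypothetical helper that picks the highest candidate available.

```python
def concretize(abstract, available):
    """Turn minimum-version constraints into exact pins.

    `abstract` maps name -> minimum version tuple; `available` maps
    name -> list of version tuples visible at resolution time.
    Raises ValueError (from max) if no candidate satisfies a constraint.
    """
    concrete = {}
    for name, minimum in abstract.items():
        candidates = [v for v in available[name] if v >= minimum]
        concrete[name] = max(candidates)
    return concrete


abstract = {"requests": (2, 0)}

# The same abstract spec resolved against two different states of the index.
last_month = concretize(abstract, {"requests": [(1, 9), (2, 1)]})
today = concretize(abstract, {"requests": [(1, 9), (2, 1), (2, 3)]})
print(last_month)
print(today)
```

The abstract spec stayed valid in both cases (portable) but produced different environments (not reproducible); saving either result as a lockfile flips that trade.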

Spack environments have an abstract spack.yaml and a concrete, generated spack.lock, described here, and you can set up an environment from either. Both have their uses.

Making a “minimal” spec

If you want to trim this down to a “minimal” specification, I think you really need to define how “concrete” you want the “standard” lockfile to be. What attributes should be included, where are they expected to be valid, do they depend on a particular resolver, etc.

Spack’s format has a lot more than I think you want to handle here – compilers, architectures, flags, build options, etc., and it’s very tied to particular platforms. To be honest, I think that stuff is very much needed when you talk about native dependencies, but if you can rely on a spec like manylinux to provide most of the assumptions, then maybe you can dispense with a lot of it.

For pure Python, I think packages, versions, and options are probably sufficient and useful for a lockfile spec. But maybe the spec should standardize some abstract format (i.e., a better requirements.txt for portability) as well as the lockfile.

I still think there are going to be OS-dependent/resolution-sensitive things in a pure python lockfile (as @dstufft mentioned). So it might be worth saying in the spec when that will happen and when the reproducibility guarantee isn’t cross-platform for pure Python stuff. Or maybe the lockfile should mark parts that are OS-sensitive so that a tool can either require the same OS, or try to re-resolve them (which is quite hard).
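One way the “mark the OS-sensitive parts” option could look, as a sketch only. The per-entry `platform` key is a hypothetical field (compared here against `sys.platform` values such as `linux` or `win32`), and `os_sensitive_mismatches` is an illustrative helper, not something from the proposal.

```python
import sys


def os_sensitive_mismatches(entries, current_platform=None):
    """Return names of entries locked for a platform other than this one.

    Entries without a "platform" key are treated as platform-independent.
    """
    current = current_platform or sys.platform
    return [
        entry["name"]
        for entry in entries
        if entry.get("platform") not in (None, current)
    ]


lock_entries = [
    {"name": "requests"},                        # same everywhere
    {"name": "pywin32", "platform": "win32"},    # only resolved on Windows
]
stale = os_sensitive_mismatches(lock_entries, current_platform="linux")
print(stale)
```

A consumer could refuse to install when the list is non-empty, or attempt to re-resolve just those entries, which is the harder path mentioned above.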

This I think is solvable using some list of things to install (more on this later).

This I do not think is solvable. Poetry is almost certainly going to expect that if you add a new dependency, it gets added under the [tool.poetry] section of pyproject.toml. Pipenv is almost certainly going to expect that if you add a new dev dependency, it gets added to Pipfile not to some hypothetical lock file. Unless you just mean “we want a list of dev dependencies”, which is roughly the same thing as the first case, just with a qualifier as to what kind of dependencies you want.

I’m actually struggling to try and put to words what I’m trying to convey here. To my mind, a lock file doesn’t describe a list of things to install, it describes the state of the world at the point the lockfile was created. This means that, given a deterministic resolver, resolving the same set of dependencies will always resolve to the same set.

I’ve carefully worded that, because an important thing here I think is the ability to include things that the resolver might not actually take into account (e.g. extra packages it doesn’t need). Hypothetically, a lock file could contain a complete snapshot of the entirety of PyPI at the time of creation and the end result would still be the same.
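A toy illustration of the “state of the world” view: a deterministic resolver run against a fixed snapshot gives the same answer even when the snapshot contains packages it never needs. The snapshot shape (one version per name, with a list of dependency names) and the `resolve` helper are simplifying assumptions, not how any real resolver works.

```python
def resolve(roots, snapshot):
    """Deterministically collect pins for `roots` from a fixed snapshot.

    `snapshot` maps name -> (version, [dependency names]).
    """
    resolved = {}
    pending = sorted(roots)
    while pending:
        name = pending.pop(0)
        if name in resolved:
            continue
        version, dependencies = snapshot[name]
        resolved[name] = version
        pending.extend(sorted(dependencies))
    return resolved


snapshot = {
    "app-lib": ("1.0", ["helper"]),
    "helper": ("0.3", []),
}
# The same world plus a package nothing depends on.
bigger = dict(snapshot, unrelated=("9.9", []))

print(resolve(["app-lib"], snapshot))
print(resolve(["app-lib"], bigger) == resolve(["app-lib"], snapshot))
```

Under this view the lock file is the snapshot, and the list of things to install is merely what the resolver derives from it.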

Different implementations of a lock file could take this to varying degrees, such as “locking” in specific files it used, or recording the end result of the resolver (such that this lockfile is only valid on a specific platform) or by attempting to exhaustively resolve all the combinations of conditional dependencies to include.

This idea is a little bit strained, because some lock files can be implemented as a list of packages to install if the features of the installer are sufficiently simple, but it makes more sense when you start thinking of more complex installer features. For example, Pipenv will resolve the full set of dependencies, as if you had specified to install the development dependencies, even if you didn’t ask for that, and will reflect all of them into the lock file. I assume that poetry does something similar.

Part of it, honestly, is that I think to actually be a replacement for poetry’s lock file, Pipenv’s lockfile, Spack’s lockfile, etc., it has to actually support, through some mechanism, all of the features that each of those tools has. However, as soon as you start to add support for those features, you either mandate that all tools support those features (thus making the lowest common denominator the superset of features) OR you end up in a weird situation where the tools are using the same format on the surface, but uses of that format aren’t actually interchangeable because properly using said format relies on interpreting implementation specific data inside that format. I think the former is unresolvable (you’re never going to get every tool to agree to the same set of features; if you did, we wouldn’t have multiple tools) and the latter puts users in a really bad place where we claim to have this interoperable standard, but it’s not really interoperable because actually using it requires relying on implementation specific details. So given that I don’t think a replacement for the various lockfiles is meaningfully possible, this “it’s not a lockfile I swear” is largely an attempt to get the same benefit in the one major use case I can see for an interoperable lockfile (I run a platform and want a way to describe the dependencies you need me to install) by treating that as a distinct artifact.