Structured, Exchangeable lock file format (requirements.txt 2.0?)

dustin · April 20, 2020, 9:43pm

This might be a totally insane idea, but what if the “exchangeable lock file format” could optionally specify what installer should be used, sort of like PEP 517/518 specifies build backends? So pip could act as a “universal” installer, and either:

just install from some “pip-standard” lock file format, or:
install the specified installer and then invoke it against its preferred lock file.

That way:

runtime providers only need to concern themselves with a single tool, pip (which they’re already supporting anyways)
folks that want to continue using some non-pip installer can continue to do so with no change to their existing workflow

brettcannon · April 21, 2020, 12:34am

I’m going to start off by saying I am supportive of some sort of standard for the same reasons @dustin is: people deploying Python apps currently don’t have a solid standard to rely on for this (and I don’t count requirements.txt, as that isn’t a spec, it’s a pip feature). Basically I’m fine with Python packaging working towards standardizing artifacts over specific tools.

We did, and it’s called distutils. And then people weren’t happy with that and so it became setuptools. And people were still not totally happy and so we ended up with our current situation. It isn’t that people haven’t tried to come up with a single tool for this stuff, it’s just people have yet to naturally gravitate towards a single tool.

Now if we standardize the artifacts tools consume and produce, then people can much more easily start gravitating towards a single tool as movement would be easier than it currently is.

Once again, some would say we did when packaging.python.org started suggesting pipenv.

Or even more insane, just have people specify the installer and not even attempt the standardized lock file? How would that look/operate? PEP 518-like declaration of what the installer is and a PEP 517-like declaration of what to call? How does this look to e.g. pipx and Heroku or Azure App Service? I guess in both cases:

A temp virtual environment would be created
The installer would be installed
The destination environment would also be created (if necessary)
The installer would install into the target environment.

With a standardized lock file we can skip the first two steps. So I think that if we can come up with a standardized lock file it would be great, but Dustin’s “insane” idea might work as a fallback if we can’t reach consensus.
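As a rough sketch of those four steps (not a proposal for how pip or any platform would actually do it), the fallback could be planned like this. The function name, the `installer-env` directory and the `python -m <installer>` invocation are all assumptions made for illustration; how a real installer is pointed at a target environment is tool-specific.

```python
from __future__ import annotations

import sys
from pathlib import Path


def plan_fallback_install(
    installer: str,
    installer_args: list[str],
    work_dir: Path,
    target_env: Path,
) -> list[list[str]]:
    """Return the commands for the four steps, in order, without running them."""
    tool_env = work_dir / "installer-env"
    # POSIX layout; on Windows the interpreter lives under "Scripts" instead.
    tool_python = tool_env / "bin" / "python"
    return [
        # 1. Create a temporary virtual environment for the installer itself.
        [sys.executable, "-m", "venv", str(tool_env)],
        # 2. Install the declared installer into it.
        [str(tool_python), "-m", "pip", "install", installer],
        # 3. Create the destination environment (if necessary).
        [sys.executable, "-m", "venv", str(target_env)],
        # 4. Hypothetical invocation: ask the installer to do the install.
        [str(tool_python), "-m", installer, *installer_args],
    ]
```

Each command could then be handed to `subprocess.run(cmd, check=True)`. With a standardized lock file, a platform would only need steps 3 and 4, run with its own installer.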

sdispater · April 21, 2020, 8:09am

But they were never meant to provide a consistent experience like modern tools (like the ones I mentioned before) do. And the lack of determinism was the biggest problem in my opinion.

I don’t think introducing yet another format is the way to go; it will only lead to even more fragmentation, which is not something we need at the moment.

Well, I have my opinion about pipenv, but it at least proved that people are willing to gravitate and work towards a single tool if there are enough incentives and it makes their lives easier. That’s also why Poetry is gaining traction: it provides a consistent workflow for Python projects from start to finish. There is a demand for that and dismissing it won’t lead us anywhere.

I really think most people don’t care if we end up standardizing a lock file (which might not even happen for the reasons I mentioned above), what they want is for things to work, nothing more, the rest is just implementation detail. I value the time of developers (which includes my time) and the lock file is not really the pain point here. The pain point is consistency and ease of use: how do I bootstrap my project? how do I add/remove a dependency to my project? how do I publish my project? That’s what people care about, that’s what I care about.

I don’t think introducing PEP-517-like hooks everywhere will help but it will certainly lead to more clutter, especially if it’s the answer to every attempt at standardizing anything (I remember seeing it in the discussions around the metadata standardization with the proposed idea of a metadata provider).

I understand that I may be in the minority here to hope for a single, standard tool (but extensible like I will do with poetry by introducing a plugin system) but to me it’s the way forward. Maybe I am naive in thinking it’s even remotely possible but I am willing to try anyway.

pf_moore · April 21, 2020, 9:40am

Very strong +1 on this (but with caveats, see below).

I believe that it would be good if that were the case. However, I don’t think that this is going to be a quick path, and the Python community has been burned in the past by prematurely declaring that we have one standard tool (distutils, pipenv to name a couple of examples). You obviously believe that poetry can be that tool, and hope that it will become so relatively quickly, but there’s definitely still some way to go there. Also, I don’t think interoperability questions go away just because there’s only one key tool at one end of the communication process. (After all, people could still want to write custom scripts to write or process environment specifications).

And in any case, “defined by the one standard tool” is still “implementation defined”, and that’s what we are trying to avoid here.

Some means of describing “this is the environment to deploy” is a requirement that people are hitting now, and “wait for one tool to take over the world” seems like a difficult solution to sell. So for the immediate term, I think this work has value (and once we have a defined format, tools can adopt it saving effort in the future).

And just to be clear, requirements.txt isn’t a usable solution here - it’s tightly tied to pip’s implementation (indexes are specified using pip options!) and underspecified in many ways (offhand, I don’t know what the required encoding for the file would be, for example). Even if people wanted to make requirements.txt the solution, it would still need standardising.

I get the desire not to over-engineer things, but I think this is a case where there’s a clear need, and a strong requirement for a standard to handle interoperability in a well-defined way. We can debate the specifics of any given format, but we ultimately need to write up and agree a format one way or another.

dstufft · April 21, 2020, 2:33pm

For whatever it’s worth, I think I agree with @sdispater here. I’ve found this idea offputting from the start and I’ve never really been quite able to find the words to really describe what I don’t like about it. It entirely feels like we’re trying to define something that, to me, feels obviously like it should be an installer specific concern.

Ultimately, I do not think that the lockfile can actually be truly tool independent. Different tools want sufficiently different things that, in order to make the hypothetical lock file actually able to support all of the various use cases, we have to make it so flexible that you’re going to get different results if you install the same lock file with pip versus poetry versus spack versus whatever. If you can get different results, then the standardization is basically useless, because platforms can’t just say “oh hey I understand this lockfile, I don’t need to worry about what installer to use I can just use my standard one”.

pf_moore · April 21, 2020, 4:05pm

That’s a fair point, and I’m not immediately against it. However, I do think that there are two things we should take into account.

Going back to the title of this issue, the requirements.txt file is a de facto standard right now for how people describe environments. And it’s a rubbish format in all of the ways we’ve objected to in other areas - it’s implementation-defined, under-specified and has weird edge cases. A standardisation effort that replaces the current requirements.txt format with a new, independent format backed by a common library (packaging, probably) seems like 100% a good idea to me. I’m happy to concede that maybe the discussion has ended up over-engineering the solution (maybe as a result of taking too many ideas from pipfile/pipfile.lock and cargo.lock?) But a requirements file format is absolutely fair game for standardisation.
Doing nothing, and hoping that we standardise on a single tool, is not a good way of solving the N * M formats issue. We should have a common transport format for “environment specifications”. Even if the only effect is to convert N * M to N + M, and nothing uses the transport format as native, that’s still a significant interoperability benefit (unless you cling to the idea that one of N or M is going to equal 1).

So sure, “standard lock file format that everyone uses” may be an inappropriate goal (I don’t have an opinion on that, really) but "standardising requirements.txt" and “common transport format for describing environments” are both reasonable goals, with real benefits for users.
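To make the N * M versus N + M point concrete, here is a small worked count. The function name and the example numbers (3 tools, 4 platforms) are illustrative assumptions, not figures from this thread.

```python
def integrations_needed(producers: int, consumers: int, shared_format: bool) -> int:
    """Count the format integrations someone has to write and maintain."""
    if shared_format:
        # Each producer writes one exporter, each consumer one importer.
        return producers + consumers
    # Every consumer has to understand every producer's native format.
    return producers * consumers


# Example: 3 locking tools and 4 deployment platforms.
without_standard = integrations_needed(3, 4, shared_format=False)  # 12
with_standard = integrations_needed(3, 4, shared_format=True)      # 7
```

The gap widens as either side grows, and the two counts are equal only in small cases such as N = M = 2.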

dstufft · April 21, 2020, 5:33pm

To be honest, I don’t really even understand the problem that this is trying to solve. That is, putting aside the deficiencies inherent in requirements.txt (because things like being under-specified, weird edge cases, etc. are not problems that require some common interchange format).

The problem as stated (and honestly the only problem I can think of that this actually even could solve) is if someone set up a project using, say, poetry, and then they want to install it again using pip. I don’t honestly think that’s something we can really meaningfully solve with a lock file without making the file so free form / generic that it isn’t actually usable in that form for anything but the most trivial case.

Here’s an example: Pipenv only supports “things I depend on” and “things I depend on in development”, while poetry supports an arbitrary number of named groupings of sub dependencies (via extras). So I create a lockfile in poetry that has a “tests” extra for my current project. Then I try to install that with Pipenv, and it does… what? Ignores the tests extra? Installs them anyway? Then we have pip, whose “lockfile” support (via requirements.txt) doesn’t support any concept of named groups of dependencies, it just installs everything. What does it do?

Another example (that @sdispater mentioned): the lock file Pipenv generates is tied to the platform it’s currently running on. Pip’s is… well, it doesn’t generate a lockfile at all, so sometimes it’s tied to the platform it’s running on (if someone just created it with pip freeze), sometimes it’s not (if they carefully curated a requirements.txt with appropriate environment markers), and it sounds like poetry will resolve the full set of dependencies, independent of what OS is being used, so that the output of the resolver is deterministic even across OSs. So what does poetry do if it tries to consume a Pipenv lockfile that is missing an OS specific dependency?

Fundamentally, this feels like standardization for the sake of standardization to me. Either we pare this hypothetical lock file down so far that anything but a bare bones installer can’t support it, and thus we’re not actually achieving anything, or we leave it so wide open that it’s standardized in name, but in practice the meaning of the contents is so unstandardized that consuming the file is still tool specific anyway.

To me this is sort of like how poetry says it uses the “standard” pyproject.toml file, which means in reality 99% of its use of that file is in the “dump a bunch of implementation specific stuff in this name space” portion, where there is no real meaningful difference in it using the “standard” file, and using a completely custom poetry.toml file, except in that the word standard is being used kind of as an implicit marker of “good for some unspecified reason”.

dstufft · April 21, 2020, 6:44pm

So thinking about this more, I can think of one way to possibly make it half way workable-- which is to give up on the idea that many (if any) of the tools are going to use this as their native lock file, and instead attempt to minimize the things this does, to arrive at some basic common subset that all of the tools can produce as an artifact that platforms like Heroku/Google/etc could potentially consume for creating an environment. I’m struggling to articulate what this would be exactly, but it’s more like an interchange format for a list of things to install, rather than a lock file.

This would… probably look pretty similar to requirements.txt. A list of things to install, some validation around ensuring that you got the correct package (hashes etc), possibly some information about the specific source of packages (likely defaulting to PyPI, etc). Ideally we’d have something a bit more structured than requirements.txt, but that’s just bikeshedding.

What I do not think it would have is “flexibility”. Flexibility here effectively just means implementation defined behavior, which is the antithesis of an exchange format.
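To give a feel for how small such an interchange artifact could be, here is a minimal sketch. Every name in it (`Entry`, `dump_install_list`, the `packages` key, the JSON encoding itself) is an assumption for illustration rather than anything agreed in this thread, and the hash value is a placeholder, not a real digest.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass

PYPI_SIMPLE = "https://pypi.org/simple"


@dataclass(frozen=True)
class Entry:
    """One thing to install: exact version, expected hashes, and its source."""

    name: str
    version: str
    hashes: tuple[str, ...]
    index: str = PYPI_SIMPLE  # source defaults to PyPI


def dump_install_list(entries: list[Entry]) -> str:
    """Serialize entries deterministically (sorted by name, sorted keys)."""
    ordered = sorted(entries, key=lambda entry: entry.name)
    return json.dumps({"packages": [asdict(e) for e in ordered]}, indent=2, sort_keys=True)


example = dump_install_list(
    [Entry(name="example-package", version="1.0.0", hashes=("sha256:<placeholder>",))]
)
```

There is deliberately no field for tool-specific data here: anything an installer cannot interpret the same way as every other installer stays out of the artifact.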

Looking at what is mentioned in the lock-file repository by @uranusjr.

The “meta” dependency feature probably needs to be removed, it goes beyond listing what should be installed into an environment, into trying to implement an actual lock file that depends on features of the installer.
The weird ; syntax, which I don’t even really understand how it’s supposed to work, probably needs to be removed, or it needs more explanation of how it actually functions. How does an installer pick between two different listings for the same dependency?
I’m super confused by the "" feature dependency… aren’t the dependencies of the current project listed under the top level "dependency" key? Why would you list them under "" too? If not, then what are the top level keys for?

Actually, most of this entire document seems really like it doesn’t implement the thing I’m trying to suggest at all (which is fair, it wasn’t trying to), and it doesn’t seem to be implementing what @dustin and @brettcannon are looking for either. How is this even envisioned to be working? How would a tool like pip install from this lock file, even in the case that it’s currently designed for? Having gone through and read the entire spec extensively now, I’m even more confused what problem it’s even attempting to solve.

pf_moore · April 21, 2020, 6:44pm

The problem I see as useful to solve (and the reason I’ve started describing it as a “way of describing environments”) is providing a way for something to dump out a spec of an environment that can be used later by possibly a different tool to recreate that same “intended” environment.

That’s something that providers like Heroku currently want to do, and they typically rely on requirements files. Which suck (ill-defined, etc) but are in practical terms something that you can say pip install -r requirements.txt and get something.

But apparently providers are getting asked to support pipenv lockfiles, and then poetry lockfiles. If I were a provider, I’d want Python to decide on a single format. And as an end user, I wouldn’t particularly want Heroku to design it, as then I’d have to redo it when I switched to PythonAnywhere. Etc.

I’m pretty sure that’s the key problem that’s worth solving in this area.

Yes, IMO that was a mistake in hindsight. The attraction of “one config file to rule them all” led us to have a “… and a bunch of non-standardised stuff” get-out clause. Let’s not do that again.

dstufft · April 21, 2020, 7:07pm

pf_moore:

The problem I see as useful to solve (and the reason I’ve started describing it as a “way of describing environments”) is providing a way for something to dump out a spec of an environment that can be used later by possibly a different tool to recreate that same “intended” environment.

That’s something that providers like Heroku currently want to do, and they typically rely on requirements files. Which suck (ill-defined, etc) but are in practical terms something that you can say pip install -r requirements.txt and get something.

But apparently providers are getting asked to support pipenv lockfiles, and then poetry lockfiles. If I were a provider, I’d want Python to decide on a single format. And as an end user, I wouldn’t particularly want Heroku to design it, as then I’d have to redo it when I switched to PythonAnywhere. Etc.

I’m pretty sure that’s the key problem that’s worth solving in this area.

I think this is possibly a problem worth solving, but I don’t think it has anything to do with lock files. We kind of crossed paths while typing our messages up, but that is basically exactly what I meant in my last message. requirements.txt isn’t really a lockfile (although you can kind of force it into a lock file like shape if you squint your eyes and mush it around enough), it’s really kind of exactly what we’re looking for here, just a list of things to install, it’s just kind of weird and funky.

Hypothetically, we could “solve” this problem by specifying requirements.txt and saying that we expect tools like poetry, Pipenv, etc to have some mechanism to reflect out a requirements.txt as some sort of a deployment artifact. We wouldn’t then expect tools like poetry, pipenv, etc to start ingesting requirements.txt or using them in place of Pipfile.lock or poetry.lock.

Note: I’m not suggesting that the right answer is actually just use requirements.txt, but rather that’s sort of a useful way to frame it in my mind. That we’re not actually trying to replace Pipfile.lock or poetry.lock or whatever, but just defining what is effectively just a build artifact in a deployment pipeline.

I don’t actually think it was a mistake. I think it’s worked fine (other than some issues where people started using it really quickly for black that collided with a choice in pip to change behavior based on the mere existence of the file, but I don’t think that’s really a failing of that choice to add the tools section). I’m just pointing out that defining a standard that’s ultimately so flexible that things are actually implementation specific doesn’t make interoperability magically happen. Pip does not and will never understand [tool.poetry.*]. That’s fine because we don’t pretend that it will or that calling pyproject.toml a standard means it will.

brettcannon · April 21, 2020, 7:33pm

I can give you two problems that I live with regularly which impact users (and obviously make my life more difficult).

One is deploying to a PaaS/serverless platform (this is more general, but this specific scenario is the one I have lived through). If that service wanted to install dependencies on behalf of the user, how are they to do that today? The best they can do is add support for every tool that users may want, or try to force users to a single tool. Obviously a “vendor lock-in” version of forcing users to a specific tool is not exactly a great result when the community has not come to an agreement. And so you might say that perhaps people should bundle their dependencies with their code? OK, how do you do that with Poetry? Pipenv? Pip? You’re once again back to documenting and trying to support users by teaching them how to use their tool to install dependencies for potentially a different platform in order to deploy them to their production system (and this doesn’t have to be cloud-specific; Docker or any other system where your dev OS differs from production plays into this).

Two, how do editors install and manage your dev requirements for you? For VS Code we have to manually add support for every tool where we want to help users install e.g. a linter or formatter. And if we ever add support to help walk users through setting up a development environment we will need to support installing all dev requirements which will once again be tool-specific. And I know Poetry is bumping up against this because we have not gotten around to supporting it fully in the Python extension for VS Code (it’s on our roadmap, BTW).

brettcannon · April 21, 2020, 7:35pm

How is that not a lock file? I’m curious as to what your definition of a lock file is compared to a list of packages to install that are specified to a specific version?

brettcannon · April 21, 2020, 7:38pm

I’ll also note that Installing dependencies in containerized applications has been announced as a tool that reads the various lock file formats we have going in the community and tries to abstract them out for orchestration purposes. The fact we need a tool for that I think plays into this discussion.

tgamblin · April 21, 2020, 7:43pm

I think there’s an important distinction to make here. In Spack we talk about abstract and concrete specifications.

Abstract Specs

An abstract spec is only partially constrained. It has the names of packages you want, maybe some features, versions, compilers, and other preferences. That’s what the developer tells you they “require” to set up the environment.

Concrete Specs

A concrete spec has everything. Lockfiles are concrete. They have the names, hashes, versions, etc. of packages and dependencies, and they can very well be tied to particular environments, platforms, resolution algorithms, etc.

Reproducibility vs. portability

Which one you use depends on how you want an environment to be reproduced. The abstract spec is more portable but less reproducible, because a different resolver or platform can affect what you get. The lockfile lets you produce exactly what you got, but it may not work at all if you change the OS/arch/python version/etc.

I see a use for both of these types of reproducibility. Sometimes you just want the app to be built how it needs to be built for the environment (abstract). But if you want to avoid surprises, and you know you’ll be in the same environment, you want a lockfile to reproduce things exactly. Or maybe several lockfiles, if you deploy to multiple environments, but don’t want churn in any one of them.
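A toy sketch of the distinction, using only the standard library. This is not Spack’s API: `AbstractSpec`, `ConcreteSpec`, `concretize` and the platform strings are hypothetical names, and versions are plain integer tuples to keep the example self-contained.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractSpec:
    """Partially constrained: what the developer says they require."""

    name: str
    min_version: tuple[int, ...]


@dataclass(frozen=True)
class ConcreteSpec:
    """Fully pinned: what was actually chosen, and for which platform."""

    name: str
    version: tuple[int, ...]
    platform: str


def concretize(
    spec: AbstractSpec, available: list[tuple[int, ...]], platform: str
) -> ConcreteSpec:
    """Toy resolver: pick the newest available version that satisfies the spec."""
    candidates = [v for v in available if v >= spec.min_version]
    if not candidates:
        raise LookupError(f"no version of {spec.name} satisfies the spec")
    return ConcreteSpec(spec.name, max(candidates), platform)


abstract = AbstractSpec("example-lib", (1, 2))
monday = concretize(abstract, [(1, 1), (1, 2)], "linux-x86_64")
friday = concretize(abstract, [(1, 1), (1, 2), (1, 3)], "linux-x86_64")
```

The same abstract spec gives `(1, 2)` on Monday and `(1, 3)` on Friday once a new release exists: portable, but not reproducible. Either concrete spec reproduces exactly, but it records a platform it may not be valid outside of.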

Spack environments have an abstract spack.yaml and a concrete, generated spack.lock, described here, and you can set up an environment from either. Both have their uses.

Making a “minimal” spec

If you want to trim this down to a “minimal” specification, I think you really need to define how “concrete” you want the “standard” lockfile to be. What attributes should be included, where are they expected to be valid, do they depend on a particular resolver, etc.

Spack’s format has a lot more than I think you want to handle here – compilers, architectures, flags, build options, etc., and it’s very tied to particular platforms. To be honest, I think that stuff is very much needed when you talk about native dependencies, but if you can rely on a spec like manylinux to provide most of the assumptions, then maybe you can dispense with a lot of it.

For pure Python, I think packages, versions, and options are probably sufficient and useful for a lockfile spec. But maybe the spec should standardize some abstract format (i.e., a better requirements.txt for portability) as well as the lockfile.

I still think there are going to be OS-dependent/resolution-sensitive things in a pure python lockfile (as @dstufft mentioned). So it might be worth saying in the spec when that will happen and when the reproducibility guarantee isn’t cross-platform for pure Python stuff. Or maybe the lockfile should mark parts that are OS-sensitive so that a tool can either require the same OS, or try to re-resolve them (which is quite hard).

dstufft · April 21, 2020, 8:49pm

brettcannon:

One is deploying to a PaaS/serverless platform (this is more general, but this specific scenario is the one I have lived through). If that service wanted to install dependencies on behalf of the user, how are they to do that today? The best they can do is add support for every tool that users may want, or try to force users to a single tool. Obviously a “vendor lock-in” version of forcing users to a specific tool is not exactly a great result when the community has not come to an agreement. And so you might say that perhaps people should bundle their dependencies with their code? OK, how do you do that with Poetry? Pipenv? Pip? You’re once again back to documenting and trying to support users by teaching them how to use their tool to install dependencies for potentially a different platform in order to deploy them to their production system (and this doesn’t have to be cloud-specific; Docker or any other system where your dev OS differs from production plays into this).

This I think is solvable using some list of things to install (more on this later).

This I do not think is solvable. Poetry is almost certainly going to expect that if you add a new dependency, it gets added under the [tool.poetry] section of pyproject.toml. Pipenv is almost certainly going to expect that if you add a new dev dependency, it gets added to Pipfile not to some hypothetical lock file. Unless you just mean “we want a list of dev dependencies”, which is roughly the same thing as the first case, just with a qualifier as to what kind of dependencies you want.

I’m actually struggling to try and put to words what I’m trying to convey here. To my mind, a lock file doesn’t describe a list of things to install, it describes the state of the world at the point the lockfile was created. This means that, given a deterministic resolver, resolving the same set of dependencies will always resolve to the same set.

I’ve carefully worded that, because an important thing here I think is the ability to include things that the resolver might not actually take into account (e.g. extra packages it doesn’t need). Hypothetically, a lock file could contain a complete snapshot of the entirety of PyPI at the time of creation and the end result would still be the same.

Different implementations of a lock file could take this to varying degrees, such as “locking” in specific files it used, or recording the end result of the resolver (such that this lockfile is only valid on a specific platform) or by attempting to exhaustively resolve all the combinations of conditional dependencies to include.

This idea is a little bit strained, because some lock files can be implemented as a list of packages to install if the features of the installer are sufficiently simple, but it makes more sense when you start thinking of more complex installer features. For instance, Pipenv will resolve the full set of dependencies, as if you had specified to install the development dependencies, even if you didn’t ask for that, and will reflect all of them into the lock file. I assume that poetry does something similar.

Part of it, honestly, is that I think that to actually be a replacement for poetry’s lock file, Pipenv’s lockfile, Spack’s lockfile, etc., it has to actually support, through some mechanism, all of the features that each of those tools has. However as soon as you start to add support for those features, you either mandate that all tools support those features (thus making the lowest common denominator the superset of features) OR you end up in a weird situation where the tools are using the same format on the surface, but uses of that format aren’t actually interchangeable because properly using said format relies on interpreting implementation specific data inside that format. I think the former is unresolvable (you’re never going to get every tool to agree to the same set of features, if you did we wouldn’t have multiple tools) and the latter puts users in a really bad place where we claim to have this interoperable standard, but it’s not really interoperable because to actually use it requires relying on implementation specific details. So given that I don’t think a replacement for the various lockfiles is meaningfully possible, this “it’s not a lockfile I swear” is largely an attempt to get the same benefit in the one major use case I can see for an interoperable lockfile (I run a platform and want a way to describe the dependencies you need me to install) by treating that as a distinct artifact.

uranusjr · April 21, 2020, 9:20pm

A lot has happened since I last responded, so forgive me if I miss something and do not respond. Please feel free to point such things out. A lot of the discussion has also been carried out well IMO, so I’ll try to respond mostly to points that seem to still be left open to me.

My intention behind the proposal is solely on the concrete specs. The idea is to make the format represent the result of a resolution process, and be immediately consumable by an installation process to create a fully operable runtime. So it is not intended to be passed into a resolver; the only resolution logic (in some sense) needed would be to process conditional dependencies, i.e. not install certain things on a certain platform, which is specified in the proposal by environment markers. Environment markers have limitations, of course, but I believe it is possible to produce a reasonable declarative system that can describe most scenarios on conditional dependencies.
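A minimal sketch of that “only conditional dependencies need processing” idea, using the docutils case from the proposal (0.15 supports only Python 3, 0.15.post1 carries Python 2 support). The entry layout, the `when` field and `select_entries` are assumptions for illustration; real environment markers are richer than this exact-match stand-in.

```python
from __future__ import annotations

# Hypothetical, simplified stand-in for environment markers:
# an entry applies when every key in "when" matches the target environment.
ENTRIES = [
    {"key": "docutils@0", "version": "0.15", "when": {"python_major": 3}},
    {"key": "docutils@1", "version": "0.15.post1", "when": {"python_major": 2}},
]


def select_entries(entries: list[dict], env: dict) -> list[dict]:
    """Keep only the entries whose conditions hold in this environment."""
    return [
        entry
        for entry in entries
        if all(env.get(name) == value for name, value in entry["when"].items())
    ]


on_py3 = select_entries(ENTRIES, {"python_major": 3})
on_py2 = select_entries(ENTRIES, {"python_major": 2})
```

No version resolution happens here: the installer filters a fixed list. Under this reading, an environment where no condition matches simply gets no docutils entry at all.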

Honestly this (and other comments you’ve made) makes it seem to me that we’re actually trying to have the same thing, except you don’t agree with it being a lock file. Let’s call it something else then. Quoting myself from a previous message:

This is probably my fault; I’ve been calling the idea a “lock file” (and even name the repo as such), and that likely makes people start at the wrong track right from the beginning.

To be clear, the format is created to solve a problem, and as long as I can get the problem solved, it can be called whatever and be classified as whatever. Lock file, requirements.txt but more structured. It does not matter (to me, at least).

It is definitely not my intent to make the meta packages installer-dependent. The meta dependency thing is a reimagination of the common multi-requirements.txt pattern, e.g. you have a requirements.txt, test-requirements.txt, doc-requirements.txt, etc. There are however downsides to having free-form include syntax (-r in requirements.txt) that I wish to address.

My own mental model to this is actually in trees (graphs? I’m not good at data structures). The dependencies in the project collectively form a tree-like structure, and the meta packages are the nodes near (or at) the root to start the traversal process that collects required dependencies.

It might be easier to think of this like how the core metadata specifies package dependencies. The top-level dependencies key lists everything that could be required by the project, but some of them have an extra = marker. The meta dependency thing declares that extra, and specifies what installing the extra would pull into the dependency. And the "" group lists the dependencies that don’t have an extra = marker.
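Roughly, the traversal I have in mind looks like the sketch below. The graph layout (a flat mapping from key to the keys it depends on) and all package names are made up for illustration and are simpler than the actual proposal.

```python
from __future__ import annotations

# Hypothetical dependency graph: "" is the default group, "[tests]" is a
# meta-dependency for an extra, and everything else is a real package.
GRAPH = {
    "": ["web-client"],
    "[tests]": ["test-runner"],
    "web-client": ["http-core"],
    "test-runner": ["http-core", "report-tool"],
}


def collect(graph: dict[str, list[str]], roots: list[str]) -> set[str]:
    """Walk from the chosen root groups and return the packages to install."""
    seen: set[str] = set()
    stack = list(roots)
    while stack:
        key = stack.pop()
        if key in seen:
            continue
        seen.add(key)
        stack.extend(graph.get(key, []))
    # Meta entries ("" and "[...]") only start the walk; they are not installed.
    return {key for key in seen if key and not key.startswith("[")}


default_only = collect(GRAPH, [""])
with_tests = collect(GRAPH, ["", "[tests]"])
```

An installer that has no notion of extras would just always start from "" (or from every root), which is the same walk with a different starting set.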

The ; syntax exists to handle a (rather common) conditional dependency case, also mentioned in the thread:

The listing format would need to distinguish between those different versions of the same package, so the syntax is added to address that problem. The scheme can be anything really, but I figured it’s easier if we have a proposal to begin with than let people figure out what they should do (or even jump into an incorrect conclusion this is not possible and the format is doomed).

Again, honestly, I think it is entirely in line with what you are describing, at least from how I read it. I genuinely do not get how it seems this way to you. Is it that the name “lock file” makes you think it should represent something you have in mind (that I don’t get), which is entirely different from what I intend the proposal to be?

Note: @dstufft posted a comment while I’m writing this. I haven’t read it, but I’ll try to post this first anyway to avoid getting stuck in catching up with new messages without responding. And it doesn’t help I’m already bad at explaining things one at a time.

dstufft · April 21, 2020, 10:08pm

Maybe it would help if you could outline the rough steps you see an installer taking to install the dependencies declared in a lockfile (I don’t mean the low level stuff like downloading a wheel or something). Particularly around a few features:

The second pattern is reserved to support cases where a Python distribution needs to be specified differently depending on the platform. For example, docutils 0.15 only supports Python 3, while Python 2 support is available as 0.15.post1. This pattern allows the lock file to conditionally use docutils@0 for 0.15, and docutils@1 for 0.15.post1.

How does an installer know whether to use the first docutils entry or the second docutils entry? Is it possible to have platforms where no docutils will be installed? If so do I have to make a third docutils entry that is for an empty platform or something?

A valid normalized name surrounded by a pair of square brackets, i.e. satisfying regular expression ^\[[a-z0-9][-a-z0-9]*\]$ . A dependency using such a key should be a meta-dependency that points to optional direct dependencies of the project, similar to Setuptools’s extras_require entries.
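As a quick check of the pattern quoted above, here is a minimal sketch using Python's re module. The helper name is_meta_key is made up for illustration and is not part of the draft; the "" default group does not match this pattern and would have to be handled separately.

```python
import re

# Pattern copied from the quoted draft: a normalized name wrapped in
# square brackets, e.g. "[dev]".
META_KEY = re.compile(r"^\[[a-z0-9][-a-z0-9]*\]$")


def is_meta_key(key):
    """Return True if key looks like a bracketed meta-dependency key.

    Hypothetical helper, only meant to show which keys the pattern accepts.
    """
    return META_KEY.match(key) is not None


for candidate in ["[dev]", "[tests]", "dev", "[Dev]", "[-dev]"]:
    print(repr(candidate), is_meta_key(candidate))
```

Only the first two candidates are accepted: the name must be lowercase, must be bracketed, and cannot start with a hyphen.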

So I presume something like dev requires in Pipenv would map to a [dev] meta-dependency, and thus something like Poetry could theoretically install it by doing some Poetry incantation to install that extra. However, Poetry supports arbitrary extras, so what if someone added a [tests] meta-dependency, how does Pipenv install from that lockfile? Pip doesn’t really have the concept of specifying extras for a requirements.txt file (you can fake it with several named files), so how does pip install a lockfile with a [dev] and a [tests] extra?

extendable for declaring dependencies from alternative package management systems.

I assume this means that instead of a python key, something could have a conda key, or a deb key or something and it’ll specify something that comes from another system. Given that these keys aren’t standardized, how do you see a tool like pip handling a dependency that has a deb key instead of a python key?

Some other questions:

What level of portability do we assume is possible for a lock file? Does a single lockfile work for Windows, macOS, and Linux? If it does, do we assume tooling that currently generates a platform specific lockfile will adapt to generate a platform independent one? If it does not, do we assume tooling that generates platform independent lockfiles will stop doing that? If we leave it up to the generator of the lock file to decide, are we expecting tooling to be able to cope with either/or situation?
Validations can be empty, is it allowed to require it? Presumably since 1 of N is the threshold to declare something valid, if a tool doesn’t support a hash algorithm it should just skip it and move on to the next, but what if it doesn’t support any of the hash algorithms?
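To make the validation question concrete, here is one possible reading of the "1 of N" threshold, sketched with hashlib. Everything specific here is an assumption for illustration: the shape of the validations mapping (algorithm name to hex digest), the SUPPORTED set, and above all the choice to raise when no listed algorithm is supported, which is exactly the part that is still an open question.

```python
import hashlib

# Assumed set of algorithms this hypothetical installer understands.
SUPPORTED = {"sha256", "sha512"}


def check_validations(data, validations):
    """Return True if at least one supported hash in validations matches data.

    validations maps algorithm name -> expected hex digest (assumed shape).
    An empty mapping verifies nothing and returns False; whether that is
    acceptable is left to the caller.
    """
    tried = 0
    for algorithm, expected in validations.items():
        if algorithm not in SUPPORTED:
            continue  # unsupported algorithm: skip it and try the next one
        tried += 1
        if hashlib.new(algorithm, data).hexdigest() == expected:
            return True  # 1 of N is enough
    if validations and tried == 0:
        # Assumed policy: refuse rather than silently skip every check.
        raise ValueError("no supported hash algorithm in validations")
    return False
```

With this policy a tool that only knows sha256 can still consume a file that also lists other algorithms, but it stops instead of installing unverified artifacts when none of the listed algorithms is usable.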

Roughly speaking, I’m wondering three major things:

How does this actually function?
In cases where the feature sets of the involved tooling do not overlap, how do we handle that, particularly when generating from a tool that supports X feature, to installing with one that does not?
In cases where the feature sets of the involved tooling do overlap, but their opinions on how to interpret some specific bit of data differ, how do we handle that disparity?

uranusjr · April 22, 2020, 7:41am

Let’s say Pipenv specifies its default group to use the "" meta dependency, and [dev] for the develop group.

On calling pipenv sync, it starts with the "" meta, and recursively collects dependencies:

lock = read_lock_content(filename)

collected = {}
collect_dependency("", collected)

with the implementation:

from packaging.markers import Marker

def collect_dependency(key, collected):
    # Already visited: stop here, which also guards against cycles.
    if key in collected:
        return
    collected[key] = current = lock["dependencies"][key]
    for child, marker in current["dependencies"].items():
        # Skip children whose environment marker does not apply here.
        if marker and not Marker(marker).evaluate():
            continue
        collect_dependency(child, collected)

and install things in the collected dict.
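To see the traversal run end to end, here is a self-contained sketch with made-up lock content. The package names, the empty "python" values, and the use of name;N keys are placeholders shaped after the snippets above, not the draft's actual fields. To keep it free of third-party imports, the marker check is passed in as a callable; in real use it would be something like lambda m: Marker(m).evaluate() from packaging.

```python
# Hypothetical lock content; "python" values are placeholders only.
lock = {
    "dependencies": {
        "": {"dependencies": {"sphinx": None}},
        "[dev]": {"dependencies": {"pytest": None}},
        "sphinx": {
            "python": {},
            "dependencies": {
                "docutils;0": "python_version >= '3'",
                "docutils;1": "python_version < '3'",
            },
        },
        "docutils;0": {"python": {}, "dependencies": {}},
        "docutils;1": {"python": {}, "dependencies": {}},
        "pytest": {"python": {}, "dependencies": {}},
    }
}


def collect_dependency(lock, key, collected, evaluate):
    """Depth-first walk from key, skipping edges whose marker is false."""
    if key in collected:
        return
    collected[key] = current = lock["dependencies"][key]
    for child, marker in current["dependencies"].items():
        if marker and not evaluate(marker):
            continue
        collect_dependency(lock, child, collected, evaluate)


def evaluate(marker):
    # Stand-in for a real marker evaluation: pretend we are on Python 3.
    return marker == "python_version >= '3'"


collected = {}
collect_dependency(lock, "", collected, evaluate)
print(sorted(collected))  # ['', 'docutils;0', 'sphinx']
```

Only one docutils entry is reached, and pytest is never visited because it is only reachable through [dev].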

Since dependencies is the result of a resolution, at most one docutils should be collected here, otherwise the resolution should have failed with a conflict. It is also possible no docutils is installed, if none is visited during the collection. This either means it is not needed on this platform (the marker evaluation excludes it or a dependency requiring it), or it belongs to another group not requested here (e.g. dev).

For pipenv sync --dev, both "" and "[dev]" need to be collected. This would still be duplicate-free if the lock file was generated by Pipenv itself, but the implementation can add additional checks to ensure there are no conflicting dependencies (by comparing the part before ;).
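The additional check could look something like the sketch below. find_conflicts is a hypothetical helper, and it only looks at the collected keys, grouping them by the part before the ; separator.

```python
def find_conflicts(collected_keys):
    """Return names that were collected under more than one key.

    Keys are compared by the part before ';', so "docutils;0" and
    "docutils;1" both count as "docutils". Hypothetical helper.
    """
    seen = {}
    for key in collected_keys:
        name = key.partition(";")[0]
        seen.setdefault(name, []).append(key)
    return {name: keys for name, keys in seen.items() if len(keys) > 1}


keys = ["", "[dev]", "sphinx", "docutils;0", "docutils;1"]
print(find_conflicts(keys))  # {'docutils': ['docutils;0', 'docutils;1']}
```

An installer could refuse to proceed whenever the returned mapping is non-empty.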

Pipenv can still install from it, and will simply ignore everything only collectable through the [tests] meta-dependency. In pip’s case, it would need the user to tell it what to install. Here’s an interface I think would work:

pip install -l 'path-to.lock.json'  # This installs the "" meta-package.
pip install -l 'path-to.lock.json[dev]'  # This installs both "" and "[dev]" meta-packages.
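pip has no such -l option today, so the following is only a sketch of how the proposed argument might be split into a path and the meta keys to collect. parse_lock_argument is a hypothetical name, and allowing several comma-separated groups is my assumption, borrowed from how extras are written in requirement strings. Paths that themselves contain [ are not handled.

```python
def parse_lock_argument(arg):
    """Split 'path[group1,group2]' into (path, meta keys to collect).

    The "" default group is always included, matching the two proposed
    command lines. Hypothetical helper for illustration only.
    """
    path, bracket, rest = arg.partition("[")
    keys = [""]
    if bracket:
        if not rest.endswith("]"):
            raise ValueError("unterminated group list in %r" % arg)
        for name in rest[:-1].split(","):
            name = name.strip()
            if name:
                keys.append("[%s]" % name)
    return path, keys


print(parse_lock_argument("path-to.lock.json"))
print(parse_lock_argument("path-to.lock.json[dev]"))
```

The returned keys are exactly the roots the collection step would start from.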

I’d say pip should error out without installing anything if any of the dependencies it needs to satisfy contains keys other than python and dependencies. The PEP 517-ish idea @dustin thought of sounds interesting, but I have not thought it through enough to determine whether it would work, or how. I think that would need to be a follow-up extension to the format. I’d also say pip is too low-level and shouldn’t support this interop feature even if it ends up getting specified.
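The "error out without installing anything" rule is easy to sketch as a check that runs after collection and before any install step. check_supported and ALLOWED_KEYS are hypothetical names, and the sample entries are made up.

```python
# Keys a Python-only installer would understand, per the rule above.
ALLOWED_KEYS = {"python", "dependencies"}


def check_supported(collected):
    """Raise before installing if any collected entry uses an unknown key."""
    for name, entry in collected.items():
        unknown = set(entry) - ALLOWED_KEYS
        if unknown:
            raise ValueError(
                "%r uses unsupported keys: %s" % (name, ", ".join(sorted(unknown)))
            )


# Passes: only known keys are used.
check_supported({"": {"dependencies": {}}, "foo": {"python": {}, "dependencies": {}}})

# Fails: a made-up entry that comes from another package manager.
try:
    check_supported({"libfoo": {"deb": {}, "dependencies": {}}})
except ValueError as exc:
    print(exc)
```

Because the check covers only what was collected, entries for other systems that are not reachable from the requested groups would not block the install.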

(This post is getting long and I need to leave for now. I promise I’ll come back to the other points when I have more time.)

FRidh · April 22, 2020, 5:38pm

If the format supports locking per platform and per Python version, then both Poetry and pipenv can use it, but it is up to them to choose whether they actually insert the information for all platforms / Python versions.

Yes, but like with Nix, it could be valuable for your users to be able to consume such a Python lock file. Then, when they use it they would still generate a Spack-specific lock file. There is e.g. a tool, poetry2nix that allows building Poetry projects with Nix. Yes, additional information is needed when using extension modules, but other than that it saves a lot of work that is now handled by Poetry.

It’s a matter of choosing what information is to be contained in it. The more that goes in, the more usable it becomes for other tools, but it also adds a burden. For an initial exchangeable lock format I suggest not including compiler info and such, but who knows, in a couple of years more people may want reproducible environments, and then that choice can be revisited.

brettcannon · April 22, 2020, 10:32pm

I will say that other than standardizing the metadata for projects, this locking/environment concept is the last thing on my list for ‘packaging’-related metadata (after this, my only remaining personal packaging project is making sure there are libraries to support all of the PEPs). So I’m not expecting a proliferation of 3 lines in a TOML file to solve a ton of problems as I personally can’t think of others worth trying to standardize or are universal enough to want to standardize.