PEP 751: lock files (again)

On this topic, the Motivation section of the PEP currently includes Dependabot as an example tool that might benefit from a lockfile standard.

However, the full Dependabot functionality (as opposed to only security alerts about vulnerable packages) will require it to be able to update the lockfile, rather than just read it. If package managers store any tool-specific config/state in pyproject.toml or elsewhere, that will presumably get out of sync with the lockfile for anything other than simplistic lockfile changes. In fact, it seems the Dependabot use case would still need to perform a full package resolution, given that the new version of a package could change the dependency graph? And as such, Dependabot will probably still need to support/run all of the individual package managers anyway?

Are cross-tool lockfile updating use cases ever going to be viable? If not, should the PEP motivation explicitly state that the lockfile is primarily aimed at read-only use cases (such as package installation or SBOM generation), and drop the mention of Dependabot?

4 Likes

What would you do if there was a [tool.xxx] section in the lockfile? If you error, then that effectively says that anything containing a tool section is non-portable. If you ignore it, that says that data in [tool.xxx] must not result in different files being installed. Either choice is valid, but they have different implications (and I think that the PEP should probably clarify what the intended behaviour is, even if it’s only as a SHOULD rather than a MUST).
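As a minimal sketch of the two behaviours, assuming a hypothetical lockfile layout (the [tool.sometool] section and the strict switch are invented for this example, not from the PEP):

import tomllib

LOCK_TEXT = """
[[packages]]
name = "certifi"
version = "2024.8.30"

[tool.sometool]
resolution-mode = "highest"
"""

def load_lock(text: str, strict: bool) -> dict:
    lock = tomllib.loads(text)
    if "tool" in lock:
        if strict:
            # Choice 1: error. Any lockfile carrying a [tool.xxx] section
            # is effectively non-portable.
            raise ValueError(f"unsupported tool sections: {sorted(lock['tool'])}")
        # Choice 2: ignore. [tool.xxx] must then never affect which files
        # get installed; it can only be bookkeeping for the tool that wrote it.
        del lock["tool"]
    return lock

print(load_lock(LOCK_TEXT, strict=False)["packages"])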

I think that limited support (in the form of security alerts only) could still be useful, but I’m not a heavy user of Dependabot so that’s not an informed opinion. Certainly, if cross-tool updating isn’t viable, that should be noted as a limitation for Dependabot. See below, though.

I would absolutely hope that if we have a standard lockfile format, and a given lockfile includes no tool-specific data (i.e., no [tool.xxx] sections) then any tool would be able to use it, both for installing from and for updating. If some capabilities are optional (for example, multi-environment locking), then tools that don’t support that capability could reject the lockfile as unsupported - but as I said previously I think we should be very cautious about allowing optional capabilities if we want to claim this standard helps interoperability.

Sorry, this is a terminology issue. By environment I was talking about the concept of Hatch environments, not cross-platform environments, because I was responding to what Charlie was talking about regarding workspaces. So, as an example, if there is an environment named foo with dependencies bar and workspace members ./w1 and ./w2, then it would have its own cross-platform (or whatever we’re calling it now) lock file, even if another environment defined the exact same dependencies.

Does that make sense?

I don’t think this will be useful to Hatch, as I just mentioned, but it’s possible I don’t understand the current discussion.

My assumption is that any tool is able to consume the standard file. In the case of Hatch I don’t actually have an immediate implementation in mind for the near future and am going to continue passing stuff to dependent installers like pip and uv.

This makes sense. The question we were talking about, though, is whether ./w1 and ./w2 both get their own lockfiles, and/or whether users are “allowed” to sync their environment to just ./w1 and its dependencies, or only “allowed” to sync the entire workspace at once.

The latter in the case of Hatch as my intended design is such that environments get lock files, not projects themselves.

2 Likes

Concretely, for a Pants user (where Pants uses Pex lockfiles) wanting to deploy an AWS Lambda function:

The user background:
The user has a monorepo using a single (Pex) lock. That single lock covers many binaries, libraries, tests & tools. Amongst all this code are a few cloud functions. In particular, the user is focused on deploying one of these functions to AWS Lambda. This lambda function uses a small subset of the lock. Concretely, let’s say the full lock for the repo is generated from input requirements ["foo", "bar[extra1,extra2]", "baz", "spam"], but the lambda function in question just imports from “baz”, and “baz” turns out to have a subgraph of “baz 0.1”, “spam 0.2” and “interior 0.3”.

The service provider background:
AWS Lambdas can be deployed in many ways. Two are:

  1. code zip + requirements file
  2. code zip containing requirements too

Note that style 1 is presumably dictated by today’s (or maybe a bit yesterday’s) de-facto standards. If a lock file format/semantics were standardized, a way to deploy might be code zip + lock file, bringing all the benefits of locked artifacts to method 1.

So, assuming AWS latches on to this standard and allows deploys via code zip + lock file, the user in question has a problem if they want to use this method. The lock contains way more than they need: 100s of dependencies they do not use. This impacts their lambda deploy latencies at the very least. There are two ways to fix this afaict:

  1. The user subsets their lock file, producing a new lockfile that is a subset of the true lockfile, and asks AWS Lambda to deploy using that.
  2. The lock file standard supports sub-setting directly and deployment method 1 changes to: code zip + lock file + optional list of requirement strings to resolve from the lock file - aka AWS supports doing the subset because the lock standard does. If the optional list of requirements is not present, use the whole lock.

Concretely then, the user deploys to AWS Lambda by handing it their code zip, their repo’s single lockfile, and the requirements list [“baz”] to resolve from the lock.
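As a sketch of fix 1: assuming the standardized format records which locked packages each package depends on, sub-setting is just a transitive closure over that graph. The mapping below is invented (pinned versions elided), mirroring the “baz” example above.

# Hypothetical name -> locked-dependencies view extracted from the lock.
LOCK_GRAPH = {
    "foo": [],
    "bar": [],
    "baz": ["spam", "interior"],
    "spam": [],
    "interior": [],
}

def subset(roots, graph):
    """Collect only the packages reachable from the requested roots."""
    seen = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(graph[name])
    return seen

# The lambda only imports from "baz", so only its subgraph gets deployed.
print(sorted(subset(["baz"], LOCK_GRAPH)))  # ['baz', 'interior', 'spam']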

3 Likes

John provides a great, real-life example. Hatch is going to enforce the first paradigm (as that is basically the reason for the concept of Hatch environments):

I have so many thoughts on this but I’m trying not to dominate the conversation so I’ll keep it short.

This might’ve been rhetorical, but I think the answer would differ for resolving (updating) vs. installing. My preference would be:

  1. Installers must be able to install from any lockfile regardless of [tool.xxx] metadata. This also puts some constraint on resolvers, since they can’t require the use of any [tool.xxx] at install-time.
  2. Resolvers can reject an existing lockfile if it contains [tool.xxx] metadata, and they’re asked to update it (or even if it lacks [tool.xxx] metadata and the current tool is xxx).

At least for us, I don’t see why we’d need [tool.xxx] for installs. We’d mostly use it for bookkeeping during resolution.

Related to the comments above about Dependabot, from my perspective, this preserves the core benefits of standardization:

  • Dependabot support (at least alerts, but not updates)
  • Cloud-provider installation
  • Installer interoperability (e.g., use pip to install your PDM project)

From this perspective, the PEP would be trying to standardize on a single file format that could replace poetry.lock, uv.lock, etc., while focusing the benefits of standardization around the installer operations. (It would be a non-goal for, e.g., you to take a Poetry-produced lockfile and run uv add flask to update it. I think this is both harder to achieve and less valuable.)

Critically, though, we’d still be trying to obviate the tool-specific file formats. This is different than if we decided that the scope of the PEP was to create an interoperable format for installers only. That would actually make things a lot easier (it’d be like the “locked” requirements.txt format that tools use today, except standardized and with all the information you need to install, like URLs rather than just versions). In that world, we could probably even get rid of [tool.xxx] entirely, which would be great for ensuring spec compliance. But it has the downside that users have to deal with and learn multiple files (both poetry.lock and pylock.toml).

4 Likes

Great example. And to clarify, I think this could either happen via…

  1. The lockfile format tracks enough information that it’s possible for tools to implement subsetting based on the standardized information alone.
  2. You write proprietary information to [tool.xxx] that enables your tool’s users to subset using your tool.

Lastly: if everyone prefers it and we put the markers on the nodes rather than edges, it’s fine, it’s not a deal-breaker. I think we’ll still be able to support “multiple entrypoints to the lockfile” even without writing extra metadata to [tool.uv]. But it’d be nice if the PEP decided that this was explicitly supported or an explicit non-goal. Otherwise, we might get it working but only “by accident” due to incidental details in the format that could change over time.

1 Like

Yes, but 2 effectively advances the state of the art nowhere. Users can already subset a Pex lock to a hashed requirements.txt. I’m only really interested in 1. If 2 is all I get, then this whole PEP exercise just means I implement a new export format from Pex lock that is the new standard. There is no motivating reason to actually switch to the standard afaict.

2 Likes

Thank you. That’s an excellent real life example, and given Ofek’s response, it’s clear that there’s a desire for both “install this lockfile” (unqualified) and “install this subset of the lockfile”.

One question - you say “the requirements list”. Can you describe *precisely* what you’d expect here? Because a requirements list can contain things like foo>2.0, foo[some_extra]; python_version > "3.8", or foo @ https://some/url. I imagine none of them would be suitable for defining a subset of a lockfile, which is why I want to be clear how we specify a subset.

It wasn’t, so thanks for answering. I’m happy with that answer, but I will note that it implies that you’re committing workflow tools to limit their functionality to what the lockfile format supports. Which explains why you are pushing for the format to support all of uv’s functionality, but means that reducing the scope of the standard to well-understood functionality, leaving more experimental features to a later iteration of the spec (when tools have had a chance to determine the best approach), is difficult, if not impossible, to achieve.

OK, that makes sense. If it is what @brettcannon intends for the PEP, then it’ll be worth stating that explicitly. Otherwise we will get people (by analogy with pyproject.toml) expecting lockfiles to be portable between tools.

2 Likes

I mean any PEP 508 requirement specifier - period. It’s either satisfiable in the lock or it isn’t. This is how Pex locks work today. From your examples, foo>2.0 and foo[some_extra]; python_version > "3.8" and foo @ https://some/url would all work.

For example:

:; pex3 lock create --pip-version 24.2 --style universal --interpreter-constraint ">=3.8" requests "certifi @ https://github.com/certifi/python-certifi/archive/445b9cd2539f51b0aec4971a8ec02ded3943327f.zip" --indent 2 -o lock.json

:; jq '.locked_resolves[] | .locked_requirements[] | select(.project_name == "certifi")' lock.json
{
  "artifacts": [
    {
      "algorithm": "sha256",
      "hash": "a9ef7809e30370137ed69d89e45e3c36515d36a87649d8253251df4bfb038174",
      "url": "https://github.com/certifi/python-certifi/archive/445b9cd2539f51b0aec4971a8ec02ded3943327f.zip"
    }
  ],
  "project_name": "certifi",
  "requires_dists": [],
  "requires_python": ">=3.6",
  "version": "2024.8.30"
}

:; pex3 lock export-subset --format pip-no-hashes "certifi @ https://github.com/certifi/python-certifi/archive/445b9cd2539f51b0aec4971a8ec02ded3943327f.zip" --lock lock.json
certifi==2024.8.30

:; pex3 lock export-subset --format pip-no-hashes "certifi" --lock lock.json
certifi==2024.8.30

:; pex3 lock export-subset --format pip-no-hashes "certifi < 2024.8.30" --lock lock.json
Failed to resolve compatible artifacts from lock lock.json for 1 target:
1. /home/jsirois/.local/bin/tools.venv/bin/python:
    Failed to resolve all requirements for cp311-cp311-manylinux_2_39_x86_64 interpreter at /home/jsirois/.local/bin/tools.venv/bin/python from lock.json:

Configured with:
    build: True
    use_wheel: True

Dependency on certifi not satisfied, 1 incompatible candidate found:
1.) certifi 2024.8.30 does not satisfy the following requirements:
    <2024.8.30 (via: certifi<2024.8.30)

And, for an interior node:

:; pex3 lock export-subset urllib3 --lock lock.json
urllib3==2.2.3 \
  --hash=sha256:ca899ca043dcb1bafa3e262d73aa25c465bfb49e0bd9dd5d59f1d0acba2f8fac \
  --hash=sha256:e7d814a81dad81e6caf2ec9fdedb284ecc9c73076b62654547cc64ccdcae26e9

:; pex3 lock export-subset "urllib3>=2" --lock lock.json
urllib3==2.2.3 \
  --hash=sha256:ca899ca043dcb1bafa3e262d73aa25c465bfb49e0bd9dd5d59f1d0acba2f8fac \
  --hash=sha256:e7d814a81dad81e6caf2ec9fdedb284ecc9c73076b62654547cc64ccdcae26e9

:; pex3 lock export-subset urllib3[brotli] --lock lock.json
Failed to resolve compatible artifacts from lock lock.json for 1 target:
1. /home/jsirois/.local/bin/tools.venv/bin/python:
    Failed to resolve all requirements for cp311-cp311-manylinux_2_39_x86_64 interpreter at /home/jsirois/.local/bin/tools.venv/bin/python from lock.json:

Configured with:
    build: True
    use_wheel: True

Dependency on brotli (via: urllib3[brotli] -> brotli>=1.0.9; platform_python_implementation == "CPython" and extra == "brotli") not satisfied, no candidates found.
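For a rough idea of what “either satisfiable in the lock or it isn’t” means in code, here is a minimal Python sketch using the packaging library; the locked-versions mapping is invented, and a real implementation would also need to handle extras and markers (which is exactly what the urllib3[brotli] failure above exercises).

from packaging.requirements import Requirement
from packaging.version import Version

# Hypothetical name -> pinned version view of lock.json.
LOCKED = {"certifi": Version("2024.8.30"), "urllib3": Version("2.2.3")}

def satisfiable(spec: str) -> bool:
    req = Requirement(spec)  # any PEP 508 requirement string
    pinned = LOCKED.get(req.name)
    # Either the pin satisfies the specifier or the subset request fails;
    # there is never any re-resolution against an index.
    return pinned is not None and req.specifier.contains(pinned, prereleases=True)

print(satisfiable("urllib3>=2"))         # True
print(satisfiable("certifi<2024.8.30"))  # False, as in the export-subset error
print(satisfiable("brotli"))             # False: not locked at all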

Sorry, do you mind expanding on this a bit? Perhaps with an example?

Hmm, OK. I think I’d need to see how this works in practice - especially for a graph-based lockfile with multiple roots - before I’d be 100% comfortable supporting this. It feels like there’s a bunch of edge cases we’d need to pin down. It probably needs the “how to install from a lockfile” algorithm to be explicitly written out.
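For what it’s worth, here is one possible shape for that algorithm as a sketch; the node and edge field names are invented for illustration, not taken from the PEP.

from packaging.markers import Marker

def packages_to_install(lock, root, env):
    """Walk a graph-shaped lock from a single root, pruning edges whose
    environment markers don't match the target environment."""
    nodes = {pkg["name"]: pkg for pkg in lock["packages"]}
    seen = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        for edge in nodes[name].get("dependencies", []):
            marker = edge.get("marker")
            if marker is None or Marker(marker).evaluate(env):
                stack.append(edge["name"])
    return seen

LOCK = {"packages": [
    {"name": "spam", "dependencies": [
        {"name": "eggs", "marker": 'python_version < "3.0"'},
        {"name": "ham", "marker": None},
    ]},
    {"name": "eggs", "dependencies": []},
    {"name": "ham", "dependencies": []},
]}
# The eggs edge is pruned for this environment; multiple roots would mean
# running this walk once per root and unioning (or erroring on conflicts).
print(sorted(packages_to_install(LOCK, "spam", {"python_version": "3.11"})))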

Suppose we decided not to support multiple roots in the lockfile spec, because we weren’t sure that all the issues had been ironed out yet. Then uv couldn’t support multiple roots, because the standard doesn’t support it, and you can’t handle it in the tool.uv section because we’ve said that can’t affect what gets installed.

And worse, because nobody can support multiple roots without violating the spec, nobody can work on ironing out those issues so that we could add multiple root support in a later version of the spec.

fwiw, I do think it would be ideal (for selfish user reasons) if lockers were able to lock from an arbitrary starting lockfile. Though that’s not to say you should expect two users to use two different tools on the same project seamlessly.

Today, converting from uv to poetry or vice versa means effectively losing any lockfile state you might have previously had. It would be ideal if uv lock on a poetry-produced lockfile would effectively turn it into a uv-produced lockfile (not needing to retain any tool.poetry data), but basing the resolution off the original file’s locked versions. Insofar as poetry lock or uv lock today both work with and without a preexisting lock file, and behave differently in those two cases, it at least doesn’t seem obvious that it should error.

With that said, it obviously doesn’t need to be a PEP requirement of the tools, but it would be an ideal end-user outcome.

First, apologies for the delay in responding! Just did the first driving trip with the baby and I’m at the core dev sprints, so lots going on! But thanks to everyone carrying on the conversation w/o me.

Same, which is what’s making balancing what ends up in the PEP so hard. I fully understand we want users, who are ultimately the ones who will use installers, to be happy, but I also realize that we can’t put too much onus on lockers, which could drive them to not want to support the PEP.

And I still expect to support this use case. The PEP will never require that you support all possible platforms.

Yes! Your anyio example I think is a good illustration of the two approaches: do you just write the marker at the place where you depend on anyio (i.e. the edge from what depends on anyio; Requires-Dist), or do you try to flatten all the required information into anyio’s entry in the lock file (i.e. the node)?

The trick with that is then you’re supporting both approaches at once. It’s not the worst thing (if you can flatten the graph to write out a set of packages then recording the graph you flattened from is also possible), but it’s setting expectations for users as to what can be relied on.
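To make the two shapes concrete (package and field names invented for illustration):

# Edge style: the marker lives on the Requires-Dist edge of whatever
# depends on anyio, so installers have to walk the graph to apply it.
edge_style = {
    "name": "somepkg",
    "dependencies": [{"name": "anyio", "marker": 'python_version >= "3.8"'}],
}

# Node style: markers from every incoming edge are flattened onto anyio's
# own entry, so installers can treat the lock as a flat list of packages.
node_style = {"name": "anyio", "marker": 'python_version >= "3.8"'}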

I can’t think of a scenario where an installer should be allowed to not support installing for multiple environments, as the same machinery to detect whether an environment is supported by the lock file would be necessary anyway.

That’s the plan. This is also what makes figuring out the right level of feature set so hard.

What does Dependabot do when it updates a pyproject.toml file that has a [tool] section? Does it just leave it alone?

We could also record the tool or command used to create the lock, if necessary, so that something like Dependabot could know how to recreate the lock file along with any [tool] sections that the user wants to have.
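Purely as illustration (the key names here are invented, not from the PEP), that record might be as simple as:

import tomllib

lock = tomllib.loads("""
created-by = "sometool"
regenerate-command = ["sometool", "lock", "--upgrade-package", "{name}"]
""")

# A bot bumping "certifi" re-runs the user's own locker rather than
# re-implementing its resolution (and its [tool] bookkeeping).
command = [arg.format(name="certifi") for arg in lock["regenerate-command"]]
print(command)  # ['sometool', 'lock', '--upgrade-package', 'certifi']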

That’s actually the motivating example for supporting different lock files with different names, e.g., pylock.aws-lambda.toml.

I’m okay with that. The thing we can’t know about [tool] is whether it’s metadata that’s fine to leave around and actually isn’t affected by the lock changing, or if it’s critical somehow.

That’s the whole reason I’m talking about this as I don’t want ambiguity for what lockers are expected to produce and thus what installers can rely on.

I think to answer that question we should decide what we expect Dependabot to do here. If we are okay w/ somehow having the lock file record the tool used to generate it so Dependabot can also use that same tool in a similar way, then I think that’s a reasonable compromise. But if we don’t want to do that then we need to decide whether having [tool] is worth it or if tools like Dependabot just can’t regenerate a lock file in the same way as the user may have.

That is not what would happen in this example. There would be no motivation to store, say, 10 subset lock files of the real single central lock file. Instead, the subset lock would be created just in time from the central lock for export to AWS with the zip file. As I mentioned before, this is ~no different than subset exporting to the current requirements.txt format and would not be a motivation for Pex to switch its central format. It would just now support a new export format alongside requirements.txt, unfortunately.

Oh, I know that’s what we would like to see happen. But I also want to acknowledge what we think should happen and what actually happens can be widely different.

I don’t follow your “we’s”. I’m saying that’s what Pants / Pex users would actually do: use 1 central lock file and, instead of sub-setting 10 different lambda lock files to check into version control alongside it redundantly, just export subsets just in time as needed.

1 Like