PEP 751: lock files (again)

Ah, I see what you mean. It’s a good point, and one I think @brettcannon will have to respond to. I understand his fear that getting too deep into handling sdists might kill the proposal, so I’d support something that essentially just said certain scenarios (like build backends that aren’t locked as prebuilt wheels) aren’t supported. But I don’t think that leaving it up to installers to decide what to do is reasonable here.

Specifically, I’d want pip to follow the letter of the spec, and if locking build backends as sdists is unsupported by the spec, pip won’t support it. And we’d send anyone who claims to have a use case for it to the standards process, to get the lockfile spec updated. I absolutely do not want every installer to have to tackle these issues independently.

Please don’t drop out of the discussion. You’ve raised some valid points and your perspective is useful and important. The level of confusion over this particular point isn’t even close to being the worst we’ve seen over the years of lockfile discussions. And it’s often been me who’s been the confused one, so I should know :slightly_smiling_face:

(Side issue - I still wish PEP 517 had just required that all build backends must be available as wheels. We considered it. It would have saved so much stress over the years…)

3 Likes

Seems reasonable.

But you are saying all other packages that were installed but not mentioned in the lock file stay, correct?

This would probably be a “SHOULD” scenario.

Yeah, but this PEP doesn’t have to solve everything upfront either; it’s versioned for a reason.

To be clear, that was meant to suggest sdist support in the lock file would be, “you can install the sdist for a package”, without worrying about locking the build back-end used.

I think my response is I’m going to drop [[packages.build-requires]]. It’s been nothing but a headache and point of confusion from the start. And since no other tool that does locking currently supports locking build back-end requirements for an sdist, I don’t feel like I need to be the one trying to introduce some innovative solution. Lockers can experiment thanks to the various tool sections and then eventually propose an update to the file format which adds such support.

2 Likes

That’s when you start getting into complexities. If one of the other packages depends on one from the lockfile, but at a different version than the lockfile specifies, you can’t just leave it alone, as that would give a broken environment. One obvious solution is to fail, but some installers might want to try to find a compatible version and install that.

I’d say what to do in that situation should probably be left to the installer.

Fair enough. But actually, a simple non-resolver lockfile installer might not even be able to tell there’s a conflict.

In all honesty, I don’t have good answers here, because I can’t imagine a scenario where installing a locked set of packages into anything other than an empty environment is a reasonable thing to do. The best answer here, in my view, would be to get some concrete feedback from someone who actually needs to do this. And if we don’t find anyone in that situation, I wouldn’t object to the spec explicitly stating that the only supported use case is installing into an empty environment, and support for non-empty target environments is (at least in this version of the spec) entirely at the discretion of installers.

We had this type of issue with pip’s --target option. No-one ever really thought about non-empty targets, and we’ve had years of confusion as a result.

That sounds reasonable to me. I assume you’ll add something to the “Rejected options” section of the PEP explaining that locking build dependencies was deliberately taken out of scope, and will need to be addressed in a future version of the format if there is sufficient interest in the feature to warrant it?

1 Like

I think saying installers MAY provide a way to try to install into a pre-existing environment is good enough since installing into an empty environment is the easiest.

And it goes against the use case of lock files providing a way to list exactly what’s going to be in the environment in the end.

Yep!

1 Like

because I can’t imagine a scenario where installing a locked set of packages into anything other than an empty environment is a reasonable thing to do.

Maybe I’m reading this wrong, but my experience with lock files is that it is normal to install into a non-empty environment.

The normal workflow with pip-tools, uv, poetry, etc. at the moment is that when a lockfile gets updated you use pip-sync, uv sync, uv pip sync, poetry install --sync, etc. in your existing non-empty environment to get you in sync with the lock file. You may also be syncing with only a subset of the lock file, based on the particular dependency group you’re interested in.

I only install using a lock file into an empty environment on initial project setup.

2 Likes

I tried using pipx run --spec ../poetry poetry lock and I got the same errors again.


I made a gist with extras to see how that looked.

[project]
name = "lock-example"
version = "2024.1"
requires-python = ">=3.12"
dependencies = ["trove-classifiers"]

[project.optional-dependencies]
extra-A = ["httpx; os_name=='posix'"]
extra-B = ["requests; os_name=='nt'"]

PDM

[metadata]
groups = ["default", "extra-A", "extra-B"]
# ...

[[package]]
name = "anyio"
groups = ["extra-A"]
marker = "os_name == \"posix\""
# ...

[[package]]
name = "certifi"
groups = ["extra-A", "extra-B"]
marker = "os_name == \"posix\" or os_name == \"nt\""
# ...

# ...
[[package]]
name = "trove-classifiers"
groups = ["default"]
# ...

Poetry

[[package]]
name = "anyio"
optional = true
# ...

[[package]]
name = "certifi"
optional = true
# ...

[[package]]
name = "trove-classifiers"
optional = false
# ...

[extras]
extra-a = ["httpx"]
extra-b = ["requests"]

uv

[[package]]
name = "lock-example"
dependencies = [
    { name = "trove-classifiers" },
]
# ...

[package.optional-dependencies]
extra-a = [
    { name = "httpx", marker = "os_name == 'posix'" },
]
extra-b = [
    { name = "requests", marker = "os_name == 'nt'" },
]

[package.metadata]
requires-dist = [
    { name = "httpx", marker = "os_name == 'posix' and extra == 'extra-a'" },
    { name = "requests", marker = "os_name == 'nt' and extra == 'extra-b'" },
    { name = "trove-classifiers" },
]
  • Everyone lists the extras and what packages they contain somehow
  • For the markers in the extras:
    • PDM propagates the markers to each package
    • Poetry leaves them out (@radoering does Poetry refer back to the pyproject.toml, or did I mess up my [tool.poetry] section?)
    • uv records them with the package both as optional dependencies and as metadata (@charliermarsh what’s the difference?)
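
To make that comparison concrete, here is a rough sketch (my own illustration, not any of these tools’ actual code) of how an installer could consume markers recorded in the lock file at install time: evaluate each locked entry’s marker against the target environment, passing the requested extra so markers like extra == "extra-a" also work. The packages list is a hand-written stand-in for parsed lockfile entries.

# Rough sketch (not any tool's actual code): decide which locked entries apply
# to the current platform. The `packages` list is a hand-written stand-in for
# parsed lockfile entries.
from packaging.markers import Marker

packages = [
    {"name": "anyio", "marker": 'os_name == "posix"'},
    {"name": "certifi", "marker": 'os_name == "posix" or os_name == "nt"'},
    {"name": "trove-classifiers", "marker": None},  # unconditional entry
]

def applies(entry, extra=""):
    """True if this locked entry should be installed in the current environment."""
    if entry["marker"] is None:
        return True
    # Passing "extra" lets markers such as extra == "extra-a" evaluate as well.
    return Marker(entry["marker"]).evaluate({"extra": extra})

for entry in packages:
    print(entry["name"], applies(entry))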

Hmm, that’s a good point. On reflection, when I said “empty” what I was really thinking about was an environment with nothing in it but stuff managed by the lockfile (I hadn’t thought of syncing with a subset, but it’s the same in the sense that the state of the environment as a whole is captured in the lockfile).

I’m not sure how best to capture that idea in the spec (or even if we need to), though. Maybe something like:

Installers MAY choose to not support installing into environments containing packages which are not managed by the lockfile. If they do choose to install into such environments, the presence of additional packages MUST NOT affect the results of installing the lockfile.

stuff managed by the lockfile

I don’t think I understand what “managed” means here. A common workflow with tools that currently use lock files looks like:

  1. I create some base requirements
  2. I generate a lock file
  3. I create an environment and install into it based on that lock file
  4. I modify the base requirements (add, remove, change)
  5. I generate a new lock file
  6. I sync the environment with the new lock file

If someone else was to turn up after step 5 had been completed with no information about the prior steps, how are they to tell that the old environment is related to the new lockfile? I may have made a small modification to the base requirements, or I may have completely altered them and the lock file.

1 Like

[package.metadata] effectively represents the input requirements, whereas [package.optional-dependencies] represents the resolved versions (but we omit the version if it’s unambiguous given the contents of the lockfile).

Originally we didn’t write [package.metadata], but we added it to help facilitate lockfile invalidation (i.e., the input dependencies changed, so we need to re-resolve) as opposed to something like a checksum or hash.
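
For anyone curious what that kind of invalidation check can look like, here is a rough sketch — my own illustration under assumed names, not uv’s implementation: canonicalise the requirements recorded in the lock file and the ones currently declared in pyproject.toml, and re-resolve if they differ. The lock file name (example.lock) and the requires-dist key are placeholders.

# Illustrative only, not uv's code: decide whether a lock file is stale by
# comparing its recorded input requirements against pyproject.toml.
import tomllib
from packaging.requirements import Requirement

def canonical(reqs):
    # Round-tripping through Requirement normalises spacing and marker syntax.
    return {str(Requirement(r)) for r in reqs}

with open("pyproject.toml", "rb") as f:
    declared = tomllib.load(f)["project"].get("dependencies", [])

with open("example.lock", "rb") as f:      # placeholder file name
    lock = tomllib.load(f)
recorded = lock.get("requires-dist", [])   # placeholder key for the recorded inputs

if canonical(declared) != canonical(recorded):
    print("inputs changed: re-resolve and regenerate the lock file")
else:
    print("lock file still matches the declared requirements")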

1 Like

I also agree wholeheartedly with punting that for now. Whenever that happens, it will be the last remaining piece of reproducibility, as I enumerated here: The purpose of a lock file

6/7 is pretty darn good :smile:

2 Likes

Maybe some caching issue? I do

pipx uninstall poetry
pipx install git+https://github.com/radoering/poetry.git@lock-markers-and-groups3a

to make sure I install into a fresh venv, and I do not get any errors.

Without the PR, Poetry just re-resolves at install time so the lock file is not that interesting.

With the PR:

[[package]]
name = "anyio"
# ...
groups = ["main"]
markers = "os_name == \"posix\" and extra == \"extra-a\""
# ...

[[package]]
name = "certifi"
# ...
groups = ["main"]
markers = "os_name == \"posix\" and extra == \"extra-a\" or os_name == \"nt\" and extra == \"extra-b\""
# ...

[[package]]
name = "trove-classifiers"
# ...
groups = ["main"]
# ...

In contrast to PDM, Poetry does not treat extras as groups but handles them via markers. I do not know how PDM handles the certifi entry in the lock file at install time, but in my opinion the locked marker is not accurate, because it should be different for extra-A and extra-B.

If I define two groups extra-A and extra-B instead of extras in Poetry, the result (with the PR) will look as follows:

[[package]]
name = "anyio"
# ...
groups = ["extra-A"]
markers = "os_name == \"posix\""
# ...

[[package]]
name = "certifi"
# ...
groups = ["extra-A", "extra-B"]
markers = {extra-A = "os_name == \"posix\"", extra-B = "os_name == \"nt\""}
# ...

[[package]]
name = "trove-classifiers"
# ...
groups = ["main"]
# ...

Please note that the markers field for certifi is now a dict instead of a string, because the markers differ between the two groups.
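
For illustration, a consumer of that experimental layout would need to accept both shapes of markers. A rough sketch (my own, not Poetry’s code), reusing the certifi entry above:

# Sketch only: normalise a `markers` value that is either a single marker
# string or, as with certifi above, a table keyed by group.
from packaging.markers import Marker

def marker_for_group(markers, group):
    """Marker that applies when installing this package for `group`.

    Assumes `group` is one of the package's declared groups; returns None
    when the package applies unconditionally.
    """
    if markers is None:
        return None
    if isinstance(markers, str):
        return Marker(markers)          # same marker for every group
    return Marker(markers[group])       # per-group markers, as for certifi

markers = {"extra-A": 'os_name == "posix"', "extra-B": 'os_name == "nt"'}
m = marker_for_group(markers, "extra-A")
print(m is None or m.evaluate())        # True on POSIX platforms, False elsewhere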

I have similar questions but I’m seeing it from a different angle. To me the question is not what you install the lockfile into but what you wind up with once you install it. And my answer is that the lockfile is not just locking what you put in but what you get out — that is, you can only ever use (at most) one lockfile per environment at a time, and the purpose of the lockfile is to totally determine the resulting state of that environment. So it doesn’t matter what the environment has in it before you install the lockfile, but you know that after you install the lockfile, the environment will have exactly what the lockfile says, no more and no less.[1]

I feel like somewhere in the thread there was discussion about multiple lockfiles but I don’t recall what it was and I never really understood it. I’d be interested to hear about use cases that envision using a lockfile but not in the all-or-nothing way I described above.

And yet here we are with this PEP because we decided the old one that didn’t allow sdists was no good…? :upside_down_face: Although I respect all the engineering that’s going on here, I still think we would be better off if we try, with every packaging proposal, to drive a wedge between sdists and wheels at every opportunity, and move away from expecting that source distributions are something that install tools should install. Then we will not find ourselves X years later going “I wish we hadn’t allowed sdists in lock files because it would have saved us a lot of stress.”


  1. You could of course install other stuff into it after you use the lockfile, but if you try to use another lockfile to do that, it will totally override whatever the first lockfile did. If you want to somehow combine two lock states, you need to merge the two lockfiles, not the two environment states that they create. ↩︎

1 Like

Ah. Yes, that’s the problem I’m trying to flag - but it’s generally handled by workflow management tools which update your environment in the “add, remove, change” step as well as at sync time. (I think - I’ve not used any workflow tools extensively). I was only thinking of “managed by the lockfile” in the very limited sense of “is included in the lockfile”, and specifically trying to distinguish that from the broader “managed as part of the project the lockfile was created from”. I clearly didn’t do a very good job of making that distinction, and on reading your example, I’ve come to the conclusion that it’s not actually going to be a helpful extension over my original proposal of simple installers only supporting installation into an empty environment.

Let’s reframe the scenario.

  1. User A is responsible for managing an environment for a team. They use a workflow tool like PDM to do so.
  2. They create the environment, add various packages, and generate a lockfile. They send that lockfile to their users.
  3. Users can use any tool to create an environment from that lockfile. Let’s say they use pip, and pip has implemented the “bare minimum” lockfile installer algorithm (scan the file and install what’s there, no resolve). They create an environment and install the lockfile. Great.
  4. User A now modifies the environment, by adding some packages, and removing others. They ship the new lockfile.
  5. Users who delete their environment and recreate it using the new lockfile are fine.
  6. Users who try to install the new lockfile on top of their existing environment are not fine. They will have old packages that got removed from the environment. They may even have a broken environment, if those “orphaned” packages conflict with the new environment.

This is the scenario I want to set expectations for. There’s no way that a simple installer, installing an arbitrary lockfile into an arbitrary environment, can accurately replay the history leading up to the current situation. The only safe way, if reproducibility of an environment is your goal, is to create a new environment and install the lockfile into it. And if reproducibility isn’t your goal (for example, you want to track the state of a base environment in an environment derived from it) then lockfiles may help with that, but you will need a dedicated tool, not a simple lockfile installer.

And with my pip maintainer hat on, I want to be clear that pip will probably only ever be a “simple lockfile installer”. I say this because I feel people will expect more than that - as with the scenario I describe above.

1 Like

Please don’t misinterpret my comment. I was talking solely about requiring build backends to be shipped as wheels, because that avoids an “infinite recursion” issue with building from source. Installing from sdist is a completely separate topic, and I don’t agree with you at all that standards should be used to somehow make sdists into a second class distribution format.

I agree that wheels are a much better distribution format for almost all situations, but that’s a social issue, not a standards one.

And I said this was a side issue, so please let’s not continue this digression. If you want to debate this with me, take it to private messages.

This is the scenario I want to set expectations for. There’s no way that a simple installer, installing an arbitrary lockfile into an arbitrary environment, can accurately replay the history leading up to the current situation. The only safe way, if reproducibility of an environment is your goal, is to create a new environment and install the lockfile into it.

Okay, but this would be a big departure from, and a worse experience than, existing tools and workflows, which don’t require the user to create a new empty environment and just allow them to sync their existing environment with the lock file in this scenario.

Perhaps pip wouldn’t actually be an appropriate tool to adopt this standard, if it were only going to treat the lock file like a novel requirements file.

1 Like

Maybe. But conversely, what is it that allows uv, PDM and Poetry to handle this better that isn’t captured in the lockfile spec, and why is it OK to omit it from the spec?

Personally, I view lockfiles as being precisely a novel form of fully pinned and verifiable requirements file, intended to allow the user to accurately reproduce the state represented in the lockfile in a new environment. Is that not how you see it?

1 Like

My view of installing from a lock file is that at the end of installation, all requirements in the lock file will be installed with the same versions/hashes as specified by the lock file. Other dependencies may already exist in the environment and are to be ignored. If any dependencies specified in the lock existed in the environment before the lock was installed, then they should be replaced as needed by the version specified in the lock. Or, put another way, I see installing from a lock as roughly the same as pip install --no-deps -r lock_file.txt

Edit: As a more complete example, let’s say the lock file has only 3 entries in it: package A==2.2, package B==3.1, package C==4.7.

If you have an empty environment, great: install all 3. If you have an environment with other dependencies like D/E and no overlap with the lock, still just install all 3. If you have an environment with, say, B==3.1 and C==7.2, then install A at 2.2, leave B alone since it is already there with the same version/hash, and replace C with 4.7 since it is at the wrong version.

This may in some cases lead to a broken/inconsistent environment, if you have a package D that depends on a version of C different from the lock. That’s fine. Installing a lock into an empty environment should always lead to a consistent environment. Installing a lock into a non-empty environment should lead to the exact dependencies specified by the lock, but other dependencies not specified by the lock may become inconsistent.
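
As a rough sketch of that behaviour (my illustration of the view above, not any installer’s actual logic; the three example entries are hard-coded and hash checking is omitted for brevity):

# Sketch of "install exactly what the lock says, leave everything else alone".
import subprocess
import sys
from importlib.metadata import PackageNotFoundError, version

locked = {"A": "2.2", "B": "3.1", "C": "4.7"}    # the example entries above

needs_install = []
for name, wanted in locked.items():
    try:
        have = version(name)
    except PackageNotFoundError:
        have = None                               # not installed yet
    if have != wanted:                            # missing or at the wrong version
        needs_install.append(f"{name}=={wanted}")
    # else: already at the locked version, leave it alone

if needs_install:
    # --no-deps: the lock file is the full story, don't resolve anything extra.
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--no-deps", *needs_install],
        check=True,
    )
# Nothing here checks whether packages outside the lock (like D) still have
# their requirements satisfied afterwards.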

My main reason for preferring this view is that most of the other people I work with do not constantly recreate environments. They do occasionally delete one and make a new one, but they tend to reuse environments when updating dependencies.

1 Like

Is it OK to leave B alone based on the version? Or should a reinstall be done anyway in case the user manually edited the installed file? It’s very much an edge case, and I’d be 100% happy with “leave it alone” (pip has --force-reinstall to override that decision, other installers could do similar). But it’s worth asking, because one of the goals of a lockfile is to replicate the desired environment exactly.

I’m not convinced it’s fine. Pip will always check the environment and complain if it’s inconsistent. I don’t think we want to suggest it’s OK to have an inconsistent environment. Having said that, I’m fine with it being an installer quality of life question whether it checks the consistency of the final environment.
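
For what it’s worth, that kind of consistency check doesn’t need a resolver. A rough sketch (my illustration, not pip’s actual pip check implementation) is just a walk over the installed metadata:

# Illustrative consistency check: verify that every installed distribution's
# requirements are satisfied by what is currently installed.
from importlib.metadata import PackageNotFoundError, distributions, version
from packaging.requirements import Requirement

problems = []
for dist in distributions():
    for req_string in dist.requires or []:
        req = Requirement(req_string)
        # Skip requirements gated behind extras or markers that don't apply here.
        if req.marker and not req.marker.evaluate({"extra": ""}):
            continue
        try:
            installed = version(req.name)
        except PackageNotFoundError:
            problems.append(f"{dist.metadata['Name']} needs {req}, which is not installed")
            continue
        if not req.specifier.contains(installed, prereleases=True):
            problems.append(f"{dist.metadata['Name']} needs {req}, but {installed} is installed")

print("\n".join(problems) or "environment looks consistent")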

My main concern is that that practice will result in a gradual drift of environments if the user keeps installing from refreshed copies of a lockfile. And that’s not something users will expect to happen.

But to be fair, I’d assume this will only happen if people are sharing lockfiles but not using the same installer/workflow tool. My impression is (based on the comments @notatallshaw has made) that if you stick to a single tool, it will have tool-specific ways to track incremental changes (at least ones managed by the tool’s add/remove/update commands). So hopefully this will be a relatively rare situation.

Feels like the conversation about environment management can be left as tool specific behaviour? I don’t think the PEP mentions environments or really needs to take a position on them? Installers and workflow tools can?

Many tools have existing concepts of install vs sync.

  • install: take the lockfile and install it into the target environment (optional tool UX can be used to check the environment, check that the lockfile is up to date, check that a venv exists, choose which dep group to install, and various other things)
  • sync: bring the environment into sync with the lockfile (optional tool UX can be used to control how aggressively the target environment gets cleaned to match the lockfile, which dep groups, etc.)

A lot of the time a person will want sync behaviour, but sometimes a person will want to shoot their feet off and install without checks. I think it’s fine to let them.

Indeed, a naive version of sync can be implemented by:

  • delete everything in environment
  • run install from lockfile

But as before, I think all this is tool and project specific UX. A lockfile doesn’t necessarily need to take a position here.

5 Likes

As @groodt noted, “install” and “sync” are different commands for a reason:

  • “install” means “add what’s missing, upgrade or downgrade things that are at the wrong version”
  • “sync” means “run the install, and then remove anything not specifically mentioned in the input”
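
A minimal sketch of that removal step (illustrative only: the locked-name set is a hand-written placeholder, and real tools have smarter protections than the hard-coded keep list here):

# Sketch of the removal half of "sync": after installing from the lock file,
# uninstall anything not named in it.
import subprocess
import sys
from importlib.metadata import distributions
from packaging.utils import canonicalize_name

locked = {canonicalize_name(n) for n in ("a", "b", "c")}   # placeholder names
keep = locked | {"pip", "setuptools", "wheel"}             # never remove the installer itself

to_remove = sorted(
    {canonicalize_name(d.metadata["Name"]) for d in distributions()} - keep
)
if to_remove:
    subprocess.run(
        [sys.executable, "-m", "pip", "uninstall", "--yes", *to_remove],
        check=True,
    )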

The input file is the same either way, but the latter command can go extremely wrong when the top level requirements and the fully resolved transitive requirements share a potentially confusable file format (as they do for requirements.txt files).

Defining a standardised lock file format won’t magically add a sync command to installers that don’t offer one, but it eliminates one of the biggest UX risks in adding the feature (people won’t readily be able to try and sync just the top level requirements rather than the full transitive dependency set).

1 Like