In case it wasn’t obvious, I’m struggling to decide between encoding a graph or a set (I’m using “set” instead of “linear list” to emphasize that the ordering of packages doesn’t matter). So, I’m going to write out my thoughts and see if something obvious comes to mind (I doubt it, since I have been thinking about this for a while now), or see what you all have to say so we can try to reach consensus as a group.
Set
This is when the locker writes out a list of packages and an installer evaluates each listed package one by one, on its own (this is what PEP 751 originally proposed, what PDM currently does, and what Poetry is slated to do). The way you control what to install is primarily via environment markers and requires-python. This isolation of information facilitates auditing, as you can look at any one package and understand whether it would get installed or not in any scenario.
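To make the set approach concrete, here is a rough sketch of what the installer side could look like. The entry layout, the package names, and the use of plain Python predicates in place of real environment marker strings are all my own simplifications for illustration, not anything a tool actually writes out.

```python
# Hypothetical lock entries: every package carries its own marker, modelled
# here as a predicate over an environment dict instead of a real marker string.
LOCKED_SET = [
    {"name": "pkg-a", "marker": lambda env: True},
    {"name": "pkg-b", "marker": lambda env: env["sys_platform"] == "win32"},
    {"name": "pkg-c", "marker": lambda env: env["python_version"] < (3, 11)},
]


def select_from_set(entries, env):
    """Return the names of packages whose own marker matches env.

    Each entry is judged in isolation, so the order of entries is irrelevant.
    """
    return {entry["name"] for entry in entries if entry["marker"](env)}


linux_312 = {"sys_platform": "linux", "python_version": (3, 12)}
windows_310 = {"sys_platform": "win32", "python_version": (3, 10)}

print(select_from_set(LOCKED_SET, linux_312))
print(select_from_set(LOCKED_SET, windows_310))
```

The point is that answering "would pkg-b get installed?" only requires reading the pkg-b entry.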
The way a locker would most likely work is you provide a set of requirements and then it works out which packages are necessary for various marker requirements (basically whether a marker requirement is met or not). The locker then propagates those marker requirements through the dependency graph so that when you write it out you get a collection of markers for a package that encapsulates any and all ways you could have reached the package through the dependency graph.
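As a sketch of that propagation step: walk every path from the entry points, collect the markers seen along each path, and write the result per package as an "or" of "and" clauses. This is only an illustration under my own assumptions (an acyclic graph, markers treated as opaque strings, made-up package names, no simplification of the resulting expression); a real locker would do something smarter.

```python
def propagate_markers(roots, edges):
    """Collect, per package, the set of edge markers along every path from a root.

    edges maps a package name to a list of (dependency, marker) pairs, where
    marker is a string, or None for an unconditional dependency.
    Assumes the graph has no cycles.
    """
    reached = {}

    def walk(package, conditions):
        reached.setdefault(package, set()).add(frozenset(conditions))
        for dependency, marker in edges.get(package, []):
            extra = [marker] if marker else []
            walk(dependency, conditions + extra)

    for root in roots:
        walk(root, [])
    return reached


def render(paths):
    """Render the collected paths as one 'or' of 'and' clauses."""
    if frozenset() in paths:
        return ""  # reachable unconditionally, so no marker is needed
    clauses = sorted(" and ".join(sorted(path)) for path in paths)
    return " or ".join(f"({clause})" for clause in clauses)


# Hypothetical dependency graph with markers on the edges.
EDGES = {
    "app": [("pkg-a", None), ("pkg-b", 'python_version >= "3.9"')],
    "pkg-a": [("pkg-c", 'sys_platform == "win32"')],
    "pkg-b": [("pkg-c", 'sys_platform == "win32"'), ("pkg-d", None)],
}

markers = {
    name: render(paths)
    for name, paths in propagate_markers(["app"], EDGES).items()
}
print(markers["pkg-c"])
```

Even in this tiny graph, pkg-c ends up with one clause per path that reaches it, which is where the marker growth comes from.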
One potential drawback to this approach is that propagating markers can lead to very large markers for a package. That undermines the benefit of being able to read a package’s details in isolation. It also simply leads to bigger files.
There’s also the question of how to support a single lock file having multiple entry points into the dependency graph. This comes up with extras, PEP 735 (aka dependency groups), and monorepos. Typically this is handled via group labels. That way, for each package, you check whether it’s part of the requested group and, if it is, whether its markers say it should be installed. If you imagine lockers creating synthetic groups for anything that isn’t necessarily user-specified, you seemingly can get a subset of the dependency graph via some group. This does require, though, knowing all of your entry points into the graph upfront to create a group for them.
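Roughly, group labels on a set could work like the sketch below. The "groups" and "marker" field names, the group names, and the packages are all hypothetical, and markers are again plain predicates rather than real marker strings.

```python
# Hypothetical entries: each package lists the groups it belongs to, plus an
# optional marker that still has to match the target environment.
LOCKED_WITH_GROUPS = [
    {"name": "pkg-a", "groups": {"default"}, "marker": None},
    {"name": "pkg-b", "groups": {"default", "test"}, "marker": None},
    {"name": "pkg-c", "groups": {"test"},
     "marker": lambda env: env["sys_platform"] == "win32"},
    {"name": "pkg-d", "groups": {"docs"}, "marker": None},
]


def select_for_groups(entries, wanted_groups, env):
    """Pick packages that are in at least one wanted group and whose marker matches."""
    selected = set()
    for entry in entries:
        if not entry["groups"] & wanted_groups:
            continue  # not part of any requested entry point
        marker = entry["marker"]
        if marker is None or marker(env):
            selected.add(entry["name"])
    return selected


print(select_for_groups(LOCKED_WITH_GROUPS, {"test"}, {"sys_platform": "linux"}))
```

Note that an entry point nobody labelled at lock time (say, "just pkg-d’s dependencies") simply can’t be asked for here, which is the upfront-knowledge requirement.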
It has been suggested you could record enough data for the set of packages to recreate the graph, but I will admit that starts to feel redundant. It also means you are recording more information than necessary, so someone is probably doing work that they don’t have to. It’s not drastically bad, but it then becomes a question of whether keeping the edge details is optional or a requirement (which affects whether installers can rely on it), in which case we have encoded the same information twice.
Graph
This is when the locker writes out each package like it’s a node, listing the direct dependencies per package (this is what uv does). For the locker this is effectively writing down the graph they came up with during resolution. Installers then evaluate the markers of the dependencies to decide what other package(s) to install. This is not a resolution as you wouldn’t have backtracking.
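For comparison, here is a sketch of the installer side for a graph. It is a plain traversal: start at whatever entry points you were given, follow each dependency edge whose marker matches, and never revisit a decision. The graph layout and package names are made up, and markers are modelled as predicates; this is not how uv or any other tool actually stores things.

```python
from collections import deque

# Hypothetical lock graph: each package lists its direct dependencies, and
# the marker lives on the edge (None means the dependency is unconditional).
LOCKED_GRAPH = {
    "app": [
        ("pkg-a", None),
        ("pkg-b", lambda env: env["python_version"] >= (3, 9)),
    ],
    "pkg-a": [("pkg-c", lambda env: env["sys_platform"] == "win32")],
    "pkg-b": [
        ("pkg-c", lambda env: env["sys_platform"] == "win32"),
        ("pkg-d", None),
    ],
    "pkg-c": [],
    "pkg-d": [],
}


def select_from_graph(graph, entry_points, env):
    """Walk the graph from the entry points, following edges whose marker matches.

    There is no backtracking: a package is either reached or it is not.
    """
    selected = set()
    queue = deque(entry_points)
    while queue:
        name = queue.popleft()
        if name in selected:
            continue
        selected.add(name)
        for dependency, marker in graph[name]:
            if marker is None or marker(env):
                queue.append(dependency)
    return selected


linux_312 = {"sys_platform": "linux", "python_version": (3, 12)}
print(select_from_graph(LOCKED_GRAPH, ["app"], linux_312))
print(select_from_graph(LOCKED_GRAPH, ["pkg-b"], linux_312))
```

The second call enters the graph at pkg-b without any group having been recorded for it, but you can no longer tell whether pkg-c gets installed just by reading the pkg-c entry.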
The pro of this is it directly encodes what the locker came up with. It’s also a simpler and smaller lock file. This also means you can enter the graph from anywhere without necessarily encoding groups down to each package. The drawback is you can’t tell if a package would be installed simply by looking at its entry in the lock file.
So it seems the question is how much any of this matters:
- Simplicity
- File size
- Ability to enter the dependency graph arbitrarily
- Knowing whether a package can get installed just by looking at it
- Complexity entirely in the locker or shared a bit between locker and installer