PEP 665, take 2 -- A file format to list Python dependencies for reproducibility of an application

It is a conscious effort (well, my conscious insistence) to avoid “Lock File” in both the title and abstract, because I have observed that some people have a preconceived notion of this term (one that differs from the definition used by PEP 665) and tend to want to force what they think the term means onto the PEP. I want to make the title and abstract as unambiguous as possible by describing the idea only with better-defined terms.

Would “PEP 665 – pylock.toml: A file format …” work?

Allowing for sdists and source trees is an open issue (and there are other open issues to discuss as well).

But the trick is: how do you pin build dependencies? All you can actually pin is the builder, not what it may need to do the build (i.e. you can pin what is specified using PEP 517, but that isn’t the complete story). It also means the installer has to have complete resolution logic, since you can’t know ahead of time what the source tree will require to be installed once built. This is very likely true for an sdist as well (unless it fills out its PKG-INFO, uses core metadata 2.2, and doesn’t have Requires-Dist marked as dynamic). As of right now the installer’s logic is much simpler on purpose so as to not require a complete SAT solver.

That was an explicit ask in the last round, based on someone’s personal experience with requirements files, to make it easy to know when the lock file may be out of date.

Good point. Unfortunately there’s no way to resolve that conflict, so we will have to choose which use case is more important.

Sure, happy to figure out a way to pull it in.

That’s the point that @frostming made and it seems like a good point, so I will change the PEP to say the date time must be in UTC but otherwise leave the format requirements to the TOML spec.

If you’re doing reproducible builds, you generally override all the timestamps on everything. So you’d override this one too: tools (that want to be reproducible) should use SOURCE_DATE_EPOCH or whatever their input is.
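For illustration, a locker that wants reproducible output could derive created-at roughly as follows. This is only a sketch: the helper name created_at is hypothetical, and it assumes SOURCE_DATE_EPOCH holds an integer number of seconds since the Unix epoch.

```python
import os
from datetime import datetime, timezone


def created_at(environ=os.environ):
    """Return a UTC timestamp for the created-at field.

    Hypothetical helper: honours SOURCE_DATE_EPOCH when it is set,
    otherwise falls back to the current time in UTC.
    """
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        # Assumption: the value is an integer count of seconds since the epoch.
        return datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return datetime.now(timezone.utc)


# With the override set, the timestamp is fully determined by the input.
print(created_at({"SOURCE_DATE_EPOCH": "0"}).isoformat())
```

Either way the result is timezone-aware and in UTC, which is the only constraint the PEP keeps on the field.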

Having the field is important. When everyone involved consents to lying about it, then it’s okay to lie about it.

I just pushed “PEP 665: address feedback (#2134)” (python/peps@4b865b9 on GitHub) to address the following:

  • Wording/clarifications from @bernatgabor , @encukou .
  • Drop formatting restrictions for created-at (still must be UTC); thanks to @EpicWink , @frostming , @encukou .
  • Listed pip and PDM as supportive of the PEP (and implicitly pip-tools since the current ideas floating on the pip issue tracker would somewhat obsolete pip-tools :sweat_smile:).

It should already be rendered and available on peps.python.org (“PEP 665 – A file format to list Python dependencies for reproducibility of an application”), and it is updated above.

In my experience, support for relative paths in file URLs is inconsistent. Some tools seem to support them, other tools don’t. For example, is file://c:/some/path relative or absolute? On Unix, it’s “obviously” relative, and on Windows it’s “obviously” absolute. I wouldn’t trust how a tool would interpret this without experimenting, so IMO it’s risky to assume that a locker and an installer will interpret the URL the same way.

file://c:/some/path is a file URI where the hostname is “c:” and the path is /some/path. It’s absolute both on Unix and Windows. I’m not aware of file URIs being able to express relative paths.
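As a quick check of that reading, here is how Python's standard urllib.parse splits a few file URLs. It only shows how one parser decomposes the strings; other tools may well behave differently.

```python
from urllib.parse import urlsplit

# Compare where the "c:" ends up depending on the number of slashes.
for url in ("file://c:/some/path", "file:///c:/some/path", "file:///some/path"):
    parts = urlsplit(url)
    print(f"{url!r}: authority={parts.netloc!r} path={parts.path!r}")
```

With two slashes the "c:" lands in the authority component, and with three it becomes part of the path. In both cases the path component starts with a slash.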

My understanding is that relative file: is never really a thing. There are parsers out there that interpret various “URLs” as relative paths, but those are caused by either parsing some URL forms incorrectly or allowing technically invalid inputs. None of the RFCs actually allow file: to contain a relative path. TBH I thought we were only covering absolute paths in the current draft, leaving relative paths out the same way as we leave out building from source.

For installers to select a dist from a multi-dist version array, do they have to parse the URL and then decompose the filename? Would it make sense to have a tag field on version array objects to complement metadata.tag, or maybe to require that the filename is always present?

Does metadata.tag need to be a scalar? Wouldn’t that cause issues? For example, if a lockfile was created on an M1 Mac under Python 3.9 and the “compressed” tag is, say, py3.py39.cp39-none.cp39-any.macosx_11..., the complete tag set is the product of all the sub-tag combinations, which implies that all combinations are valid even though e.g. py3-cp39-any might not be (?). If you copy your lockfile over to a Windows machine with Python 3.8, does the locker need to populate the lockfile with metadata for CPython 3.9 on Windows to satisfy the expanded tag set?
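To make the “product of all the sub-tag combinations” concrete, here is a minimal sketch of expanding a compressed tag. The function name expand_tag and the sample tag are made up for illustration, and the sketch assumes the usual interpreter-abi-platform layout with "." separating alternatives.

```python
from itertools import product


def expand_tag(compressed):
    """Expand a compressed tag into the set of single tags it implies."""
    interpreters, abis, platforms = compressed.split("-")
    return {
        "-".join(combo)
        for combo in product(
            interpreters.split("."), abis.split("."), platforms.split(".")
        )
    }


# Hypothetical compressed tag, chosen only for illustration.
for tag in sorted(expand_tag("py3.cp39-none.cp39-any")):
    print(tag)
```

The expansion contains py3-cp39-any alongside the combinations that were presumably intended, which is the concern raised here.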

Comparing this to the first draft, the tag field used to be an array; why was it changed? Also, with the addition of requires-python, what should happen when it conflicts with marker?

Yes. If filename is not present, the locker promises url.rsplit("/")[-1] is the filename. Maybe the current wording does not make this clear enough? I consider decomposing a wheel filename a trivial operation, and I don’t think it warrants an extra redundant field.
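A rough sketch of that decomposition, using only the standard library. The function names and the example URL are hypothetical, and the sketch assumes the URL carries no query string or fragment and that the filename follows the standard wheel naming layout.

```python
def filename_from_url(url):
    """Take the last path segment of the URL as the filename."""
    return url.rsplit("/", 1)[-1]


def split_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the tags.
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "tag": f"{python_tag}-{abi_tag}-{platform_tag}",
    }


# Hypothetical URL, for illustration only.
url = "https://files.example.org/packages/spam-1.0-cp39-cp39-win_amd64.whl"
print(split_wheel_filename(filename_from_url(url)))
```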

The locker should not produce this value. It can still produce multiple lock files with different tag values that combine to cover all platforms, but personally I am skeptical anyone would ever bother to specify platform requirements in such a fine-grained fashion. If this use case really becomes somewhat popular, we can easily amend the spec to also allow an array.

Then no platform can satisfy this lock file’s requirements, and the installer should reject it.

I would argue that use cases for absolute file paths are very scarce… It requires the application (and its lock-file) to be placed inside a bigger, non-relocatable, file structure that is replicated in every installation. And that is very tricky to maintain…

On the other hand there are 2 common use cases for relative file paths:

  1. The application “vendorises” patched versions of libraries in subfolders (either as folder/submodule or sdist/wheel file – e.g. ./_patched/unmaintained_lib_that_needs_patching)
  2. The application is developed as part of a “monorepo” and dependencies are in “sibling directories” (e.g. ../mylib)

If support for relative paths is inconsistent, maybe a good solution is to define a “string replacement”, for example ${__LOCKFILE_DIR__} (or any name really), that expands to the absolute path of the directory where the lock file is placed, and allow people to do:

  • url = "file://${__LOCKFILE_DIR__}/_vendor/lib" or
  • url = "file://${__LOCKFILE_DIR__}/../mylib".

This kind of string can be resolved via

  • os.path.normpath(string.Template(url).safe_substitute(__LOCKFILE_DIR__=...)),
    or even os.path.normpath(url.replace("${__LOCKFILE_DIR__}", ...))
    (could be defined in the PEP)

which is doable even if the installer is not implemented in Python.
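A minimal sketch of that expansion, assuming a POSIX-style absolute lock file directory. The ${__LOCKFILE_DIR__} placeholder is the hypothetical name from this post, and expand_lockfile_url is made up for illustration. One caveat: normalising the whole URL string as a path collapses the slashes after the scheme, so this sketch normalises only the part after file://.

```python
import posixpath
from string import Template


def expand_lockfile_url(url, lockfile_dir):
    """Expand the hypothetical ${__LOCKFILE_DIR__} placeholder in a file URL."""
    expanded = Template(url).safe_substitute(__LOCKFILE_DIR__=lockfile_dir)
    prefix = "file://"
    if not expanded.startswith(prefix):
        raise ValueError(f"only file:// URLs are handled here: {url!r}")
    # Normalise only the path, so the "//" after the scheme survives.
    return prefix + posixpath.normpath(expanded[len(prefix):])


print(expand_lockfile_url("file://${__LOCKFILE_DIR__}/../mylib", "/srv/app"))
print(expand_lockfile_url("file://${__LOCKFILE_DIR__}/_vendor/lib", "/srv/app"))

# For comparison: normalising the full URL string damages the scheme separator.
print(posixpath.normpath("file:///srv/app/../mylib"))
```

Windows paths and percent-encoding are left out of this sketch and would need their own rules.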

Regarding the URL, may the scheme be anything supported by the installer? Or are you expecting only HTTP(S)?

By allowing installers to go elsewhere to retrieve the package (as long as hash matches), does that implicitly allow package repository mirrors?

Anything the installer can support is OK. In practice though, since the locker can’t know what installer will be used ahead of time, I imagine we would eventually come up with a set of “common enough” schemes that tools are expected to support (e.g. at least http[s] and file, a few more if we extend support to building from source).

Yes, that’s exactly the intention.
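The idea can be sketched as a hash check that is independent of where the bytes came from. The function name is hypothetical and SHA-256 is an assumption made for the example; the sketch does not reflect how the lock file actually stores its hashes.

```python
import hashlib


def matches_locked_hash(data, expected_sha256):
    """Return True if the downloaded bytes match the locked SHA-256 digest."""
    return hashlib.sha256(data).hexdigest() == expected_sha256.lower()


# Stand-in for a file fetched from any mirror; the origin does not matter.
artifact = b"example wheel contents"
locked_digest = hashlib.sha256(artifact).hexdigest()

print(matches_locked_hash(artifact, locked_digest))
print(matches_locked_hash(artifact + b" (tampered)", locked_digest))
```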

Oh no, you’re spoiling my next PEP idea. This is almost exactly what I was going to propose once this PEP is accepted. But I think introducing this right now would muddy the discussion too much, since there is still much up in the air around the general structure of the file. Let’s get that agreed on before we get into supporting more semantics for value variants that can be done within the overall data structure.

p.s. But please feel free to start drafting that PEP before this one is finished. The earlier it’s started, the earlier it can be discussed and potentially have a chance for acceptance.

I misread the description of tag and naively assumed the locker would combine all the wheel tags from the package array into tag. Could you explain how this field might be used?

It’s like tags on a wheel, but the lock file is that wheel (and only has dependencies). If a platform matches the tags, the lock file can be installed on it, otherwise the installer should reject it.
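A sketch of that check on the installer side, assuming the lock file carries a single uncompressed tag. The exception class, the function name, and the supported-tag list are all hypothetical; in practice the list could come from something like packaging.tags.sys_tags().

```python
class IncompatibleLockFile(Exception):
    """Raised when the lock file's tag does not match the platform."""


def check_lock_file_tag(lock_tag, supported_tags):
    """Reject the lock file unless its tag is supported by this platform."""
    if lock_tag not in set(supported_tags):
        raise IncompatibleLockFile(
            f"lock file tag {lock_tag!r} is not supported on this platform"
        )


# Hypothetical tags for a CPython 3.9 interpreter on 64-bit Windows.
supported = ["cp39-cp39-win_amd64", "cp39-none-win_amd64", "py3-none-any"]

check_lock_file_tag("cp39-cp39-win_amd64", supported)  # accepted silently
try:
    check_lock_file_tag("cp39-cp39-macosx_11_0_arm64", supported)
except IncompatibleLockFile as exc:
    print(exc)
```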

Absolute paths are useful in build pipelines that place artifacts in a shared artifact directory.

I’m also OK with this. @pradyunsg , what do you think?

And if we change this, should we keep the key name url or change it to uri?

I just pushed a small update that lists Pyflow as supportive of the PEP.

Yeah, I agree with @pf_moore here: file:// AFAIK doesn’t support relative paths. E.g. the “file URI scheme” article on Wikipedia calls out a couple of examples showing that the path after the host is an absolute path, not a relative one, on Unix / POSIX systems.

Separating out local paths for editable installs from the URI / URL would be great.
