Add archives in Git to Direct URL schema

packaging.python.org PR: Add archives in Git to Direct URL schema (pypa/packaging.python.org#1798), reposted here for visibility.

Many packaging tools, such as pdm, hatch, and uv, extend PEP 621 to allow relative paths in their pyproject.toml. This means you can have a dependency on a git repository, and the package in that git repository can have a dependency on another package in a directory or file within that same repository.

Currently, direct_url.json can represent subdirectories in a git repository, but not files. I’m proposing to fix this asymmetry and make this transitive source representable. The PR adds a new key, file, to direct_url.json, containing the relative path to a source distribution or a wheel. This key is mutually exclusive with subdirectory, which serves the same purpose but points to a directory instead of a file.
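For illustration, an installation from an archive inside a git repository could then be recorded roughly like this (the URL, commit hash, and path are placeholders, and the exact placement of the new key is part of what’s up for discussion):

{
  "url": "https://github.com/example/repo",
  "vcs_info": {
    "vcs": "git",
    "requested_revision": "main",
    "commit_id": "0123456789abcdef0123456789abcdef01234567"
  },
  "file": "vendor/pkg-1.0.0-py3-none-any.whl"
}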

Unlike other entries in direct_url.json, this cannot be translated back to PEP 508, since PEP 508 supports neither archives-in-git nor relative paths. The change allows tools that do support relative paths in dependencies to correctly represent such installations.

The name of the key is up for bikeshedding (as is other terminology that we want to align to existing usage).


I’m not sure I follow how PEP 518 is relevant here? The only requirements in PEP 518 are build system requirements. Did you mean PEP 621?

Regardless of that, I’m a little uncomfortable about having something in direct_url.json which can’t be interpreted by a standards-compliant tool. Is that what you’re saying when you refer to not being able to translate it back?

I’ll also note that this is an interoperability change, and as such I think it will need a PEP and not just a PR to the specification.

So you want the commit ID as well as a file path to an archive in direct_url.json instead of only allowing for a source tree? I’m assuming the use case is people committing wheel files and such to their Git repo to version their dependencies.

Sorry, I got the PEP numbers mixed up; it’s fixed in the text now.

Currently, direct_url.json lacks the capability to express a feature supported by at least uv (depending on a file by relative path in a git dependency), and the PR aims to close this gap.

As I understand them, PEP 508 and PEP 621 are the shared consensus of what all tools need to support, while other common features, such as relative path dependencies and editables, need to be expressed in tool-specific metadata. direct_url.json, for example, has an editable field, which can’t be expressed in PEP 621, but is required to properly express a common installation mode and to allow tools to check whether an installed package matches their resolved requirement.
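For reference, that existing editable case is recorded like this in direct_url.json (the path is a placeholder):

{
  "url": "file:///home/user/project",
  "dir_info": {
    "editable": true
  }
}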

yes and yes :slight_smile:

Ah, I see. My understanding of direct_url.json is a little different. As part of the installation metadata, it records where the installed project came from. That is independent of PEP 621 and dependency specifiers, as those are about how to find dependencies in order to install them. So I don’t think there’s an issue with being able to convert back to PEP 508 as such.

Where there might be an issue is with using this data. The original motivation for PEP 610 (which introduced direct_url.json) was operations like pip freeze or more generally environment locking[1]. The important question is whether tools like pip freeze can reconstruct a way of building the environment from direct_url.json.
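For a VCS install recorded in direct_url.json, pip freeze today reconstructs a direct reference along these lines (name, URL, and hash illustrative):

project-in-git @ git+https://github.com/example/repo@0123456789abcdef0123456789abcdef01234567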

If you install the case you mention in the OP:

This means you can have a dependency on a git repository, and the package in that git repository has a dependency on another package in a directory or file in that repository

right now, using uv, then what happens when you run pip freeze on the environment that’s created? What do you expect to happen if this proposal is accepted and uv writes the new data? The key thing here is that environments aren’t tool-specific. You can install stuff into an environment any way you like, and not be tied to a specific toolset.

To look at this another way (and invoke the spectre of the lockfile discussions) I would expect that when we get standardised lockfiles, one key use case would be a tool that reads an environment and writes a portable, standard lockfile that can be used by any tool to recreate that environment. Will this still be possible with this proposal? How will the new data be translated by such a locker?

Broadly, this proposal seems fine to me, but I’d like to see a bit more covering how the data is expected to be used in a tool-agnostic way.

(On an unrelated note, do our existing standards allow installers to add their own tool-specific data files to the installation .dist-info directory? I can’t think of anything that explicitly prohibits it, but the lack of any form of namespacing in .dist-info probably makes doing so a bad idea. At some point, we really do need to assign .dist-info/tool/<tool name> for tool-specific use and reserve all other names for future standards - but that’s a separate PEP).
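For concreteness, a layout under such a hypothetical scheme might look like this (purely illustrative; nothing like it is standardised today):

pkg-1.0.dist-info/
    METADATA
    RECORD
    direct_url.json
    tool/
        sometool/
            data.json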


  1. I hope I haven’t doomed this proposal now by linking it to locking :rofl:

It should be (whether that’s with an update to PEP 751, or this idea lands before my PEP and I have to make an update). PEP 751 purposefully covers what direct_url.json records, and I don’t see this being any different. Basically, packages.vcs would need some archive-path or similar key to represent that the VCS checkout isn’t a source tree itself but a delivery mechanism for an archive, or packages.wheels, packages.sdists, and packages.archive would need to be able to specify that the file is coming from a VCS. Alternatively, I could loosen the wording around packages.archive to not cover only source trees, and pull the VCS details in there. So it’s definitely representable; it’s just a question of what the preferred way is.
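A sketch of the first option, with a hypothetical archive-path key (the key name, URL, and hash are all illustrative, not part of PEP 751 as written):

[[packages]]
name = "prebuilt"
version = "0.1.0"

[packages.vcs]
type = "git"
url = "https://github.com/example/repo"
commit-id = "0123456789abcdef0123456789abcdef01234567"
archive-path = "vendor/prebuilt-0.1.0-py3-none-any.whl"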


I would expect pip freeze to fail, since it can’t represent the file-in-git case in its format. I think that is fine, because otherwise we’d either have to forbid installations that pip doesn’t support, or write incorrect direct_url.json information and have pip freeze capture something that doesn’t match the installed state.

I’d expect this specific use case to be supported by a lockfile standard (with the representation Brett described). For freezing from an installed environment, the locker needs to support all the features that the installer used. I see the main use case in tools that go directly from requirement specification to lockfile without an installation step in between; otherwise it’s hard to ensure that installer and locker match.

Sorry if I wasn’t clear. I was focusing too much on pip freeze because that’s the original motivation for direct_url.json. I should probably have asked the broader question of what this proposal means for pip.

Pip is not only a consumer of direct_url.json (via pip freeze) but also a writer of that file (when we install a wheel). Both of these cases need to be considered.

I’m not particularly comfortable with breaking the expectation users have that pip freeze gives a file that can be installed (with pip) to reproduce an environment. It’s not a hard requirement, as the freeze format is neither a standard nor a full lockfile, but I thought it was a long-standing expectation of our users. You’re saying that people can, right now, use uv, pdm, and hatch in a way that direct_url.json doesn’t correctly support, suggesting that that expectation doesn’t hold. I’m trying to understand how often that happens right now, and what failure users would see at the moment. I’m not particularly happy with the suggestion that pip freeze should fail in the future if it encounters this new data, but to evaluate my position properly, I need to know what the alternative is - i.e., what pip users see now.

Beyond that, pip writes direct_url.json. Specifically, this code is how pip currently handles direct_url.json, and I want to understand how it would change. I want to know what the wheels that uv, pdm, and hatch generate for this situation look like, so that I can understand the input that an installer sees. Because to be perfectly blunt, I don’t understand how a standards-conforming installer would know that it needed to write this new data.

I can only speak for what uv is doing, not for pdm and hatch; I pinged them too because they are also extending PEP 621 with relative path support, so I expect they have similar problems.

I’ve made an example repo: konstin/project-in-git on GitHub. In this repository, the root project project-in-git uses the wheel for prebuilt from the vendor directory. This has to happen through a non-PEP 621 mechanism, since PEP 621 doesn’t support relative paths (the same applies to editable installs, but I’ll keep it to an MRE here):

[tool.uv.sources]
prebuilt = { path = "vendor/prebuilt-0.1.0-py3-none-any.whl" }

When using this git repository in another project, usercode, we want to read this file from the git repo just as if we were in the repo itself, and not go to PyPI for prebuilt. I’m aware that this is a uv-specific extension, but we currently don’t have a way to record this in the [project] table.

uv add git+https://github.com/konstin/project-in-git
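This records the dependency and its git source in usercode’s pyproject.toml, roughly as in the following excerpt (a sketch; uv’s exact output may differ):

[project]
dependencies = ["project-in-git"]

[tool.uv.sources]
project-in-git = { git = "https://github.com/konstin/project-in-git" }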

This puts us in a dilemma when writing the venv for usercode: we can’t record prebuilt in direct_url.json, since there’s no schema-compliant object to describe it. We currently use direct_url.json to determine whether a package installed in the venv is fresh: if its contents match our resolved requirement, we keep it; otherwise we remove it and install the correct package (I expect it will be the same situation for PEP 751 installers that want to determine what they need to change in the venv). Without support in direct_url.json, we would need to record prebuilt’s provenance out of band for uv sync, breaking other tools’ view of that provenance in the process.

While I see the motivation of direct_url.json for pip freeze, I’ve seen its usage shift towards identifying whether the installed package version matches a resolved requirement in a cross-tool, interoperable way, and I’m trying to improve its coverage to avoid non-interoperable extensions [1].


  1. The only non-standard file we currently write to the venv in uv is our .lock multi-process advisory lock, to avoid parallel writes.

Thanks for providing an example. It’s much clearer to me what you’re saying now.

But what I don’t understand is how I’d install this project using pip. I think your answer is that you can’t - which is where I have a problem, because if a project isn’t installable using a standards-compliant installer, then I don’t think we should be discussing how the standards record the installation of that project.

I tried cloning your project and running pip wheel ., and I got the following error:

❯ pip wheel .
Processing c:\users\gustav\appdata\local\temp\tmp-1\project-in-git
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
INFO: pip is looking at multiple versions of project-in-git to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement prebuilt (from project-in-git) (from versions: none)
ERROR: No matching distribution found for prebuilt

The same if I try to do pip wheel <sdist> on the sdist produced by uv build.

So I’m sorry, but I really don’t see this as something we should support. Our packaging standards are explicitly about avoiding tool lock-in, and source distributions (or project source trees) that can only be built with a specific tool are something that IMO we should be strongly discouraging. How would someone publish this sdist on PyPI, for example?

If we want this type of workflow (and I completely understand that it’s a reasonable suggestion) then we should be looking at how we capture the sort of dependency on a vendored wheel that your example uses in the metadata standards first. Once such projects are buildable using any standards-compliant tool, then we can look at whether we need any changes to direct_url.json (and at that point, we may well not, as the sdist’s metadata might handle it just fine on its own).

If you really don’t want to standardise this workflow, then I think what you really need is a way of storing tool-specific metadata in the installed project’s .dist-info directory. That’s been discussed on a number of occasions, but no-one has ever been motivated to take it any further. But I’d be concerned if your motivation for doing so was simply to avoid facing the issue that uv seems capable of producing unpublishable sdists, without warning the user of that fact.

I would be very interested to know if the support in hatch and uv has the same problem, of ending up with a sdist that cannot be built with pip or build. @frostming @ofek - is this purely a uv issue, or do you have the same problem?

Yes, this is something that should be supported. See this issue on packaging’s repo. Hatch had to come up with a way to insert the project directory in order to create a full path, because relative paths are not supported by PEP 508.
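As an illustration, with Hatch’s documented {root:uri} context field and direct references explicitly enabled, this looks roughly like the following (a sketch; package names are taken from the example above):

[project]
dependencies = ["prebuilt @ {root:uri}/vendor/prebuilt-0.1.0-py3-none-any.whl"]

[tool.hatch.metadata]
allow-direct-references = true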

Hatch does not have that problem, exactly because it uses pip (by default) for dependency resolution, so in that case all problems are inherited.

edit: it goes without saying, but to be clear, this use case is for non-libraries, i.e. not projects on PyPI; uploading a package where a dependency uses a direct reference produces an HTTP error

edit 2: Hatch has an option to allow such non-compliance, but in hindsight I probably shouldn’t have added that, and instead should have forced users to define environment dependencies rather than broadly allowing other environment managers for that specific case