Substituting file links with VCS links in the Simple Repository API

Hi! We are using a local index based off PEP 503 to serve our local packages.

It is not mentioned in PEP 503 (so it’s not officially part of the spec), but replacing links to dist files in the index with VCS links works if the #egg fragment is used to specify package name and version.

i.e. replacing:

<!DOCTYPE html>
<html>
  <body>
    <a href="https://example.org/group/package-1.0.0.tar.gz">package-1.0.0.tar.gz</a>
  </body>
</html>

with:

<!DOCTYPE html>
<html>
  <body>
    <a href="git+https://git@example.org/group/package@1.0.0#egg=package-1.0.0">package@1.0.0</a>
  </body>
</html>

This allows us to install source-based distributions directly from our git repos. However, since this is not part of PEP 503, I am wondering whether it works only by chance, or is even something of a hack. Indeed, the PEP states:

The href attribute MUST be a URL that links to the location of the file for download, and the text of the anchor tag MUST match the final path component (the filename) of the URL.
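Here is a minimal sketch (Python, standard library only) that checks the two anchors above against that rule. The VCS link technically satisfies the "text matches the final path component" wording. However, that component is not a distribution filename. The `DIST_SUFFIXES` tuple and the helper names are assumptions for illustration, and PEP 503 does not list such suffixes.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit

# Assumption: suffixes treated as "distribution files" (sdists and wheels).
# PEP 503 itself does not list them.
DIST_SUFFIXES = (".whl", ".tar.gz", ".zip")


class AnchorCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag on a simple index page."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def check_anchor(href, text):
    # urlsplit() separates the #fragment, so '#egg=...' does not affect the filename.
    filename = unquote(urlsplit(href).path.rsplit("/", 1)[-1])
    return {
        "text_matches_filename": filename == text,
        "looks_like_distribution": filename.endswith(DIST_SUFFIXES),
    }


page = """
<a href="https://example.org/group/package-1.0.0.tar.gz">package-1.0.0.tar.gz</a>
<a href="git+https://git@example.org/group/package@1.0.0#egg=package-1.0.0">package@1.0.0</a>
"""
parser = AnchorCollector()
parser.feed(page)
parser.close()
for href, text in parser.anchors:
    print(text, check_anchor(href, text))
```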

One issue is that egg distributions are now deprecated (The Internal Structure of Python Eggs - setuptools 74.0.0.post20240830 documentation), so relying on the #egg fragment seems like a bad idea, even though pip currently supports it (pip install - pip documentation v24.2). Or are there plans to keep using egg_info?

Therefore:

  1. Is it conceivable to officially allow the index to point to a VCS?
  2. Is there a replacement for the #egg fragment? i.e. what can I use to specify package name and version when linking to VCS? Looking at pip’s source code, there doesn’t seem to be.
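To make question 2 concrete: in this setup, the #egg fragment only contributes a name and a version, which a client has to recover from the link. Below is a naive sketch of that extraction. It assumes the version is whatever follows the last hyphen. pip's own handling of the fragment differs in the details, and `parse_egg_fragment` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def parse_egg_fragment(href):
    """Return (name, version) from a '#egg=name-version' fragment, or None.

    Naive sketch: splits on the LAST hyphen, so it assumes the version
    itself contains no hyphen.
    """
    fragment = urlsplit(href).fragment
    for part in fragment.split("&"):
        key, _, value = part.partition("=")
        if key == "egg" and value:
            name, sep, version = value.rpartition("-")
            if sep and name:
                return name, version
            return value, None  # e.g. '#egg=package' carries no version
    return None


print(parse_egg_fragment(
    "git+https://git@example.org/group/package@1.0.0#egg=package-1.0.0"))
```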

Naively, I tried having our index set the anchor’s href value to a PEP 508 URL:

package@git+https://git@example.org/group/package@1.0.0

but this fails with pip complaining that the trailing .0 is not a supported file format.

So while I can run this from the CLI:
pip install package@git+https://git@example.org/group/package@1.0.0
I cannot use the same link in my index, since pip expects a file if no #egg fragment is present.

Conversely, if an #egg fragment is present, pip doesn’t check if the link is to a file, and retains it as a candidate for installation.
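The failure with the PEP 508 string can be reproduced with the standard library alone. `@` is not a legal character in a URL scheme, so the whole direct reference parses as a scheme-less relative reference. An installer that treats its last path component as a filename then sees `.0` as the extension. This only imitates the check described above and is not pip's code.

```python
import posixpath
from urllib.parse import urlsplit

direct_ref = "package@git+https://git@example.org/group/package@1.0.0"
vcs_link = "git+https://git@example.org/group/package@1.0.0#egg=package-1.0.0"

# '@' is not a valid scheme character, so no scheme is recognised at all.
print(repr(urlsplit(direct_ref).scheme))  # ''
print(urlsplit(vcs_link).scheme)          # git+https

# Treating the last path component as a filename yields a '.0' "extension".
filename = posixpath.basename(urlsplit(direct_ref).path)
print(filename, posixpath.splitext(filename)[1])  # package@1.0.0 .0
```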

You could propose a PEP to update the index spec to allow for it.

@brettcannon Thank you for your response. This topic has not gathered much interest. I may make a proposal but it seems a little intimidating to me.


I am surprised this works. Have you tried other installers besides pip?

I am not sure I like the idea. In principle this seems helpful, of course, but I am not sure it belongs in the Simple Repository API; it feels a bit against the spirit of this API.

Just a random opinion, don’t let this discourage you from pushing the idea further.


I’m similarly surprised this works, and I would bet a nominal amount of money that it’s an implementation quirk 🙂

Egg uploads to PyPI have been disabled since PEP 715 last year, and there’s been some intermittent discussion about killing off #egg fragment support entirely, since it’s unstandardized and functionally overlaps with the standard PEP 508 syntax (see the thread “Killing off the `egg=` fragment once and for all?”).

I’m absolutely certain that if it works, it’s an implementation quirk. The simple API spec says

The href attribute MUST be a URL that links to the location of the file for download

While it doesn’t give any details, I’d be perfectly comfortable assuming that “links…for download” implies something that urllib.request.urlopen or requests.get can fetch a file from - to the extent that I’d support that as a proposed clarification to the spec. We certainly can’t assume that all simple repository API clients are able to handle git+https URLs…
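This reading of the spec can be illustrated directly. urllib has no handler for `git+https`, so it rejects such a link before any network traffic happens. The `PLAIN_SCHEMES` set is an assumption about what a generic download client handles, not something the spec lists.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit

# Assumption: schemes a generic "download the file" client can rely on.
PLAIN_SCHEMES = {"http", "https", "file"}


def is_plain_download_url(href):
    return urlsplit(href).scheme in PLAIN_SCHEMES


vcs_href = "git+https://git@example.org/group/package@1.0.0#egg=package-1.0.0"
print(is_plain_download_url("https://example.org/group/package-1.0.0.tar.gz"))  # True
print(is_plain_download_url(vcs_href))  # False

# urllib finds no opener for 'git+https' and fails without connecting anywhere.
try:
    urllib.request.urlopen(vcs_href, timeout=5)
    reason = None
except urllib.error.URLError as exc:
    reason = exc.reason
print(reason)  # unknown url type: git+https
```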


Thank you for all your responses.

After reading PEP 503 and some of pip’s source code, my impression is that it’s not by design that it works.

@sinoroc

I am surprised this works. Have you tried other installers besides pip?

It doesn’t work with uv. uv does not support the #egg fragment, and removing the fragment from the URL also removes the package name and version information, which neither pip nor uv can then find.

It’s certainly useful for us at my organization. We use it with an internal self-hosted Gitlab instance to serve our packages via a package index. I’ve begun to use Gitlab’s package registry feature to serve wheels and do it the proper way.

I’m hesitant to formally propose it as I don’t know just how many people would actually find it useful.

If deployed on the web, such VCS links could, I suppose, prove to be a security risk. I don’t know if such links could end up on pypi or not.

I’ve found a few GitHub issues/discussions that are somewhat related, although none of them talk about installing packages via a package index. They only consider installing a package by explicitly passing the URL to pip.

@pf_moore

something that urllib.request.urlopen or requests.get can fetch a file from - to the extent that I’d support that as a proposed clarification to the spec.

Indeed, if I had seen such a clarification in the PEP, I would have had my questions at least partially answered from the start.

I think that in principle it could be standardized and package index server implementations could choose to offer it or not. Seems obvious that PyPI would not, but as you said it might be great for private internal servers. If I am not mistaken there is precedent for such a situation.

You’d also need to standardise the meaning of the git+https URL scheme (as well as hg+https and bzr+https, if you want to support all the schemes pip supports) plus the meaning of the #egg fragment. Furthermore, you’d have to make support of these optional, as you cannot require every client that wants to fetch distribution files to include VCS support. So you need to define the required behaviour if a client that doesn’t support such URLs encounters one when scanning an index.
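One possible answer to the "client without VCS support" question is to skip such links with a warning and keep looking for ordinary files. The sketch below shows that policy. The function name, the scheme sets, and the skip-and-warn behaviour are all hypothetical, and a spec could just as well require an error instead.

```python
import logging
from urllib.parse import urlsplit

log = logging.getLogger("index-client")

# Hypothetical client capabilities: plain downloads only, no VCS support.
SUPPORTED_SCHEMES = {"http", "https", "file"}
VCS_PREFIXES = ("git+", "hg+", "bzr+")


def usable_candidates(hrefs, supported=SUPPORTED_SCHEMES):
    """Keep links this client can fetch; skip (and log) everything else."""
    kept = []
    for href in hrefs:
        scheme = urlsplit(href).scheme
        if scheme in supported:
            kept.append(href)
        elif scheme.startswith(VCS_PREFIXES):
            log.warning("skipping VCS link (unsupported by this client): %s", href)
        else:
            log.warning("skipping link with unknown scheme %r: %s", scheme, href)
    return kept


links = [
    "https://example.org/group/package-1.0.0.tar.gz",
    "git+https://git@example.org/group/package@1.0.0#egg=package-1.0.0",
]
print(usable_candidates(links))
```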

You’d also need to mandate that the file referenced by the URL won’t change - which will be hard to enforce, and in general isn’t even true of VCS URLs - otherwise clients that cache files fetched from the index will be broken.

It’s a lot of work, and I’m not sure the benefits would justify the effort. But if someone wants to put together a PEP and propose it, I guess it could happen.


You’d also need to standardise the meaning of the git+https URL scheme (as well as hg+https and bzr+https , if you want to support all the schemes pip supports)

Isn’t such a scheme already standardized in PEP 508? I assume that most installers, if not all, already support this scheme. PEP 508 states that the URL form for specifying dependencies is taken from PEP 440. In turn, PEP 440 says:

This PEP is a historical document. The up-to-date, canonical spec, Version specifiers, is maintained on the PyPA specs page.

Looking at the PyPA spec page, we can find this section: Direct references (Version specifiers - Python Packaging User Guide)

And in there, there is a clear statement:

Public index servers SHOULD NOT allow the use of direct references in uploaded distributions. Direct references are intended as a tool for software integrators rather than publishers.

which gives the impression that direct references could be used for private indexes.

So I agree that if such a feature existed, it would have to be optional: not enabled on public indexes, but possibly on private ones.

You’d also need to mandate that the file referenced by the URL won’t change - which will be hard to enforce, and in general isn’t even true of VCS URLs - otherwise clients that cache files fetched from the index will be broken.

This is a good point. Not sure how to reconcile this with VCSs. Maybe by enforcing the @<commit-hash> or @<tag> notation in VCS URLs, though that assumes nobody rewrites the repo’s commits or tags. For a private index for internal use, it shouldn’t be an issue.
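The commit-hash rule could be checked roughly as follows. The check treats a VCS link as immutable only if the revision after the last `@` in its path is a full commit hash. Tags fail the check because they can be moved or re-created. The 40/64-hex pattern (SHA-1 and SHA-256 object names in git) and the helper names are assumptions for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumption: a full SHA-1 (40 hex) or SHA-256 (64 hex) git object name.
FULL_COMMIT = re.compile(r"[0-9a-f]{40}|[0-9a-f]{64}")


def vcs_ref(href):
    """Return the revision after the last '@' in the URL *path*, or None.

    Looking only at the path keeps the 'git@' user info in the netloc
    from being mistaken for a revision separator.
    """
    path = urlsplit(href).path
    _, sep, ref = path.rpartition("@")
    return ref if sep else None


def is_pinned_to_commit(href):
    ref = vcs_ref(href)
    return ref is not None and FULL_COMMIT.fullmatch(ref) is not None


print(is_pinned_to_commit("git+https://git@example.org/group/package@1.0.0"))  # False
```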

At least, if your tags are also your versions, using PEP 508 / PEP 440-style URLs would allow for specifying both package name and version without using the #egg fragment.

'SomeProject@git+https://git.repo/some_pkg.git@1.3.1'
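The name and the tag can be read back out of such a string without #egg, as the minimal sketch below shows. It splits on the first `@`, so it ignores the extras and environment markers that full PEP 508 parsing (e.g. the packaging library) would handle. `split_direct_reference` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def split_direct_reference(req):
    """Split 'name @ url' into (name, url, vcs_ref); vcs_ref may be None."""
    name, sep, url = req.partition("@")
    if not sep:
        raise ValueError(f"not a direct reference: {req!r}")
    name, url = name.strip(), url.strip()
    # Look for the revision only in the URL path, not in 'user@host'.
    _, at, ref = urlsplit(url).path.rpartition("@")
    return name, url, (ref if at else None)


print(split_direct_reference("SomeProject@git+https://git.repo/some_pkg.git@1.3.1"))
# ('SomeProject', 'git+https://git.repo/some_pkg.git@1.3.1', '1.3.1')
```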

It’s mentioned, but its behaviour isn’t standardised. If I understand what you’re suggesting, you want to tighten up what’s allowed and required.

Direct references are a different matter. The important point you need to consider is that a direct reference is not a URL: the leading “name @” is an essential part of a direct reference. You can’t serve a direct reference as a link target in a webpage.

Anyway, these are all questions you’d need to sort out if you are interested in writing a PEP. It’s probably not worth getting into this much detail unless you are going to do so.

Thanks for the clarification. Yes, that wouldn’t work as part of, or an addendum to, PEP 503 with HTML link targets, but one can imagine other ways an HTML index could work, for example using data- attributes, or storing the information clients need in tags other than <a>.

But you are right, it’s just an exercise in thought at this point.

@leducvin

Out of curiosity, can you give a bit more details on how you do this currently?

Who or what generates the PEP 503 pages? Did you modify an existing tool? Do you generate the HTML directly? You mentioned something about GitLab, right?

Sure. We wrote a small flask app that uses python-gitlab to fetch repo data from Gitlab’s API.

We configured pip to point to our local index URL instead of PyPI. When pip requests somepackage==1.0.0 from our flask app, the app generates the list of links for that project on the fly:

<a href="git+https://git@example.org/group/somepackage@1.0.0#egg=somepackage-1.0.0">somepackage@1.0.0</a>

As I said, one link for each version/tag of the repo. If the tag 1.0.0 exists in the repo, pip will find it and install it from VCS.

If there is no such project in our Gitlab instance, the request is redirected to pypi.org/simple.
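The following is a stdlib-only sketch of the per-project logic described above. It is not the actual flask/python-gitlab app. `LOCAL_TAGS` stands in for the GitLab API lookup, and `GIT_BASE`, `PYPI_SIMPLE` and the function names are hypothetical. The sketch also assumes each repository path equals the PEP 503-normalized project name.

```python
import html
import re

PYPI_SIMPLE = "https://pypi.org/simple"
GIT_BASE = "git+https://git@example.org/group"  # assumption: one GitLab group

# Hypothetical stand-in for the GitLab API lookup: project name -> tags.
LOCAL_TAGS = {"somepackage": ["1.0.0", "1.1.0"]}


def normalize(name):
    """PEP 503 project-name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


def simple_project_response(project):
    """Return ('html', page) for a local project, else ('redirect', url)."""
    name = normalize(project)
    tags = LOCAL_TAGS.get(name)
    if tags is None:
        return "redirect", f"{PYPI_SIMPLE}/{name}/"
    links = []
    for tag in tags:  # one link per tag, as in the setup described above
        href = f"{GIT_BASE}/{name}@{tag}#egg={name}-{tag}"
        links.append(f'    <a href="{html.escape(href)}">{html.escape(name)}@{html.escape(tag)}</a>')
    body = "\n".join(links)
    return "html", f"<!DOCTYPE html>\n<html>\n  <body>\n{body}\n  </body>\n</html>\n"


print(simple_project_response("SomePackage")[1])
print(simple_project_response("requests"))
```

In a real web app, the redirect branch would become an HTTP 3xx response and the html branch the response body.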

It’s been pretty convenient, as we can do pip install somepackage, and if that package depends on anything from PyPI, pip is still able to install the dependencies. It also freed us from building wheels or sdist archives for our local package ecosystem.

I did run into name collisions a couple of times, where I thought our local package was being installed but a remote package from PyPI was installed instead. This happened whenever I had forgotten to set the index URL properly in special environments (mostly CI, Docker).
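One way to catch that misconfiguration early in CI or a Docker build is a guard that checks pip's `PIP_INDEX_URL` environment variable before installing. pip reads its options from `PIP_<OPTION>` variables. Below is a minimal sketch. `INTERNAL_INDEX` is a hypothetical URL, and the sketch does not inspect pip.conf or an `--index-url` given on the command line.

```python
import os

# Hypothetical internal index URL; replace with your own.
INTERNAL_INDEX = "https://pypi.internal.example.org/simple/"


def index_url_problem(environ=os.environ):
    """Return a message if PIP_INDEX_URL is missing or wrong, else None."""
    configured = environ.get("PIP_INDEX_URL")
    if not configured:
        return "PIP_INDEX_URL is not set: pip may fall back to its default index (PyPI)"
    if configured.rstrip("/") != INTERNAL_INDEX.rstrip("/"):
        return f"PIP_INDEX_URL points elsewhere: {configured}"
    return None
```

In a CI script, calling `sys.exit(problem)` whenever `index_url_problem()` returns a message stops the job before pip runs.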


Nice, that seems like a reasonable setup. :)

It brings simpleindex back to mind. I wonder if a similar result could be achieved with a “custom route type” for VCS packages. But I guess that would require all your internal packages to have names with some recognizable pattern (the same prefix, for example, which might also help a bit with preventing name collisions).


Interesting, looks like we could have used simpleindex. Oh well, not the first time we are guilty of reinventing the wheel. No pun intended.
