What is the correct interpretation of path-based PEP 508 URI_reference?

In PEP 508, the grammar allows for the URI_reference part of a url_req to be a path, like

pip @ localbuilds/pip-1.3.1-py33-none-any.whl
pip @ /localbuilds/pip-1.3.1-py33-none-any.whl
pip @ ./localbuilds/pip-1.3.1-py33-none-any.whl
pip @ ./localbuilds/pip

Currently pip doesn’t support this (see pypa/pip#6658), possibly because packaging doesn’t support this (see pypa/packaging#120). In order for pip to support this, we need to decide how to interpret a path given in the URL reference.

I see two possibilities:

  1. We assume this is always a filesystem path and interpret it relative to the parent directory of the unpacked filesystem resource we consider as “containing” the direct reference
  2. We interpret this relative to the input for the package which has the given requirement

Some examples:

  1. pip install ./pkg, where ./pkg contains a setup.py and which has an install_requires=['dep @ ./dep']
    1. pip installs ./pkg/dep (a directory containing a setup.py)
    2. same as above
  2. pip install ./pkg-0.1.0.tar.gz, where ./pkg-0.1.0.tar.gz is an sdist containing a setup.py which has an install_requires=['dep @ ./dep-0.1.0.tar.gz']
    1. pip installs <req-build-tmpdir>/dep-0.1.0.tar.gz, where <req-build-tmpdir> is the temporary path into which pkg-0.1.0.tar.gz was extracted
    2. pip installs ./dep-0.1.0.tar.gz
  3. pip install "pkg @ git+https://github.com/user/pkg.git@abcd1234", which refers to a git repository containing a top-level setup.py, which has an install_requires=['dep @ ./dep']
    1. pip installs <req-build-tmpdir>/dep, where <req-build-tmpdir> is the temporary path into which the git repository was cloned
    2. pip installs git+https://github.com/user/dep
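To make the two readings concrete, here is a minimal sketch of example 2, using hypothetical paths (the <req-build-tmpdir> value is invented for illustration):

```python
import os.path

ref = "./dep-0.1.0.tar.gz"

# Option 1: resolve against the directory into which the sdist was
# extracted (hypothetical temporary path).
req_build_tmpdir = "/tmp/req-build-abc123"
option_1 = os.path.normpath(os.path.join(req_build_tmpdir, ref))

# Option 2: resolve against the location of the original input,
# i.e. the directory containing pkg-0.1.0.tar.gz.
input_dir = "."
option_2 = os.path.normpath(os.path.join(input_dir, ref))

print(option_1)  # → /tmp/req-build-abc123/dep-0.1.0.tar.gz
print(option_2)  # → dep-0.1.0.tar.gz
```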

Another possibility is that we pick what “makes sense” for the use case, but that may not apply across the board and leads to packages that may only work with tools that interpret these references the same way.

I didn’t see anything that clarified this on the distutils-sig threads I could find on the subject here and here. PEP 440 seems to leave a lot of decisions for tool developers.

I vaguely recall there was a discussion somewhere (in Poetry’s tracker?) about how tools should infer the version from a URL specifier, but I can’t find it right now. @ncoghlan do you remember?

I’ve updated the examples in case they were not clear. The issue is not with version (which I would assume could be given like dep==1.0 @ ...), but with finding dependencies. Sorry if I’m misunderstanding!

AFAIK this is not possible. PEP 508 gives the following rules:

name          = identifier
extras_list   = identifier (wsp* ',' wsp* identifier)*
extras        = '[' wsp* extras_list? wsp* ']'
urlspec       = '@' wsp* <URI_reference>
url_req       = name wsp* extras? wsp* urlspec wsp+ quoted_marker?

There’s no place for version specifiers, which is what started that previous discussion.
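For illustration, here is a rough and deliberately simplified regex sketch of the url_req production above (extras and markers omitted, and the URI part loosened to any non-whitespace run); it shows that a version specifier between the name and the '@' has nowhere to go:

```python
import re

# Simplified sketch of PEP 508's url_req; the real URI_reference
# follows RFC 3986, which \S+ only approximates.
IDENT = r"[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?"
URL_REQ = re.compile(rf"^(?P<name>{IDENT})\s*@\s*(?P<url>\S+)$")

m = URL_REQ.match("pip @ ./localbuilds/pip")
print(m.group("name"), m.group("url"))  # → pip ./localbuilds/pip

# 'dep==1.0 @ ...' is not a valid url_req: there is no slot
# for a version specifier.
print(URL_REQ.match("dep==1.0 @ ./dep-1.0.tar.gz"))  # → None
```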


Indeed this is different from the problem you mentioned. I should have read the post more carefully, sorry.

IMO the natural choice would be to interpret it as relative to the URL of the parent artifact, i.e. your option 2. This would make deploying artifacts without an index more natural.


Thanks for pointing this out!

I agree that for the index and local artifact use case this makes the most sense. Any build system that generates (and uploads) one package containing multiple archives should just as well be able to generate and upload multiple separate archives.

For the VCS-originating use case the right answer is less obvious to me.

With option 1 we would be able to express dependencies within a single repository succinctly in a way that should work whether the VCS repo is local or remote. The alternative is that we specify the full VCS URL with the relevant subdirectory in the fragment, but this would not work locally.

With option 2 we would be able to refer to sibling repositories independent of the actual remote VCS host.

As an alternative we can just say there’s no natural way to resolve relative refs from VCS URL-originating distributions and reject them if encountered, leaving the tying-together of multiple projects to higher-level tools until there’s a clear winner.

Are there security implications if we allow relative URLs to navigate up past the root of the parent artifact?

Another interesting topic will be the behavior of pip list and pip freeze with relative URLs. I suppose it should make them absolute first?

If locking dependencies with specific hashes, which I assume security-conscious deployments are doing, then only artifacts matching the hashes would be installed. Does that mitigate the concern?

Not just the behavior of pip list and pip freeze, but what should be stored in the direct_url.json in the first place. I made a comment here with specific related questions.

TBH, the whole idea of installing from VCS has never made much sense to me, personally :stuck_out_tongue: If I were to design pip from scratch today, I would never involve it in VCS operations, and instead require the user to manually check out the code and install from that. This is not an option now, alas. In the same sense though, I would have zero problems if we go with the “alternative” approach and simply reject relative paths in a VCS package.

If we must accept relative paths for VCS schemes, I would still go for option 2 since it is more consistent. And again, users can always checkout the repo themselves if they want to refer to packages in the same repo with relative paths.


The manual equivalent is to download pkg-0.1.0.tar.gz, unpack it, change into the directory, and run pip install . In that case (which brings us back to example 1) install_requires=['dep @ ./dep-0.1.0.tar.gz'] refers to dep-0.1.0.tar.gz in the current directory, i.e. inside pkg-0.1.0.tar.gz.

So that would indicate that option 1. is actually more natural, and would produce the same behavior when downloading and unpacking the artifact manually or when providing the artifact URL to pip install.

To get option 2’s behavior, the dependency would instead need to be written as dep @ ../dep-0.1.0.tar.gz (if that is allowed at all).

Similarly when doing pip install ./pkg-0.1.0.tar.gz#subdirectory=subdir, relative paths in install_requires would be relative to subdir. So it would mean that (if allowed) relative paths must be considered relative to the location of setup.py / pyproject.toml. And I would disallow escaping from the artifact / VCS root.

Not really. Relative URIs are resolved based on directories, not path components (unlike Python’s os.path.abspath), so by extracting the file into a directory you’re changing the semantic meaning of the URI.

Given a relative path ./bar, on a page /foo/ it is resolved to /foo/bar, but on a page /foo (notice the lack of trailing slash) it is resolved to /bar. So ./dep-0.1.0.tar.gz should resolve to a file beside pkg-0.1.0.tar.gz, not inside it. Which is why option 2 is correct IMO.
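This can be checked with urllib.parse.urljoin, which implements RFC 3986 reference resolution (example.com is a placeholder host):

```python
from urllib.parse import urljoin

# The base's last path segment is dropped unless it ends in a slash.
print(urljoin("https://example.com/foo/", "./bar"))  # → https://example.com/foo/bar
print(urljoin("https://example.com/foo", "./bar"))   # → https://example.com/bar

# So, resolved against the artifact's own URL, ./dep-0.1.0.tar.gz
# lands beside the artifact, not inside it:
print(urljoin("https://example.com/dl/pkg-0.1.0.tar.gz", "./dep-0.1.0.tar.gz"))
# → https://example.com/dl/dep-0.1.0.tar.gz
```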

It depends what you consider the base URI is in the first place. Let’s talk about install_requires first:

  • option 1 considers the base URI is the file where the requirement is written, e.g. setup.py, setup.cfg or pyproject.toml
  • option 2 considers the base URI is the artifact in which the project is packaged

My point is that option 1 is more natural for developers. I argue it is also more predictable, since the relative URI then points to something local to the project. It works the same across different ways to install the project:

  • pip install . from within the project directory
  • pip install git+https://.../pkg.git
  • pip install https://.../pkg-1.0.0.tgz

With option 2, the exact meaning of install_requires depends on where the artifact has been located, and the three above ways to install will likely break or give different results.
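To illustrate that last point, here is a small sketch (with invented hosts and filenames) of how, under option 2, the very same metadata resolves differently depending on where the artifact happens to live:

```python
from urllib.parse import urljoin

ref = "./dep-0.1.0.tar.gz"  # as written in install_requires

# Hypothetical locations of the same pkg artifact:
for base in (
    "https://host-a.example/downloads/pkg-1.0.0.tar.gz",
    "https://host-b.example/pkg-1.0.0.tar.gz",
):
    print(urljoin(base, ref))
# → https://host-a.example/downloads/dep-0.1.0.tar.gz
# → https://host-b.example/dep-0.1.0.tar.gz
```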

The other way relative URLs may be used, is as top level requirements, i.e.

  • provided on the pip command line, in which case the base URI is the current working directory
  • provided in a requirements.txt, in which case the base URI is the requirements.txt file

Using relative URLs in top level requirements is necessary and the behavior does not seem controversial.
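For the requirements.txt case, resolution would presumably look like this (paths are hypothetical):

```python
import os.path

# The base is the requirements file's own directory.
requirements_file = "/home/user/project/requirements.txt"
ref = "./vendor/dep-0.1.0.tar.gz"

base = os.path.dirname(requirements_file)
print(os.path.normpath(os.path.join(base, ref)))
# → /home/user/project/vendor/dep-0.1.0.tar.gz
```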

Are there compelling use cases to allow relative URLs in install_requires?

I think we should go down the route of option 1, since it avoids relative references to out-of-tree dependencies, which makes a package’s behavior easier to reason about.

IMO an sdist should not become uninstallable just because I change the directory it is kept in, or move some other file.

What is the use case for this type of URI reference? Unless I’m missing something, this discussion is purely theoretical. It would be easier to judge the options if we had at least some real world examples where this would be used…

Judging from the issues requesting this enhancement, most people are looking for a way to put multiple packages (either as source trees or vendored wheels/sdists) in a project, and be able to run a single pip install . in the project root to set them all up.

OK. I’m not actually sure that’s something that I even think we should support in pip, TBH. Why not just use --find-links or something? But as long as we do have some real-world use cases, that’s the main thing, so thanks for the information.

I would support option 1 as well.

At the moment I’m very interested in this topic, as we face a problem with Poetry when a dependency that should be installed itself has a directory dependency (https://github.com/python-poetry/poetry/issues/1689). To fix this, we need to know what pip expects from get_requires_for_build_wheel / get_requires_for_build_sdist as a return value.

Thinking about this further, my opinion now leans toward disallowing relative paths in Install-Requires metadata.

The main reason is that such relative paths would only make sense when installing from an sdist (with sdists inside sdists) or VCS. Building a wheel with a correct Install-Requires would not be feasible in such a case.

From what I understand of the use cases, there are viable alternatives (such as --find-links or vendoring).

Ditto.