How to specify extra-index in a pyproject.toml for pip and pip-tools?

I’m currently trying to wrap my head around using pyproject.toml for a Python application. This application has a dependency that is not found on pypi.org. I’m using pip-tools for version pinning, and as far as I understand, this tool can also use pyproject.toml as the source for dependencies.

The only problem is that I can’t figure out how to specify an extra index URL that pip/pip-tools can use to download the dependency from our internal repository. I’ve searched, but it seems there is no answer to that. Is this just not possible right now?

Regards

Sebastian

2 Likes

The way to configure that with pip/pip-tools is to have it live in your pip configuration. This configuration is not project-specific. Configuration - pip documentation v23.0 has the details on how pip loads the configuration.

pip config set global.index-url foobar or pip config set global.extra-index-url foobar would be how you’d set this via the CLI.
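For a concrete sketch (the index URL below is just a placeholder for your internal repository):

```shell
# Placeholder URL standing in for an internal index
pip config set global.index-url https://pypi.example.internal/simple/
# or keep PyPI as the primary index and add yours as an extra one
pip config set global.extra-index-url https://pypi.example.internal/simple/

# show the effective configuration
pip config list
```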

1 Like

But this index is only relevant to this project, not globally. Also, users that install the project don’t necessarily have the index set globally.

Years ago, Pip and PyPI intentionally removed the ability to make a package automatically install a dependency from somewhere other than PyPI. This was done for security reasons, so that users have to intentionally and knowingly opt into installing packages from other places. What you’re trying to do is unsafe and the tools we have now make it impossible for very good reasons.

That’s not true – this isn’t inherently unsafe.

Yeah, this isn’t currently possible to specify via pip/pip-tools.

1 Like

Sorry, I probably used the wrong word. What I meant is that automatically installing packages from somewhere other than PyPI without the user’s consent or knowledge violates security expectations (at least it violates my security expectations). When the ability to have pip automatically recurse to packages hosted off-PyPI, and the corresponding ability to have projects on PyPI refer to external downloads, were removed, securing the package ecosystem was among the reasons cited.

5 Likes

This overlooks, for example, corporate environments, where CI systems simply do not have access to the internet (and thus pypi.org) because of their own very reasonable security constraints.

I do wish there was a standard way to specify this in pyproject.toml. PDM, for example, supports it directly, but Hatch only does so indirectly. I find the PDM way much more natural and would like its syntax to eventually be standardized.

Are you referring to Tool Settings - PDM in PDM?
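I.e. something roughly like this? (The source name and URL are placeholders, and I’m going from memory of the PDM docs, so treat it as a sketch.)

```toml
[[tool.pdm.source]]
name = "internal"
url = "https://pypi.example.internal/simple/"
```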

1 Like

Poetry also has the ability to specify this in pyproject.toml.
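From memory its syntax is along these lines (the source name and URL are placeholders, and the exact options vary between Poetry versions):

```toml
[[tool.poetry.source]]
name = "internal"
url = "https://pypi.example.internal/simple/"
```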

Yes

Great! So, we can standardize this now, right? :wink:

I feel like these things do not belong in the pyproject.toml file. This file is pushed to the shared source code repository, so it has to work for all developers. But it is not rare for some developers to want to use different indexes than other developers. If I recall correctly, there are, for example, cases of developers in some parts of the world who have to use different indexes (access to PyPI is not possible). So you cannot have a setting that satisfies all members of a team.

At the very least, if extra-index-url (or a similar setting) ends up in pyproject.toml, then I would recommend making it very easy to override with a user-local setting. Overriding via an environment variable is probably good enough, but not convenient. I guess some file that is not pushed to git would be best.
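For example (placeholder URL; pip maps its options to PIP_* environment variables):

```shell
# one-off override of the extra index for a single install (placeholder URL)
PIP_EXTRA_INDEX_URL=https://pypi.example.internal/simple/ pip install .

# or replace the primary index entirely for this invocation
PIP_INDEX_URL=https://pypi.example.internal/simple/ pip install .
```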

3 Likes

I’m not really a fan of putting repository information in pyproject.toml.

That is not package configuration and shouldn’t be shipped to a repository at all; packages do not define where to get themselves from, hard stop. It is configuration for the installer, and the person invoking the installer must have ultimate control over the repositories, not random packages that they’ve downloaded.

You could argue that if you’re say, invoking pip (or another installer) in a directory that has a pyproject.toml file in it, then pip could read the repositories from that specific file for configuring the repositories, but it should never read that information from packages it’s installing.

The fact that it’s only supportable when it exists as a file on your machine, at the top level, is why I’m not a fan of having it in pyproject.toml. I think it confuses people, because they’ll make sdists that have that in their pyproject.toml, which pip etc. should 100% ignore, but it’s not obvious to people why that is the case. This is already a problem with people confusing the difference between requirements.txt and setup.py/pyproject.toml, and I think it will just make the problem worse.

8 Likes

I think that’s a good point, and I’m mostly thinking about this from a controlled-environment point of view (i.e. all packages are internally developed, so we know where pip should be pointed). Such a facility could be documented as narrowly relevant for those use cases, and “public” packages shouldn’t use it. It’ll likely get abused though, and that does mean override files and/or pip options. Maybe it’s just better for the internal package use case to guarantee that there’s a pip.conf file in the right place with the appropriate settings.
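For the fully internal case, that guaranteed file can be as small as this (placeholder URL; the file’s location varies by platform, see pip’s configuration docs for the exact paths):

```ini
# pip.conf (pip.ini on Windows), placed in one of pip's standard config locations
[global]
index-url = https://pypi.example.internal/simple/
```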

A different approach would be some kind of hierarchical tower of configuration files, where things like this could fall back to, or be overridden by, a build-system or global file-system location. But then it gets complicated. So yeah, :thinking:

We already have a hierarchy of pip configuration files. Adding another “per project” layer isn’t technically hard, just complicated to manage :slightly_smiling_face: Or you could use a .env file (I don’t actually know how those work, as they aren’t a thing on Windows, but per-directory env variables, I believe?) to set pip config via environment variables.

We have plenty of options. And the pharmacy can supply the headache tablets :slightly_smiling_face:

1 Like

I think it’s a good idea to have a standard (read: not specific to pip) way to define this (plus other things), but pyproject.toml is the wrong place. A separate file feels more correct to me.

3 Likes

They’re not an OS thing, just a convention some tools/libraries use.

1 Like

It might be direnv that was meant. It sets and un-sets environment variables (such as PIP_EXTRA_INDEX_URL) automatically when cd-ing in and out of directories. It would require no change in pip (or pip-tools or whatever), but it does not seem to be usable on Windows (or at least not straightforward), because it is based on the shell.
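For instance, a minimal .envrc (placeholder URL) would just export the pip variable, and direnv scopes it to that directory:

```shell
# .envrc in the project root; direnv loads/unloads this when you cd in/out
# (requires a one-time `direnv allow` for the directory)
export PIP_EXTRA_INDEX_URL=https://pypi.example.internal/simple/
```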

On the other hand a dotenv approach seems like it would require some changes in pip (and others).

Yes, I was thinking of direnv. I don’t personally see much point in trying to support .env files in pip, as we have our own config mechanism, and it certainly wasn’t my intention to suggest that.

Some relevant context below without my opinions…

Fundamentally, the want for the ability to specify the index URL comes from the need to carefully control how “abstract” requirements (eg: urllib3) are converted to concrete ones (eg: https://files.pythonhosted.org/packages/fe/ca/466766e20b767ddb9b951202542310cba37ea5f2d792dae7589f1741af58/urllib3-1.26.14-py2.py3-none-any.whl). Typically, wanting to use an index server that’s not pypi.org or things in addition to pypi.org.

A thing that’s relevant is that workflow tools like Pipenv, Poetry and PDM are currently operating with a different model than pip is.

The whole situation with Pipfile and pipenv is [redacted opinion], but the want for a better format to specify dependencies to tooling, along with where to pull them from, is something we’ve known about for a while and haven’t made much progress on (in pip) over the last few years. Notably, Pipfile was intended to be requirements.txt 2.0, and how/why Pipenv went down the route that it did with that format is a separate conversation.

1 Like
  1. You imply that packages hosted on PyPI are secure. We all know that’s not the case. Not even close. There are countless examples of malicious packages. If anything, it gives people a false sense of security: it comes from PyPI, so it must be trusted, right? Well: no.
  2. AFAIK supplying another index is still supported in requirements.txt, and requirements.txt can be referenced dynamically from pyproject.toml. So it’s already possible (see the sketch after this list).
  3. The use case and demand are there, as you can clearly see in this discussion (arguments such as corporate repositories have already been named). If there is no built-in way to do it, people will just add .bat and Bash scripts to their projects (I’ve seen that quite often) that execute pip with alternative index locations. What’s the result? Safer projects? I doubt it. If you execute files, you’d better know what you’re doing. “We are all consenting adults here”, right?
  4. With integration into pyproject.toml and pip, a possible flow could look like this:
  • alternative index(es) are supplied in pyproject.toml
  • for every package, pip first checks the indexes it already knows and trusts
  • if the package is not found, the alternative indexes are iterated
  • for every alternative index, ask the user whether they want to trust that index and, if so, whether it should be added as a permanently trusted index, effectively adding it to the user’s pip config (otherwise it’s only used this single time, the same as if the user had supplied the -i option to pip)
  • the alternative index is checked for the package; if found, the package is downloaded and installed, otherwise continue with the list of alternative indexes
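To illustrate point 2 above, requirements.txt can carry the index option itself (the URL and package name below are placeholders), and pip honours it when installing from that file:

```
# requirements.txt
--extra-index-url https://pypi.example.internal/simple/
some-internal-package>=1.0
```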

All this would do is streamline current behaviour:

  1. Instead of having 20 ways of adding alternative indexes where packages can be found (requirements.txt, Bash or .bat scripts, and so on), there would be one universal, system-independent method.
  2. It arguably adds security, since the user is explicitly asked to trust the index instead of it just happening when they execute a Bash script. Whether a “--trust-all” flag in pip that auto-trusts all alternative indexes supplied in the pyproject.toml makes sense for automated installs is to be discussed, but it’s likely required because of CI pipelines. Even with that flag, the user has to explicitly set it, which IMHO is more secure than blindly executing Bash scripts.
  3. It’s more user-friendly because the process of adding new indexes is now automated. The user doesn’t need to remember the places dependencies come from and keep updating and syncing their pip config across machines. Installs will “just work”.

This all follows the Zen of Python.

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.

But of course, with such a high priority on alternative indexes, the chances of package name collisions - and malicious exploitation of those - will increase. Universal, globally unique package names should be discussed. The simplest method I can think of is to give pip an explicit way of being told which index to use for a specific package. Package name collisions could then be handled at the package registry level: myindex.com/package-a would be something different than pypi.org/package-a, and a dependency could use either of them by explicitly adding the index to use, rather than the current “round robin” behaviour of pip that just cycles through all indexes until it finds a first match (hoping it’s the package we want, and not a different or even a malicious one).
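Purely as an illustration of that last idea (this is hypothetical syntax, nothing like it exists in pip or any standard today), a per-package index pin could look something like:

```toml
# Hypothetical, non-existent syntax: pin specific packages to specific indexes
[tool.hypothetical.package-indexes]
package-a = "https://myindex.com/simple/"
```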