Dependency notation including the index URL

The problem is that even if your private index software allows you to mask packages, pip will happily “unmask” them if any other index also provides a package by the same name. That can only be fixed by not referencing any indexes you don’t control, which is currently not “official” advice, so people are unlikely to discover it by themselves until they notice their private package has been superseded by someone else’s public release.
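
To make the failure mode concrete, here is a minimal sketch of the behaviour described above. It is a simplified model, not pip's actual code: the index labels, the package name `co-utils` and the version numbers are all made up, and version parsing is reduced to dotted integers.

```python
# Simplified model: candidates from every configured index are pooled,
# and the highest version wins regardless of which index it came from.
# All names and versions below are hypothetical.

def parse_version(text):
    """Turn '1.2.0' into (1, 2, 0); real tools use full PEP 440 parsing."""
    return tuple(int(part) for part in text.split("."))


def pick_candidate(package, indexes):
    """Return (version, index label) of the highest version found anywhere.

    `indexes` maps an index label to {package name: [version strings]}.
    Returns None if no index provides the package.
    """
    pool = [
        (parse_version(version), version, label)
        for label, packages in indexes.items()
        for version in packages.get(package, [])
    ]
    if not pool:
        return None
    _, version, label = max(pool)
    return version, label


indexes = {
    "private": {"co-utils": ["1.0.0", "1.1.0"]},
    "public": {"co-utils": ["99.0.0"], "requests": ["2.25.1"]},
}

print(pick_candidate("co-utils", indexes))  # ('99.0.0', 'public')
```

In this model the private index has no way to say "this name is mine": a higher version number on any other index is enough to win.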

Any form of more explicit masking would help here, as well as letting us document a mitigation or best practice, rather than simply talking about a risk. That’s a much more positive message to send. But first, we need to agree on how to explicitly constrain pip to only check certain feeds for certain packages.

Suggestions so far:

  • Name prefix: co:package matches co:package (and no colons allowed on PyPI)
  • Index prefix: co:package matches package if the index matches co
  • Constraints file: allow restricting look up of package to particular indexes
  • Prioritised indexes: ignore “lower priority” indexes if package is found on a higher pri index
  • Server-side name prefix: co-* packages can only be published on PyPI by authorised co users
  • Disallow multiple indexes in a single “install” command
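
As an illustration of the “index prefix” idea from the list above, here is a rough sketch of how a tool might interpret `co:package`. Nothing like this exists in pip today; the `co` label, the `pypi.example.com` URL and both function names are assumptions made for the example.

```python
# Hypothetical "index prefix" lookup: a requirement written as
# "label:name" may only be searched on the index configured under
# that label. Unprefixed names fall back to a default label.

def split_prefix(requirement):
    """Split 'co:package' into ('co', 'package'); no prefix gives (None, name)."""
    prefix, sep, name = requirement.partition(":")
    if not sep:
        return None, requirement
    return prefix, name


def indexes_to_search(requirement, labelled_indexes, default_label):
    """Return (package name, [index URLs]) allowed for this requirement."""
    prefix, name = split_prefix(requirement)
    label = prefix if prefix is not None else default_label
    if label not in labelled_indexes:
        raise LookupError(f"no index configured for prefix {label!r}")
    return name, [labelled_indexes[label]]


labelled_indexes = {
    "pypi": "https://pypi.org/simple/",
    "co": "https://pypi.example.com/simple/",  # made-up private index
}

print(indexes_to_search("co:package", labelled_indexes, "pypi"))
print(indexes_to_search("requests", labelled_indexes, "pypi"))
```

The point of the sketch is only that the prefix narrows the search to exactly one index, so a same-named package elsewhere is never considered.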

Those are roughly in descending order of how much I like them as solutions, and they’re not all mutually exclusive. But if we want none of them, then I expect we’ll start seeing unfortunately negative-sounding advice coming out and have no way to counter it.

So would the best solution be to promote this, or are there downsides or issues it cannot solve? devpi’s index inheritance is (from my understanding) the exact solution for this, and it is also not at all difficult to develop more limited (but still working) alternatives if you don’t need much of the other stuff devpi offers. It would be much easier if we could push this approach to cover the use case, and we may even be able to reduce complexity and sources of confusion in other areas, e.g. remove the need for --extra-index-url entirely (!)

One file for dependency specifications is a simple, reproducible approach.

Existing dependency-specification files:

  • setup.py/setup.cfg
  • requirements.txt
  • pyproject.toml
  • Pipfile.lock
  • requirements.lock.txt
  • environment.yml

What is needed is a way to also specify per-dependency index URLs, at least in requirements.txt files.

The more I think about it, the more I feel that actually I’d be quite happy with that message. “If you want to exert control over exactly what you install, don’t reference an un-curated repository on the open internet”. That actually doesn’t sound that negative to me… The worst we can say is that pip’s defaults are intended for casual users, not for people with strict security policies.

Yes, I agree on that. I was not arguing against it. I was trying to figure out why people don’t already use this kind of private server, since that is apparently the major, already-available solution for this kind of issue.

I think that is a legitimate way to approach this. Above all else users need guidance, since there are already viable solutions to this range of issues. In the absence of clear guidance, users tend to invent hacked-up solutions that turn out to be bad practice and bite them later. I feel like it would be a great step to write down, in pip’s documentation or in a PyPA guide, what the do’s and don’ts are:

  • be careful when mixing public and private indexes (or multiple indexes in general)
  • extra indexes that are strictly mirrors or proxies are fine (apart from yanks, maybe)
  • be aware of possible name clashes and their consequences for dependency resolution
  • be careful about the content of pip’s cache
  • and so on

At work we’ve been using devpi happily for more than 5 years. We discovered well after the fact that (at least) one of our private package names had been picked up by a public project, without any impact on us (we also deter employees from using --extra-index-url for performance reasons, which might also explain that), so I completely support the “devpi” answer :slight_smile:

Nevertheless, I agree that it would certainly feel “safer” to know that our package names can’t conflict with a public one. Of the solutions mentioned, “Name prefix: co:package matches co:package (and no colons allowed on PyPI)” would have my preference, as I don’t see any major hurdle in the implementation.

Another solution I did not see mentioned here, and that might be worth exploring, would be a lightweight PEP 503 private index/proxy that developers could launch on their own computer and that could implement all kinds of routing/filtering between multiple private/public indexes, package directories, S3 buckets, etc. Developers would simply point pip to this local server with --index-url and be set up.
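
The routing part of such a proxy could be quite small. Below is a rough sketch of just the name-to-upstream decision, assuming a made-up routing table (`ROUTES`) and a hypothetical private index at `pypi.example.com`; it does no HTTP at all. The `normalize` helper follows the name normalisation rule given in PEP 503.

```python
import fnmatch
import re

# Hypothetical routing table: first matching pattern wins.
# The private URL is invented for illustration.
ROUTES = [
    ("co-*", "https://pypi.example.com/simple/"),
    ("*", "https://pypi.org/simple/"),
]


def normalize(name):
    """Normalise a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def route(project, routes=ROUTES):
    """Return the upstream project page URL for `project`, or None."""
    name = normalize(project)
    for pattern, upstream in routes:
        if fnmatch.fnmatchcase(name, pattern):
            return f"{upstream}{name}/"
    return None


print(route("Co_Utils"))   # https://pypi.example.com/simple/co-utils/
print(route("requests"))   # https://pypi.org/simple/requests/
```

Because every name resolves to exactly one upstream, pip only ever sees a single index and the question of mixing candidates never arises.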

Maybe, we should just remove --extra-index-url from pip.

I’m always in favour of better defaults, but I fear this may be too impactful on some (legitimate) scenarios.

If we simultaneously made --index-url additive and prioritised (i.e. first index that can satisfy a package by name, wins[1]), that would cover it.

1: There’s a lot of scope for bikeshedding over how the resolver could handle mixed versions across indexes, and I’m ignoring that because I think it just makes things worse. If an index claims to provide a package, that should limit the range of versions available to precisely what that index offers, and if that can’t satisfy the whole thing, the user has to go fix their index. An auto-updating index shouldn’t hit this problem in a way that would be solved in pip’s resolver anyway, and a non-updating index is probably non-updating for a reason.
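
For clarity, the “first index that can satisfy a package by name wins” rule could look roughly like this. It is only a sketch of the proposal, not existing pip behaviour, and the index labels and package data are invented.

```python
# Proposed rule (sketch): indexes are consulted in priority order, and
# the first one that lists the package at all defines the full set of
# available versions. Lower-priority indexes are then ignored.

def versions_for(package, ordered_indexes):
    """Return (label, versions) from the first index that lists `package`.

    `ordered_indexes` is a list of (label, {package: [versions]}) pairs,
    highest priority first. Returns (None, []) if no index has it.
    """
    for label, packages in ordered_indexes:
        if package in packages:
            return label, list(packages[package])
    return None, []


ordered_indexes = [
    ("private", {"co-utils": ["1.0.0", "1.1.0"]}),
    ("public", {"co-utils": ["99.0.0"], "requests": ["2.25.1"]}),
]

print(versions_for("co-utils", ordered_indexes))  # ('private', ['1.0.0', '1.1.0'])
print(versions_for("requests", ordered_indexes))  # ('public', ['2.25.1'])
```

Note that the public 99.0.0 release is never even seen for `co-utils`; if the private versions cannot satisfy the requirement, the install fails and the user has to fix their index, as described in the footnote.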

I think that is what pywharf aims for.

Isn’t it useful for local fail-over mirrors/proxies?

Thanks, I did not know this project :+1:
However, this isn’t what I had in mind :slight_smile: There is apparently no configuration for filtering/routing packages.

I developed proxpi a while back to reduce our requests to PyPI; however, it also assumes that packages with the same name and version are identical. I haven’t determined what the behaviour is if that’s not the case.

Not for fail-over, because any extra indexes are searched at the same priority as the main one. So a higher version from your “fail-over” will be preferred over the primary (which if it’s a legitimate mirror, should never happen).

It can’t be used as a fallback (in addition to what Steve said) since it won’t work. If I remember correctly, pip expects a proper response from all indexes to determine what to install, and errors out if it doesn’t.

pip needs to be able to ignore an index if it returns 404. I believe it safely handles more status codes than that (without failing). SSL errors would also cause an index to be ignored, instead of failing.

So --extra-index-url can indeed be used to specify fall-back sources. Although I’d argue that it only really makes sense if the alternative index is a mirror for all packages needed by the user (otherwise the behaviour would be erratic and likely not desirable), which means that making --index-url multi-use (plus explicitly calling it out as order-less) would work as well. Also, the current name --extra-index-url is pretty bad if we want to brand it as a fall-back configuration.

Just want to keep the thread somewhat on track. Removing or renaming --extra-index-url to make it more obvious that you shouldn’t use it for extra indexes doesn’t really help anyone (apart from forcing them to go and find tools that can combine indexes for them, such as npm or conda).

I’m not saying it’s a bad thing to do, but it’s perfectly scoped to the pip issue tracker as it really doesn’t impact any more of the ecosystem outside of pip’s users (which, yeah, is most of the ecosystem, but it’s still scoped).

We have valid options that can get us to a much better place, as I listed above (and quoted below), and I don’t want those to be ignored because we tweaked one pip feature and decided the whole thing is solved.

Agreed yea. It’s definitely much more scoped than everything else being proposed here (which affect more tools than “just” pip).

I’ve not seen anyone go “WTH NO” to the idea yet, so I’ll go ahead and move it to pip’s issue tracker. I do think it helps with the issue, by pushing users away from expecting packaging tooling to support this use case out of the box (which it doesn’t yet, that’s literally our entire conversation here!).

Fair point. To be clear, I’m not ignoring those proposals because I think removing --extra-index-url solves the problem. I’m ignoring them mostly because the situations that have been described where a solution like this might be needed are so far out of my experience that I don’t have an opinion. So if someone wants to develop any of those ideas into a full proposal and PRs for the various tools impacted, that’s fine by me. I’ll try to restrict my involvement to dealing with the implications of specific pip changes, and that sort of thing.

I’m keen on the idea of removing --extra-index-url because it simplifies pip, both the internals and the user interface, as well as removing the possibility of a certain class of pip issues that are better solved with non-pip tools (devpi). However, I’m not particularly optimistic that we’ll ever do it, because the breakage to existing workflows is probably too high. But I can dream :slightly_smiling_face: And like you say, that’s a discussion that’s purely scoped to pip.

Another somewhat related tool: pypicloud. Still not what you really asked for, but might be useful to someone. @uranusjr’s simpleindex announced here is what you really had in mind, right?

Yes, simpleindex is the closest thing to what I had in mind :slight_smile: