PEP 759, External Wheel Hosting

jamestwebber · October 8, 2024, 9:23pm

The closest equivalent is an environment.yml file:

name: my-complex-env
channels:
  - pytorch
  - bioconda
  - conda-forge
dependencies:
  - python
  - numpy
  - pysam
  - pytorch::pytorch
  - pip
  - pip:
    - something-thats-only-on-pypi
    - git+ssh://git@github.com/me/my-private-repo.git
    - -e .  # local editable directory

I’m not saying that they’ve solved all of the issues here, but they defined a solution that people can and do use.

pf_moore · October 8, 2024, 9:41pm

Thanks for that research - that’s the sort of thing we need here. I think that some form of per-package source specification may well be a good way of improving the UI for multiple indexes (with wildcards, it could even help with namespaces - “microsoft-* should be retrieved from index.microsoft.com”).
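As a rough illustration of the wildcard idea above, per-package source selection could be sketched as a first-match lookup over glob patterns. Everything here is hypothetical: the `INDEX_RULES` table, the `index_for` helper, and the `/simple/` path on index.microsoft.com are assumptions made for the example, not an existing pip or uv feature.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical routing table: (glob pattern, index URL). First match wins.
# The "/simple/" path is an assumption for illustration.
INDEX_RULES = [
    ("microsoft-*", "https://index.microsoft.com/simple/"),
]
DEFAULT_INDEX = "https://pypi.org/simple/"


def normalize(name):
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def index_for(project, rules=INDEX_RULES, default=DEFAULT_INDEX):
    """Return the index URL that a project name should be fetched from."""
    name = normalize(project)
    for pattern, url in rules:
        if fnmatchcase(name, pattern):
            return url
    return default


if __name__ == "__main__":
    for project in ("Microsoft_Foo", "numpy"):
        print(project, "->", index_for(project))
```

Normalizing before matching means `Microsoft_Foo` and `microsoft-foo` resolve to the same index, which is probably what a user writing `microsoft-*` would expect.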

I completely agree.

I don’t think this follows. Giving the user the control and choice isn’t a bad thing. Making it complex to exercise that choice is. But we should find a user-friendly solution to the general problem, not keep adding special cases that solve individual parts of the problem.

I don’t want this to be seen as an either/or conflict. At the moment we’re exploring solutions, and PEP 759 is a very specific solution targeted at one specific problem, whereas improving the ergonomics of multiple indexes is a much broader change which would add a lot more flexibility to solve issues, including the one PEP 759 targets, but not limited to it. In theory, we could implement both, neither, or either one of the approaches. What matters is that we make the best use of our limited resources to end up with a good experience for users.

itamarst · October 15, 2024, 2:19pm

The security concerns brought up so far are modifications to the wheels (mitigated by hashes) or someone deliberately uploading malicious code, where the issue is taking it down when identified. This PEP does introduce an additional security risk though: transport-level attacks.

Right now pip/uv/etc mostly just talk to PyPI or Fastly servers for most users. With this change it might be talking to arbitrary servers.
OpenSSL has a history of security exploits, some client-side (e.g. a client-side security problem was fixed in 3.0.15).
Until now, even if the user had an out-of-date OpenSSL, this wasn’t likely to be an issue in practice for package installation, at least insofar as most communication was with trusted servers. (Third party libraries that rely on cryptography/pyOpenSSL aren’t tied to the version of OpenSSL Python is using.)
With the ability to host wheels elsewhere, there is far more scope for an attacker to exploit client-side unpatched OpenSSL vulnerabilities (and more broadly, client-side vulnerabilities).
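A minimal sketch of how a user might check which OpenSSL their Python interpreter is linked against, using the standard library's `ssl.OPENSSL_VERSION` string. The helper names and the `MINIMUM` threshold are assumptions for illustration: 3.0.15 is simply the release mentioned above, and other OpenSSL release branches have their own patched versions, so a real check would need per-branch thresholds.

```python
import re
import ssl


def parse_openssl_version(banner):
    """Return (major, minor, patch) from a banner such as
    'OpenSSL 3.0.13 30 Jan 2024', or None if it is not recognized."""
    match = re.match(r"OpenSSL (\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        return None  # e.g. LibreSSL, or an unexpected format
    return tuple(int(part) for part in match.groups())


def is_at_least(banner, minimum):
    """True if the banner parses and its version is >= minimum."""
    version = parse_openssl_version(banner)
    return version is not None and version >= minimum


# Illustrative threshold only; not a complete security policy.
MINIMUM = (3, 0, 15)

if __name__ == "__main__":
    print(ssl.OPENSSL_VERSION)
    print("meets illustrative minimum:", is_at_least(ssl.OPENSSL_VERSION, MINIMUM))
```

This only reports on the OpenSSL used by the interpreter's own `ssl` module; tools that bundle their own TLS stack would need to be checked separately.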

I don’t think this is an argument against this PEP, more that:

Acceptance of this PEP raises the stakes for ensuring client-side packaging tools stay up-to-date on security fixes (not only the base Python install but also e.g. uv).
PyPI maintainers conceivably may have to someday deal with diagnosing much more sophisticated attacks. If you wanted to target an attack at a specific organization, for example, you could have a completely legitimate Python project hosting legitimate wheels on an external server that has an HTTPS server that behaves just fine… unless it gets a connection from an IP known to be tied to the organization it’s attacking, in which case it tries to trigger an OpenSSL buffer overflow.
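For contrast, the hash mitigation mentioned at the start of this post can be sketched as below. The function name is hypothetical and this is not how any particular installer implements it. The point it illustrates is that the check runs on bytes that have already been downloaded, so it detects a modified wheel but comes too late to help against an attack on the TLS client during the connection itself.

```python
import hashlib
import hmac


def wheel_matches_hash(data: bytes, expected_sha256: str) -> bool:
    """Compare the SHA-256 of downloaded bytes with the expected hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual, expected_sha256.lower())


if __name__ == "__main__":
    payload = b"example wheel bytes"  # stand-in for a downloaded file
    expected = hashlib.sha256(payload).hexdigest()
    print(wheel_matches_hash(payload, expected))
    print(wheel_matches_hash(payload + b"tampered", expected))
```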

barry · October 15, 2024, 3:35pm

Can you follow through on that concern? Is it that some malicious server will try to return content that exploits vulnerabilities in out-of-date client-side OpenSSL implementations?

If so, I think we’re already there because of --extra-index-url and other functionality that already triggers connections to potentially untrusted servers.

itamarst · October 15, 2024, 3:41pm

This is true, but the difference is in visibility and scale. Right now users are explicitly opting in to talking to non-PyPI servers, and not that many do, so it’s visible and uncommon. This PEP makes it happen much more opaquely, behind the scenes, and potentially much more commonly. Again, not an argument against the PEP, just a thing to keep in mind in terms of secondary effects.

atalman · October 18, 2024, 1:41pm

Hello. I am from the PyTorch Dev Infra team.
I am fully supportive of this proposal. I believe this can solve some of our current issues:

The PyTorch PyPI channel size is constantly growing due to the large wheel size of the PyTorch binary, currently around 900MB for the 2.5 release.
It would allow us to build a better test/staging PyPI channel to test binaries before publishing to PyPI. Currently we use download.pytorch.org/whl/test to test our wheels.
It would potentially give us the ability to host binaries with different CUDA versions on PyPI (currently we only host CUDA 12.4 for Linux).

I have a couple of questions:

As per this PEP: Organization accounts do not automatically gain the ability to externally host wheels; this feature MUST be explicitly enabled by PyPI admins at their discretion.

Maybe we can elaborate a little bit more here: what will the onboarding process be like?

We are currently migrating our current CDN solution to a Meta-hosted CDN. Our PyPI index is hosted on our regular download.pytorch.org, but the wheels themselves will be hosted either on our current CDN solution or on the Meta CDN solution (via redirect to a different URL). Hence we are planning to have 2 CDN solutions vending wheels. Will this use case be supported in this proposal?

Thank you,
Andrey

barry · October 18, 2024, 4:31pm

Thanks for the feedback @atalman ! To answer some questions:

In my mind, it would be as simple as either a) requesting the addition of external hosting bit flip at the time you request your org (and including your support URL) or b) filing a PyPI support issue for an existing org. I think this will be pretty straightforward for a PyPI admin to review and approve and I don’t think it will be any more time consuming than org approval in the first place. But if PyPI admins feel otherwise, I’d like to know!

I think they’re orthogonal. Externally hosted wheels just need download URLs so they aren’t tied to an index. If your package namespace for your PyPI index is separate from your wheel download repository, then you’re fine. Based on your question, it occurs to me that the download URLs in your .rim file could easily be the same download URLs for the associated packages in your PyPI index. That’s an interesting use case I hadn’t thought about before, so it might be worth adding to the PEP.

barry · January 31, 2025, 6:49pm

After discussion between @emmatyping, myself, and a few others, we have decided to withdraw PEP 759. We wish to thank everyone who contributed to the constructive discussion, and especially those who showed their support for this PEP, both in public and private.