I have been doing some research and had to create a bunch of Python packages for it and upload them to PyPI. When I logged in today, I noticed that all my projects had been deleted, and I didn’t get any email notification about the deletion. Can someone please explain the reason behind it?
They are not stale projects, either; I created them within the last month.
Additionally, PyPI is not a research platform. If you want to test whether it is possible to install packages unexpectedly, we’d suggest setting up your own controlled environment and demonstrating it there.
If your “research” requires them to be installed by unsuspecting users, we consider you malicious, not a researcher, and will remove/block packages and/or users without notice.
Totally understand. I was not trying to push any code to the platform; I was only registering the package names. The main goal was to understand how many of the projects we use have internal dependencies that are not published on PyPI, and to claim those names so that no one else could register them.
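For what it's worth, counting which internal dependency names are not published on PyPI does not require registering anything. A minimal sketch, assuming PyPI's JSON endpoint `https://pypi.org/pypi/<name>/json` answers with HTTP 404 for a name that has no project; the helper names `pypi_status` and `unpublished`, and the package names in the usage note, are hypothetical:

```python
import urllib.error
import urllib.request


def pypi_status(name):
    """Return the HTTP status code of PyPI's JSON endpoint for a project name."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is still what we want.
        return err.code


def unpublished(names, status_of=pypi_status):
    """Return, sorted, the names for which the lookup reports 404 (no project)."""
    return sorted(name for name in names if status_of(name) == 404)
```

Calling `unpublished(["acme-billing", "requests"])` would perform read-only lookups. The `status_of` parameter only exists so the lookup can be swapped out, for example in tests or to add caching.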
The proper solution for a company that has internal packages is not to register all of those package names on PyPI. The proper solution is to ensure that your company’s systems block packages from PyPI which have the same (or even similar, as in typosquat) names as your internal packages. This is simplest to do when your internal packages are consistently named with a common prefix (e.g. the company name).
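As a rough illustration of that kind of check, here is a minimal sketch of a name filter that a company-side system might apply before fetching anything from PyPI. Everything specific in it is an assumption: the `acme-` prefix, the internal package names, the `should_block_from_pypi` helper, and the 0.85 similarity threshold are all hypothetical, and `difflib` similarity is only one crude way to approximate "similar, as in typosquat".

```python
import re
from difflib import SequenceMatcher

INTERNAL_PREFIX = "acme-"                        # hypothetical company prefix
INTERNAL_NAMES = {"acme-billing", "acme-auth"}   # hypothetical internal packages


def normalize(name):
    """Normalize a project name: lowercase, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def should_block_from_pypi(name, internal_names=INTERNAL_NAMES,
                           prefix=INTERNAL_PREFIX, threshold=0.85):
    """Return True if this name must never be fetched from the public index."""
    candidate = normalize(name)
    # Anything carrying the company prefix is treated as internal-only.
    if candidate.startswith(prefix):
        return True
    for internal in map(normalize, internal_names):
        # Exact collision with an internal name.
        if candidate == internal:
            return True
        # Near miss: close enough to an internal name to be a likely typosquat.
        if SequenceMatcher(None, candidate, internal).ratio() >= threshold:
            return True
    return False
```

With a consistent prefix, the first check does most of the work; the similarity loop is only there to catch near misses such as `acmee-billing`.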
Filling up PyPI with millions of package names that have no content is not going to be maintainable.
Hello Kevin
I saw your post on the proper way to prevent supply chain attacks on PyPI. You mentioned that "your company’s systems block packages from PyPI which have the same (or even similar, as in typosquat) names as your internal packages". Is there a way to implement this natively with pip alone? From what I have read, it seems there are quite a few other things that would have to be done to achieve the blocking you describe. This seems like quite a bit more work for fewer results. I realize the namespace is flat and only so many combinations exist, but can you blame someone who wants to be sure to stop the attack with minimal effort?
I’m not aware of any way to do that with pip; it would need to know what the names of your internal packages are in order to do the comparisons, but it doesn’t have any mechanisms which can be used that way. The most straightforward way to do this is to have some sort of centralized system (which could be just a proxy with some scripting, or a full-blown caching package repository) where that data can be stored and compared, and then configure pip to use it instead of pypi.org directly.
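To make the "proxy with some scripting" idea a little more concrete, here is a minimal sketch of the routing decision such a proxy could make for each requested project. The internal index URL, the set of internal names, and the `upstream_url` helper are hypothetical; a real deployment would load the names from wherever the company keeps them.

```python
import re

PYPI_SIMPLE = "https://pypi.org/simple"
INTERNAL_SIMPLE = "https://packages.internal.example/simple"  # hypothetical URL
INTERNAL_NAMES = frozenset({"acme-billing", "acme-auth"})     # hypothetical names


def normalize(name):
    """Normalize a project name: lowercase, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def upstream_url(requested_name):
    """Pick the index a request is forwarded to.

    Internal names are only ever served from the internal index, so a
    same-named project on PyPI can never be picked up by accident.
    """
    name = normalize(requested_name)
    base = INTERNAL_SIMPLE if name in INTERNAL_NAMES else PYPI_SIMPLE
    return f"{base}/{name}/"
```

pip is then pointed at the proxy, for example with its `--index-url` option, so that every lookup goes through this decision rather than to pypi.org directly.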
Thanks for the response Kevin. I am looking into a centralized proxy at the moment to try and follow the best practice. It just seems like a lot of work compared to registering a few internal package names on PyPI to make sure someone else doesn’t register those names (maliciously). I want to do the right thing here, and I realize publishing packages to a flat namespace with little functionality (letting a dev know that they have the wrong config) isn’t great for the PyPI community. I truly hope that PyPI will soon support namespaces and give another avenue here.
In reply to the post above by Davidsyckle, via Discussions on Python.org at 08Jun2022 12:41:
Has it not occurred to the OP to just put all their internal packages
inside a distinctive private prefix, like
internal.initials-of-company.packagename?
It doesn’t prevent use of these names on PyPI, but it does make
accidents very unlikely, and it also makes it obvious which packages are
internal and private (and will never be published for others to reuse),
and therefore which packages are likely to come from a public location.
It just seems both simpler and less polluting of the public (visible to
the world) namespace.