Enable hash-checking mode by freezing my setup.py dependencies to a requirements.txt (possibly using pip-tools cc @matthewfeickert) so that I can force pip to use the version of B that I decide, by specifying the hash. If I understand correctly, this would make my installation procedure immune to PyPI squatting of internal dependencies. I would not immediately detect incompatibilities with new versions of my dependencies though (but perhaps it’s a good thing that my pipelines won’t break inadvertently!)
Provision a dedicated machine with a public IP, install devpi-server on it, and use it from CI as an --index-url.
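To illustrate how CI would consume such an instance, here is a hypothetical CI step; the server URL is an assumption, not a real deployment (devpi’s root/pypi index mirrors and caches PyPI, so public packages still resolve through it):

```yaml
# Hypothetical CI step: install strictly from the internal devpi index.
# The devpi host below is a placeholder.
- name: Install dependencies from internal index
  run: |
    python -m pip install \
      --index-url "https://devpi.internal.example/root/pypi/+simple/" \
      --requirement requirements.txt
```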
Are there other options I’m potentially missing? Any tips are appreciated. If my assessment of the available options is at least correct, I might contribute some docs to pip as suggested in the issue tracker (although I have other contributions waiting for me to look at them…).
I personally use pip-tools, as it’s fine for my workflow (although I wish I had put in the time to learn pipenv or Poetry for their advanced functionality). I only pin dependencies for my application deployments, however; both developers and libraries get the latest compatible versions.
This is in conjunction with --extra-index-url. We also prefix our project names with a disambiguation to reduce the likelihood of name collisions; perhaps you could ask the PyPI admins to blacklist that prefix, if you’re a big enough organisation.
Another option is to use a service like Azure Artifacts, which (like devpi) will cache public packages, but it’s managed.
As you’ve already pointed out @astrojuanlu, this is my suggestion, based on what I’ve learned from @brettcannon’s pip-secure-install recommendations. pip-tools makes this pretty easy. As @brettcannon points out in his comment too, there isn’t a formal lock file spec yet, though I’ve personally taken to calling the output of pip-compile --generate-hashes a lock file (and even taken to naming it requirements.lock, though the pip-tools team calls it requirements.txt). Importantly though, if you are doing as you point out with
# requirements.txt is the lock file
$ python -m pip install --no-deps --require-hashes --only-binary :all: --requirement requirements.txt
then you are only able to install from the wheels that match the hashes in the requirements.txt.
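For reference, entries in such a hash-pinned file look roughly like this; the package pin is arbitrary and the digests are placeholders, not real hashes:

```
# Generated by pip-compile --generate-hashes (digests replaced with placeholders)
packaging==23.1 \
    --hash=sha256:<digest-of-the-wheel> \
    --hash=sha256:<digest-of-the-sdist>
```

With --require-hashes, pip refuses any distribution file whose digest doesn’t match one of the listed hashes, which is what blocks a squatted substitute.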
I would not immediately detect incompatibilities with new versions of my dependencies though (but perhaps it’s a good thing that my pipelines won’t break inadvertently!)
In my mind this once again comes down to: are you developing a Python library or an application? As @EpicWink has already pointed out, if it is an application, then you really want to be using a lock file anyway; then you can carefully ease up restrictions in your requirements, rebuild your lock file, and rerun your tests to understand what you can update and when. If it is a Python library, then the best you can do is to test your dependencies’ lower bounds with a constraints.txt file that pins them, on the oldest Python you support, and also test against the latest releases or at HEAD of your dependencies (yay for nightly wheels!). (I’m writing this for completeness as Juan and I have already discussed this and he knows my views.)
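To make the lower-bound testing concrete, a sketch of that setup might look like this; the package names and versions are invented for illustration:

```
# constraints-lowest.txt (hypothetical): every dependency pinned at the
# oldest version the library claims to support
numpy==1.21.0
requests==2.25.0
```

The CI job running your oldest supported Python would then install with something like `python -m pip install --constraint constraints-lowest.txt .` and run the test suite against those minimums.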
A slight twist of option 3 would be to provide a pass-through server instead, e.g. a server that receives requests to various packages and simply redirects to one of your actual sources. That should be much cheaper (in many ways) than a full devpi setup. As a further improvement you could even run that server as a part of CI and just use localhost.
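A minimal sketch of such a pass-through server, assuming the PEP 503 “simple” index layout; the name prefix and both index URLs are hypothetical placeholders, not a real deployment:

```python
# Sketch of a pass-through index: redirect each /simple/<project>/ request
# to the index that should serve it. Prefix and URLs below are assumptions.
from http.server import BaseHTTPRequestHandler, HTTPServer

INTERNAL_PREFIX = "mycorp-"                              # hypothetical prefix
PRIVATE_INDEX = "https://pypi.internal.example/simple"   # hypothetical URL
PUBLIC_INDEX = "https://pypi.org/simple"

def route(project: str) -> str:
    """Return the project page URL on the index that should serve this name."""
    base = PRIVATE_INDEX if project.startswith(INTERNAL_PREFIX) else PUBLIC_INDEX
    return f"{base}/{project}/"

class PassThroughHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect paths like /simple/<project>/
        parts = [p for p in self.path.split("/") if p]
        if len(parts) == 2 and parts[0] == "simple":
            self.send_response(302)  # redirect to the real index
            self.send_header("Location", route(parts[1]))
            self.end_headers()
        else:
            self.send_error(404)

# To run it locally (then point pip at --index-url http://127.0.0.1:8080/simple/):
# HTTPServer(("127.0.0.1", 8080), PassThroughHandler).serve_forever()
```

Because the redirect lands on the target index’s own project page, pip follows the absolute file links there, so the pass-through never has to serve distribution files itself.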
Thanks all for the comments, they’re really helpful!
Glad to see the efforts are still ongoing; I’ve been following this on and off for the past two years. I’ll see if I’m articulate enough to offer an opinion there.
Notice that I’m developing a library - that’s why I brought the question up, since the primary use case of many of these locking workflows is application development instead. However, pip-tools is good enough for me and keeping setup.py and the resulting requirements.txt in sync is not necessarily a huge pain.
Good to know, thanks!
Thanks @matthewfeickert - however if I understand correctly, constraints.txt files won’t save my users or my CI from pulling malicious packages from PyPI, am I right? I’d either need locking or a PyPI proxy for that. Therefore, I see constraints files as a nice addition (even though I’m still wrapping my head around them) on top of the other solutions.
This sounds interesting @uranusjr, and since “it should not be too hard”™, are you aware of any open source implementations of this idea? It sounds like it could be useful for a lot of people.
For more complex routing you could register custom route classes as described in the documentation. Although depending on the service you’re running CI on, sometimes just starting an nginx instance might even be easier, and definitely allow more configurability.
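For the nginx route, a pass-through could look something like the fragment below; the internal host and the name prefix are assumptions, not part of any real setup:

```nginx
# Hypothetical pass-through: internal (prefixed) packages are proxied to the
# private index, everything else is redirected to PyPI.
location ~ ^/simple/mycorp- {
    proxy_pass https://pypi.internal.example;
}
location /simple/ {
    return 302 https://pypi.org$request_uri;
}
```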
The documentation actually directly advises against using --extra-index-url:
In these commands, you can use --extra-index-url instead of --index-url. However, using --extra-index-url makes you vulnerable to dependency confusion attacks because it checks the PyPI repository for the package before it checks the custom repository. --extra-index-url adds the provided URL as an additional registry which the client checks if the package is present. --index-url tells the client to check for the package on the provided URL only.
In other words:
--index-url checks only the GitLab registry; PyPI is not consulted at all.
--extra-index-url checks PyPI in addition to the GitLab registry, so a squatted PyPI package can win, which is what enables dependency confusion.
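Putting that advice into practice, a job in .gitlab-ci.yml would use --index-url alone; the project ID placeholder and package name below are illustrative:

```yaml
# Hypothetical job: only the GitLab PyPI registry is consulted.
install:
  script:
    - pip install --index-url "https://gitlab.example.com/api/v4/projects/<project-id>/packages/pypi/simple" mycorp-utils
```

Note that with --index-url alone, public dependencies must also be resolvable through that registry (or through a proxy in front of it), since pip will never fall back to PyPI.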