I can check the “name” and “source”, but I can’t compute a SHA-256 in my head.
I’m relying on packages/prefix/prefix/hash/bcrypt-... to actually come from bcrypt.
Is that guaranteed?
What happens if someone pushes somename-vvv-cpvv-abi...whl to PyPI where the metadata indicates some-other-name as the package name? Does PyPI reject those, and guarantee that every existing wheel name matches its metadata?
Obviously such features require a very cool workflow tool, but I think that’s realistically the only way.
For example, what if you want to consider it an error for a lockfile to be missing wheels for supported targets? (Assuming those wheels exist on PyPI.) I can’t check the contents of PyPI in my head any more than I can hash a wheel in my head.
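For illustration, here is roughly what such a check could look like. This is a sketch, not a real workflow tool; it leans on PyPI’s JSON API and the packaging library, and the version pin and platform set are made-up examples:

```python
# Sketch only: ask PyPI which wheels exist for a release and flag
# required platform tags that have no wheel. The project/version
# and the platform list are made-up examples.
import json
import urllib.request

from packaging.utils import parse_wheel_filename  # pip install packaging

PROJECT, VERSION = "bcrypt", "4.1.2"  # hypothetical pin from the lockfile
REQUIRED_PLATFORMS = {"manylinux2014_x86_64", "win_amd64"}

url = f"https://pypi.org/pypi/{PROJECT}/{VERSION}/json"
with urllib.request.urlopen(url) as resp:
    release = json.load(resp)

available = set()
for file_info in release["urls"]:  # one entry per uploaded file
    if file_info["filename"].endswith(".whl"):
        *_, tags = parse_wheel_filename(file_info["filename"])
        available.update(tag.platform for tag in tags)

missing = {p for p in REQUIRED_PLATFORMS if p not in available}
if missing:
    raise SystemExit(f"target platforms without wheels: {missing}")
```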
These two questions are tied together. I know that PyPI allows some filenames which aren’t normalized perfectly (e.g. you can have an sdist named foo-bar rather than foo_bar), but I’m nearly certain that the normalized name parsed from your filenames is required to match the normalized name of your project.
That protects against both potential issues.
It would be good to confirm this if you are concerned: we have specs, friendly PyPI maintainers on this forum, and you can always just try it out with a dummy package and see what happens.
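For what it’s worth, the normalization rules are implemented in the packaging library, so a reviewer or a CI script can run this particular check mechanically. A minimal sketch (the filenames are hypothetical examples):

```python
# Sketch: check that the (normalized) name embedded in a wheel filename
# matches the (normalized) project name we expect.
from packaging.utils import canonicalize_name, parse_wheel_filename

def wheel_matches_project(filename: str, project: str) -> bool:
    name, _version, _build, _tags = parse_wheel_filename(filename)
    # parse_wheel_filename returns an already-normalized name.
    return name == canonicalize_name(project)

# Hypothetical filenames for illustration:
assert wheel_matches_project("bcrypt-4.1.2-cp39-abi3-win_amd64.whl", "bcrypt")
assert not wheel_matches_project("bcrypt-4.1.2-cp39-abi3-win_amd64.whl", "some-other-name")
```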
No, but you can download the file and use a hash-calculating utility of your choice. Or, if you trust PyPI, you can check the hash displayed on the PyPI web page. You can also check the size, which is a quick sanity check rather than a guarantee.
You can check whether that is the case by reviewing the package. The metadata standards say that it’s not valid for the filename and metadata to fail to match, but someone can easily create an invalid file by renaming a valid one. PyPI might validate; I haven’t checked the PyPI docs or code, so I can’t say for certain. For a wheel they might, but it’s unlikely they will for an sdist, as that would require them to execute untrusted code.
Ultimately, you need to decide for yourself what threats you want to defend against, and how much effort you want to put into checking. The lockfile ensures that the installer will only install the file you expect it to (based on URL and hash), but it’s up to you to check that the file it will install is the one you want.
So basically, you need to:

1. Review the file at the given URL, to make sure it is what you think it is.
2. Check the hash manually, to ensure that no one can later change that file without also having to change the lockfile (a sketch of this check follows below).
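A minimal sketch of that hash check in Python; the filename and expected digest here are placeholders, not real bcrypt values:

```python
# Sketch: recompute the SHA-256 of a downloaded file and compare it to
# the digest pinned in the lockfile.
import hashlib

def sha256_of(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # stream in 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest()

expected = "0123abc..."  # placeholder for the sha256 recorded in the lockfile
actual = sha256_of("bcrypt-4.1.2-cp39-abi3-win_amd64.whl")
if actual != expected:
    raise SystemExit(f"hash mismatch: expected {expected}, got {actual}")
```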
To my mind, the threat of concern here is a PR author replacing bcrypt in the target project with a malicious project that is also on PyPI, to which someone has uploaded a wheel renamed to look like a bcrypt one.
It’s good to know that mismatched names may be rejected by PyPI or pip, but I wouldn’t rely on that.
Checking that the hashes for the wheel are the same as whatever the bcrypt authors uploaded limits the risks to those originating upstream. Those risks don’t just affect projects using lockfiles, and they would cause much bigger problems for the world than pulling malicious code into a single downstream PyPI project.
How do you imagine that a non-malicious contributor would literally update the lockfile?
Presumably they should be running some command to do it, and what you need to validate here is that running that same command gives the same lockfile. I would have thought that the part you should be auditing carefully is where this contributor has edited the inputs to the locking tool, whereas the outputs are easily verified by re-running the tool.
Trust is always a balancing act. If you don’t trust any external contributor at all, then yes, you have to do all the work yourself. But very few, if any, open source projects offer an absolute guarantee to their consumers either, so you do a reasonable level of checks, run your test suite, and assume your users will either trust your processes or do their own deeper checks if they want to.
Absolutely. Dima’s basically asking for tools that could help with that. Such suggestions would be of use to all project maintainers, and needn’t require new features to be added to pip or PyPI, or have any impact on those at all.
I think we don’t understand each other. At least, you haven’t understood what I meant, though that may also be because I haven’t understood how lockfiles work.
I’m imagining that the diff in the PR is like this:
The PR author has updated the version pin in requirements.txt. They have then run a tool like:
lock-tool update -r requirements.txt lockfile.txt
Now, as a reviewer of the PR, you should look closely at the manual changes in requirements.txt. The other change should be verified by a CI job that runs lock-tool and verifies that it produces the same lockfile, with the same hashes etc. Perhaps the tool provides this as a command for your convenience:
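lock-tool check -r requirements.txt lockfile.txt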
That’s very clear, thank you. If anyone knows of such a CI job or Action that can automatically run lock-tool update and throw an error if the file produced isn’t identical to the one in the PR, it would be super helpful if they mentioned it. Ditto if anyone knows of something that supports lock-tool check already.
Until then, I stand by my original point: the PR requires just as much work to check as it took the author to create. There is a point in raising such PRs, since we maintainers may need a nudge to bump a dep. But other than encouraging the PR author and fostering community engagement with the project, there’s little point in merging them.
If you’re using a lockfile for security reasons, I wouldn’t allow third-party contributions to touch the file. In any case, even when using it just for reproducibility in CI, I would look at any change that introduces or changes dependencies with significantly higher scrutiny, as it asks your project to extend the trust your users have in your project to the added dependency.
If the Warehouse ensures that every https://files.pythonhosted.org/***/bcrypt-***.whl file comes from the bcrypt project, which I trust, then I only need to manually check the new libraries, that is, the projects I don’t yet trust. That’s manageable.
If the Warehouse doesn’t provide such a guarantee, then manual validation doesn’t scale, and I must build out CI to e.g. auto-validate lockfile changes using tools. In that scenario the tools must not run an sdist’s setup.py, as that would be a perfect injection point for overriding the tool’s output and return value.
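To make that concrete, here is a rough sketch of the kind of static validation I mean. Nothing from the package is built or executed, and the (name, URL, sha256) triple is a simplified stand-in for whatever a real lockfile records:

```python
# Sketch: purely static validation of a lockfile entry, with no package
# code ever run. Cross-checks the pinned digest against PyPI's JSON API.
import json
import urllib.request
from urllib.parse import urlparse

from packaging.utils import canonicalize_name, parse_wheel_filename

def validate_entry(name: str, url: str, sha256: str) -> None:
    if urlparse(url).netloc != "files.pythonhosted.org":
        raise ValueError(f"{name}: unexpected host in {url}")
    filename = url.rsplit("/", 1)[-1]
    if filename.endswith(".whl"):
        parsed_name, *_ = parse_wheel_filename(filename)
        if parsed_name != canonicalize_name(name):
            raise ValueError(f"{filename} does not look like a {name} wheel")
    # Cross-check the pinned digest against what PyPI publishes for the project.
    with urllib.request.urlopen(f"https://pypi.org/pypi/{name}/json") as resp:
        data = json.load(resp)
    published = {
        f["digests"]["sha256"]
        for files in data["releases"].values()
        for f in files
    }
    if sha256 not in published:
        raise ValueError(f"{name}: digest {sha256} is not one PyPI knows about")
```

Note that because this cross-checks against PyPI itself, it only confirms the lockfile is consistent with what PyPI serves; it says nothing about whether the upstream release is trustworthy.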
But that just means there’s no point in merging such PRs.
I was wondering whether I should even allow my CI to run on this PR.
Ah, I see your concern now. No, I don’t think PyPI guarantees that you can assume this just from the URL. But you can look at the file information on the PyPI page for bcrypt, and confirm that the PR refers to a URL for that project with the correct hash.
That is what I’d do in a case like this. I wouldn’t bother automating it (I doubt it happens often enough to be worth it); I’d just make it part of the manual review of the PR.
Firstly, I think this type of PR is like a bug report, for which it’s reasonable to ask for steps to reproduce.
Secondly, what was the goal of pinning versions in the first place? Maintainers aren’t obliged to bump dependencies on a simple innocent request. Users can easily upgrade a dependency to the latest version themselves with pip. Maintainers may reasonably prefer to be late adopters.
If you do want to adopt that version, the first and last three characters of the SHA256 you posted match what’s on PyPI for bcrypt. I’d just double-check the whole thing is as below, and sleep easy:
I believe attestations with trusted publishers can give you that guarantee, if the project is hosted on a platform that trusted publishing supports and the project uses it. With that, you can verify that a file came from a specific workflow (e.g. a specific GitHub Actions workflow in a specific repo).
I think that if the lockfile can only be generated by a machine, then it should probably also be verified by a machine. There should still be some way for a human to be confident that the machines are doing their job, though.