Constraints files with hashes

adriangb · August 30, 2022, 11:57pm

This is sort of a followup to https://github.com/pypa/pip/issues/8792 since that was locked. I don’t fully understand the outcome, so maybe this is just user error.

Where:

app1/requirements.in

boto3

app2/requirements.in

starlette
uvicorn

requirements.in

-r app1/requirements.in
-r app2/requirements.in

And constraints.txt is generated by pip-tools: pip-compile --output-file=constraints.txt --strip-extras --generate-hashes --resolver=backtracking constraints.in

And when I want to install dependencies for app1: pip install -r app1/requirements.in -c constraints.txt.

This generally works great for my use case, but if I add --generate-hashes when I generate constraints.txt I then get an error when I do pip install: ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==. From the discussion linked above it sounds like this is because my .in files don’t have any hashes and the valid hashes are computed as an intersection. I guess this makes sense at some level: if you have pytest<7 in a .in file and pytest>5 in a constraints.txt file the valid versions are the intersection (pytest>5,<7).
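To make the intersection idea concrete, here is a minimal toy sketch of it. It is not pip's resolver: versions are compared as plain integer tuples (which ignores most of PEP 440), and the candidate list and the helper names parse and allowed_versions are made up for illustration.

```python
from __future__ import annotations


def parse(version: str) -> tuple[int, ...]:
    # Simplified parsing: "6.2.5" -> (6, 2, 5). Real version handling is richer.
    return tuple(int(part) for part in version.split("."))


def allowed_versions(candidates, predicates):
    # A candidate survives only if every specifier accepts it (the intersection).
    return [v for v in candidates if all(p(parse(v)) for p in predicates)]


candidates = ["4.6.11", "5.4.3", "6.2.5", "7.1.2"]  # illustrative, not a real index listing
from_requirements_in = lambda v: v < (7,)  # stands in for pytest<7
from_constraints_txt = lambda v: v > (5,)  # stands in for pytest>5

print(allowed_versions(candidates, [from_requirements_in, from_constraints_txt]))
```

Only the versions accepted by both files remain, which is the behaviour I assumed hashes would mirror.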

So a couple of questions:

1. Is my conceptual understanding of what’s going on correct?
2. Are there any easy changes to this workflow that might enable me to have package hashes?
3. Would it be possible to add an option to say “consider lack of a --hash equal to any hash instead of no hashes” or something like this?

pf_moore · August 31, 2022, 9:06am

I don’t know why the bot locked that issue. @pradyunsg any insights there?

Regarding your questions:

1. More or less, yes. As I said in the issue, my model of constraints files is that they restrict the files pip can “see”, before resolution starts. My model of hash checking mode is unclear (and I’m not the only one who thinks that) but one thing that is clear is that it’s “all or nothing” - you can’t have some files with hashes and not others.
2. If you want hashes, you have to specify hashes for everything. I don’t know enough about workflows involving pip-tools to say anything more detailed, but that’s the basic rule you need.
3. I don’t think that would be acceptable. The core idea of hash checking mode is that everything is checked - by design.
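As a rough illustration of the “all or nothing” rule, the sketch below checks a list of requirement lines the way the rule is described above: once any line carries a --hash, every line has to be pinned with == and carry a hash. This is a simplified model, not pip's code; hash_mode_errors is a hypothetical helper, and the versions and hash values are placeholders.

```python
def hash_mode_errors(lines: list) -> list:
    """Return the problems a toy 'all or nothing' hash check would report."""
    reqs = [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]
    if not any("--hash=" in r for r in reqs):
        return []  # no hashes anywhere, so hash checking is never switched on
    errors = []
    for r in reqs:
        spec = r.split("--hash=")[0].strip()
        if "==" not in spec:
            errors.append(f"{spec}: not pinned with ==")
        elif "--hash=" not in r:
            errors.append(f"{spec}: no --hash given")
    return errors


# Placeholder version and hash; one hashed line makes the bare "boto3" line an error.
print(hash_mode_errors(["boto3", "starlette==0.20.4 --hash=sha256:aaaa"]))
```

With no hashes at all the list of errors is empty; mixing hashed and unhashed lines is what triggers the failure.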

pradyunsg · August 31, 2022, 9:47am

The bot locks any closed issue that hasn’t had any activity for over 30 days.

That’s mainly because people would randomly comment on closed issues with thanks or generally unhelpful comments, and to also reduce the moderation surface area for pip’s maintainers — all this goes toward reducing the notification overload if you are subscribed to pip (which is the case for most maintainers and many contributors).

If someone wants to argue against the bot’s behaviour or suggest changes to it, please do so separately (possibly on the existing discuss.python.org topic, where this was discussed before introducing that bit of automation).

pf_moore · August 31, 2022, 10:43am

Oh, sorry, my bad. I’d missed that the issue was closed (I was looking at the tags rather than the big “closed” marker at the top). I shouldn’t read issues before my first cup of tea…

So @adriangb the issue is locked because the agreed behaviour in that discussion has been implemented now. Any further queries/proposals should be raised as new issues, but in this case there’s no need - I think I’ve covered everything in my response above.

adriangb · August 31, 2022, 3:03pm

Thank you for explaining Paul.

I think the “all or nothing” approach makes sense, but enforcement should be aware of constraints files. A workflow with constraints files is not the same as a workflow with multiple requirements files. In my case, every dependency has a hash specified in the constraints file, just not in the .in files. The .in files are not intended to specify the versions to install; instead they are just a list of the top level dependencies that are needed. The constraints.txt file is what should determine the valid versions (including hashes). I think that when dealing with constraints files we should be allowed to have hashes only in the constraints file and not the requirements file, still enforcing that every dependency must have hashes. I’d be fine with a rule that every dependency must be specified in the constraints file and have hashes, since I don’t have a workflow where my requirements files also have hashes. Even this strict special casing of constraints files would be a huge win given that constraints files with hashes seem to be fundamentally useless at this point.

I do want to push back on this a bit. This behavior was surprising to me because when you have version specifications lack of a version specification means “any version”. When you have multiple version specifications the result is their intersection. I thought hashes would work the same way and that the “all dependencies must have hashes” enforcement would be applied after doing a union of all valid versions/hashes.
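A toy model of the semantics I expected might look like the sketch below. It does not describe how pip behaves: None stands for “no hashes given” and is treated as “any hash”, the combined result is checked only at the end, and only top-level names are covered (a real check would also need every transitive dependency). The function names and the hash string are hypothetical.

```python
from typing import Optional


def combine(a: Optional[frozenset], b: Optional[frozenset]) -> Optional[frozenset]:
    # None means "no hashes given", which this model treats as "any hash".
    if a is None:
        return b
    if b is None:
        return a
    return a & b


def resolve_hashes(requirement_hashes: dict, constraint_hashes: dict) -> dict:
    merged = {
        name: combine(hashes, constraint_hashes.get(name))
        for name, hashes in requirement_hashes.items()
    }
    # Enforce "everything must have hashes" only after combining both files.
    missing = sorted(name for name, hashes in merged.items() if not hashes)
    if missing:
        raise ValueError(f"no usable hashes for: {', '.join(missing)}")
    return merged


# The .in file gives no hashes; the constraints file supplies them (placeholder value).
print(resolve_hashes({"boto3": None}, {"boto3": frozenset({"sha256:aaaa"})}))
```

Under this model a package with no hashes in either file still fails, so the “all or nothing” guarantee would be kept.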

pf_moore · August 31, 2022, 3:31pm

They don’t, though. You can specify some versions and leave other requirements unversioned. With hashes you must specify hashes for everything. So expecting hashes to work like versions in other ways is probably incorrect.

I should add a disclaimer here - I don’t use hashes myself, so I don’t know why it’s important to require hashes for everything, but it’s definitely a deliberate decision, so I can only assume it’s important.

They definitely aren’t useless. The linked issue is from someone who wanted hashes in constraints files to be respected, so at the very least, what we now have is useful for them. And we’ve had no-one else complain that hashes in constraints files are useless, or even that they don’t work for the user’s workflow, which suggests that nearly everyone is OK with the current behaviour (or that almost no-one uses hashes in constraints files, which I guess is a possibility). Either way, you seem to be the only person trying to use constraints in a way that’s incompatible with the current behaviour.

You could try to request a change solely on the basis that you need different behaviour. You’d almost certainly have to create a PR yourself, and there’s no guarantee it would be accepted in that case. I’d be against it, because I don’t think addressing one user’s requirement is sufficient to justify the maintenance burden. But you may get other pip developers to support you.

Or you could try to argue on principle, that a different approach gives a cleaner mental model of what’s going on, and hence a more understandable and maintainable design. That’s certainly possible. But you’d have to start from the existing design, which I’ve explained here, and describe how your new design is better - and how it will continue to support existing workflows (including ones that you might not even know about). That’s difficult, but if you succeed you’ll have a much better chance of getting support. You’d still probably have to write the PR yourself, though - there’s only a certain amount of resource in the pip team, and our (volunteer) time is usually quite full.

To be honest, I think you’d be better looking at how to adjust your workflow to work with the current design, rather than hoping that you can continue the way you are and pip will change to make it work. At the very least, you’d be able to sort something out now, rather than waiting for a PR to land and a new release of pip.

ericvsmith · August 31, 2022, 3:42pm

I extensively use hashes in constraints.txt files, so they’re not useless! I use them in combination with requirements.txt, which also contains hashes (obviously).

That said, I wish I didn’t have to do this at all. All I really want to say is “do not install anything unless I’ve specified it in requirements.txt or constraints.txt”. The only way I know to do that is with hashes. I’m not using PyPI, so I completely control the packages that are being installed. I’m unconcerned about their hashes (because only I can modify them), but I’m forced to use hashes anyway. This requires a bit of a dance when I deploy new packages.

Anyway, sorry for sidetracking.

adriangb · August 31, 2022, 3:57pm

Sorry for the word useless folks, that was way too strong of language. Perhaps broken for my use case would have been better.

Yeah I hear you. I realize changing behavior is hard to impossible. Unfortunately, the project layout I proposed and the use of .in files in combination with constraints files is by far the easiest way to handle a large Python monorepo that I’ve found without resorting to Pants/Bazel/a lot more complexity. And it’s so close to being perfect (for me), the only thing missing is checking hashes. If you have any alternative suggestions that still achieve the same thing, I am all ears.

I may do a bit more reading of the related issues. It seems to me like summarizing use cases for constraints files would be useful for further discussion. Not sure I’ll have the knowledge and energy required to compile this but I may try.

The “or” here is confusing to me. You’d want something installed if it’s specified in constraints.txt but not requirements.txt?

The fundamental problem for my workflow and having hashes in the requirements files is that the constraints file is the output of “compiling” the requirements files. I’d have some sort of circular logic if I take the requirements file, create a constraints file with hashes from it and then am forced to put those hashes back into the requirements file. I’ve done this before: basically you have to create the constraints file, then use that to create requirements.txt files for each requirements.in file. Unfortunately pip-tools does not support that and the workarounds get super ugly and broken.

ericvsmith · August 31, 2022, 4:15pm

I want something to be installed if it’s in requirements.txt or it’s needed as a dependency (direct or indirect) of something that’s in requirements.txt. If such a dependency is not listed in requirements.txt or constraints.txt, I’d like it to be an error.

I currently achieve this by using hashes in both files, but I don’t really care about the hashes, I’m just using them as a means to my goal.

I should mention that in both files I pin specific versions, which is my goal: to install specific wheels. Again, using hashes achieves the same thing, but they are an additional hassle to manage (especially when I’m targeting multiple platforms or Python versions with the same requirements.txt and constraints.txt).

pf_moore · August 31, 2022, 4:37pm

So you want to say “Install what’s in requirements.txt. I’ve noted in constraints.txt what I expect to be pulled in as dependencies, and if anything I didn’t mention is pulled in, shout and stop the install.”

Is that correct?

If so, the new pip install --dry-run --report option might well let you replace your constraints.txt with a new expected.txt and then do something like

pip install --dry-run --quiet --report - -r requirements.txt | check-expected expected.txt
if (previous command succeeded)
    pip install -r requirements.txt

Whether it’s better than adding hashes that you don’t really care about is something for you to decide.
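For what it's worth, the check-expected step could be sketched roughly as below. This is only an illustration under assumptions: it expects the JSON installation report to have a top-level "install" list whose items carry "metadata" with a "name", and it treats the expected list as a plain set of project names. The names normalize and unexpected_packages, and the sample report contents, are made up.

```python
import re


def normalize(name: str) -> str:
    # Project name normalisation (PEP 503 style) so "Foo_Bar" matches "foo-bar".
    return re.sub(r"[-_.]+", "-", name).lower()


def unexpected_packages(report: dict, expected: set) -> list:
    """Names pip plans to install that are not in the expected set."""
    allowed = {normalize(name) for name in expected}
    planned = [item["metadata"]["name"] for item in report.get("install", [])]
    return sorted(name for name in planned if normalize(name) not in allowed)


# Hand-written stand-in for a report; a real one would come from pip's --report output.
sample_report = {
    "version": "1",
    "install": [
        {"metadata": {"name": "starlette", "version": "0.20.4"}},
        {"metadata": {"name": "anyio", "version": "3.6.1"}},
    ],
}
print(unexpected_packages(sample_report, {"starlette"}))
```

In practice the report would be loaded with json.load from whatever file the dry run wrote, and a non-empty result would be the signal to stop before the real install.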

ericvsmith · August 31, 2022, 5:00pm

Yes, that’s correct. There might be other stuff in constraints.txt that doesn’t get pulled in, but I don’t think that’s germane here.

Thanks for the tip on --report. I’ll check it out once I’ve updated my client to a newer Python version.