Proposal: overrides for installers

NeilGirdhar · January 16, 2024, 10:23am

To mitigate the effort, it might be easier to add a feature to tomllib whereby a TOML override file can be used to override the data loaded from a TOML file. Thus, all such projects would simply replace:

data = tomllib.load(fp)

with

data = tomllib.load(fp)
data = tomllib.update(data, override_fp)  # proposed function; tomllib has no update() today

So, this feature would boil down to writing one update function, and then tools can opt in to using that function.

ajoino · January 16, 2024, 11:53am

Maybe I’m wrong or misunderstanding something but isn’t

data = tomllib.load(fp)
data |= tomllib.load(override_fp)

equivalent and works already?

NeilGirdhar · January 16, 2024, 11:57am

Cool! Is that all that’s needed for libraries to implement this proposal? Sorry I don’t have time to investigate, but if so, that would be a huge step.

ajoino · January 16, 2024, 12:03pm

Dunno about the proposal, but dicts can be updated either with the dict.update() method, using the {**a, **b} pattern, or the | and |= operators since 3.9 I think?

Sorry for the OT.

NeilGirdhar · January 16, 2024, 12:06pm

Oh, a dict update is probably not good enough. It needs to recursively update sub-elements.
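To illustrate the difference, here is a minimal sketch comparing the `|` operator with a recursive merge. The `deep_merge` name and the sample data are made up for illustration; nothing like this exists in tomllib.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` applied on top of `base`.

    Nested dicts (TOML tables) are merged key by key; any other value
    in `override` replaces the value in `base`.
    """
    merged = dict(base)  # shallow copy so the top level of `base` is not mutated
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


base = {"project": {"name": "demo", "dependencies": ["requests"]}}
override = {"project": {"dependencies": ["requests>=2.31"]}}

# A plain dict union replaces the whole "project" table, so "name" is lost.
shallow = base | override

# The recursive version only touches the keys the override mentions.
deep = deep_merge(base, override)
```

With the shallow update, `shallow["project"]` contains only `dependencies`; with the recursive version, `name` survives and only `dependencies` changes.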

pf_moore · January 16, 2024, 12:27pm

Well if it is enough, and you don’t have time to implement it, then I guess that demonstrates that even this small amount of work is too much to attract contributions…

Sorry if that sounds snarky, but the reality here is that none of the tools needs someone to point out a simple way of implementing the requested feature - they need someone to actually create a PR that does so. A lot of discussions around packaging tools and features fizzle out at the point where it’s been established that a bunch of people want something, and that “it looks easy to do”, but no-one ever actually tries to do it. And the reality is that it’s only when you try to implement the feature that you find out why it’s not as easy as you’d hoped.

So the maintainers get a reputation^[1] of blocking good ideas, users get frustrated that tools don’t care about their use cases, and everyone gets burned out until the next idea comes along. And we make no progress.

If you (or anyone else) genuinely want to progress this issue, then I’ve described most of the points that I think are relevant in the post that contained this one comment that you quoted:

If you only care about pip, I already pointed out the existing pip issue for this. Feel free to look at the history on that and propose a solution there.
If you care about a standard mechanism that all tools will use, there’s also poetry and pipenv issues linked in that post. You’ll need to get consensus on all those projects for whatever solution you’re suggesting.
You’ll need to look at the other practical points I mentioned. Maybe I’m being pessimistic (if your idea of simply merging 2 dictionaries is enough, then at least one of my points was overstated) but the only way to know that is to try some ideas out.

Again, my apologies if this sounds like I’m dismissing your suggestion. I suppose I sort of am, but given that this thread had been dormant since last March, I think it’s likely to need something more than this to have any chance of actually moving forward.

[1] which we don’t like having, to be clear!

NeilGirdhar · January 16, 2024, 12:47pm

Perfectly fair points.

In the spirit of moving this thread forward, I think a first step would be to make a list of all of the requested types of overrides. That way we can ensure that whatever mechanism is implemented satisfies them all.

So far, we have the ones mentioned at the top (adding a section with requirements; adding a section with an index-url), and the two I mentioned.

Adding sections seems easy. My suggestions require replacing elements, which should also be easy. There may be suggestions that require adding things to lists. Will order matter? What if someone wants to delete something from a list, or delete an entry? In that case we may need a way to specify deletions.
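As a rough sketch of why these questions matter: TOML has no null value, so an override file cannot delete a key by setting it to "nothing", and any deletion needs an explicit convention. The `"-delete"` marker key and the `list_mode` parameter below are invented purely for illustration and are not part of any proposal in this thread.

```python
DELETE_KEY = "-delete"  # hypothetical marker: a list of keys to remove from a table


def apply(base: dict, override: dict, list_mode: str = "replace") -> dict:
    """Apply `override` to `base`, with made-up rules for lists and deletions."""
    merged = dict(base)
    # Deletions first: drop every key named under the marker.
    for key in override.get(DELETE_KEY, []):
        merged.pop(key, None)
    for key, value in override.items():
        if key == DELETE_KEY:
            continue
        current = merged.get(key)
        if isinstance(value, dict) and isinstance(current, dict):
            merged[key] = apply(current, value, list_mode)
        elif isinstance(value, list) and isinstance(current, list) and list_mode == "append":
            merged[key] = current + value  # order matters: overrides go last
        else:
            merged[key] = value  # default: the override replaces the value
    return merged


base = {"project": {"dependencies": ["a", "b"], "readme": "README.md"}}
override = {"project": {"dependencies": ["c"], DELETE_KEY: ["readme"]}}

replaced = apply(base, override)
appended = apply(base, override, list_mode="append")
```

Here `replaced` ends up with `dependencies == ["c"]`, `appended` with `["a", "b", "c"]`, and `readme` is gone in both. Removing a single element from a list would need yet another convention.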

I think if we can narrow down the scope, we may be able to get something rolling.

pf_moore · January 16, 2024, 12:56pm

Honestly, I think the best way to start would be to simply implement @sinoroc’s original specification, linked in the first post on this thread. It may not get accepted as it stands, but working code is a far more compelling argument than any sort of debate. And even if the proposal there doesn’t do what you want, implementing it would give you a much better understanding of what’s needed for any other proposal than simply talking about it.

For a somewhat easier starting point, just produce an implementation for one tool. While doing that won’t give you a good feel for how standardising a cross-tool capability differs from implementing a tool-specific feature, it’s still a great start.

NeilGirdhar · January 16, 2024, 1:37pm

I think it’s something like this. I haven’t tested it or anything so please don’t take this to be a proof of concept. It’s just an illustration of how I imagine this would work. The idea would be to bake the override process (apply_overrides in the code) right into tomllib and tomlkit, and then there isn’t so much code for libraries like poetry to add to their code.

pf_moore · January 16, 2024, 2:28pm

I can’t comment on what the poetry maintainers would think of this, but for pip, you’ll need something that works with older Pythons, so relying on a new feature in tomllib is a non-starter for the immediate future.
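For context, `tomllib` itself only exists in the standard library from Python 3.11 onward. Tools that support older versions commonly fall back to `tomli`, the third-party package `tomllib` was based on, which offers the same `load`/`loads` functions. A minimal sketch of that pattern (the sample document is made up):

```python
import sys

if sys.version_info >= (3, 11):
    import tomllib
else:
    # Third-party backport; must be installed separately on older Pythons.
    import tomli as tomllib

parsed = tomllib.loads('[project]\nname = "demo"\n')
```

Any override logic written as an ordinary function over the parsed dicts would work with either parser, whereas a new function inside `tomllib` would only be available on Python versions released after it was added.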

NeilGirdhar · January 16, 2024, 2:32pm

Couldn’t you just vendor the single function?

pf_moore · January 16, 2024, 2:42pm

Sigh. As I’ve said, prepare a PR and we’ll see. Honestly, I don’t know but I doubt it (our vendoring process vendors packages from PyPI, not bits of code from the stdlib). Copy and pasting a function definition is a possibility, but I’m not at all sure I would be happy with taking on that maintenance burden (and I can’t speak for the other pip maintainers).

Anyway, this is exactly the sort of thing I don’t think it’s productive to endlessly discuss here. Create a PR and we can see if we can resolve any questions. Otherwise, the reason this proposal hasn’t been implemented is simply “no-one has written any code”. Everything else is incidental.

NeilGirdhar · January 16, 2024, 2:53pm

I understand what you’re getting at, but from the other side, no one wants to invest weeks writing code only for the answer to be “no”. I remember when I worked for months on PEP 448’s implementation and the initial reaction of the core developers was to reject it. Thankfully, it was eventually merged, but it was looking for a while like all of that work was going to be wasted.

So I think it’s the same thing here. It helps to get an idea of whether that work will be accepted. If we had a clear positive or negative indication of whether something was likely to be accepted, it would motivate the time investment of implementing something. On the other hand, I see your point that you don’t want to answer questions without code that you can look at. Ideally, there would be some middle ground, e.g., working from a design document.

You could vendor the entire tomllib with the changes in place if that makes you more comfortable.

pf_moore · January 16, 2024, 3:13pm

Your best starting point is to look at the issues linked above to see what the various projects think of this idea. And getting a consensus for a multi-project standard is even harder to assess.

What I will say (because your comment is certainly fair) is:

For a proposal just implementing something in pip, I can’t even offer certainty that I will review it. My volunteer time is seriously limited, and whether I can review a PR is strongly affected by how complex it is. In this case, I think a PR will be complex, so I’m hesitant. You think it’ll be simple, so you don’t understand my reservations. I can claim that I know pip’s codebase better than you, hence my instinct is likely to be more accurate, but honestly, I have no wish to be that negative. If you can produce a simple PR^[1] I’ll try to find time to review it. But if my review consists of “you haven’t thought about X, Y, Z…”, and as a result your simple idea becomes complex, I’m sorry but I’ll have to drop it at that point.
If you want to write a standard that will be supported by all tools, then I’d probably be the PEP delegate. And in that context I can say that I will review and decide on it, as that’s my job. But for it to be a success, it would need to be a detailed design, with clear support from the various tools that would implement it (as well as from the community in general). Plus, at least some workable plan for how it would be implemented in one or more tools. I may comment in the discussion as an individual or a pip maintainer, but whether I do or not isn’t particularly important - I’m OK with accepting a PEP that I have personal reservations about, as long as there’s clear evidence that the community supports it and it’s of benefit to the ecosystem (which should have been captured in the PEP anyway).

(This discussion has probably taken up most of what was left of my open source time for today, so I’ll say no more for now).

[1] Remember, it’ll need docs, tests, etc. - just a proof of concept isn’t enough

NeilGirdhar · January 16, 2024, 3:15pm

Fair enough, thanks for the explanation. I was just trying to move this idea forward in a small way.

sinoroc · January 16, 2024, 7:57pm

I do not recall if it is mentioned explicitly, maybe indirectly, but if I understand correctly what you mean, then yes it is definitely what I have in mind with this.

No, not what I had in mind for the scope of this, but maybe it could be in scope. At least not what I would consider a priority.

That is not what this proposal is about. That would not work.

At this point, my attitude towards this idea is:

keep gathering possible use cases
hint at it here and there when I see a related discussion
in the hope of garnering some interest and maybe finding someone to champion this and get some implementation going or whatever

I encourage people to unsubscribe from this thread if they do not want the noise, which is perfectly fair. I do not think there will be any breakthrough anytime soon. If there is any breakthrough then assuredly there will be a new thread. We can also close this thread and redirect discussion to the gist instead.

NeilGirdhar · January 16, 2024, 11:19pm

Would you mind elaborating on what you need in order to implement this idea?

pradyunsg · January 17, 2024, 8:31am

Not quite. As far as I can tell, this plugs into and requires changes to how the installers’ package “concretising” logic works (i.e. going from a name to a resolved distribution) as well as dependency determination logic. The recursive merge of a mapping isn’t a thing to worry about in terms of the implementation work involved IMO.

I think this would be useful.

That said, I want to caution that we avoid doing too much design work “up front”, before (at least) considering implementation complexity in the existing tooling.

Could we? Yes, absolutely. Would we? I don’t know, depends on the function. I wouldn’t call it vendoring at that point tho.

As mentioned, most of the complexity of this lives in fairly-coupled-with-core-logic parts of the codebase. A function to do recursive dict merging isn’t really a chunk that moves the needle on the implementation story much IMO.

If I’m reading the room correctly, @pf_moore is more concerned about how the core logic (that has a lot of nuance associated with it) would be updated, rather than the specific function for merging dicts. I share that with him – how feasible this idea is depends on the use cases, the specific proposal design and whether someone is willing to put in the effort to implement this in an existing tool.

And, @NeilGirdhar was looking to help move this forward a bit by volunteering to do so! I appreciate the interest; however, this isn’t really a piece that moves forward in small pushes, sadly.

NeilGirdhar · January 18, 2024, 4:46am

Okay, I read through the gist more carefully and I think I see why the naive combination of toml files doesn’t currently work.

My proposal would be to do the work in multiple PEPs:

A PEP to provide additional indexes in a pyproject.toml and connect them with dependencies (poetry already has this feature, which they iterated on multiple times; it may be worth comparing any proposal with their solution),
A PEP to allow (possibly editable) path dependencies in pyproject.toml (poetry has this already),
Various other PEPs (I don’t understand everything in the gist), and finally
A PEP to interpret a very simple overrides.toml that simply modifies a pyproject.toml.

This has a few benefits:

it breaks a large change into multiple smaller changes that can be
- reviewed individually (greater chance of passing review),
- implemented individually (easier on reviewer, easier to find implementers), and
- scrutinized individually (potentially better design),
it motivates the poetry people to finally adopt PEP 621, which they are resisting because it doesn’t support everything they do, and
it’s easier to understand the pieces than a giant PEP.

sinoroc · January 18, 2024, 9:32pm

Yes, this needs to be split, no doubt about it.

I am against this. Abstract dependencies belong in pyproject.toml, concrete dependencies do not. Maybe you meant overrides.toml in which case, yes, that is one of the main drivers for this proposal.

Sure, why not. Not what I have in mind, but we’ll see, maybe it comes naturally as an extension of what I have in mind.

That is just one aspect of what this proposal is about (But again, NOT in pyproject.toml! From my point of view Poetry got this wrong.). For me right now it is hard to see if it is a good aspect to use as a starting point.

To be fair, I also kind of lost track of all the use cases I considered in this. I would need to find time to get back on this topic.

There were a couple of categories I thought of:

Make it possible to use installation modifiers that are currently global, per dependency instead. For example pip’s --pre option (which kind of triggered me reviving this thread). See also how it has been added to PDM. And the big one would be if it were possible to specify one --index-url option per dependency.

Having all these as CLI flags would possibly become quite unreadable, so maybe a separate file used as input to the installer would be better.
Override faulty package metadata. Typically offer a way to override eager upper caps on version constraints.