Maybe. The existing rules in packaging were created by working out what worked in practice, so they probably include a lot of obscure knowledge that’s not documented anywhere. From what I recall, @brettcannon did most of the work, and it was fairly tricky, so it would probably be worth getting his view.
Generally I’m in favour of standardising whatever we can, but I’m concerned that if we try to standardise this, we could end up down a pretty large rabbit hole.
Yep. There’s likely uv at least.
I’m not at all sure, to be honest. The behaviour implemented by packaging is in effect a de facto standard at this point, and tools could be relying on any part of it. There are also a number of public “low level” APIs in packaging, documented as:
These functions capture the precise details of which environments support which tags. That information is not defined in the compatibility tag standards but is noted as being up to the implementation to provide.
Tools could quite legitimately depend on any or all of those functions working as documented.
Yes, that’s the sort of arcane knowledge I was alluding to above. I have no idea why that is, and it seems to contradict your idea that abi3 is compatible with earlier versions. Or maybe it’s significant that this is CPython 3.3, when abi3 was introduced? Who knows?
Sorry, yes, that was an oversimplification - how tags are considered is based on what tag sets the system declares it supports, which comes from packaging (or whatever other implementation you’re using). I tend to think of that as implying “each tag must match independently” because that’s the situation in the simpler cases, and it’s hard to explain the real rule.
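To make the difference concrete, here’s a minimal sketch of “match on whole tag sets” versus “each tag must match independently”. The `SUPPORTED` list is made up for illustration (a real one would come from packaging or an equivalent implementation), and the helper names are mine, not anything in packaging:

```python
from itertools import product

# Hypothetical supported tag sets, for illustration only. A real list would
# come from packaging.tags.sys_tags() or whatever the tool uses instead.
SUPPORTED = [
    ("cp312", "cp312", "linux_x86_64"),
    ("cp312", "abi3", "linux_x86_64"),
    ("cp312", "none", "linux_x86_64"),
    ("py3", "none", "any"),
]


def expand(compressed):
    """Expand a compressed wheel tag such as 'py2.py3-none-any' into triples."""
    interpreters, abis, platforms = (part.split(".") for part in compressed.split("-"))
    return set(product(interpreters, abis, platforms))


def matches_by_tag_set(compressed):
    """The real rule: some complete triple must be in the supported list."""
    return any(triple in SUPPORTED for triple in expand(compressed))


def matches_independently(compressed):
    """The simplified mental model: check each component on its own."""
    interpreters = {t[0] for t in SUPPORTED}
    abis = {t[1] for t in SUPPORTED}
    platforms = {t[2] for t in SUPPORTED}
    return any(
        i in interpreters and a in abis and p in platforms
        for i, a, p in expand(compressed)
    )


# Both models agree on the simple case...
print(matches_by_tag_set("cp312-abi3-linux_x86_64"), matches_independently("cp312-abi3-linux_x86_64"))
# ...but disagree when the components are individually known and the
# combination is not one the system declared.
print(matches_by_tag_set("py3-abi3-linux_x86_64"), matches_independently("py3-abi3-linux_x86_64"))
```

With this made-up list, `py3-abi3-linux_x86_64` passes the independent check but is not a declared tag set, which is the sort of case where the simplified description breaks down.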
It’s not the installer’s job to apply any rules other than “run down the list of supported tags, and pick a wheel that matches the earliest tag set on the list that it can”. It’s the interpreter’s job to say what tag sets it supports - we do that in packaging because getting packaging-specific details like this into the stdlib has generally been too difficult (plus, we’d have no way of getting anything added to older Python versions).
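As a rough sketch of that installer rule (not how pip or any other tool actually implements it), assuming a hypothetical priority-ordered `SUPPORTED` list and made-up wheel filenames:

```python
# Hypothetical priority-ordered list of supported tag sets, most preferred
# first. In practice the interpreter-specific knowledge behind this list is
# what packaging (or an equivalent) provides.
SUPPORTED = [
    ("cp312", "cp312", "linux_x86_64"),
    ("cp312", "abi3", "linux_x86_64"),
    ("cp311", "abi3", "linux_x86_64"),
    ("py3", "none", "any"),
]


def pick_wheel(supported, wheels):
    """Return the wheel matching the earliest supported tag set, or None.

    `wheels` maps a wheel filename to the set of (interpreter, abi, platform)
    triples that the wheel declares.
    """
    for tag in supported:  # run down the list in priority order
        for name, tags in wheels.items():
            if tag in tags:
                return name
    return None


# Made-up candidate wheels for a hypothetical project called "demo".
wheels = {
    "demo-1.0-py3-none-any.whl": {("py3", "none", "any")},
    "demo-1.0-cp311-abi3-linux_x86_64.whl": {("cp311", "abi3", "linux_x86_64")},
    "demo-1.0-cp39-cp39-linux_x86_64.whl": {("cp39", "cp39", "linux_x86_64")},
}

print(pick_wheel(SUPPORTED, wheels))
```

The installer side has no opinion on why `cp311-abi3` ranks above `py3-none-any` here. All of that judgement lives in how the supported list is built and ordered.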
So packaging is essentially a place to store information about the various interpreter implementations, in this context. And tools that can’t (or simply don’t) use packaging need to construct that set of information for themselves somehow.
I also don’t see a problem with that, other than the combinatorial explosion of tag combinations we have to return from packaging.tags.sys_tags, and the need to decide on the correct order of priority (is a py32-abi3 wheel better or worse than a cp33-abi3 wheel, or a cp33-cp33 wheel?).
A big part of the issue here is that a lot of the questions that need to be answered are basically theoretical, until someone actually publishes a set of wheels with the particular set of tags that we need to decide between…