I am not wholly convinced by this argument. For instance, pip has to talk to the network anyway, so why couldn’t it update its lookup table based on some canonical file maintained inside PyPI?
I am totally prepared to believe this; however, in an offline conversation, @pf_moore said he thought pip’s auto-self-upgrade mechanism ought to be sufficient to deal with it. So I’d like to ask both of you to respond specifically to this point.
@njs, do you have concrete evidence that users don’t get upgraded to new pips? Can you say more about that? What sort of work are those users doing with Python, what vintage of CPython and of Linux are they using, and is anything specifically known about why they are still using old versions of pip?
@pf_moore, when you told me that pip’s automatic self-upgrade ought to deal with this, was that based on anything more concrete than your intuition as a pip maintainer? If so, can you talk about that? If not, can you say something about what kind of evidence would convince you whether @njs is right to be concerned about this?
I am looking into these crashes in collaboration with the TensorFlow maintainers. I don’t have anything to report yet; however, my working hypothesis for the root cause is what I said in the perennial PEP PR:
If there is more than one copy of libstdc++ loaded into a process, the behavior of the entire program becomes undefined, in the sense in which the C and C++ standards use that term.
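For anyone who wants to check whether a given process is in this state, here is a minimal sketch (Linux only). It scans text in the `/proc/<pid>/maps` format and lists the distinct libstdc++ files mapped into the process. The function name is made up for illustration, and it assumes that a vendored copy is still recognizable by a file name beginning with `libstdc++`; a copy renamed beyond that would be missed.

```python
import os
import re

# Matches e.g. "libstdc++.so.6.0.28" and renamed vendored copies such as
# "libstdc++-a1b2c3d4.so.6.0.25" (assumption: the name still starts with "libstdc++").
_LIBSTDCXX = re.compile(r"^libstdc\+\+.*\.so(\.\d+)*$")


def libstdcxx_copies(maps_text):
    """Return the distinct libstdc++ file paths found in a /proc/<pid>/maps dump."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (pathname may be absent).
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, nothing to inspect
        path = fields[5].strip()
        if _LIBSTDCXX.match(os.path.basename(path)):
            paths.add(path)
    return sorted(paths)
```

Passing it the contents of `/proc/self/maps`, read after importing the suspect extension modules, and getting back more than one path would be consistent with the hypothesis. It would not prove that the duplicate copies are what causes the crashes.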
If this is correct, the fix will involve the manylinux spec. It will go something like this:
- The core interpreter will need to be linked with -lgcc_s. This needs to happen anyway, for unrelated reasons, so I went ahead and filed bpo-37395 for it.
- The core interpreter may need to find a way to load libstdc++.so.6 in the global ELF namespace, if and only if any extension module requires it. I hope this part won’t actually be necessary, because I don’t think there’s a good way to do it right now. Even if I put my glibc maintainer hat on and add one, that won’t help with older distributions.
- libstdc++.so.6 and libgcc_s.so.1 will need to be added to the list of libraries that are not to be included in a manylinux wheel. (As a consequence, wheels using C++ will fail to load if the C++ runtime is not installed as a system package. This is unavoidable.)
- Each future version of manylinux will need to specify a particular version of the C++ compiler, to be used for all extensions containing C++, directly or indirectly.
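The rule about not including libstdc++.so.6 and libgcc_s.so.1 in a wheel is easy to check mechanically. As a rough sketch, assuming only that a wheel is a zip archive and that vendored libraries keep a recognizable file-name prefix, something like the following would flag offending wheels. The names `EXCLUDED` and `bundled_runtime_libs` are invented for this example and are not part of any existing tool.

```python
import posixpath
import re
import zipfile

# Hypothetical exclusion list, taken from the proposal above.
EXCLUDED = ("libstdc++", "libgcc_s")
_SHARED_OBJECT = re.compile(r"\.so(\.\d+)*$")


def bundled_runtime_libs(wheel):
    """Return archive members of `wheel` that look like bundled C++ runtime libraries.

    `wheel` may be a path or a binary file-like object; both are accepted by ZipFile.
    """
    with zipfile.ZipFile(wheel) as zf:
        names = zf.namelist()
    hits = []
    for name in names:
        base = posixpath.basename(name)  # zip member names always use "/"
        if base.startswith(EXCLUDED) and _SHARED_OBJECT.search(base):
            hits.append(name)
    return sorted(hits)
```

An empty result means the wheel relies on the system copies of both libraries, which is the situation the proposal asks for.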
The last bullet is the most important one and, unfortunately, I fear it may wreck the perennial versioning scheme. There is no reason why “glibc 2.34” should necessarily imply “g++ 11.0” or vice versa; the mapping will have to be maintained by hand, and now we’re back to what I understand is your (@njs’s) most concrete objection to continuing with manylinux_YEAR.
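To make the maintenance burden concrete, here is a deliberately tiny sketch of what such a hand-maintained mapping would look like. The table name and function are hypothetical, and the single entry just reuses the illustrative “glibc 2.34” / “g++ 11.0” pairing from the paragraph above; it is not a real or proposed policy.

```python
# Hypothetical hand-maintained table: glibc version -> required g++ version.
# The one entry is the illustrative pairing from the text, not an actual assignment.
GXX_FOR_GLIBC = {
    (2, 34): "11.0",
}


def required_gxx(glibc_version):
    """Look up the g++ version assigned to a glibc version; nothing can be derived."""
    try:
        return GXX_FOR_GLIBC[glibc_version]
    except KeyError:
        raise LookupError(
            f"no g++ version has been assigned to glibc {glibc_version!r}"
        ) from None
```

The point is that the lookup fails for any glibc version nobody has added by hand, because one version number cannot be computed from the other.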
(In case anyone is curious: no, you cannot mix C++ code compiled by LLVM with C++ code compiled by GCC, either.)
I don’t feel I have standing to raise a formal objection in this group, but in my opinion, the C++ issues above must be resolved before perennial can move forward. Adding “all C++ code must be compiled with G++ [version that shipped with CentOS 7]” to manylinux2014 is a one-line edit. The analogous edit to perennial would need to explain how to choose the appropriate version of G++ for each new tag, and I don’t think we even know where to begin with that yet.
And given that, I am in agreement with Nick when he says
although I have an additional caveat, which is that I think manylinux2010 has dropped the ball on rollout, and specifically on packager takeup (see here). So I would ask the people working on manylinux2014 to present a concrete plan and timeline (not as part of the PEP) for rollout.