You have to differentiate a bit more. SSL is not only used for HTTPS requests. It’s also used by e.g. IMAP, POP3, SMTP, FTP, etc. Many protocols that initially did not support TLS received a way to convert a regular connection into a TLS protected one (usually via some sort of STARTTLS). Just talking about HTTPS in this context is too high level.
Now, when it comes to the API, most of these implementations only need a bit of context setup and then wrap the socket. After that, they don’t really care much about the OpenSSL details anymore. So, in a way, we already have the necessary abstraction: pass in a few setup parameters, then wrap a socket provided by the application to talk TLS.
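To make the “context setup, then wrap the socket” pattern concrete, here is a minimal sketch using the existing ssl module. The helper names make_client_context and open_tls_connection are made up for illustration, and the defaults (port 443, a 10 second timeout) are assumptions rather than anything proposed in this thread.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    # create_default_context() turns on certificate verification and
    # hostname checking and loads the trusted CA certificates it can find.
    return ssl.create_default_context()


def open_tls_connection(host: str, port: int = 443,
                        timeout: float = 10.0) -> ssl.SSLSocket:
    # The application owns the plain TCP socket ...
    raw_sock = socket.create_connection((host, port), timeout=timeout)
    try:
        # ... and hands it over to be wrapped. The TLS handshake runs here.
        return make_client_context().wrap_socket(raw_sock, server_hostname=host)
    except Exception:
        # Do not leak the TCP socket if the handshake fails.
        raw_sock.close()
        raise
```

A protocol implementation would call open_tls_connection("imap.example.org", 993) (a placeholder host) and then read and write on the returned object as on any other socket.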
Many of the other APIs in the ssl module are not needed by these protocol implementations. They do have their use cases, but this would be something where we could say: please install pyopenssl instead.
Things are a bit different for the hash algorithms we pull in from openssl, but I guess most of these can be provided by platform libs in a compatible way - or we simply fall back to the plain C implementations.
Edit: here’s a link directly to a “full” Schannel sample (not a full HTTPS sample, it uses a trivial “length+message” protocol for demo purposes, and I can’t even tell what kind of auth it’s using without diving into a few more API calls to see what their defaults are, but it gives a better taste of what the API shape is than the first link above): Using SSPI with a Windows Sockets Client - Win32 apps | Microsoft Learn
Of course they’re going to vary; this is why nobody wants to write a widely compatible web server implementation anymore. But in theory, the protocol should be the same.
Let me try … If all you want is to retrieve web content via HTTPS, you can often use higher level APIs provided by the OS to achieve this. Steve pointed out how this would work on Windows.
However, OpenSSL goes a level below this by working directly at the socket level. Those OpenSSL APIs don’t know anything about HTTP as a protocol, nor any other protocol. They work at the transport layer of the network stack.
I don’t believe this low-level, socket-wrapping, STARTTLS-style API belongs in the standard library, frozen with every CPython release, anymore.
Do all of the WASM platforms even have a concept of sockets? (Surely nobody wants to suggest OpenSSL should be compiled to WASM and trusted for non side channel attackable crypto).
Yes, this also means things like IMAP, POP3, SMTP, and FTP-TLS do not either. Those are drained batteries, still charged and used by a few, but they’ll all be much better served by PyPI packages that can be securely fetched. (tying in with that ongoing recurring “removing dead batteries from the standard library” thread)
… Yeah don’t worry at all about those. We’ve got non-openssl fallback solutions built in already in hashlib and can change those under the hood quite easily.
It may seem ironic that I created hashlib in large part for the purpose of using OpenSSL to get world class performant implementations. It remains a world class source of those, but it isn’t the only way to get them and it would not even be a problem to keep _hashopenssl around for that even if the ssl module were gone. That set of C APIs never changes.
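A small sketch of why callers are insulated from that choice: code written against hashlib never names the backend, so whether a digest comes from OpenSSL or from a built-in fallback is invisible at this level. The helper name sha256_hex is made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    # hashlib picks the implementation; the caller only names the algorithm.
    return hashlib.sha256(data).hexdigest()


# algorithms_guaranteed lists the algorithms present on every platform,
# regardless of which backend provides them.
print("sha256" in hashlib.algorithms_guaranteed)
print(sha256_hex(b"abc"))
```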
It’s certainly a very simple way of providing TLS to higher level protocol implementations and “simple is better than complex”.
Given that PEP 543 suggests using a wrapped socket as one of its core features as well, I suppose this can be had with all native OS TLS libraries. I only know OpenSSL, so can’t comment on other OS level implementations.
This was the point where I started pushing back on PEP 543, despite being initially in favour, because it’s a simple requirement with incredibly complicated implications. I assume it’s easier to do on macOS though, so the authors decided it was a fine requirement to have.
If it gets resurrected, I’ll have to continue to oppose it on these grounds until that requirement is replaced with something higher level. Otherwise, we end up with churn but still have to bundle OpenSSL on Windows even for basic requests (and users probably still have to manually inject configuration that the OS already knows but OpenSSL doesn’t handle properly).
2020-06-25: With contemporary agreement with one author, and past agreement with another, this PEP is withdrawn due to changes in the APIs of the underlying operating systems.
IIRC what this is referring to is macOS dropping support for wrapping sockets in the platform TLS support. Secure Transport is flexible enough to implement PEP 543, but Apple deprecated it. The new, supported TLS implementation is part of the Network framework, which is an entire new networking API that you have to buy into wholesale – you can’t use the BSD socket API and macOS platform TLS together.
Asyncio’s TLS support also requires use of some lower-level APIs that I’m not sure are available on Windows. (In particular, it wants both “sans IO” TLS where it controls the underlying network transport, and on Windows certificate validation can block, so if using the platform validator it needs to be able to push that off into a thread.)
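For readers unfamiliar with the “sans IO” style, this is roughly what it looks like with today’s OpenSSL-backed ssl module: the TLS object only reads from and writes to in-memory buffers, and the caller moves the bytes over whatever transport it likes. The helper names new_sans_io_client and pump_handshake are made up for illustration; this is a sketch of the first handshake step only, not how asyncio is actually structured.

```python
import ssl


def new_sans_io_client(server_hostname: str):
    ctx = ssl.create_default_context()
    incoming = ssl.MemoryBIO()   # bytes received from the peer go in here
    outgoing = ssl.MemoryBIO()   # bytes to send to the peer come out here
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=server_hostname)
    return tls, incoming, outgoing


def pump_handshake(tls: ssl.SSLObject, outgoing: ssl.MemoryBIO) -> bytes:
    # No socket is involved: do_handshake() queues whatever it can in the
    # outgoing buffer and raises SSLWantReadError when it needs peer data.
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass
    return outgoing.read()
```

The caller sends the returned bytes itself, feeds the peer’s reply into incoming.write(), and calls pump_handshake() again until the handshake completes.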
I can definitely see the appeal of trimming down the stdlib to the point where it only has enough TLS support to run pip, and putting everything else on PyPI. But that’s a long way off. A more achievable goal would be to spin out ssl (and asyncio and everything else that depends on it) into third-party packages that ship with CPython, but can be upgraded from PyPI.
Beyond what was discussed in the SC meeting this past Monday, I believe Christian explicitly said he did not want to ship CPython with OpenSSL 3.0 and I have not heard him changing his opinion on that.
If you count Thomas and Greg, last Monday.
I vote +1 on a new module or dropping the entire module in favour of a fetch-like API for HTTPS.
I don’t think any of us are suggesting it is, but we don’t have to directly wrap it and expose it either. I mean how many people other than Christian know this stuff? That’s not a great bus factor to have for this sort of security-related stuff.
I think one of the questions is whether that’s useful enough to continue to have in the stdlib?
Depends. WASI has the equivalent of sock.accept(4). Browser-based stuff is at the mercy of the browser, which means only web sockets and HTTP(S). The discussions around sockets are considering not even offering raw sockets and instead just going straight to HTTPS. Otherwise the runtimes could choose to provide some sort of socket support if they wanted.
I figure it may be useful to provide some thoughts on what we’re doing over at pyca/cryptography. For those of you not familiar, it’s the most widely used cryptography library on PyPI, and it uses OpenSSL. We ship binary wheels for macOS, Windows, and Linux that statically link a copy of OpenSSL, and those wheels represent 98% of our downloads.
In April of this year we moved our wheels from building against OpenSSL 1.1.1 to OpenSSL 3.0 (users are still able to build their own copies with OpenSSL 1.1.1).
Since then, OpenSSL has had to yank two releases:
One due to a buffer overflow in their RSA implementation on x86-64 CPUs with certain features enabled. This error was causing their CI to crash intermittently, but they missed it.
The other due to changing the API contract of a set of utility functions used internally to OpenSSL, which inadvertently broke innumerable internal callers. This was not caught by OpenSSL’s own tests, but pyca/cryptography’s tests caught it immediately.
Further, the OpenSSL 3.0 development process was characterized by innumerable regressions due to poor testing. Search the OpenSSL issue tracker for issues filed by myself (“alex”), “reaperhulk”, or “sfackler” (the author of Rust’s OpenSSL bindings) to see a great many of them, almost all of which were found by running our own test suites.
In the immediate post-heartbleed period, significant effort went into improving OpenSSL’s quality, and it was successful. However, the 3.0 process has regressed this significantly.
As a longer term goal, we hope to find a solution that incorporates more verified cryptography, and cryptography implemented in memory safe programming languages.
My bad vibes around OpenSSL come from things like:
#7948 / #7967 – OpenSSL decided to implement TLS 1.3 session tickets in a way that can cause deadlocks or data loss. I filed the bugs ~4 years ago; they’re still collecting reports from users hitting them today (e.g. Aymeric Augustin just posted on one of those). The official resolution is that 1.1.1 will never be fixed, and 3.0 will remain broken by default, unless you explicitly use unbreak-me APIs (which Python doesn’t currently wrap). BoringSSL not only got this right from the start, but also tried to explain the issue to the OpenSSL devs at the time, and were kind enough to review the elaborate janky workaround I had to implement in Trio.
#18625 – OpenSSL tried to port over a change from BoringSSL, but in the process introduced an attacker-controlled heap corruption; their own test suite detected the problem but they shipped it in a point release (!) anyway. It took >2 weeks for them to ship a fix, and in the meantime they posted an incorrect workaround, and only corrected it after the main BoringSSL maintainer pointed it out.
oh it looks like Alex just cross-posted with me, see his post for more.
I absolutely agree that switching to BoringSSL would be risky and need some careful planning if it makes sense at all; it’s just that the alternatives also suck enough that we might want to keep it on the list of options to consider.
I would be super interested to hear more about the missing features.
Hmm, fair. I was imagining that since CPython is one of the projects that BoringSSL actually does support internally, it might be possible to work out some amount of upstream changes etc. to make it supportable. But I guess with CPython supporting multiple releases simultaneously, this could get tricky once Google internally drops support for older CPython.
It would definitely be a lot more reasonable if we could update ssl separately from the CPython release cycle though.
I don’t have a horse in this race (and I’m certainly bemused by certain decisions of the OpenSSL project), so my question was not about defending OpenSSL but about what a good trade-off is for CPython.
I’m however still willing to grant some leeway for the volume of regressions as a consequence of the sheer scale of their 3.0 refactor (which was like ~30% of the commits out of a 20 year project history at the time of release); the recent embarrassment with the punycode function CVE aside, perhaps things will stabilise a bit now that this refactor has been digested… But I realise hope’s not a strategy, haha.
But OpenSSL runs the pyca/cryptography test suite in its CI…? Obviously I wasn’t following this nearly as closely as you, just surprised to read that.
The discussion sounds like there’s no optimal solution to the problem.
OpenSSL 3.0 still needs to get more stable
BoringSSL is not a reliable replacement in the sense that they only support Google internal releases of CPython and not necessarily all actively maintained public CPython releases
Platform APIs each do their own thing and don’t necessarily provide functionality needed to implement the simple “wrap a socket” API; they are also not necessarily stable enough to bet on long term as can be seen with the deprecation on macOS
Switching to a different 3rd party lib such as NSS would just bring along the same issues we have on Windows and macOS (having to bundle it with CPython releases), in addition to also adding this requirement to Unix platforms
The best bet at this point seems to be to hope that OpenSSL 3.0 stabilizes in time for the 1.1.1 EOL. The OpenSSL team is looking for more (paid) help and does seem to be better funded at this point, so things may well get better soon.
In the meantime, we could add a new simple http.fetch() API which interfaces directly to the OS platform APIs for fetching arbitrary URL data and really only does this (without all the bells and whistles of defining ciphers, certificate stores, authentication, POST parameters, callbacks, etc. - keeping things simple and leaving all of the config and maintenance to the OS).
pip and friends could then bootstrap using this API and pull in externally maintained libs as needed.
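Purely to illustrate how small such a surface could be, here is a hypothetical sketch. No http.fetch() exists today; the name, signature, and HTTPS-only check are assumptions. The body uses urllib.request as a stand-in, which itself goes through the ssl module and OpenSSL, so this shows only the API shape and not the proposed platform backend.

```python
import urllib.request
from urllib.parse import urlsplit


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Hypothetical minimal fetch: one URL in, response body out."""
    # Deliberately no knobs for ciphers, certificate stores, auth, or POST.
    if urlsplit(url).scheme.lower() != "https":
        raise ValueError(f"only https:// URLs are supported, got {url!r}")
    # Stand-in backend: a real implementation would call the OS HTTP API.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

A bootstrap tool would then need nothing more than fetch("https://...") to download an externally maintained TLS or HTTP package.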
I just wonder what we could use on Unix platforms for this. My best bet would be interfacing to libcurl, adding a new dependency to CPython. On the plus side, libcurl is widely used.
I very much agree with that. My suggestions (if I had a lot more hours in a day to do them myself):
Maintain compatibility with both 1.1.1 and 3.0. Use 1.1.1 for binary Python distributions for another while.
See what it’d take to add semi-official BoringSSL support to upstream CPython. What we do in cryptography is run against a fixed git SHA of BoringSSL in our CI, and we have a bot that bumps it regularly.
Revitalize the PEP for an OpenSSL-agnostic TLS API.
That’s historically been the suggestion of how to handle Unix platforms. And since curl is probably in the top 5 most widely shipped pieces of code in the world, it’s probably pretty safe to use.
I’m pretty sure libcurl has its own maintenance tire fires? Regardless it is an example I’d have given as well for posix systems. There are presumably others. Channeling Alex, something verified cryptography based might be ideal. It’s a matter of availability, long term viability, and trust. This part does not have to be decided now.
Despite running our test suite, it turns out they ran it against the wrong version of OpenSSL (system rather than the one they built in CI) from the date it was added until just a month or two ago when I debugged and fixed it.
Given that Alex and I work on the same project it’s probably unsurprising to discover I share similar feelings. I will also add that the OpenSSL project’s choices regarding QUIC support have resulted in a quictls/openssl fork (which node.js ships) and no real timeline for useful support in OpenSSL itself.
I think we should keep the OpenSSL dependency on Linux (or: non-macOS Unix-like platforms, in general).
As a distro maintainer, I’d prefer keeping _hashopenssl rather than relying on vendored code for cryptographic hashes. The usual reasons against vendoring apply: the inevitable CVEs should get fixed once by the distro’s crypto experts, not by multiple maintainers in everything that needs a hash function.
Even if CPython makes some deal with BoringSSL, I don’t think it’ll extend to distros with longer maintenance periods. For redistributors it seems it’d be safer to build with OpenSSL.
(And with my red hat on: In RHEL, where alternate cryptography is exceptionally costly, we’d probably need to patch OpenSSL support back in. But I don’t think CPython should take RHEL issues into account, they’re more about compliance than good crypto.)
AFAIK, the OpenSSL 3.0 API needed for “fetch” works fine. And it’s already wrapped.
Or is the suggestion to use libcurl on all platforms?