Pre-PEP: Rust for CPython

True… but it does mean that you don’t have to be constantly on guard for the same kinds of spooky action at a distance as long as the interfaces into your modules with unsafe enforce the invariants properly.
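
To make the "interfaces enforce the invariants" point concrete, here is a minimal sketch. The `NonEmpty` type is hypothetical and invented for illustration; the only `unsafe` block is justified by a check in the constructor, so reviewers have one place to look.

```rust
/// Hypothetical wrapper (not from any real library): a byte slice that is
/// guaranteed to contain at least one element.
pub struct NonEmpty<'a>(&'a [u8]);

impl<'a> NonEmpty<'a> {
    /// The only way to build a `NonEmpty`; it rejects empty slices.
    pub fn new(slice: &'a [u8]) -> Option<Self> {
        if slice.is_empty() {
            None
        } else {
            Some(NonEmpty(slice))
        }
    }

    /// Safe to call from anywhere, even though it skips the bounds check.
    pub fn first(&self) -> u8 {
        // SAFETY: `new` refuses empty slices, so index 0 is always in bounds.
        unsafe { *self.0.get_unchecked(0) }
    }
}
```

In real code the type would live in its own module so that callers cannot construct it without going through `new`.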

forbid might “solve” memory safety, but not memory management.

Preventing memory leaks in Rust ultimately relies on the C-style protection, i.e. “hey, the docs warned you not to do that”. The forget and leak functions are safe, in Rust’s sense of the term.
That’s categorically better memory management than C (where leaving a dangling pointer is the default). But it’s no better than C++ with a linter.
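
A small sketch of what "safe, in Rust's sense" means here: the code below contains no `unsafe`, yet two of the three destructors never run. `Tracked` and `count_drops` are hypothetical names used only for this illustration; `std::mem::forget` and `Box::leak` are the real standard library functions.

```rust
use std::cell::Cell;
use std::mem;
use std::rc::Rc;

/// Hypothetical resource that records every time its destructor runs.
struct Tracked {
    drops: Rc<Cell<u32>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Creates three `Tracked` values and returns how many destructors ran.
fn count_drops() -> u32 {
    let drops = Rc::new(Cell::new(0));

    // Normal scope exit: the destructor runs.
    {
        let _scoped = Tracked { drops: Rc::clone(&drops) };
    }

    // Safe function, but the destructor is never run.
    mem::forget(Tracked { drops: Rc::clone(&drops) });

    // Also safe: the heap allocation is intentionally never freed.
    let _leaked: &'static mut Tracked =
        Box::leak(Box::new(Tracked { drops: Rc::clone(&drops) }));

    drops.get()
}
```

Calling `count_drops()` reports a single destructor call, for the scoped value only.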

We still need eagle-eyed expert reviewers. There’d just be vastly fewer places where they need to know to focus.
I’d not use forbid. It’s enough that Rust makes you explicitly write unsafe, or export_name, down in the code itself.

1 Like

True. Just before the v1.0 release, there was a realization that they couldn’t guarantee leak-free properties using the type system they built. They called it the “leakpocalypse” and rushed to exclude memory leaks from what they were guaranteeing.

I’m afraid that even Modern C++ with smart pointers does not let you completely safely manage memory. I’ve been bitten several times by this. I’ll also echo the blog post by Alex Gaynor discussing this that @ssokolow linked.

The examples in this blog post demonstrate that it is still possible to write code that looks safe but is subtly unsafe in C++. I think Rust’s approach to explicitly delineating where unsafety occurs is one of its best selling points. Being explicit about where unsafety occurs makes it much easier to review code. This is in effect what Petr says here:

Totally agree with this point. Unlike with C/C++, I can know exactly where I need to focus a code review (i.e. the part with unsafe) to make sure new code upholds memory safety invariants. This makes it much easier to trust the rest of the code’s safety and reduces the amount of mental load needed to review code, which you pretty much say here:

I think it should not be understated how significant this can be. I expect a large reason why Android found Rust faster to ship with is because of the reduction in review burden. It also likely leads to the lower reversion rate of Rust MRs that they saw.


Separately, I also want to point out that we can look at the experiences of projects that have chosen to move to Rust: Firefox, Chromium, and Android are all massive C++ projects already. Chromium, for example, has started adopting libraries to enforce spatial memory safety to prevent issues like buffer overflows. However, they describe these efforts as lower-cost, but also lower-reward:


(source: Memory safety)

So while we could introduce C++, the benefits are lesser than introducing Rust for the goal of increasing memory safety.

Finally, I would additionally point out that Rust has other benefits that are not possible in C++, such as compile time thread safety. With free-threaded Python becoming more important, eliminating data races will become more and more critical. Having language support to enforce thread safety would be a significant improvement to writing thread-safe code.
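
As a rough sketch of what compile-time thread safety looks like in practice (the function name `parallel_count` is made up for this example): the shared counter has to be wrapped in `Arc<Mutex<_>>` before it can cross a thread boundary. Swapping in `Rc<RefCell<_>>` is rejected by the compiler, because `Rc` is not `Send` and `thread::spawn` requires `Send`.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments a shared counter from several threads and returns the total.
fn parallel_count(threads: usize, per_thread: usize) -> usize {
    // Arc gives shared ownership across threads; Mutex guards the mutation.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // The data is only reachable through the lock guard.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```

There is no way in safe Rust to touch the counter without taking the lock, so the unsynchronized version of this loop simply cannot be written by accident.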

10 Likes

I came here to say this. The whole argument for Rust seems to be “because Rust”. That is not a good reason to change a foundational technology in any project, and certainly not when the new technology directly threatens a project’s well-established, widespread use.

I suggest any attempt to change C in CPython(!) shall be subject to an extensive evaluation of at least 3 alternatives, along with options to achieve the same objectives in C, across a clearly identified & agreed upon set of criteria.

3 Likes

For what it’s worth, the majority of CPython’s security issues appear to be problems in our dependencies or oversights in pure-Python code, not memory safety problems in our C code.

If we were to rewrite some of our Python code (such as the email module) in Rust, would we avoid those security problems?

14 Likes

Without commenting on the rest of your comment, I want to make one point extremely clearly: there is no pathway to achieving the same safety guarantees as are provided by memory safe languages in C. Lots of people have tried, including people with really strong financial incentives to ensure that their existing C code does not become a liability.

The uniform conclusion is that there is no way to make existing C code safe, with the same degree of confidence, without trading off substantial performance, or using a substantially unique dialect of the language (e.g., Apple’s Firebloom).

I’ve been writing about memory safety issues for a decade now, and one of the mistakes people make in understanding my advocacy is that they think I care about this issue as a way to advocate for Rust. That gets the matter backwards (at best): Rust is an important language because it changes the Pareto frontier for languages, replacing the status quo ante that assumed that performance and safety were inevitably in conflict.

25 Likes

There’s no such proposal to “change the C in CPython” under discussion.

1 Like

I have a few thoughts related to this, it’s definitely an important point to consider!

First, I wonder if our current labeling process means that use after frees, buffer overflows, and other memory safety issues that might be considered security problems are labeled type-crash instead of type-security because they manifest as crashes. It’s easy to label an issue type-security when it is a CVE in a dependency, because it’s clear there’s a security concern there. It is not as clear whether to label a crash as type-security. This might bias the current tagging.

Second, even if only a small portion of these bugs are security issues, memory safety has benefits beyond just security. A large number of the type-crash bugs look to be buffer overflows, use after frees, data races, etc. Making code memory and thread safe would greatly reduce the occurrence of these bugs. I think reducing the instances in which CPython can crash is also a substantial benefit. I hate to sound like a broken record but this is likely exactly the benefit that the Android project saw from introducing Rust: eliminating memory unsafety bugs brings more confidence in new code and fewer reverts or follow-ups. If you look at the issues for type-crash, many are in the new JIT code. Frequently changing, complex new C code is exactly where you’d expect to find memory unsafety issues. In a hypothetical timeline where the JIT were written in Rust, these bugs would be much less common and less time would need to be spent fixing them.

Third and finally, we could evaluate moving some of our dependencies to Rust, to address the security issues therein. For example, one appealing target for this would be the bz2 module. There is a pure-Rust re-implementation of libbzip2, which is not only safer, but also 5-15% faster! There’s also a zlib-ng compatible zlib implementation zlib-rs, which we may want to consider vendoring eventually. I have already opened an issue to allow these to be found via configure. There are likely many other areas where we could consider replacing dependencies with safe Rust implementations, such as the xml module with expat. But we should start with a limited scope first, which is why I think libbzip2-rs is appealing.

Potentially! I think Rust’s strong typing can help in making bad state impossible, which should help in many such cases. But then again there are things we could do in the Python code to prevent these such as introducing Python typing :slight_smile: (a whole other discussion)
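
To illustrate what "making bad state impossible" can mean, here is a hedged sketch. `HeaderValue`, `HeaderError` and `render_header` are hypothetical names invented for this example and are not taken from the email module or any real crate. The idea is that a value which has passed validation gets its own type, so code that writes headers cannot be handed an unchecked string.

```rust
/// Hypothetical type: a header value that has been checked for line breaks.
#[derive(Debug, PartialEq)]
pub struct HeaderValue(String);

/// Hypothetical error returned when validation fails.
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    ContainsLineBreak,
}

impl HeaderValue {
    /// The only constructor; rejects input that could start a new header line.
    pub fn new(raw: &str) -> Result<Self, HeaderError> {
        if raw.contains('\r') || raw.contains('\n') {
            Err(HeaderError::ContainsLineBreak)
        } else {
            Ok(HeaderValue(raw.to_owned()))
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Accepts only validated values, so it never needs to re-check them.
pub fn render_header(name: &str, value: &HeaderValue) -> String {
    format!("{}: {}\r\n", name, value.as_str())
}
```

The same pattern is available in Python with a validating class, but nothing stops a caller from passing a plain str; in Rust the mismatch is a compile error.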

Just like for C code, if there’s a good motivation to re-write Python code in Rust we can consider it. One potentially appealing opportunity I could see would be to use the Servo project’s browser-grade HTML5 parser to replace html.parser. But even there I think we should be pragmatic about whether that change is worth the benefits.

9 Likes

I think the large majority of type-crash issues don’t really occur in practice and are just the result of fuzzing or someone messing around.

4 Likes

Let me moderate that: it is the majority of CPython’s detected or reported security issues.

And apart from that, there are also possible dysfunctions such as memory leaks due to having to release resources by hand (a Sisyphean task when writing CPython code).

4 Likes

But how much mental effort does it take any of us to review our C code to meet that high bar we have set for ourselves? Now imagine that it was written in Rust; do you think it would take more, the same, or less mental effort compared to the C code? I would argue it would be less effort with the Rust code, which means getting to spend that mental energy on other things.

I know I won’t get back the time I have spent manually tracing through all of the gotos in a PR to make sure there wasn’t a missed Py_DECREF or some other issue over my decades of doing this, but I would like to not have to do it ever again if we could help it as I have better things to do with my life than focusing on something a compiler could do for me to help make Python work appropriately.
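
For readers who have not seen how Rust handles this, a minimal sketch follows. `OwnedRef` is a hypothetical stand-in for an owned reference to a Python object (it is not a real CPython or PyO3 type), and the shared `live` counter merely simulates a reference count. The point is that every early return releases whatever was acquired so far, with no goto-based cleanup to trace.

```rust
use std::cell::Cell;
use std::num::ParseIntError;
use std::rc::Rc;

/// Hypothetical owned reference; `live` simulates an object's refcount.
struct OwnedRef {
    live: Rc<Cell<i32>>,
}

impl OwnedRef {
    fn new(live: &Rc<Cell<i32>>) -> Self {
        live.set(live.get() + 1); // analogous to Py_INCREF
        OwnedRef { live: Rc::clone(live) }
    }
}

impl Drop for OwnedRef {
    fn drop(&mut self) {
        self.live.set(self.live.get() - 1); // analogous to Py_DECREF
    }
}

/// Acquires two references around two fallible steps and sums the results.
fn parse_pair(live: &Rc<Cell<i32>>, a: &str, b: &str) -> Result<i64, ParseIntError> {
    let _first = OwnedRef::new(live);
    let x: i64 = a.parse()?; // on error, `_first` is released automatically
    let _second = OwnedRef::new(live);
    let y: i64 = b.parse()?; // on error, both references are released
    Ok(x + y)
}
```

Whether `parse_pair` succeeds or fails, the simulated count is back to its starting value once the function returns.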

21 Likes

I’m pretty sure many such cases could be checked by a linter doing static code analysis on the Python C (extension) code.

Given how many Python C extensions are out there, wouldn’t MS want to spend some effort on creating such a tool as a VS Code extension or GitHub review tool? :slight_smile:


In any case, I think the discussion here is going way off-topic. The topic is not to rewrite CPython in Rust (that’s what RustPython is for), but only whether we want to accept modules written in Rust in the stdlib.

Given that extensions can happily live on PyPI and use Rust, I don’t see much incentive of adding a new major dependency to CPython, turning it into CrustPython :wink:

If you want to rewrite a stdlib module in Rust (or even all of them, similarly to what is being done for binutils), that’s also possible via PyPI. And if you want to use the Rust version(s) because you deem this more memory safe, you are free to do so as well.

And the same is true for extensions written in C++, Go, Zig, Nim, etc.

I really wonder why we are having this discussion for CPython.

12 Likes

Perhaps so! But there are also a lot of other issues (temporal, spatial and thread safety) that people have found impossible to check with linters in C without turning the language into a dialect. A big advantage of Rust is that you get all of these bundled up into a nice package with good ergonomics :slight_smile:

On the contrary, the topic is to re-write parts of CPython in Rust where it makes sense, but do so gradually and in a manner where Rust is not (yet) a hard dependency. I expect a future PEP will propose making Rust a hard dependency, but we need experience working with Rust as an optional part of CPython to know when we can make it required.

I think the incentive is that we the developers of CPython can take advantage of the ergonomics, robustness, and features of Rust. And I hope to attract new contributors who are otherwise scared off by C. Anecdotally, I’ve seen a number of people mention they would contribute to Rust code but not C code in CPython.

13 Likes

As previously discussed, if CPython requires Rust, there are many problems that we see no hope of solving without breakage, such as the bootstrapping problem, the long-tail platforms problem, and the fork problem. They aren’t about “our experience”, they should be solved in Rust itself instead of CPython, and there is little we can do about them in CPython. Unless these problems get solved in Rust, I think it’s too early to talk about making Rust required.

3 Likes

It’s definitely too early to require Rust, but it’s not too early to begin experimenting and understanding what “Rust for CPython” actually looks like in practice.

15 Likes

Well, consider that CPython has existed for 30+ years, has been used by many large companies such as Google or Microsoft, yet nobody has ever spent the effort of writing a “static code analysis” tool [1] that would flag incorrect reference counting in C extension code.

No, because users would still get the C version by default, while using the Rust version would require changing their code and adding a new dependency.


  1. Assuming that is possible, which is an open question

4 Likes

Not quite nobody: GitHub - davidmalcolm/gcc-python-plugin: GCC plugin that embeds CPython inside the compiler. That project does seem to be inactive though.

1 Like

Yes, what I mean is that “Rust for CPython” should focus on real benefits from optional Rust parts, for example optimizing operations on memoryview, rather than rushing to make Rust required. In my view, the goal shouldn’t be to figure out when Rust should become required, but to provide a better experience for users.

2 Likes

First, I would like to highlight that I said a future PEP. Making Rust required is not currently being proposed. At most, I want to express that we hope to make it required eventually.

Additionally, I definitely see hope for solving these, I just think it will take some time! Which is one reason to wait and propose making Rust a hard requirement in the future. If things go horribly we can always decide not to require Rust and pivot. However, I think that’s pretty unlikely.

The Rust team has been extremely kind to reach out and offer to discuss what CPython needs to make Rust work for this effort, similar to Rust for Linux. I expect Rust platform support to improve when projects like rustc_codegen_gcc mature. That should enable platforms like m68k, PA-RISC, and others to build Rust.

I agree it is too early to talk about making Rust required, at least right now. I think we should make it clear that we hope to make Rust required in the future, so that maintainers of platforms where Rust isn’t available can let us know and we can fully understand the fallout of making Rust required.

Well, I think this can be a goal. Nothing wrong with being forward looking. Of course, providing improvements to users will be the primary goal!

4 Likes