Pre-PEP: Rust for CPython

Sure, and the fact that we wouldn’t take this proposal seriously should also indicate that the lesser proposal of “we really need … Rust” isn’t going to be any more persuasive.

On the other hand, “our drop-in replacement for json or re is 10x faster than the stdlib, can we merge it” is a far more interesting discussion to have.

Probably not :wink: But my interest would be piqued by a production-ready replacement for an existing module, or a new module that we want in the stdlib anyway and the best way to get it is to bring in a Rust implementation.

5 Likes

I think PEP 793 would disagree with this and probably be simpler from a Rust point of view because it doesn’t involve C macros (although that wasn’t the specific intent of it). (No comment on the larger context of the discussion…. just the specific paragraph)

2 Likes

Fair point, and perhaps we should adopt PEP 793. I hadn’t realized the PEP was accepted, exciting!

1 Like

Slightly tangential but I find optional stdlib components to be a menace and would really like their proliferation to be minimised.

For pre-built Python installations like those on python.org or python-build-standalone, there’s nothing optional about them: someone will need them, therefore they must be included, therefore everyone gets them whether they’re needed or not. For those installing from source, it’s a footgun with remediation steps so platform specific that they can rarely even be written down in any detail. For those building Python for other people, it’s both. For distro-provided Python, it’s a surprise extra system dependency that rarely follows any predictable pattern and doesn’t correlate to anything in a lockfile.

I’ve spent way more time debugging tkinter installations than even monster PyPI packages like Qt.

8 Likes

I have heard from a number of people a similar sentiment. I think therefore it’d be best if a new Rust module was a replacement for an existing C module that could act as a fallback, rather than an entirely new module with no fallback.

1 Like

In that case, perhaps Python’s build system would have to be pushy and fail without Rust by default unless some --use-legacy-c-{module}-implementation flag is given? Otherwise, I suspect that many of the people downstream we want feedback from aren’t going to notice it.

2 Likes

I feel the current proposal is so conservative that it doesn’t really get us there.

I think that’s okay. This is going to be a many-year, perhaps on the order of decades, project. The first step is to add Rust as an optional feature of the CPython build. That will allow us to solve some tricky problems about how we can make Rust and C co-exist. And, we can find out how many platforms are going to be affected by this (e.g. they can’t get the optional features to build) and fix that for as many platforms as we can.

This initial step doesn’t necessarily help with code safety, API cleanliness, or performance aspects. I’m sure others will disagree but, to me, the benefits that Rust can bring on those is clearly demonstrated in other projects. We don’t need to prove those specifically for CPython.

Edit: maybe it’s useful to enumerate what I think are the clear benefits that Rust could bring. I’m not a Rust programmer so knowledgeable people will need to correct me if I’m wrong (as I’m sure they will gleefully do).

  • array bounds checking: in the year 2025, it’s crazy we are using a programming language that doesn’t do this. Yes, there can be extra overhead. No, you are not smart enough to get it correct. :wink:
  • use-after-free and other memory lifetime bugs. Rust’s borrow checker avoids most of these at compile time.
  • integer overflow: Rust doesn’t prevent it but at least it defines what happens. Undefined behavior is bad.
  • scope based cleanup: I’ve heard that the C language standard might eventually be getting something like __attribute__((cleanup(...))). It’s probably going to be decades before CPython could actually rely on C compilers supporting that. This pattern comes up so often and it’s painful we don’t have a built-in language feature that supports it.

There are other benefits and conveniences but these are the big ones, IMHO.
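To illustrate the scope-based cleanup point, here is a minimal sketch of Rust's `Drop` trait. The `Cleanup` guard, its log, and `do_work` are hypothetical names made up for this example, not anything from CPython; a real guard would release a resource in `drop` instead of recording a message.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical guard: records a message when it goes out of scope.
/// Real code would release a resource (file, lock, reference) here instead.
struct Cleanup {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Cleanup {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("released {}", self.name));
    }
}

/// Both guards are cleaned up on every exit path, including the early return.
fn do_work(fail: bool, log: &Rc<RefCell<Vec<String>>>) -> Result<u32, String> {
    let _a = Cleanup { name: "a", log: Rc::clone(log) };
    let _b = Cleanup { name: "b", log: Rc::clone(log) };
    if fail {
        // No explicit cleanup call: `_b` and then `_a` are dropped here.
        return Err("early exit".to_string());
    }
    Ok(42)
}
```

Locals are dropped in reverse declaration order, so the cleanup for `_b` runs before the cleanup for `_a`, with no `goto error` style labels needed.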

20 Likes

Adding more details about the points.

  • array bounds checking is on by default, but where the overhead matters, the check can be explicitly skipped for an individual element access.
  • integer overflow panics in debug builds. Rust has both checked and wrapping operations: plain arithmetic is checked by default in debug builds and wraps by default in release builds. Any individual operation can be explicitly marked as checked when the check is useful at runtime, or explicitly marked as wrapping when wraparound is intended.
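A small sketch of those two points, using only standard library methods (`get`, `get_unchecked`, `checked_add`, `wrapping_add`). The function names are made up for illustration.

```rust
/// Checked access: returns None instead of reading past the end.
/// (Plain `data[i]` is also checked, but panics when `i` is out of range.)
fn element_at(data: &[u8], i: usize) -> Option<u8> {
    data.get(i).copied()
}

/// Explicit opt-out of the bounds check for individual accesses.
fn sum_first_two(data: &[u8]) -> Option<u16> {
    if data.len() < 2 {
        return None;
    }
    // SAFETY: the length check above guarantees indices 0 and 1 are in bounds.
    let (a, b) = unsafe { (*data.get_unchecked(0), *data.get_unchecked(1)) };
    Some(u16::from(a) + u16::from(b))
}

/// Explicitly checked addition: None on overflow, in every build profile.
fn add_checked(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b)
}

/// Explicitly wrapping addition: wraps modulo 256, in every build profile.
fn add_wrapping(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}
```

The explicit forms behave the same in debug and release builds; only the plain `+` operator changes behaviour between the two.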
2 Likes

This is the defer proposal, which currently exists as a TS (like a branch of the C standard). Here’s an overview of the thing (and the procedural reasons for the TS) in blog post format.

Clang has a mostly ready PR for this[1]. The GCC patch set is at v5 AFAICT. Long story short, this should be available in compilers relatively soon[2] and may make it into C2y if people like the feature and let people on the committee know.

However, you’re likely right about the timelines. Anything before 2040 for broad availability (i.e. defer as part of the standard, not a TS, and implemented across all major compilers present on LTS distros) would be wildly optimistic IMO[3]. Who knows what the world (and CPython) will look like by then… :sweat_smile:


  1. check out the number of reactions as a barometer for people’s interest in this ↩︎

  2. MSVC’s C standard conformance has been horrible since C99; you’d be better off using clang-cl on Windows than waiting for MSVC on anything. They still haven’t gotten C99 or C11 finished, and ~zero for C23. ↩︎

  3. On the other hand, if people were willing to replace MSVC with clang-cl and rely on -fdefer-ts, you could probably do it in 5 years, assuming the benefits are so substantial that this would be worth it. ↩︎

5 Likes

You probably meant “borrow”. :wink:

19 Likes

Ok, here’s a potential exercise if the optional module story isn’t appealing.

The memoryview object is written in C and the performance of memoryview.index is currently horrid due to being written in a very naive way. Accelerating it in C involves generic programming in a language that doesn’t provide any decent metaprogramming. But if an alternate implementation of memoryview was written in Rust, then perhaps those accelerations would be much easier to implement (of course, you also need to develop some basic infrastructure for that: for example, a facility to dispatch to different generic specializations depending on the memoryview format).

So we could have an optional implementation of memoryview in Rust, and fallback on the C one on Rust-less platforms.
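As a rough sketch of what "dispatch to different generic specializations depending on the memoryview format" might look like: everything below (the `Element` trait, the `Needle` enum, `index_of`, `memoryview_index`) is hypothetical and only models three formats over a contiguous native-endian byte buffer. It is not how CPython's memoryview is structured.

```rust
/// Hypothetical element types that can be decoded from native-endian bytes.
trait Element: PartialEq + Sized {
    const SIZE: usize;
    fn from_ne(bytes: &[u8]) -> Self;
}

impl Element for u8 {
    const SIZE: usize = 1;
    fn from_ne(bytes: &[u8]) -> Self {
        bytes[0]
    }
}

impl Element for i32 {
    const SIZE: usize = 4;
    fn from_ne(bytes: &[u8]) -> Self {
        let mut raw = [0u8; 4];
        raw.copy_from_slice(bytes);
        i32::from_ne_bytes(raw)
    }
}

impl Element for f64 {
    const SIZE: usize = 8;
    fn from_ne(bytes: &[u8]) -> Self {
        let mut raw = [0u8; 8];
        raw.copy_from_slice(bytes);
        f64::from_ne_bytes(raw)
    }
}

/// One generic search; the compiler emits a specialized copy per element type.
fn index_of<T: Element>(buf: &[u8], needle: &T) -> Option<usize> {
    buf.chunks_exact(T::SIZE)
        .position(|chunk| T::from_ne(chunk) == *needle)
}

/// The format is modelled as an enum variant carrying the value to search for.
enum Needle {
    U8(u8),
    I32(i32),
    F64(f64),
}

/// Runtime dispatch from the format to the matching specialization.
fn memoryview_index(buf: &[u8], needle: &Needle) -> Option<usize> {
    match needle {
        Needle::U8(v) => index_of(buf, v),
        Needle::I32(v) => index_of(buf, v),
        Needle::F64(v) => index_of(buf, v),
    }
}
```

The search loop is written once and the `match` is the only place that knows about the set of formats. Strides, sub-offsets and multi-dimensional buffers are ignored here.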

35 Likes

Thank you for bringing this up! I think memoryview could be a very appealing option for demonstrating Rust. Rust’s generics could make index a lot easier to implement as you say.

I also think it has made me reconsider where to draw the line of where it is okay to re-write things in Rust. I want to start this by re-iterating that we do not want to go and just re-implement everything and anything in Rust to start with. At least until we have a lot more experience with Rust in CPython, and have given contributors time to learn and try working with Rust, we should select up to a few experimental changes to make, and review our experiences after making those changes. Areas re-written in Rust should have a justification for doing so, like memoryview. We should also consider if the most active contributors to that portion of the codebase are comfortable with Rust being introduced.

That being said, I think we could expand where Rust could be used to include built-in objects, or really any part of the interpreter, if we also keep a C fallback of the implementation. For memoryview, we could keep the current implementation as a compile-time fallback and introduce a Rust version that takes advantage of its meta-programming (if needed) and generics.

This may mean we’d need to maintain two versions of certain parts of the interpreter for a little while, but I think getting more experience across the code base would be worth the extra effort. The maintenance effort of keeping two implementations should be considered as part of evaluating where to introduce Rust.

With C compile-time fallbacks, there should be no cause for concern on portability or bootstrapping.

This seems like a nice compromise between the very limited “Rust only in extension modules” and “Rust can go anywhere and we need to figure out bootstrapping”. Obviously the trade-off is maintenance burden, but I think if we’re smart about what parts we implement in Rust, we can minimize the effort.

Curious what others think!

26 Likes

Sure, but if we are going down that line of argument you could also argue there are other Python interpreters as well.

I don’t think it “necessarily” does either. But if the concern is for e.g. Gentoo and some of the platforms it supports, then knowing a major dev tool is looking to require Rust to compile it is another data point that we won’t be the sole program that could cause issues for those not supported by a Rust compiler.

I’m good with that.

4 Likes

Just want to throw my 2c in. I haven’t seen the following discussed. I’m an accomplished C++ programmer, and would want to note that in standard (Modern) C++ we have a number of ways to write memory-safe code. Smart pointers (both unique_ptr and shared_ptr) allow us to correctly and completely safely manage memory. unique_ptr allows us complete control over memory ownership. shared_ptr gives us reference-counted pointers. Memory release, even when exceptions occur, is guaranteed by the language. const pointers allow us to safely manage what can be done through a pointer. So both memory leaks and use-after-free errors are completely fixed.

Buffer overflows can be avoided by using std::get (compile-time checked) and .at (runtime checked) accesses on a std::array (amongst other types).

4 Likes

Modern C++ certainly has more tools, but it’s categorically different from Rust.

See, for example, Alex Gaynor’s Modern C++ Won’t Save Us. Unless Sean Baxter’s Safe C++ Proposal (or something which provides similar guarantees) gets revived, that categorical difference in how much can be enforced at compile time isn’t going away.

(What makes Rust special is that it embodies a will to take drastic action to ensure that another team member working on the opposite side of the codebase can’t pull the rug out from under you without realizing it.)

2 Likes

IMHO, things are more nuanced than that. Really, if the issue was purely “CPython wants to introduce a language that’s safer than C in the codebase”, then C++11 could have been introduced ages ago and improved a lot of things (for example automating most/all decrefs through RAII constructs), all this without needing any additional tooling or creating any new bootstrapping problem.

There are cultural issues around C++ that have made this unapproachable, though. Those have to do with:

  1. C++ is a huge language, which makes it intimidating for plain C programmers, and also can make it more difficult to restrict language usage to a specific subset
  2. C++ is too close to C, which creates the possibility that it will slowly creep into the codebase; Rust is decidedly a different language (a C file cannot be valid Rust and vice versa), so plain C programmers feel less threatened as they can simply ignore the Rust files.

Rust solves those cultural issues as much as it is better technically than C++, and that is why it is being considered for CPython and e.g. the Linux kernel.

11 Likes

I don’t think we need new ways to write memory-safe code.
We need ways to prevent (in some way) the memory-unsafe code.

Modern C++ is great, but, how do you avoid the non-modern parts of C++?
Use a linter to highlight the use of raw pointers or C-style reinterpret casts? That’ll flag roughly all of CPython. For compiler-verifiable memory safety we’d need to rewrite everything, no good way around that.

As I understand it, Rust makes it much easier to ensure you stay in the “safe” area of the language. It’s certainly not perfect (you can use unsafe code at any time, and functions that crash or leak are “safe” in Rust’s technical sense of the term – as are functions that return a wrong result). But it’s vastly different from how legacy C (and thus legacy C++) basically defaults to undefined behaviour.
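To make the "safe in Rust's technical sense" caveat concrete, here is a tiny sketch with made-up function names. None of these functions needs the `unsafe` keyword, yet one leaks, one returns a wrong answer, and one can panic.

```rust
/// Leaks a heap allocation on purpose; `Box::leak` is a safe function.
fn leak_value(v: u32) -> &'static u32 {
    Box::leak(Box::new(v))
}

/// Compiles as safe code but is simply wrong (off by one).
fn buggy_len(data: &[u8]) -> usize {
    data.len() + 1
}

/// Panics at runtime when `i` is out of range, instead of reading stray memory.
fn crash(data: &[u8], i: usize) -> u8 {
    data[i]
}
```

What the safe subset rules out is undefined behaviour such as out-of-bounds reads or use-after-free, not logic errors or aborts.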

9 Likes

To me, that’s like asking “how do you avoid ctypes in Python”? Well, you just avoid using it. :person_shrugging: If you’re worried that a non-fluent contributor unnecessarily introduces unsafe constructs, just review their code carefully. Given the proficiency of the CPython core team and how serious our review process is, I’m confident this wouldn’t have been a problem.

Rust is significantly better than C++ and I don’t dispute that. My point is that Rust isn’t the first opportunity we had to move away from tedious manual memory management in C.

(FTR, Unladen Swallow was using C++ internally, and probably not just because of having to interface with LLVM)

In any case, the C++ ship failed to sail for us long ago, so this part of the conversation is a bit futile. :slight_smile:

4 Likes

Stick #![forbid(unsafe_code)] at the top of a module and it’ll lock out use of unsafe in that module and any module under it within the same crate.

(Using forbid instead of deny locks it in so you can’t use allow to make exceptions.)

EDIT: (For those not familiar with Rust, #![…] is the “applies to the containing block instead of the block which follows” form of Rust’s #[…] attribute syntax, conventionally only used for annotating the top level in a file, and allow/warn/force-warn/deny/forbid/expect are annotations for controlling compiler warnings and lints.)

EDIT 2: To be clear, I mean use of the unsafe keyword. Obviously, invariant-enforcing abstractions around unsafe which expose safe APIs (eg. the Rust standard library) are still callable.

EDIT 3: I realized I forgot to mention warn and looked the levels up… and I just learned force-warn exists as a counterpart to forbid.
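A minimal sketch of the attribute in use. The module name `checked` and the `squares` function are invented for the example, and the inner attribute is placed at the top of an inline module so the snippet stands alone; at the top of a file it would look the same.

```rust
mod checked {
    // Inner attribute: applies to this module and everything nested in it.
    #![forbid(unsafe_code)]

    /// Safe code may still call safe APIs whose internals use `unsafe`
    /// (for example `Vec::push` from the standard library).
    pub fn squares(n: u32) -> Vec<u32> {
        let mut out = Vec::new();
        for i in 0..n {
            out.push(i * i);
        }
        out
    }

    // Uncommenting the function below is a compile error in this module,
    // and putting `#[allow(unsafe_code)]` on it is rejected as well,
    // because `forbid` cannot be overridden the way `deny` can:
    //
    // pub fn first(data: &[u32]) -> u32 {
    //     unsafe { *data.get_unchecked(0) }
    // }
}
```

Code outside `checked` is unaffected, so the lint can be scoped to the parts of a crate that are meant to stay free of the `unsafe` keyword.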

1 Like

@pitrou: yep, I have nothing to add :)

Yeah, Rust has a built-in linter. That’s great (unironically), but not much of a selling point here: for any other language we can run an external linter.
For C++, the problem there is that if we forbid unsafe code, we’d get next to no benefits from C++'s compatibility with C.

Also, what forbid doesn’t help with is the fact that the term “safe” has a specific technical meaning in Rust, and the meaning is (necessarily!) a bit different from a given person’s idea of safety.

2 Likes