Pre-PEP: Rust for CPython

I would quantify it in terms of how long a codebase will keep running: 1, 2, 5, or 10 years.

Yes, I moderate it. I’ve been burned by changes, and I guess this is why a foundational change like moving to Rust causes concern. Why not defer this to Python 4?

1 Like

Because there’s no plan to ever have a Python 4.

18 Likes

I don’t think CPython makes a hard and fast guarantee that your code will run unmodified in 1, 2, 5, or 10 years. But even if it did: the presence of Rust inside the runtime doesn’t seem material to that property, or at least is no more material to it than everything that happens on each minor release of Python 3 anyways.

6 Likes

Given that this thread reached 100+ posts in a day (!), you might want to edit the OP to make this clear. This thread is getting some broader coverage and I expect more people will be jumping in without reading the whole thing.

16 Likes

I mean, there’s already +12,777,745/-9,190,109 total changes in a project with (currently) 2,791,005 lines of text (git diffs include non-code so this count does too). Arguably, CPython has been rewritten at least 4 times now.

Good idea, I updated the OP to mention and link to the updates to the timeline section, but left the original text for posterity.

7 Likes

Hi, thank you for being willing to take on this project!

The proposal overall seems great, with the proposal scope and bootstrapping story being the only potentially major issues I see. I am only gonna comment on that, as other folks are already addressing remaining concerns. This discussion is already big as-is, and I have no doubt it will still grow much larger :sweat_smile:


Proposal scope

Unless I am misinterpreting it, the proposal foreword seems to imply that this is a phased proposal that will eventually add Rust as a required build dependency of CPython, but the presented PEP only covers making it an optional build dependency.

This proposal declares that Rust will at some point stop being optional. While I understand the motivation, I feel like deciding this right now is too early.

I don’t think the PEP provides strong enough supporting evidence to outweigh the potential implications of such a major change. The downstream impact is still pretty up in the air, as is the impact on development, etc., while the potential benefit of introducing Rust into the core code-base is still very unclear.

When I say the benefit of using Rust in core is unclear, I am not questioning Rust’s benefit over C. It’s the quantity of opportunities where it would make sense to use it that is unclear. I strongly believe we should not be rewriting existing code in Rust without a specific reason, and that’s what I think is unclear at this point.

Even the PEP is unsure of the timeline for promoting Rust to a hard build dependency, probably because of this.

I think this proposal would be much easier to drive forward if it were split into two phases, each with its own PEP.

  1. Introducing Rust as an optional build dependency
  2. Promoting Rust to a hard build dependency

This way, 1) would allow us to gather feedback from the development team, downstreams, etc., giving us a much better picture of the impact of 2), enabling us to make a better argument and design a more concrete plan.

With this in mind, here are my suggestions for the PEP text:

  • Add an “Abstract” section explaining that the PEP is a first step into the adoption of Rust in the CPython codebase, introducing it in an optional capacity, allowing us to experiment, gather developer feedback, and better assess the technical implications of using Rust in CPython.
  • Add a “Goals” section
    • Explicitly state that rewriting existing code in Rust without further motivation is a non-goal
    • Define a couple of goals, like the following
      • Evaluate how well the development team engages with Rust inside the codebase
      • Evaluate how well the CPython architecture couples with Rust
      • Evaluate the impact of first-party Rust APIs on downstream users (e.g. PyO3)
      • Gather feedback from downstream users regarding the bootstrapping implications
      • Gather feedback from downstream users of platforms where Rust is not supported
  • Add a “Future” section explaining that, contingent on the impact of this PEP, we plan to promote Rust to a hard build dependency as a next step
  • Remove “Keep Rust Always-Optional“ from “Rejected Ideas”

TLDR

I don’t feel the PEP provides enough justification to make Rust a hard build dependency, and I think that’s something that would be difficult to provide at this point. As such, I feel like this proposal would be better served by splitting into two phases/PEPs: starting with adding Rust as an optional build dependency, and then promoting it to a hard build dependency.


Bootstrapping

I personally don’t think any of the given solutions are good enough at the moment, so it is very important to explore other options. IMO, even if no better solutions are found, the PEP authors should show they have exhausted all other sensible possibilities.

That said, I may be mistaken, but it’s my understanding that Python should only be needed to build LLVM for rustc. Perhaps it would be worth exploring the possibility of using the Cranelift backend instead, which AFAICT doesn’t need Python.

It may also be worthwhile to engage with the mrustc project, as they may have better insights on other possible approaches.


Another thing I thought would be pertinent to point out: if we were to introduce Python to the bootstrapping dependency tree (via Rust), that would greatly weaken the argument against moving CPython’s build system to Meson (discussion in What do you want to see in tomorrow’s CPython build system?).

11 Likes

Good news, this was agreed upon somewhere in the next 100ish posts of the thread.

8 Likes

I claim that one of the major reasons for this failure is that cargo is almost unique among build systems in providing absolutely no structured mechanisms for:
(1) communicating with other package build scripts in the same dependency graph
(2) communicating with the downstream user who invokes cargo (such as a distro packager, or a github actions pipeline)

An update on work in this direction: The GSoC results page details progress on Prototype Cargo Plumbing Commands.

Not what you asked for, but evidence of how this sort of thing is on the radar.

I’ll chime in with a +1 on the idea of allowing Rust extension modules, -1 (at least for now) for the core interpreter and compiler (which is already the direction the descoped PEP has moved in).

For the core interpreter (at least the part which needs to be built in order for CPython to freeze its own frozen standard library modules), I think the bootstrapping and long tail platform support concerns are significant enough to at least postpone consideration of the possibility, and potentially even enough to block it forever.

For extension modules manipulating untrusted input data, I see huge potential value in having access to Rust as a fast, low-overhead, statically typed language with rich data structures and implicitly thread-local data access. (For a concrete example of that from nearly 10 years ago, here’s a Sentry post about migrating their JavaScript source map processing from Python to Rust, and the benefits of not incurring the per-instance overhead of creating full Python objects.) There are some cases where platform compatibility may still be a concern, but extension modules will have more options for handling that than the core interpreter does.

16 Likes

My take:

if (len > PRECOMPUTED_CONST) return -1;
return (len + 2) / 3 * 4;

My point was exactly about the fact that crustaceans worry about overflow checking, borrow safety, traits, etc., and write verbose code that is beautiful, while pragmatic folk reduce the problem and put safety limits in place, leaving the implementation very short.

After all, what’s the point of encoding a string that’s larger than a fraction of the total address space? The Rust overflow checker is only effective at ~3/4 RAM, at which point the argument and result cannot fit into RAM at the same time.
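
To make the two-line version concrete, here is a minimal C sketch of that approach. The names `b64_encoded_len` and `B64_MAX_INPUT_LEN` are made up for illustration (the latter stands in for PRECOMPUTED_CONST), and I’m assuming a signed `ptrdiff_t` return type so that -1 can signal “too long”. None of this is taken from CPython’s actual code.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t, ptrdiff_t */
#include <stdint.h>   /* PTRDIFF_MAX, SIZE_MAX */

/* Hypothetical stand-in for PRECOMPUTED_CONST: the largest input length
 * whose padded base64 length still fits in ptrdiff_t (about 3/4 of it). */
#define B64_MAX_INPUT_LEN ((size_t)(PTRDIFF_MAX / 4) * 3)

/* Compile-time check that the limit really keeps the result in range. */
static_assert((B64_MAX_INPUT_LEN + 2) / 3 <= (size_t)(PTRDIFF_MAX / 4),
              "base64 length limit must keep the result within ptrdiff_t");

/* Returns the padded base64 length for `len` input bytes,
 * or -1 if `len` exceeds the precomputed limit. */
ptrdiff_t b64_encoded_len(size_t len)
{
    if (len > B64_MAX_INPUT_LEN)
        return -1;
    /* Every 3 input bytes (rounded up) become 4 output characters. */
    return (ptrdiff_t)((len + 2) / 3 * 4);
}
```

With the limit fixed up front, every accepted length produces a result that fits in the return type, so the function body needs no per-operation overflow checks.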

Fancy Rust safety is totally appropriate for user-defined or external input (e.g. networking code, cryptography or json.dumps argument where some inner dict-like object may have a custom dunder method that does some “caching” but ends up modifying sibling elements in flight or creates cycles), while simple concepts should in my opinion remain simple, so that the code remains maintainable, ideally also by contributors who are not Rust experts. Which is why I’m calling for the equivalent of PEP 7 for Rust use in CPython.

I’d go even further and decry “proven safe” as smoke and mirrors.

Remember the BAN logic proof for the Needham–Schroeder protocol?

To recap, every proof is against a certain fixed set of assumptions. Meanwhile what happens in practice is that software is reused in ways unpredictable a priori. RustBelt has proven something about Rust, but not about Rust use in CPython, or Rust use in 3rd party Python extensions and certainly not about Rust used within CPython when an arbitrary user program is run by the interpreter, with arbitrary additional extension, for arbitrary goals and with arbitrary thread model.

Here my call is to move most of Rust exultations into the footnotes, and focus on tangible direct benefits instead: safer refactoring / faster reviews, broader contributor pool / potentially more approachable to new contributors who grew up with safe/typed languages, specific CPython core bug classes (not generic C bugs), cleaner (more self-documented) internal APIs, safer norms for 3rd party extensions, possibly safer/faster backport story, potentially better tooling, possibly CPython guts (e.g. regexp) shared as crates for other uses…

8 Likes

This is definitely something I hope to work on with folks. We will need a standard style for Rust in CPython, but I also think that might depend on how we adopt Rust and expand the current proof of concept. Regardless, it should probably be its own PEP, in my mind drafted and published after this one is approved.

That’s certainly true, but in projects that have adopted Rust for a while, we see significant decreases in memory safety bugs. Here’s an excerpt from a blog about Rust adoption in Android:

We adopted Rust for its security and are seeing a 1000x reduction in memory safety vulnerability density compared to Android’s C and C++ code. But the biggest surprise was Rust’s impact on software delivery. With Rust changes having a 4x lower rollback rate and spending 25% less time in code review

Our historical data for C and C++ shows a density of closer to 1,000 memory safety vulnerabilities per MLOC. Our Rust code is currently tracking at a density orders of magnitude lower: a more than 1000x reduction.

From Google Online Security Blog: Rust in Android: move fast and fix things

I agree with you that it is important to highlight that Rust can increase developer velocity as well as provide memory safety. I think this is something we will highlight more in the PEP draft and is something a few people have mentioned. Here’s another excerpt from the above blog related to that:

For medium and large changes, the rollback rate of Rust changes in Android is ~4x lower than C++.

Rust changes currently spend about 25% less time in code review compared to C++.

These are definitely things we will be focusing on more in the PEP text itself.

9 Likes

While I agree that the focus should be on other benefits (e.g. I spent a decade in /r/rust/ and people coming from C++ loved the tooling most), I think it goes too far to call “proven safe” smoke and mirrors. That stance generalizes far too easily to things like “It’s a waste of time to make Python memory safe because import ctypes exists”, which makes it far too easy for people to dismiss… especially when the whole point of things like the safe/unsafe split and the way parts of Rust have been formally verified is to draw boxes around bits of code and say “assuming no external factor, such as bad RAM or abuse of unsafe, violates the invariants, this code’s behaviour will meet expectations”.

3 Likes

I’m not a Rust expert; I only have work experience with C/C++ and Python. What advantages would Rust have over other languages like Go? I’m just asking to understand Rust’s advantages and to make sure it’s for a real reason and not just to “Rustify” everything.

5 Likes

Go depends on having a garbage collector, and garbage collectors are solitary creatures, which makes Go unsuitable for writing extensions or rewriting components of a C or C++ codebase. (That’s one reason Jython and IronPython exist, instead of integrating CPython with the JVM and CLR. They live within the JVM or CLR’s existing GC instead of competing with it.)

Rust is noteworthy because it’s the only language to gain significant traction in this niche previously held almost exclusively by C and C++.

EDIT: To elaborate on that, Rust enables compile-time guarantees that C and C++ are incapable of without relying on a heavy VM to do it. That’s what makes it essentially unique. (Though other languages like D and Ada are starting to copy Rust’s innovations.)

15 Likes

Not a core dev, but experienced in large-scale software development, including changes of core tech. Spoiler alert: these efforts usually fail.

I always recommend answering three key questions before embarking on changing foundational pieces of the stack:

  • Is the new stack introduced for its coolness instead of solving an actual problem?
  • Will the new stack introduce new problems that the old stack does not have?
  • Does the investment in time and effort to introduce the new stack compete with more worthwhile work that delivers value to users?

If the answer to any of these questions is yes, and the change is pressed on anyway, the outcome will eventually land in one of two states:

  1. Efforts will stall and the resulting two-stack system is more complex than ever before, effectively meaning the change will be consuming ever more resources to no good cause.
  2. The complexities introduced by the new stack have a far higher blast radius than previously anticipated, triggering a complete rewrite, eventually reaching feature parity with no added value.

In short, I recommend avoiding the introduction of Rust as an alternative to C in CPython(!).

13 Likes

We’ve received a lot of messages from people who want to help, and even members of the Rust core team are willing to support us. As we mentioned earlier, there are two strong examples of Rust adoption done right: Rust for Linux and Rust for Android. Here’s the blog post from the Android team: Google Online Security Blog: Rust in Android: move fast and fix things

We’re fully committed to putting a lot of effort into this initiative, and to making sure we don’t fail :slightly_smiling_face:

We’re addressing a real problem: CPython, like many other projects written in C or C++, suffers from memory-safety vulnerabilities. Rust can drastically reduce the number of these vulnerabilities.
From the Android blog post:

We adopted Rust for its security and are seeing a 1000x reduction in memory safety vulnerability density compared to Android’s C and C++ code.

I’m pretty sure this project will encounter at least one challenge: CPython contributors who don’t know Rust will need to dedicate some time if they want to contribute to the Rust parts. Other challenges will only become clear as we move forward, and that’s where members of the Rust core team may be able to help us. As I understand it, they supported the Rust for Linux project as well.

Sorry, but I’m reading that part of your message as if this were about business. CPython is an open-source project, and most of us volunteer our time for free. Because we’re volunteers, we’re free to choose whichever problems we want to work on. So, we chose this problem and here’s the solution we believe in: Rust :slightly_smiling_face:.

I’d like to quote the Android blog again:

But the biggest surprise was Rust’s impact on software delivery. With Rust changes having a 4x lower rollback rate and spending 25% less time in code review, the safer path is now also the faster one.

4 Likes

Said every team ever. I don’t doubt that.

Just for reference, can you point to some of the memory safety issues that would have been avoided if CPython were using Rust?

7 Likes

It is undeniable that Rust offers superior safety over C and can effectively prevent many errors that would otherwise occur. However, introducing Rust into CPython may inevitably lead to some divergence within the community, with some developers in favor and others potentially having reservations. Additionally, this would require developers to be proficient in both C and Rust to effectively address related issues.

More importantly, as the proportion of Rust code in the project gradually increases, there may be growing calls within the community for a full transition of CPython to Rust. This could further intensify disagreements among core developers, somewhat reminiscent of certain situations the Linux community has experienced in the past.

If all proceeds smoothly, we might eventually achieve a RustPython that remains compatible with the C ABI. That said, RustPython already exists today, though it still lags behind CPython in terms of features and ecosystem. Wouldn’t steadily improving it be a more feasible path forward? This approach may prove more manageable than integrating Rust directly into CPython.

If there is a clear advantage to introducing Rust into CPython, it may lie in the ability to gradually migrate the official Python implementation from C to Rust while preserving existing functionality and compatibility. If the current path is maintained, CPython will continue to be implemented in C, and even if RustPython develops remarkably, replacing the official implementation would present significant challenges—since doing so would likely require the current maintenance team to undergo a major transition.

6 Likes

gh-133767: Fix use-after-free in the unicode-escape decoder with an error handler by serhiy-storchaka · Pull Request #129648 · python/cpython · GitHub and many other UAFs

5 Likes