Pre-PEP: Rust for CPython

Thank you @emmatyping @eclips4 for proposing this (and @ngoldbaum for the ping)! Very excited to see this initiative and as a proponent of Python, Rust, and the two together, eager to be involved.

As a longtime PyO3 maintainer I have been thinking about what this might look like for a while. I asked about exactly this kind of possibility at the Language Summit earlier this year to gauge the temperature for anyone wishing to experiment. The response I heard then was that experimentation towards a concrete proposal was welcome (I hadn’t yet found the time to explore myself, so thank you).

I have a number of comments so will try to keep each brief for now and we can expand later if needed.

I completely agree with both of these points. Depending on PyO3 as currently implemented within CPython will introduce unwanted friction. PyO3 also supports PyPy and GraalPy, as well as older versions of CPython (currently back to 3.7). Rust for CPython will presumably not need to support anything other than current CPython.

At the same time, CPython will need safe higher-level Rust APIs to get the benefit of Rust. PyO3 has a lot of prior art on the high-level APIs (and hard lessons learned); I think the right approach here will be similar to what attrs did for dataclasses - the Rust APIs implemented by CPython can pick the bits that work best.

gccrs is an alternative implementation of Rust for the GCC backend. It is not yet at feature parity with rustc, but it is an important target for Rust for Linux. If CPython were using Rust, I would hope that this would help efforts to support non-LLVM platforms.

PyO3’s proc macros function a lot like argument clinic - we could potentially reuse parts of their design (and/or implementation); PyO3 might eventually even depend upon any implementation owned by CPython. I would recommend this choice as the more idiomatic way to do codegen in Rust.

I agree with both of these points; we may want some experience before deciding how CPython would want to commit to supporting these crates. At the same time, having a way to consume in-dev CPython versions immediately in PyO3 would really help with iteration speed for the ecosystem.

PyO3 has precedent of “experimental” features for things not yet stable, one middle ground might be to allow PyO3 to have an experimental feature which switches out pyo3-ffi’s curated FFI bindings for the ones generated by CPython.

There are a couple of takeaways from Rust for Linux that I think are most interesting here:

  • Social questions - inevitably not everyone will want to be using <insert language X here> instead of <insert language Y here>. Some current maintainers might churn from not wanting Rust, and other new maintainers may be attracted by the appeal of using it. The Python community places a lot of emphasis on inclusivity and I’d hope we would welcome everyone’s opinions as valid even if eventually a decision driven by the majority must be taken. (Not implying here whether the majority is for or against exploring Rust support; that is the purpose of having these discussions, after all.)
  • Technical opening - Rust for Linux has unsurprisingly become a major strategic focus for the language. I would hope that Rust for CPython would have justification for carrying similar weight in the focus of the Rust project should there be friction where Rust (and cargo etc) do not currently meet CPython’s needs.
15 Likes

Note: I’m working on replying to everyone but wanted to get some initial replies out, I appreciate patience with this.

I agree with @jacopoabramo on this; I like to think that the Python community will be better about being productive, amicable, and respectful when discussing these issues. So I think the experience will be altogether different.

I reached out to David Hewitt, who has now responded in this thread and will contribute to the effort (thank you, David!). Kirill has reached out to the RustPython team as well!

On the contrary, it should be much less effort as safe Rust is thread-safe due to the borrow checker. There will need to be some support added to handle integrating attached thread states, but it should be relatively easy.
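To make that claim a bit more concrete, here is a minimal sketch (plain Rust, not CPython code; the helper name `parallel_count` is made up for illustration). Handing a bare `&mut u64` to several threads would be rejected at compile time, so shared mutable state has to go through a synchronized type such as `Arc<Mutex<_>>`:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: spawn `threads` workers that each add 1 to a shared
/// counter `per_thread` times, then return the final value.
pub fn parallel_count(threads: usize, per_thread: u64) -> u64 {
    // The counter is shared, so it must live behind Arc (shared ownership)
    // and Mutex (exclusive access); the compiler enforces this.
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    // Wait for every worker before reading the result.
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    let total = *counter.lock().unwrap();
    total
}
```

This only illustrates data-race freedom in safe Rust. Integrating with attached thread states, as mentioned above, would still need dedicated support.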

Sorry, I’m not sure if you are asking if Rust extensions will work with the JIT or if Rust can be used to implement it? For the former they should work without issue. For the latter the JIT currently relies on a custom calling convention which is not yet exposed in Rust (but is being discussed to do so last I checked). I don’t suggest moving this code to Rust until it is reasonable to implement in Rust and the necessary calling convention features are available.

This is definitely something to be cognizant of, thank you for bringing it up! There are several strategies which will not affect performance, such as abort on panic, which we can explore. We will probably want to abort on panic anyway since unwinding over FFI layers is UB.

I think this is an interesting proposal, but orthogonal to the current one. We hope Rust will eventually become required to build CPython so it can be used to improve the CPython core. I would suggest splitting this off into its own thread to discuss it further.

Again I think it is important to highlight that this proposal is more than just _base64, and more than just optional modules. We’d eventually like to make Rust a hard dependency so it can be used to improve the implementation of the Python runtime as well.

Thanks Alex! We definitely want to approach this carefully and with thought, your expertise will be invaluable!

I think we necessarily need to make a plan for long term adoption so we can figure out when Rust can be a required dependency and plan far enough ahead for it. I don’t want anyone to be caught by surprise when suddenly they need Rust to build CPython when they don’t expect it. I will say that the final PEP will probably be what you propose plus a timeline to make Rust a required dependency. Iterating on ergonomic APIs for Rust is definitely something I’d be working on if cpython-sys is approved.

Perhaps then we can ensure that releases build with an older nightly Rust to enable such bootstrapping? I expect these cases to be relatively uncommon - Rust supports a large number of platforms.

I would restate our goal as “slowly introduce Rust to carefully integrate it and make sure we get things right and give people time to adapt to the significant change.” _base64 is chosen as an example as it is easier to implement, easier to understand, and would only affect performance, so is entirely optional. I think there are several existing modules that could see clear benefits from being written in Rust, eventually. Especially for those that interact with untrusted input such as json, it would be a significant improvement security-wise if we implemented them in a memory safe language. But I also don’t want to rush in and cause breakage. These kinds of changes should be done carefully and when well-motivated.

Good point re PyPy requiring some bootstrap Python itself! I hope that the approach of using an older CPython will be workable.

I absolutely agree, safe abstractions over things like argument parsing and module creation make PyO3 a joy to use. I hope we can collaborate with the PyO3 maintainers and provide similarly pleasant abstractions in CPython core. It will certainly be a high priority. I do think even in this simple example there are examples of safe abstractions that provide benefits. Kirill wrote up an abstraction over borrowing a Py_buffer that automatically releases the buffer on drop, so that it is impossible to forget to do that and cause a bug: cpython/Modules/_base64/src/lib.rs at c9deee600d60509c5da6ef538a9b530f7ba12e05 · emmatyping/cpython · GitHub
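For readers who have not seen the pattern, here is a self-contained sketch of the release-on-drop idea. It is not the code from the linked file: `Exporter` and `BorrowedBuffer` are stand-in types invented for this example, and a counter plays the role that `PyObject_GetBuffer` and `PyBuffer_Release` would play against the real C API.

```rust
use std::cell::Cell;

/// Stand-in for an object that exports a buffer (hypothetical, not a CPython type).
pub struct Exporter {
    bytes: Vec<u8>,
    active_views: Cell<usize>,
}

impl Exporter {
    pub fn new(bytes: Vec<u8>) -> Self {
        Exporter { bytes, active_views: Cell::new(0) }
    }

    /// Number of views that have been acquired but not yet released.
    pub fn active_views(&self) -> usize {
        self.active_views.get()
    }

    /// Plays the role of acquiring the buffer; the returned guard releases it on drop.
    pub fn borrow_buffer(&self) -> BorrowedBuffer<'_> {
        self.active_views.set(self.active_views.get() + 1);
        BorrowedBuffer { owner: self }
    }
}

/// Guard type: holding it means the buffer is borrowed.
pub struct BorrowedBuffer<'a> {
    owner: &'a Exporter,
}

impl BorrowedBuffer<'_> {
    /// Read-only access to the borrowed bytes, valid only while the guard lives.
    pub fn as_slice(&self) -> &[u8] {
        &self.owner.bytes
    }
}

impl Drop for BorrowedBuffer<'_> {
    fn drop(&mut self) {
        // Plays the role of the release call; it runs automatically when the
        // guard goes out of scope, including on early returns.
        self.owner.active_views.set(self.owner.active_views.get() - 1);
    }
}
```

Because the release lives in `Drop`, no code path can forget it, and the borrow checker prevents the slice from outliving the guard.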

Thank you Guido for your trust and words of support, it really means a lot!

17 Likes

Speaking in a broad way to answer a broad (“depends on many details”) question:

In C, the primary compilation unit is each individual file. In Rust, the primary compilation unit is each entire crate. Rust offers many conveniences over C (e.g. not needing to “forward declare” things in various files) that depend on the fact it can treat crates as individual units. As a result, if you were to, for instance, compare the two languages based on “files compiled” but one is a crate, the results may be surprising.

But with thoughtful (or just arbitrary) splitting of code, Rust does allow you to keep iterative builds relatively fast, especially if all the code you haven’t changed is in crates upstream of the one you changed. Paying attention to this is especially important for the -sys crates containing bindings, as bindgen essentially involves running a compiler to generate code and thus is not cheap.

It’s worth noting that C also enjoys faster compilation with careful management of inclusions and include-guards on repeatedly-included headers.

There are many caveats, exceptions, and and-alsos one can attach to what I just said when things get more specific. Some Rust idioms will not make it faster to build something, due to e.g. use of generics or #[inline] that requires multiple downstream instantiations in multiple downstream crates.
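As a small illustration of the generics point, here is a sketch of one common idiom for limiting the cost: keep the generic function as a thin shim and move the real work into a non-generic inner function. The function names are invented for this example.

```rust
/// Generic entry point: one copy is instantiated for every distinct `S`
/// that callers use, potentially in every downstream crate.
pub fn count_words<S: AsRef<str>>(text: S) -> usize {
    // Keep the generic part tiny and delegate immediately.
    count_words_inner(text.as_ref())
}

/// Non-generic body: compiled once, in the crate that defines it.
fn count_words_inner(text: &str) -> usize {
    text.split_whitespace().count()
}
```

Callers can still pass `&str` or `String`, but only the one-line shim is duplicated per type.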

And of course there’s ongoing work here, such as relink, don’t rebuild, where some of us are attempting to make it so changes to upstream crates only cause rebuilds of downstream crates if the upstream crate actually changed its public API.

4 Likes

Thank you for addressing unsupported platforms where CPython can build/run today that would become untenable under this proposal, it’s an important thing to talk about up-front.

I think it’s worth being more explicit about this. I understand the general point, but having references to some recent issues that would have been avoided would strengthen the value proposition of the proposal.

Obviously any extension modules written in Rust will require knowledge of Rust to contribute to.

Since this proposal is looking towards using Rust in the core as well, I think it may be harmful to frame this point in terms of extension modules specifically. “What will we do about doubling the number of programming languages in the core” feels important to address up-front.


Some other questions that occur to me:

  • It seems to me that an eventual PEP should address “Why not put that development effort towards RustPython?” in the Rejected Ideas section
  • Does the interaction between PEP 11 support tiers and Rust support tiers merit adjustment of CPython’s policy? Having 2 additional dimensions to keep track of feels complicated.

The idea makes me nervous, but I am well outside the core team, so it’s possible I am not familiar enough with problems this would resolve. I do see the merit in general, especially for the part of this proposal targeting extension modules specifically. Doubling the tooling required to build CPython seems like a bad trade from where I’m standing, but maybe I am underappreciating the value-add.

It feels a little big for a single PEP, unless the ideas for integration in the core are “over the horizon” and the eventual PEP would be just about allowing this for extension modules.

9 Likes

Then you force me to factor in my experience in Rust, my experience in mixed-language codebases, and my experience in teams, and come down as a firm -1 on the entire proposal.

16 Likes

I completely agree with this; my experience of updating Rust code based upon PyO3 to support free-threaded Python has been relatively painless.

I would even go further and suggest that Rust for CPython may be able to assist with the implementation of free-threaded Python; if there are current stdlib extension modules needing to be updated for PEP 703 which lack maintainers, migrating them to Rust may be a way to bring on new maintainers and get them thread-safe at the same time.

For what it’s worth, PyO3 has a mechanism for carrying panics as Python exceptions through stack frames, but I would agree that for sake of binary size and simplicity, an abort would be good enough (the Rust panic hook could be configured to call into Python’s existing fatal exit machinery).
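To sketch the alternative to aborting: the general shape of catching a panic at the FFI boundary and turning it into an error looks roughly like this. This is not PyO3's actual implementation; the status codes and function names are made up, and a real module would set a Python exception before returning the error status.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical status codes for this sketch (not CPython's conventions).
pub const STATUS_OK: i32 = 0;
pub const STATUS_PANICKED: i32 = -1;

/// Runs `body` and converts a panic into an error status instead of
/// letting it unwind into the (C) caller.
pub fn guard_ffi<F: FnOnce()>(body: F) -> i32 {
    match panic::catch_unwind(AssertUnwindSafe(body)) {
        Ok(()) => STATUS_OK,
        Err(_payload) => STATUS_PANICKED,
    }
}

/// Example entry point using the C ABI. The division panics when
/// `denominator` is zero, and the guard reports that as an error status.
pub extern "C" fn checked_divide(numerator: i32, denominator: i32, out: &mut i32) -> i32 {
    guard_ffi(|| {
        *out = numerator / denominator;
    })
}
```

Note that `catch_unwind` only helps when the crate is built with unwinding enabled. With `panic = "abort"` the process terminates at the panic instead, which is the simpler option discussed above.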

2 Likes

In my previous job, I developed and maintained a large codebase mixing C++, Python, and Rust. I am very familiar with these 3 languages.

Feel free to ping me if new Rust code or docs require review. I would support this if we can start with a few extensions that have C/Python fallbacks.

Based on my experience, I want to highlight that certain common practices do not align with Rust. For example, fork() is not usable with many Rust libraries (state may be incorrect after fork). https://stackoverflow.com/questions/60686516/why-cant-i-communicate-with-a-forked-child-process-using-tokio-unixstream

8 Likes

It’s been such an exciting time for Python lately, with this proposal, lazy imports and frozendicts!

Another benefit that you might want to mention in the PEP is how IIUC this would open the door for the possibility of shipping with Python modern Rust-based tooling for dep management, formatting and linting, with all the performance, correctness and community enthusiasm that comes with it.

3 Likes

I’m not familiar with the risks you speak of. As David Hewitt points out, there is a work in progress implementation using gcc, so while there is currently one compiler, that will not remain the case.

I mostly will defer to @workingjubilee’s excellent answer and their expertise. I will add however that build times are dear to my heart, so I think good devguide documentation on setting up a dev environment that is configured for fast incremental builds will go a long way in helping.

This is a really interesting perspective, thanks Danny! I will say getting cargo to fit into CPython’s current build system was a little tricky. That being said I think the ABI question may be less important until Rust code is exposed to users, which will be after we’ve had a lot of experience working with cargo ourselves and can be planned independently of the initial integration, in collaboration with PyPA.

Very excited to have you join us! We greatly value your expertise on Rust and Python interop.

This is right along the lines of what I was thinking, so glad we are on the same page :grin:

Excellent point! We should make sure to note this in the PEP.

Absolutely, I think this would be a great path forward. I would love if the code could be shared across PyO3 and CPython!

Yes, there will likely be a few things upstream that may need some work but probably (hopefully!) less than Linux!

If you look at issues labeled type-crash you will see a number of issues, such as Use-after-free due to race between SSLContext.set_alpn_protocols and opening a connection · Issue #141012 · python/cpython · GitHub or heap-buffer-overflow in pycore_interpframe.h _PyFrame_Initialize · Issue #140802 · python/cpython · GitHub or JSON: heap-buffer-overflow in encoder caused by indentation caching · Issue #140750 · python/cpython · GitHub. There are many more.

Absolutely, I hope that thorough devguide coverage and good tooling will go a long way in making this experience pleasant. I also think having a team of experts will be useful.

Thanks, will add this to the list of rejected ideas to add. I also want to cover other language choices, and a few other things.

Hm, what adjustment did you have in mind?

I’m sorry to hear that Steve. I’m happy to chat further about your experiences at some point, there are definitely wrong ways of integrating new languages into existing code bases.

That’s good to know! I think we will have to evaluate this (as with many things) and decide based on what our experience finds out.

Thanks Steven! It’s been great to see a number of people excited about contributing to CPython in Rust if it were added.

fork() + threads is sadness pretty universally, it has been an issue for CPython before, so it’s an issue I’m well aware of. Thanks for bringing it up!

That’s a good point, I’ll make sure to note that in the PEP. I misunderstood this post I think, see my comment below.

3 Likes

Nothing in particular. PEP 11’s requirements for each tier’s support are broad enough (language-agnostic enough) that it’s very possible that no changes are required, but I did raise my eyebrows at the intersection with Rust’s own support tiers.

1 Like

The risk, in short, is that rustc could easily include all kinds of code that isn’t obvious to an outside auditor. Ken Thompson demonstrated this using a hack that created a login back door; the same thing could infect a more modern system by secretly downgrading TLS in some way, making it possible for a third party to snoop supposedly-encrypted connections.

Perhaps in the future this won’t be as much of a consideration, but that would be then, and this is now. Right now, how can we trust Rust? How can we know that a Ken Thompson-style hack hasn’t already been done? How do we ensure that one won’t happen in the future? These are not merely academic questions. Python is a well-trusted language used extensively across the internet; if someone with a strong agenda decided to target it, it would be an absolute catastrophe, not least because of how insidious it would be.

6 Likes

@davidhewitt Hi David, and thank you very much for maintaining pyo3!

I have a question that we should probably address in the PEP: Can we integrate Miri into our workflow? Have you tried using it in PyO3, and what were the results?
TL;DR: Miri is an undefined-behavior detection tool for Rust, so I’m guessing it could be helpful for us :slightly_smiling_face:

2 Likes

It seems to me that adoption of Rust for various other core system components offers some level of assurance here. While there are still risks, it seems likely that they would be discovered relatively quickly as Rust adoption increases. Particular cases I’m thinking of include:

  • Rust in the Linux kernel
  • The Rust reimplementation of coreutils being adopted for Ubuntu
  • Microsoft including Rust in the Windows kernel

While the risk involved in a single compiler implementation remains, the chance of any consequent compromise going undetected seems likely to decrease rapidly as projects like the above become more widespread.

6 Likes

That means that the potential attack surfaces are many. It doesn’t make the decision right for any other project. To be quite honest, I have these exact same concerns regarding the Rust rewrites elsewhere; but (for example) a Rust-based sudo would require that someone first gain shell access as a non-privileged user, and THEN be able to wield an exploit embedded in sudo. With something that is key to many web sites and other internet-connected services, the attack potential is far more direct.

And that would be a strong protection, if the only type of attack were one that hits everything all at once. Unfortunately, as Ken Thompson’s hack proved, this sort of attack can be extremely narrowly targeted. And Python is a juicy target.

5 Likes

Is any language really safe from this? Surely gcc alone is a hugely valuable target for such an attack–how do we know it hasn’t happened? Heck, how do we know it hasn’t happened in CPython?

It seems like this is a separate security discussion that goes far beyond Rust.

16 Likes

I want to add that this risk is real.

Besides the code in the Rust compiler itself, Rust supports procedural macros, which execute user-written code during compilation. Fully auditing that requires a lot of work.

This has happened in the Rust community before. In one earlier incident, a widely used library shipped a pre-built binary to speed up its procedural macro without notice, which worried many users.

How about a switch to turn off all Rust code?

1 Like

The biggest protection is having multiple compilers. Do you want to make sure your gcc hasn’t been infected? Download the source code for gcc, and compile it using clang. This could also be affected, but only if someone has infected BOTH compilers. The more different options there are, the less of a threat this is.

The threat is, by definition, only relevant to compiled languages that compile their own compilers. It’s inherent in the bootstrapping. So this can’t happen in CPython as it currently is, unless there’s some way that Python code is being used to generate future versions of the CPython binary, in a way that’s independent of the source code (eg Argument Clinic can’t be that, because the output is right there for everyone to see).

2 Likes

I’ll briefly put my security team hat on and say that the security side of this is being way overblown. The risk of a supply-chain attack via the compiler is minuscule compared to the multitude of other options - adding Rust doesn’t make that part worse.

I’d rather see people discussing things like how Rust provides any protection/benefit at all when we have to interop everything with “unsafe” C code at a level below anywhere PyO3 could help (which is only safe because it relies on our public C API, which is the safety barrier with guaranteed semantics that can be mapped into Rust’s semantics).

35 Likes

Given that clang had to be bootstrapped by gcc, I think it is impossible to say that this will work for sure.

Thanks Steve! I appreciate you piping up on this.

I have a couple of thoughts on this, and hope @davidhewitt has more since he has thought about this problem probably a lot more than I have :grin:

First, we can build safe abstractions over unsafe operations, which will reduce the amount of unsafe code users need to interact with. Furthermore, I expect that, to start with, a safe core for extension modules can be implemented and then exposed to CPython through unsafe FFI procedures. This is an approach that has seen wide success in other projects. One of Rust’s strengths is that it lets you focus on where unsafety occurs. Finally, if more of the interpreter were to become Rust, these portions would presumably have safe Rust interfaces.
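Here is a rough sketch of what “safe core, unsafe FFI shim” can look like, using base64 since that is the running example. This is not the code from the proposal: the function names and the return convention (bytes written, or -1 on error) are assumptions made for illustration.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Safe core: standard padded base64, written without any `unsafe`.
pub fn encode(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into the low 24 bits of a u32.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;

        out.push(ALPHABET[(n >> 18) as usize & 63]);
        out.push(ALPHABET[(n >> 12) as usize & 63]);
        // Missing input bytes are represented by '=' padding.
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] } else { b'=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] } else { b'=' });
    }
    out
}

/// Unsafe FFI shim (hypothetical signature): validates the raw pointers,
/// then calls the safe core. Returns the number of bytes written, or -1
/// if a pointer is null or `dst_cap` is too small.
///
/// # Safety
/// `src` must be valid for reads of `src_len` bytes and `dst` must be
/// valid for writes of `dst_cap` bytes.
pub unsafe extern "C" fn sketch_b64_encode(
    src: *const u8,
    src_len: usize,
    dst: *mut u8,
    dst_cap: usize,
) -> isize {
    if src.is_null() || dst.is_null() {
        return -1;
    }
    // The only unsafe steps: turning raw pointers into slices.
    let input = unsafe { std::slice::from_raw_parts(src, src_len) };
    let encoded = encode(input);
    if encoded.len() > dst_cap {
        return -1;
    }
    let output = unsafe { std::slice::from_raw_parts_mut(dst, dst_cap) };
    output[..encoded.len()].copy_from_slice(&encoded);
    encoded.len() as isize
}
```

The part that needs careful review is the handful of lines in the shim; everything in `encode` is checked by the compiler.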

6 Likes

Steve, without more details your -1 has no weight. You can’t just argue from authority.

14 Likes