PEP 774: Removing the LLVM requirement for JIT builds

Hey folks,

I wanted to share PEP 774: Removing the LLVM requirement for JIT builds.

The purpose of this PEP is to decide whether it’s acceptable for JIT builds to keep their build-time dependency on LLVM. The proposed path forward for removing this dependency is to host the pre-generated stencil files in python/cpython for non-debug builds on Tier 1 platforms (though many alternative approaches were considered).

Of course, any feedback is welcome! Please let me know if you have any questions or concerns not addressed in the PEP, and I’ll be happy to incorporate it.

14 Likes

I’m in support of the PEP, but I was also supportive at the core dev sprints when this approach was proposed. :blush: I don’t think the storage cost in Git is anything to worry about and making builds easier is a good thing.

4 Likes

The general idea sounds good, just a couple of questions/suggestions on specifics:

  • I assume the debug stencils are left out because JIT-enabling a debug build is a relatively unusual thing to do. Will the dev experience be that enabling debug builds turns off the JIT by default, and trying to turn it back on without LLVM installed will give a build error?
  • GHA can make PRs back to the original branch when it detects necessary changes (see venvstacks/.github/workflows/update-expected-output.yml at 036d4e2629424fcd24e11cdec7d02110e39f6666 · lmstudio-ai/venvstacks · GitHub for one I wrote for work). Was that approach for stencil update automation considered and rejected, or is it still a potential option? (From personal experience: it didn’t take many iterations of patching expected venvstacks test outputs manually to convince me to invest the time in getting GHA to do it for me)
2 Likes

Yeah, that’s my understanding based on prior discussions on GitHub and at the sprint. I also wanted to be somewhat conservative about which stencils we’re proposing to check in, as folks expressed concern about overall repo growth. Of course, I’m open to having that conversation if folks disagree and feel that debug stencils should also be checked in.

For contributors actively working on the JIT, I expect most would still want to install LLVM locally for faster iteration and testing; without LLVM, waiting for CI to generate your platform’s stencil would be brutal. The workflow is designed so that if you pass a flag (e.g., --with-pydebug) that’s incompatible with the pre-generated stencils, the build falls back to using LLVM to regenerate a new, local stencil file for your platform as a build artifact and uses that instead. There’s also logic to prevent that local copy from “dirtying” Git state, since debug builds aren’t what we consider the blessed definition for checked-in stencils.
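
If it helps to picture it, here’s a minimal sketch of that selection logic. The paths and helper names are hypothetical, not the actual Tools/jit code:

import pathlib
import subprocess

STENCILS = pathlib.Path("Tools/jit/stencils")  # hypothetical checked-in location

def regenerate_with_llvm(triple: str, out: pathlib.Path) -> None:
    """Stand-in for the LLVM-based generator; fails early if clang is missing."""
    subprocess.run(["clang", "--version"], check=True)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(f"/* locally generated stencils for {triple} */\n")

def select_stencils(triple: str, debug: bool, build_dir: pathlib.Path) -> pathlib.Path:
    checked_in = STENCILS / f"{triple}.h"
    if not debug and checked_in.exists():
        # Default release build: use the pre-generated, checked-in stencils.
        return checked_in
    # Incompatible configuration (e.g. a debug build): regenerate locally,
    # writing into the build directory so the Git checkout stays clean.
    local = build_dir / f"{triple}.h"
    regenerate_with_llvm(triple, local)
    return local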

I did consider using a bot when putting together the reference implementation, but opted for a simpler approach for this first pass. I found grabbing the aggregated stencil patch from CI and applying it manually to be pretty streamlined and manageable for now. That said, if this proposal is accepted, automation would be my desired follow-up here.

Thanks for sharing that example. Does the approach you linked handle patches on forks of the repository effectively? I had some concerns about the security implications of automating this, so I’d want to put deeper thought into it.

2 Likes

In addition to, or instead of, uploading patches, the new stencils themselves could be uploaded as GitHub Actions artifacts. Benefit: it’s easier than applying a patch; just download and replace the files. Disadvantage: uploading and downloading six ~1 MB files is a bit slower than small patches.

I’m surprised by the implication that applying patches is cumbersome.
Perhaps the workflow docs/instructions need to include the command to do it? It’s a bit of a lost art in the world of pull requests.

If you have curl, you can use:

curl -L <patch URL> | git am -3

Resolve any conflicts like in a merge; git am --abort if anything goes wrong.

Without curl, paste from a browser:

$ git am -3
warning: reading patches from stdin/tty...
<paste the patch>
^D (Ctrl+D; on Windows it's Ctrl+Z AFAIK)

(for testing: you can append .patch to a GitHub PR URL to get the patches)

3 Likes

As long as the JIT is marked experimental and doesn’t bring large enough performance benefits, I don’t see the problem. You don’t really lose anything by not enabling the JIT. In other words, this seems premature to me.

While LLVM sounds large in itself, it’s readily available in binary form for a number of platforms. I’ll point out that building a functional CPython requires OpenSSL and perhaps other non-trivial dependencies.

Not only that, but the number (and size) of stencils could also increase if macro-instructions are introduced. Or am I misunderstanding how this works?

2 Likes

I thought that might be it. Note that you don’t need a full bot just to generate PRs, though; it can be done directly from a step in the GitHub Action itself (one that essentially goes through the same steps a human would have taken to apply the change and submit it back to the PR branch).

Deciding when to trigger the PR creation can be a bit annoying though (the way I did it for venvstacks means closing & reopening the PR if you want the patch generation job to run again).

1 Like

I agree with Antoine partly here. I don’t think writing a PEP is premature (that is needed to bring up discussion, and thank you Savannah for that!), but I do think sending it to the SC right now is premature. The SC won’t be able to evaluate the cost-benefit tradeoffs with the JIT in its current state.

Not only that, but the number (and size) of stencils could also increase if macro-instructions are introduced. Or am I misunderstanding how this works?

Yes, superinstructions will increase stencil size. However, I doubt we will implement them, in the interest of keeping our build times low. I did some experimentation and they ballooned our build times for almost no perf gain (see Superinstructions for Copy & Patch JIT · Issue #647 · faster-cpython/ideas · GitHub).

1 Like

What are the prospects for improving the copy-and-patch JIT performance, if not using more macro instructions?

The number of people who would use these pre-generated stencils seems pretty small: you would have to be building from source, want the JIT enabled, and not want to install LLVM. I’d also suggest that the source distribution tarballs include the files, so the set further excludes anyone building from an official release archive. Seems like a tiny number of people to me.

To me, these stencil files seem analogous to object files: they are generated by a compiler and platform specific. If you suggested checking object files into the git repo because people building from source don’t want to install the compiler tools, I think most people would consider that not an ideal solution.

Putting them into a separate git repo does add some complexity, but I think it’s a better solution. It should be no more work for people developing Python, since the external git repo can be automatically updated by the CI job or a GitHub Action. People building Python who want the JIT, don’t have LLVM, and aren’t building from a release archive would have to check out the external repo with something like the following, run inside the cpython git checkout:

git clone https://github.com/python/cpython-stencils.git

Addressing the PEP’s reasons for rejecting this option point by point:

While splitting JIT stencils into a separate repository avoids the storage overhead associated with hosting the stencils, it adds complexity to the build process.

True, but I would argue the added complexity is not a lot.

Additional tooling would be required to fetch the stencils and potentially create additional and unnecessary failure points in the workflow.

The additional tooling amounts to running the git clone command above. Yes, something extra is required, but I don’t think it’s much of a burden.

This separation also makes it harder to ensure consistency between the stencils and the CPython source tree, as updates must be coordinated across the repositories.

The proposal was to name the stencil file/folder after the SHA hash of the input files (which are part of the cpython git repo). If that file is missing from the external repo, you know the checkout is out of date and needs a git pull or a fresh git clone. So some coordination between the two repos is required, but it’s not very complicated.
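
To illustrate, here’s a minimal sketch of that lookup. The hashing scheme and repo layout are hypothetical; neither is spelled out in the proposal:

import hashlib
import pathlib

def stencil_key(inputs: list[pathlib.Path]) -> str:
    # Hash the generator inputs that live in the cpython repo.
    h = hashlib.sha256()
    for path in sorted(inputs):
        h.update(path.read_bytes())
    return h.hexdigest()

def find_stencils(stencil_repo: pathlib.Path, inputs: list[pathlib.Path]) -> pathlib.Path:
    # Hypothetical layout: one file/folder per content hash.
    candidate = stencil_repo / stencil_key(inputs)
    if not candidate.exists():
        raise FileNotFoundError(
            "cpython-stencils checkout is out of date; "
            "run git pull (or re-clone) to fetch stencils for this tree"
        )
    return candidate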

3 Likes

Slightly off-topic for this thread, but a few efforts are going on in parallel right now:

  1. Top-of-stack caching / register allocation, worked on by Mark and Brandt.
  2. Better codegen for AArch64, by Diego.
  3. Tracing from function entry/resume points by me.
  4. Another optimization pass doing partial evaluation, by me and Mark.
  5. Trace collection by Brandt.
  6. There’s also a really cool interplay: if we require the tail-calling interpreter, the JIT and interpreter become identical, so entering the JIT is only a single indirect jump, and exiting to the interpreter is also a single indirect jump.
6 Likes

Whoops, missed this question the first time around. The security model I use for the job that generates new PRs is just the regular one enforced by GitHub: PRs need maintainer approval to run actions if they’re submitted from outside the repo by someone without the relevant permissions.

I should note one side effect of the auto-generated PRs is that the extra branches get created directly in the main repo, which can introduce noise when fetching.

1 Like

May I ask a completely naive question: what are those stencils? As an openSUSE packager, I would like this proposal very much (we prefer GCC as the default, and having an additional C compiler just for one way of building a package is difficult to swallow), but I would need to know more about those files. Are they something like header files (textual, platform-independent, readable), or are they something binary (.o files)? If the latter, then I’m afraid we would have to rebuild them anyway and this proposal wouldn’t help us.

They’re big .h header files; see Tools/jit/stencils/ in the reference implementation.

1 Like

Taking a quick look at this, the fact that these files inline the stencil emission code, instead of just embedding some relocation tables that would be processed by a generic routine, seems weird and may contribute to the increase in repo size.
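
For comparison, here is a rough sketch of that table-driven alternative, where a single generic routine applies (offset, kind, value) relocation entries, so the checked-in files would only need the byte arrays plus these tables. The relocation kinds here are illustrative, not the actual ones used by the JIT:

import struct

def apply_relocations(code: bytearray, holes: list[tuple[int, str, int]]) -> None:
    # One generic routine patches every stencil, instead of each stencil
    # carrying its own inlined emission code.
    for offset, kind, value in holes:
        if kind == "abs64":    # absolute 64-bit address
            code[offset:offset + 8] = struct.pack("<Q", value)
        elif kind == "rel32":  # 32-bit PC-relative offset
            code[offset:offset + 4] = struct.pack("<i", value - (offset + 4))
        else:
            raise ValueError(f"unknown relocation kind: {kind}")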

The 0-initialized data_body constant arrays also seem a pointless waste of code size.

I wouldn’t have a problem with those. Yes, they are big, but Internet connections and disk space are cheap these days.

1 Like

Agreed. They are only a few MB. Compare this to a full LLVM installation :smile:

2 Likes

Eh, my experience with submodules has never been simple or headache-free. It also just seems like unnecessary overhead when the stencils are a pure function of the current repo state, especially when sending pull requests to update those files.

They are .h files that contain the bytes from .o files generated by the compiler. I would imagine that, as an openSUSE packager, you will need to regenerate them.

1 Like