Capability to compile Python code with dependencies into a standalone binary executable without bundling the interpreter

TimLai666 · July 26, 2024, 2:15pm

Feature Request

Background

Currently, tools like PyInstaller are used to package Python applications into standalone executables. However, these tools bundle the Python interpreter along with the application code, which results in larger file sizes and some inefficiencies. Additionally, existing methods often encounter unexpected errors or issues, especially when executed in different environments (e.g., various operating systems).

Proposal

Add a feature to CPython that allows direct compilation of Python code, along with its dependencies, into a standalone binary executable. This would eliminate the need to bundle the interpreter separately.

Implementation Suggestion

Consider using Rust for implementation due to its performance and safety advantages. Rust’s robust tooling and community support can aid in the development process.

Benefits

Reduced file sizes for compiled Python applications.
Improved execution performance.
Enhanced security and efficiency.
Simplified deployment of Python applications.
Reduced likelihood of encountering unexpected errors or issues across different environments.
Easier for users without a background in Python or programming to run Python programs.
Improved performance of the compiled executables.
Integrated IDE support for quick packaging and deployment of Python applications.

Example Use Cases

Distributing Python applications to environments where Python is not pre-installed.
Creating lightweight, efficient executables for Python scripts.
Making Python programs more accessible to non-technical users.
Enhancing the overall performance and efficiency of Python applications.
Enabling IDEs to offer streamlined packaging and deployment features for Python applications.

Promotion of Python

This feature would aid in the promotion of Python by making it easier for a wider range of users to run Python applications without needing to install a Python interpreter. Additionally, the improved performance and reliability of compiled executables would enhance Python’s appeal for developing and distributing software.

davidism · July 26, 2024, 2:17pm

I’ve moved this to the Help category because many tools that allow some form of compilation already exist, and someone may be able to help you find the one that covers your need. Also, past Idea posts along this line have not been successful.

ncoghlan · July 26, 2024, 2:44pm

For a Python-like language with ahead-of-time compilation support, you may find Mojo of interest (see the Wikipedia article “Mojo (programming language)”).

For Python itself, the semantic definition of the language makes it very difficult to comprehensively support without having access to a full Python interpreter at runtime.

PyInstaller, PyOxidizer, and similar tools embed a Python runtime in the binaries they create because those binaries need it to work correctly. Even Nuitka, which converts as much Python code as it reasonably can to compiled C code rather than leaving it all to runtime interpretation, still finds it necessary to embed a full CPython runtime in standalone binaries in order to handle language features that don’t translate nicely to precompiled C code.

TimLai666 · July 26, 2024, 3:42pm

Thank you for your response.

I understand that tools like PyInstaller and PyOxidizer include the whole Python interpreter to make sure everything works. But, could we try recording the execution of the Python code and turning that into a standalone executable? This way, we wouldn’t need to include the entire interpreter.

jamestwebber · July 26, 2024, 3:46pm

Only if the code runs entirely deterministically, with no inputs or interaction. In which case, you could just store the result?
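To make the point concrete, here is a minimal sketch (the names `classify` and `record_lines` are made up for illustration) that records which lines of a function run for one input. A recording made with one input simply never sees the branch taken by another input:

```python
import sys

def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"

def record_lines(func, *args):
    """Run func(*args) and return the line offsets (relative to the
    function's first line) that were actually executed."""
    executed = set()
    code = func.__code__

    def tracer(frame, event, arg):
        # Only record "line" events that belong to the traced function.
        if frame.f_code is code and event == "line":
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return executed

seen_positive = record_lines(classify, 5)
seen_negative = record_lines(classify, -5)

# Each recording misses at least one line that the other one executed.
print(seen_negative - seen_positive)
print(seen_positive - seen_negative)
```

A "recorded" executable built from the first run would have nothing to execute when given a negative number.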

TimLai666 · July 26, 2024, 3:54pm

I think there’s a misunderstanding. My idea is to record the machine code generated by the interpreter when the Python code runs, and then use that to create a standalone executable. This way, we could avoid bundling the entire interpreter, making the executable smaller and potentially faster. And it would allow the executable to run on machines without a Python environment.

jamestwebber · July 26, 2024, 4:15pm

Yes, there is a misunderstanding: this just isn’t how Python works. There’s no machine code to record. The interpreter is needed to execute the bytecode, which is an intermediate representation.

In some cases the bytecode can be compiled into something else, but this won’t support all the features of the full language, so it isn’t a general solution. Examples of this have been mentioned above.
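If you want to see this for yourself, the standard library `dis` module shows what the interpreter actually works with. A small sketch (`add_one` is just an example function; the exact instruction names vary between CPython versions):

```python
import dis

def add_one(x):
    return x + 1

# The compiled form of the function is a bytes object of bytecode,
# not machine code for your CPU.
raw = add_one.__code__.co_code
print(type(raw), len(raw))

# Each instruction is an opcode that the interpreter loop dispatches on.
opnames = [ins.opname for ins in dis.get_instructions(add_one)]
print(opnames)
```

The output is a short list of opcode names (a load, a binary operation, a return), and each of those is carried out by C code inside the interpreter.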

TimLai666 · July 26, 2024, 4:26pm

I was wondering if the Python interpreter generates machine code dynamically when executing bytecode. If that’s the case, would it be possible to add a feature to the interpreter to record this machine code to create standalone executables?
Is this a feasible approach?

da-woods · July 26, 2024, 6:47pm

The experimental JIT compiler does this to an extent. But you’d still need to know that you’ve covered absolutely every possible code path in the program.

And even then, a lot of what the interpreter does is just call the C functions that implement bits of Python functionality. For example, looking up an item in a dict is never going to end up as “pure machine code” - it’s going to end up as a call to PyDict_GetItem (or maybe something a little more specialized).

It’s this collection of useful C functions that describe how to interact with Python objects that makes up most of the “interpreter”. You can’t lose them or nothing works.
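A rough way to see this from Python itself, assuming CPython (other implementations may report something different): the dict lookup machinery has no Python source to compile, because it already is compiled C code living in the runtime.

```python
import inspect

# A bound dict method is a built-in implemented in C inside the runtime.
lookup = {}.get
is_c_function = inspect.isbuiltin(lookup)

# There is no Python source for it to translate into anything else.
try:
    inspect.getsource(dict.get)
    has_python_source = True
except (TypeError, OSError):
    has_python_source = False

print(is_c_function, has_python_source)
```

So any standalone binary still has to ship that C implementation, which is most of what "bundling the interpreter" means in practice.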

da-woods · July 26, 2024, 7:01pm

I expect the best you could realistically do is: say that eval and exec don’t exist, and drop the parser and ast->bytecode bits of the Python interpreter.

I suspect that isn’t actually a huge size saving. It also breaks some popular standard library modules (e.g. dataclasses).
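A sketch of the dataclasses point, assuming a current CPython where `dataclasses` generates methods such as `__init__` by calling `exec` (that is an implementation detail and could change). The helper `dataclass_needs_exec` is hypothetical; it temporarily swaps out the builtin to simulate a runtime without `exec`:

```python
import builtins
from dataclasses import dataclass

def dataclass_needs_exec():
    """Return True if defining a dataclass fails once exec() is unavailable."""
    real_exec = builtins.exec

    def missing_exec(*args, **kwargs):
        raise RuntimeError("exec() has been removed from this runtime")

    builtins.exec = missing_exec  # simulate a runtime without exec()
    try:
        @dataclass
        class Point:
            x: int
            y: int
    except RuntimeError:
        return True
    finally:
        builtins.exec = real_exec  # always restore the real builtin
    return False

needs_exec = dataclass_needs_exec()
print(needs_exec)
```

The class definition itself contains no `exec` call, yet it stops working, which is why dropping `eval` and `exec` is less harmless than it sounds.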

ncoghlan · July 26, 2024, 11:48pm

I think the closest approach to the idea you’re proposing would be Nuitka: it translates as much Python code as it can to static C code, and compiles that into a binary executable.

When it hits the limits of that approach (which it almost inevitably will when compiling non-trivial programs), it falls back to running code against the embedded Python runtime.

I don’t know how effective link time optimisation is at dropping parts of Nuitka’s embedded interpreter that a given application doesn’t use, but it is going to offer the best chance of producing smaller binaries than the tools that always embed a complete CPython runtime.

TimLai666 · July 27, 2024, 4:36am

I have tried using Nuitka. However, I’ve found that Nuitka often struggles to compile Python code successfully, and even when it does, the resulting executable sometimes doesn’t behave the same as running the code directly with the interpreter.
This is why I hope there could be an official implementation in CPython to address these issues. Having an official solution would likely be more reliable and consistent.

ncoghlan · July 27, 2024, 5:46am

Nuitka’s struggles with doing this correctly aren’t related to it being a third-party project; they’re related to Python as a language fundamentally not being designed to support ahead-of-time compilation to machine code.
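One small illustration of what “not designed for ahead-of-time compilation” means in practice (the `Greeter` class is only an example): the function that a method call resolves to can be replaced while the program is running, so a compiler cannot safely hard-wire the call target in advance.

```python
class Greeter:
    def greet(self):
        return "hello"

g = Greeter()
before = g.greet()

# Any code, anywhere in the program, may rebind the method at runtime.
Greeter.greet = lambda self: "goodbye"
after = g.greet()

print(before, after)
```

The same call expression `g.greet()` ran two different functions, so a compiled binary still needs the runtime’s attribute lookup machinery to get this right.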

I do suggest checking out Mojo. While based on Python, its semantics differ enough that it can natively support the creation of standalone binaries (see “Get started with Mojo🔥” in the Modular Docs).