About .exe wrappers created by frontends when installing wheels on Windows?

In this scenario, it's because we launch more processes (because people are running under one or more levels of environment-based indirection[1]).

I’m sure if you set up a Rust launcher that finds and launches another Rust launcher that may have to launch another Rust process in order to handle all the search paths the user expects to have, they’d each get scanned just as much, though hopefully it’s also obvious that the workflow design is unique to Python and our historical design decisions we’re trying to preserve.

It’s also a bit worse for us because attackers love using Python, and so AV scanners will readily identify our binaries and then realise that they need to try harder to see if it’s malicious or just Python. Code signing really helps here. Rust/Go are unlikely to face it, because their binaries are likely to be unique each time - a piece of malware written in Rust doesn’t necessarily look like every other program written in Rust. One based on Python though…


  1. Obligatory mention of PEP 582 goes here. ↩︎


I think you’re missing my point (or maybe I’m missing yours). The issue is that if I run black.exe, that takes 2-5 seconds to start because an AV scans that specific executable. That’s got nothing to do with black.exe running other executables - all that it runs is the Python interpreter (or the venv redirector) which is signed[1].

But if I run ruff.exe, that doesn’t (as far as I know - at least no-one’s claimed it does) incur that 2-5 second cost. And yet, ruff.exe is also an unsigned executable, built using Rust as it happens, but to the AV no different from black.exe.

Why does the Python-based black.exe need signing to avoid a startup penalty, whereas the Rust-based ruff.exe doesn’t?


  1. I assume… ↩︎


I can’t quite tell if you are saying “I” in the sense of hypothetically being someone else who has this AV problem or if you do mean your (Paul’s) actual computer takes 2-5 seconds to run black but doesn’t when running ruff.

I don’t have Windows available to test and I guess it would need to be the same AV software for a proper comparison. These are timings on a Linux machine though:

$ time ruff --version
ruff 0.8.5

real	0m0.005s
user	0m0.000s
sys	0m0.005s

$ time black --version
black, 24.10.0 (compiled: yes)
Python (CPython) 3.13.1

real	0m0.210s
user	0m0.182s
sys	0m0.028s

This is much less than 2-5 seconds, but there is still a consistent ~40x startup difference between black and ruff (even though black is apparently “compiled”).

Yeah, we may be talking past each other, or there may just be too many hypotheticals here (I don’t think you’re actually seeing that timing, right? You’ve just inferred it as an example from other messages?)

I don’t have any great explanation for the difference in this scenario. A non-python.org install is likely unsigned (e.g. conda-forge), which will add more time, but the only reason I’d expect black.exe to take longer to scan is if it triggers multiple scans (e.g. scan once because it’s an exe, scan again because it’s a ZIP, scan the ZIP contents because it’s a .py).

And if the scan result is not being cached, then something else is going wrong (the only time I mentioned this was when I’m recompiling the app, which means it can’t cache the result).

Yeah, we’ve never figured out how to strip the stdlib (and importantly, the extension modules) for a specific app properly. So a Python app simply loads more bytes off disk when you launch it.

If we started again with a focus on generating executables, we’d be far more robust about knowing what’s been referenced, but that ship has sailed so far away that anyone who attempts to follow just sinks. You can usually improve your own startup time by removing unused extension modules or zipping the stdlib.
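
For what it’s worth, a minimal sketch of the “zip the stdlib” idea (the paths and the lack of pruning here are my own simplifications, not a recipe from any particular distribution):

# Pack the pure-Python stdlib into one archive so startup reads a single file
# instead of many small ones (zipimport can import from zip archives).
# A real build would prune site-packages, lib-dynload and anything unused.
import shutil
import sys
import sysconfig

stdlib_dir = sysconfig.get_paths()["stdlib"]          # e.g. ...\Lib on Windows
zip_path = shutil.make_archive("stdlib_snapshot", "zip", root_dir=stdlib_dir)

# Putting the archive ahead of the loose files (via sys.path, PYTHONPATH, or a
# ._pth file next to python.exe) makes subsequent imports resolve from it.
sys.path.insert(0, zip_path)
print("stdlib archive on path:", sys.path[0])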

I have no idea what this means either. Though it probably shows more benefit if you do more than just --version - it wouldn’t surprise me if black --version has loaded everything it needs (because “put all your imports at the top of the file”) while ruff --version hasn’t even loaded anything yet.

I was speaking from the point of view of someone with the performance issue @paugier had described. I don’t have that problem myself, and as far as I’m concerned I have never found the performance of Python applications shipped via pipx or similar to be a problem.

Sorry for being unclear.

I see that as well, on Windows. But black is still fast enough (227 milliseconds on my PC) that it doesn’t matter in practice to me. Steve was quoting 2-5 second startup times caused by AV. Again, I’ve never seen that myself, but what I’m trying to establish is whether that 2-5 second overhead is exclusive to Python programs, or if it’s just a feature of “unsigned executables” in general.

Very probably :slightly_frowning_face:

Let me ask a non-hypothetical question. You say you see 2-5 second delays when running unsigned executables. On that system, do black and ruff have the same 2-5 second delay, or does black see a significantly longer delay (twice as much - 4-10 seconds - for example)?

Because my point is that unless Python is demonstrably worse than non-Python executables on a system with an AV causing that sort of overhead, I don’t think it’s something we should be worrying about solving. A developer who is concerned about that situation can solve it for Rust or C code (by signing the executable) and they can also solve it for Python (albeit with a bit more effort, by creating an executable and signing it, and distributing that executable).

That’s not true for .exe wrappers (which is what I thought we were talking about here) as they are extremely small executables that just forward to the Python interpreter. Speeding up the startup time of the Python interpreter is a worthy goal, but not something that would explain the different startup times between black and python -m black (the example reported by @paugier here).

The version of black I have isn’t compiled - it’s just a standard entry point wrapper that runs the black.patched_main function, as defined here.
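
For context, this is roughly what that wrapper script looks like after a typical pip install (an illustrative reconstruction - the exact text varies by installer version, and the .exe simply runs this script with the environment's interpreter):

import re
import sys

from black import patched_main

if __name__ == "__main__":
    # Normalise argv[0] so black sees "black" rather than "...\black.exe"
    sys.argv[0] = re.sub(r"(-script\.pyw?|\.exe)?$", "", sys.argv[0])
    sys.exit(patched_main())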

Agreed - but comparing ruff against black isn’t the real question here. I confused the issue by mentioning ruff, as an example of a program built in Rust that wasn’t signed, not as an example of something that should have comparable performance to a Python program.

The question @paugier raised was why black ran slower than python -m black. No-one knows why that is[1], and the only plausible explanation anyone has come up with is AV programs and unsigned executables.


  1. there’s no difference on my PC, so it seems to be related to the environment, not Python itself ↩︎

I confirm.

It is not about the standard “slowness” of Python startup (on Windows, the interpreter starts in approximately 100 ms, which is quite slow, but I don’t care about that). I don’t care about python -m black being much slower than ruff.

I would also like to stress that python.exe from conda-forge (which is not signed, if I understand correctly) is not affected by the startup delays (which seem to be related to antivirus scans). It has the same startup time as python.exe downloaded from python.org.

The programs that trigger a startup delay are the modified binaries like

  • black.exe
  • python.exe in Python virtual envs created from conda-forge Pythons and by UV (actually I’m not sure these ones are modified, since they should just read the pyvenv.cfg text file next to them).

One important point is that some unsigned binaries start without a (large) delay (for example python.exe installed by UV or conda, or ruff). So signing is not strictly necessary.

Maybe caching by the AV software does not work for modified binaries.

PS: the word “compiled” for black just refers to compiled extension modules; it is still a standard Python entry point.

$ ls /home/users/me/.local/pipx/venvs/black/lib/python3.11/site-packages/black
brackets.cpython-311-x86_64-linux-gnu.so             mode.py
brackets.py                                          nodes.cpython-311-x86_64-linux-gnu.so
cache.cpython-311-x86_64-linux-gnu.so                nodes.py
cache.py                                             numerics.cpython-311-x86_64-linux-gnu.so
comments.cpython-311-x86_64-linux-gnu.so             numerics.py
comments.py

You might not care about that, but it matters when trying to understand what is happening. Since you are the only one with access to a machine that demonstrates the problem, everyone else can only work with the information that you provide - so how long does it take to run ruff --version?

If python -m black takes 250ms and black.exe takes 750ms then that suggests that using this exe incurs a 500ms overhead. One possibility is that the virus scanner imposes a cost of 500ms per unsigned exe.

However, if ruff is much faster than python -m black, then the whole ruff invocation must take much less than 500ms. Therefore either:

  • There is not a simple 500ms cost per unsigned exe.
  • Or, the ruff exe is signed.

An alternative to a simple 500ms cost per unsigned exe could be that having an unsigned exe causes the AV to scan the other files that the unsigned exe opens. Then maybe, because ruff is literally just a single .exe, it only incurs the scan cost once, but since black.exe presumably opens hundreds of files it could incur the scan cost many times.
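
One way to probe that hypothesis (a rough sketch - it assumes ruff, black and python are all on PATH, which may not match your setup) is to time several consecutive launches; if the AV caches its verdict, only the first sample should carry the scan cost:

# Time repeated launches of each command to separate "first run" scan cost
# from steady-state startup time.
import subprocess
import time

commands = {
    "ruff": ["ruff", "--version"],
    "black.exe": ["black", "--version"],
    "python -m black": ["python", "-m", "black", "--version"],
}

for name, cmd in commands.items():
    samples = []
    for _ in range(5):
        start = time.perf_counter()
        subprocess.run(cmd, capture_output=True, check=True)
        samples.append(time.perf_counter() - start)
    print(name + ": " + ", ".join(f"{s * 1000:.0f}ms" for s in samples))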

How do you know that any of those are unsigned?

I could believe that they might all be signed by Astral and Continuum, especially if signing solves the problems discussed here.

The launchers used by conda and conda-build are signed (see Provide codesigned stub exe's by Callek · Pull Request #13721 · conda/conda · GitHub) but they are not the same ones provided by distlib. This will only affect packages installed through conda.


The (compiled: yes) that shows up in black’s version check indicates whether black was compiled with mypyc, since black ships both mypyc-compiled wheels and pure-Python wheels.

Right-clicking on the executable lets one see the signature. The python.exe installed by conda-forge and uv is not signed. However, it starts without delay.

It’s very slow!

Measure-Command { .\AppData\Roaming\uv\tools\ruff\Scripts\ruff.exe --version }
TotalMilliseconds : 782,1066

To be compared with (for example):

Measure-Command { .\AppData\Roaming\uv\python\cpython-3.13.1-windows-x86_64-none\python.exe -c pass }
TotalMilliseconds : 66,8593

(not signed but no delay)

and

 Measure-Command { .\AppData\Local\Programs\Python\Python313\python.exe -c pass }
TotalMilliseconds : 63,7183

(signed)

Measure-Command { .\miniforge3\python.exe -c pass }
TotalMilliseconds : 61,1182

(not signed but no delay)

I’m not seeing a huge delay here:

PS 01/05/2025 19:54:47> Measure-Command {py --version} | Select "TotalMilliseconds"  

TotalMilliseconds
-----------------
          19.0298


PS 01/05/2025 19:54:58> Measure-Command {ruff --version} | Select "TotalMilliseconds"

TotalMilliseconds
-----------------
          10.8882

Whatever the difference actually is here, it’s not universal.

Sorry, I should have made sure I was doing the full comparison with the tools mentioned as having an observable startup delay. I can actually confirm a large delay with black:

PS 01/05/2025 20:01:45> Measure-Command {black --version} | Select "TotalMilliseconds"

TotalMilliseconds
-----------------
         313.7621

PS 01/05/2025 20:02:30> Measure-Command {py -3.11 -m black --version} | Select "TotalMilliseconds"

TotalMilliseconds
-----------------
         332.2085

PS 01/05/2025 20:04:08> Measure-Command {py -3.11 -c "pass"} | Select "TotalMilliseconds"         

TotalMilliseconds
-----------------
          43.2557

However, this delay appears to be entirely black being slow to start, not an issue with the executable wrapper: the delay is nearly identical when invoking black with -m as a module, and no such large delay exists on the same Python interpreter by itself.


I suspect if you try py -3.11 -X importtime -m black --version then you’ll get a lot more information about why it’s slow.
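
If the raw -X importtime output is too noisy, something like this helps (a quick sketch - it assumes black is installed for the current interpreter and that the output keeps the usual "self | cumulative | name" columns):

# Re-run the command under -X importtime and sort modules by their own (self)
# import time so the expensive ones float to the top.
import subprocess
import sys

proc = subprocess.run(
    [sys.executable, "-X", "importtime", "-m", "black", "--version"],
    capture_output=True, text=True,
)

rows = []
for line in proc.stderr.splitlines():
    if not line.startswith("import time:") or "self [us]" in line:
        continue  # skip non-importtime lines and the header row
    self_us, cumulative_us, name = line[len("import time:"):].split("|")
    rows.append((int(self_us), int(cumulative_us), name.strip()))

for self_us, cumulative_us, name in sorted(rows, reverse=True)[:15]:
    print(f"{self_us:>8} us self  {cumulative_us:>8} us cumulative  {name}")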

When optimising apps for startup, I’ve gotten significant improvements by avoiding/shimming certain modules (those that take a lot of time but don’t really need to do anything e.g. enum and typing).
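
The “shimming” trick is roughly this (heavymodule and expensive_helper are placeholder names; it only works if nothing ends up needing the real module’s behaviour at runtime):

# Pre-register a stub so a slow-to-import module is never actually loaded.
import sys
import types

shim = types.ModuleType("heavymodule")
shim.expensive_helper = lambda *args, **kwargs: None   # stub the one attribute used

sys.modules["heavymodule"] = shim

import heavymodule   # resolved from sys.modules - no file I/O, no real import
print(heavymodule.expensive_helper())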

Yeah, the output of that one is a little noisy, and I can see a few potential culprits[1], but I was primarily showing that I couldn’t replicate a delay unique to the generated exe wrappers, and the only tool I have installed that’s “slow” isn’t slow because of the generated exe.


  1. MystBin has the full -X importtime output, with the modules that are always imported by the interpreter removed. Biggest likely removable culprits: typing, datetime (the time module is less expensive and can be used for the same purpose black uses datetime for), dataclasses (handwritten classes are faster than the import; tradeoffs exist here), mypy_extensions, colorama ↩︎


By my reading of that output (thanks for including it all), the ones that jump out to me as “huh?” are:

  • most of -m (runpy) is actually import contextlib
  • most of import json is actually import re
  • most of import dataclasses is actually import inspect
  • most of import tempfile is actually import shutil
  • most of import shutil is actually importing all the compression algorithms

Just from the names, these all seem like things that we could improve in the stdlib (probably by making them lazy), and significantly reduce startup time for a lot of applications. It would be meaningless if the modules were highly related, but these don’t look like that - e.g. it seems entirely feasible to use dataclasses without also using inspect for some other reason, so it’s probably only being imported for a dataclass.
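
To illustrate what “making them lazy” could look like (illustrative names only, not actual stdlib code), the heavy import moves into the code path that needs it, or behind a PEP 562 module __getattr__:

# lazy_utils.py - hypothetical module sketch
def copy_tree_to_temp(src, dst):
    import shutil            # paid only when this function is actually called
    return shutil.copytree(src, dst)

def __getattr__(name):
    # PEP 562: resolve a heavy attribute on first access instead of at import time.
    if name == "shutil":
        import shutil
        return shutil
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")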

Apart from just names, I’m also looking at cases where the second column is large but the first column is small. That indicates that the module is taking a lot of time because of what it imports, and not because it itself has a lot going on. Most of black’s own modules and dependencies are pretty evenly distributed, even though they take a lot of time - the ones I listed above are spending drastically more time in transitive imports than in their own code.

To bring things back on topic (suggest someone start a new thread if they’re interested in optimising our own imports), the cost of launching the Python application is definitely going to be higher than a fully native application, regardless of AV.


Is there any reason it’s an .exe file and not a .bat file with the proper absolute paths baked in:
C:\Work\Project\venv\Scripts\python.exe -m black

This could also be copied around (not sure who actually does that).

Because .bat files don’t work in all the places that .exe files do. See this post for more than you probably ever want to know on the subject…