Performance difference on fork of json module

I’m only seeing a big difference for the decoder, for which only these private functions were swapped:

#define _Py_EnterRecursiveCall Py_EnterRecursiveCall
#define _Py_LeaveRecursiveCall Py_LeaveRecursiveCall

The thing is, they're only called once per list of 65,536 booleans in the benchmark.
I could try without them, but I doubt we'd see a big improvement:

decode                    json   jsonc   unit (μs)
List of 65,536 booleans   1.00   1.45    1147.94
List of 4,096 strings     1.00   0.61    1686.16
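
For anyone who wants to reproduce numbers of this shape, here is a minimal sketch of a decode benchmark. It is not the benchmark used above: the payload contents, the helper names (`make_payloads`, `best_time_us`, `compare`) and the repeat counts are all assumptions, and it only times the stdlib `json` decoder. To compare against the fork, you would add its `loads` function to the `decoders` dict (the `jsonc` import in the comment is hypothetical).

```python
import json
import timeit


def make_payloads():
    """Build JSON documents resembling the two benchmark rows (contents assumed)."""
    booleans = json.dumps([i % 2 == 0 for i in range(65_536)])
    strings = json.dumps([f"string-{i}" for i in range(4_096)])
    return {
        "List of 65,536 booleans": booleans,
        "List of 4,096 strings": strings,
    }


def best_time_us(loads, payload, repeat=5, number=20):
    """Return the best per-call decode time in microseconds."""
    timings = timeit.repeat(lambda: loads(payload), repeat=repeat, number=number)
    return min(timings) / number * 1e6


def compare(decoders, payloads):
    """Return per-payload timings normalized to the 'json' decoder."""
    results = {}
    for label, payload in payloads.items():
        times = {name: best_time_us(loads, payload) for name, loads in decoders.items()}
        baseline = times["json"]
        results[label] = {name: t / baseline for name, t in times.items()}
    return results


if __name__ == "__main__":
    decoders = {"json": json.loads}
    # Hypothetical: register the fork's decoder here, e.g.
    # import jsonc; decoders["jsonc"] = jsonc.loads
    for label, ratios in compare(decoders, make_payloads()).items():
        print(label, ratios)
```

Taking the minimum over several repeats is one common way to reduce scheduling noise; it does not explain a consistent difference between the two decoders, which is what a profiler is for.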

Sometimes that “noise” is where the clue to a problem is.
Comparing the two runs would show whether different Python APIs are being used.
But it looks like you found an important difference.

Another option is py-spy (benfred/py-spy on GitHub, a sampling profiler for Python programs), which can profile both Python code and C extension code, and works on macOS with a simple pip install.

On macOS you can also try echion (P403n1x87/echion on GitHub), a near-zero-overhead, in-process CPython frame stack sampler with async support.