Haha yeah apologies for the confusion. I will update it to match the PEP soon. Meanwhile let’s focus on the document text
The PEP has a whole section dedicated to security
Indeed it does but, as @steve.dower points out, it is solely focused on a single line of defense: preventing compromise in the first place.
I’m not a security expert by any means, but there should be more than one line of defense here.
From what you’ve said in this thread, maybe there are, but that isn’t clear in the PEP.
I think the PEP should address how we prevent:
- compromise in the first place (you already cover this)
- this PEP making a compromised situation worse
- this PEP allowing attackers to avoid detection
(I’m still very much in favor of the PEP, just a bit concerned about the security impact)
Makes sense! I want to explore this further. We will update the PEP with the details, and I have asked @steve.dower for a call to discuss this in real time.
Would you mind hosting an online meeting for your discussion with @steve.dower? I would like to be an observer and listen to the details you discuss. Sorry if the request is intrusive.
Sorry, unfortunately it already happened today. I will update the PEP with the result of our discussion
No problem! This is just personal interest
Is it a 4k buffer or one page of memory?
I am guessing a page, since you refer to mprotect, which works on page-sized ranges.
If it is page-sized, does the API spec still guarantee 4k even when the page size is bigger?
I am not sure whether any systems use page sizes smaller than 4k; in that case, would you allocate enough pages to reach 4k?
The proposal includes a 4k buffer. All the talk about mprotect was just part of a specific idea that started as part of the discussion but is not part of the main proposal.
Those are in effect the default maximum stack sizes. Most of the stack just takes up virtual address space initially until it is accessed. Spawning an OS thread doesn’t take 8 MiB on Linux. On my machine, it takes about 4 KiB (accounted for in the process’s RSS) and maybe another ~30 KiB in kernel memory for the page table entries and thread data structures (I had a harder time measuring that precisely).
Python threads take up a bit more memory. In the free threading build, the _PyThreadStateImpl struct is 3.6 KiB. Overall I’m seeing about 24.4 KiB extra memory (in process) used per thread when spawning 10,000 threads in the free threading build.
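If anyone wants to reproduce this kind of measurement, here is a rough sketch of my own (not necessarily the methodology used above) that estimates per-thread peak-RSS growth on a Unix system. Note that `ru_maxrss` is in KiB on Linux but bytes on macOS, and peak RSS is a coarse proxy, so treat the result as an order-of-magnitude number:

```python
import resource
import threading

def approx_per_thread_memory(n_threads: int = 200) -> float:
    """Spawn idle threads and estimate the peak-RSS growth per thread."""
    stop = threading.Event()
    before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Each thread just blocks on the event so only its fixed overhead counts.
    threads = [threading.Thread(target=stop.wait) for _ in range(n_threads)]
    for t in threads:
        t.start()
    after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    stop.set()
    for t in threads:
        t.join()
    return (after - before) / n_threads  # KiB/thread on Linux, bytes on macOS
```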
4 KiB per-thread seems significant to me – I’d rather see a smaller global buffer.
If people feel like a global buffer is best, we could separate the bit that triggers execution from the buffer, placing the buffer in the runtime and the bit in the thread state.
That way we could still control which thread executes, but there would be a single execution buffer. Not a fan of this, but if there is consensus around it I’m happy to do it.
I think a per-thread buffer would be better. Avoiding races on a global buffer will be tricky.
Regarding memory use, the argument that stack memory doesn’t count because it is virtual memory, while an unused buffer does count, doesn’t make sense to me.
There is no reason why the OS needs to allocate physical memory for an unused buffer.
If you malloc the 4k, you will find that the first page the buffer lands in will be allocated to the process. The second page will not be faulted in unless a later access pulls it in.
Edit: a 4k malloc includes overhead for malloc to track the allocation, which is why I say it’s two pages (assuming OS pages are 4k).
The pages of the stack are typically only allocated as the process needs them up to the maximum configured.
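This demand-paging behavior is easy to observe from Python. The sketch below (mine, Linux-specific because it reads `VmRSS` from `/proc`) maps 64 MiB of anonymous memory and shows that resident memory only grows once the pages are actually touched:

```python
import mmap
import os

def rss_kib() -> int:
    """Current resident set size (VmRSS) of this process, in KiB (Linux-only)."""
    with open(f"/proc/{os.getpid()}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found")

def demand_paging_demo(size: int = 64 * 1024 * 1024) -> tuple[int, int]:
    """Return RSS growth (KiB) right after mapping and after touching pages."""
    before = rss_kib()
    buf = mmap.mmap(-1, size)           # reserves address space, not RSS
    after_map = rss_kib() - before      # ~0: nothing resident yet
    for off in range(0, size, mmap.PAGESIZE):
        buf[off] = 1                    # first write faults each page in
    after_touch = rss_kib() - before    # now roughly size / 1024 KiB
    buf.close()
    return after_map, after_touch
```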
Not from the point of view of a native debugger, which is what you are if you’re using these buffers. You have complete freedom to freeze and resume each thread. (Though you can’t attach one-by-one with Python because of all the locks.)
More importantly, you can’t deterministically break into/connect to threads anyway, so calling into them one by one is essentially the same.
Besides, the most likely scenario here is that you’ll run the same code in each thread. Provided it’s able to identify itself somehow (getident()), most likely they’re all just going to open some form of IPC to the debugger. That only requires unique code per thread if you designed it that way - there are plenty of designs that don’t.
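To make that concrete, here is a hypothetical per-thread payload of the kind described above (the function name and wire format are mine; in Python the thread identity call is `threading.get_ident()`). Each thread runs the same code, identifies itself, and opens an IPC channel back to the debugger:

```python
import json
import socket
import threading

def report_to_debugger(addr: tuple[str, int]) -> None:
    """Identify the current thread and phone home to a debugger over TCP."""
    payload = {
        "ident": threading.get_ident(),          # unique per live thread
        "name": threading.current_thread().name,
    }
    with socket.create_connection(addr) as sock:
        sock.sendall(json.dumps(payload).encode())
```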
Small update: we have updated the PEP text to account for some of the points mentioned so far:
- We have added a section explaining the deactivation mechanism based on a compile-time flag, a runtime environment variable and a -X option.
- We have improved the specification and security sections.
- We have added explicit mentions of the feature being gated via the interpreter’s audit API.
- We have switched back to writing the path of a file to be executed into the buffer, instead of the code itself. We made this decision to address the in-process security scenarios, ensuring that an attacker with arbitrary memory-write capabilities cannot escalate to arbitrary code execution. All of this is covered in the "Security Implications" section.
- Typos and other minor fixes.
LGTM on the updates! Thanks for the great work!
We have made some small updates: fixed some inconsistent names and typos, clarified what the file must contain (Python code, not pyc files), simplified the interface of the remote_exec function, and added some other general clarifications.
The Python Steering Council has reviewed PEP 768 - Safe external debugger interface and decided to Accept it!
Thanks everybody for the writeup and discussion and iteration on the proposal based on feedback.
Just noticed this PEP defines names without a Py/_Py prefix: _remote_debugger_support and MAX_SCRIPT_PATH_SIZE:
typedef struct _remote_debugger_support {
int debugger_pending_call;
char debugger_script_path[MAX_SCRIPT_PATH_SIZE];
} _PyRemoteDebuggerSupport;
Could the struct tag be removed and a Py_ added before MAX? (See PR #135924)
I am happy with that