Extending subinterpreters with sandboxing capabilities

TobiasHT · February 8, 2024, 9:09am

I’ve been investigating various ways to easily implement sandbox environments in python to run untrusted code but all in vain. It’s so easy for the untrusted code to escape the sandbox environment. I hit the very snag that @vstinner did with pysandbox.

However it occurred to me that subinterpreters are quite independent runtimes of their own, and we could probably extend them to contain sandboxing capabilities. This would mean to limit their access to certain resources and functionality. With the configuration struct, I think it could be possible to include more options tailored to creating sandboxes.

This is just a thought I’ve had for the passed 2 days and haven’t carried out thorough study of it, but I’m somehow convinced it can be done. I just need some opinions about this perhaps.

encukou · February 8, 2024, 9:38am

Don’t go there, you’ll be burned.

Sorry I don’t have much more time to elaborate. It all depends on what kind of sandboxing you want and what kind of capabilities you want to allow; I suggest defining that first.
Here’s one example of what to think about:

for n in range(200):
    hog = list(range(2**n))

I suggest looking into Wasm or container-based tools instead.

pf_moore · February 8, 2024, 9:52am

Also, look at the history of the bastion stdlib module, and why that got removed.

Rosuav · February 8, 2024, 11:32am

Can confirm. Have been there; was burned.

brettcannon · February 8, 2024, 8:26pm

Ditto on the burning; importlib exists because of my attempt.

If you want Python in a sandbox you’re probably best using the WASI build of CPython.

vstinner · February 9, 2024, 3:09pm

Don’t do that:

Just don’t do that. Thanks

vstinner · February 9, 2024, 3:23pm

Run the Python process in a sandbox, don’t run a sandbox in Python…

mwr · October 19, 2024, 5:23pm

Would you take the time to discuss this arguments in more detail anyway?
After all, a lot of time has passed since then.
It’s from those positions of yours that they don’t seem convincing.

mwr · October 19, 2024, 5:25pm

Do you mean that there will be problems with memory overflow?
But can also introduce restrictive mechanisms for memory management.
Yes, of course, it’s now easier to create virtual and container environments with independent memory management.
But if we talk not about what the current state of affairs is, but what it might be.
It’s not at all obvious that the built-in subsystem with memory limitations are fundamentally impossible.

mwr · October 19, 2024, 5:31pm

However it has since appeared MappingProxy.
The concept of proxy objects has been successfully implemented in other languages.

jcampbell05 · October 19, 2024, 8:41pm

I guess this sounds more like a sub process like API to handle setting things like chroot would be a better solution (or sub interpreters that can be spun up in sub process with chroot like the process pools)

vstinner · October 19, 2024, 9:14pm

Did you read my long email describing the history of pysandbox? Which part should I elaborate?

It’s ok if you are not convinced. Go ahead, write your own sandbox and ask other people to break it to test its security. You will see yourself.

mwr · October 19, 2024, 9:22pm

I was asked about him and I wrote my opinion

I’m just trying not to do anything in vain
Therefore, I really want to understand the fundamentally impossible moments from someone else’s experience.

mwr · October 19, 2024, 9:30pm

Chroot requires root or configured capabilities.
As it’s, it’s quite a good solution for file isolation.

ncoghlan · October 20, 2024, 1:34am

The full power of the underlying Python runtime is still present in each subinterpreter, so they’re just as hard to sandbox as the main interpreter.

However, because they’re created by a main interpreter, there actually are potential sandboxing avenues that weren’t previously viable: it may be possible to cripple the runtime itself in a subinterpreter, meaning that gaining access to that via introspection wouldn’t let you break out automatically the way it does in the current subinterpreter implementation.

The path to take to enable that kind of thing would be to figure out how to embed a less capable (and less crashable) Python runtime in CPython, and then use a subinterpreter style API to communicate with it. This sort of approach is used widely with Lua & JS embedding, and I suspect (but don’t know) that it may be feasible to repurpose MicroPython this way. Even if that’s not possible, the WASI build (as Brett suggested), or a new MicroPython inspired runtime implementation may prove fruitful.

But subinterpreters backed directly by the CPython runtime itself? Attempting to internally sandbox those won’t work any better than attempting to internally sandbox the main interpreter does.

The short version: sandboxing Python (the language) is feasible, internally sandboxing CPython (the implementation) is not (due to the combination of the way CPython is implemented with Python’s powerful runtime introspection capabilities).

There are many examples of external sandboxing of CPython, so that is the default recommended path, backed by the security assurances of the operating system provided sandboxing mechanisms.

mwr · October 20, 2024, 11:50am

Yes, JS also implements the concept of full-fledged proxy objects, and getting the original real object from them is probably an impossible task.

And why don’t you consider the possibility of further development of event audit ideas?

This can be a plus, because sometimes don’t even need to go deep inside to implement the most difficult things.
Doubts about something else. To what extent is python not full of internal bugs?
I strongly dislike examples of the form

github.com/python/cpython

There is a way to access an underlying mapping in MappingProxyType

opened 07:34AM - 14 Apr 21 UTC

closed 07:55AM - 07 Aug 21 UTC

serhiy-storchaka

type-bug interpreter-core 3.10 3.9 3.8 (EOL)

BPO | [43838](https://bugs.python.org/issue43838) --- | :--- Nosy | @gvanrossum,… @rhettinger, @ncoghlan, @serhiy-storchaka, @brandtbucher, @domdfcoding PRs | <li>python/cpython#27300</li> <sup>*Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.*</sup> <details><summary>Show more details</summary><p> GitHub fields: ```python assignee = None closed_at = <Date 2021-08-07.07:55:34.904> created_at = <Date 2021-04-14.07:34:28.852> labels = ['interpreter-core', 'type-bug', '3.8', '3.9', '3.10'] title = 'There is a way to access an underlying mapping in MappingProxyType' updated_at = <Date 2021-08-07.08:52:38.535> user = 'https://github.com/serhiy-storchaka' ``` bugs.python.org fields: ```python activity = <Date 2021-08-07.08:52:38.535> actor = 'ncoghlan' assignee = 'none' closed = True closed_date = <Date 2021-08-07.07:55:34.904> closer = 'gvanrossum' components = ['Interpreter Core'] creation = <Date 2021-04-14.07:34:28.852> creator = 'serhiy.storchaka' dependencies = [] files = [] hgrepos = [] issue_num = 43838 keywords = ['patch'] message_count = 13.0 messages = ['391039', '397244', '397245', '398023', '398025', '398033', '398045', '398072', '398700', '399022', '399057', '399059', '399171'] nosy_count = 6.0 nosy_names = ['gvanrossum', 'rhettinger', 'ncoghlan', 'serhiy.storchaka', 'brandtbucher', 'domdfcoding'] pr_nums = ['27300'] priority = 'normal' resolution = 'rejected' stage = 'resolved' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue43838' versions = ['Python 3.8', 'Python 3.9', 'Python 3.10'] ``` </p></details>

These are obvious implementation errors, and for some reason some pretend that these are features, not errors.
I see such approaches. I meet when there are big problems with the internal architecture and attempts to fix them will lead to even bigger problems.
Another thing is that the power of python is that can completely rewrite the concepts of modules and classes at the very top level. And thereby break ties with most possible internal problems.

Of course, but it’s about something else. About what the initiators of the event audit implementation said. To implement another protection layer at the language subsystem itself. Yes, with certain restrictions (for example, regarding the С code), but to implement (and once again, just in case, I repeat, we are not talking about any implementation requirements, we are talking exclusively about discussing the very possibility of implementation).
And the question is to what extent such an implementation is really impossible.
Because everyone says it’s impossible but they don’t give specific examples of impossibility. Because the examples that are given just talk about possible ways to work around problems.
And I just want to see a really unsolvable problem.
For example, in a long-standing article by Victor, I did not see such unsolvable problems, and my point of view on the arguments in that article is presented above.
Moreover, the very appearance of an event audit indicates the conceptual obsolescence of the article. Because the correct view point has shifted from protecting objects to monitoring and protecting against specific potentially dangerous actions. Although, of course, the relation still remains in the point of the need to protect some implementations in standard modules and classes from modifications. And it definitely seems to me that the implementation of real proxy objects in python is quite possible.

TobiasHT · October 20, 2024, 12:25pm

I think this discussion is shifting from the point of the topic that I created. The head of the topic is to discuss if subinterpreters can be extended with sandboxing capabilities, and I request we remain on that track if we’re to engage on this thread.

mwr · October 20, 2024, 12:33pm

Yes, Victor’s words and article expanded the context of the discussion. I think it’s not a problem at all.

pf_moore · October 20, 2024, 12:34pm

A lot of design decisions in Python are based on the “consenting adults” principle - the language doesn’t try to protect people from the consequences of their actions, but rather assumes reasonable behaviour on the part of the developer.

Nobody’s pretending that the issue you linked is a feature. It’s just behaviour that nobody should care about. To quote Guido from that issue:

the reason we proxy in the first place is not absolute safety but to make sure people don’t accidentally update the dict when they intend to update the class

There was never any intent that a MappingProxy protects against deliberate efforts to bypass it.

You seem to be assuming that Python is intended to prevent a malicious developer, with the ability to run arbitrary code against the interpreter, from getting access to whatever they want. It’s not - Python is built on the “consenting adults” principle and assumes that the developer can be trusted. That’s in stark contrast to languages like Javascript, which were designed from the start for use in a hostile environment where the code the interpreter is running can’t be trusted.

Making Python secure in the sense you seem to be interested in may be possible, but it would involve a massive rewrite, of both the language and the implementation, and it’s likely that the resulting language wouldn’t even be recognisably Python any more.

mwr · October 20, 2024, 12:43pm

It looks like you are the first core developer who has voiced this
But in a previous topic I gave a quick counterexample to this problem

i.e. python is so good that it allows to solve problems of a deeply low level at levels much higher.
And it seems to me that there is quite a chance to try to solve most of the problems just without a large-scale rewriting of the language architecture.
But there are still a few issues that I would like to hear from you as core developers.
And the main one is the issue with frames. Is it possible to have access to the point of their creation from python itself? Are the possible calls for accessing them limited to those events that are already in the event audit, or are there any others?