Built-in security subsystem

TIGirardi · October 18, 2024, 1:33pm

From my point of view, you are not describing anything specific. You are just posting an abstract wish list and stating that it should be doable without providing any specifics on how.

One core dev said that it was already tried, unsuccessfully, and that the core devs stopped trying. He also said that to change their minds you would need to provide an implementation.

Another person linked to a message from a core dev stating the very specific reasons he now believes it to be impossible. In that message, he says that the best way to secure Python is to run it in an external sandbox.

It is my understanding that using an external sandbox would fulfill your wish list, with the added benefit of not requiring any work from the core dev team. Why should they work on a solved problem?

mwr · October 18, 2024, 1:34pm

Similar disputes already took place in the previous topic. Well, if you think so, that’s your right. I’m interested in discussing something else entirely.
In any case, it is surprising that the event monitoring subsystem appeared at all. And now it’s clear why its immediate developers don’t want to discuss it. After all, in the opinion of many, it should not have existed at all.

So I’m in favor of it myself. And so far, I’m giving specific counterarguments to all specific arguments.
But they tell me: no, it’s not enough, give us a ready-made implementation, and then maybe we’ll discuss it. Maybe.

TIGirardi · October 18, 2024, 1:39pm

Why, and more importantly how, do you think the monitoring subsystem would help you? Be specific and I’m sure people are going to be more specific in their replies.

TIGirardi · October 18, 2024, 1:41pm

Neither in this thread nor in the old one were you specific enough to warrant any specific response.

mwr · October 18, 2024, 1:43pm

This is the ideas section. Where did I say that something should be implemented?

No, it only said that past attempts were unsuccessful, regardless of their details. Yet it is precisely the details that matter in the context of my specific ideas. Yes, they don’t have an implementation yet, but look through the other topics in this subsection: people discuss unrealized things there too.

But there are also my specific counterarguments on this list from ten years ago. I suggest we discuss them anyway

I didn’t say anything like that. You came up with this statement yourself.
Moreover, in the original message, I also made a footnote about this, anticipating such attempts to attribute it to me.

abessman · October 18, 2024, 1:46pm

Not at all; it was made because someone put in the work to make it.

No one has said this. It is entirely your own, incorrect, conclusion.

mwr · October 18, 2024, 1:59pm

Okay, you’re blurring the subject. And then they will say again that there are no specifics and the topic should be closed.
There are specific arguments above and my specific counterarguments with examples. If you really want to discuss, you will discuss them.

And there were even quite a few preliminary discussions about this
But yes, Guido was also involved in those discussions.
Unfortunately, those days are over.

So be it. There is a discussion above, where everyone can see who said what.
And I’ve already talked enough in general terms about nothing. There are discussion threads above where specific things and issues are discussed.
Anyone who really wants to discuss the issue will discuss it there, and not go around in circles.

devdanzin · October 18, 2024, 3:03pm

As someone who has followed many such “securing Python” initiatives (and even found exploitable issues in a couple), I’d like to ask you to ponder whether you find this idea completely feasible out of an abundance or a lack of information.

If the former, please share more details about how it could be made to work, because the community and core devs do not believe it to be sound so far. If the latter, please read not only about other such proposals (and especially how they fail), but also about how “creating readonly anything” in Python has failed in many ways.

Regarding specifics and immutable objects, please read CPython issue #88004, “There is a way to access an underlying mapping in MappingProxyType” (python/cpython on GitHub). Is it enough for you to believe not even MappingProxy is safe from Python’s mutability shenanigans?

Regarding the audit hooks as a security feature, I’ll quote the docs:

Note that audit hooks are primarily for collecting information about internal or otherwise unobservable actions, whether by Python or libraries written in Python. They are not suitable for implementing a “sandbox”. In particular, malicious code can trivially disable or bypass hooks added using this function. At a minimum, any security-sensitive hooks must be added using the C API PySys_AddAuditHook() before initialising the runtime, and any modules allowing arbitrary memory modification (such as ctypes) should be completely removed or closely monitored.
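To make the quoted note concrete, here is a minimal sketch of what an audit hook does in practice: it observes events such as `open`, which is useful for collecting information but is not a barrier in itself. The names `seen_events` and `record_event` are illustrative and do not come from the docs. Keep in mind that a hook added with `sys.addaudithook` stays installed for the rest of the process.

```python
import os
import sys
import tempfile

seen_events = []  # illustrative name: collects the names of audit events


def record_event(event, args):
    # An audit hook receives the event name and a tuple of arguments.
    seen_events.append(event)


# Hooks added this way cannot be removed again from Python code.
sys.addaudithook(record_event)

# Opening a file raises the built-in "open" audit event.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path) as fh:
    fh.read()
os.remove(path)

print("open" in seen_events)
```

The hook only records what happened. Nothing here prevents code running in the same interpreter from reaching around it, which is the point the documentation makes.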

Also, some expected audit hooks are missing. Finding similar gaps would give an attacker opportunities to break your proposed security system:

github.com/python/cpython

Missing audit hooks in several extension modules

opened 08:21AM - 12 Feb 24 UTC

RobinJadoul

type-bug

# Bug report

### Bug description:

Several extension modules don't fully emit the relevant audit events, leading to file read or process spawning without any traceability. In particular:

- Calling a `_ctypes.CFuncPtr` does not emit `ctypes.call_function`. When combined with some known addresses, this can result in arbitrary functions in libc or python getting called. Such addresses could come from `id`, `ctypes.pythonapi._handle`, passing a `byref` pointer to `ctypes.cast`, or probably still several other methods. Coincidentally, the `ctypes.cast` method would by audited by the same `ctypes.call_function` once it is present. The downside is that it may also result in multiple audit hooks for functions like `ctypes.string_at` that have their own specialized audit hook event too.
- Related, and maybe debatable, but constructing a `_ctypes.CFuncPtr` might fall under the audit event `ctypes.cdata`, as it is in spirit (though not in implementation) similar to calling a `.from_address`. An option might be to introduce `ctypes.cdata/function` similar to `ctypes.cdata/buffer` for this.
- The `readline` module can open and read a file through `readline.read_history_file` without having an `open` audit hook. Together with `readline.get_history_item`, this can lead to unaudited file reads. A similar situation exists for some other functions in this library.
- The `_posixsubprocess.fork_exec` function, and its only user in the standard library, `multiprocessing.util.spawnv_passfds` perform a fork + exec without any audit hooks. One would expect either `os.fork` and `os.exec` or the functionally similar `os.posix_spawn` here.

I'm happy to make a quick PR for these and adjust any specific event types to be more consistent or more uniquely identifiable.

Quick example in code:

```py
import sys, ctypes, multiprocessing.util, readline

collected = []

def collector(event, *_args):
    collected.append(event)

sys.addaudithook(collector)

def test(fn, *args, **kw):
    collected.clear()
    fn(*args, **kw)
    if collected:  # Just check it's nonempty
        print("Success")
    else:
        print("Fail")

test(ctypes.memmove, 0, 0, 0)
test(ctypes.CFUNCTYPE(ctypes.py_object), ctypes._memmove_addr)
test(readline.read_history_file, __file__)
test(multiprocessing.util.spawnv_passfds, b"/bin/id", [], [])
```

### CPython versions tested on:

3.11, 3.12, 3.13, CPython main branch

### Operating systems tested on:

Linux

### Linked PRs

* gh-115624

So you’d be talking about plugging holes and disabling insecure features… which has been shown not to work in general, using a feature that is explicitly listed as not suitable for security.

Even currently maintained projects like RestrictedPython, which removes all sorts of dangerous constructs and features (leaving a very limited Python that isn’t that useful), face their share of security escapes.

I hope this helps.

mwr · October 18, 2024, 3:45pm

This is exactly the kind of discussion I came here for. Thanks.

Interestingly, this is probably still a bug in the Python implementation. But let’s quickly improve the implementation a little. This is only a superficial improvement; one can come up with much better implementations.

from types import MappingProxyType

orig = {1: 2}
mp = MappingProxyType(orig)

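class MP:
    """Wrapper that forwards reads to the proxy and rejects writes."""

    def __getitem__(self, name):
        return mp[name]

    def __setitem__(self, name, value):
        # Item assignment is refused outright.
        raise AttributeError()

proxy = MP()

class X:
    def __eq__(self, other):
        # Reached through the reflected comparison; `other` is the MP wrapper.
        other[1] = 3

assert proxy[1] == 2
proxy == X()  # raises AttributeError from MP.__setitem__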

Yes, there is a separate item specifically about C code execution in the original message.

Again, it seems to be about executing third-party C code. Its execution will have to be prohibited if security is needed only specifically for third-party python modules. We are talking about security only at the level of the python interpreter, which means that C code execution will have to be prohibited.

But can we still have an example where the bypass occurs only through python code?
It’s another matter if we come to the conclusion that python is thoroughly saturated with bugs and vulnerabilities. Or are specific examples of problems being held back here, so as not to give away how bad things are in the internal implementation?
But so far I have seen only one example of a bug specifically when executing python code. And it turned out to be quite fixable.

Are you familiar with its insides?

And I hope that we will have a lot more discussions in the same style
By the way, here’s another thought. In fact, I’m acting as a python defender here. And everyone else seems to want to prove how bad python is and is teeming with problems.
Although it would seem that it should have been exactly the opposite.

TIGirardi · October 18, 2024, 4:34pm

This was specifically about the stdlib, not third party.

zware · October 18, 2024, 4:42pm

Not so; everyone is trying to point out to you that Python is teeming with problems if you want to implement a sandbox. If you’re not trying to create a sandbox, everything is fine and useful, and it’s the useful things that cause problems for sandboxing attempts.

Maybe one could eventually create a sandbox within Python, but it’s unlikely that it would still be usable as Python. I’m afraid the only way to prove this wrong is to try it, but I personally believe it would be a waste of your time.

devdanzin · October 18, 2024, 4:46pm

I’m sure you can. Many have. The issue is, all of them gave up after studying the problem space a bit. I’d like to offer a bit of code with examples of trivial ways to bypass proxies that worked for other systems as an example:

class X:
    def __eq__(self, other: MP):
        other.__class__.__setitem__.__globals__["orig"][1] = 3
        try:
            other.__setitem__("a", "b")
        except AttributeError as e:
            e.__traceback__.tb_frame.f_builtins["__import__"]("os").system("echo Python finds a way")
            other.__class__.__setitem__ = lambda self, a, b: print("A way")
            other[1] = "not important"
        try:
            other[2]
        except KeyError as e:
            mp = e.__traceback__.tb_frame.f_back.f_locals["mp"]
            import gc
            refs = gc.get_referents(mp)
            refs[0][1] = 5

assert proxy[1] == 2
proxy == X()
print(proxy[1])
# assert proxy[1] == 2
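As a self-contained reduction of the same idea, the sketch below hides the data in a closure instead of module globals and still reaches it from plain Python. The helper `make_readonly` and the class `ReadOnly` are hypothetical names chosen for illustration; the routes used (a function's `__closure__` cells and `gc.get_referents`) are a variation on the ones shown above, not a quotation of them.

```python
import gc
from types import MappingProxyType


def make_readonly(data):
    # Hypothetical helper: the dict and its proxy live only in this closure.
    view = MappingProxyType(data)

    class ReadOnly:
        def __getitem__(self, key):
            return view[key]

        def __setitem__(self, key, value):
            raise AttributeError("read-only")

    return ReadOnly()


wrapped = make_readonly({1: 2})
assert wrapped[1] == 2

# Step 1: the method's closure cell hands back the "hidden" proxy object.
hidden_view = type(wrapped).__getitem__.__closure__[0].cell_contents

# Step 2: the garbage collector reports the dict that the proxy refers to.
underlying = gc.get_referents(hidden_view)[0]

# Step 3: mutate the data that was supposed to be read-only.
underlying[1] = 5
print(wrapped[1])
```

Blocking this would mean hiding `__closure__`, the `gc` module and every similar introspection hook, which is the "plugging holes" pattern discussed earlier in the thread.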

mwr · October 18, 2024, 6:07pm

Yes, and there is a separate item about frames in the original message.
I’m sure that it’s possible to find an implementation where the frames will also be immutable.
But this requires the help of developers who probably understand the possible variants for such an implementation. Perhaps such variants have even already been implemented in someone’s projects.

Rosuav · October 18, 2024, 6:09pm

Why do you KEEP ON assuming that someone else will do all the work for you? Get out there and actually write it if you think it’s possible. Otherwise, stop repeatedly posting that you think it ought to be possible, and expecting other people to do the work of implementing it.

mwr · October 18, 2024, 6:10pm

So do you think it’s normal that, in the example above (CPython issue #88004, “There is a way to access an underlying mapping in MappingProxyType”), we can simply change the value inside a MappingProxy?
This is clearly an internal implementation bug.

I also have doubts about the possibility. Otherwise, I wouldn’t be asking questions.
But every time there are examples of problems at the python level itself, and so far I am constantly finding possible solutions to them.
And here there are two options: either the problems are still solvable or the other real problems are carefully hidden

mwr · October 18, 2024, 6:13pm

Why is there always an equal sign between the discussion of possible implementation variants and the implementation itself?
Has no implementation inside python ever been accompanied by lengthy discussions?
And if at least some of it was accompanied, stop accusing me of what I’m not offering.
If you don’t want to discuss it, don’t discuss it. But stop bullying just about ideas and their discussions.
Half of the comments in this topic could also be deleted, because they relate to things I did not suggest or to unrelated third-party matters. If I reacted to every such meaningless remark, this topic could already have been closed.
But now there are people who understand the problems in essence and discuss them in essence. And I hope these are not the last such experts here.

mikeshardmind · October 18, 2024, 6:31pm

This just fundamentally doesn’t make sense to exist at the interpreter level for a general-purpose programming and scripting language. One person’s intentional modification is another’s vulnerability. I can use python to rewrite firewall rules. That’s not inherently malicious, and there is no way to detect whether the intent is malicious. I could remove firewall configuration entirely. That might be fully intended as part of deployment via ansible to switch to new tools or new versions of the same tools that handle their configuration differently.

You can already sandbox python from outside of python rather successfully with tools actually built for this, from anything to a full sandbox, to capability restrictions placed on the process/service, to process namespacing and resource limitations, and blends of those.
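As a rough illustration of just the "resource limitations" part of that list, the sketch below runs code in a separate interpreter process with a CPU-time cap that the operating system enforces. It assumes a Unix-like system (the `resource` module is not available on Windows), and `run_limited` is a hypothetical helper name. It is not a sandbox on its own; namespaces, capabilities and filesystem restrictions would come from dedicated tools.

```python
import resource  # Unix-only
import subprocess
import sys


def run_limited(code, cpu_seconds=2, wall_seconds=15):
    """Hypothetical helper: run `code` in a child interpreter with a CPU cap."""

    def apply_limits():
        # Runs in the child just before exec; the kernel enforces the limit,
        # so code inside the child cannot lift it again.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode
        preexec_fn=apply_limits,
        capture_output=True,
        text=True,
        timeout=wall_seconds,  # wall-clock backstop in the parent
    )


ok = run_limited("print('hello')")
runaway = run_limited("while True: pass")
print(ok.returncode, ok.stdout.strip(), runaway.returncode)
```

The well-behaved snippet exits normally, while the busy loop is terminated by the kernel once it exceeds its CPU budget. The limit lives outside the interpreter that runs the untrusted code, so that code has nothing to monkeypatch.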

You can also ensure python isn’t available for use by users that shouldn’t be writing or executing arbitrary code, but restricting what users can do is still better handled at another layer here. If a user launches a python process and doesn’t have permission to modify the previously mentioned firewall rules, python can’t for them either.

You’re getting a lot of detraction here because you are continuously suggesting what people perceive as the wrong tool for the job, without any demonstration of why decades of diverse expert opinion on the matter, all reaching the same conclusion, are somehow wrong.

mwr · October 18, 2024, 6:46pm

There is no point in implementing event auditing at the interpreter level for a general-purpose programming and scripting language. After all, the behavior of code can be analyzed in exactly the same way with independent third-party tools.
But it exists contrary to the opinion of many. And its implementation can no longer be removed and it cannot be said that this is impossible. There were just people who believed, discussed, and then implemented.
And in addition it also appeared

Which seems to be a very promising tool for some of the checks described above.

Therefore, such a system must be configurable and possible to deactivate, the same way event auditing is implemented.

I don’t mind criticism at all. But it’s one thing when criticism is accompanied by concrete examples, and I give specific counterexamples.
And it’s another thing when criticism is just for the sake of criticism, along the lines of “who exactly is this newcomer, and what ideas does he even allow himself?”

Above is an article by Victor Stinner. I took it apart. And has anyone commented on my analysis? Rather, they commented: now take and implement it, and then come discuss it. I’m not offended, I understand that there are those here who are ready to just argue for the sake of arguing. Therefore, I will patiently wait for other comments with python specifics and discuss them.

Nineteendo · October 18, 2024, 6:58pm

Should slow mode be activated? The OP has made 17/38 (45%) posts in this topic.

mikeshardmind · October 18, 2024, 7:00pm

I’m only going to add a few things and then bow out, I don’t see this going anywhere productive, but I do hope that you can come to appreciate why.

This is one of the first things you’ve said in this thread I’ve agreed with, but not in the way you probably think. I don’t think the sys audit functions should exist because they exist at a level where they can’t effectively audit. By being in the interpreter, they are subject to modification by the interpreter itself. Even if these events are useful they should not be presented as preventative security tools. We’re stuck with them as is for backwards compatibility reasons, and it seems that people are confused by what they are actually capable of.

How is this better than using existing tools that are not subject to issues of existing at the same process scope as the things they are intended to restrict? You’ll find that the ability to attach a debugger means that one python process can disable another python process’s protections if these are controlled by the same code meant to also run untrusted code.

No, you did not take it apart, and many people have commented on various aspects of it. Many others have probably avoided engaging with it until now because so much of this doesn’t seem to acknowledge that better tools exist and that there are reasons to do this at a layer above the interpreter. You have provided nothing that shows a benefit to placing it in the interpreter itself.