Request for feedback: explicit absolutely private attributes

Here is an attempt to implement explicit, absolutely private attributes.

Usage:

from private_attribute import PrivateAttrBase

class MyClass(PrivateAttrBase):
    __private_attrs__ = ['a', 'b', 'c']
    def __init__(self):
        self.a = 1
        self.b = 2
        self.c = 3

    def public_has(self):
        print(hasattr(self, 'a'))  # True
        print(hasattr(self, 'b'))  # True
        print(hasattr(self, 'c'))  # True

    def public_way(self):
        print(self.a, self.b, self.c)

obj = MyClass()
obj.public_way()  # 1 2 3

print(hasattr(obj, 'a'))  # False
print(hasattr(obj, 'b'))  # False
print(hasattr(obj, 'c'))  # False
obj.public_has()  # prints True three times
try:
    obj.a  # raises AttributeError
except AttributeError as e:
    print(e) # 'MyClass' object has no attribute 'a'

try:
    obj.a = 2
except AttributeError as e:
    print(e) # cannot set private attribute 'a' to 'MyClass' object

object.__setattr__(obj, "a", 2) # OK, but useless.

try:
    obj.a  # still raises AttributeError
except AttributeError as e:
    print(e) # 'MyClass' object has no attribute 'a'

obj.public_way()  # 1 2 3

You can use this approach to protect your private attributes.

Interesting. And impressive if you’ve got it working. But this seems like a demonstration of hidden private attributes, not merely private attributes. If a library is going to deliberately introduce astonishing behaviour as a feature, it would be better IMHO to at least tell the user what the heck is going on, that they can’t read something or change it because it’s private or reserved, instead of behaving as if the attr wasn’t there at all.

If you need truly private attributes in Python for some application, then you shouldn’t be writing that in Python at all IMHO (perhaps it could be a Python extension though). Messing around with __getattr__ and __getattribute__ et al in a metaclass is a fantastic way to create especially tricky bugs.

Such bugs and such complexity are better avoided by simply letting the user do what they want, while adhering to the convention that if they mess around with a sunder or a dunder, then that’s not supported and they’re on their own (they broke it, they bought it).

Here are my reasons:

  1. Many people have only studied the Python programming language (Python is now the most popular programming language).
  2. In some situations, they want to write Python code with private attributes and protect those attributes from being changed outside the class. However, they are told that this is impossible in Python.
  3. Sometimes the ones who read and change the private attributes are cheaters or hackers rather than the authors themselves. This approach makes it more difficult for them to change these attributes.

It is interesting that private/protected members are a recurring issue. Yes, they are used in other languages, but that does not imply that Python is missing something in this regard.

What some people don’t understand is that private/protected members are not a security issue. They are more about a compiler-enforced programming style, which avoids potential problems like code churn when an internal name must be changed for a good reason. Hence, we use setters, getters, etc.

Python takes a different approach. Instead of strictly enforcing private and protected members, Python marks internals (i.e., what other languages consider private or protected) and assumes that the programmer will handle them reasonably.
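As a minimal sketch of that convention (the class and attribute names are made up for illustration): a single leading underscore marks an attribute as internal, and a double leading underscore additionally triggers name mangling. Neither one blocks access.

```python
class Account:
    def __init__(self, owner, balance):
        self.owner = owner        # public attribute
        self._balance = balance   # single underscore: internal by convention
        self.__pin = "0000"       # double underscore: stored as _Account__pin

    def balance(self):
        return self._balance


acct = Account("alice", 100)
print(acct._balance)            # works, but the caller now relies on an internal
print(hasattr(acct, "__pin"))   # False: the name was mangled inside the class
print(acct._Account__pin)       # the mangled name is still reachable
```

The marking tells the reader what is supported; it does not try to stop anyone.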

Regarding your reasons:

  1. Fine, but not so relevant here.

  2. Protect from whom exactly? The closest I can see are library authors who want to protect the users of their libraries from mistakes. But there are other possibilities for them.

  3. Forget this: As soon as you have access to the code, you have plenty of ways to hack or bypass it. Private/protected access is not a suitable mechanism for preventing this.

That’s true, but ordinary private attributes are easy to hack, and the method is stable: once you access “obj._a” successfully, the same access also succeeds when the code is rerun.

However, as you can see, with my approach a hacker cannot easily find a stable way to hack the private attributes: the name is random and the storage uses a hash, which means that even if the hacker accesses one of the private attributes successfully, the same code is almost certain to fail on the next run.
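A rough sketch of the idea described above, assuming a per-process random salt combined with a hash. This is not the actual private_attribute implementation: obscured_name, _SALT and the storage dict are hypothetical names, and the secrets module is used here as the random source.

```python
import hashlib
import secrets

# Hypothetical per-process salt; a fresh value is generated on every run.
_SALT = secrets.token_hex(16)


def obscured_name(class_name, attr_name):
    """Derive a storage key from the salt, the class name and the attribute name."""
    material = f"{_SALT}:{class_name}:{attr_name}".encode()
    return "_" + hashlib.sha256(material).hexdigest()[:16]


storage = {}
storage[obscured_name("User", "user_name")] = "XXX YYYY ZZZZZ"

print(storage)  # the key differs from one run to the next

# Within the same process the key is deterministic, so it can be recomputed.
print(storage[obscured_name("User", "user_name")])
```

The key changes between runs, but any code in the same process that can reach the salt and the derivation function can recompute it.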

Private variables are not a security mechanism. They are communication between the class designer and client about what parts of a class are the user interface and which are implementation details. I’m not aware of any language that runs in user-space that truly “protects” private variables. I’m pretty sure C++ inventor Stroustrup explicitly stated this somewhere and Perl’s Larry Wall says something to the effect ‘I expect you won’t access private variables because you’re a respectful person, not because I’m standing in the living room with a shotgun.’

So my feedback is that this goal is misguided, and the prior discussion “Private, protected modifier and __ notation” indicates it’s unlikely to get community support.

Here is my view: the impossibility of completely hiding private attributes doesn’t mean that every attempt to protect them is useless. For example, there is no way to be sure that no criminal can escape from prison, but that doesn’t mean that measures to reduce escapes are unnecessary.

That idea wants to change Python itself, whereas this is a module. Both approaches to private attributes are fine; it just depends on the user’s requirements.

The problem here is telling people it’s impossible. It’s not. This is what private attributes look like in Python:

self._stuff

And this is especially true of:

Because if you think that private attributes are going to stop cheaters, you have completely misunderstood their purpose. They’re not going to be effective at that.

This is, again, completely the wrong attitude towards private attributes. They are not a jail. They are an acknowledgement of ownership, which is entirely cooperative in nature.

Remember, ctypes will let you change the value of an integer. Purely in Python code. There is no way that you can ever make something truly private, and we don’t need it.

You misunderstand the whole idea. Sometimes the data in a program needs to be protected, and the module can provide one layer of protection.

However, sometimes private attributes need to be treated as a jail. And when cheaters or hackers use ctypes to change a program variable, one mistake will lead to a segfault. I just make the attributes more difficult to find.

I don’t understand what scenario you think this is impeding any attacker? If you have the ability to execute Python code in someone else’s Python process then a simple

import private_attribute, importlib
private_attribute.PrivateAttrBase = object
importlib.reload(module_that_uses_private_attribute)

is enough to undo it all.

I’d be really cautious of illusionary security features. There once used to be a bytecode encryption feature in PyInstaller – no matter how many warnings we put up of its increasingly trivial circumventability[1], people would be surprised when I told them that distributing their API keys with it is a very stupid thing to do. They’re just too hot headed to bother checking what protections are really being provided.


  1. it only takes one person to read the (public!) source code, find a way to beat it then publish it in a format that anyone can use

You say “jail” but then you say “more difficult to be found”. Do you realise that these are two completely different things? If you’re trying to actually PREVENT access, you need something other than this (eg memory protection, user permissions, etc - stuff that usually happens at the level of a process). This is not that sort of tool.

Ultimately, this is the exact same sort of cooperative data hiding as self._stuff or C++'s private: label. You can argue about exactly how much effort it takes to get around it, but you are NEVER going to stop someone who is deliberately trying to get around your security, because it is not security.
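To illustrate how little effort introspection can take, here is a minimal sketch that uses a closure as the hiding place. It is a generic illustration, not a description of how private_attribute works internally; make_counter is a hypothetical name.

```python
def make_counter():
    count = 0  # lives only in the closure, not as an attribute on any object

    def increment():
        nonlocal count
        count += 1
        return count

    return increment


counter = make_counter()
counter()
counter()

# Plain introspection reaches the "hidden" value without special privileges.
cell = counter.__closure__[0]
print(cell.cell_contents)   # 2

cell.cell_contents = 100    # cell contents are writable since Python 3.7
print(counter())            # 101
```

The value is harder to stumble upon than self._stuff, but code running in the same process can still read and change it.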

Exactly. It is worse than useless to have something that pretends to be secure. Many people will be deceived into thinking that they’re safe, when they’re really not. With something that is clearly just a convention (especially one that is occasionally violated - for example, a namedtuple will have things like _replace() that are part of the public API), there is no illusion, no confusion. It’s easy to explain the two rules of private attributes in Python:

  1. Don’t access private attributes.
  2. If you do access private attributes and something breaks, it’s on you.

Easy to explain, easy to understand. Adding other implications, especially when they can be circumvented fairly trivially, doesn’t improve the situation.
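The namedtuple case mentioned above can be seen in a short sketch (Point is just an example type). The underscore-prefixed methods are documented public API; the prefix is there to avoid clashing with field names.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])

p = Point(1, 2)
q = p._replace(x=10)   # documented public method despite the underscore

print(p)          # Point(x=1, y=2)
print(q)          # Point(x=10, y=2)
print(p._fields)  # ('x', 'y')
```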

Oh, and - Yes, sometimes the data needs to be protected, but this is not doing that. If you need your code and data to be protected, put them on a server and give people specific access to that server (eg by running a web application). That is a well-known solution to the problem, and it genuinely works - to the extent that it raised concerns about people being able to close-source something that was GPL’d, leading to the AGPL for those who are concerned by that. You can’t protect something and put it in the hands of end users - just ask the DVD consortium how well that went.

Here is an example:

There is a person named “XXX YYYY ZZZZZ”, and I am a hacker. My task is to get this person’s personal information. I found a server with a bug and, luckily, infiltrated it. According to my intelligence, the person has an account on this server, but I had little time to get the information and I could only “get”.

  1. The server uses “_user_name” as a private attribute of the user. I got an object whose “_user_name” attribute is “XXX YYYY ZZZZZ”, and I got all of the attributes of the object. I finished my task.
  2. The server used “private_attribute” to build the class, and it was difficult for me to find a way in. I got a dict that seemed to hold the person’s information:
{
"_@#sj_HSWJ()_1!23": "XXX YYYY ZZZZZ",
"_@&YBbv_::;,._a ce": "12386087621",
"_CDE$%?_ak15:;''_Hjgv": "sh13cFjn&*"
}

I thought that the account was the person’s information and reported that I had finished the task. However, I was wrong: “_@#sj_HSWJ()_1!23” is not the “_user_name”.

Define “infiltrate”. Are you able to run arbitrary code on the server? View files on the server? Run arbitrary Python code in the server application?

This isn’t Hollywood.

That’s wrong. The fact that one protection is effective doesn’t mean that we don’t need any others. Limiting permissions is one protection, while making the data difficult to obtain is another. It is like a jail with many fences: no single fence prevents an escape, but together they add time to the escape and ensure that the prison officers can find the escapers before they succeed.

As another example, a thief can steal anything once he successfully sneaks into a house. However, if it takes a very long time to get to the important things, that helps people catch the thief.

You are assuming “infiltrate” means full RCE or full filesystem access, but that is not the scenario.

In real-world security incidents, there are many cases where an attacker does not gain arbitrary code execution.
A very common and realistic level of infiltration is:

  1. the attacker can trigger part of the application logic
  2. the attacker cannot run arbitrary Python code
  3. the attacker cannot freely inspect internal hidden mappings or closures
  4. the attacker cannot dump process memory or patch bytecode

This is not Hollywood. This is exactly how many real vulnerabilities work:
the attacker can interact with the application, but cannot escalate further.

I’m not assuming. I’m asking. What EXACTLY is your scenario here that you’re protecting against?

Okay. And how does your proposed code affect this? How does a notion of “explicitly absolute private attributes” prevent data leakage in a way that simple underscore prefixed attribute names wouldn’t? Either you can trigger some part of application logic that reveals this or you can’t.

You said in your earlier example that you “got a dict”. That sounds like dumping process memory to me, and so what you’re saying is that there’s some kind of obscured in-memory storage? That has nothing to do with whether one part or another part of the code can access it.

Did you plan this for an actual scenario?

Yes. This module still needs a way to store them. However, the module uses both hashlib and random to change the stored name. In fact, the user can change the logic to make it harder to guess which attribute each name originally was.

These attributes are like a nut wall: they will eventually be eaten by the zombies, but they stall long enough for the attacking plants to kill the zombies.

Saving is irrelevant unless it can be loaded. How does it get loaded?

It has a default way to load them. In the future I may add a feature that lets the user easily change the encryption method, so that even the way to load the data is hidden by the user’s own method.