Request for feedback: explicit, absolutely private attributes

Okay, sure. But before you can talk about this as a security feature of any kind, you’re going to have to figure out all of these things, and come up with an answer for “what is the attack surface that this protects against?”. So far, I’m not seeing it. It’s just underscore-prefixed attributes with extra steps.

This mechanism is specifically designed as a barrier in scenarios where the object is partially leaked:
not to prevent object leakage, but to prevent semantic recognition of the leaked fields.

If an attacker can see some attributes but cannot run arbitrary Python or inspect the class definition,
then randomized, opaque attribute names stop them from knowing which field is the username, which is the token, etc.

So yes:
this is intentionally a defense layer that operates after an object has already been (partially) leaked.
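The idea can be sketched like this; the function name and the field names below are hypothetical stand-ins, not the library's actual API:

```python
import secrets

# Hypothetical sketch: replace meaningful field names with opaque random
# names, so that a leaked dump carries no semantic hint about what each
# field holds. The mapping lets the owning code translate back.
def obfuscate_fields(data: dict) -> tuple[dict, dict]:
    mapping = {key: "_" + secrets.token_hex(8) for key in data}
    return {mapping[k]: v for k, v in data.items()}, mapping

obfuscated, mapping = obfuscate_fields({"_password": "hunter2", "_token": "abc123"})
```

An attacker who sees only `obfuscated` still has the values, but the keys no longer say which value is the password.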

It’s a valid defense-in-depth measure for partial-disclosure vulnerabilities, and it can reduce the loss. Of course, the user still ultimately needs to fix the underlying leak.

Okay, here’s some personally identifiable information about me [1]. I’m sure you have no clue what any of it means!

  • 03 1149 2795
  • Chris “Rosuav” Angelico
  • rosuav@example.net
  • 936
  • PO Box 42, Melbourne, VIC 3000
  • 4564 1234 2345 4321
  • $2b$12$kKvWf7DnZdVewYRt9d5ebe9Jov68Yzx8NSisTFUWJ74ljz/Pg6su2

You don’t have any of the attribute names, just the values.

How good is your semantic recognition of these leaked fields?

“Defense-in-depth” of this style is like always wearing a condom while teaching a class.


  1. sliiiiiiiightly falsified, but that’s not relevant to my point ↩︎


In fact, what the attacker may ultimately see is this (each value hashed with hashlib.sha256()):

  • 009659792e345b4abea879b59fad2baef31b4f4ec2c07778094430f626a9ac98
  • d7fe7cd86fa9f86d27b09004d931d2dd2f86f685b7428fe8d602e540cbc8de9a
  • d2ff217ef228cb20daa58869006d6dc56eadce0f7b8fa235764ae2740cb98f49
  • f53f2fb9b99180ea02b1f345b6c862e6bdde16e3b82a6886be0234d09a0e1645
  • 27dbf16e43dd54072066fffcd8b16bcd2a014a7fa9e77c016f0814157378b96d
  • a2cc6d11557683f6c310a1082df78d944cca8c654064dfb662d93f7d99e7cbdf
  • 610f49c5d3c3158c6f1b2ae65649865434370d40818884e71b264a2f59111f87

The form might differ, but the values are encoded in some way.
Now, if I want to get your information, I know that the SHA-256 of your name is “d7fe7cd86fa9f86d27b09004d931d2dd2f86f685b7428fe8d602e540cbc8de9a”, and I find:

"_user_name": d7fe7cd86fa9f86d27b09004d931d2dd2f86f685b7428fe8d602e540cbc8de9a
"_password": 610f49c5d3c3158c6f1b2ae65649865434370d40818884e71b264a2f59111f87
"_address": 009659792e345b4abea879b59fad2baef31b4f4ec2c07778094430f626a9ac98

Now the attacker just needs to crack the password hash.
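The matching step described here can be sketched as follows; all names and values below are hypothetical stand-ins, not the data from the post:

```python
import hashlib

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

# Hypothetical leaked structure: meaningful field names mapped to hashed values.
leaked = {
    "_user_name": sha256_hex("Alice"),    # stand-ins for the real leaked data
    "_password":  sha256_hex("hunter2"),
}

# The attacker hashes candidate plaintexts and looks for matches; a hit on
# "_user_name" confirms both the identity and which field to target next.
candidates = {sha256_hex(c): c for c in ["Alice", "Bob", "Carol"]}
matches = {field: candidates[digest]
           for field, digest in leaked.items() if digest in candidates}
```

With meaningful field names, a single dictionary match is enough to orient the whole attack.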

However, with the module’s approach, the same information appears as:

"_{}\"}'_@BHXSBaj": d7fe7cd86fa9f86d27b09004d931d2dd2f86f685b7428fe8d602e540cbc8de9a
"_@\":\"Hh_sjoaomM8_6^*9": 610f49c5d3c3158c6f1b2ae65649865434370d40818884e71b264a2f59111f87
"_'sj-_+=x&7sScd_zk&8": 009659792e345b4abea879b59fad2baef31b4f4ec2c07778094430f626a9ac98

Yes, the _{}\"}'_@BHXSBaj is the name, but must it really be the user name? And what do the other fields mean?

Wait, you sha256 the data you store? Then how on earth do you retrieve it afterwards?

I ask again: What is the ACTUAL use-case here? You keep moving the goalposts. If this is solving any sort of real problem, show us the actual situation where this has the slightest relevance. If it’s possible to reverse the hashes and get back to the result, then my previous point stands - it’s trivially easy to recognize what aspect something is, without needing any sort of attribute names. And if it isn’t, then having attribute names wouldn’t help.

You are misunderstanding the threat model here.

The point is not “storing data with sha256” nor “retrieving from hashes”.
That’s not the use-case and not the problem being addressed.

The scenario being discussed is:

(1) Some library or framework accidentally exposes an internal object
—for example by a JSON serializer recursively walking attributes.

(2) The attacker cannot run arbitrary Python code
—they can only see the resulting serialized structure.

(3) The attacker now sees a blob of unknown keys + values
but wishes to determine which value corresponds to which semantic field
(e.g. which one is password hash, which one is address, which one is name).
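A minimal sketch of step (1), assuming a hypothetical serializer that walks every instance attribute (the class and names here are illustrative):

```python
import json

class User:
    def __init__(self, name, token):
        self.name = name
        self._token = token          # "private" by underscore convention only

def naive_serialize(obj) -> str:
    # a serializer that dumps every instance attribute, private or not --
    # the accidental-exposure pattern described in step (1)
    return json.dumps(vars(obj))

leaked = naive_serialize(User("alice", "s3cr3t"))
```

The resulting JSON exposes both the value and, via the key `_token`, what the value is for.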

Your argument falls into a clear false dichotomy.

You assume only two possibilities:

  1. If the hash is reversible:
    then the attacker instantly knows everything, so attribute names don’t matter.
  2. If the hash is irreversible:
    then attribute names also don’t matter.

But this ignores the reality between these two extremes.

In practice:

  • A hash can be computationally hard to reverse,
    but still valuable only if the attacker knows which field is the password.
  • Semantic mapping (“this one is password”, “this one is address”)
    is itself sensitive information and often the first step of an attack.
  • Obfuscated attribute names remove the attacker’s ability to classify fields,
    even when the values are visible.

So your conclusion only holds if you assume an all-powerful attacker with infinite capability,
which is not the threat model being discussed.

The real security question is not
“can the attacker reverse the hash?”,
but
“can the attacker even identify which value is worth attacking?”

Conflating these two is exactly the false dichotomy.

I’m done trying to discuss this. It’s a pure hypothetical and you keep on shifting your situation every time I ask a question.

Figure out what your attack surface is. For real. Actually come up with a full scenario. THEN we can discuss, maybe.

The core problem in your response is that you’re treating security as a binary state:
either the attacker has full control of the system, or nothing matters.
That is a false dichotomy, and it collapses the entire discussion prematurely.

Real-world systems do not fail only through “full compromise.”
They fail constantly through partial, accidental exposure:

  • objects recursively serialized by a JSON library
  • debug endpoints returning internal state
  • cache dumps
  • logs that accidentally print object internals
  • exceptions that include object state
  • reflection in RPC frameworks
  • sandbox leakage in multi-tenant environments

These aren’t hypotheticals—they are some of the most common categories of security failures (and have entire CVE families documenting them).

In these scenarios the attacker gets field names + (possibly encoded/hashed) values,
but not full system control.
And in these cases, field names absolutely leak semantic information, which directly aids an attacker:

  • which field is a password
  • which field is a token
  • which field controls privileges
  • which field is user identity
  • which fields are sensitive vs operational

Obfuscating attribute names removes that semantic leakage.
This is not meant to be a primary security boundary; it is classic defense-in-depth aimed at reducing the damage of partial object exposure.

Your last point—

“If hashes are reversible → names don’t matter; if not reversible → names don’t matter”
is exactly the false binary I’m referring to.
Semantic leakage exists regardless of whether the values are reversible.
Hashing values plus randomizing field names removes two different classes of attacker guidance.

You keep demanding an “actual scenario,” but when given several, you dismiss them because they don’t match an all-powerful attacker model.
That isn’t moving the goalposts on my part—it’s you refusing to acknowledge any attacker model other than “attacker owns the machine,” in which no mitigation makes sense.

This technique addresses a very specific, very real attack surface:
accidental object exposure where field names leak meaning.
If you don’t accept that this attack surface exists, then of course the mitigation seems irrelevant—but that is a flaw in the threat model, not in the mitigation.

This closes the loop: the use-case is valid, the attack surface is real, and the binary framing you insist on is simply incorrect.

I think the problem with this post is that originally we were talking about runtime access to the object e.g.

  1. In some places, people want to write Python code with private attributes and protect them from being changed outside the class. However, they are told that in Python this is impossible.
  2. Sometimes the ones who access and change the private attributes are cheaters or hackers rather than the authors themselves. This approach makes it more difficult to change those attributes.

And now it became mostly about the server accidentally leaking information, not runtime access. I think if you had framed it as that directly you would be getting more useful feedback.


Yes. The capabilities of many new concepts are only discovered later. However, one discussant wants me to show the benefit now. That is difficult.

Your example doesn’t work:

Traceback (most recent call last):
  File "/Volumes/Queen/python/privattr/./venv/bin/privattr", line 4, in <module>
    from privattr.main import main
  File "/Volumes/Queen/python/privattr/src/privattr/main.py", line 1, in <module>
    from private_attribute import PrivateAttrBase
  File "/Volumes/Queen/python/privattr/venv/lib/python3.10/site-packages/private_attribute.py", line 89, in <module>
    @warnings.deprecated("The 'register' function is deprecated and will be removed in future versions.")
AttributeError: module 'warnings' has no attribute 'deprecated'

Thank you. Please tell me your Python version.
I will check which version added the “deprecated” attribute to the “warnings” module.

I guess it is due to PEP 702 (warnings.deprecated was added in Python 3.13)

Fixed in version 2.0.1

This is backwards. You need to start with use cases, and then build for those use cases. We don’t build completely at random and then hope it goes well; that would be silly.

You cited three reasons for this feature at the start of this thread:

(1) doesn’t make sense to me as a reason. Python is popular with beginners. So what? I don’t think this is relevant, and it is definitely not a reason to implement private attributes.

(2) is a real use case, and there are languages which have private attributes and methods. But Python does not have this feature and the language design is philosophically opposed to such a feature. The “consenting adults” principle refers to the idea that, since a user of my code can always find a way around visibility rules, we should only have them by convention, not enforced by the language.

(3) is about security but is poorly reasoned. I don’t find it surprising that you were pushed very hard on this.

In fact, language level visibility contracts are often mistaken for a form of security. That’s not what they are. They’re a feature to achieve what Python achieves with a convention around the single leading underscore.
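Both of Python’s conventions remain bypassable at runtime, which is exactly why they are conventions rather than security boundaries. A small illustration (the class here is made up for the example):

```python
class Account:
    def __init__(self, balance):
        self._balance = balance   # single underscore: "internal", by convention
        self.__ledger = []        # double underscore: name-mangled, not hidden

acct = Account(100)
acct._balance = 0                 # nothing enforces the convention
ledger = acct._Account__ledger    # name mangling is trivially bypassed
```

The “consenting adults” principle accepts this: the underscore tells you not to touch, it does not stop you.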

You started this thread with a request for feedback. My feedback is:
Rather than focusing so much on your idea itself, try to learn why Python doesn’t already offer visibility controls.


Anyone who thought that I wanted to deny the original private attributes, see here.

I’ve not followed the discussion (you asked for feedback, and you got it).

tl;dr pick your rules, and let us attack it!

Maybe there’s value (not just educational) in having a small contest, maybe even with a beer money level bug bounty?

It would be good to see a clearly defined situation, in which you feel this adds security, so then that can be tested by those of us who think “hmmmm. Not so much”.

It doesn’t sound like this was your goal, but the example worked for me when I briefly tried it yesterday: object.__getattribute__(obj, 'a') was thwarted, and I can see private_attribute does make it harder to casually change attribute values. Even if it does not entirely rule them out, it does greatly help avoid that certain category of bugs. For me that’s the benefit of true private attributes - much the same as that of immutability - being able to reason about code. If it’s not adding loads of other bugs of its own, private_attribute is ample for that purpose.

I have now found some bugs and fixed them in version 2.3.0.

Here is the benefit as I see it now:
When serializing an object, some programmers think, “I used ‘_’ to declare that this is private; it shouldn’t be serialized,” while the serialization module’s author thinks, “To fully copy your object, I need to get all of the attributes, even the private ones; in the end they will only be used by you anyway.” So the default serialization will include those private attributes. When that data leaks, it is a great chance for an attacker to easily obtain the important information (such as a password or an API key). However, if you write:

from private_attribute import PrivateAttrBase

class User(PrivateAttrBase):
    __private_attrs__ = ("_password", "_api_key")
    def __init__(self, user_name, password, api_key):
        ... # other validation, omitted
        self.user_name = user_name
        self._password = password
        self._api_key = api_key

The “_password” and “_api_key” won’t be serialized (the tests below are in IDLE):

class User:
    def __init__(self, user_name, password, api_key):
        self.user_name = user_name
        self._password = password
        self._api_key = api_key

        
user = User("Bob", "1234560", "asdfghjkl")
import jsonpickle
jsonpickle.encode(user)
'{"py/object": "__main__.User", "user_name": "Bob", "_password": "1234560", "_api_key": "asdfghjkl"}'
from private_attribute import PrivateAttrBase
class AnotherUser(PrivateAttrBase):
    __private_attrs__ = ("_password", "_api_key")
    def __init__(self, user_name, password, api_key):
        ... # other validation, omitted
        self.user_name = user_name
        self._password = password
        self._api_key = api_key
    __getstate__ = object.__getstate__  # Without this, the result is 'null', because PrivateAttrBase's default __getstate__ raises an error

    
anotheruser = AnotherUser("Bob", "1234560", "asdfghjkl")
jsonpickle.encode(anotheruser)
'{"py/object": "__main__.AnotherUser", "user_name": "Bob"}'

The module “jsonpickle” follows the “__getstate__” protocol, but you cannot ensure that all modules do. In this case, each programmer’s usage is “normal” on its own (though in fact their expectations conflict), yet the default behavior leads to a security problem.

Of course, it cannot block all illegal access, but if the protocol is followed, it is safer.
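For libraries that do honor the protocol, a __getstate__ that drops private fields is enough on its own; a minimal sketch (this User class is illustrative, not the library’s API):

```python
import pickle

class User:
    def __init__(self, name, password):
        self.name = name
        self._password = password

    def __getstate__(self):
        # drop private fields from the serialized state
        state = self.__dict__.copy()
        state.pop("_password", None)
        return state

# round-trip through pickle: the private field never enters the payload
restored = pickle.loads(pickle.dumps(User("alice", "pw")))
```

The limitation the thread is discussing is precisely that not every attribute-walking tool calls __getstate__.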

Now there are some things to note:

  • Don’t list attribute names that are public in a parent class in a child class’s “__private_attrs__”.
  • Don’t define PrivateAttrBase subclasses with the same name twice or more in one module.

This isn’t a problem.

If a consumer of an object is defined to read all attributes, then that’s what it does. (I’d say that sounds pretty broken, outside of the case of frameworks which define and use their own attributes in some domain.)

If an object has private attributes that should not be read, then you shouldn’t pass it to readers which read private attributes.

Even if this were a problem people encounter (I can’t recall ever seeing this), it would be a problem at the integration site between these two objects.

You later cite jsonpickle, which is about pickling, which is a very well defined serialization method. In fact, I almost brought up pickle before, since that’s the proof that none of this is a security issue.

Yes, but in real projects this requirement is practically impossible to satisfy.

When a serialization or inspection module does not respect __getstate__, the class author cannot ensure that all call sites will avoid passing the object into readers that walk every attribute.

Large Python applications often contain:

  • many classes,
  • many third-party serialization/inspection utilities,
  • and many integration points written by different developers.

If even one part of the system uses a tool that recursively reads all attributes, the private values will leak.
This is not a misuse by a single developer — it is an integration problem caused by the fact that attribute enumeration is the default behavior of many libraries.

Because of this, relying solely on every caller to “remember not to pass the object into such readers” is unreliable in practice.

My library aims to eliminate this accidental leakage by removing private attributes from __dict__, so that even tools which ignore __getstate__ will not serialize them.
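One way to get that effect is a data descriptor that stores values outside the instance __dict__; this is a hypothetical sketch of the idea, not the library’s actual implementation:

```python
class PrivateSlot:
    """Data descriptor that keeps values out of the instance __dict__."""
    def __init__(self):
        self._store = {}                 # values live here, not on the instance

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return self._store[id(obj)]

    def __set__(self, obj, value):
        # NOTE: keying by id() never frees entries; a real implementation
        # would need weak references or similar lifetime handling
        self._store[id(obj)] = value

class User:
    _password = PrivateSlot()

    def __init__(self, name, password):
        self.name = name                 # ends up in __dict__ as usual
        self._password = password        # intercepted by the descriptor

u = User("alice", "pw")
```

Because `_password` never appears in `vars(u)`, a tool that enumerates `__dict__` and ignores `__getstate__` simply has nothing to leak.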

For more, you can run this code in IDLE to test the effect (requires the latest build):

from private_attribute import PrivateAttrBase
import random
import string

target0 = "1982hwrubfidh1ur2ufwgier2kgwfu"
target1 = "176wetdwtyqghuyhq2i4ewf9r"
target2 = "29e7rwfy1iey19heuidwhu1f2qq2fe"
target3 = "1382rygfhb1iyeurhiq9fuyq29whi"
target4 = "02hiu291h`je1ud9whi1j0odweqhwihej"
target5 = "91ygde2dy2gufwgh1yrih2wfyi1eduhij"
target6 = "2478tedygwuhi1u9qiwhedfi1edqwhu"
target7 = "932729ueqwdhgi1equhwiuhq2diwujuedw"
target8 = "02huewiefgyvu2whefgbwhduefhrehu"
target9 = "018ehudwfihueiwdhusbufhydhujefd"

def get_random_string():
    a = ""
    for i in range(10):
        a += random.choice(string.printable)
    return a

class OneRandomClass(PrivateAttrBase):
    _private_name0 = get_random_string()
    _private_name1 = get_random_string()
    _private_name2 = get_random_string()
    _private_name3 = get_random_string()
    _private_name4 = get_random_string()
    _private_name5 = get_random_string()
    _private_name6 = get_random_string()
    _private_name7 = get_random_string()
    _private_name8 = get_random_string()
    _private_name9 = get_random_string()
    __private_attrs__ = [
        f"_my_private_attr_{_private_name1}_0",
        f"_my_private_attr_{_private_name1}_1",
        f"_my_private_attr_{_private_name1}_2",
        f"_my_private_attr_{_private_name1}_3",
        f"_my_private_attr_{_private_name1}_4",
        f"_my_private_attr_{_private_name1}_5",
        f"_my_private_attr_{_private_name1}_6",
        f"_my_private_attr_{_private_name1}_7",
        f"_my_private_attr_{_private_name1}_8",
        f"_my_private_attr_{_private_name1}_9",
    ] + [f"_private_name{i}" for i in range(10)]
    def __init__(self):
        attr_list = list(f"_my_private_attr_{self._private_name1}_{i}" for i in range(10))
        random.shuffle(attr_list)
        for i, name in enumerate(attr_list):
            setattr(self, name, globals()[f"target{i}"])

    def result(self):
        attr_list = list(f"_my_private_attr_{self._private_name1}_{i}" for i in range(10))
        result_dict = {}
        for i in attr_list:
            result_dict[i] = getattr(self, i)
        return result_dict


class AnotherRandomClass:
    _private_name0 = get_random_string()
    _private_name1 = get_random_string()
    _private_name2 = get_random_string()
    _private_name3 = get_random_string()
    _private_name4 = get_random_string()
    _private_name5 = get_random_string()
    _private_name6 = get_random_string()
    _private_name7 = get_random_string()
    _private_name8 = get_random_string()
    _private_name9 = get_random_string()
    def __init__(self):
        attr_list = list(f"_my_private_attr_{self._private_name1}_{i}" for i in range(10))
        random.shuffle(attr_list)
        for i, name in enumerate(attr_list):
            setattr(self, name, globals()[f"target{i}"])

    def result(self):
        attr_list = list(f"_my_private_attr_{self._private_name1}_{i}" for i in range(10))
        result_dict = {}
        for i in attr_list:
            result_dict[i] = getattr(self, i)
        return result_dict


a = OneRandomClass()
b = AnotherRandomClass()

Now, in the IDLE shell, you have the objects “a” and “b”. You cannot change anything on the objects (just like a tool that only reads the object), and you cannot call any method found on the object except “__getstate__”, nor the private machinery in the “private_attribute” module (to simulate serialization tools: they cannot do that either). What you can call is what serialization tools normally call. This simulates the situation where a careless programmer forgets to limit access. Your challenge: using only this shell, obtain what the method “result” would return, without calling it.