Immutability in Python is really hard

Marco_Sulla · October 23, 2019, 6:41pm

Writing a class that is really immutable in Python is really hard, if not impossible.

I’m currently experimenting with an immutable version of dict. To ensure “real” immutability, I had to:

create a function _setattr(), in this horrible way:

import inspect

def _setattr(self, *args, **kwargs):
    """
    not implemented
    """

    # Allow attribute writes only when the caller lives in this same file.
    if inspect.stack()[1].filename == __file__:
        object.__setattr__(self, *args, **kwargs)
    else:
        raise NotImplementedError(self._immutable_err)

This way, I created a class whose members are all private. But I had to use inspect.stack(), which is not guaranteed to work on every Python implementation…

To make it work, the trick is to override __new__():

def __new__(klass, *args, **kwargs):
    # Temporarily restore the default __setattr__ so __init__() can assign.
    try:
        klass.__setattr__ = object.__setattr__
    except Exception:
        pass

    self = super().__new__(klass)
    self._initialized = False
    return self

and in __init__(), write:

def __init__(self, *args, **kwargs):
    self._klass = type(self)
    self._klass_name = self._klass.__name__

    self._immutable_err = "'{klass}' object is immutable".format(klass=self._klass_name)

    if self._initialized:
        raise NotImplementedError(self._immutable_err)

    [...]

    self._initialized = True
    self._klass.__setattr__ = self._klass._setattr

use __slots__, which as far as I know are a bit problematic when you inherit from a class
override __delattr__ so that it raises NotImplementedError

And after all of this, I’m not really sure you can’t do some trick to mutate the object anyway…

I mean, all this mess could be avoided if Python, like all the other OO languages, implemented private. Why did Python decide not to implement normal OO visibility for the members of a class?

brettcannon · October 23, 2019, 9:02pm

That is an extremely broad statement to make.

I don’t know Guido’s direct motivation, but my guess is it isn’t necessary in order to accomplish what you want. I mean if you say something is immutable and people decide to lift the curtain to mess with it then it’s on them when something breaks because they chose not to respect the contract/restriction you made with them when using your software. Same goes for hiding details: Python has existed for nearly 30 years without visibility modifiers because as long as people follow the requirements the software outlines then there isn’t a problem.

njs · October 23, 2019, 9:35pm

It’s also fundamentally impossible to implement Java or C++ style visibility in Python, because their way of doing visibility requires that the compiler have static knowledge of every object’s type. Since Python’s a dynamic language, there’s no reliable way to look at an attribute lookup and tell whether the attribute is private, and if there was we still wouldn’t know whether the calling method is supposed to have access to it, because methods don’t know which class they’re defined on.

Even in Java and C++ though, private isn’t fully enforced: you can use tricks like reflection or type-casting to break the rules and access stuff you aren’t “supposed” to.

BTW the reason you’re having so much trouble is because you don’t want immutability, you want limited mutability. Immutability doesn’t require stack inspection; just define __setattr__ to raise an error and you’re basically done. Or better yet, use attrs or dataclasses with frozen=True.
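A minimal sketch of both options, using only the standard library. The class names FrozenPoint and Point and the helper try_set are made up for illustration; this is one way to do it, not the only one.

```python
from dataclasses import dataclass, FrozenInstanceError


class FrozenPoint:
    """Hand-rolled version: every attribute write or delete raises."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        # Bypass our own __setattr__ while initializing.
        object.__setattr__(self, "x", x)
        object.__setattr__(self, "y", y)

    def __setattr__(self, name, value):
        raise AttributeError(f"{type(self).__name__!r} object is immutable")

    def __delattr__(self, name):
        raise AttributeError(f"{type(self).__name__!r} object is immutable")


@dataclass(frozen=True)
class Point:
    """Same idea, with the boilerplate generated by dataclasses."""
    x: int
    y: int


def try_set(obj, name, value):
    """Return the exception raised by a normal assignment, or None."""
    try:
        setattr(obj, name, value)
    except AttributeError as exc:  # FrozenInstanceError subclasses AttributeError
        return exc
    return None
```

With either class, try_set(obj, "x", 10) returns an exception object and obj.x keeps its original value.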

Marco_Sulla · October 23, 2019, 11:37pm

Well, what attrs or dataclasses do is the same as what I’ve done… but you are right on one point: I don’t need inspect. I can simply use object.__setattr__().

And by the way, quoting myself:

And after all of this, I’m not really sure you can’t do some trick to mutate the object anyway…

Well, the trick is object.__setattr__() itself…
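To illustrate the point with a small sketch (Config is a hypothetical example class): a frozen dataclass rejects normal assignment, but calling object.__setattr__() directly skips the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:  # hypothetical example class
    retries: int


cfg = Config(retries=3)

# Normal assignment is rejected by the generated __setattr__ ...
try:
    cfg.retries = 5
    blocked = False
except AttributeError:
    blocked = True

# ... but calling the base-class implementation directly skips that check.
object.__setattr__(cfg, "retries", 5)
print(blocked, cfg.retries)
```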

I know that you can use reflection in other languages, but it’s a bit tricky (even if I wrote a ReflectionUtility in Java that is very simple to use…).

I don’t think privacy is related to static typing. If I say that a member is private, I simply want it to be modified only from inside the class. Who cares if the member is an integer or a datetime? Indeed, as far as I know, esnext can create es6 js code that implements a sort of privacy for class members.

ammaraskar · October 24, 2019, 12:18am

Could you explain the motivation behind making private members literally impossible to access?

Using a convention like don’t access members prefixed by _ and enforcing it using a linter doesn’t seem much different than the Java compiler complaining about accessing private members.

Are you trying to guard against people poking into your library’s immutable containers? Trying to code super defensively?

njs · October 24, 2019, 6:14am

But when you say “modified only from inside the class”, you’re saying that the method and the attribute have to belong to the same type. And in Python we don’t know either of those types at compile-time, or even necessarily at run-time. In particular, in Python a method is just a function that gets called with the object as the first argument, so the same function can potentially be a method on multiple classes, and also called as a top-level function, all at the same time. Python just fundamentally doesn’t treat types the same way Java/C++ do.
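A small sketch of what that looks like in practice (the names describe, A and B are illustrative only). The same function object reads a "private" attribute while acting as a method of two unrelated classes and as a plain function, so there is no single "owning class" to check access against.

```python
def describe(self):
    """A plain function; nothing ties it to any particular class."""
    return f"{type(self).__name__} with secret={self._secret}"


class A:
    def __init__(self):
        self._secret = 1

    describe = describe  # used as a method of A


class B:
    def __init__(self):
        self._secret = 2


B.describe = describe  # attached to B after the class was created

as_method_a = A().describe()
as_method_b = B().describe()
as_function = describe(A())  # called as a top-level function
```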

steven.daprano · October 24, 2019, 7:55am

“All” other OO languages?

I guess that means that Smalltalk isn’t an OO language. Or Ruby, or
Objective-C.

You may avoid all that mess by naming your private members with a
leading underscore, and trusting that the caller won’t mess with
them.

Marco_Sulla · October 24, 2019, 8:17pm

Excuse me, I should have been more precise: all the other OO languages used by at least 3% of people… with all due respect to Smalltalk.

I can also trust the other programmers, but the interpreter does not. For example:

"a" is "a"
# True
myimmutable(*args, **kwargs) is myimmutable(*args, **kwargs)
# False

Although, I must admit:

(1, 2) is (1, 2)
# False
frozenset((1, 2)) is frozenset((1, 2))
# False
a = 257
b = 257
a is b
# False

This is strange, since numbers, tuples and frozensets are immutable, as long as they are composed of immutables only. Maybe it’s too difficult to implement caching for this kind of situation? I don’t know. But I suppose it’s technically possible to cache immutables like those.

Furthermore:

d = dict()
object.__setattr__(d, "keys", None)
# AttributeError: 'dict' object attribute 'keys' is read-only

fd = frozendict()
object.__setattr__(fd, "keys", None)
fd.keys()
# TypeError: 'NoneType' object is not callable

dict.items = None
# TypeError: can't set attributes of built-in/extension type 'dict'

frozendict.items = None
frozendict().items()
# TypeError: 'NoneType' object is not callable

I mean, what is the advantage of being able to do this kind of thing? To allow monkey patching, an anti-pattern? To make the code less readable?

Furthermore, immutables can be shared between threads. But can you trust such an “immutable”?

@njs:

But when you say “modified only from inside the class”, you’re saying that the method and the attribute have to belong to the same type.

Why? I can do

class A():
    pass

a = A()
a.a = 5
a.a = "a"

There’s no type check. The problem is that anyone can replace the attributes of an instance with any other object, and even create new ones or delete them, and there is nothing I can do to stop it if I want to code a really immutable class. There’s no way to do it in Python; you have to code it in C. And this, IMHO, is not only stressful, it is also useless and potentially dangerous.

ammaraskar · October 24, 2019, 8:46pm

"a" is "a"

works due to an implementation detail, an optimization known as string interning. You’ll notice a similar thing when the numbers are smaller:

>>> a = 1
>>> b = 1
>>> a is b
True
>>> a = 257
>>> b = 257
>>> a is b
False

Either way, the is operator is irrelevant in this discussion. It merely checks that two objects are the same referentially. Just because an object is immutable doesn’t mean there will only be one instance for equal objects.
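To make the distinction concrete, here is a small sketch using only the standard library (variable names are illustrative). It deliberately says nothing about whether a is b holds for the raw strings, since that depends on the implementation; only the explicitly interned copies are guaranteed to be the same object.

```python
import sys

# Build two equal strings at run time so the compiler cannot merge them.
a = "".join(["immut", "able"])
b = "".join(["immut", "able"])

equal = a == b                                      # value comparison
same_after_intern = sys.intern(a) is sys.intern(b)  # explicit interning
```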

You can easily implement such behavior yourself for your myimmutable class, merely look up if another myimmutable was constructed with the same arguments and return that.

Marco_Sulla · October 24, 2019, 9:18pm

Either way, the is operator is irrelevant in this discussion

I don’t think so. String interning is possible because strings are immutable. In theory, a = 257; b = 257; a is b could also yield True, if the interpreter implemented it.

You can easily implement such behavior yourself for your myimmutable class, merely look up if another myimmutable was constructed with the same arguments and return that.

Unfortunately not, since

frozendict({"a": 1}) == frozendict(a=1)

but the args and kwargs of __init__ are different. So I have to call __init__ first. But in __init__ I can’t change self; only __new__ can.

But the interpreter could do something similar to string interning for this kind of object. I suppose this is quite possible and desirable for saving memory.

ammaraskar · October 24, 2019, 9:33pm

Doing it within the class is tough; a much simpler approach is:

def myimmutable(...):
    cached_copy = ... # some lookup based on the args here
    if cached_copy:
        return cached_copy
    new_instance = _myimmutable(...)
    # put it in the cache here
    return new_instance

The interpreter can’t just do this willy-nilly for all immutable objects. Sometimes the cost of the lookup and caching far exceeds any benefit (e.g. small integers only being cached up to 256). Either way, it’s trivial to implement interning yourself for immutable objects as shown above.
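One possible way to fill in the pseudocode, as a rough sketch rather than a finished design. _MyImmutable here is a hypothetical stand-in for the real class, and it assumes all values are hashable. The lookup uses the freshly built instance itself as the key, so differently spelled but equal arguments such as {"a": 1} and a=1 land on the same cached object.

```python
class _MyImmutable:
    """Hypothetical stand-in for an immutable mapping-like class."""

    def __init__(self, *args, **kwargs):
        items = frozenset(dict(*args, **kwargs).items())
        object.__setattr__(self, "_items", items)

    def __setattr__(self, name, value):
        raise AttributeError("immutable")

    def __eq__(self, other):
        if not isinstance(other, _MyImmutable):
            return NotImplemented
        return self._items == other._items

    def __hash__(self):
        return hash(self._items)


_interned = {}  # maps each instance to the first-seen equal instance


def myimmutable(*args, **kwargs):
    """Factory: return the cached instance equal to the requested one, if any."""
    candidate = _MyImmutable(*args, **kwargs)
    # dict.setdefault() stores the candidate only if no equal key exists yet.
    return _interned.setdefault(candidate, candidate)
```

Note that this plain dict keeps every interned instance alive forever, and a throwaway candidate is still constructed on each call; whether that cost is acceptable depends on the use case.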

Marco_Sulla · October 25, 2019, 4:06pm

Well, maybe this is of some use:

from concurrent.futures import ThreadPoolExecutor
from threading import Semaphore
import time
import sys

class Cache():
    @property
    def interval(self):
        return self._interval
    
    @interval.setter
    def interval(self, val):
        self._interval = val

    def __init__(self, interval=5, *args, **kwargs):
        if interval < 3:
            raise ValueError("`interval` parameter must be >= 3")

        self._cache = dict(*args, **kwargs)
        self.interval = interval
        self._lock = Semaphore()

        executor = ThreadPoolExecutor()
        executor.submit(self._clear_cache)
    
    def _clear_cache(self):
        while True:
            time.sleep(self.interval)
            toremoves = []

            with self._lock:
                for k, v in self._cache.items():
                    if sys.getrefcount(v) < 5:
                        toremoves.append(k)
                
                for k in toremoves:
                    del self._cache[k]

    def getInstance(self, klass_or_instance, *args, **kwargs):
        if not args and not kwargs:
            new_instance = klass_or_instance
            klass = type(new_instance)
        else:
            new_instance = klass_or_instance(*args, **kwargs)
            klass = klass_or_instance

        try:
            key = hash(new_instance)
            bruteforce = False
        except Exception:
            key = id(new_instance)
            bruteforce = True

        old_instance = self._cache.get(key)
        
        res = None
        
        if old_instance is None:
            if bruteforce:
                with self._lock:
                    for k, v in self._cache.items():
                        if new_instance == v:
                            res = v
                            break
            
            if res is None:
                with self._lock:
                    self._cache[key] = new_instance

                res = new_instance
        else:
            res = old_instance
        
        return res

_cache = Cache()

def getInstance(klass_or_instance, *args, **kwargs):
    return _cache.getInstance(klass_or_instance, *args, **kwargs)

__all__ = (Cache.__name__, getInstance.__name__)

I tested it a little and it seems to work. Instead of using weakref, which does not always work, I simply check the ref counts.
I don’t know if this is always precise, since the ref count was increased by one even when I did not assign the output of cache.getInstance() to a variable… Furthermore, I don’t know if this works with cyclic refs.
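For comparison, a sketch of the weakref-based alternative, under the assumption that the cached type supports weak references (plain classes do; in CPython, types such as tuple and int do not, even when subclassed). Token is a hypothetical example class. As for the count being off by one: sys.getrefcount() generally reports one more than expected, because its own argument is a temporary reference.

```python
import gc
import weakref


class Token:
    """Hypothetical cached type; plain classes support weak references."""

    def __init__(self, name):
        self.name = name


cache = weakref.WeakValueDictionary()

tok = Token("a")
cache["a"] = tok          # the cache does not keep tok alive
cached_while_alive = "a" in cache

del tok                   # drop the only strong reference
gc.collect()              # CPython frees it at once; other implementations may need this
cached_after_del = "a" in cache
```

Here the entry disappears on its own once nothing else references the object, so no background thread or ref-count threshold is needed.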

Paddy3118 · November 4, 2019, 9:42pm

Several other OO languages have hidden back doors to allow access to private members. I had come across others lamenting Python’s lack of “true” privacy, so I started this task on Rosetta Code, where some users have come forward with examples and documentation on such access.

Marco_Sulla · November 4, 2019, 11:03pm

That was already said. And I repeat that in Python this is far too simple: it not only prevents possible optimizations, but also makes monkey patching, an anti-pattern, very easy. I quote myself:

d = dict()
object.__setattr__(d, "keys", None)
# AttributeError: 'dict' object attribute 'keys' is read-only

fd = frozendict()
object.__setattr__(fd, "keys", None)
fd.keys()
# TypeError: 'NoneType' object is not callable

dict.items = None
# TypeError: can't set attributes of built-in/extension type 'dict'

frozendict.items = None
frozendict().items()
# TypeError: 'NoneType' object is not callable

class A():
    pass

a = A()
a.a = 5
a.a = "a"

As for optimization: for example, frozenset(another_frozenset) returns another_frozenset. The two names point to the same address, since frozenset is written in C, is really immutable, and can therefore implement an idempotent constructor. By contrast, an idempotent constructor for a Python class is not very advisable…
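For what it’s worth, here is a sketch of what an idempotent constructor could look like in pure Python. Frozen is a hypothetical class, and the frozenset identity mentioned above is a CPython implementation detail rather than a language guarantee. The catch is that Python calls __init__ again on whatever __new__ returns, which is one reason this pattern is awkward; the sketch sidesteps it by not defining __init__ at all.

```python
class Frozen:
    """Hypothetical immutable wrapper around a tuple of values."""

    __slots__ = ("_values",)

    def __new__(cls, values=()):
        # Idempotent: an existing instance of exactly this class is returned as-is.
        if type(values) is cls:
            return values
        self = super().__new__(cls)
        object.__setattr__(self, "_values", tuple(values))
        return self

    # No __init__ on purpose: Python would call it again on the returned
    # instance every time Frozen(existing) is evaluated.

    def __setattr__(self, name, value):
        raise AttributeError("immutable")


fs = frozenset({1, 2})
fr = Frozen([1, 2])
```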

brandtbucher · November 5, 2019, 12:09am

I’m not sure what you mean exactly when you say that objects written in C are “really immutable”… Nothing is really immutable.

Immutability is less about hiding things and more about keeping agreements. Rather than seeing your users as hostile programmers who are trying hard to break their own code, maybe see them as ones who just want an immutable mapping in order to prevent accidents, or to avoid making defensive copies when passing them to functions in other libraries, or to cache expensive data structures. Then you can focus less on obfuscating your own code and more on improving interfaces and performance.

If you really want something totally bulletproof, Python may not be the best language to work with:

>>> import pycapi  # Disclaimer: this is my own package.
>>> t = (0, 1, 2)  # An immutable tuple.
>>> x = "XXX"
>>> pycapi.PyTuple_SET_ITEM(t, 1, x)  # https://docs.python.org/3/c-api/tuple.html#c.PyTuple_SET_ITEM
>>> t
(0, 'XXX', 2)

steven.daprano · November 5, 2019, 1:24am

“Several” other OO languages? +1 understatement of the year – the
Rosetta Code page you link to lists 35, including Ada, C#, C++ and
Java. And that’s probably the tip of the iceberg.

I think that we can dismiss complaints that Python is somehow unique, or
at least unusual, in having ways to break “private”. Some languages make
it harder to break, some make it easier, and Python just says “We trust
you, don’t do anything silly, but if you do, it’s on you.”

(And even there, there are exceptions, such as closures. If there’s a
way to break the encapsulation of a closure, I don’t know it.)

This is mostly a good thing (a common complaint about Pascal was that
there was no escape from the compiler’s rules), but it does come with
some costs, like ruling out some compiler optimizations. But overall, I
think that for Python’s intended use-cases, the benefits outweigh the
costs.

brandtbucher · November 5, 2019, 1:49am

(And even there, there are exceptions, such as closures. If there’s a way to break the encapsulation of a closure, I don’t know it.)

There’s a dunder for everything:

>>> def get_foo_getter():
...     super_secret = "foo"
...     def get_foo():
...         return super_secret
...     return get_foo
... 
>>> getter = get_foo_getter()
>>> getter()
'foo'
>>> getter.__closure__[0].cell_contents = 'bar'
>>> getter()
'bar'

Marco_Sulla · November 5, 2019, 10:32am

OK, but this is very hard to discover. By contrast, monkey patching in Python is really easy.

As far as I know, Python was created not only to be easy to use, but also to help the developer write good and readable code. That’s why indentation is mandatory; strings are immutable; the classical for loop is not supported (yes, you can do for i in range(len(iterable)), but it’s more complicated!); /**/ is not supported (yes, you can use """… see above); implicit conversions in operations are very few (even the + operator between strings and non-string objects is not supported, and that is the only thing that is easier to do in Java!); usually, if something goes wrong, an exception is raised, even if it makes the code slower, instead of returning 0, -1, "", None or anything else; there’s no switch - case, even though it would improve performance; there’s the pass statement; there is name mangling for “private” class attributes; from modulex import * does not import “protected” and “private” variables; there’s a = x if y else z; and there’s no do ... while.

I mean, to emulate private and improve memory use and performance, one has to use __slots__, which is less readable and is a mess when you want to extend the class. Is that really simpler, more readable and more efficient than introducing private or something else (a decorator, for example)? IMHO not.

My wish is not that Python should support the creation of classes that are really immutable. As we know, this is impossible, and not only in Python. My wish is for __slots__ to be simpler, and for bad coding practices, like monkey patching, to be discouraged.
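A short sketch of the inheritance behaviour in question (class names Base, Loose and Strict and the helper can_set are illustrative). A subclass that does not declare __slots__ silently gets a __dict__ back, so every subclass has to remember to declare its own.

```python
class Base:
    __slots__ = ("x",)

    def __init__(self, x):
        self.x = x


class Loose(Base):
    """No __slots__ here, so instances get a __dict__ again."""


class Strict(Base):
    """Declares only the additional slot; 'x' is inherited from Base."""
    __slots__ = ("y",)


def can_set(obj, name):
    """Report whether an attribute with this name can be set on obj."""
    try:
        setattr(obj, name, None)
    except AttributeError:
        return False
    return True
```

With these definitions, can_set(Base(1), "z") and can_set(Strict(1), "z") are False, while can_set(Loose(1), "z") is True.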

aeros · November 22, 2019, 8:31am

This may apply to built-in types, but for custom objects you can easily add an implementation for using the + operator on a string and non-string, in both directions:

>>> class MyClass:
...     def __add__(self, other):
...         return id(self) + id(other)
...     def __radd__(self, other):
...         return id(self) + id(other)
...
>>> test_str = "test"
>>> test_obj = MyClass()
>>> test_str + test_obj
279616730195648
>>> test_obj + test_str
279616730195648

I wouldn’t particularly recommend doing something like this (as the behavior is externally unclear), but there’s nothing we do to explicitly prevent it. Python takes the philosophy of “consenting adults” quite seriously, and avoids limiting functionality as much as possible.

That’s along the same lines of why Python doesn’t have truly private instance attributes or more strictly enforced encapsulation, unlike Java. You can communicate to the users of an API that certain components should be private (through the use of an underscore or exclusion from __all__), but you can’t entirely prevent them from using those components.
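As a small illustration of how far those conventions go (Account is a hypothetical example class): a single underscore is purely a signal, and even double-underscore name mangling only renames the attribute rather than hiding it.

```python
class Account:  # hypothetical example class
    def __init__(self, balance):
        self._balance = balance    # "protected" by convention only
        self.__pin = "1234"        # name-mangled to _Account__pin

    def balance(self):
        return self._balance


acct = Account(100)

# The double-underscore name is not reachable under its original spelling ...
has_plain_name = hasattr(acct, "__pin")
# ... but the mangled name is an ordinary attribute anyone can read or write.
acct._Account__pin = "0000"
acct._balance = -1
```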

Marco_Sulla · November 26, 2019, 12:09am

So why __slots__?
