Adding Deep Immutability

Preserving?

Potting?

Fossilising?

Petrifying?

Entombing?

Pickling – er, no, we’ve already used that one. :frowning:

Assuming that “the PEP” still means the document linked in the OP then I have read it and I am still asking the same questions which you have not answered.

The PEP says that freeze(obj)

recursively marks obj and all objects reachable from obj immutable.

The set of objects reachable from any given object is large e.g. given

from fractions import Fraction as F

class A:
    def f(self):
        return F()

a = A()

then we have

>>> a.f.__globals__['F']
fractions.Fraction

Does this mean that freeze(a) needs to freeze the Fraction class? How far does this go? What is the complete set of objects reachable from a?
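For a rough sense of scale, the reachable set can be approximated with the stdlib gc module. This is only a sketch: gc.get_referents() follows the GC's traversal slots rather than attribute access, so it is an approximation of what a freeze would have to visit.

```python
import gc
from fractions import Fraction as F

class A:
    def f(self):
        return F()

a = A()

# Breadth-first walk over gc.get_referents() starting from `a`.
def reachable(obj, limit=100_000):
    seen = {id(obj): obj}
    stack = [obj]
    while stack and len(seen) < limit:
        for ref in gc.get_referents(stack.pop()):
            if id(ref) not in seen:
                seen[id(ref)] = ref
                stack.append(ref)
    return list(seen.values())

objs = reachable(a)
assert any(o is F for o in objs)  # the Fraction class is reachable from `a`
assert len(objs) > 100            # far more than just `a` and its class
```

The walk pulls in the class, its methods, the methods' `__globals__`, and from there the fractions module namespace and everything it references.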

The PEP suggests that the references can go quite far:

it is possible that freezing an object that references global state (e.g., sys.modules , built-ins) could inadvertently freeze critical parts of the interpreter.

Regardless of whether it is in C or Python the lru_cache function is useless if it can’t mutate its inner cache which it needs to do every time the wrapped function is called. So what should happen when freezing a function f that uses lru_cache? Do we get a frozen f that is broken or has its cache disabled or is f just not freezable? Is the answer that a Python version of lru_cache would be frozen but broken whereas a C version would not be freezable? If it is not freezable then how far does this impact the freezability of other objects from which f is reachable?
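For concreteness, the per-call mutation is directly observable through the real functools.lru_cache API: every miss grows the internal cache dict and even a hit bumps a counter.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def square(x):
    return x * x

square(2)
square(3)
info = square.cache_info()
# Two misses so far, and the wrapper's internal dict holds two entries.
assert info.misses == 2 and info.currsize == 2

square(2)  # even a hit mutates the wrapper's hit counter
assert square.cache_info().hits == 1
```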

How is this expected to work in practice in terms of library support? Will there be a small number of specialised libraries that will provide limited freezable types? Or are people going to expect to be able to freeze anything from anywhere? What would a library (in Python or C or Rust) need to do if it wanted to support freezing and how realistic is it to even do that?

3 Likes

I think that is obviously impossible, but this might be somewhat equivalent to
“preventing every method of obj from mutating any state”
(… like putting it in a safety box where it cannot make any change).
Assuming this is doable right now, a problem would arise with static methods, since there is no way to tell whether they were reached through the frozen instance or not.

As I mentioned, the PEP avoids in-place freezing non-local states of a function by making immutable copies of cells, globals and builtins when freezing a function:

There are also detailed comments in the reference implementation:

You can make calls to an immutable cached function as long as there is no cache miss, so pre-cache before freezing if that’s viable for the circumstances. But otherwise yeah if you want the cache to continue to store new call results then it is simply not going to be a freezable class. As I mentioned, the PEP takes the stance of a strict opt-in policy so existing libraries aren’t expected to work with freezing out of the box. Library maintainers wishing to support freezing should take care to whitelist freezable C extensions and mark unfreezable classes by inheriting from the NotFreezable class.
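A stdlib-only sketch of the pre-warming idea. The freeze() call itself is from the proposed PEP and is not available, so it appears only as a comment; whether cache hits remain permitted on a frozen wrapper is the PEP's stated semantics, not something this snippet can verify.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def square(x):
    return x * x

# Pre-warm the cache for every input expected after freezing.
for x in range(10):
    square(x)

# A hypothetical freeze(square) would happen here; afterwards only
# calls that are pure cache hits could succeed.
assert square(5) == 25
assert square.cache_info().hits == 1
assert square.cache_info().currsize == 10
```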

2 Likes

I don’t think trying to freeze functions is ever going to be successful. Here is a minimal example that bypasses the frozenness with the current reference implementation:

def counter():
    __import__  # Needed because `import` doesn't work otherwise
    import sys
    # getattr() default so the first call doesn't raise AttributeError
    sys.counter = getattr(sys, "counter", 0) + 1
    return sys.counter

If you now say “let’s just disable import”, that is first of all really annoying for some use cases and will unnecessarily prevent some libraries from being frozen.

But also, that doesn’t fix the issue: __import__('sys') will still bypass it. And if you blacklist the __import__ identifier, you just start an arms race comparable to trying to sandbox Python from within.
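A minimal demonstration that __import__ hands back the exact same (mutable) module object that the `import` statement binds, so blocking the statement alone cannot keep a function away from mutable module state:

```python
import sys

mod = __import__("sys")
assert mod is sys  # same object, not a copy

# Any handle on the module is enough to reach and mutate its state.
# `bypass_demo` is just a scratch attribute for illustration.
mod.bypass_demo = getattr(mod, "bypass_demo", 0) + 1
assert sys.bypass_demo == 1
```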

And if I understand the goals of this project correctly, this can’t just be ignored. The goal is to allow concurrency without issues, and I think this kind of issue prevents that.

2 Likes

Yes, but realistically how far does that go? If you freeze a then you need to freeze the globals used by a but then those are types from other modules and you need to freeze the globals in those modules and so on. Literally for the particular case of a that I showed above, is a freezable and what is the full set of things that would need to be frozen as part of freeze(a)?

I think that few libraries are going to subclass this NotFreezable class and right now zero libraries use it. In general the idea of superclasses for negated properties is not scalable so a different approach for that is needed.

Now back to my original question:

I think that this would inevitably happen and users would either complain that freeze(obj) doesn’t work or that it seems to work but freeze(obj).func() doesn’t work because the class is broken after freezing. I’m trying to work out what situations would likely lead to either case and what the implications would be if actually trying to make it work or conversely the likely downsides of refusing to make it work.

4 Likes

Haha, this is an excellent example! Thanks!

First, just so we agree on the meaning of a frozen function. A frozen function is not the same as a pure function. For example, if you pass in a mutable object to a frozen function, it is permitted to mutate that object. So this is a perfectly fine frozen function:

def counter(sys):
    sys.counter += 1
    return sys.counter  # works if sys isn't immutable

Obviously, this is not the same as your excellent example. The crux in your example is whether it is possible for a frozen function to reach mutable state that was not passed in. The PEP does not permit a frozen function to capture enclosing mutable state (handled by freezing) and throws an exception if a function tries to access globals() directly. Also, in the PEP, immutability is deep, so if sys is immutable above, you can’t traverse it to get to mutable state. But what should we do about imports? The problem is not imports themselves, but the possibility of mutable module state.

Let’s start with what’s in the PEP and then what we envision as part of the future extensions foreshadowed in the PEP.

In the PEP

In the multiple subinterpreters model, each subinterpreter has its own private module state, so if the function is allowed to import the sys module and access module state directly, this is not a race.

In free-threaded Python, this would indeed be a race. In this PEP, I am not so worried about this case. I mean, immutable objects and frozen functions are moving the needle in the right direction — and guaranteeing things in Python is hard!

Looking forward a little

When we add region-based ownership (aka Lungfish – the future, foreshadowed PEP), this race should no longer be possible. The relevant case here is sketched on pages 21 and 22 (slide 16 IIRC) in the presentation we gave at the language summit (https://wrigstad.com/summit-presentation-final.pdf).

With region-based ownership, mutable module state like your counter should be enclosed in one or more regions, and only one thread at a time will be able to access a region’s content. So even free-threaded Python will avoid the data race in your example. If the sys module is not “Lungfish compliant”, it will essentially share a lock with all other non-compliant modules making sure that only one thread at a time accesses state in these modules. This would also avoid the data race.

You can look at sections 5.3.3 and 5.3.4 of this paper describing the Lungfish design if you are interested in additional details, including the freezing of modules that provide only constant immutable state. There is a lot of detail to unpack there, for which you may not have the time. But if you do look and you have questions or find bugs, feel free (encouraged, actually) to reach out.

3 Likes

How? Fundamentally, how can the above code be data-race free (in the sense that no two calls to that function ever return the same value) without adding a manual lock in the code?

Or are you just defining “thread-safe” in the same way it has always been guaranteed in python: The objects aren’t broken afterwards? Because if so, that isn’t exactly what I understand under fearless concurrency.

To make my thoughts a bit clearer:

Since you are clearly saying that some things will not be Lungfish compliant, let’s assume the module (in this case sys) is one of those things. Let’s also assume I can get a mutable reference to it from inside the function; this is very likely not something you will ever be able to prevent. This means that every access to sys.counter has to be locked. But there are three separate accesses here, two reads and one write, that all need to happen atomically to be consistent. How will the runtime know to lock all of them together? Otherwise a data race can still happen.
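The non-atomicity of the increment is visible in the bytecode, using the stdlib dis module:

```python
import dis

def bump(mod):
    mod.counter += 1

# The augmented assignment compiles to three separate operations:
# load the attribute, add, and store the attribute back.  A lock
# would have to span all three for the increment to be atomic.
ops = [ins.opname for ins in dis.get_instructions(bump)]
assert "LOAD_ATTR" in ops and "STORE_ATTR" in ops
```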

Of course, the simple answer is “such constructs aren’t thread-safe”, but my understanding was that the goal is that after freeze(x), using x is thread-safe (as long as you don’t pass in non-frozen stuff). This kind of edge case shows that that isn’t true, right?

If one has other references to an object that gets frozen and tries to mutate them, that will raise a runtime error, and it may be that the typing system will never be able to account for that, and that is alright. (The other reference could be in another thread, so there is no deterministic “after freeze”.) I see no problem with the type system not catching all possible errors that could arise at runtime, whether with these objects or anything else.

People rely too much on the typing system. Runtime error possibilities like this will have to be checked in tests, as they always have been.

On the other hand, typing as it is can support writing such code correctly. Maybe one thing that could be shipped along with one of the PEPs (or made into another) is a typing marker for objects which might be frozen at any time. That could be helpful for people working with code that holds such references.

def thread_worker(queue: Queue[typing.Freezable[list]], lock: Lock):
    ...
    item = queue.get()
    ...
    with lock:
        if not inspect.isfrozen(item):
            item.append(new_thing)
    ...

Then, for the 99% of code which wouldn’t make use of the freezable capability of these objects, the typing system could be lax about an object possibly getting frozen at any time; but when explicitly marked as such with something like typing.Freezable, it could check all interactions in the block.

(As a bonus, the type checker tools could even infer that a particular instance could be “frozen at any time” by seeing the isfrozen call in the code )

I would be against this. It doesn’t make sense to have an annotation for something that “may or may not be frozen”. (Requiring that something isn’t frozen may be worth expressing, but not “might freeze”; only express the constraints your function requires.) This is different from the type system not being able to know at any particular time. The program author (or at least the person freezing an object) should only freeze things prior to passing them around; there shouldn’t ever be a “might be frozen” case that is modelable from annotations, because the author should always have enough information. (A function that freezes inputs it doesn’t control is pathological and does not make sense.)

1 Like

Good. Anyway, typing-wise, the effect of such a mad thing would be the same as a union type.

Aside from typing markup, what about inspect.isfrozen? Does it look good?

Agreed, but the freeze function is greedy and reaches far from the object apparently being frozen. It will easily freeze things that the author does not control unless the object being frozen is very carefully isolated from almost everything, in a way that I think many Python programmers will not find easy to understand.

Assuming we can go by the reference implementation I now know the answer to this question:

The answer is yes. The C version of lru_cache cannot be frozen but a Python implementation of it would be frozen and would then be broken. Suppose we have:

def cache(func):
    values = {}

    def cached(*key):
        try:
            return values[key]
        except KeyError:
            val = values[key] = func(*key)
            return val

    return cached

@cache
def square(x):
    return x*x

Then if you make a class that calls this function you can freeze an instance of your class but it will break the square function:

class A:
    def do_stuff(self, val):
        return square(val)

a = A()

print(square(3)) # works

from immutable import freeze
freeze(a)        # seems to work...

print(square(5)) # TypeError: dict is immutable

Now by freezing a you have broken the square function and not just in the sense that it doesn’t work when reached from a. The square function is broken for all callers including those who had no idea that you were going to freeze anything (e.g. the author of the library in which it is defined).

Touching even a single function from a given library might reach the entire library. Unless every single part of that library is freezable (unlikely for any nontrivial library), you can’t even import from the library, for risk that any part of it sitting in the module globals contaminates everything as unfreezable.

What this means is that you have invisible deeply transitive object colouring. The object graph from freezable objects can never reach anything unfreezable. The runtime cannot reliably detect whether things are freezable though (square should not be) and static typing cannot model it either.

6 Likes

I consider this another case of “this isn’t an object anyone should be freezing”; there are already many caveats documented on lru_cache, the combination of which would lead people to understand this:

If a method is cached, the self instance argument is included in the cache

Since a dictionary is used to cache results, the positional and keyword arguments to the function must be hashable.

But this is probably worth addressing more directly. It should be possible to mark individual classes as unsafe to freeze from Python, and the Python implementation of lru_cache, when used on a method, should mark that for the user. Similarly, user-defined decorators that behave like this should as well.

This currently only exists in the draft with:

The type NotFreezable which is an empty type which cannot be frozen and can be used as a super class to classes whose instances should not be freezable

and covering the inverse:

The function register_freezable(type) – which is used to whitelist types implemented as C extensions, permitting their instances to be frozen

I think this demonstrates the utility of being able to call register_unfreezable rather than inherit.
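A minimal sketch of what a register_unfreezable could look like, mirroring the PEP's register_freezable. Everything here (the registry, the function names) is invented for illustration and is not part of the reference implementation:

```python
# Hypothetical opt-out registry; all names below are invented.
_UNFREEZABLE = set()

def register_unfreezable(tp):
    """Mark a class as never freezable, without forcing it to inherit."""
    _UNFREEZABLE.add(tp)
    return tp  # usable as a class decorator

def is_freezable(obj):
    # isinstance() with an empty tuple is False, so nothing is
    # blacklisted until something registers.
    return not isinstance(obj, tuple(_UNFREEZABLE))

@register_unfreezable
class CachingWrapper:
    """Holds a mutable cache, so freezing an instance would break it."""
    def __init__(self):
        self._cache = {}

assert not is_freezable(CachingWrapper())
assert is_freezable(object())
```

The registration approach composes with decorators and third-party classes, which an inheritance-based NotFreezable cannot.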

If you want to be able to check for freezability without breaking things then only opt-in mechanisms could work reliably. I am not going to go through thousands of functions and classes and mark them as unfreezable. Neither will the maintainers of many other libraries on PyPI. Unless you envisage something that is limited to a small number of specialised libraries (essentially opt-in at the library level) an opt-out approach will always be unreliable.

Yes, but the point is the action at a distance effect. I am the author of square and it is a function in my library. You are the author of the A class that happens to use the square function. You have no idea whether square uses lru_cache or not and I can change that at any time.

More importantly, you may not even be using square directly, but rather some other function from which square is reachable through the very far-reaching object graph. Potentially, my adding the square function to some module breaks your use of completely different functions, simply because the new function exists in the object graph even if it never appears in the call graph.

13 Likes

So, upon reading the PEP, when getting to the part about immutability having to spread to the classes of all instances in a frozen structure (and then to their superclasses and metaclasses), something hit me.

That can’t be quite right: it is too much change to a running process. Even though “well behaved” code will create all its classes statically before putting them to use, before any methods that might freeze data come into action, not all libraries are that static, and that is not the way the language works.

So, keeping things short and simple, and looking at an existing example, the way pickle works:
what if, when getting to references to classes and modules, those were actually changed to a proxy type in the frozen object?

That would require the careful development of a proper “frozen proxy” object for classes (and possibly module instances as well?), which would, of course, work correctly with isinstance and issubclass calls, but would prevent what I can only visualize as a “catastrophic freezing event” upon the first call to freeze in a process.

Also, such a specialized proxy to a class wouldn’t be a first: super() objects are carefully designed proxies for classes and have worked seamlessly so far.
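As a feasibility sketch (all names invented, and a real design would need much more care), a read-only class proxy that keeps isinstance/issubclass working can be built on __instancecheck__ and __subclasscheck__, the same hooks that make ABCs work; Python looks these up on the type of the second argument.

```python
class ClassProxy:
    """Hypothetical read-only stand-in for a class reference."""

    def __init__(self, cls):
        object.__setattr__(self, "_cls", cls)

    def __getattr__(self, name):
        # Reads fall through to the real class.
        return getattr(self._cls, name)

    def __setattr__(self, name, value):
        raise TypeError("class proxy is read-only")

    def __call__(self, *args, **kwargs):
        # Instantiation still works through the proxy.
        return self._cls(*args, **kwargs)

    # isinstance()/issubclass() consult these on type(second_arg).
    def __instancecheck__(self, obj):
        return isinstance(obj, self._cls)

    def __subclasscheck__(self, cls):
        return issubclass(cls, self._cls)


class Widget:
    kind = "demo"

proxy = ClassProxy(Widget)
w = proxy()                      # construction passes through
assert isinstance(w, proxy)      # isinstance sees through the proxy
assert issubclass(Widget, proxy)
assert proxy.kind == "demo"      # attribute reads pass through
try:
    proxy.kind = "other"         # writes are blocked
except TypeError:
    pass
```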

So, to recap: up to this day, pickle simply stores a reference to functions and classes and recovers those references on unpickling, and that has worked quite well for decades.
I think this might also play nicely for “frozen” objects: from a certain point in the reference hierarchy, a proxy to the original object is stored instead of the direct reference.

How that proxy would behave upon being retrieved would currently be subject to a lot of bike-shedding, so I propose we check the feasibility up to here first; later, the behavior of the dereferenced proxy could be specified.

1 Like

@oscarbenjamin - could you imagine a way for an approach involving proxy objects for “not frozen along with the structure” to work for the scenarios you describe?

These are good points. I think the proposal is trying too hard to make an object truly immutable when for all practical purposes it is wholly unnecessary.

In almost all legitimate real-world use cases I can think of, barring instance registries and caches, classes and module globals are altered only during initialization. Once they start spawning instances ready for concurrency, modifying classes and modules would be considered monkey-patching and a bad practice.

In other words, I think we should limit the scope of the proposal such that freezing an instance does not freeze its class, freezing a class does not freeze its bases, and freezing a function does not freeze its globals and builtins, at least by default.

To quote dataclasses’s documentation on so-called Frozen Instances:

It is not possible to create truly immutable Python objects. However, by passing frozen=True to the @dataclass decorator you can emulate immutability. In that case, dataclasses will add __setattr__() and __delattr__() methods to the class. These methods will raise a FrozenInstanceError when invoked.
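The quoted dataclasses behavior is easy to demonstrate, including how shallow the emulation is:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Point:
    x: int
    tags: list          # a mutable field

p = Point(1, [])
try:
    p.x = 10            # the generated __setattr__ raises
except FrozenInstanceError:
    pass
assert p.x == 1

# The emulation is shallow: mutable field *contents* are untouched.
p.tags.append("still mutable")
assert p.tags == ["still mutable"]
```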

So, by the consenting-adults philosophy, we should leave it up to the developer to ensure that best practice is followed and that no monkey-patching business is going on.

I hope this pragmatic approach would make it easier for this proposal to move forward.

4 Likes

I don’t know. There are too many unknowns in this proposal right now. I’m not sure what is envisaged for how this would be used in practice or what limitations would be needed to make it work. Right now the reference implementation is too limited to test with any nontrivial code: it doesn’t handle builtin things, and it fails on even vaguely nontrivial Python classes that use e.g. classmethod or that reference a module object.

My expectation is that libraries would not be able to support this with existing types and functionality and would need to create new interfaces. For example, given that numpy arrays have a shared mutable buffer, I don’t think that you could freeze a numpy array within the goals outlined in the proposal. Instead there would need to be a new freezable_ndarray type. But then it is not clear what the advantage of that would be compared to numpy adding an immutable_ndarray type, which can be done already and could also be shared safely across threads without any freezing mechanism.
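There is a stdlib analogue of the shared-buffer situation: a read-only memoryview over a mutable bytearray behaves much like an ndarray with writeable=False, and it shows why an immutable view alone does not make sharing safe.

```python
data = bytearray(b"shared buffer")

# toreadonly() (Python 3.8+) returns an immutable *view*: writes
# through the view fail, but whoever owns the underlying bytearray
# can still mutate it, and the view observes the change.
view = memoryview(data).toreadonly()

try:
    view[0] = 0          # raises TypeError: read-only memory
except TypeError:
    pass

data[0] = ord("S")       # the buffer owner can still mutate
assert view[0] == ord("S")
```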

4 Likes

I delved into this problem some time ago and developed a library for it (GitHub - diegojromerolopez/gelidum: Freeze your objects in python). It has some issues (mainly that the type hinting should be improved) and is missing some basic functionality, like freezing modules.

My main issue in gelidum was trying to change the object in place, i.e. not making a frozen copy but changing the object without using new memory.

Thank you for bringing this topic up @stw I think this idea should be considered as it has many use cases and I’m very happy to see it discussed here.

3 Likes

I have objections and worries, but this is the greatest change on the horizon among the active threads around. :slight_smile:

So, I’d like to see updates if any.

As far as I am concerned, I will keep my last idea floating: not trying to eagerly and recursively freeze everything under the Sun but rather, at least in Python code, replacing everything retrieved from an eagerly shallow-frozen object with a proxy, which presents lazily frozen behavior.
So one wouldn’t need to freeze sys.set_int_max_str_digits if one has a method in a frozen object checking sys.version.
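The lazy-proxy idea can be sketched in plain Python. This is a toy: every name is invented, and real semantics would need far more care (methods, dunder lookup, containers, identity):

```python
# Types treated as safely immutable and passed through unwrapped.
_IMMUTABLE = (int, float, complex, str, bytes, bool, frozenset, type(None))

class LazyFrozen:
    """Hypothetical lazily-freezing proxy: reads pass through, writes
    fail, and any non-immutable attribute comes back wrapped too."""

    def __init__(self, target):
        object.__setattr__(self, "_target", target)

    def __getattr__(self, name):
        value = getattr(self._target, name)
        return value if isinstance(value, _IMMUTABLE) else LazyFrozen(value)

    def __setattr__(self, name, value):
        raise TypeError("object is (lazily) frozen")


class Box:
    pass

b = Box()
b.n = 5
b.inner = Box()
b.inner.m = 7

fb = LazyFrozen(b)
assert fb.n == 5           # immutable leaves pass through
assert fb.inner.m == 7     # mutable attributes come back wrapped
try:
    fb.inner.m = 8         # writes through the proxy are blocked
except TypeError:
    pass
assert b.inner.m == 7      # the original is untouched via the proxy
```

Nothing gets frozen until (and unless) it is actually reached, which is exactly the point: sys would only ever be proxied, never recursively frozen.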

3 Likes