`__assignment_to_self__`, `__meta__` dunders, `meta()` magic-method, `pureread()`, `purewrite()`, `purefunc()` builtins


Impetus and Introduction

I tried writing a ProxyType class, and an implementation using it whose goal was a fully transparent on-disk and in-memory representation of a data structure stored as .yaml. I ran into a number of stumbling blocks which I believe can be used to explore ways to improve the language. I'm first going to establish the technical basis for these. Afterwards, I'll attempt to prove their merit by covering the value they could bring. Lastly, I will propose concepts for how they could be implemented.

Technical Basis

The __assignment_to_self__ Dunder and the Self-Assignment Problem

One of the first issues I ran into is the root-assignment problem, which quickly generalizes to the self-assignment problem. When writing a ProxyType, it is impossible to intercept assignment to the proxied value from inside the ProxyType itself. Instead, the name is rebound and the ProxyType is replaced. This is expected and normal behavior for plain types. However, if we want a true ProxyType, then we need a way to delegate the assignment to the ProxyType itself so that the assignment can be proxied.
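To make the problem concrete, here is a minimal sketch (the Proxy class is hypothetical, just for illustration): item and attribute access can be delegated with existing dunders, but plain assignment simply rebinds the name and discards the proxy.

```python
class Proxy:
    """Minimal illustrative proxy wrapping a value (hypothetical sketch)."""
    def __init__(self, value):
        self._value = value

    def __getitem__(self, key):
        # Item access CAN be delegated to the proxied value...
        return self._value[key]

p = Proxy({"k": 1})
print(p["k"])      # delegated read works

# ...but assignment cannot be intercepted by the proxy:
p = {"k": 2}       # rebinds the name; the Proxy instance is discarded
print(isinstance(p, Proxy))
```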

The __meta__ dunder, meta() builtin, and the Frequent Need of Higher-Level Programmatic Communication

My current work with Django and thinking on the ProxyType have made me realize that there is a frequent, critical, but unfortunately abstract use-case for types to describe themselves or augment how the object behaves. For Django's need, I'll defer to the Model type's use of _meta. For a ProxyType, there needs to be a standard, regular way to augment at runtime how to proxy. I am reserving the justification for a later section. This section only establishes that there is a strong pattern which ought to be more formalized.

Enabling optimizations by detecting function purity with pureread(), purewrite(), and purefunc() builtins

There has long been a want for a way to detect degrees of function purity. Purity is hard to introduce into the typing system because in many ways it is a characteristic, or meta-attribute, which doesn't map cleanly onto types. Rather, by establishing the relation of external state to a function's execution, we can characterize functions. The basic way to do this is to establish whether the function performs a read or a write of external state. pureread() reports True iff the function only reads data passed in through its parameters. purewrite() reports True iff the function cannot alter state external to itself (relevant for mutable parameters like lists). purefunc() behaves as return pureread(f) and purewrite(f) to reduce boilerplate.
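As a sketch of the intended semantics (the pureread/purewrite/purefunc builtins are the proposal, not an existing API), here are three functions and what each builtin would be expected to report, written as comments:

```python
COUNTER = {"n": 0}   # module-level state, external to the functions below

def add(a, b):
    # Touches only its parameters: pureread(add) and purewrite(add)
    # would both report True, hence purefunc(add) -> True.
    return a + b

def read_counter():
    # Reads external state but never writes it:
    # pureread(read_counter) -> False, purewrite(read_counter) -> True.
    return COUNTER["n"]

def bump_counter():
    # Writes external state: purewrite(bump_counter) -> False,
    # and therefore purefunc(bump_counter) -> False.
    COUNTER["n"] += 1
    return COUNTER["n"]
```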

Value and Merits

This section attempts to establish sufficient value for each of these changes through high-level use-case examples. It is not meant to be comprehensive and is surely going to be the most debated section. Rather than be thorough to an extreme degree, I intend to expand on this through follow-up discussion. I believe doing so is more efficient and robust.

Use-case example for __assignment_to_self__

A more extreme use-case for ProxyType is a state object which is file-backed and synchronized across many processes on a compute grid utilizing Raft v2. A system which can recover from crashes from file, and keep in sync with a massive compute grid, by sharing an implementation of ProxyType which behaves exactly like a standard type (e.g. dict) would ease development of such systems in a way that is currently syntactically impossible. This is ostensibly an argument for the ProxyType, but without the ability to handle an operation as simple as v = "a" through a dunder such as __assignment_to_self__, this category of functionality is impossible. I believe this to be the most contentious of the proposed changes because it is the most likely to be seen as introducing unexpected behavior.

Use-case example for __meta__ dunder and meta() builtin

The Django Model type is an easy example of handling metas in a standard way. The same applies to the aforementioned grid-compute example, where a ProxyType would need to handle updates to network configuration. There are doubtless other major examples. There is an established need and similar existing usage; I believe it is time to consolidate and standardize.

Use-case example for pureread(), purewrite(), and purefunc() builtins

functools.cache would immediately benefit in correctness by being able to test for function purity with purefunc(). The need I discovered for pureread() and purewrite() stems from my experience with the ProxyType, which needs this detection to decide on locking. If a proxied function is pure, it does not need locking. If a proxied function reads external state, it needs a read lock, since there can safely be multiple readers. If a proxied function writes to external state, readers must be blocked and it must obtain a write lock. I believe these will be the strongest of the proposed changes in terms of acceptability.
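The locking decision above can be sketched as follows. Since the proposed builtins do not exist, this sketch stubs them with hypothetical function attributes (_pure_read/_pure_write, my invention) purely to show how a ProxyType would pick its lock:

```python
# Stubs for the proposed builtins (hypothetical; a real implementation
# would analyze the function rather than read an attribute).
def pureread(f):
    return getattr(f, "_pure_read", False)

def purewrite(f):
    return getattr(f, "_pure_write", False)

def lock_needed(f):
    """Map a proxied function's purity onto a locking strategy."""
    if pureread(f) and purewrite(f):
        return "none"    # fully pure: no lock required
    if purewrite(f):
        return "read"    # reads external state only: shared read lock
    return "write"       # may mutate external state: exclusive write lock
```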

Conceptual Implementation

This section proposes one conceptual way to go about implementing these proposals. These proposals are not sacred; they are merely one way, and they DO NOT bear on the merits.

An implementation for __assignment_to_self__

The = operator would have to change from an intrinsic operation to one dispatched through a magic method, just as += redirects to __iadd__(). The implementation would likely be somewhat invasive to primitive types, but is conceptually simple.
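The proposed dispatch can be simulated today with an explicit helper over a namespace dict. This is only a sketch of the intended desugaring, not how the interpreter would actually implement it; the assign() helper stands in for what = would do:

```python
def assign(ns, name, value):
    """Hypothetical desugaring of `name = value` under the proposal:
    if the object currently bound to `name` defines
    __assignment_to_self__, delegate; otherwise rebind as today."""
    current = ns.get(name)
    handler = getattr(type(current), "__assignment_to_self__", None)
    if handler is not None:
        handler(current, value)   # proxy intercepts the assignment
    else:
        ns[name] = value          # plain rebinding, today's behavior

class Proxy:
    def __init__(self, value):
        self.value = value
    def __assignment_to_self__(self, value):
        self.value = value        # keep the proxy, swap the payload

ns = {}
assign(ns, "v", Proxy(1))   # first binding: no handler yet, rebinds
assign(ns, "v", "a")        # delegated: the Proxy object survives
```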

An implementation for __meta__ dunder and meta() builtin

In order to operate on objects and types with respect to __mro__, an implementation using .meta(*args, **kwargs) on a variable is not viable. This parallels how we think about and use getattr() and setattr(). Following those, I propose type.meta(subject, *args, **kwargs) for selecting an alternate implementation (presumably one in __mro__) and meta(subject, *args, **kwargs) for the common case. The former calls the specified type's __meta__ while the latter uses the subject's own __meta__.
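The two entry points can be sketched as plain functions (names and dispatch follow the proposal, not any existing API; type_meta stands in for the proposed type.meta):

```python
def meta(subject, *args, **kwargs):
    # Dispatch through the subject's own type, the way len() -> __len__.
    return type(subject).__meta__(subject, *args, **kwargs)

def type_meta(tp, subject, *args, **kwargs):
    # Analogue of the proposed type.meta(subject, ...): explicitly pick
    # an implementation from somewhere in subject's __mro__.
    return tp.__meta__(subject, *args, **kwargs)

class Base:
    def __meta__(self, *args, **kwargs):
        return {"type": type(self).__name__, "args": args}

class Child(Base):
    def __meta__(self, *args, **kwargs):
        return {"child": True}
```

With c = Child(), meta(c) uses Child's override, while type_meta(Base, c) reaches past it to Base's implementation, much like an explicit unbound call.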

An implementation for pureread(), purewrite(), and purefunc()

While expected to be the least controversial, this might also be the largest actual change. It would require expanding both the inspect module and the code-object section of the data model. During parsing of a function, parameters would be marked as copied. Then, statement by statement, each variable is marked for reads and writes, and for whether the read or write is on a variable tagged as copied or not. Nested calls are recursively assessed on the same criteria. When parsing finishes, the function itself is tagged as pure for reads and pure for writes. To address functions which are only conditionally pure, tracing the possible values of parameters in recursively assessed functions would allow more precise purity reporting. This set of changes works against types which are passed by reference: passing lists or dicts would otherwise unexpectedly alter purity. To address this, a function qualifier of either pure or impure could be introduced, either explicitly preserving the old behavior of altering state through by-reference parameters while the default changes to copy-on-write (COW), or vice versa.

My preference, from experience of least-surprising behavior, is to change all pass-by-reference parameters to COW and introduce the impure keyword to restore the exact current parameter behavior. It is important to note that, even with full COW, objects carrying methods which fail purity tests will still cause any function calling those methods to fail purity tests.

I changed the category to Ideas, since “PEPs” is for discussion of existing PEPs.

I don’t understand most of what you are proposing, but this change alone seems like a non-starter. It completely changes the basic semantics of assignment.


How can I make this more clear? What is difficult to understand for you?

I do understand about the = being a red flag. I expected as much. But I believe there is sufficient merit to not dismiss it out of hand.

It is such a complete and drastic change that, no, that’s not going to happen. Fortunately though, you can get around this with one additional level of indirection. You can’t hook x = 1 but you can hook somefile.x = 1; if you wanted to have a class representing your YAML file, its root would simply be another attribute.
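The workaround described here, hooking somefile.x = 1 rather than x = 1, can be sketched with __setattr__ (the YamlFile class and its write-through comment are illustrative, not a real library):

```python
class YamlFile:
    """One extra level of indirection: the YAML root lives behind an
    attribute, so every assignment goes through __setattr__."""
    def __init__(self):
        # Bypass our own hook while setting up internal storage.
        object.__setattr__(self, "_data", {})

    def __setattr__(self, name, value):
        # Every `somefile.x = ...` lands here; a real implementation
        # might write through to disk at this point.
        self._data[name] = value

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for proxied keys.
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name)

somefile = YamlFile()
somefile.x = 1        # fully interceptable, unlike a bare `x = 1`
```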

I’m not sure why all of these unrelated feature requests are treated as a single proposal, but the pure*() checkers are theoretically possible, as long as you’re okay with them being implemented conservatively (that is to say, a True return means it really is pure, but a False return might mean that it can’t be sure). Check out the dis module and explore the way Python bytecode works; if the function does any LOAD that isn’t LOAD_FAST, it’s not pure. Unfortunately, there’s no way to differentiate between pureread and purewrite here, since any operation on an object from global state could potentially change that state.
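The conservative bytecode approach described here can be roughed out with the dis module. This is only a sketch: the opname set is an assumption and varies across CPython versions, and it only checks name access, not mutation of arguments, so a True result is still weaker than full purity.

```python
import dis

# Instructions that touch state outside the function's own locals.
NONLOCAL_OPS = {"LOAD_GLOBAL", "STORE_GLOBAL", "DELETE_GLOBAL",
                "LOAD_DEREF", "STORE_DEREF", "IMPORT_NAME"}

def maybe_pure(func):
    """Conservative check: True means no obviously non-local bytecode
    was found; False means we can't be sure the function is pure."""
    return all(ins.opname not in NONLOCAL_OPS
               for ins in dis.get_instructions(func))

def local_only(a, b):
    return a + b              # only LOAD_FAST / arithmetic / return

def uses_global():
    return len([1, 2, 3])     # `len` compiles to a LOAD_GLOBAL
```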

These were ‘imperfections’ I ran into during some work. The point was to get them out for discussion rather than polish something perfect before posting to this forum.

Using attribute access is possible for a ProxyType. However, should that proxy be passed somewhere that truly expects an original type, then a malfunction is likely to occur, negating the premise of a ProxyType. In particular, as I found, recursive wrapping of elements through the same ProxyType does not work as well as it would if = were a magic-method. Such a change is non-breaking for existing code, so I really would appreciate an explanation of how it would be problematic.

That may be what the current infrastructure offers for purity checks. I would like something more like what I proposed, because I believe it better aligns with what users need and want. I'll take the time to read up on the dis module. As I mentioned, it would likely be the largest code change.

The abstraction will ALWAYS leak. The proxy is its own object, and for example, x is y will always compare the objects themselves and not anything they refer to. That doesn’t mean that abstractions are useless, of course, but they’re always finite.

This is true, but not my intent or point of contention. The behavior is what I want to keep the focus on. Should is be a source of deviation, then I’ll argue that it, too, deserves related magic-methods.

I agree with @Rosuav that changing = (and is, too) would be such a drastic change that it would be a non-starter. And likely to kill performance of code that doesn’t need this feature.


How are these drastic changes for users? I agree that there would be a harmful performance impact. But the responses I keep seeing read as knee-jerk reactions to touching something deep in the language, and nothing beyond that. I really do want to know why, materially, such a change would be undesirable.

Do you have any thoughts about the other proposals?

They’re drastic because you can look at existing Python code and reason about what it does. Object identity, including assignment, is fundamental to knowing how Python works. You could argue that overriding augmented assignment already requires knowing what’s been overridden, but that’s not nearly as important or widespread as regular assignment. You can call this a knee jerk reaction, or you can call it 25 years of experience with Python. Changing = would be like wanting to override id().

No, I stopped reading at __assignment_to_self__. If the others aren’t related, they should be considered separately.


I guess you mean other than the “harmful performance impact”.

A variable initially has no value, so the first assignment would work as it does today. Then if I understand your proposal, the next assignment would use __assignment_to_self__ if it existed for the object. This is confusing.

For example, if you had a list of these proxy objects, imagine this code:

for p in proxies:
    ...

First p would be assigned to the first proxy object. But the next iteration of the loop would assign the second proxy to p, which is the first proxy, so it has the special method. What would happen? Is it what you want to happen? It seems error-prone to me.

The behavior I just described is what would happen if this were the first use of p in the function. But imagine you have two of these loops:

for p in first_proxies:
    ...
for p in second_proxies:
    ...

In the first iteration of the second loop, p is the last value from first_proxies, and you’d be running:

first_proxies[-1].__assignment_to_self__(second_proxies[0])

Your proposal changes the current simple semantics of assignment and makes it confusing.


I can see from this description why it would become a problem. To restate: it is because the behavior of nested assignments becomes wildly complex computationally and could be counter-intuitive. Thank you for explaining that.


Thanks for listening. Many people in your position have a hard time letting go of their proposal.
