C++ style pass by reference

I propose specifying a function parameter as being pass-by-reference, using the notation “&name” instead of “name”.
The actual argument can be any valid assignment target: a variable, an attribute, a subscript, or even a “&name” in another function.
Any references to “name” in the function (get, assign, delete) will have the same effect as the same references to the actual argument in the caller.
Attributes and subscripts are more involved than simple variables, and so I consider this an optional part of my proposal. I would be pleased to be able to modify a variable by passing it by reference to a function.

Here’s an example where I would find this useful, in an interactive session:

>>> from click import edit
>>> def update(s: str, maybe: bool = True) -> str:
...     return edit(s) if maybe else s
...
>>> s = "hello"
>>> s = update(s)
>>> s = update(s, maybe=False)   # returns and assigns s unchanged

There is extra coding work here. The update() function has to be sure to return a value in all cases, even if s isn’t changed. And the result of update(s) has to be assigned back to s. These are both places where a coding error would be easy to make.

With my proposal, it would look like this:

>>> from click import edit
>>> def update(&s: str, maybe: bool = True) -> None:
...     if maybe: s = edit(s)
...
>>> s = "hello"
>>> update(s)
>>> update(s, maybe=False)   # No assignment to s

Here the update() function doesn’t have to do anything if it decides not to modify s. And the call to update() doesn’t have to explicitly change s.

Here’s another example which I and many others would find useful.

def safe_delete(&obj: Any) -> None:
    try: del obj
    except: pass
x = 1
del x
del x      # NameError
x = 1
safe_delete(x)
safe_delete(x)    # OK

Examples using attribute or subscript targets:

import dataclasses

@dataclasses.dataclass
class A:
    s: str = 'hello'

a = A('foo')
update(a.s)
print(a.s)    # new string provided by update()
d = dict(s='foo')
update(d['s'])
print(d['s'])    # new string provided by update()

Implementation

In update.__code__:

  • s is stored in the frame as a (Cell *) rather than a (PyObject *), as though it were a free variable or a cell variable. There is no MAKE_CELL bytecode.
  • References to s would use LOAD_DEREF, STORE_DEREF, and DELETE_DEREF bytecodes. s does not appear in update.__closure__.

In the caller:

  • The def update statement is implemented in the usual way. The parameter s is flagged in the symbol table as CELL | DEF_PARAM | DEF_BOUND, with a new flag DEF_BYREF added. This will prevent emitting a MAKE_CELL bytecode and tell the caller to pass s as a Cell object.
  • The variable s is flagged as CELL, just as though it were captured as a free variable in some enclosed scope.
  • The call to update(s) uses LOAD_CLOSURE to put the Cell for s on the stack as an argument. New bytecodes will be required to put a Cell subtype on the stack for an attribute or subscript target.
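The cell mechanism this relies on can already be observed from pure Python. The sketch below is only an emulation under today's semantics, not the proposed bytecode: the helper _capture is a hypothetical device whose only job is to make the compiler treat s as a cell variable, and update() receives the cell explicitly instead of via a &s parameter.

```python
import types


def update(cell: types.CellType) -> None:
    # Rebind the variable that owns this cell, whoever that is.
    cell.cell_contents = cell.cell_contents.upper()


def caller() -> str:
    s = "hello"

    def _capture():
        return s  # referencing s here turns it into a cell variable

    cell = _capture.__closure__[0]  # the cell object shared with caller()
    update(cell)
    return s  # reads through the same cell


print(caller())
```

The proposal would amount to having the compiler do the _capture step and the cell_contents accesses automatically.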

Attributes and Subscripts

These entail defining new types in CPython similar to the cell class. They should behave as cell objects, by subtyping the PyCell_Type type. The ob_ref member is reused to point to the object whose attribute or subscript is to be accessed.

PyCell_Check() will check first for PyCell_Type, then check if the type’s base type is PyCell_Type. This is as fast as currently implemented when the object is a PyCell.

PyCell_Get() and PyCell_Set() will need to do different things for these new types. They will check for PyCell_Type exactly, so they are as fast as currently implemented when the object is a PyCell. When this check fails, they will execute the appropriate code for the actual subtype.

The get/set functions for cell_contents will be different for each type, found via the type object.

The PyAttrCell_Type type (or whatever you want to call it), would hold a reference to a name (string). It will get/set/delete the named attribute of the ob_ref, with the usual exceptions.
The PySubscrCell_Type type will hold a reference to a subscript. It will get/set/delete ob_ref[subscript], with the usual exceptions.

How does the caller know which function is being called? Consider:

def regular_function(obj):
    print("Hi I'm a regular function", obj)
    obj = 42

def safe_delete(&obj):
    try: del obj
    except: pass

if random.randrange(2):
    regular_function, safe_delete = safe_delete, regular_function

def test():
    x = 1
    safe_delete(x)
    safe_delete(x)

How should test() be compiled? In normal closures, the outer function is compiled differently due to the presence of the inner function, thus allowing the sharing of variables between scopes. In this case, there’s no way to know whether x might be passed as a cell or a normal reference (by the way, the term “reference” is going to cause a lot of confusion, since in Python it always means “reference to an object”, but in C++, it means “reference to a variable” and is what you’re trying to create here).
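The point about closures changing how the outer function is compiled can be checked with the code object's co_cellvars attribute. This is a small sketch; the function names plain and with_inner are made up for the illustration.

```python
def plain():
    x = 1
    return x


def with_inner():
    x = 1

    def inner():
        return x  # captures x, so the compiler makes x a cell in with_inner

    return inner


# x is an ordinary fast local in plain(), but a cell variable in with_inner().
print(plain.__code__.co_cellvars)
print(with_inner.__code__.co_cellvars)
```

Nothing in a call such as safe_delete(x) gives the compiler a comparable hint, which is why test() cannot be compiled to pass a cell.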

Most programming languages that have pass-by-reference (BASIC, Pascal, C++) use it for two main purposes:

  • to save memory when passing large values, like arrays;
  • output parameters.

Python doesn’t need either of those. We don’t have pass-by-value, so passing a large array (list, dict, etc) doesn’t make a copy. And we don’t need output parameters because we can just return a tuple of multiple values.
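Both claims are easy to check. In this sketch the names received_same_object and min_max are invented for the example.

```python
def received_same_object(arg, original) -> bool:
    # The parameter is bound to the caller's object; nothing is copied.
    return arg is original


def min_max(values):
    # Two results come back as a tuple instead of via output parameters.
    return min(values), max(values)


big = list(range(1_000_000))
print(received_same_object(big, big))

lowest, highest = min_max([3, 1, 4, 1, 5])
print(lowest, highest)
```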

We could use pass-by-reference to implement procedure-style subroutines that operate by side-effect. At the moment we can only do that with mutable objects. Pass-by-ref would allow us to do that with any value, mutable or not.

It would allow us to write a swap(a, b) procedure, instead of having to repeat the references:

spam.eggs[cheese], aardvark.hovercraft = aardvark.hovercraft, spam.eggs[cheese]

But… operating by side-effect is generally considered to be, if not an outright anti-pattern, at least a code smell. Do we really need more of it? Explicit assignment is arguably better.

Personally, I think that the idea of having two additional calling conventions (pass-by-ref and pass-by-value to automatically copy values passed in) would be kinda cool.

But I doubt that they would actually be useful enough to justify the added complexity.

Since Python already passes objects by reference, you could solve the problem like this:

from click import edit
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")

@dataclass
class ref(Generic[T]):  # a generic reference
    value: T

# example from above:

def update(s: ref[str], maybe: bool = True) -> None:
    if maybe: s.value = edit(s.value)

s = ref("hello")  # create a reference
update(s)
update(s, maybe=False)

# access result
print(s.value)

Though, ref should be implemented in C to be faster.

Pass-by-reference values are a bad idea. They lead to side-effects and non-localities that are error-prone and difficult to debug.

If a function must modify its parameters, embed them in an object, a dict, or a list, and pass that.

This is risky enough:

def modifyparam(param):
    param[1] = 'Hello'

param = list(range(4))
modifyparam(param)
print(param)

I see my proposal is flawed, in that

  1. Calling a function does not give the compiler information about any by-ref parameters. The callable expression could evaluate to anything. Even if the function is called simply by its name, who knows if that name has been redefined.
  2. Having side effects of the function call can be surprising. Any callable might modify one of its arguments, without warning.

I still like the idea of having a function modify a variable in the caller’s namespace. However, we need to make it explicit at the point of the call that the argument might be so modified.

C++ doesn’t have this problem because the compiler, and the programmer, always know the declared signature of the function being called.

So let me try again.

Revised Proposal

We will define a class called (for lack of a better name) TargetProxy. This is something which, like types.CellType, is normally created by the interpreter.

TargetProxy Objects

A TargetProxy is created by applying a new operator, unary &, to a target. The target can be any target used by an assignment statement. That is, a plain variable name, a subscription, or an attribute. The proxy is an ordinary Python object, which can be passed around.

For a proxy targ = &(target), the behavior is as follows:

  • targ may or may not have a current value at any given time. The value can be bound, rebound, or unbound.
  • targ.value is the current value, if any, of the target.
  • targ.value = expr changes the current value of the target to expr.
  • del targ.value removes the current value, if any, of the target. It will raise an exception if there is no current value, the same exception as though executing del target.

With this, we can write and call the safe_delete() function as follows:

def safe_delete(targ: TargetProxy) -> None:
    try: del targ.value
    except: pass

import types

x = 1
o = types.SimpleNamespace()   # a plain object() does not accept new attributes
o.spam = 3
d = dict()
d[3] = 42

safe_delete(&x)
safe_delete(&o.spam)
safe_delete(&d[3])

# x, o.spam, and d[3] are all gone.
safe_delete(&x)
safe_delete(&o.spam)
safe_delete(&d[3])
# Still all gone, and no exceptions raised.

TargetProxy Function Parameters

As an additional enhancement, any function parameter can be declared as &arg.
This will tell the compiler that the actual argument is a TargetProxy object. Actually, it could be anything which has an arg.value attribute, or descriptor.
The compiler will convert

  • The expression arg to arg.value.
  • The statement arg = expr to arg.value = expr.
  • The statement del arg to del arg.value.
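Written out by hand in today's Python, the three rewrites would look roughly like this. It is a sketch only: Box is a hypothetical stand-in for TargetProxy, and the comments show the proposed spelling that the compiler would translate.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Box:  # hypothetical stand-in for a TargetProxy
    value: Any


def double(arg: Box) -> None:   # proposed: def double(&arg)
    current = arg.value         # proposed: current = arg
    arg.value = current * 2     # proposed: arg = current * 2


def clear(arg: Box) -> None:    # proposed: def clear(&arg)
    del arg.value               # proposed: del arg


b = Box(21)
double(b)
print(b.value)
clear(b)
print(hasattr(b, "value"))
```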

Matching Arguments and Parameters

If a program calls a function with &targ and the function is not expecting a TargetProxy, then the function will just use the TargetProxy object rather than its wrapped value. This would be no different from other methods of modifying a target in the caller, such as passing an array to be filled in with a result which the caller would then use. If the function is not expecting this mechanism, then of course the caller would get the wrong results.

If a program calls a function with a plain target and the function is expecting a TargetProxy, then when it references arg.value, this would raise an exception.

Basically, the caller and the function need to be in agreement about the use of TargetProxy. Otherwise, the caller will either get the wrong result or will get an exception raised.
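Both kinds of mismatch can be simulated today. In this sketch Proxy is a hypothetical stand-in for TargetProxy, and outcome() is a helper invented for the demonstration that reports which exception, if any, a call raises.

```python
class Proxy:  # hypothetical stand-in for a TargetProxy
    def __init__(self, value):
        self.value = value


def expects_proxy(arg):
    # Body as it would look after the compiler rewrote "arg" to "arg.value".
    arg.value = arg.value + 1


def expects_plain(n):
    return n + 1


def outcome(func, arg) -> str:
    # Report the exception type name, or "ok" if the call succeeded.
    try:
        func(arg)
    except Exception as exc:
        return type(exc).__name__
    return "ok"


print(outcome(expects_proxy, Proxy(41)))  # caller and function agree
print(outcome(expects_proxy, 41))         # plain value where a proxy is expected
print(outcome(expects_plain, Proxy(41)))  # proxy where a plain value is expected
```

In the last case the failure happens only because this particular function does arithmetic on its argument; a function that merely stored or printed the argument would silently work with the proxy object itself.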

Implementation

I am assuming that the code to implement &(target) and &arg in CPython will be straightforward. At this point, I just want to see if there’s interest in the proposed feature.

If x, o.spam, and d[3] are “gone”, wouldn’t these lines raise NameError (or KeyError in the case of d[3]) before safe_delete is even called?

Why not just call del x, o.spam, d[3] directly?

If x no longer exists, &x will raise NameError.

Not a safe assumption.

What is the purpose of this?

In other languages, such as Pascal, pass-by-reference has two main purposes:

  • To save memory by avoiding needing to copy large data structures when passing them to a function.
  • To allow output parameters.

I presume C++ has the same purposes.

The first is unnecessary in Python. Python does not copy values when passing them to a function.

The second is of very little purpose in Python. We can return two or more values from a function, which covers 99% of the use-cases for output parameters. For the remaining, it is trivial to pass a list and have the function store any extra output inside the list.

I recently had need to do this myself:

def func(arg, output=None):
    result = process(arg)
    if output is not None:
        assert isinstance(output, list)
        output.append("some extra information")
    return result

Off the top of my head, I can’t think of anything that we can do with pass-by-reference that we can’t already do with existing Python.

I think the main point here is that most people don’t agree with you. In general Python programmers don’t want functions that can modify variables in the caller’s namespace. The reasons are complex, but are mostly rooted in the fact that Python doesn’t actually have “variables” in the sense that C/C++ does, it has “names”, which serve the same sort of purpose, but are subtly different. So there’s a lot of potential for talking past each other here - but the basic fact is that functions altering the “variables” passed to them is not something Python programmers tend to want (except when trying to write code that follows the conventions of another language in Python).

That would be a way of doing this, but it’s not really worth debating the syntax unless people agree with the basic idea.

At this point, function calls and arguments are irrelevant, so let’s leave them out of the proposal altogether.

A “target” in Python, currently, could be:

  1. A namespace and a name
  2. An object and an attribute name
  3. An object and an item key

There might be some value in having compiler support for sending a closure cell to another target, but I’d have to see some good use-cases before really judging it. The other two are perfectly well handled with a simple wrapper, since we can already do this. Here’s a TargetProxy class:

class TargetProxy:
    def __init__(self, mode, ns, key):
        self.mode, self.ns, self.key = mode, ns, key
    @property
    def value(self):
        if self.mode == "attr": return getattr(self.ns, self.key)
        elif self.mode == "item": return self.ns[self.key]
        # closures need compiler support
    @value.setter
    def value(self, newval):
        if self.mode == "attr": setattr(self.ns, self.key, newval)
        elif self.mode == "item": self.ns[self.key] = newval
    @value.deleter
    def value(self):
        if self.mode == "attr": delattr(self.ns, self.key)
        elif self.mode == "item": del self.ns[self.key]

Is there justification for having a magical compiler construct that converts &foo.bar into TargetProxy("attr", foo, "bar") ? I won’t say it’s impossible, but I’m dubious, given the uncommon use-cases and the fact that you could easily do it yourself in this way.

Doing it for “namespace and name” would be a bit harder. The best I can come up with is this:

def foo():
    spam = "ham"
    # hidden internal stuff:
    def _get_spam():
        return spam
    def _set_spam(val):
        nonlocal spam
        spam = val
    def _del_spam():
        nonlocal spam
        del spam
    _spam_proxy = TargetProxy("closure", (_get_spam, _set_spam, _del_spam), None)  # a closure needs no key
    # end internal stuff
    some_func(_spam_proxy)
    # which is like "some_func(&spam)"
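For what it's worth, the outline above does run today if the proxy simply delegates to the three helper functions. The following is a self-contained sketch under that assumption; ClosureProxy is a hypothetical class separate from the TargetProxy shown earlier, and some_func is given an arbitrary body for the demonstration.

```python
class ClosureProxy:
    """Hypothetical proxy that delegates to getter/setter/deleter callables."""

    def __init__(self, getter, setter, deleter):
        self._get, self._set, self._del = getter, setter, deleter

    @property
    def value(self):
        return self._get()

    @value.setter
    def value(self, newval):
        self._set(newval)

    @value.deleter
    def value(self):
        self._del()


def some_func(proxy):
    # Rebinds the caller's variable through the proxy.
    proxy.value = proxy.value.upper()


def foo():
    spam = "ham"

    def _get_spam():
        return spam

    def _set_spam(val):
        nonlocal spam
        spam = val

    def _del_spam():
        nonlocal spam
        del spam

    some_func(ClosureProxy(_get_spam, _set_spam, _del_spam))
    return spam


print(foo())
```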

I’m sure you could hide all the internal details with MacroPy. The trouble is, this is still an incredibly rare use-case and one that hardly justifies the level of work needed for it to function.

I wouldn’t assume that. But I also don’t think there’s all that much interest.