Python weirdness: can anyone explain to this Python newbie?

JohnHind · November 15, 2021, 5:32pm

def myfunc():
    print("myfunc")
print(myfunc is myfunc)
class myclass:
    def mymethod(self):
        print("mymethod")
    def test(self):
        print(self.mymethod is self.mymethod)
myinstance = myclass()
myinstance.test()

Output:
True
False

So a function is the same as itself, but a method is not? That’s weird!

tiran · November 15, 2021, 5:47pm

self.mymethod returns a bound method object. It’s a callable object that has a reference to self and to the method function object. The bound method object is the magic bit that passes self to the function object, too. The bound method object is created and discarded on demand.

>>> class myclass:
...     def mymethod(self):
...         print("mymethod")
...     def test(self):
...         print(self.mymethod is self.mymethod)
... 
>>> myinstance = myclass()

>>> myinstance.mymethod.__func__
<function myclass.mymethod at 0x7f99f4c159d0>
>>> myinstance.mymethod.__self__
<__main__.myclass object at 0x7f99f4d24f70>

>>> myinstance.mymethod.__self__ is myinstance
True
>>> myinstance.mymethod.__func__ is myclass.mymethod
True

>>> bound1 = myinstance.mymethod
>>> bound2 = myinstance.mymethod
>>> hex(id(bound1)), hex(id(bound2))
('0x7f99f4c1c840', '0x7f99f4c1c900')

The method call mymethod(*args, **kwargs) executes mymethod.__func__(mymethod.__self__, *args, **kwargs).
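A minimal sketch of that equivalence, using a hypothetical Greeter class that is not part of the thread. Calling the bound method and calling its underlying function with the instance passed by hand should give the same result:

```python
class Greeter:
    def greet(self, name):
        return f"hello {name} from {type(self).__name__}"

g = Greeter()
bound = g.greet  # bound method object: pairs g with the function Greeter.greet

via_bound = bound("world")
via_func = bound.__func__(bound.__self__, "world")  # the same call, spelled out

print(via_bound == via_func)  # True
```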

JohnHind · November 15, 2021, 6:35pm

Ah, I see. I notice that an == test DOES work as expected, presumably because the self and func attributes both match?

tiran · November 15, 2021, 6:44pm

That is correct. Two bound method objects are equal when __self__ and __func__ are identical:

github.com/python/cpython/blob/e2b4e4bab90b69fbd3614fc9ce5755edfe184297/Objects/methodobject.c#L298-L322

static PyObject *
meth_richcompare(PyObject *self, PyObject *other, int op)
{
    PyCFunctionObject *a, *b;
    PyObject *res;
    int eq;

    if ((op != Py_EQ && op != Py_NE) ||
        !PyCFunction_Check(self) ||
        !PyCFunction_Check(other))
    {
        Py_RETURN_NOTIMPLEMENTED;
    }
    a = (PyCFunctionObject *)self;
    b = (PyCFunctionObject *)other;
    eq = a->m_self == b->m_self;
    if (eq)
        eq = a->m_ml->ml_meth == b->m_ml->ml_meth;
    if (op == Py_EQ)
        res = eq ? Py_True : Py_False;
    /* ... excerpt truncated in the original post ... */

In this case, == in C (a pointer comparison) is the same as is in Python.
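The difference between the two comparisons can be seen from Python itself. This is a small sketch with a hypothetical Widget class; the identity result reflects how CPython behaves, not a language guarantee:

```python
class Widget:
    def ping(self):
        return "pong"

w1 = Widget()
w2 = Widget()

first = w1.ping   # each attribute lookup builds a new bound method object
second = w1.ping
other = w2.ping   # same function, different instance

print(first is second)  # False on CPython: two distinct wrapper objects
print(first == second)  # True: same __self__ and same __func__
print(first == other)   # False: __self__ differs
```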

JohnHind · November 16, 2021, 12:20pm

Yes. I still think it is weird that the same name in the same namespace can refer to different objects. That is exposing users to an implementation detail. It could work by caching the object and returning the same one on second and subsequent references, or by redefining ‘is’ to be the same as ‘==’ for bound method objects. It would also help if there was a repr method for bound method objects that identified the ‘self’ and the ‘func’, not just ‘bound method’.

I was trying to define a tuple of method references so I could delegate a call according to an index number. I was passing ‘self’ when I called the reference and getting a parameter count error. I now realize this was because ‘self’ was being passed twice! I am actually coming from Lua and I initially expected functions to be truly first-class objects like they are in Lua i.e. the basic function definition, translated to Python, would be:

my_func = def()
    print("my_func")

Lua has syntactic sugar so the more standard form gets translated to the above:

def my_func()
    print("my_func")

The function definition is semantically a literal or constructor, so it could be embedded directly in the tuple constructor, and the ‘lambda’ form with its restrictions would be redundant since any function definition could be anonymous. As it is, I am having to assign the methods to dummy names and then use reflection to make references to them in a tuple created in the initializer.
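The parameter-count error described above can be reproduced with a short sketch. The Dispatcher class and its handler names are made up for illustration. The tuple holds bound methods, so passing the instance again supplies self twice:

```python
class Dispatcher:
    def __init__(self):
        # Bound methods: each entry already carries a reference to self.
        self.handlers = (self.on_zero, self.on_one)

    def on_zero(self, value):
        return f"zero:{value}"

    def on_one(self, value):
        return f"one:{value}"

    def dispatch(self, index, value):
        return self.handlers[index](value)  # no explicit self needed

d = Dispatcher()
result = d.dispatch(1, "x")
print(result)  # one:x

try:
    d.handlers[0](d, "x")  # self is supplied twice
except TypeError as exc:
    error = str(exc)
    print("TypeError:", error)
```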

steven.daprano · November 16, 2021, 3:31pm

To a first approximation, if you care about object identity, you are
probably doing it wrong. There are exceptions of course, but generally
speaking object identity is not that important, especially for methods.

Why do you care that two method look-ups return the same object or not?
There is nothing in the language that promises that looking up a method
will always return the same object. Methods are wrappers around
functions, and are created on the fly, when and as needed. They could be
cached, but that would likely cost memory, for little or no benefit.

(I’m not sure whether anyone has actually investigated the cost versus
benefit of caching methods, or whether this is just an artifact of a
historical design. But in either case, what we have now is that method
objects are created as needed. Even if that changes in the future, it
doesn’t help you.)

But putting aside the design question of whether method objects should
be cached or not, you seem to have some misconceptions. Firstly, you
asked:

“a function is the same as itself, but a method is not?”

Methods are identical to themselves, like every other object:

a = instance.method
a is a  # True

What differs is that the method lookup instance.method may or may not
return the same object each time. There is no language promise either
way.

If you care about the nuts and bolts, you might like to read about the
descriptor protocol, which is the machinery behind a whole lot of
stuff in Python, such as properties (computed attributes), super(), and
various flavours of method.

https://docs.python.org/3/howto/descriptor.html

Descriptors are considered pretty advanced stuff, although perhaps not
as advanced as metaclasses. YMMV.
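As a rough illustration of the descriptor machinery for plain methods, the sketch below (with a hypothetical Account class) performs by hand the step that attribute lookup normally does: it fetches the function from the class dictionary and calls its __get__ to produce a bound method.

```python
class Account:
    def balance(self):
        return 42

acct = Account()

raw_function = Account.__dict__["balance"]    # plain function stored on the class
manual = raw_function.__get__(acct, Account)  # descriptor call made explicit
automatic = acct.balance                      # normal lookup does the same thing

print(manual == automatic)  # True
print(manual())             # 42
```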

You also implied that Python functions are not “truly first-class”
objects, as in Lua. They are. Like every other object, they have a
class, they have identity, state and behaviour. You can pass them as
arguments, and return them from function calls. You can create new
functions on the fly. Aside from the syntactic forms (def statements and
lambda expressions), there is a FunctionType with a constructor, and
with a bit of work you can create new functions using that.
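A small sketch of those points, with made-up function names: functions stored in a tuple and selected by index, and a second function object built through the types.FunctionType constructor from an existing code object. It is only meant to show that the constructor exists, not to suggest it for everyday use.

```python
import types

def add_one(x):
    return x + 1

# Functions are ordinary objects: keep them in a tuple and pick one by index.
table = (add_one, lambda x: x * 2)
print(table[0](10), table[1](10))  # 11 20

# Build another function object from the same compiled code.
clone = types.FunctionType(add_one.__code__, globals(), "clone")
print(clone(1), clone.__name__, clone is add_one)  # 2 clone False
```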

What are they missing, to be considered first-class citizens?

You suggested:

“I am having to assign the methods to dummy names and then use
reflection to make references to them in a tuple created in the
initializer.”

I’m not really sure I understand what you are doing there. But to the
degree that I understand it, it sounds like you are making something
which is very easy far more complicated than it needs to be. Perhaps if
you show some code, we can be more helpful.

As far as methods, here are two ways to use them. Try these, and feel
free to ask any questions you may have.

a = []
b = []
bound_methods = (a.append, b.append)
bound_methods[0]('hello')
bound_methods[1]('world')
print(a, b)

# prints ['hello'] ['world']

Bound methods already know the instance that they apply to, so you don’t
need to provide an argument for the “self” parameter.

Or you can use unbound methods, which are actually just the raw,
unwrapped function object.

unbound_methods = (list.append, list.reverse)
unbound_methods[0](a, 101)
unbound_methods[0](b, 202)
unbound_methods[1](b)
print(a, b)

# prints ['hello', 101] [202, 'world']

Unbound methods don’t know what instance to operate on, so you have to
explicitly pass an instance which will be bound to the “self” parameter.

tiran · November 16, 2021, 5:01pm

Caching bound methods would increase memory usage and slow down instance deallocation. The bound method paths in CPython are performance critical and the code is highly optimized.

tiran · November 16, 2021, 5:12pm

The concept of unbound methods may be confusing to Python 3-only users. It is a concept from the Python 2 era. Python 3 no longer has unbound methods for user-defined classes. I removed them 14 years ago.

class myclass:
    def mymethod(self):
        print("mymethod")

Python 2:

>>> myclass.mymethod
<unbound method myclass.mymethod>

Python 3:

>>> myclass.mymethod
<function myclass.mymethod at 0x7f1d4cbf6d30>

Unbound methods only exist for types and functions that are implemented in C.

>>> list.append
<method 'append' of 'list' objects>
>>> list().append
<built-in method append of list object at 0x7f1d4cfbea00>
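The same distinctions can be checked programmatically with the types module. This is a sketch assuming CPython 3.7 or later (types.MethodDescriptorType is not available before that); MyClass is a stand-in for the class above.

```python
import types

class MyClass:
    def mymethod(self):
        return "mymethod"

inst = MyClass()

# Looked up on the class: a plain function, so the instance is passed by hand.
print(type(MyClass.mymethod) is types.FunctionType)  # True
print(MyClass.mymethod(inst))                        # mymethod

# Looked up on the instance: a bound method.
print(type(inst.mymethod) is types.MethodType)       # True

# C-implemented types keep their own descriptor and bound-method types.
print(type(list.append) is types.MethodDescriptorType)  # True
print(type([].append) is types.BuiltinMethodType)       # True
```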

JohnHind · November 16, 2021, 5:38pm

Here is where I have landed:

class Mech:
    def __init__(self):
        self.inputs = tuple((getattr(self, n) for n in dir(self) if n.startswith("__S") and not n.endswith("__")))
        if len(self.inputs) < 1 or not all(callable(f) for f in self.inputs): raise TypeError("Derived class must define runnable state functions")
        self.state = 0
    def __call__(self, *pargs, **nargs):
        bm = self.inputs[self.state]
        ns = bm(*pargs, **nargs) # NOTE: bm is 'bound_method' (a closure containing the self parameter)
        if type(ns) is not int or ns < 0: ns = len(self.inputs)
        if ns < len(self.inputs):
            self.state = ns
            return
        raise RuntimeError(f"Error:{ns}")

class KeyMech(Mech):
    def __S0(self, i):
        print(f"State0 {self.state} {i}")
        return 1
    def __S1(self, i):
        print(f"State1 {self.state} {i}")
        return 0
    
KB_Mech = KeyMech()
for i in range(0,10): KB_Mech(i)

Mech is a generic base for state machines. The methods in KeyMech define the states for a specific state machine. Calling the instance with parameters updates the machine by vectoring the call to the specific input method for the current state and that method returns the next state or an error number.

In Lua, I simply defined the input functions anonymously in the initializer and they were stored directly and solely in a Table (acting as a List). The solution above has an order problem: the pseudo-names of the methods in the derived class have to follow a format so they get selected and alphabetically sorted rather than just being entered into the tuple in the order in which they are defined.

I will look into the FunctionType constructor as suggested above to see if that suggests a better way.

steven.daprano · November 16, 2021, 6:37pm

Please don’t waste your time looking at FunctionType. It is irrelevant
to your problem. I mentioned it only in response to your implied
complaint that functions aren’t first class citizens in Python (unlike
Lua). There is nothing you can do with FunctionType that you can’t do a
thousand times more easily with def or lambda.

I don’t have time to run your code right now (perhaps later in the day)
but at a glance, I expect that you are running into double underscore
name mangling issues. Do you need your methods to be flagged with
leading double-underscores? They’re usually more trouble than they are
worth. Unless you have a really good reason for them, it is best to take
them out. And if you are not subclassing your KeyMech class, it is hard
to imagine how it would be useful at all.

Try changing the double-underscore to a single:

__S0 --> _S0

and adjust your code to suit, and you will probably see some different
behaviour.
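The mangling is easy to see with dir(). In this sketch (a hypothetical Machine class, run on CPython), the double-underscore name is stored under a class-prefixed name, so a filter such as n.startswith("__S") finds nothing, while the single-underscore name is left alone:

```python
class Machine:
    def __S0(self):  # stored on the class as _Machine__S0
        return 0

    def _S1(self):   # single underscore: name is kept as written
        return 1

names = [n for n in dir(Machine) if "S0" in n or "S1" in n]
print(names)  # ['_Machine__S0', '_S1']
```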

JohnHind · November 16, 2021, 10:33pm

No need to fix the code - it works fine. I was just hoping for an approach that let me write the functions directly into a list (tuple) without having to write them to their own names first and worry about preserving their order.

At the moment, KeyMech is only test code - in the real application there would be a lot more states and the code for each state would be much more complex (so lambda is far too restrictive). Mech is the library code and KeyMech the application code. I also removed some bells and whistles from Mech (state names and error texts, ability to specify the starting state).

It has just occurred to me that rather than try to hide the functions after references to them have been put in the tuple I should just use delattr() to remove them altogether. I’ll try that tomorrow!

steven.daprano · November 16, 2021, 10:56pm

I just tried running your example code and it raises TypeError. Are you
sure it’s fine?

>>> obj = KeyMech()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in __init__
TypeError: Derived class must define runnable state functions

Why do you want to hide the functions (do you mean methods?)

Crimonk · November 17, 2021, 9:23am

File “exe”, line 5
for loop in range(c):
…___________… ^
SyntaxError: invalid syntax

I don’t understand what is invalid.

TeamSpen210 · November 17, 2021, 11:09am

There is a way to get definition order - the problem is that dir() is really intended more for humans to read, so it automatically sorts the list. If you use vars() instead to get access to the class dictionary, that will be ordered by definition order as of Python 3.6. You might want to have a look at __init_subclass__(), it’s a classmethod called whenever a class gets subclassed. You could use that to gather up the state methods into a dict.
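A minimal sketch of that suggestion for CPython 3.6 or later. The state_ prefix and the states attribute are assumptions chosen for the example, not something fixed by the thread:

```python
class Mech:
    states = ()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # vars(cls) is the class namespace; on CPython 3.6+ it keeps
        # definition order, unlike dir(), which sorts alphabetically.
        cls.states = tuple(
            name for name, value in vars(cls).items()
            if name.startswith("state_") and callable(value)
        )

class KeyMech(Mech):
    def state_idle(self, i):
        return 1

    def state_armed(self, i):
        return 0

    def helper(self):  # not a state: name lacks the prefix
        pass

print(KeyMech.states)  # ('state_idle', 'state_armed')
```

Note that state_idle comes first even though state_armed would sort first alphabetically.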

Alternatively, is there any reason your states have to be integers? You could just use the methods themselves to store the state - simply do self.state = self.first_state in __init__(). Then you could use the equality functionality mentioned to do the checks you need. If you don’t want the small cost of building methods, you could use strings also.

JohnHind · November 17, 2021, 12:04pm

Yes, you are right. I tested it in CircuitPython on a microcontroller, where it worked fine. In Python 3.10 on a PC, the methods are not found because the names are mangled (revealed by a print(dir(self)) at the top of __init__). That is quite nice because it enables me to select just the methods defined in KeyMech, not the ones inherited. But it is no good because it does not work on the microcontroller!

JohnHind · November 17, 2021, 12:24pm

The problem with making the methods’ identities the states (and storing the current state as a reference to a state method) is that the state methods would have to return a reference to another state method instead of an index. Where would that come from? Using an index also allows me to store other data about states (e.g. a friendly name for display) in another tuple.

vars() is a good tip, or the underlying __dict__ attribute, if I can definitely rely on definition order being preserved. I actually went the other way so I could use sorting as a way of controlling the order! As for __init_subclass__(), I went for a different way of doing this: I make ‘inputs’ a derived class attribute and only create it in __init__ if it is not already there.

UPDATE: vars() seems to work as you suggest on Python 3.10, but on CircuitPython it does not exist. I tried KeyMech.__dict__. This gave the same result as vars() on Python 3.10, but does not preserve definition order on CircuitPython. What a pity - that would have been really neat!

TeamSpen210 · November 17, 2021, 8:22pm

Needing to return a reference to the other methods isn’t a problem at all, they can just do self.other_method. The functions are all called after the class is fully defined, so they all exist by that point to reference. For friendly names, you could say make a dict mapping the method __name__ to that, or store it as an attribute (maybe via decorator).
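A sketch of how that might look on CPython. The named decorator, the label attribute, and the Blinker class are hypothetical names made up for this example. Each state returns the next state as self.other_method, and the friendly name is read from an attribute stored on the function:

```python
def named(label):
    """Hypothetical decorator: attach a display name to a state function."""
    def decorate(func):
        func.label = label
        return func
    return decorate

class Mech:
    def __init__(self):
        self.state = self.start  # subclasses are expected to define start()

    def __call__(self, *args, **kwargs):
        # Each state method returns the bound method for the next state.
        self.state = self.state(*args, **kwargs)

class Blinker(Mech):
    @named("LED off")
    def start(self, i):
        return self.lit

    @named("LED on")
    def lit(self, i):
        return self.start

b = Blinker()
labels = []
for i in range(3):
    labels.append(b.state.label)  # attribute reads pass through to the function
    b(i)

print(labels)             # ['LED off', 'LED on', 'LED off']
print(b.state == b.lit)   # True: compare states with ==, not is
```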

For definition order, CircuitPython is based on MicroPython which implements Python 3.4/5, so dicts aren’t ordered by default. What you have to do is use a metaclass to swap the dict used when constructing the class:

from collections import OrderedDict
class MechMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases):
        # Use an ordered mapping as the class namespace so definition order survives.
        return OrderedDict()
    def __new__(mcls, name, bases, ns):
        """This is the constructor of the actual class."""
        cls = super().__new__(mcls, name, bases, ns)
        cls.states = [name for name in ns if name.startswith("state_")]
        return cls  # __new__ must return the newly created class

class Mech(metaclass=MechMeta):
    ...

Since this is rather advanced I’m unsure if MicroPython will fully implement this, but it’s how this is done in 3.5.

steven.daprano · November 18, 2021, 1:35am

Mystery syntax errors which look correct can be caused by (at least) two
things:

- the presence of invisible control characters or Unicode characters in
  the source code;
- or a missing close bracket (square bracket, round parentheses or
  curly brace) on the previous line(s).

Look for a missing ) ] or } on the previous lines. If you can’t
see any, you might need to open the file in a hex editor and look for an
invisible control character between the “range” and the “(”.

Or just delete the line and re-type it, and see if the error goes away.
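If a hex editor is not handy, a few lines of Python can do a similar check. This is only a sketch: suspicious_chars is a made-up helper, and the sample line with a zero-width space is an invented example, not the actual source from the question.

```python
import unicodedata

def suspicious_chars(line):
    """Return (index, code point, name) for anything that is not plain printable ASCII."""
    found = []
    for index, ch in enumerate(line):
        if ch in "\t\n" or " " <= ch <= "~":
            continue  # ordinary whitespace or printable ASCII
        found.append((index, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED")))
    return found

clean = "for loop in range(c):"
dirty = "for loop in range\u200b(c):"  # zero-width space hidden before "("

print(suspicious_chars(clean))  # []
print(suspicious_chars(dirty))  # [(17, 'U+200B', 'ZERO WIDTH SPACE')]
```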

JohnHind · November 18, 2021, 9:40am

Many thanks for all your help, Spencer! The penny dropped about using the method references as state identities directly shortly after I switched the computer off last night, and I agree this is a much superior way of doing this.

I’ve learned a great deal about the fiddly bits of Python with this! It is not quite as close to Lua as I’d hoped and I am beginning to miss Lua’s minimalistic elegance. The inability to embed a method definition in the constructor of a data structure or a function call (except the highly restrictive lambda construct) seems like a real limitation and I still maintain this disqualifies Python functions as being fully first-class. You should be able to define (construct) a function anywhere a function reference is valid like you can with any other value.

The automatic enclosure of method references also seems a bit over the top. Lua simply does this as a pre-process:
self.mymethod(*pargs, **nargs)
gets translated textually to:
self.mymethod(self, *pargs, **nargs)
This covers 99% of cases, and you can make a closure explicitly if you need to pass a reference around.

JohnHind · November 18, 2021, 6:02pm

Just in case anyone needs closure, here is what I have finally landed on:

class Mech:
    def __init__(self, init_state):
        self.get_name(init_state)
        self.state = init_state    
    def __call__(self, *pargs, **nargs):
        ns = self.state(self, *pargs, **nargs)
        if callable(ns):
            self.state = ns
        else:
            raise RuntimeError(f"Error: {ns}")
    def get_name(self, state):
        if callable(state):
            for k,v in type(self).__dict__.items():
                if state is v: return k
        raise RuntimeError(f"State must be a method of {type(self)}")

class KeyMech(Mech):
    def first_state(self, i):
        print(f"{self.get_name(self.state)} {i}")
        return KeyMech.second_state
    def second_state(self, i):
        print(f"{self.get_name(self.state)} {i}")
        if i > 8: return "a bad thing happened"
        return KeyMech.first_state
        
KB_State = KeyMech(KeyMech.first_state)
for i in range(0,10): KB_State(i)

A bit constrained by the limitations of CircuitPython and I’m not completely happy with the efficiency of ‘get_name’ which both validates a state and returns its name, so as a compromise it is not routinely used on state transitions.
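One possible way to make the name lookup cheaper, sketched here for CPython only and not tested on CircuitPython, is to build a function-to-name dictionary once in the initializer so that get_name becomes a single dictionary lookup. The _names attribute is an assumption added for this sketch; the rest follows the code above with the print calls removed.

```python
class Mech:
    def __init__(self, init_state):
        # One pass over the class namespace: function object -> method name.
        self._names = {v: k for k, v in type(self).__dict__.items() if callable(v)}
        self.get_name(init_state)  # validates the starting state
        self.state = init_state

    def __call__(self, *pargs, **nargs):
        ns = self.state(self, *pargs, **nargs)
        if not callable(ns):
            raise RuntimeError(f"Error: {ns}")
        self.state = ns

    def get_name(self, state):
        try:
            return self._names[state]
        except (KeyError, TypeError):  # unknown or unhashable value
            raise RuntimeError(f"State must be a method of {type(self)}")

class KeyMech(Mech):
    def first_state(self, i):
        return KeyMech.second_state

    def second_state(self, i):
        return KeyMech.first_state

kb = KeyMech(KeyMech.first_state)
kb(0)
print(kb.get_name(kb.state))  # second_state
```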