NumPy broadcasting behavior with np.fromfunction

Hello,

I do not understand why I do not get a 2x2 matrix in this example:

import numpy as np
np.matrix(np.fromfunction(lambda i,j: 7, (2,2)))
matrix([[7]])

When I do this, it is OK:

np.matrix(np.fromfunction(lambda i,j: i + j, (2,2)))
matrix([[0., 1.],
        [1., 2.]])

Regards,

Damien

What seems strange is that if I do this, the second example works:

np.fromfunction(lambda i,j: 7.0, (2,2))
7.0
np.fromfunction(lambda i,j: i - i + 7.0, (2,2))
array([[7., 7.],
       [7., 7.]])

Must I write silly expressions like i - i + 7.0 to make it work?!

It seems that if at least one parameter appears in the lambda body, it works.

A solution is to use np.vectorize:

np.fromfunction(np.vectorize(lambda i,j: 7.0), (2,2))
array([[7., 7.],
       [7., 7.]])

But I still do not understand why.

np.fromfunction calls lambda i, j: 7 with the unpacking of the index array:

array([[[0., 0.],
        [1., 1.]],

       [[0., 1.],
        [0., 1.]]])

The lambda ignores those index arrays and just returns 7.

However, lambda i, j: i + j sums the two components of that input, giving

array([[0., 1.],
       [1., 2.]])

This is what fromfunction does with your input:

args = indices((2, 2), dtype=float)
return function(*args)

where function is the lambda in each case.
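For example, here is a small sketch of that internal call using the public np.indices (roughly what fromfunction does, not its exact source):

>>> import numpy as np
>>> i, j = np.indices((2, 2), dtype=float)   # the two 2x2 index arrays
>>> (lambda i, j: 7)(i, j)                   # the arrays are ignored, a plain scalar comes back
7
>>> (lambda i, j: i + j)(i, j)               # broadcasting on the index arrays
array([[0., 1.],
       [1., 2.]])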

Thanks.

It was a matter of chance that I got the expected result in one case out of two :slight_smile:

This looks like an optimization in Python, replacing the function with a constant value no matter what the argument is:

>>> func1 = lambda i: 7 + 0 * i
>>> func2 = lambda i: 7
>>> func1(1)
7
>>> func2(1)
7
>>> func1([1,2])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
TypeError: unsupported operand type(s) for +: 'int' and 'list'
>>> func2([1,2])
7
>>> func1('zorro')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> func2('zorro')
7

and thus with a numpy array the unexpected result:

>>> func1(np.array([1,2]))
array([7, 7])
>>> func2(np.array([1,2]))
7

I would consider this a bug.

That is not what is happening.

In the case of func1, the operators * and + define how to handle arguments of type int and numpy.ndarray. The expression 0 * np.array([1, 2]) results in np.array([0, 0]), while 7 + np.array([0, 0]) results in np.array([7, 7]). What NumPy did was broadcast the operands to arrays of a common shape. See, for example, the documentation of np.add, which mentions what is done when the shapes of the arguments are different.
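Spelled out step by step in a session (same values as above):

>>> import numpy as np
>>> 0 * np.array([1, 2])     # the scalar 0 is broadcast across the array
array([0, 0])
>>> 7 + np.array([0, 0])     # the scalar 7 is broadcast as well, giving the array of 7s
array([7, 7])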

The definition of func2 is to return 7 for all inputs. So it returns 7 for every input, as long as it is called with the appropriate signature of one positional argument.

That’s exactly what I meant: the function always returns 7, no matter what the argument is. This is just unexpected because I tend to write func(array) to map a scalar function over an array. That works in most cases, but not always. The doc of numpy.fromfunction warns about this. One can use np.vectorize to create a function that does map properly:

>>> func = lambda i: 7
>>> vfunc = np.vectorize(func)
>>> func(np.arange(3))
7
>>> vfunc(np.arange(3))
array([7, 7, 7])

I’m a little unclear how it’s surprising. If you define a function (such as your func2) that returns a constant value that does not depend on the input (or any external state), it must always return that value, exactly as you told it to do. If that value is a single integer literal, then the return value will always be that single integer literal; it will not and cannot magically get changed into an array just because you passed one in, because the return value you defined does not depend on the input.

By contrast, if you define a function where the return value depends on the input (such as your func1), then the type and value you get back may vary depending on what you input. In this case, as @franklinvp mentioned, adding or multiplying the input NumPy array by a scalar results in a NumPy array with all the elements of the original added or multiplied by the scalar.

This may seem a little magical, but in Python, most of the operators (+, *, etc.) are just syntactic sugar for calling special double underscore (dunder) methods on the objects involved, so objects can effectively “redefine” these operators to mean what they want.

You can create your own class that demonstrates a simple example of this:

class Vector:
    def __init__(self, *values):
        self._values = list(values)

    def __add__(self, other):
        return Vector(*(value + other for value in self._values))

    def __mul__(self, other):
        return Vector(*(value * other for value in self._values))

    # Right hand side variants
    def __radd__(self, other):
        return self.__add__(other)

    def __rmul__(self, other):
        return self.__mul__(other)

    # Etc for any other desired operators

    # So str(), repr(), implicit and explicit printing work
    def __str__(self):
        return str(self._values)

    def __repr__(self):
        return f"Vector({', '.join(repr(value) for value in self._values)})"

Now, you can perform addition and multiplication on an instance of your Vector with a scalar and it will return a Vector with the operation “broadcast” to all the elements, like NumPy does:

In [1]: vec = Vector(1, 2, 3)

In [2]: vec * 5
Out[2]: Vector(5, 10, 15)

In [3]: vec * 2  + 1
Out[3]: Vector(3, 5, 7)

In [4]: 7 + 0 * vec
Out[4]: Vector(7, 7, 7)

If you call your func1 and func2 on this Vector vec, you’ll get the same result as on the NumPy array above:

In [27]: func1(vec)
Out[27]: Vector(7, 7, 7)

In [28]: func2(vec)
Out[28]: 7

Sorry if I did not make myself clear enough. I have no problem in understanding what happens. But I have often (thousands of times) used a simple function call f(a) to map a scalar function f on an array a. It’s the most efficient way to do it. And over time my mind got corrupted into interpreting f(a) just as mapping the function f on the array. But of course it isn’t. Python just evaluates the function, and that happens to return my expected result, until it doesn’t when the function is a constant.
But thanks for helping me clear up my mind.
Now looking back at some cases where I used it, I actually found that I got the expected array result even with a constant function, because the result of the function was inserted back into an array, and so broadcasting did its work again :slight_smile:
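For example, a pattern like this (just a sketch of what I mean, not my actual code):

>>> import numpy as np
>>> func = lambda i: 7
>>> a = np.arange(3)
>>> out = np.empty_like(a)
>>> out[:] = func(a)    # the scalar 7 is broadcast on assignment into the array
>>> out
array([7, 7, 7])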

Ah right, yup. The fact that the call is within a function has nothing to do with the result, which would be the same given the same code and variable values regardless of the context in which it was run; it’s really the NumPy array object, rather than your code, that’s doing the broadcasting. If you weren’t used to the pattern of always performing this broadcasting in a function, I’ll bet it would have been much clearer: given arr = np.arange(3) as in your example, you might expect the line of code

>>> print(7 + 0 * arr)

to print an array, whereas I’m sure it would be clear that

>>> print(7)

would just print the scalar value 7.
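For completeness, running those two lines confirms it:

>>> import numpy as np
>>> arr = np.arange(3)
>>> print(7 + 0 * arr)
[7 7 7]
>>> print(7)
7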