Slicing question

Hi,

I am trying to understand the difference between the following two code snippets (and it might be a numpy issue, please let me know):

Using this code, I get the expected output:

import numpy as np
aa = np.arange(6)
aa[2:4][:] = 8
print(aa)

The array aa now contains [0, 1, 8, 8, 4, 5].

import numpy as np
aa = np.arange(6)
idx = np.all([aa > 1, aa < 4], axis=0)
aa[idx][:] = 8
print(aa)

With this code aa is the original [0, 1, 2, 3, 4, 5].

Also,

import numpy as np
aa = np.arange(6)
idx = np.all([aa > 1, aa < 4], axis=0)
aa[idx] = 8
print(aa)

gives again [0, 1, 8, 8, 4, 5]. Could somebody explain what is going on?

Cheers

When you’re stuck like this, why not add a few more print statements, just like the one at the end? In my experience, it’s also never a bad idea to break code down into smaller steps.

The answer is that idx isn’t a slice; it’s a numpy array of booleans (True where 2 <= x <= 3).
https://numpy.org/doc/stable/reference/generated/numpy.all.html
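
To make the point about idx concrete, here is a small sketch that builds the same boolean mask three ways. The names idx_all, idx_and and idx_fn are only illustrative and are not from the original post; the & and np.logical_and forms are shown as equivalents for this particular case.

```python
import numpy as np

aa = np.arange(6)

idx_all = np.all([aa > 1, aa < 4], axis=0)   # form used in the question
idx_and = (aa > 1) & (aa < 4)                # elementwise "and" of the two masks
idx_fn = np.logical_and(aa > 1, aa < 4)      # same thing as a function call

print(idx_all)         # [False False  True  True False False]
print(idx_all.dtype)   # bool
print(np.array_equal(idx_all, idx_and), np.array_equal(idx_all, idx_fn))  # True True
```

In every form the result is a boolean array with the same shape as aa, not a slice object.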

Indexing aa with idx, or with any other array, creates an entirely new array. That new array is the one being mutated by the assignment of 8, not aa.

import numpy as np
aa = np.arange(6)
idx = np.all([aa > 1, aa < 4], axis=0)
print(f'{repr(idx)=}')
# aa[idx][:] = 8
meets_conds = aa[idx]
print(f'{repr(meets_conds)=}')
meets_conds[:] = 8
print(f'{repr(meets_conds)=}')
print(f'{aa=}')

output:

repr(idx)='array([False, False,  True,  True, False, False])'
repr(meets_conds)='array([2, 3])'
repr(meets_conds)='array([8, 8])'
aa=array([0, 1, 2, 3, 4, 5])

I haven’t been able to find out how this works, actually. As @JamesParrott said, boolean indexing is ‘supposed’ to give you a new array. (Which would make using it to modify an array impossible.) I suspect it’s a dedicated method that’s somehow being called, but I can’t find it. I would treat it as a bit of numpy magic. Since it’s magic, it is fragile, and using aa[idx][:] instead of aa[idx] breaks the magic.

numpy arrays are stored in a single memory buffer, which lets numpy return slices of them as “views” on that buffer (this works equally for strided slices), so no extra memory has to be allocated. But when you do aa[idx], you are passing an array of booleans or indexes, which forces numpy to construct a new array (comparable to what np.take does for integer indexes) instead of creating a view.
np.shares_memory can help you investigate whether arrays are views of others or not:

import numpy as np
aa = np.arange(6)
idx = np.all([aa > 1, aa < 4], axis=0)
sli = slice(2,4)

np.shares_memory(aa, aa[idx])  # False
np.shares_memory(aa, aa[sli])  # True

So, how do you explain that
aa[idx] *= 2
does modify aa?

As an expression, x[i] is equivalent to x.__getitem__(i), which in the case of numpy arrays returns a new object. If i is a slice, the new object is still a view onto the same buffer as the original object, but if i is a boolean array or an array of integer indices, it returns an array with a newly allocated buffer and a copy of the data (a shallow copy in the case of object arrays).

The statement x[i] = v is equivalent to the statement x.__setitem__(i, v) which for a numpy array (and normal Python containers) always mutates x in place whatever kind of index i is.
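
A minimal sketch of the two equivalences described so far, using the array from the question. The names x, y and mask are only illustrative, and calling the dunder methods directly is done here purely to show what the subscript syntax translates to.

```python
import numpy as np

x = np.arange(6)
y = np.arange(6)
mask = np.all([x > 1, x < 4], axis=0)   # boolean index, as in the question

# Subscript assignment and the explicit method call do the same thing.
x[mask] = 8
y.__setitem__(mask, 8)
print(x)  # [0 1 8 8 4 5]
print(y)  # [0 1 8 8 4 5]

# __getitem__ with a boolean mask returns a copy; with a slice, a view.
print(np.shares_memory(x, x.__getitem__(mask)))         # False
print(np.shares_memory(x, x.__getitem__(slice(2, 4))))  # True
```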

When you do x[i][j] = v, this is equivalent to (x[i])[j] = v, or in other words:

x.__getitem__(i).__setitem__(j, v)

Now, if i is a slice, then x.__getitem__(i) is a view into the same buffer as x. However, if i is an array of bool/int, then __setitem__(j, v) is being called on a temporary array that does not share its buffer with x.

A statement like x[i] *= 2 is equivalent to:

x.__setitem__(i, x.__getitem__(i) * 2)

which always mutates x. A statement like x[i][j] *= 2 is equivalent to:

tmp = x[i]
tmp[j] *= 2

The question then is whether tmp and x share the same buffer, which will be the case if i is a slice but not if it is a bool array.
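
The three cases can be checked side by side. This is only a sketch under the assumptions of the thread's example (a six-element integer array and the same mask); the names after_slice, after_mask_chain and after_mask_direct are made up for illustration.

```python
import numpy as np

x = np.arange(6)
mask = np.all([x > 1, x < 4], axis=0)

# Chained indexing through a slice: tmp is a view, so x changes.
tmp = x[2:4]
tmp[:] *= 2
after_slice = x.copy()
print(after_slice)        # [0 1 4 6 4 5]

# Chained indexing through a boolean mask: tmp is a copy, so x is untouched.
x = np.arange(6)
tmp = x[mask]
tmp[:] *= 2
after_mask_chain = x.copy()
print(after_mask_chain)   # [0 1 2 3 4 5]

# Single-level augmented assignment ends in x.__setitem__, so x changes.
x[mask] *= 2
after_mask_direct = x.copy()
print(after_mask_direct)  # [0 1 4 6 4 5]
```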

This is accurate and well written.

You can also note that aa[idx] *= 2 would do nothing at all if it did not modify aa.

Aha!

Excellent answer, thank you very much!