Setting random NaN in multidimensional numpy array

Good morning to everyone,

I am trying to generate a 4D NumPy array that contains a random number of NaN values. I am following a method I found on this page (How to randomly insert NaN in a matrix with NumPy in Python ? - GeeksforGeeks), specifically “method 2”, which consists of creating a mask. That method, however, assumes a 2D NumPy array (a matrix), whereas in my case I am dealing with a 4D NumPy array.

Let me show you an example of my problem.

Array “mask” is a 4D boolean array full of “False”:

mask = [ [ [ [F F F F], [F F F F] ], [ [F F F F], [F F F F] ] ],
[ [F F F F], [F F F F] ], [ [F F F F], [F F F F] ] ] ]

Array “random_nan” is a 4D array of randomly generated integers. Each integer is the number of elements in the corresponding sub-array of “mask” that will be turned to “True” (T):

random_nan = [ [ [ [2], [3] ], [ [1], [2] ] ], [ [ [0], [2] ], [ [1], [3] ] ] ]

So, what I want to get is a “result” array like this:

result = [ [ [ [T T F F], [T T T F] ], [ [T F F F], [T T F F] ] ],
[ [F F F F], [T T F F] ], [ [T F F F], [T T T F] ] ] ]

I am struggling to think of a “pythonic” way to get this “result” without writing explicit “for” loops over the different sub-arrays.

I would be glad if you could give me a hand with this problem.

Thanks in advance.

Joan

To iterate over multiple lists together you can use zip(). You can apply the same technique to nested lists at every level of iteration. NumPy may have other tools for this, but I do not know NumPy well. Here is a solution in plain Python:

Input data:

I fixed the brackets and other things to make the code runnable. Please consider making a runnable example next time.

F = False
T = True

mask = [
        [ [ [F, F, F, F], [F, F, F, F] ], [ [F, F, F, F], [F, F, F, F] ] ],
        [ [ [F, F, F, F], [F, F, F, F] ], [ [F, F, F, F], [F, F, F, F] ] ]
    ]

random_nan = [
        [ [ [2], [3] ], [ [1], [2] ] ],
        [ [ [0], [2] ], [ [1], [3] ] ]
    ]

import random

for random_nan_d1, mask_d1 in zip(random_nan, mask):
    for random_nan_d2, mask_d2 in zip(random_nan_d1, mask_d1):
        for (random_nan_count,), mask_d3 in zip(random_nan_d2, mask_d2):
            for index in random.sample(range(len(mask_d3)), random_nan_count):
                mask_d3[index] = True
            print(random_nan_count, mask_d3)  # diagnostic print

Diagnostic output:
2 [False, False, True, True]
3 [True, True, False, True]
1 [False, False, True, False]
2 [True, False, False, True]
0 [False, False, False, False]
2 [True, True, False, False]
1 [True, False, False, False]
3 [True, False, True, True]

Output:

[[[[False, False, True, True], [True, True, False, True]],
  [[False, False, True, False], [True, False, False, True]]],
 [[[False, False, False, False], [True, True, False, False]],
  [[True, False, False, False], [True, False, True, True]]]]
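
For comparison, here is a loop-free NumPy sketch of the same idea (a given number of True values at random positions in each innermost row). It is not the code above: it swaps random.sample() for a different technique, ranking random numbers with argsort(). It assumes random_nan has shape (2, 2, 2, 1) and that each row has 4 elements; the names rng, row_length, ranks and data are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # the seed is arbitrary, for repeatability

random_nan = np.array([
    [[[2], [3]], [[1], [2]]],
    [[[0], [2]], [[1], [3]]],
])                                    # shape (2, 2, 2, 1)
row_length = 4                        # assumed length of the innermost axis

# argsort() of random numbers gives a random permutation of 0..3 in each row.
ranks = rng.random(random_nan.shape[:-1] + (row_length,)).argsort(axis=-1)

# Exactly `count` entries of each permutation are smaller than `count`,
# so each row gets `count` True values at random positions.
mask = ranks < random_nan             # broadcasts to shape (2, 2, 2, 4)

# Placeholder data, only to show how the mask would be applied.
data = rng.random(mask.shape)
data[mask] = np.nan
```

The number of True values per row can be checked with mask.sum(axis=-1), which should match the counts in random_nan.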

Here is a Jupyter notebook with the solution: python-ntb/2022-07-13_iterate_two_4d_arrays.ipynb at main · vbrozik/python-ntb · GitHub

Hello Václav,

Many thanks for your answer! I didn’t know about the “zip()” function, it seems a good way to perform a mapping between two iterable objects.

I have run your code and it works fine! The only thing is that, in my problem, I don’t need the last “for” loop. The integer values in the “random_nan” array give not only the number of “True” values to place in the corresponding subarray of “mask” but also their positions. For example, when “random_nan” is “2”, the elements at positions 0 and 1 of the corresponding subarray are set to “True”.

Even though this solution works fine, I would like to find some NumPy alternative to avoid the explicit use of “for” loops to run for the different subarrays.
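
For this prefix case (a count of k sets positions 0 to k-1 to True), one possible loop-free approach is a broadcast comparison between the position indices and the counts. This is only a sketch: it assumes random_nan has shape (2, 2, 2, 1) as in the example and that each row has 4 elements. The names row_length, positions and data are illustrative.

```python
import numpy as np

random_nan = np.array([
    [[[2], [3]], [[1], [2]]],
    [[[0], [2]], [[1], [3]]],
])                                   # shape (2, 2, 2, 1)
row_length = 4                       # assumed length of the innermost axis

positions = np.arange(row_length)    # [0, 1, 2, 3]

# Broadcasting compares every position with the count of its own row:
# shape (4,) against (2, 2, 2, 1) gives shape (2, 2, 2, 4).
mask = positions < random_nan

# Placeholder data, only to show how the mask would be applied.
data = np.ones(mask.shape)
data[mask] = np.nan

print(mask)
```

Because the last axis of random_nan has length 1, NumPy stretches it along the row, so no explicit loop over the sub-arrays is needed.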

Thank you very much for your time and help!

Regards,

Joan

After a quick search I found that probably the most direct way in NumPy would be to use nditer(). This function creates an iterator that lets you iterate over a multi-dimensional array, so you would not need the nested loops.

I think this could work:

  • Create nditer for random_nan.
  • Create nditer for mask with op_flags=['readwrite']
  • Create a context for the iterator using the with statement.
  • zip the two iterators
  • Iterate the zip using for. Inside the loop make the changes.
  • End the context to write the changes back to the mask array.
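
A rough sketch of these steps, with one change: instead of zipping two separate iterators, it passes both arrays to a single nditer, which iterates over them together and broadcasts random_nan along the last axis. It assumes the prefix rule described earlier in the thread (a count of k sets positions 0 to k-1) and the shapes from the example. Note that it visits every element, not every row.

```python
import numpy as np

random_nan = np.array([
    [[[2], [3]], [[1], [2]]],
    [[[0], [2]], [[1], [3]]],
])                                          # shape (2, 2, 2, 1)
mask = np.zeros((2, 2, 2, 4), dtype=bool)   # all False

# One iterator over both arrays; random_nan is read-only and is broadcast
# along the last axis, mask is opened for reading and writing.
with np.nditer(
    [random_nan, mask],
    flags=["multi_index"],
    op_flags=[["readonly"], ["readwrite"]],
) as it:
    for count, cell in it:
        position = it.multi_index[-1]       # index within the innermost row
        cell[...] = position < count        # write through to mask

print(mask)
```

This still runs a Python-level loop over all 32 elements, so it removes the nesting but not the looping.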

What I did not find in the documentation is how to limit the “depth” of the iteration, i.e. how to say that you do not want to iterate down to the deepest level (the individual elements) but stop one level up, at the vectors (1D arrays) of elements.
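
One possible way to stop one level above the elements, without nditer, is to iterate over the indices of the leading axes only with np.ndindex() and treat each innermost row as a unit. This is a sketch under the same assumptions as the example data (random_nan of shape (2, 2, 2, 1), rows of 4 elements, and the prefix rule from earlier in the thread); it is a different tool from nditer, not an answer to how nditer itself limits depth.

```python
import numpy as np

random_nan = np.array([
    [[[2], [3]], [[1], [2]]],
    [[[0], [2]], [[1], [3]]],
])                                          # shape (2, 2, 2, 1)
mask = np.zeros((2, 2, 2, 4), dtype=bool)   # all False

# Iterate over every index of the first three axes: (0, 0, 0), (0, 0, 1), ...
for idx in np.ndindex(*mask.shape[:-1]):
    count = random_nan[idx][0]   # the single count stored for this row
    row = mask[idx]              # 1D view into mask, not a copy
    row[:count] = True           # first `count` positions become True

print(mask)
```

Since row is a view, assigning to it changes mask directly, and a single loop replaces the three nested ones.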

If you get working code, it would be great if you posted it here.