Matrix form of the weight in the random choice library

Dear community,

I have a question regarding random.choices in the random module. I understand that it works for a one-dimensional array. For example, let’s say the list is

items = ['A', 'B', 'C']

and we want to choose two elements from this list. Then the following code would work

random.choices(items, weights = [0.1, 0.3, 0.6], k = 2)
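
As a minimal sketch of the one-dimensional case described above (the seed and the 10,000-draw frequency check are illustrative additions, not part of the original question), the weights only need to be relative and do not have to sum to one:

```python
import random
from collections import Counter

items = ["A", "B", "C"]
weights = [0.1, 0.3, 0.6]  # relative weights; random.choices does not require a sum of 1

random.seed(0)  # fixed seed only so the sketch is repeatable
sample = random.choices(items, weights=weights, k=2)  # two draws, with replacement

# Drawing many samples shows the empirical frequencies approaching the weights.
n_draws = 10_000
counts = Counter(random.choices(items, weights=weights, k=n_draws))
freqs = {item: counts[item] / n_draws for item in items}
print(sample, freqs)
```

Note that random.choices always samples with replacement, so the same item can appear more than once in the result.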

However, for some problems that I am working on, I need the weights to be an m x n array. More precisely, suppose that there is a category with m elements. Each element in this category has n items (in my application, n depends on the specific element, but for simplicity, let us assume that they are the same). My goal is to randomly choose k elements from this set. By matrix multiplication, we can compute the probability of choosing an item from the total set of m x n items. However, the weights will then be an m x n array. One can, of course, reshape this into a 1 x mn array. I wonder whether there is a quicker way to resolve this?
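
One possible reading of the setup above, sketched with made-up numbers: each category has a weight, each item has a weight within its category, and the joint weight of an item is their product. The names cat_weights and item_weights and all values are assumptions for illustration only. The m x n array of joint weights is then flattened row by row so that the standard library call can consume it:

```python
import random
import numpy as np

# Hypothetical data: m = 2 categories with n = 3 items each.
items = np.array([["a1", "a2", "a3"],
                  ["b1", "b2", "b3"]])
cat_weights = np.array([0.25, 0.75])        # weight of each category (assumed)
item_weights = np.array([[0.2, 0.3, 0.5],   # weights of the items within each category (assumed)
                         [0.1, 0.1, 0.8]])

# Broadcasting multiplies row i of item_weights by cat_weights[i],
# giving an m x n array of joint weights.
joint = cat_weights[:, None] * item_weights

# ravel() lays both arrays out row by row, so position j refers to the same item in both.
flat_items = items.ravel().tolist()
flat_weights = joint.ravel().tolist()

picked = random.choices(flat_items, weights=flat_weights, k=4)
print(picked)
```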

Thank you!


Like almost anything with multidimensional arrays, it’s usually best to do this with NumPy. The choice method of numpy.random.Generator does what you want: it samples from an array using per-element probabilities, which it expects as a 1D array, so a multidimensional array of weights just needs flattening first. So assuming you have some m x n array choices:

import numpy as np
choices = np.array([[x_11, ..., x_1n], ..., [x_m1, ..., x_mn]])

With an m x n array of the weights:

weights = np.array([[w_11, ..., w_1n], ..., [w_m1, ..., w_mn]])

Then you could select k samples randomly from any element in the array, output to a 1D array, by flattening both the inputs:

rng = np.random.default_rng()
rng.choice(choices.flatten(), k, weights.flatten())

If the initial dimensional structure has meaning that you want the output to reflect (e.g. choosing specific categories, etc.) there are ways to do that, but you’ll need to be more specific about what you’re looking for there.
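
One way to keep track of the original structure, offered only as a sketch under assumed data, is to sample flat indices instead of the items themselves and then map them back to (row, column) positions with np.unravel_index. The arrays and the seed below are made up for illustration:

```python
import numpy as np

# Hypothetical 2 x 3 arrays of items and weights.
choices = np.array([["a1", "a2", "a3"],
                    ["b1", "b2", "b3"]])
weights = np.array([[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0]])

rng = np.random.default_rng(0)  # seeded only for repeatability
p = weights.ravel() / weights.sum()  # 1D probabilities that sum to 1

# Sample positions in the flattened array, then recover row and column indices.
flat_idx = rng.choice(choices.size, size=5, p=p)
rows, cols = np.unravel_index(flat_idx, choices.shape)
picked = choices[rows, cols]

for r, c, item in zip(rows, cols, picked):
    print(r, c, item)
```

Here rows tells you which category each sampled item came from, and cols its position within that category.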


Thank you for your answer. It looks great!


I tried the above approach but somehow it does not work. Here is my example.

import numpy as np

cat_1=['milk', 'yogurt', 'cheese']
cat_2=['rice', 'cereal', 'noodle', 'longrice']
cat_3=['similac', 'huggies', 'wipes', 'nuts', 'pamper']
all_cat=['cat_1', 'cat_2', 'cat_3']
cat_weight=np.array([3, 2, 2])
weight_1=[np.random.randint(1, 10) for item in cat_1]
weight_2=[np.random.randint(1,10) for item in cat_2]
weight_3=[np.random.randint(1,10) for item in cat_3]
weights=np.array([weight_1, weight_2, weight_3])
items=np.array([cat_1, cat_2, cat_3])

rng = np.random.default_rng()
rng.choice(items.flatten(), k, weights.flatten())

I got the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Do you know what happened? Thanks!

Please always provide the full error and traceback inside a code block. Otherwise, the error could have occurred in any of the lines of code you’ve shown, or in other first- or third-party code that you haven’t, and we are left to blindly guess at where it might be.

Fortunately, since you helpfully provided your full code itself in a code block, I was able to figure out what is going on.

Having tested the code, I found three direct issues here, plus one side issue.

First, the immediate error is my fault for not actually running the code I provided: the rng.choice() call needs to have the probabilities passed by keyword, since p is not the third positional parameter (that slot belongs to replace). I.e., you should have

rng.choice(items.flatten(), size=k, p=weights.flatten())
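
For reference, the parameters of Generator.choice after the array are, in order, size, replace, and p, which is why a third positional argument ends up as replace. A small sketch with made-up items and probabilities that already sum to one, passing everything by keyword:

```python
import numpy as np

items = np.array(["A", "B", "C", "D"])
p = np.array([0.1, 0.2, 0.3, 0.4])  # already sums to 1

rng = np.random.default_rng(0)  # seeded only for repeatability

# Keywords make it explicit which argument is which.
with_repeats = rng.choice(items, size=3, p=p)                    # default replace=True
without_repeats = rng.choice(items, size=3, replace=False, p=p)  # no item drawn twice
print(with_repeats, without_repeats)
```

Passing replace=False is one option if the k chosen items must all be distinct.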

Second, your weights are probabilities and need to sum to one, so simply divide the weights array by its sum before passing it:

rng.choice(items.flatten(), size=k, p=(weights / weights.sum()).flatten())
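
A short sketch of the normalization step for a genuinely 2D weights array, using made-up values. It uses the array's own sum() method for the grand total, because Python's built-in sum() applied to a 2D array adds the rows together and returns one value per column:

```python
import numpy as np

# Hypothetical 2 x 3 weights; the grand total is 20.
weights = np.array([[1.0, 2.0, 3.0],
                    [4.0, 5.0, 5.0]])

column_sums = sum(weights)  # built-in sum adds the rows: array([5., 7., 8.])
total = weights.sum()       # NumPy sum over all elements: 20.0

p = (weights / total).flatten()                # sums to 1, suitable for p=
per_column = (weights / column_sums).flatten() # each column sums to 1, so this sums to 3

print(column_sums, total, p.sum(), per_column.sum())
```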

Finally, it looks like your different categories are not the same length, so they aren’t actually a 2D array (or matrix) at all (per the original question). Therefore, NumPy treats them as a 1D array of list objects, which is not what you want. However, since you don’t actually care about the array’s dimensions in the final output, you can solve this by flattening one step earlier, when you put the weights and items into the arrays.

To do this, use the unpacking operator (*) to create flat lists of weights and items instead of nested ones, which is most convenient to do when creating the arrays:

weights=np.array([*weight_1, *weight_2, *weight_3])
items=np.array([*cat_1, *cat_2, *cat_3])

Then, you can omit the flattening calls in rng.choice(), though they don’t hurt:

rng.choice(items, size=k, p=weights / sum(weights))
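
Putting the pieces together for categories of different lengths, here is a sketch with fixed, made-up weights instead of random ones. The labels array is a hypothetical addition that records which category each flattened item came from, and k = 4 is an arbitrary choice:

```python
import numpy as np

cat_1 = ["milk", "yogurt", "cheese"]
cat_2 = ["rice", "cereal", "noodle", "longrice"]
cat_3 = ["similac", "huggies", "wipes", "nuts", "pamper"]
weight_1 = [3, 1, 2]          # made-up weights, one per item
weight_2 = [2, 2, 1, 4]
weight_3 = [5, 1, 1, 2, 3]

# Flatten while building the arrays, so ragged categories are not a problem.
items = np.array([*cat_1, *cat_2, *cat_3])
weights = np.array([*weight_1, *weight_2, *weight_3], dtype=float)

# Hypothetical helper: the category name of each flattened item.
labels = np.repeat(["cat_1", "cat_2", "cat_3"],
                   [len(cat_1), len(cat_2), len(cat_3)])

k = 4
rng = np.random.default_rng(0)  # seeded only for repeatability
idx = rng.choice(len(items), size=k, p=weights / weights.sum())

for i in idx:
    print(labels[i], items[i])
```

Sampling indices rather than the items directly is a design choice here: it lets the same draw look up both the item and its category.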

If you do have an actual matrix or 2D array (or an array of any dimension), which is what the original question was asking, you don’t need to flatten beforehand, and you may or may not want to flatten before outputting, depending on what you want your output to look like.

Also, FYI, you have an extra all_cat and cat_weight that don’t appear to be used anywhere, at least in the given code.


I tested and it worked (sorry for my delayed response). Thank you very much for your clear explanation!
