Matrix form of the weight in the random choice library

tungprime · February 25, 2022, 7:29pm

Dear community,

I have a question regarding random.choices in the random module. I understand that it works for a one-dimensional sequence. For example, let’s say the list is

items = ['A', 'B', 'C']

and we want to choose two elements from this list. Then the following code would work

random.choices(items, weights = [0.1, 0.3, 0.6], k = 2)

However, for some problems that I am working on, I need the weights to be an m x n array. More precisely, suppose that there is a category with m elements, and each element in this category has n items (in my application, n depends on the specific element, but for simplicity let us assume it is the same for all of them). My goal is to randomly choose k items from this set. By matrix multiplication, we can compute the probability of choosing each item from the total set of m x n items, but the weights will then be an m x n array. One can, of course, flatten this into a 1 x mn array. I wonder whether there is a quicker way to do this?
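The flattening idea mentioned above can be sketched with the standard library alone. This is only an illustration: the names categories, weights, flat_items and flat_weights, and the data in them, are made up for the example, and the categories are deliberately given different lengths.

```python
import random
from itertools import chain

# Hypothetical data: each category may hold a different number of items.
categories = [["a1", "a2"], ["b1", "b2", "b3"]]
# weights[i][j] is the weight of item j in category i (same nested shape).
weights = [[0.1, 0.2], [0.3, 0.0, 0.4]]

# Flatten both nested lists in the same order so items and weights stay aligned.
flat_items = list(chain.from_iterable(categories))
flat_weights = list(chain.from_iterable(weights))

# random.choices accepts relative weights; they do not have to sum to 1.
picks = random.choices(flat_items, weights=flat_weights, k=5)
print(picks)
```

Because the nested lists are flattened in the same order, position i in flat_weights always belongs to position i in flat_items, even when the categories have different lengths.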

Thank you!

CAM-Gerlach · February 28, 2022, 6:51am

Welcome!

Like almost anything with multidimensional arrays, it’s usually best to do this with NumPy. The choice method of numpy.random.Generator does what you want, as it can both input and output a multidimensional array, with multidimensional probabilities per-element. So assuming you have some m x n array choices:

import numpy as np
choices = np.array([[x_11, ..., x_1n], ..., [x_m1, ..., x_mn]])

With an m x n array of the weights:

weights = np.array([[w_11, ..., w_1n], ..., [w_m1, ..., w_mn]])

Then you could select k samples randomly from any element in the array, output to a 1D array, by flattening both the inputs:

rng = np.random.default_rng()
rng.choice(choices.flatten(), k, weights.flatten())

If the initial dimensional structure has meaning that you want the output to reflect (e.g. choosing specific categories, etc.) there are ways to do that, but you’ll need to be more specific about what you’re looking for there.
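As one possible illustration of keeping the dimensional structure, the sketch below samples positions in the flattened array and maps them back to (row, column) pairs with np.unravel_index. The 2 x 3 data, the seed, and the names p, flat_idx, rows, cols and picked are assumptions made for the example, not part of the original question.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen only for reproducibility

# Hypothetical 2 x 3 example: rows are categories, columns are items.
choices = np.array([["a", "b", "c"], ["d", "e", "f"]])
weights = np.array([[1.0, 2.0, 3.0], [4.0, 0.0, 6.0]])

# Normalize so that the flattened weights sum to 1.
p = (weights / weights.sum()).ravel()

# Sample positions in the flattened array, then map them back to (row, column).
flat_idx = rng.choice(weights.size, size=4, p=p)
rows, cols = np.unravel_index(flat_idx, weights.shape)

# rows[i] says which category sample i came from, cols[i] which item in it.
picked = choices[rows, cols]
print(list(zip(rows, cols, picked)))
```

Sampling indices rather than values keeps the category of every draw available, which the flattened output alone would lose.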

tungprime · March 8, 2022, 5:32am

Thank you for your answer. It looks great!

tungprime · March 8, 2022, 4:48pm

I tried the above approach but somehow it does not work. Here is my example.

import numpy as np


cat_1=['milk', 'yogurt', 'cheese']
cat_2=['rice', 'cereal', 'noodle', 'longrice']
cat_3=['similac', 'huggies', 'wipes', 'nuts', 'pamper']
all_cat=['cat_1', 'cat_2', 'cat_3']
cat_weight=np.array([3, 2, 2])
weight_1=[np.random.randint(1, 10) for item in cat_1]
weight_2=[np.random.randint(1,10) for item in cat_2]
weight_3=[np.random.randint(1,10) for item in cat_3]
weights=np.array([weight_1, weight_2, weight_3])
items=np.array([cat_1, cat_2, cat_3])

k=3
rng = np.random.default_rng()
rng.choice(items.flatten(), k, weights.flatten())

I got the following errors

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Do you know what happened? Thanks!

CAM-Gerlach · March 8, 2022, 9:12pm

Please, always provide the full error and traceback inside a code block. Otherwise, the error could have occurred due to not only any of the lines of code you’ve shown, but also other first or third-party code that isn’t shown, and we are left to blindly guess at where it might be.

Fortunately, since you helpfully provided your full code itself in a code block, I was able to figure out what is going on.

Testing the code, there are three direct issues here, plus one side one.

First, the immediate error is my fault for not actually running the code I provided: the rng.choice() call needs to have its arguments passed by keyword, since p is not the third positional parameter (replace is, so the weights array was being interpreted as the replace flag). I.e. you should have

rng.choice(items.flatten(), size=k, p=weights.flatten())

Second, your weights are probabilities and need to sum to one, so simply divide the weights array by its sum before passing it:

rng.choice(items.flatten(), size=k, p=(weights / weights.sum()).flatten())
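One detail worth checking when normalizing is which sum is being used. The small sketch below, with made-up 2 x 2 weights, shows that the array's own .sum() method and Python's built-in sum() behave differently on a 2D array; the names p and column_totals are only for illustration.

```python
import numpy as np

weights = np.array([[3, 1], [2, 4]])  # hypothetical 2 x 2 integer weights

# ndarray.sum() with no axis adds up every element, whatever the shape.
p = weights / weights.sum()
print(p.sum())  # total probability after normalizing

# The built-in sum() iterates over the first axis, so on a 2D array it
# returns per-column totals instead of a single number.
column_totals = sum(weights)
print(column_totals)
```

For a 1D array both give the same total, so the difference only matters while the weights are still two-dimensional.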

Finally, it looks like your different categories are not the same length, so they aren’t actually a 2D array (or matrix) at all (per the original question). Therefore, NumPy treats them as a 1D array of list objects, which is not what you want. However, since you don’t actually care about the array’s dimensions in the final output, you can solve this by flattening one step earlier, when you put the weights and items into the arrays.

To do this, use the unpacking operator (*) to create flat lists of weights and items instead of nested ones, which is most convenient to do when creating the arrays:

weights=np.array([*weight_1, *weight_2, *weight_3])
items=np.array([*cat_1, *cat_2, *cat_3])

Then, you can omit the flattening calls in rng.choice(), though they don’t hurt:

rng.choice(items, size=k, p=weights / sum(weights))
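Putting the three fixes together, a complete version of the example might look like the sketch below. It is an assumption-laden rewrite rather than the only way to do it: the weights are drawn with rng.integers instead of np.random.randint, np.concatenate is used to join the weight arrays, and the result is stored in a name picked that does not appear in the original code.

```python
import numpy as np

rng = np.random.default_rng()

cat_1 = ['milk', 'yogurt', 'cheese']
cat_2 = ['rice', 'cereal', 'noodle', 'longrice']
cat_3 = ['similac', 'huggies', 'wipes', 'nuts', 'pamper']

# One random integer weight from 1 to 9 per item (the upper bound is exclusive).
weight_1 = rng.integers(1, 10, size=len(cat_1))
weight_2 = rng.integers(1, 10, size=len(cat_2))
weight_3 = rng.integers(1, 10, size=len(cat_3))

# Build flat 1D arrays directly, since the categories have different lengths.
items = np.array([*cat_1, *cat_2, *cat_3])
weights = np.concatenate([weight_1, weight_2, weight_3])

k = 3
# Pass size and p by keyword, and normalize the weights into probabilities.
picked = rng.choice(items, size=k, p=weights / weights.sum())
print(picked)
```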

If you do have an actual matrix or 2D array (or an array of any dimension), which is what the original question was asking, you don’t need to flatten beforehand, and you may or may not want to flatten before outputting, depending on what you want your output to look like.
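For a genuinely rectangular array, the shape of the output can be controlled through the size argument. The sketch below assumes, as the NumPy documentation describes, that p has to be one-dimensional, so the values and probabilities are still raveled before the call; the 3 x 4 data and the names flat_sample and grid_sample are invented for the example.

```python
import numpy as np

rng = np.random.default_rng()

# Hypothetical 3 x 4 array of values with matching uniform weights.
choices = np.arange(12).reshape(3, 4)
weights = np.ones((3, 4))

# Ravel the probabilities so they line up with the raveled values.
p = (weights / weights.sum()).ravel()

# A plain integer size gives a 1D result.
flat_sample = rng.choice(choices.ravel(), size=5, p=p)

# A tuple size gives a result with that shape, here 2 x 3.
grid_sample = rng.choice(choices.ravel(), size=(2, 3), p=p)

print(flat_sample.shape, grid_sample.shape)
```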

Also, FYI, you have an extra all_cat and cat_weight that don’t appear to be used anywhere, at least in the given code.

tungprime · March 10, 2022, 9:20pm

I tested and it worked (sorry for my delayed response). Thank you very much for your clear explanation!

tschirp · January 2, 2023, 11:39am

Hi,

I have a problem related to the original author’s. I found this thread and your answer to it, and I think you could help me.
I’m writing an ant colony optimization algorithm for the Set Covering Problem, where I need to implement a function that creates a 2D matrix with “Rows” as rows, “Ants” as columns and “chosen columns” as values. I have an array of all columns that can be chosen from and a 2D array of weights, with one weight for every row-column combination. Currently I use a for loop over all rows to generate a 1D array for every row with the random.choices function and then set it as the corresponding row of the result matrix. This takes about 2 seconds, which is way too long. You said:

If the initial dimensional structure has meaning that you want the output to reflect (e.g. choosing specific categories, etc.) there are ways to do that, but you’ll need to be more specific about what you’re looking for there.

I think that applies to my problem. I want to create a 2D-Matrix with only one function using a 2D array of weights where in the result matrix every row uses only the equivalent row of the weights matrix. Is that in any way possible?

Here is the relevant part of my code. (The weights matrix is not initialized here because that would require many more steps, as it is the result of several follow-up steps after a subgradient optimization. It is a sparse matrix which is turned into an array with .toarray().)

import random  # needed for random.choices below
import numpy as np
number_of_columns = 630009
number_of_rows = 507
number_of_ants = 100

all_columns = np.arange(number_of_columns)
chosen_columns = np.zeros((number_of_rows, number_of_ants))
for row in range(number_of_rows):
    chosen_columns[row] = random.choices(all_columns, weights=weights[row], k=number_of_ants)

Thanks in advance!
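One possible direction for the per-row case, offered only as a sketch and not as something from this thread: build a cumulative distribution for every row once with NumPy, draw all uniform random numbers in one call, and use a binary search per row. The helper name sample_columns_per_row and the small demo_weights matrix are hypothetical, and the sketch assumes a dense 2D array of non-negative weights in which every row has at least one positive entry.

```python
import numpy as np


def sample_columns_per_row(weights, n_samples, rng):
    """Draw n_samples column indices for each row, using that row's weights.

    Hypothetical helper: assumes `weights` is a dense 2D array of non-negative
    numbers and that every row has at least one positive entry.
    """
    n_rows = weights.shape[0]

    # Running totals along each row, scaled so the last entry of a row is 1.
    cdf = np.cumsum(weights, axis=1, dtype=float)
    cdf /= cdf[:, -1:]

    # One uniform draw in [0, 1) per (row, sample) pair.
    u = rng.random((n_rows, n_samples))

    chosen = np.empty((n_rows, n_samples), dtype=np.intp)
    for row in range(n_rows):
        # Binary search: first column whose running total exceeds the draw.
        chosen[row] = np.searchsorted(cdf[row], u[row], side="right")
    return chosen


rng = np.random.default_rng()
# Tiny demo in which each row has exactly one column with positive weight.
demo_weights = np.array([[0.0, 1.0, 0.0],
                         [2.0, 0.0, 0.0],
                         [0.0, 0.0, 5.0]])
demo = sample_columns_per_row(demo_weights, n_samples=4, rng=rng)
print(demo)
```

The loop over rows is still there, but each iteration is a single vectorized binary search. Whether this is fast enough for a 507 x 630009 matrix would need to be measured, and the cumulative array needs as much memory as a float copy of the dense weights.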