Google Colab Code debugging

I am new to Python. I am using Google Colab to practice some ML methods, but I have been stuck on an error message for a few days. I don’t know how to debug it, even though there seem to be answers to similar questions on the website. Please help!

Below is my code:

train, X_train, y_train = scale_dataset(train, oversample=True)
valid, X_valid, y_valid = scale_dataset(valid, oversample=False)

Below is the error message:


ValueError Traceback (most recent call last)
in <cell line: 1>()
----> 1 train, X_train, y_train = scale_dataset(train, oversample=True)
2 valid, X_valid, y_valid = scale_dataset(valid, oversample=False)

4 frames
/usr/local/lib/python3.10/dist-packages/pandas/core/generic.py in _where(self, cond, other, inplace, axis, level)
9652 for _dt in cond.dtypes:
9653 if not is_bool_dtype(_dt):
→ 9654 raise ValueError(msg.format(dtype=_dt))
9655 else:
9656 # GH#21947 we have an empty DataFrame/Series, could be object-dtype

ValueError: Boolean array expected for the condition, not object

It isn’t clear what you mean here, so it’s hard to know what the next step should be.

What website is “the website”? This one? The Google Colab one? Some site for a tutorial you’re following? Something else?

What similar questions did you find, and what do their answers say? Did you try doing anything after reading those answers? What happened when you tried following such advice (if any), and how was it unhelpful?

I found the following explanation on the internet:

“This error most commonly occurs when you try to select columns using a dataframe.”
So I used the following code to solve the problem.

X_train = train.iloc[:,:400] # Grabs all rows and first 400 columns
y_train = train.iloc[:,-1:] # Grabs all rows and last 1 columns
y_train=np.ravel(y,order="c")

X_test = test.iloc[:,:400] # Grabs all rows and first 400 columns
y_test = test.iloc[:,-1:] # Grabs all rows and last 1 columns
y_test=np.ravel(y,order="c")

knn_model = KNeighborsClassifier(n_neighbors=3)
knn_model.fit(X_train, y_train)

But a later step gave me the following error message, and I don’t know how to debug it. Can you please help?


ValueError Traceback (most recent call last)
in <cell line: 2>()
1 knn_model = KNeighborsClassifier(n_neighbors=3)
----> 2 knn_model.fit(X_train, y_train)

4 frames
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
395 uniques = np.unique(lengths)
396 if len(uniques) > 1:
→ 397 raise ValueError(
398 “Found input variables with inconsistent numbers of samples: %r”
399 % [int(l) for l in lengths]

ValueError: Found input variables with inconsistent numbers of samples: [2048, 1536]

Hi,

Just an FYI: when posting code for others to read, it helps greatly to enter it as preformatted text so that it appears in a code block.

Regarding these two lines:

X_train = train.iloc[:,:400] # Grabs all rows and first 400 columns
y_train = train.iloc[:,-1:] # Grabs all rows and last 1 columns

For indexing, should it instead be:

X_train = train.iloc[:, 0:400] # Grabs all rows and first 400 columns
y_train = train.iloc[:, -1]    # Grabs all rows and last 1 columns

This is wrong, and wouldn’t address the apparent symptoms. :400 means the same as 0:400 in this context: a slice of indices from 0 inclusive up to 400 exclusive. -1: is fine; it slices rather than indexing, so as to produce a 2-dimensional result.
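To make the slicing point concrete, here is a small sketch on a made-up frame. The `train` data and its column names are hypothetical stand-ins, not the data from the question, and it uses 4 feature columns instead of 400 to keep it short.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `train`: 5 rows, 4 feature columns plus a label column.
train = pd.DataFrame(
    np.arange(25).reshape(5, 5),
    columns=["f0", "f1", "f2", "f3", "label"],
)

# `:4` and `0:4` select exactly the same columns.
same_slice = train.iloc[:, :4].equals(train.iloc[:, 0:4])

# Slicing with `-1:` keeps two dimensions (a one-column DataFrame);
# indexing with `-1` gives a one-dimensional Series.
y_2d = train.iloc[:, -1:]
y_1d = train.iloc[:, -1]

print(same_slice)   # True
print(y_2d.shape)   # (5, 1)
print(y_1d.shape)   # (5,)
```

Both forms keep every row, so neither one can change the number of samples.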

However, it seems that the goal is to convert the results to a 1-dimensional result in the next line, using the np.ravel call. This is where the problem occurs:

y_train=np.ravel(y,order="c")

Notice that the code shown here does not define any y variable, only y_train. So the result from the first line is irrelevant. np.ravel will use some other y defined earlier in the code (not shown to us here), and that result will replace the y_train slice.

The “input variables” have “inconsistent numbers of samples” because the y data has a different size, so it can’t be lined up with X_train.
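The mismatch can be reproduced in a few lines. This is only a sketch under assumptions: the leftover `y` from an earlier cell is guessed to have 1536 entries and `train` to have 2048 rows, with both numbers taken from the error message, and the data itself is made up.

```python
import numpy as np
import pandas as pd

# Assumed leftover variable from an earlier cell (not shown in the question).
y = np.zeros(1536)

# Hypothetical stand-in for `train`: 2048 rows, 400 features plus a label column.
train = pd.DataFrame(np.zeros((2048, 401)))

X_train = train.iloc[:, :400]
y_train = train.iloc[:, -1:]       # 2048 rows at this point
y_train = np.ravel(y, order="C")   # overwritten using the unrelated `y`

lengths = [len(X_train), len(y_train)]
print(lengths)  # [2048, 1536]
```

Those two lengths are the same pair that scikit-learn reports when `fit` checks its inputs.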

We could fix this by using y_train = train.iloc[:,-1] as suggested and also removing this use of np.ravel. It’s trying to fix a problem from the previous line, but it’s using the wrong data to do so.

Or we can fix it (but this is messier) by using the right input (y_train) in the np.ravel call.

The same problem occurs with y_test.
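Putting the first fix together for both the training and test sets might look like the sketch below. The `make_frame` helper, the random data, and the column names are hypothetical and exist only so the example runs on its own; the only assumption carried over from the question is that the first 400 columns are features and the last column is the label.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def make_frame(n_rows):
    """Hypothetical helper: 400 random feature columns plus a 0/1 label column."""
    frame = pd.DataFrame(
        rng.normal(size=(n_rows, 400)),
        columns=[f"f{i}" for i in range(400)],
    )
    frame["label"] = rng.integers(0, 2, size=n_rows)
    return frame

train = make_frame(200)
test = make_frame(50)

# Index with -1 (not -1:) so the labels are already 1-dimensional;
# no np.ravel call is needed afterwards.
X_train = train.iloc[:, :400]
y_train = train.iloc[:, -1]
X_test = test.iloc[:, :400]
y_test = test.iloc[:, -1]

knn_model = KNeighborsClassifier(n_neighbors=3)
knn_model.fit(X_train, y_train)

predictions = knn_model.predict(X_test)
print(predictions.shape)  # (50,)
```

The messier alternative would keep the `-1:` slices and call `np.ravel(y_train)` and `np.ravel(y_test)`, so that each call flattens the slice taken on the line before it.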

Thanks very much for letting me know how to insert the code in a more readable way. This is my first time using this platform.