Hello! I want to apply some classification algorithms to a data set with a categorical variable that has two possible values: 0 and 1. I built a countplot of this variable and got this result:
What should I do in this situation?
I am using a public data set from Kaggle for a college project, so the data is not critical and I can remove the rows that contain wrong values.
In this plot it’s clear that some rows have the value 5, but I don’t understand the other labels, for example “46”, “618” or “269375”.
Are these specific values that I have to handle, or are they ranges?
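A seaborn countplot draws one bar per distinct value and does not bin values into ranges. So one way to check what those labels are is to list the distinct values directly and then keep only the rows labelled 0 or 1. The sketch below makes several assumptions: the data is loaded with pandas, the column name `target` is hypothetical, and the small DataFrame is made-up toy data standing in for the Kaggle file.

```python
import pandas as pd

# Made-up toy data; "target" is a hypothetical column name for the 0/1 variable.
df = pd.DataFrame({"target": ["0", "1", "1", "5", "46", "0", "618", "269375", "1"]})

# One line per distinct value, matching one bar in the countplot.
print(df["target"].value_counts())

# Convert to numbers; anything that cannot be parsed becomes NaN instead of raising.
df["target"] = pd.to_numeric(df["target"], errors="coerce")

# Keep only rows whose label is a valid class (0 or 1); NaN rows are dropped too.
clean = df[df["target"].isin([0, 1])].copy()
clean["target"] = clean["target"].astype(int)

print(clean["target"].value_counts())
```

If `value_counts()` shows “46” or “269375” as separate entries, they are individual values stored in that column, not ranges.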