ValueError: setting an array element with a sequence

I’m trying to train a Random Forest classifier to predict movie success based on various features.

I’m using the tmdb_5000_movies.csv data set. My code is below:
df_movies = pd.read_csv('tmdb_5000_movies.csv')
df_credits = pd.read_csv('tmdb_5000_credits.csv')

df_movies.rename(columns={'id': 'movie_id'}, inplace=True)  # Rename the 'id' column to 'movie_id'

# Merge the DataFrames on 'movie_id' with specified suffixes
df_merged = pd.merge(df_movies, df_credits, on='movie_id', suffixes=('_movie', '_credit'))

def extract_genres(json_str):
    try:
        genres = json.loads(json_str.replace("'", '"'))  # Replace single quotes with double quotes for valid JSON
        genre_names = [genre['name'] for genre in genres]
        return genre_names
    except (json.JSONDecodeError, TypeError):
        return []  # Fall back to an empty list when the value cannot be parsed

# Apply the function to the 'genres' column
df_merged['genres'] = df_merged['genres'].apply(extract_genres)

# Handle keywords
def extract_keywords(text):
    L = []
    for i in ast.literal_eval(text):
        L.append(i['name'])  # Keep only the keyword name
    return L

# Apply the function to the 'keywords' column
df_merged['keywords'] = df_merged['keywords'].apply(extract_keywords)

# Handle cast
# Function to convert the string to a list of names, keeping the top 4 cast members
def convert_cast(text):
    L = []
    counter = 0
    for i in ast.literal_eval(text):
        if counter < 4:
            L.append(i['name'])
            counter += 1
    return L

df_merged['cast'] = df_merged['cast'].apply(convert_cast)

# Handle crew
# Function to extract the director name
def get_director(text):
    L = []
    for i in ast.literal_eval(text):
        if i['job'] == 'Director':
            L.append(i['name'])
    return L

df_merged['crew'] = df_merged['crew'].apply(get_director)

# Converting overview to list

# Remove spaces from strings
def remove_spaces(text):
    if isinstance(text, list):
        return [t.replace(" ", "") for t in text]
    elif isinstance(text, str):
        return text.replace(" ", "")
    return text

# Apply the function to remove spaces
df_merged['overview'] = df_merged['overview'].apply(remove_spaces)
df_merged['genres'] = df_merged['genres'].apply(remove_spaces)
df_merged['keywords'] = df_merged['keywords'].apply(remove_spaces)
df_merged['cast'] = df_merged['cast'].apply(remove_spaces)
df_merged['crew'] = df_merged['crew'].apply(remove_spaces)

# Drop the 'homepage' column
df_merged.drop(columns=['homepage'], inplace=True)

# Preprocess the data
categorical_features = ['genres', 'original_language', 'production_countries', 'spoken_languages', 'status']
label_encoder = LabelEncoder()

for feature in categorical_features:
    df_merged[feature] = label_encoder.fit_transform(df_merged[feature].astype(str))

# Prepare the features and target variable
X = df_merged.drop(['title_movie', 'vote_average', 'vote_count'], axis=1)
y = df_merged['vote_average'].apply(lambda x: 1 if x >= 6 else 0)  # Assuming a vote_average >= 6 is considered a success

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Random Forest Classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)

I’m getting this error: ValueError: setting an array element with a sequence.

The problem as you have posed it does not have enough information for us to resolve.

It’s not just that most of the code is hard to read because it isn’t formatted and because you have clearly omitted a bunch of it; you have also given us dozens of lines of code without any hint as to which of them causes the error.

Also, the top result of a Google search is this: python - ValueError: setting an array element with a sequence - Stack Overflow

It would be very helpful to explain why this didn’t work for you, particularly since I suspect that the undefined variable you have, pd, is in fact pandas.


When posting your scripts on this forum, please follow these guidelines so that your code appears as it does in your Python editor, which makes it easier to read and interpret.

Can you elaborate on the source of the reported issue? The editor will show the line number where the exception is raised. Please describe the source of the error.

Here is an example of what generates the stated exception. When you define a two-dimensional array, every row must have the same number of columns, so that the array has X x Y dimensions. For example, with X = 3 and Y = 5, you will have a 3 x 5 array. If you don’t have a value for a given cell, you have to assign it a value of 0 and NOT leave it blank.

Here is a visual that may help in understanding this concept a little bit better:


In figure (a), the first row has three columns and the second row has only two. The second row needs three columns as well, so the cell without a value has to be set to 0; it cannot be left blank. Otherwise, a ValueError: setting an array element with a sequence exception will be generated. Refer to figure (b) for the correct way of defining an array.

Therefore, somewhere in your script, an array or array-like object is being defined with rows of unequal length, as in figure (a).
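To make the idea concrete, here is a minimal sketch in plain Python (no NumPy needed) that reports which rows of a nested list have the wrong length and pads them with 0. The helper names `find_ragged_rows` and `pad_rows` are made up for this illustration, and 0 is only the placeholder suggested above; whether it is a sensible fill value depends on your data.

```python
def find_ragged_rows(rows):
    """Return (index, length) for every row shorter or longer than the widest row."""
    width = max((len(row) for row in rows), default=0)
    return [(i, len(row)) for i, row in enumerate(rows) if len(row) != width]


def pad_rows(rows, fill=0):
    """Return new rows, each padded on the right with `fill` to the same width."""
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [fill] * (width - len(row)) for row in rows]


# Shape like figure (a): the second row is one column short.
ragged = [[1, 2, 3], [4, 5]]

print(find_ragged_rows(ragged))  # [(1, 2)] -> row 1 has only 2 columns
print(pad_rows(ragged))          # [[1, 2, 3], [4, 5, 0]] -> shape like figure (b)
```

Passing the ragged version to NumPy with a numeric dtype, for example `np.array(ragged, dtype=float)`, is the kind of call that raises this ValueError, while the padded version converts cleanly.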

Okay, and where in the code does this appear to occur? (Do you understand what a traceback is, and why it is useful?)
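In case it helps, here is a small sketch of what a traceback gives you. The functions `build_row` and `capture_traceback` and the input data are invented for this illustration and have nothing to do with your script; the point is only that the traceback text names the function and the line where the exception was raised.

```python
import traceback


def build_row(values):
    # Fails on purpose: float() cannot convert a list into a single number.
    return [float(v) for v in values]


def capture_traceback():
    """Run the failing call and return the traceback as text instead of crashing."""
    try:
        build_row([1.0, [2.0, 3.0]])
    except TypeError:
        return traceback.format_exc()
    return ""


tb_text = capture_traceback()
print(tb_text)  # the final lines show the failing line and the exception type
```

When you run your own script without a try/except, Python prints this same kind of text by itself. Posting it in full shows us which line to look at.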