How to set the "label gain" parameter? No code examples found!

sofiavlachou · June 17, 2022, 9:30am

Hello to everyone,

I think many of you have the same problem as me with the label gain parameter!

I searched extensively for months but could not find useful documentation or Python code examples showing how to set the label gain in my code. The official LightGBM Ranker API does not provide this information, and there is no code example either.

After working through a series of other errors, I keep hitting the same one: lightgbm.basic.LightGBMError: Label 47 is not less than the number of label mappings (31). Some people have suggested setting the label gain parameter, but without any explanation.

My ranking problem: I want to find the most popular products based on likes, comments, or frequency, for instance the five or ten most popular. The products should always be ranked by these criteria in descending order, for example starting with the one with the most likes and ending with the one with the fewest.

For this reason, I am sharing my updated code with all of you. Can somebody help me? I would appreciate your help!

Here is my code:

# Dependencies
import pandas as pd
from pandas import set_option
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import RepeatedStratifiedKFold
from pandas import read_csv
import numpy as np
from numpy import unique
from sklearn import metrics
# LGBMRanker
import lightgbm as lgb
from lightgbm import LGBMRanker
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')




# Load data
names = ["label","id","comments","likes","product 1 frequency","product 2 frequency"]
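# error_bad_lines was removed in pandas 2.0; malformed rows still raise an error by default.
# sep="," already sets the delimiter, so delimiter=None is dropped as redundant.
dataset = pd.read_csv("Product Ranking.csv", names=names, encoding="utf-8",
                      skip_blank_lines=True, sep=",", doublequote=True, keep_default_na=True,
                      nrows=1223, header=6, engine="python")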

# Shape
print(dataset.shape)


# Max labels
max_label = dataset.label.nunique()
print(max_label)



# Core Model
gbm = lgb.LGBMRanker(objective="lambdarank")



# Split the data in train and test
array = dataset.values
X = array [:,0:4]
y = array [:,5]
X = X.astype('int64')
y = LabelEncoder().fit_transform(y.astype('str'))

X_train, X_test, y_train, y_test = train_test_split(X,y,
                                                    train_size=0.75, test_size=0.25,random_state=42)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)




# define search
model = lgb.LGBMRanker()

# perform the search
model.fit(X, y,
		  group=[10, 24, 34, 14, 20, 22, 20, 19, 10, 23, 22,
				 10, 27, 14, 22, 11, 14, 10, 10, 26, 10, 10,
				 13, 10, 10, 21, 17, 12, 21, 10, 10, 10, 27,
				 12, 21, 10, 10, 11, 13, 10, 15, 17, 10, 9,
				 10, 10, 10, 10, 10, 10, 14, 23, 10, 10, 13,
				 10, 10, 10, 18, 16, 16, 11, 9, 10, 10, 10,
				 10, 11, 12, 10, 10, 18, 10, 16, 12, 10, 9,
				 14, 11, 10, 11, 11, 10, 10, 12, 9, 12, 11, 8, 13])


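# Predictions
# Use the fitted estimator (`model`); `gbm` above is never fitted.
test_pred = model.predict(X_test)
# X_test is a NumPy array, so wrap it in a DataFrame before adding a column
# (the first four CSV columns were used as features).
ranked = pd.DataFrame(X_test, columns=names[0:4])
ranked["predicted_ranking"] = test_pred
ranked = ranked.sort_values("predicted_ranking", ascending=False)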

Sample of my data:
1,1,6,378,10,86
3,1,2036,65206,10,86
3,1,2036,65206,10,86
1,1,5,237,10,86
2,1,16,799,10,86
3,1,77,7073,10,86
2,1,18,1973,10,86
3,1,87,7686,10,86
0,1,7,73,10,86
0,1,2,32,10,86
1,2,11,1485,24,29
2,2,123,20831,24,29
1,2,1,318,24,29
1,2,5,455,24,29

Thank you in advance!
Sofia

vbrozik · June 21, 2022, 9:53am

Hello,

you mention label gain multiple times without explaining what it is. Maybe add some links to documentation or articles where it is explained, or at least mentioned.
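For readers who arrive with the same question: as far as the LightGBM parameter documentation goes, label_gain is a list with one gain value per integer relevance label, used by the lambdarank objective, and its default has 31 entries (0, 1, 3, 7, ..., 2^30 - 1). That would explain the "(31)" in the error message. Below is a minimal sketch of how a longer table could be built and passed. The helper names are hypothetical, the linear gain is an arbitrary illustration rather than a recommendation, and passing label_gain as a keyword to LGBMRanker relies on the wrapper forwarding extra parameters to LightGBM.

```python
def default_label_gain(n_levels=31):
    """Gains 2**i - 1 for i in 0..n_levels-1, mirroring the documented default."""
    return [2 ** i - 1 for i in range(n_levels)]


def linear_label_gain(max_label):
    """One gain per integer label 0..max_label (hypothetical linear choice)."""
    return list(range(max_label + 1))


def make_ranker(max_label):
    """Build a ranker whose gain table covers labels 0..max_label."""
    # Imported lazily so the helpers above work without LightGBM installed.
    from lightgbm import LGBMRanker
    return LGBMRanker(objective="lambdarank",
                      label_gain=linear_label_gain(max_label))


print(default_label_gain()[:5])     # [0, 1, 3, 7, 15]
print(len(linear_label_gain(47)))   # 48 entries, enough for label 47
```

With this sketch, something like make_ranker(int(y.max())) would give a gain table that is long enough for every label in y. Whether a linear gain suits the data is a separate modelling decision.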

We also do not know which part of your code the error message relates to.
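One plausible reading, offered as an assumption since the traceback is not shown: the message is raised when fit is called, because some value in y is 31 or larger. In the posted code y comes from column index 5 (not the "label" column) and goes through LabelEncoder, so it has as many distinct values as that column does. The sketch below uses a plain-Python stand-in for LabelEncoder and made-up frequency values to show how such labels could be spotted before fitting. Both helper names are hypothetical.

```python
def encode_like_label_encoder(values):
    """Map each distinct value (as a string) to 0..k-1 in sorted order."""
    classes = sorted({str(v) for v in values})
    index = {c: i for i, c in enumerate(classes)}
    return [index[str(v)] for v in values]


def labels_out_of_range(labels, n_gains=31):
    """Distinct labels that have no entry in a gain table of length n_gains."""
    return sorted({v for v in labels if not 0 <= v < n_gains})


# Column 0 ("label") from the sample rows in the question.
label_column = [1, 3, 3, 1, 2, 3, 2, 3, 0, 0, 1, 2, 1, 1]
# 50 distinct made-up values standing in for a frequency column.
hypothetical_freq = list(range(10, 60))

print(labels_out_of_range(encode_like_label_encoder(label_column)))            # []
print(len(labels_out_of_range(encode_like_label_encoder(hypothetical_freq))))  # 19
```

If the "label" column (values 0 to 3 in the sample) is the intended target, using it as y would keep every label inside the default table, and label_gain would not need to change.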

A bigger problem is probably that your question is about a specific framework, LightGBM. Maybe there are not many people in this forum familiar with LightGBM?
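Separately from the label gain question, the hand-typed group list in the posted code is worth checking. LightGBM expects the group sizes to sum to the number of rows passed to fit, with each group's rows stored contiguously. A minimal sketch, assuming the "id" column identifies the query group and the rows are already ordered by it (group_sizes is a hypothetical helper):

```python
from itertools import groupby


def group_sizes(ids):
    """Sizes of consecutive runs of equal ids, in row order."""
    return [sum(1 for _ in run) for _, run in groupby(ids)]


# The "id" column of the 14 sample rows in the question: ten 1s, four 2s.
sample_ids = [1] * 10 + [2] * 4
sizes = group_sizes(sample_ids)

print(sizes)                          # [10, 4]
print(sum(sizes) == len(sample_ids))  # True
```

On the full data this could be called as group_sizes(dataset["id"].tolist()). Note that a random row-wise split such as train_test_split does not keep groups together, so the sizes would need to be recomputed for whatever subset is actually passed to fit.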