I have a problem with the TensorFlow example for boosted decision trees. It uses the Titanic dataset, where the goal is to predict whether a passenger survived given characteristics such as gender, age, and class.
In this example, the whole dataset is loaded first. Then the `survived` column, which indicates whether a passenger survived, is removed, and only the remaining columns are used as the training set.
import numpy as np
import pandas as pd
from IPython.display import clear_output
from matplotlib import pyplot as plt
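# Load the training and evaluation splits of the Titanic dataset
dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')
dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')

# Remove the 'survived' column from each DataFrame and keep it as the label Series
y_train = dftrain.pop('survived')
y_eval = dfeval.pop('survived')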
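To see what `pop` does in those lines, here is a minimal sketch on a tiny hypothetical DataFrame (made-up rows standing in for the Titanic CSV, not the real data). It shows that `DataFrame.pop` removes the column from the frame in place and returns it as a separate Series, so the column is moved out of the features rather than discarded.

```python
import pandas as pd

# Hypothetical toy rows standing in for the Titanic CSV (not the real data).
df = pd.DataFrame({
    "sex": ["male", "female", "female"],
    "age": [22.0, 38.0, 26.0],
    "survived": [0, 1, 1],
})

# pop() removes the column from df in place and returns it as a Series.
labels = df.pop("survived")

print(list(df.columns))   # ['sex', 'age']
print(labels.tolist())    # [0, 1, 1]
```

After the call, `df` holds only the feature columns and `labels` holds the target values, and both share the same row index, so each label still lines up with its passenger.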
I can’t think of a reason why it’s necessary to remove the column I want to predict from the training data. Can somebody please help me out here?