Predicting Training Data

import numpy as np
import pandas as pd

dataFile='C:/Users/user/Documents/Python/Data/ad-dataset/ad.data'
data=pd.read_csv(dataFile,sep=",",header=None,low_memory=False)

Check whether a given value is missing; if it is, change it to NaN

def toNum(cell):
    try:
        # np.float was removed in NumPy 1.24; the builtin float behaves the same here
        return float(cell)
    except (TypeError, ValueError):
        return np.nan

Apply the missing-value check to a column (a pandas Series)

def seriestoNum(series):
    return series.apply(toNum)

train_data=data.iloc[0:,0:-1].apply(seriestoNum)
train_data.head(20)

Out:

       0      1       2    3    4    5    6    7    8    9  ...  1548  1549  1550  1551  1552  1553  1554  1555  1556  1557
0  125.0  125.0  1.0000  1.0  0.0  0.0  0.0  0.0  0.0  0.0  ...   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
1   57.0  468.0  8.2105  1.0  0.0  0.0  0.0  0.0  0.0  0.0  ...   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
2   33.0  230.0  6.9696  1.0  0.0  0.0  0.0  0.0  0.0  0.0  ...   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
3   60.0  468.0  7.8000  1.0  0.0  0.0  0.0  0.0  0.0  0.0  ...   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
4   60.0  468.0  7.8000  1.0  0.0  0.0  0.0  0.0  0.0  0.0  ...   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
5   60.0  468.0  7.8000  1.0  0.0  0.0  0.0  0.0  0.0  0.0  ...   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
6   59.0  460.0  7.7966  1.0  0.0  0.0  0.0  0.0  0.0  0.0  ...   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
7   60.0  234.0  3.9000  1.0  0.0  0.0  0.0  0.0  0.0  0.0  ...   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
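For anyone reading along: the cell-by-cell conversion above can probably also be done with pandas' built-in pd.to_numeric and errors="coerce", which turns unparseable entries into NaN. The sketch below runs on a tiny made-up frame, not on the ad dataset; the "?" marker, the column values, and the names raw, numeric and complete are assumptions for illustration only. Note that LinearSVC does not accept NaN, so rows that still contain NaN have to be dropped or imputed before fitting.

```python
import pandas as pd

# Hypothetical miniature of the raw file: numbers arrive as strings,
# and "?" (possibly padded with spaces) stands for a missing value.
raw = pd.DataFrame({
    0: ["125", "57", "   ?"],
    1: ["125", "468", "230"],
    2: ["1.0", "8.2105", "?"],
})

# Convert column by column; anything that cannot be parsed becomes NaN.
numeric = raw.apply(pd.to_numeric, errors="coerce")

# Keep only the rows without NaN. The original row labels are preserved,
# which is why the index of the result can have gaps.
complete = numeric.dropna()

print(numeric)
print(complete)
```

Because dropna keeps the original row labels, labels taken from the same rows have to be selected through that index, which is what data.iloc[train_data.index,-1] in the post relies on.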

def toLabel(label):
    # "ad." marks an advertisement; every other value is treated as a non-ad
    if label == "ad.":
        return 1
    else:
        return 0

train_labels=data.iloc[train_data.index,-1].apply(toLabel)
train_labels

Out:

0 1
1 1
2 1
3 1
4 1

3273 0
3274 0
3275 0
3276 0
3278 0
Name: 1558, Length: 2359, dtype: int64

Training Phase

from sklearn.svm import LinearSVC

clf = LinearSVC(max_iter=10000,dual=False)
clf.fit(train_data[100:2300],train_labels[100:2300])

Out:
LinearSVC(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, loss='squared_hinge', max_iter=10000,
          multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
          verbose=0)

Test Phase

clf.predict(train_data.iloc[12].reshape(1,-1))

Out:


AttributeError                            Traceback (most recent call last)
in
      1 # Test Phase
----> 2 clf.predict(train_data.iloc[12].reshape(1,-1))

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5272             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5273                 return self[name]
-> 5274             return object.__getattribute__(self, name)
   5275
   5276     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'reshape'

Hi,

reshape is a method of NumPy arrays; a pandas Series does not have it. Unfortunately, when you call clf.predict(train_data.iloc[12]), the error message from sklearn tells you to try reshape because predict expects 2D data, but that advice does not work directly on a Series.

What you need to do instead is convert the Series to a NumPy array first, and then call reshape on it. pandas provides the to_numpy method for this, so a working snippet would be:

clf.predict(train_data.iloc[12].to_numpy().reshape(1, -1))
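To make the shapes concrete, here is a minimal self-contained sketch of the same idea. It uses a small made-up frame in place of train_data, so the names features, labels, as_array and as_frame are hypothetical. It also shows a second option that should work equally well: indexing with a list, iloc[[12]], which makes pandas return a one-row DataFrame that is already 2D.

```python
import numpy as np
import pandas as pd
from sklearn.svm import LinearSVC

# Hypothetical stand-in for train_data / train_labels: 20 rows, 3 features.
values = np.arange(60, dtype=float).reshape(20, 3)
features = pd.DataFrame(values)
labels = pd.Series((values[:, 0] >= 30).astype(int))

clf = LinearSVC(max_iter=10000, dual=False)
clf.fit(features, labels)

row = features.iloc[12]          # a Series: 1-D, shape (3,), no reshape method

# Option 1 (the snippet above): Series -> NumPy array -> one row, three columns
as_array = row.to_numpy().reshape(1, -1)

# Option 2: select with a list so the result stays a one-row DataFrame
as_frame = features.iloc[[12]]

pred_array = clf.predict(as_array)
pred_frame = clf.predict(as_frame)
print(as_array.shape, as_frame.shape, pred_array, pred_frame)
```

Both inputs have shape (1, 3), one sample with three features, which is the layout predict expects.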

Thanks Irjball, that worked.