Data formatting and plot AR model

Hello, while running the code below, I got the following error: "TypeError: float() argument must be a string or a real number, not 'Period'"

I tried to change the index with pd.DatetimeIndex or to_timestamp, but without success.

I would appreciate any ideas.

Thanks in advance!


import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

import datetime as dt

import pandas_datareader as pdr

import statsmodels.api as sm

from matplotlib.pyplot import plot, scatter, show, xlabel, ylabel

import yfinance as yf

from statsmodels.tsa.ar_model import AutoReg

from datetime import datetime

from datetime import timedelta

start = ‘2021-01-01’

end = “2024-03-22”

stock1 = yf.Ticker(‘AAPL’)

stock1_data = stock1.history(interval=‘1d’, start= start, end= end)

stock2 = yf.Ticker(‘AMZN’)

stock2_data = stock2.history(interval=‘1d’, start= start, end= end)

X = np.log(stock1_data[“Close”])

Y = np.log(stock2_data[“Close”])

X = sm.add_constant(X)

model = sm.OLS(Y,X)

results = model.fit()

alpha = results.params.values[0]

beta = results.params.values[1]

errors = Y - (alpha + X[“Close”]*beta)

errors.index = pd.DatetimeIndex(errors.index).to_period(‘D’)

train_end = datetime(2023,4,1)

test_end = datetime(2024,4,1)

train_data = errors[:train_end]

test_data = errors[train_end+timedelta(days=1):test_end]

model2 = AutoReg(train_data, lags=3)

model2_fit = model2.fit()

pred_start_date = test_data.index[0]

pred_end_date = test_data.index[-1]

predictions = model2_fit.predict(start=pred_start_date, end=pred_end_date)

plt.plot(predictions)

Please read the pinned thread and format the code properly so that we can read it properly, then show a complete error message so that we can properly understand the context of the error. Copy and paste, starting from the line that says Traceback (most recent call last): until the end, and format it the same way as the code.

Hello @kknechtel

Sure, noted. Please find the code below:

# 
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
import pandas_datareader as pdr
import statsmodels.api as sm
from matplotlib.pyplot import plot, scatter, show, xlabel, ylabel
import yfinance as yf
from statsmodels.tsa.ar_model import AutoReg
from datetime import datetime
from datetime import timedelta

#download data
start = '2021-01-01'
end = "2024-03-22"

stock1 = yf.Ticker('AAPL')   
stock1_data = stock1.history(interval='1d', start= start, end= end)

stock2 = yf.Ticker('AMZN')   
stock2_data = stock2.history(interval='1d', start= start, end= end)

#linear regression model
X = np.log(stock1_data["Close"])
Y = np.log(stock2_data["Close"])
X = sm.add_constant(X)
model = sm.OLS(Y,X)
results = model.fit()

#model parameters and spread
alpha = results.params.values[0]
beta = results.params.values[1]
errors = Y - (alpha + X["Close"]*beta)

#index set up
errors.index = pd.DatetimeIndex(errors.index).to_period('D')

#training and testing
train_end = datetime(2023,4,1)
test_end = datetime(2024,4,1)
train_data = errors[:train_end]
test_data = errors[train_end+timedelta(days=1):test_end]

#AR model fitting
model2 = AutoReg(train_data, lags=3)
model2_fit = model2.fit()

#last year prediction
pred_start_date = test_data.index[0]
pred_end_date = test_data.index[-1]

predictions = model2_fit.predict(start=pred_start_date, end=pred_end_date)

#plot results
plt.plot(predictions)

Error message:

#
plt.plot(predictions)
Traceback (most recent call last):

  Cell In[36], line 1
    plt.plot(predictions)

  File ~/anaconda3/lib/python3.11/site-packages/matplotlib/pyplot.py:2812 in plot
    return gca().plot(

  File ~/anaconda3/lib/python3.11/site-packages/matplotlib/axes/_axes.py:1690 in plot
    self.add_line(line)

  File ~/anaconda3/lib/python3.11/site-packages/matplotlib/axes/_base.py:2304 in add_line
    self._update_line_limits(line)

  File ~/anaconda3/lib/python3.11/site-packages/matplotlib/axes/_base.py:2327 in _update_line_limits
    path = line.get_path()

  File ~/anaconda3/lib/python3.11/site-packages/matplotlib/lines.py:1028 in get_path
    self.recache()

  File ~/anaconda3/lib/python3.11/site-packages/matplotlib/lines.py:659 in recache
    x = _to_unmasked_float_array(xconv).ravel()

  File ~/anaconda3/lib/python3.11/site-packages/matplotlib/cbook/__init__.py:1340 in _to_unmasked_float_array
    return np.asarray(x, float)

TypeError: float() argument must be a string or a real number, not 'Period'

Important
Figures are displayed in the Plots pane by default. To make them also appear inline in the console, you need to uncheck "Mute inline plotting" under the options menu of Plots.

Thank you in advance

Did you try to check the predictions results before doing the plot? Do you understand what they should look like? Is there something unexpected about them? The error tells us clearly that they do not have a suitable format for plotting.

Yes, I did. The predictions are negative, although we expect positive numbers (like the errors in the training data). However, I ran the same exercise with other stocks that gave positive predictions and still couldn't plot them. That's why I focused on the indexing issue and on how to properly remove the timezone information from the errors data.

I mean the types, not just the values. How is the data structured? Do you expect to see some instances of the Period class? (Are you familiar with this class from the documentation? What should it mean?)
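To make the question about types concrete, here is a minimal sketch of how the structure could be inspected. The series is synthetic (made-up dates and values, not the actual residuals); only the to_period step mirrors the posted code.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the residual series (values are made up)
idx = pd.date_range("2024-01-01", periods=5, freq="D")
errors = pd.Series(np.linspace(-0.1, 0.1, 5), index=idx)

# Same conversion as in the posted code
errors.index = pd.DatetimeIndex(errors.index).to_period("D")

print(type(errors))           # the container (a Series)
print(errors.dtype)           # the dtype of the values
print(type(errors.index))     # the class of the index
print(type(errors.index[0]))  # the class of a single index label
```

The first two checks only describe the container and its values. The index has its own type, and that is what ends up on the x-axis when the series is passed to plt.plot.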

I have never used the Period class before and don't clearly understand it. I saw it in a tutorial as a way to change the index frequency. I usually inspected errors on its own afterwards and saw dtype: float, and I checked the class with print(type(errors)). It was still a Series, so I thought it was OK.
After some research prompted by your questions, I see that .to_period turns datetime values into Period objects. So I assume plot does not handle Period objects, and I should find a way to convert them back to datetime?
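One possible way to convert back is sketched below. It assumes the predictions carry a PeriodIndex, as the traceback suggests. The data is synthetic and the name plottable is only illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the predictions, indexed by daily Periods
idx = pd.period_range("2024-01-01", periods=5, freq="D")
predictions = pd.Series(np.linspace(-0.1, 0.1, 5), index=idx)

# Convert the PeriodIndex back to a DatetimeIndex
# (to_timestamp uses the start of each period by default)
plottable = predictions.copy()
plottable.index = predictions.index.to_timestamp()

print(type(predictions.index))  # PeriodIndex
print(type(plottable.index))    # DatetimeIndex
```

After this, plottable has ordinary datetime labels and the same values, so it is a candidate for plt.plot(plottable). Whether this fixes the original script depends on what index the predictions really have.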

First, where the code says

in plain English, what do you want this part of the code to accomplish? What sort of data is in alpha, beta and index right before this code?

I used .to_period to clean up the index and keep only the %Y-%m-%d date format, which I assume is standard and therefore easier to use later in the code.

It also lets me avoid the error "TypeError: Cannot compare tz-naive and tz-aware datetime-like objects".

alpha and beta are floats.
errors.index holds the dates.
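If the goal is only to get rid of the timezone so that slicing with plain datetime objects works, one alternative to .to_period is to drop the timezone and keep a DatetimeIndex. The sketch below uses a synthetic tz-aware index. The timezone name and the values are assumptions for illustration, not the real downloaded data.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

# Synthetic tz-aware daily index (the timezone is an assumption)
idx = pd.date_range("2023-03-29", periods=5, freq="D", tz="America/New_York")
errors = pd.Series(np.linspace(-0.1, 0.1, 5), index=idx)

# Drop the timezone but keep the local wall-clock dates;
# the index stays a DatetimeIndex rather than a PeriodIndex
errors.index = errors.index.tz_localize(None)

# Slicing with naive datetime objects now compares naive with naive
train_end = datetime(2023, 4, 1)
train_data = errors.loc[:train_end]
test_data = errors.loc[train_end + timedelta(days=1):]

print(errors.index.tz)                  # None
print(len(train_data), len(test_data))
```

With this approach the index never contains Period objects, so the date labels stay in a datetime type throughout.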