Rolling Regression

Rolling OLS applies OLS across a fixed window of observations and then rolls (moves or slides) the window across the data set. The key parameter is window, which determines the number of observations used in each OLS regression. By default, RollingOLS drops missing values in the window and so will estimate the model using the available data points.

Estimated values are aligned with the end of each window: counting from zero, the model estimated using the window data points \(i, i+1, \ldots, i+window-1\) is stored in location \(i+window-1\).

Start by importing the modules that are used in this notebook.

[1]:
import pandas_datareader as pdr
import pandas as pd
import statsmodels.api as sm
from statsmodels.regression.rolling import RollingOLS
import matplotlib.pyplot as plt
import seaborn
seaborn.set_style('darkgrid')
pd.plotting.register_matplotlib_converters()
%matplotlib inline

pandas-datareader is used to download data from Ken French’s website. The two data sets downloaded are the 3 Fama-French factors and the 10 industry portfolios. The data are available from 1926.

The data are monthly returns for the factors or industry portfolios.

[2]:
factors = pdr.get_data_famafrench('F-F_Research_Data_Factors', start='1-1-1926')[0]
print(factors.head())
industries = pdr.get_data_famafrench('10_Industry_Portfolios', start='1-1-1926')[0]
print(industries.head())

The first model estimated is a rolling version of the CAPM that regresses the excess return on Technology sector firms on the excess return on the market.

The window is 60 months, and so the first estimate is available once 60 (window) months have been observed. The first 59 (window - 1) estimates are all NaN.

[3]:
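# Excess return on the HiTec portfolio; .values subtracts RF by position
endog = industries.HiTec - factors.RF.values
# Market excess return plus a constant (the intercept)
exog = sm.add_constant(factors['Mkt-RF'])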
rols = RollingOLS(endog, exog, window=60)
rres = rols.fit()
params = rres.params
print(params.head())
print(params.tail())

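We next plot the market loading along with a 95% point-wise confidence interval. Passing variables=['Mkt-RF'] restricts the plot to the market loading, so the constant is not drawn. The width of the interval is controlled by the alpha argument, which is left at its default here.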

[4]:
fig = rres.plot_recursive_coefficient(variables=['Mkt-RF'], figsize=(14,6))

Next, the model is expanded to include all three factors: the excess market return, the size factor and the value factor.

[5]:
exog_vars = ['Mkt-RF', 'SMB', 'HML']
exog = sm.add_constant(factors[exog_vars])
rols = RollingOLS(endog, exog, window=60)
rres = rols.fit()
fig = rres.plot_recursive_coefficient(variables=exog_vars, figsize=(14,18))

Formulas

RollingOLS and RollingWLS both support model specification using the formula interface. The example below is equivalent to the 3-factor model estimated previously. Note that one variable is renamed to have a valid Python variable name.

[6]:
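joined = pd.concat([factors, industries], axis=1)
# Rename so the column is a valid Python identifier inside the formula
joined['Mkt_RF'] = joined['Mkt-RF']
# Excess return on HiTec, matching the dependent variable used previously
joined['HiTec_ex'] = joined['HiTec'] - joined['RF']
mod = RollingOLS.from_formula('HiTec_ex ~ Mkt_RF + SMB + HML', data=joined, window=60)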
rres = mod.fit()
print(rres.params.tail())

RollingWLS: Rolling Weighted Least Squares

The rolling module also provides RollingWLS which takes an optional weights input to perform rolling weighted least squares. It produces results that match WLS when applied to rolling windows of data.

Fit Options

fit accepts other optional keywords to set the covariance estimator through cov_type. Only two estimators are supported: 'nonrobust' (the classic OLS estimator) and 'HC0', which is White’s heteroskedasticity-robust estimator.

You can set params_only=True to only estimate the model parameters. This is substantially faster than computing the full set of values required to perform inference.

Finally, the parameter reset can be set to a positive integer to control accumulated numerical error in very long samples. RollingOLS avoids the full matrix product when rolling by only adding the most recent observation and removing the dropped observation as it rolls through the sample. Setting reset recomputes the full inner product every reset periods. In most applications this parameter can be omitted.

[7]:
%timeit rols.fit()
%timeit rols.fit(params_only=True)