Note

The following documentation follows the work of Dunis et al. (2006), which is based on Enders and Granger (1998).

Warning

In order to use this module, you should additionally install TensorFlow v2.8.0 and Keras v2.3.1. For more details, please visit our ArbitrageLab installation guide.

Threshold Auto Regression

Introduction

The gasoline crack spread can be interpreted as the profit margin gained by processing crude oil into unleaded gasoline. It is simply the price of unleaded gasoline minus the price of West Texas Intermediate (WTI) crude oil, both of which are traded on the New York Mercantile Exchange (NYMEX).

\[S_{t} = GAS_t - WTI_t\]

\(S_{t}\) is the price of the spread at time \(t\) (in $ per barrel), \(GAS_t\) is the price of unleaded gasoline at time \(t\) (in $ per barrel), and \(WTI_t\) is the price of West Texas Intermediate crude oil at time \(t\) (in $ per barrel).
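As a minimal sketch of the spread definition above, the snippet below computes \(S_t\) from two price arrays. The prices are made up for illustration, and it assumes the gasoline quote arrives in $ per gallon and must be converted to $ per barrel (42 US gallons per barrel) before subtracting. If your data is already in $ per barrel, the conversion step can be dropped.

```python
import numpy as np

GALLONS_PER_BARREL = 42  # US gallons in one barrel

# Hypothetical prices, for illustration only.
gas_usd_per_gallon = np.array([1.50, 1.55, 1.60])
wti_usd_per_barrel = np.array([55.0, 56.0, 58.0])

# Bring both legs to the same unit ($ per barrel) before differencing.
gas_usd_per_barrel = gas_usd_per_gallon * GALLONS_PER_BARREL

# S_t = GAS_t - WTI_t
spread = gas_usd_per_barrel - wti_usd_per_barrel
print(spread)
```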

In Dunis et al. (2006), the case is made that the crack spread exhibits asymmetry at the $5 mark, with seemingly larger moves occurring on the upside of the long-term ‘fair value’ than on the downside.

Cointegration was first introduced by Engle and Granger (1987). The technique is to test the null hypothesis that a linear combination of two series contains a unit root. If the null hypothesis is rejected and the conclusion is that a unit root does not exist, the two series are cointegrated.

The phenomenon of the spread exhibiting larger moves in one direction than in the other is known as asymmetry. Since the traditional unit root test has only one parameter for the autoregressive estimate, it assumes upside and downside moves to be identical, or symmetric. Non-linear cointegration was first introduced by Enders and Granger (1998), who extended the unit root test by considering upside and downside moves separately, thus allowing for the possibility of asymmetric adjustment.

TAR Model

Enders and Granger extend the Dickey-Fuller test to allow for the unit root hypothesis to be tested against an alternative of asymmetric adjustment. Here, this is developed from its simplest form. Consider the standard Dickey-Fuller test:

\[\Delta \mu_{t} = p \mu_{t-1} + \epsilon_t\]

where \(\epsilon_t\) is a white noise process. The null hypothesis of \(p=0\) is tested against the alternative of \(p \neq 0\). Under \(p=0\) the series \(\mu_t\) contains a unit root; rejecting the null indicates that there is no unit root, and therefore \(\mu_t\) is a stationary series. If \(\mu_t\) is the series of residuals from a long-run cointegration relationship, as indicated by Johansen, this simply results in a test of the validity of the cointegrating vector (the residuals of the cointegration equation should form a stationary series).
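To make the Dickey-Fuller regression concrete, here is a rough sketch that estimates \(p\) by ordinary least squares with NumPy on two simulated series. The helper name `dickey_fuller_regression` and the simulated data are illustrative assumptions and are not part of ArbitrageLab. Note that the resulting t-statistic has to be compared against Dickey-Fuller critical values rather than the standard t table.

```python
import numpy as np

def dickey_fuller_regression(mu):
    """Regress delta mu_t on mu_{t-1} (no constant); return (p_hat, t_stat)."""
    mu = np.asarray(mu, dtype=float)
    lagged = mu[:-1]          # mu_{t-1}
    delta = np.diff(mu)       # mu_t - mu_{t-1}
    p_hat = lagged @ delta / (lagged @ lagged)
    resid = delta - p_hat * lagged
    sigma2 = resid @ resid / (len(delta) - 1)
    t_stat = p_hat / np.sqrt(sigma2 / (lagged @ lagged))
    return p_hat, t_stat

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
eps = rng.standard_normal(2000)

# Random walk: true p = 0 (unit root).
random_walk = np.cumsum(eps)

# Stationary AR(1) with coefficient 0.5: true p = 0.5 - 1 = -0.5.
stationary = np.zeros(2000)
for t in range(1, 2000):
    stationary[t] = 0.5 * stationary[t - 1] + eps[t]

p_rw, t_rw = dickey_fuller_regression(random_walk)
p_st, t_st = dickey_fuller_regression(stationary)
print(f"random walk: p_hat={p_rw:.4f}, t={t_rw:.2f}")
print(f"stationary:  p_hat={p_st:.4f}, t={t_st:.2f}")
```

For the random walk the estimate of \(p\) should sit close to zero, while for the stationary series it should be clearly negative with a large negative t-statistic.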

The extension provided by Enders and Granger (1998) is to consider the upside and downside moves separately, thus allowing for the possibility of asymmetric adjustment. Following this approach:

\[\Delta \mu_{t} = I_t p_1 \mu_{t-1} + (1 - I_t) p_2 \mu_{t-1} + \epsilon_t\]

where \(I_t\) is the zero-one Heaviside indicator function. Dunis et al. (2006) use the following specification:

\[I_t = \begin{cases} 1, & \text{if } \mu_{t-1} \geq 0 \\ 0, & \text{if } \mu_{t-1} < 0 \end{cases}\]

Enders and Granger refer to the model defined above as threshold autoregressive (TAR). The null hypothesis of symmetric adjustment is \((H_0: p_1 = p_2)\), which can be tested using the standard F-test (in this case the Wald test), with an additional requirement that both \(p_1\) and \(p_2\) do not equal zero. If \(p_1 \neq p_2\), cointegration between the underlying assets is non-linear.
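The TAR regression and the symmetry test can be sketched with plain NumPy as follows. This is an illustrative approximation under stated assumptions, not the ArbitrageLab implementation: the helper name `fit_tar`, the simulated series, and the chosen coefficients (\(p_1 = -0.1\), \(p_2 = -0.6\)) are all hypothetical. The F-statistic for \(H_0: p_1 = p_2\) is computed by comparing the restricted (symmetric Dickey-Fuller) and unrestricted (TAR) sums of squared residuals.

```python
import numpy as np

def fit_tar(mu):
    """Fit the TAR regression with a zero threshold; return (p1, p2, f_stat)."""
    mu = np.asarray(mu, dtype=float)
    lagged, delta = mu[:-1], np.diff(mu)

    # Heaviside indicator: 1 when mu_{t-1} >= 0, otherwise 0.
    indicator = (lagged >= 0).astype(float)

    # Unrestricted model: separate coefficients above and below zero.
    X = np.column_stack([indicator * lagged, (1 - indicator) * lagged])
    coef, *_ = np.linalg.lstsq(X, delta, rcond=None)
    ssr_u = np.sum((delta - X @ coef) ** 2)

    # Restricted model (p1 = p2): the symmetric Dickey-Fuller regression.
    p_sym = lagged @ delta / (lagged @ lagged)
    ssr_r = np.sum((delta - p_sym * lagged) ** 2)

    # F-statistic for one restriction with two estimated parameters.
    f_stat = (ssr_r - ssr_u) / (ssr_u / (len(delta) - 2))
    return coef[0], coef[1], f_stat

# Simulated asymmetric series, for illustration only:
# slow adjustment above zero (p1 = -0.1), fast adjustment below (p2 = -0.6).
rng = np.random.default_rng(42)
n = 5000
mu = np.zeros(n)
for t in range(1, n):
    p = -0.1 if mu[t - 1] >= 0 else -0.6
    mu[t] = mu[t - 1] + p * mu[t - 1] + rng.standard_normal()

p1_hat, p2_hat, f_stat = fit_tar(mu)
print(f"p1_hat={p1_hat:.3f}, p2_hat={p2_hat:.3f}, F={f_stat:.1f}")
```

With data simulated this way, the two estimates should differ clearly and the F-statistic should be large, which is the pattern the symmetry test is designed to detect.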

Implementation

class TAR(price_data: DataFrame)

The Threshold AutoRegressive Model is an extension provided by Enders and Granger to the standard Dickey-Fuller Test. It considers the upside and downside moves separately, thus allowing for the possibility of asymmetric adjustment.

__init__(price_data: DataFrame)

Init function.

Parameters:

price_data – (pd.DataFrame) Collection of time series to construct the spread from.

fit() → RegressionResults

Fits the OLS model.

Returns:

(RegressionResults)

summary() → DataFrame

Returns summary as in paper. Uses the Wald Test to check for significance of the following hypotheses:

- p_1 = 0
- p_2 = 0
- p_1 = p_2

Returns:

(pd.DataFrame) Summary of results.

Example

# Importing packages
import pandas as pd
from arbitragelab.ml_approach.tar import TAR

# Loading the dataframe that contains the spread time series
data = pd.read_csv('X_FILE_PATH.csv', index_col=0, parse_dates=[0])

# Selecting the spread series
spread_series = data['spread']

# The TAR model expects a zero-mean series
demeaned_spread = spread_series - spread_series.mean()

# Initializing and fitting the TAR model
model = TAR(demeaned_spread)
tar_results = model.fit()
tar_results.summary()

# Plotting the fitted values
tar_results.fittedvalues.plot()

# Showing metrics on the model fit
model.summary()

Research Notebooks

The following research notebooks can be used to better understand the components of the model described above.

References