Note

The following documentation closely follows the paper "Trading in the presence of cointegration" by Galenko, A., Popova, E. and Popova, I.

Multivariate Cointegration Framework

Introduction

Cointegration between time series implies that the series are bound together: they may drift apart for short periods, but they are expected to re-converge. This behavior can serve as the basis of a profitable pairs trading strategy, as shown in the Minimum Profit Optimization module. The current module extends the Minimum Profit Optimization framework to three or more cointegrated assets. The paper illustrates the corresponding trading strategy with an empirical application to trading four European stock market indices at a daily frequency.

Multivariate Cointegration

In the multivariate cointegration framework, cointegration is defined through the stochastic relationships among the asset log-returns.

Let \(P_t^i\), \(i = 1, 2, \ldots, N\), denote the prices of \(N\) assets at time \(t\). The continuously compounded asset returns, i.e. the log-returns at time \(t > 0\), can be written as:

\[r_t^i = \ln{P_t^i} - \ln{P_{t-1}^i}\]

Now construct a process \(Y_t\) as a linear combination of the \(N\) asset log-prices:

\[Y_t = \sum_{i=1}^N b^i \ln{P_t^i}\]

where \(b^i\) denotes the \(i\)-th element of a finite vector \(\mathbf{b}\). The corresponding log-return series \(Z_t\) can be defined as:

\[Z_t = Y_t - Y_{t-1} = \sum_{i=1}^N b^i r_t^i\]
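The definitions above can be checked numerically. The sketch below uses made-up prices for three assets and an illustrative vector \(\mathbf{b}\) (not one estimated by a Johansen test); the helper names `log_returns` and `spread` are hypothetical and not part of the library. It confirms that differencing the log-price spread \(Y_t\) gives the same \(Z_t\) as weighting the individual log-returns by \(b^i\).

```python
import math
from typing import List, Sequence


def log_returns(prices: Sequence[float]) -> List[float]:
    """r_t = ln P_t - ln P_{t-1} for t = 1, ..., T - 1."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices[:-1], prices[1:])]


def spread(prices: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Y_t = sum_i b^i ln P_t^i, where prices[i][t] is the price P_t^i."""
    n_obs = len(prices[0])
    return [sum(b_i * math.log(series[t]) for b_i, series in zip(b, prices))
            for t in range(n_obs)]


# Hypothetical prices: three assets observed over four days.
prices = [
    [100.0, 101.0, 99.5, 100.8],
    [50.0, 50.4, 49.9, 50.6],
    [20.0, 19.8, 20.1, 20.3],
]
b = [1.0, -0.6, -1.2]  # illustrative cointegration vector

# Route 1: difference the spread, Z_t = Y_t - Y_{t-1}.
Y = spread(prices, b)
Z_from_spread = [y1 - y0 for y0, y1 in zip(Y[:-1], Y[1:])]

# Route 2: weight the individual log-returns, Z_t = sum_i b^i r_t^i.
returns = [log_returns(series) for series in prices]
Z_from_returns = [sum(b_i * r[t] for b_i, r in zip(b, returns))
                  for t in range(len(returns[0]))]

for z1, z2 in zip(Z_from_spread, Z_from_returns):
    print(f"{z1:+.6f}  {z2:+.6f}")
```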

Assume that the memory of the process \(Y_t\) does not extend into the infinite past, which can be expressed in terms of the autocovariance of \(Y_t\) as:

\[\lim_{p \to \infty} \text{Cov} \lbrack Y_t, Y_{t-p} \rbrack = 0\]

Then the log-price process \(Y_t\) is stationary if and only if the following three conditions on the log-return process \(Z_t\) are satisfied:

\begin{gather*} E[Z_t] = 0 \\ \text{Var }Z_t = -2 \sum_{p=1}^{\infty} \text{Cov} \lbrack Z_t, Z_{t-p} \rbrack \\ \sum_{p=1}^{\infty} p \text{ Cov} \lbrack Z_t, Z_{t-p} \rbrack < \infty \end{gather*}
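As a minimal worked check of these conditions, assume (hypothetically) that the spread is white noise, \(Y_t = \varepsilon_t\) with i.i.d. \(\varepsilon_t\), \(E[\varepsilon_t] = 0\) and \(\text{Var } \varepsilon_t = \sigma^2\). Then \(\text{Cov} \lbrack Y_t, Y_{t-p} \rbrack = 0\) for \(p \geq 1\), \(Z_t = \varepsilon_t - \varepsilon_{t-1}\), and all three conditions hold:

\begin{gather*} E[Z_t] = E[\varepsilon_t] - E[\varepsilon_{t-1}] = 0 \\ \text{Var }Z_t = 2\sigma^2, \quad \text{Cov} \lbrack Z_t, Z_{t-1} \rbrack = -\text{Var } \varepsilon_{t-1} = -\sigma^2, \quad \text{Cov} \lbrack Z_t, Z_{t-p} \rbrack = 0 \text{ for } p \geq 2 \\ -2 \sum_{p=1}^{\infty} \text{Cov} \lbrack Z_t, Z_{t-p} \rbrack = -2(-\sigma^2) = 2\sigma^2 = \text{Var }Z_t \\ \sum_{p=1}^{\infty} p \text{ Cov} \lbrack Z_t, Z_{t-p} \rbrack = -\sigma^2 < \infty \end{gather*}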

When \(Y_t\) is stationary, the log-price series of the assets are cointegrated.

For equity markets, the log-return series can be assumed to be stationary, so the log-price series are integrated of order one. Therefore, in empirical applications, the Johansen test can be applied directly to the log-price series to derive the vector \(\mathbf{b}\).

Strategy Idea

The core idea of the strategy is to bet on a spread formed by \(N\) cointegrated assets: when the assets have drifted apart, the spread is expected to mean-revert. Using the notation of the previous section, the trading strategy can be stated as:

For each time period, trade a notional value of \(-b^i C \sum_{p=1}^{\infty} Z_{t-p}\) in asset \(i\), \(i = 1, \ldots, N\),

where \(C\) is a positive scale factor. The profit of this strategy is then:

\[\pi_t = \sum_{i=1}^N -b^i C \bigg[ \sum_{p=1}^{\infty} Z_{t-p} \bigg] r_t^i = -C \sum_{p=1}^{\infty} Z_{t-p} Z_t\]
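A quick numerical check of this identity: the sketch below builds the per-asset positions \(-b^i C \sum_{p} Z_{t-p}\) for a single period, weights them by each asset's log-return (the profit definition used above), and compares the result with the compact form \(-C \sum_{p} Z_{t-p} Z_t\). The lagged sum, returns and vector \(\mathbf{b}\) are made-up inputs, and `positions` is a hypothetical helper name.

```python
import math
from typing import List, Sequence


def positions(b: Sequence[float], C: float, lagged_z_sum: float) -> List[float]:
    """Value traded in asset i: -b^i * C * sum_{p>=1} Z_{t-p}."""
    return [-b_i * C * lagged_z_sum for b_i in b]


b = [1.0, -0.6, -1.2]          # illustrative cointegration vector
C = 1000.0                     # positive scale factor
lagged_z_sum = 0.015           # hypothetical sum of past spread returns
r_t = [-0.004, 0.001, 0.0005]  # hypothetical log-returns at time t

# Profit as the sum over assets of position times log-return.
pos = positions(b, C, lagged_z_sum)
pi_by_asset = sum(x * r for x, r in zip(pos, r_t))

# Compact form: -C * (sum of past Z) * Z_t, with Z_t = sum_i b^i r_t^i.
Z_t = sum(b_i * r for b_i, r in zip(b, r_t))
pi_compact = -C * lagged_z_sum * Z_t

print(pi_by_asset, pi_compact)
```

In this example the spread had risen (positive lagged sum) and then fell (\(Z_t < 0\)), so the strategy, which was short the spread, earns a positive profit.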

The expectation of the profit is thus:

\begin{align*} E[\pi_t] & = E \bigg[ -C \sum_{p=1}^{\infty} Z_{t-p} Z_t \bigg] \\ & = -C \sum_{p=1}^{\infty} E \big[ (Z_{t-p} - E[Z_{t-p}])(Z_t - E[Z_t]) \big] \\ & = -C \sum_{p=1}^{\infty} \text{Cov} [Z_t, Z_{t-p}] \\ & = 0.5 \: C \text{ Var} Z_t > 0 \end{align*}

In the above derivation, the two conditions introduced in the previous section were applied:

  1. \(E[Z_t] = 0\), and

  2. \(\text{Var }Z_t = -2 \sum_{p=1}^{\infty} \text{Cov} \lbrack Z_t, Z_{t-p} \rbrack\).
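The result can also be checked by simulation. The sketch assumes a hypothetical stationary spread, \(Y_t\) i.i.d. normal with variance \(\sigma^2\), so that \(\text{Var } Z_t = 2\sigma^2\) and the formula predicts \(E[\pi_t] = 0.5 \: C \cdot 2\sigma^2 = C\sigma^2\). Because a simulation can only use a finite number of lags \(P\), note that for this white-noise spread the lagged sum telescopes to \(Y_{t-1} - Y_{t-1-P}\) and the expected profit equals \(C\sigma^2\) for any \(P \geq 1\). The function name and parameter values are illustrative.

```python
import random
from statistics import fmean


def simulate_mean_profit(n: int, P: int, C: float, sigma: float, seed: int = 0) -> float:
    """Average of pi_t = -C * (sum_{p=1}^{P} Z_{t-p}) * Z_t over n simulated periods."""
    rng = random.Random(seed)
    # Hypothetical stationary spread: Y_t i.i.d. N(0, sigma^2).
    Y = [rng.gauss(0.0, sigma) for _ in range(n + P + 1)]
    # Spread returns: Z[k] = Y[k + 1] - Y[k].
    Z = [y1 - y0 for y0, y1 in zip(Y[:-1], Y[1:])]
    profits = []
    for t in range(P, len(Z)):
        lagged_sum = sum(Z[t - p] for p in range(1, P + 1))
        profits.append(-C * lagged_sum * Z[t])
    return fmean(profits)


C, sigma = 1.0, 1.0
estimate = simulate_mean_profit(n=100_000, P=5, C=C, sigma=sigma, seed=1)
theory = 0.5 * C * (2 * sigma ** 2)  # 0.5 * C * Var Z_t
print(f"simulated {estimate:.3f} vs. theoretical {theory:.3f}")
```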

Since \(C\) is positive by definition and \(\text{Var } Z_t\) is positive for any non-degenerate \(Z_t\), the expected profit of this strategy is positive. However, the resulting portfolio is not dollar-neutral.

To construct a dollar-neutral portfolio, partition the assets into two disjoint sets, \(L\) and \(S\), based on the sign of each asset's cointegration coefficient \(b^i\):

\begin{gather*} i \in L \iff b^i \geq 0 \\ i \in S \iff b^i < 0 \end{gather*}

Then the notional of each asset to be traded can be calculated as:

\begin{gather*} \frac{-b^i C \text{ sgn} \bigg( \sum_{p=1}^{\infty} Z_{t-p} \bigg)}{\sum_{j \in L} b^j}, \: i \in L \\ \frac{b^i C \text{ sgn} \bigg( \sum_{p=1}^{\infty} Z_{t-p} \bigg)}{\sum_{j \in S} b^j}, \: i \in S \end{gather*}

where \(\text{sgn}(x)\) is the sign function, which returns the sign of \(x\).
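A minimal sketch of the dollar-neutral sizing for a single period. It assumes \(L\) contains at least one strictly positive coefficient and \(S\) is non-empty. Long-side weights are normalized by \(\sum_{j \in L} b^j\) and short-side weights by \(\sum_{j \in S} b^j\), which is what makes each side carry \(C\) dollars, as the Note below requires. The vector \(\mathbf{b}\) and the lagged sums are made up, and `dollar_neutral_notionals` is a hypothetical name, not part of the library.

```python
from typing import List, Sequence


def dollar_neutral_notionals(b: Sequence[float], C: float, lagged_z_sum: float) -> List[float]:
    """Signed notional per asset; positive = long, negative = short."""
    sgn = (lagged_z_sum > 0) - (lagged_z_sum < 0)  # sign function: -1, 0 or 1
    sum_L = sum(x for x in b if x >= 0)            # i in L  <=>  b^i >= 0
    sum_S = sum(x for x in b if x < 0)             # i in S  <=>  b^i < 0
    if sum_L <= 0 or sum_S >= 0:
        raise ValueError("need a positive coefficient in L and a negative one in S")
    return [-x * C * sgn / sum_L if x >= 0 else x * C * sgn / sum_S for x in b]


b = [1.0, -0.6, -1.2, 0.5]  # illustrative cointegration vector
C = 1000.0

# Spread has risen over the look-back window: short the L assets, long the S assets.
up = dollar_neutral_notionals(b, C, lagged_z_sum=0.02)
# Spread has fallen: every position flips.
down = dollar_neutral_notionals(b, C, lagged_z_sum=-0.02)

print([round(x, 2) for x in up])
print([round(x, 2) for x in down])
```

Only the sign of the lagged sum enters the sizing, so the magnitude of the recent spread move affects the direction of the positions but not their size.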

Note

  • The resulting portfolio will have \(C\) dollars (or other currencies) invested in long positions and \(C\) dollars (or other currencies) in short positions, and thus is dollar-neutral.

  • The profit of the strategy is defined in terms of log-returns, so scaling the notional value of the positions (through \(C\)) scales the profit proportionally but does not change the returns.

  • The strategy will NOT always long the assets in the set \(L\) (or always short the assets in the set \(S\)): the direction of every position flips with the sign of \(\sum_{p=1}^{\infty} Z_{t-p}\).

In a real implementation, the price history of the assets is finite, so the true value of \(\sum_{p=1}^\infty Z_{t-p}\) cannot be obtained. The assumptions of the multivariate cointegration framework suggest that returns from the distant past have no predictive power for current returns (\(\lim_{p \to \infty} \text{Cov} \lbrack Y_t, Y_{t-p} \rbrack = 0\)). Therefore, a lag parameter \(P\) is introduced, and the infinite summation is replaced by the finite sum \(\sum_{p=1}^P Z_{t-p}\).
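The finite-lag signal can be sketched as a trailing window sum. The helper `lagged_z_sums` is hypothetical; it returns `None` for the first \(P\) periods, where fewer than \(P\) past spread returns are available. Since \(Z_t = Y_t - Y_{t-1}\), the window sum telescopes to \(Y_{t-1} - Y_{t-1-P}\), i.e. the change in the spread over the last \(P\) periods, which makes a convenient cross-check.

```python
from typing import List, Optional, Sequence


def lagged_z_sums(Z: Sequence[float], P: int) -> List[Optional[float]]:
    """For each index t, return Z[t - P] + ... + Z[t - 1], or None if t < P."""
    if P < 1:
        raise ValueError("P must be at least 1")
    return [sum(Z[t - P:t]) if t >= P else None for t in range(len(Z))]


# Hypothetical spread levels Y_0..Y_5; Z[k] = Y[k + 1] - Y[k].
Y = [0.00, 0.01, -0.01, 0.02, 0.025, 0.015]
Z = [y1 - y0 for y0, y1 in zip(Y[:-1], Y[1:])]

P = 2
signal = lagged_z_sums(Z, P)
print(signal)
```

With list indices as above, `signal[t]` equals `Y[t] - Y[t - P]` for every `t >= P`.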

Trading the Strategy

The MultivariateCointegration class can be used to generate the cointegration vector; the trading signals (the number of shares to long or short for each asset) can then be generated with the Multivariate Cointegration Trading Rule described in the Spread Trading section of the documentation.

The strategy trades at a daily frequency and is always in the market.

Implementation

This module generates a cointegration vector for mean-reversion trading of three or more cointegrated assets.

class MultivariateCointegration

This class derives the cointegration vector for mean-reversion trading of a spread consisting of three or more assets.

The implementation is based on the method described by Galenko, A., Popova, E. and Popova, I. in “Trading in the presence of cointegration”.

__init__()

Constructor of the multivariate cointegration trading signal class.

The log price dataframe and the cointegration vectors are stored for repeated use.

property asset_df: DataFrame

Property that gives read-only access to the in-sample asset price dataframe.

Returns:

(pd.DataFrame) Dataframe of asset prices.

static calc_log_price(price_df: DataFrame) → DataFrame

Calculate the log price of each asset for cointegration coefficient calculation.

Parameters:

price_df – (pd.DataFrame) Dataframe that contains the raw asset price.

Returns:

(pd.DataFrame) Log prices of the assets.

static calc_price_diff(price_df: DataFrame) → DataFrame

Calculate the price difference between day t and day t-1 for each asset.

Parameters:

price_df – (pd.DataFrame) Dataframe that contains the raw asset price.

Returns:

(pd.DataFrame) Price differences of the assets.

fillna_inplace(nan_method: str = 'ffill', order: int = 3)

Replace the class attribute dataframes with the imputed training dataframes.

Parameters:
  • nan_method – (str) Missing value imputation method. If “ffill”, use forward-fill; if “spline”, use cubic spline interpolation.

  • order – (int) Polynomial order for spline function.

fit(log_price: DataFrame, sig_level: str = '95%', suppress_warnings: bool = False) → array

Use the Johansen test to retrieve the cointegration vector.

Parameters:
  • log_price – (pd.DataFrame) Log price dataframe used to derive cointegration vector.

  • sig_level – (str) Cointegration test significance level. Possible options are “90%”, “95%”, and “99%”.

  • suppress_warnings – (bool) Boolean flag to suppress the cointegration warning message.

Returns:

(np.array) The cointegration vector, b.

get_coint_vec() → Tuple[DataFrame, ...]

Generate the cointegration vector used to generate trading signals.

Returns:

(np.array) The cointegration vector, b.

static plot_returns(returns: DataFrame, figw: float = 15.0, figh: float = 15.0, title: str = 'Returns', start_date: Timestamp | None = None, end_date: Timestamp | None = None) → Figure

Plot the equity curve only.

Parameters:
  • returns – (pd.DataFrame) Daily returns dataframe.

  • figw – (float) Figure width.

  • figh – (float) Figure height.

  • title – (str) Figure title.

  • start_date – (pd.Timestamp) Start point of the plot.

  • end_date – (pd.Timestamp) End point of the plot.

Returns:

(plt.Figure) A single equity curve plot.

set_train_dataset(price_df: DataFrame)

Provide the price series that the model uses to calculate the cointegration coefficient and beta.

Parameters:

price_df – (pd.DataFrame) Price series dataframe which contains all asset price series.

static summary(returns_df: DataFrame) → DataFrame

Statistics of the trading strategy returns.

The statistics include: mean, standard deviation, skewness, kurtosis, Sharpe ratio, Sortino ratio, final cumulative returns, percentage of up days and down days, max returns, and min returns.

Parameters:

returns_df – (pd.DataFrame) Daily percentage returns dataframe.

Returns:

(pd.DataFrame) Trading strategy returns statistics dataframe.

Example

# Importing packages
import pandas as pd
from arbitragelab.cointegration_approach.multi_coint import MultivariateCointegration

# Read price series data, set date as index
data = pd.read_csv('X_FILE_PATH.csv', parse_dates=['Date'])
data.set_index('Date', inplace=True)

# Initialize the optimizer
optimizer = MultivariateCointegration()

# Set the training dataset
optimizer.set_train_dataset(data)

# Fill NaN values
optimizer.fillna_inplace(nan_method='ffill')

# Generating the cointegration vector to later use in a trading strategy
coint_vec = optimizer.get_coint_vec()

Research Notebooks

The following research notebook can be used to better understand the Multivariate Cointegration Strategy described above.

References