Multivariate Cointegration Framework

Introduction

Cointegration relations between time series imply that the series are bound together: they might drift apart for a short period, but they ought to re-converge. This can serve as the basis of a profitable pairs trading strategy, as shown in the Minimum Profit Optimization module. The current module extends the Minimum Profit Optimization framework to three or more cointegrated assets. The corresponding trading strategy was illustrated by Galenko et al. (2012) with an empirical application to trading four European stock market indices at a daily frequency.

Multivariate Cointegration

In the multivariate cointegration framework, cointegration is defined through the stochastic relationships among the asset log-returns.

Let \(P_t^i\), where \(i = 1, 2, \ldots, N\), denote the price of asset \(i\) at time \(t\). The continuously compounded asset returns, i.e. the log-returns, at time \(t > 0\) can be written as:

\[r_t^i = \ln{P_t^i} - \ln{P_{t-1}^i}\]

Now construct a process \(Y_t\) as a linear combination of the \(N\) asset log-prices:

\[Y_t = \sum_{i=1}^N b^i \ln{P_t^i}\]

where \(b^i\) denotes the \(i\)-th element of a finite vector \(\mathbf{b}\). The corresponding return series \(Z_t\) can be defined as:

\[Z_t = Y_t - Y_{t-1} = \sum_{i=1}^N b^i r_t^i\]
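
As a minimal sketch of these definitions, the snippet below computes the log-returns \(r_t^i\), the combination \(Y_t\) and its difference \(Z_t\) in plain Python. The price lists and the weight vector `b` are made-up illustrative values, not an estimated cointegration vector, and the helper names `log_returns` and `combine` are hypothetical.

```python
import math

def log_returns(prices):
    """Log-returns r_t = ln(P_t) - ln(P_{t-1}) of one price series."""
    return [math.log(cur) - math.log(prev) for prev, cur in zip(prices, prices[1:])]

def combine(series_by_asset, b):
    """Weighted sum across assets at each time step: sum_i b^i * x_t^i."""
    return [sum(w * x for w, x in zip(b, row)) for row in zip(*series_by_asset)]

# Hypothetical prices for N = 3 assets over 4 periods (illustrative only).
prices = [
    [100.0, 101.0, 102.5, 101.8],
    [50.0, 50.4, 51.1, 50.9],
    [20.0, 20.1, 20.5, 20.3],
]
b = [1.0, -1.5, -0.5]  # assumed weights, not an estimated cointegration vector

log_prices = [[math.log(p) for p in series] for series in prices]
y = combine(log_prices, b)                        # Y_t
z = combine([log_returns(s) for s in prices], b)  # Z_t built from the returns

# Z_t agrees with the first difference of Y_t up to floating-point error.
z_from_y = [cur - prev for prev, cur in zip(y, y[1:])]
```

Building \(Z_t\) from the individual returns or by differencing \(Y_t\) gives the same series, which is the identity stated in the equation above.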

Assume that the memory of the process \(Y_t\) does not extend into the infinite past, which can be expressed in terms of the autocovariance of \(Y_t\) as:

\[\lim_{p \to \infty} \text{Cov} \lbrack Y_t, Y_{t-p} \rbrack = 0\]

Then the log-price process \(Y_t\) is stationary if and only if the following three conditions on the log-returns process \(Z_t\) are satisfied:

\begin{gather*} E[Z_t] = 0 \\ \text{Var }Z_t = -2 \sum_{p=1}^{\infty} \text{Cov} \lbrack Z_t, Z_{t-p} \rbrack \\ \sum_{p=1}^{\infty} p \text{ Cov} \lbrack Z_t, Z_{t-p} \rbrack < \infty \end{gather*}

When \(Y_t\) is stationary, the log-price series of the assets are cointegrated.
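
As a quick sanity check of these conditions, consider the simplest stationary case, chosen here purely for illustration: \(Y_t = \varepsilon_t\), where \(\varepsilon_t\) is i.i.d. with zero mean and variance \(\sigma^2\). Then \(Z_t = \varepsilon_t - \varepsilon_{t-1}\), so \(E[Z_t] = 0\) and

\begin{gather*} \text{Var }Z_t = 2\sigma^2 \\ \text{Cov} \lbrack Z_t, Z_{t-1} \rbrack = -\sigma^2, \quad \text{Cov} \lbrack Z_t, Z_{t-p} \rbrack = 0 \text{ for } p \geq 2 \\ -2 \sum_{p=1}^{\infty} \text{Cov} \lbrack Z_t, Z_{t-p} \rbrack = 2\sigma^2 = \text{Var }Z_t \end{gather*}

The weighted sum \(\sum_{p=1}^{\infty} p \text{ Cov} \lbrack Z_t, Z_{t-p} \rbrack = -\sigma^2\) is finite as well. In this toy case all three conditions hold, and the returns of the stationary combination are negatively autocorrelated.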

For equity markets, the log-returns time series can be assumed to be stationary and thus to satisfy the above conditions. Therefore, in empirical applications, the Johansen test can be applied directly to the log-price series to derive the vector \(\mathbf{b}\).

Strategy Idea

The core idea of the strategy is to bet on the spread formed by \(N\) cointegrated assets that have drifted apart but are expected to mean-revert in the future. Using the notation of the previous section, the trading strategy can be stated as:

For each time period, trade a value of \(-b^i C \sum_{p=1}^{\infty} Z_{t-p}\) in asset \(i, \: i=1, \ldots, N\)

where \(C\) is a positive scale factor. The profit of this strategy can be calculated as:

\[\pi_t = \sum_{i=1}^N -b^i C \bigg[ \sum_{p=1}^{\infty} Z_{t-p} \bigg] r_t^i = -C \sum_{p=1}^{\infty} Z_{t-p} Z_t\]

The expectation of the profit is thus:

\begin{align*} E[\pi_t] & = E \bigg[ -C \sum_{p=1}^{\infty} Z_{t-p} Z_t \bigg] \\ & = -C \sum_{p=1}^{\infty} E \big[ (Z_{t-p} - E[Z_{t-p}])(Z_t - E[Z_t]) \big] \\ & = -C \sum_{p=1}^{\infty} \text{Cov} [Z_t, Z_{t-p}] \\ & = 0.5 \: C \text{ Var} Z_t > 0 \end{align*}

In the above derivation, the first two conditions introduced in the previous section were applied:

\(E[Z_t] = 0\), and
\(\text{Var }Z_t = -2 \sum_{p=1}^{\infty} \text{Cov} \lbrack Z_t, Z_{t-p} \rbrack\).

By definition, both \(C\) and the variance \(\text{Var } Z_t\) are positive values, which means the expected profit of this strategy is positive. However, the portfolio resulting from the strategy is not dollar neutral.
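
The positive expected profit can be checked numerically. The sketch below is a toy simulation rather than a backtest: it assumes \(Y_t\) is i.i.d. Gaussian noise (so \(Y_t\) is stationary by construction) and replaces the infinite sum with a finite lag `P`. The constants `C`, `SIGMA`, `P` and `N_OBS` are arbitrary illustrative choices.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

C = 1.0        # positive scale factor
SIGMA = 1.0    # standard deviation of the simulated Y_t
P = 5          # finite lag standing in for the infinite sum (assumption)
N_OBS = 200_000

# Simulated stationary combination: Y_t is i.i.d. Gaussian noise.
y = [random.gauss(0.0, SIGMA) for _ in range(N_OBS)]
z = [cur - prev for prev, cur in zip(y, y[1:])]  # first differences of Y_t

# Per-period profit: pi_t = -C * (sum of the previous P values of Z) * Z_t
profits = [-C * sum(z[t - P:t]) * z[t] for t in range(P, len(z))]

mean_profit = sum(profits) / len(profits)
mean_z = sum(z) / len(z)
var_z = sum((v - mean_z) ** 2 for v in z) / len(z)

print(f"mean profit       : {mean_profit:.3f}")
print(f"0.5 * C * Var(Z_t): {0.5 * C * var_z:.3f}")
```

Under these assumptions the two printed numbers should be close to each other, in line with the derivation above. Real return series will not match this idealised setup.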

To construct a dollar neutral portfolio, the assets need to be partitioned based on the sign of the cointegration coefficient of each asset, \(b^i\), into two disjoint sets, \(L\) and \(S\).

\begin{gather*} i \in L \iff b^i \geq 0 \\ i \in S \iff b^i < 0 \end{gather*}

Then the notional of each asset to be traded can be calculated as:

\begin{gather*} \frac{-b^i C \text{ sgn} \bigg( \sum_{p=1}^{\infty} Z_{t-p} \bigg)}{\sum_{j \in L} b^j}, \: i \in L \\ \frac{b^i C \text{ sgn} \bigg( \sum_{p=1}^{\infty} Z_{t-p} \bigg)}{\sum_{j \in S} b^j}, \: i \in S \end{gather*}

where \(\text{sgn}(x)\) is the sign function that returns the sign of \(x\).

Note

The resulting portfolio will have \(C\) dollars (or units of another currency) invested in long positions and \(C\) dollars in short positions, and is thus dollar-neutral.
The expected profit of the strategy is defined by the log-returns, so altering the notional value of the positions will not change the returns.
The strategy will NOT always long the assets in the set \(L\) (or always short the assets in the set \(S\)).
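
The dollar-neutral sizing rule can be sketched in a few lines of plain Python. This is an illustrative helper, not the library's implementation: the function name `dollar_neutral_notionals` and the example vector are hypothetical. Each leg is normalised by the coefficient sum of its own set, which is what makes both legs total \(C\), and both sets are assumed to be non-empty with non-zero sums.

```python
def dollar_neutral_notionals(b, lagged_z_sum, C=1.0):
    """Notional per asset; each side of the book totals C in absolute value.

    b            : cointegration coefficients b^i
    lagged_z_sum : value of sum_p Z_{t-p} (only its sign is used)
    C            : positive scale factor (capital per side)
    """
    sign = (lagged_z_sum > 0) - (lagged_z_sum < 0)  # sgn(x) in {-1, 0, 1}
    sum_long_set = sum(w for w in b if w >= 0)       # sum over the set L
    sum_short_set = sum(w for w in b if w < 0)       # sum over the set S
    notionals = []
    for w in b:
        if w >= 0:   # asset belongs to L
            notionals.append(-w * C * sign / sum_long_set)
        else:        # asset belongs to S
            notionals.append(w * C * sign / sum_short_set)
    return notionals

b = [0.8, 0.2, -0.6, -0.9]  # hypothetical cointegration vector
up = dollar_neutral_notionals(b, lagged_z_sum=0.03, C=1000.0)
down = dollar_neutral_notionals(b, lagged_z_sum=-0.03, C=1000.0)
```

With a positive lagged sum the assets in \(L\) receive negative notionals (short) and those in \(S\) positive ones (long); a negative lagged sum flips every position, which is why membership in \(L\) does not mean an asset is always held long.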

In a real implementation, the price history of the assets is finite, so the true value of \(\sum_{p=1}^\infty Z_{t-p}\) cannot be obtained. The assumptions of the multivariate cointegration framework (\(\lim_{p \to \infty} \text{Cov} \lbrack Y_t, Y_{t-p} \rbrack = 0\)) suggest that returns from the distant past have little predictive power for current returns. Therefore, a lag parameter \(P\) is introduced and the infinite sum is replaced by the finite sum \(\sum_{p=1}^P Z_{t-p}\).
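
A minimal sketch of the truncated signal follows; the helper name `lagged_sum` and the sample values of \(Y_t\) are hypothetical. Because \(Z_t\) is a first difference, the finite sum telescopes: \(\sum_{p=1}^P Z_{t-p} = Y_{t-1} - Y_{t-P-1}\), so the signal is simply the change in the spread over the last \(P\) periods.

```python
def lagged_sum(z, t, P):
    """Sum of the P values of z immediately before index t; needs t >= P."""
    if t < P:
        raise ValueError("not enough history for the requested lag")
    return sum(z[t - P:t])

# Hypothetical values of Y_t (illustrative only).
y = [0.00, 0.02, -0.01, 0.03, 0.05, 0.01, -0.02]
# z[k] = y[k + 1] - y[k], so the list z is shifted by one relative to y.
z = [cur - prev for prev, cur in zip(y, y[1:])]

P = 3
t = 5
signal = lagged_sum(z, t, P)

# With this indexing the sum telescopes to y[t] - y[t - P].
telescoped = y[t] - y[t - P]
```

The choice of `P` is a tuning parameter: a larger value uses more history but also reaches further back into returns that carry less information.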

Trading the Strategy

The MultivariateCointegration class can be used to generate the cointegration vector. The trading signals (number of shares to long or short for each asset) can then be generated using the Multivariate Cointegration Trading Rule described in the Spread Trading section of the documentation.

The strategy trades at a daily frequency and is always in the market.

Implementation

Example

# Importing packages
import pandas as pd
from arbitragelab.cointegration_approach.multi_coint import MultivariateCointegration

# Read price series data, set date as index
data = pd.read_csv('X_FILE_PATH.csv', parse_dates=['Date'])
data.set_index('Date', inplace=True)

# Initialize the optimizer
optimizer = MultivariateCointegration()

# Set the training dataset (stored on the optimizer itself, so no reassignment is needed)
optimizer.set_train_dataset(data)

# Fill NaN values
optimizer.fillna_inplace(nan_method='ffill')

# Generating the cointegration vector to later use in a trading strategy
coint_vec = optimizer.get_coint_vec()

Research Notebooks

The following research notebook can be used to better understand the Multivariate Cointegration Strategy described above.

Multivariate Cointegration Strategy

References

Galenko, A., Popova, E. and Popova, I., 2012. Trading in the presence of cointegration. The Journal of Alternative Investments, 15(1), pp.85-97.