Note

The following implementations and documentation closely follow the publication by Bogomolov, T. (2013). Pairs trading based on statistical variability of the spread process. Quantitative Finance, 13(9): 1411–1430.

H-Strategy



In this paper, the author proposes a new non-parametric approach to pairs trading based on the idea of Renko and Kagi charts. This approach exploits statistical information about the variability of the tradable process. Unlike other methods of pairs trading, it does not aim to find a long-run mean of the process and trade towards it. Instead, it addresses the question of how far the process should move in one direction before trading in the opposite direction potentially becomes profitable, and it answers this by measuring the variability of the process.

H-construction

Suppose \(P(t)\) is a continuous time series on the time interval \([0, T]\).

Renko construction

Step 1: Generate the Renko Process

The Renko process \(X(i)\) is defined as,

\[X(i) : X(i) = P(\tau_i), i = 0, 1, ..., N,\]

where \(\tau_i\), \(i = 0, 1, ..., N\) is an increasing sequence of time moments such that

for some arbitrary \(H > 0\), \(\tau_0 = 0\) and \(P(\tau_0) = P(0)\),

\[H \leq \max \limits_{t \in [0,T]} P(t) - \min \limits_{t \in [0,T]} P(t),\]
\[\tau_i = \inf\{u \in [\tau_{i - 1}, T] : |P(u) - P(\tau_{i - 1})| = H\}.\]
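
As a minimal sketch of Step 1 on sampled data (not the library's implementation), the function below records a new Renko point whenever the series has moved at least \(H\) away from the previous Renko point. The name `renko_indices` and the toy prices are hypothetical, and the `>=` comparison is an assumption made because a discretely sampled series rarely hits the distance \(H\) exactly.

```python
def renko_indices(prices, h):
    """Return the sample indices tau_i of the Renko process X(i) = P(tau_i).

    Assumption: on sampled data the condition |P(u) - P(tau_{i-1})| = H
    is relaxed to >= H.
    """
    if h <= 0:
        raise ValueError("h must be positive")
    taus = [0]  # tau_0 = 0
    for u in range(1, len(prices)):
        if abs(prices[u] - prices[taus[-1]]) >= h:
            taus.append(u)
    return taus


# Hypothetical toy series used only for illustration.
prices = [0.0, 0.4, 1.1, 1.6, 2.3, 1.9, 1.2, 0.8, 1.5, 2.4]
taus = renko_indices(prices, h=1.0)
renko = [prices[i] for i in taus]
print(taus)   # [0, 2, 4, 6, 9]
print(renko)  # [0.0, 1.1, 2.3, 1.2, 2.4]
```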

Step 2: Determine Turning Points

We create another sequence of time moments \(\{(\tau^a_n, \tau^b_n), n = 0, 1, ..., M\}\) based on the sequence \(\{\tau_i\}\). The sequence \(\{\tau^a_n\}\) defines the time moments when the Renko process \(X(i)\) has a local maximum or minimum, that is, when the process \(X(i) = P(\tau_i)\) changes its direction, and the sequence \(\{\tau^b_n\}\) defines the time moments when that local maximum or minimum is detected.

More precisely, we take \(\tau^a_0 = \tau_0\) and \(\tau^b_0 = \tau_1\), and then for \(n > 0\)

\[\tau^b_n = \min\{\tau_i > \tau^b_{n-1}: (P(\tau_i) - P(\tau_{i-1}))(P(\tau_{i-1}) - P(\tau_{i-2})) < 0\},\]
\[\tau^a_n = \{\tau_{i - 1} : \tau^b_n = \tau_i\}.\]
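
The turning-point rule of Step 2 can be sketched as follows. This is an illustrative reading of the two formulas above, not the library's code: the function name `renko_turning_points` is hypothetical, and the returned pairs are positions within the Renko sequence \(X(i)\), not timestamps.

```python
def renko_turning_points(x):
    """Return (a, b) index pairs into the Renko values x = [X(0), ..., X(N)].

    a is the position of a local extreme of X, b the position at which the
    change of direction is detected (always a + 1 after the first pair).
    """
    if len(x) < 2:
        return []
    pairs = [(0, 1)]  # tau^a_0 = tau_0, tau^b_0 = tau_1
    for i in range(2, len(x)):
        # A sign change between consecutive Renko moves marks a reversal.
        if (x[i] - x[i - 1]) * (x[i - 1] - x[i - 2]) < 0:
            pairs.append((i - 1, i))
    return pairs


# Hypothetical Renko values (two up-moves, one down-move, one up-move).
renko = [0.0, 1.1, 2.3, 1.2, 2.4]
print(renko_turning_points(renko))  # [(0, 1), (2, 3), (3, 4)]
```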

Kagi construction

The Kagi construction is similar to the Renko construction with the only difference being that to create the sequence of time moments \(\{(\tau^a_n, \tau^b_n), n = 0, 1, ..., M\}\) for the Kagi construction we use local maximums and minimums of the process \(P(t)\) rather than the process \(X(i)\) derived from it. The sequence \(\{\tau^a_n\}\) then defines the time moments when the price process \(P(t)\) has a local maximum or minimum and the sequence \(\{\tau^b_n\}\) defines the time moments when that local maximum or minimum is recognized, that is, the time when the process \(P(t)\) moves away from its last local maximum or minimum by a distance equal to \(H\).

More precisely, \(\tau^a_0\), \(\tau^b_0\) and \(S_0\) are defined as

\[\tau^b_0 = \inf\{u \in [0, T] : \max \limits_{t \in [0,u]} P(t) - \min \limits_{t \in [0,u]} P(t) = H\},\]
\[\tau^a_0 = \inf\{u < \tau^b_0: |P(u) - P(\tau^b_0)| = H\},\]
\[S_0 = \operatorname{sign}(P(\tau^a_0) - P(\tau^b_0)),\]

where \(S_0\) can take two values: \(1\) for a local maximum and \(−1\) for a local minimum.

Then we define \((\tau^a_n, \tau^b_n)\), \(n > 0\) recursively. The construction of the full sequence \(\{(\tau^a_n, \tau^b_n), n = 0, 1, ..., M\}\) is done inductively by alternating the following cases.

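\(Case\ 1: \ \ S_{n-1} = -1\)

If \(S_{n-1} = -1\), the previous turning point \(\tau^a_{n-1}\) is a local minimum, so the next turning point is a local maximum, recognized once the process has fallen by \(H\) from its running maximum. Then \(\tau^a_n\), \(\tau^b_n\) and \(S_n\) are defined as

\[\tau^b_n = \inf\{u \in [\tau^a_{n-1}, T] : \max \limits_{t \in [\tau^a_{n-1}\ \ ,\ u]} P(t) - P(u) = H\},\]
\[\tau^a_n = \inf\{u < \tau^b_n: P(u) = \max \limits_{t \in [\tau^a_{n-1}\ \ ,\ \tau^b_n]} P(t)\},\]
\[S_n = 1.\]

\(Case\ 2: \ \ S_{n-1} = 1\)

If \(S_{n-1} = 1\), the previous turning point \(\tau^a_{n-1}\) is a local maximum, so the next turning point is a local minimum, recognized once the process has risen by \(H\) from its running minimum. Then \(\tau^a_n\), \(\tau^b_n\) and \(S_n\) are defined as

\[\tau^b_n = \inf\{u \in [\tau^a_{n-1}, T] : P(u) - \min \limits_{t \in [\tau^a_{n-1}\ \ ,\ u]} P(t) = H\},\]
\[\tau^a_n = \inf\{u < \tau^b_n: P(u) = \min \limits_{t \in [\tau^a_{n-1}\ \ ,\ \tau^b_n]} P(t)\},\]
\[S_n = -1.\]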
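
A minimal sketch of the Kagi construction on sampled data is given below. It is an illustration under stated assumptions, not the library's implementation: the function name `kagi_turning_points` is hypothetical, the equality with \(H\) is relaxed to `>=` because sampled data rarely hits the threshold exactly, and each turning point is labelled `"max"` or `"min"` by the kind of extreme found at \(\tau^a_n\).

```python
def kagi_turning_points(prices, h):
    """Return (a, b, kind) triples: a is the index of a local extreme,
    b the index at which it is recognized, kind is "max" or "min"."""
    hi = lo = 0  # indices of the running maximum and minimum
    b = None
    for u in range(1, len(prices)):
        if prices[u] > prices[hi]:
            hi = u
        if prices[u] < prices[lo]:
            lo = u
        if prices[hi] - prices[lo] >= h:  # range first reaches H
            b = u
            break
    if b is None:
        return []
    # The sample that completed the range is the new high or the new low;
    # the opposite extreme is the first turning point.
    if hi == b:
        a, kind = lo, "min"
    else:
        a, kind = hi, "max"
    points = [(a, b, kind)]
    ext = b  # index of the running extreme since the last turning point
    for u in range(b + 1, len(prices)):
        if kind == "min":
            # After a minimum: follow the running maximum until a fall of H.
            if prices[u] > prices[ext]:
                ext = u
            elif prices[ext] - prices[u] >= h:
                points.append((ext, u, "max"))
                kind, ext = "max", u
        else:
            # After a maximum: follow the running minimum until a rise of H.
            if prices[u] < prices[ext]:
                ext = u
            elif prices[u] - prices[ext] >= h:
                points.append((ext, u, "min"))
                kind, ext = "min", u
    return points


# Hypothetical toy series used only for illustration.
prices = [0.0, 0.4, 1.1, 1.6, 2.3, 1.9, 1.2, 0.8, 1.5, 2.4]
points = kagi_turning_points(prices, h=1.0)
print(points)  # [(0, 2, 'min'), (4, 6, 'max'), (7, 9, 'min')]
```

Unlike the Renko sketch, the extremes here are extremes of the sampled series itself, so the distance between consecutive turning points is not restricted to multiples of \(H\).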

H-statistics

H-inversion

H-inversion counts the number of times the process \(P(t)\) changes its direction for selected \(H\), \(T\) and \(P(t)\). It is given by

\[N_T (H, P) = \max \{n : \tau^{b}_{n} \leq T\} = N,\]

where \(H\) denotes the threshold of the H-construction, and \(P\) denotes the process \(P(t)\).

H-distances

H-distances is the sum of the vertical distances between consecutive local maximums and minimums, each raised to the power \(p\). It is given by

\[V^p_T (H, P) = \sum_{n = 1}^{N}|P(\tau^a_n) - P(\tau^a_{n-1})|^p.\]

H-volatility

H-volatility of order p measures the variability of the process \(P(t)\) for selected \(H\) and \(T\). It is given by

\[\xi^p_T = {V^p_T (H, P)}/{N_T (H, P)}.\]
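
The three statistics can be computed directly from the values of the process at the turning points. The sketch below follows the formulas above literally; the function name `h_statistics` and the sample values are hypothetical, and the library may count inversions differently at the boundaries.

```python
def h_statistics(extremes, p=1):
    """Compute (H-inversion, H-distances, H-volatility) from the values
    P(tau^a_0), ..., P(tau^a_N) at the turning points."""
    n = len(extremes) - 1  # N: index of the last turning point
    if n < 1:
        raise ValueError("need at least two turning points")
    v = sum(abs(cur - prev) ** p for prev, cur in zip(extremes, extremes[1:]))
    return n, v, v / n


# Hypothetical extremes: a minimum at 0.0, a maximum at 2.3, a minimum at 0.8.
n, v, xi = h_statistics([0.0, 2.3, 0.8], p=1)
print(n, round(v, 6), round(xi, 6))  # 2 3.8 1.9
```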

Strategies

Momentum Strategy

The investor buys (sells) an asset at a stopping time \(\tau^b_n\) when he or she recognizes that the process has passed its previous local minimum (maximum) and expects the movement to continue. The signal \(s_t\) is given by

\[\begin{split}s_t = \left\{\begin{array}{l} +1,\ if\ t = \tau^b_n\ and\ P(\tau^b_n) - P(\tau^a_n) > 0\\ -1,\ if\ t = \tau^b_n\ and\ P(\tau^b_n) - P(\tau^a_n) < 0\\ 0,\ otherwise \end{array}\right.\end{split}\]

where \(+1\) indicates opening a long trade or closing a short trade, \(-1\) indicates opening a short trade or closing a long trade and \(0\) indicates holding the previous position.
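
A minimal sketch of the momentum signal rule on sampled data follows. The function name `momentum_signals`, the toy prices and the turning-point index pairs are hypothetical; the pairs are assumed to come from an H-construction built beforehand.

```python
def momentum_signals(prices, turning_points):
    """Return s_t for every sample index t.

    turning_points holds (a, b) index pairs, where a is the index of a local
    extreme and b the index at which it was recognized.
    """
    signals = [0] * len(prices)
    for a, b in turning_points:
        move = prices[b] - prices[a]
        if move > 0:
            signals[b] = 1    # moved up from a minimum: go long
        elif move < 0:
            signals[b] = -1   # moved down from a maximum: go short
    return signals


# Hypothetical inputs used only for illustration.
prices = [0.0, 0.4, 1.1, 1.6, 2.3, 1.9, 1.2, 0.8, 1.5, 2.4]
turning_points = [(0, 2), (4, 6), (7, 9)]
signals = momentum_signals(prices, turning_points)
print(signals)  # [0, 0, 1, 0, 0, 0, -1, 0, 0, 1]
```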

The profit from one trade according to the momentum H-strategy over time from \(\tau^b_{n-1}\) to \(\tau^b_{n}\) is

\[Y_{\tau^b_n} = (P(\tau^b_n) - P(\tau^b_{n-1})) \cdot \operatorname{sign}(P(\tau^a_n) - P(\tau^a_{n-1}))\]

and the total profit from time \(0\) till time \(T\) is

\[Y_T(H, P) = (\xi^1_T (H, P) − 2H) \cdot N_T (H, P)\]
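
The step from the per-trade profit to the total profit can be sketched as follows. This derivation is a reading aid and assumes that \(P(t)\) is continuous, so that every turning point is recognized at a distance of exactly \(H\); the symbol \(\sigma_n\) is notation introduced only for this sketch. Let \(\sigma_n = \operatorname{sign}(P(\tau^a_n) - P(\tau^a_{n-1}))\). Because local maximums and minimums alternate,

\[P(\tau^b_n) = P(\tau^a_n) - \sigma_n H, \qquad P(\tau^b_{n-1}) = P(\tau^a_{n-1}) + \sigma_n H,\]

so each trade earns

\[Y_{\tau^b_n} = (P(\tau^a_n) - P(\tau^a_{n-1}) - 2 \sigma_n H) \cdot \sigma_n = |P(\tau^a_n) - P(\tau^a_{n-1})| - 2H.\]

Summing over \(n = 1, ..., N\) gives

\[Y_T(H, P) = V^1_T (H, P) - 2H \cdot N_T (H, P) = (\xi^1_T (H, P) - 2H) \cdot N_T (H, P).\]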

Contrarian Strategy

The investor sells (buys) an asset at a stopping time \(\tau^b_n\) when he or she decides that the process has passed far enough from its previous local minimum (maximum), and the investor expects a movement reversion. The signal \(s_t\) is given by

\[\begin{split}s_t = \left\{\begin{array}{l} +1,\ if\ t = \tau^b_n\ and\ P(\tau^b_n) - P(\tau^a_n) < 0\\ -1,\ if\ t = \tau^b_n\ and\ P(\tau^b_n) - P(\tau^a_n) > 0\\ 0,\ otherwise \end{array}\right.\end{split}\]

where \(+1\) indicates opening a long trade or closing a short trade, \(-1\) indicates opening a short trade or closing a long trade and \(0\) indicates holding the previous position.
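
The contrarian rule and the "hold the previous position" convention can be sketched as below. The function names and toy inputs are hypothetical. The position helper plays the same role as the forward-fill used in the quick backtest in the Examples section, except that it reports a flat position (0) before the first signal.

```python
def contrarian_signals(prices, turning_points):
    """Return s_t for every sample index t, trading against the last move."""
    signals = [0] * len(prices)
    for a, b in turning_points:
        move = prices[b] - prices[a]
        if move != 0:
            signals[b] = -1 if move > 0 else 1
    return signals


def positions_from_signals(signals):
    """Carry the most recent non-zero signal forward as the held position."""
    position, held = 0, []
    for s in signals:
        if s != 0:
            position = s
        held.append(position)
    return held


# Hypothetical inputs used only for illustration.
prices = [0.0, 0.4, 1.1, 1.6, 2.3, 1.9, 1.2, 0.8, 1.5, 2.4]
turning_points = [(0, 2), (4, 6), (7, 9)]
signals = contrarian_signals(prices, turning_points)
positions = positions_from_signals(signals)
print(signals)    # [0, 0, -1, 0, 0, 0, 1, 0, 0, -1]
print(positions)  # [0, 0, -1, -1, -1, -1, 1, 1, 1, -1]
```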

The profit from one trade according to the contrarian H-strategy over time from \(\tau^b_{n-1}\) to \(\tau^b_{n}\) is

\[Y_{\tau^b_n} = (P(\tau^b_n) - P(\tau^b_{n-1})) \cdot \operatorname{sign}(P(\tau^a_{n-1}) - P(\tau^a_n)),\]

and the total profit from time \(0\) till time \(T\) is

\[Y_T(H, P) = (2H - \xi^1_T (H, P)) \cdot N_T (H, P).\]
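
As a small numerical check of the two total-profit formulas, the sketch below sums the per-trade profits for a hypothetical set of turning points and compares the result with the closed forms. The values are made up, and every extreme is assumed to be recognized at a distance of exactly \(H\), which holds for a continuous process but only approximately for sampled data.

```python
import math

H = 2
# Hypothetical (P(tau^a_n), P(tau^b_n)) values: min, max, min, max.
points = [(10, 12), (17, 15), (11, 13), (14, 12)]


def sign(x):
    return (x > 0) - (x < 0)


N = len(points) - 1                                   # H-inversion
V1 = sum(abs(points[n][0] - points[n - 1][0]) for n in range(1, N + 1))
xi = V1 / N                                           # H-volatility of order 1

# Per-trade profits, summed over n = 1, ..., N.
momentum = sum(
    (points[n][1] - points[n - 1][1]) * sign(points[n][0] - points[n - 1][0])
    for n in range(1, N + 1)
)
contrarian = sum(
    (points[n][1] - points[n - 1][1]) * sign(points[n - 1][0] - points[n][0])
    for n in range(1, N + 1)
)

print(momentum, contrarian)  # 4 -4
assert math.isclose(momentum, (xi - 2 * H) * N)
assert math.isclose(contrarian, (2 * H - xi) * N)
```

Here \(\xi^1_T = 16/3 > 2H = 4\), so the momentum H-strategy is the profitable one on this toy path.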

Properties

It is clear that the choice of H-strategy depends on the value of H-volatility. If \(\xi^1_T > 2H\), then to achieve a positive profit the investor should employ a momentum H-strategy. If, on the other hand, \(\xi^1_T < 2H\) then the investor should use a contrarian H-strategy.

If \(P(t)\) follows a Wiener process, then the H-volatility is \(\xi^1_T = 2H\). As a result, it is impossible to profit by trading on the process \(P(t)\) with either H-strategy. More generally, H-volatility \(\xi^1_T = 2H\) is a property of a martingale. Likewise, \(\xi^1_T > 2H\) could be a property of a sub-martingale, a super-martingale, or a process that regularly switches back and forth over time between the two.

In this paper, the author proposes that for any mean-reverting process, regardless of its distribution, the H-volatility is less than \(2H\). Hence, theoretically, trading the mean-reverting process by the contrarian H-strategy is profitable for any choice of \(H\).
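
These properties can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not a proof and not the library's code: it uses integer-valued walks with unit steps (so the threshold is hit exactly), a hypothetical mean-reverting walk whose probability of an up-move falls as the level rises, and a hypothetical helper `kagi_h_volatility` that averages the distances between consecutive Kagi extremes.

```python
import random


def kagi_h_volatility(prices, h):
    """H-volatility of order 1 from a Kagi construction on sampled data."""
    hi = lo = prices[0]
    start = None
    for i, x in enumerate(prices):
        hi, lo = max(hi, x), min(lo, x)
        if hi - lo >= h:
            start = i
            break
    if start is None:
        return None
    rising = prices[start] == hi   # the last sample set the high: first extreme is a low
    last_ext = lo if rising else hi
    run_ext = prices[start]        # running extreme since the last turning point
    swings = []
    for x in prices[start + 1:]:
        if rising:
            if x > run_ext:
                run_ext = x
            elif run_ext - x >= h:
                swings.append(run_ext - last_ext)
                last_ext, run_ext, rising = run_ext, x, False
        else:
            if x < run_ext:
                run_ext = x
            elif x - run_ext >= h:
                swings.append(last_ext - run_ext)
                last_ext, run_ext, rising = run_ext, x, True
    return sum(swings) / len(swings) if swings else None


def random_walk(n, rng):
    x, path = 0, [0]
    for _ in range(n):
        x += 1 if rng.random() < 0.5 else -1
        path.append(x)
    return path


def mean_reverting_walk(n, rng, k=0.05):
    # Assumption: an up-move becomes less likely the further x is above zero.
    x, path = 0, [0]
    for _ in range(n):
        p_up = min(1.0, max(0.0, 0.5 - k * x))
        x += 1 if rng.random() < p_up else -1
        path.append(x)
    return path


rng = random.Random(0)
H = 5
xi_rw = kagi_h_volatility(random_walk(200_000, rng), H)
xi_mr = kagi_h_volatility(mean_reverting_walk(200_000, rng), H)
print(f"2H = {2 * H}, random walk: {xi_rw:.2f}, mean-reverting: {xi_mr:.2f}")
```

On a run of this length the random walk should give a value close to \(2H\) and the mean-reverting walk a value below it, in line with the statements above; the exact figures depend on the seed and the path length.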

Pairs Selection

  • Purpose: Select trading pairs from the assets pool by using the properties of the H-construction.

  • Algorithm:

    1. Determine the assets pool and the length of historical data.

    2. Take log-prices of all assets based on the history, combine them in all possible pairs and build a spread process for each pair.

      • \(spread_{ij} = log(P_i) - log(P_j)\)

    3. For each spread process, calculate its standard deviation, and set it as the threshold of the H-construction.

    4. Determine the construction type of the H-construction.

      • It could be either Renko or Kagi.

    5. Build the H-construction on the spread series formed by each possible pair.

    6. The top N pairs with the highest/lowest H-inversion are used for pairs trading.

      • A mean-reverting process tends to have a higher H-inversion.
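
The selection steps above can be sketched in plain Python as follows. This is a simplified illustration, not the `HSelection` implementation: the function names, the toy price series and the use of the sample standard deviation are assumptions, and the Renko construction is used for the inversion count only because it is the shorter of the two to write down.

```python
import math
import statistics
from itertools import combinations


def renko_inversions(series, h):
    """Count direction changes of the Renko process built with threshold h."""
    x = [series[0]]
    for v in series[1:]:
        if abs(v - x[-1]) >= h:
            x.append(v)
    return sum(
        1 for i in range(2, len(x))
        if (x[i] - x[i - 1]) * (x[i - 1] - x[i - 2]) < 0
    )


def select_pairs(prices, top_n, highest=True):
    """Rank all pairs by H-inversion of their log-price spread.

    Each result is (H-inversion, threshold, (asset_i, asset_j)).
    """
    results = []
    for a, b in combinations(sorted(prices), 2):
        spread = [math.log(pa) - math.log(pb)
                  for pa, pb in zip(prices[a], prices[b])]
        h = statistics.stdev(spread)  # threshold = spread standard deviation
        if h == 0:
            continue  # a constant spread has no usable threshold
        results.append((renko_inversions(spread, h), h, (a, b)))
    results.sort(key=lambda r: r[0], reverse=highest)
    return results[:top_n]


# Hypothetical toy prices: A oscillates around B, C trends upwards.
prices = {
    "A": [100, 110, 100, 110, 100, 110, 100, 110],
    "B": [100, 100, 100, 100, 100, 100, 100, 100],
    "C": [100, 110, 121, 133, 146, 161, 177, 195],
}
top = select_pairs(prices, top_n=1)
print(top[0][0], top[0][2])  # 6 ('A', 'B')
```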

Implementation

HConstruction

class HConstruction(series: Series, threshold: float, method: str = 'Kagi')

This class implements a statistical arbitrage strategy described in the following publication: Bogomolov, T. (2013). Pairs trading based on statistical variability of the spread process. Quantitative Finance, 13(9): 1411–1430.

__init__(series: Series, threshold: float, method: str = 'Kagi')

Initializes the module parameters.

Parameters:
  • series – (pd.Series) A time series for building the H-construction. The dimensions should be n x 1.

  • threshold – (float) The threshold of the H-construction.

  • method – (str) The method used to build the H-construction. The options are [“Kagi”, “Renko”].

HConstruction.h_inversion() int

Calculates H-inversion statistic, which counts the number of times the series changes its direction for the selected threshold.

Returns:

(int) The value of the H-inversion.

HConstruction.h_distances(p: int = 1) float

Calculates the sum of vertical distances between local maximums and minimums to the power p.

Parameters:

p – (int) The power to which each vertical distance is raised before summing.

Returns:

(float) The sum of vertical distances between local maximums and minimums.

HConstruction.h_volatility(p: int = 1) float

Calculates H-volatility statistic of order p, which is a measure of the variability of the series for the selected threshold.

Parameters:

p – (int) The order of H-volatility.

Returns:

(float) The value of the H-volatility.

HConstruction.get_signals(method: str = 'contrarian') Series

Gets the signals at each timestamp based on the method described in the paper.

Parameters:

method – (str) The method used to determine the signals. The options are [“contrarian”, “momentum”].

Returns:

(pd.Series) A time series containing the signal at each timestamp.

HConstruction.extend_series(series: Series)

Extends the original series used as input during initialization and rebuilds the H-construction on the extended series.

Parameters:

series – (pd.Series) A time series for extending the original series used as input during initialization. The dimensions should be n x 1.

HSelection

class HSelection(data: DataFrame, method: str = 'Kagi')

This class implements a pairs selection strategy described in the following publication: Bogomolov, T. (2013). Pairs trading based on statistical variability of the spread process. Quantitative Finance, 13(9): 1411–1430.

__init__(data: DataFrame, method: str = 'Kagi')

Initializes the module parameters.

Parameters:
  • data – (pd.DataFrame) Price data with columns containing asset prices. The dimensions should be n x m, where n denotes the length of the data and m denotes the number of assets.

  • method – (str) The method used to build the H-construction for each possible pair of assets. The options are [“Kagi”, “Renko”].

HSelection.select(minimum_length: int | None = None)

Calculates H-inversion statistic for the spread series formed by each possible pair, and stores the results.

Parameters:

minimum_length – (int) Minimum length of consistent index required for the selected pair to do H-construction.

HSelection.get_pairs(num: int, method: str = 'highest', allow_repeat: bool = False) list

Gets top N pairs with the highest/lowest H-inversion.

Parameters:
  • num – (int) The number of pairs that the user wants to get.

  • method – (str) The method used to select pairs. The options are [“highest”, “lowest”].

  • allow_repeat – (bool) Whether the user allows the same asset to appear repeatedly in different pairs.

Returns:

(list) A list containing the information for the top N pairs. Each element of the list contains three items: [H-inversion statistic, threshold of the H-construction, tuple with the column names of the two selected assets].

Examples

HConstruction

>>> import pandas as pd
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import yfinance as yf
>>> from arbitragelab.time_series_approach.h_strategy import HConstruction
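>>> # auto_adjust=False keeps the "Adj Close" column in yfinance releases
>>> # that adjust prices by default
>>> data = yf.download(
...     "KO PEP", start="2019-01-01", end="2020-12-31", auto_adjust=False, progress=False
... )["Adj Close"]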
>>> # Construct spread series
>>> series = np.log(data["KO"]) - np.log(data["PEP"])
>>> threshold = series["2019"].std()
>>> hc = HConstruction(series["2020"], threshold, "Kagi")
>>> # Get H-statistics
>>> hc.h_inversion()  
19
>>> hc.h_distances()  
1.475...
>>> hc.h_volatility()  
0.0776...
>>> # Extract signals
>>> signals = hc.get_signals("contrarian")
>>> signals  
Date
2020-01-02 0.0...
>>> # A quick backtest
>>> positions = signals.replace(0, np.nan).ffill()
>>> returns = data["KO"]["2020"].pct_change() - data["PEP"]["2020"].pct_change()
>>> total_returns = ((positions.shift(1) * returns).dropna() + 1).cumprod()
>>> fig = total_returns.plot()
>>> fig  
<Axes:...>

HSelection

>>> import pandas as pd
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import yfinance as yf
>>> from arbitragelab.time_series_approach.h_strategy import HSelection
>>> # Fetch data
>>> tickers = "AAPL MSFT AMZN META GOOGL GOOG TSLA NVDA JPM"
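>>> # auto_adjust=False keeps the "Adj Close" column in yfinance releases
>>> # that adjust prices by default
>>> data = yf.download(
...     tickers, start="2019-01-01", end="2020-12-31", auto_adjust=False, progress=False
... )["Adj Close"]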
>>> hs = HSelection(data)
>>> hs.select()  # Calculate H-inversion statistic
>>> pairs = hs.get_pairs(5, "highest", False)
>>> # Inspect the first pair
>>> # Each pair contains [H-inversion statistic, H-construction threshold, Asset pair]
>>> pairs[0]  
[34, 0.0034..., ('GOOG', 'GOOGL')]
>>> # Inspect another pair
>>> pairs[1]  
[12, 0.132..., ('AAPL', 'NVDA')]

Research Notebooks

The following research notebook can be used to better understand the method described above.

References