arbitragelab.spread_selection.cointegration

This module implements the ML-based Pairs Selection Framework described by Simão Moraes Sarmento and Nuno Horta in “A Machine Learning based Pairs Trading Investment Strategy”.

Module Contents

Classes

CointegrationSpreadSelector

Implementation of the pairs selection framework proposed in “A Machine Learning based Pairs Trading Investment Strategy”.

class CointegrationSpreadSelector(prices_df: pandas.DataFrame = None, baskets_to_filter: list = None)

Bases: arbitragelab.spread_selection.base.AbstractPairsSelector

Implementation of the pairs selection framework proposed in the paper “A Machine Learning based Pairs Trading Investment Strategy”. The H&T team extended it to work with multi-asset spreads as well as pairs.

__slots__ = ()

set_prices(prices_df: pandas.DataFrame, baskets_to_filter: list)

Sets the price series and the candidate baskets used by the subsequent spread construction and filtering steps.

Parameters:
  • prices_df – (pd.DataFrame) Asset prices universe.

  • baskets_to_filter – (list) List of tuples of tickers, each defining a basket to filter. A basket can be a pair such as (AAA, BBB) or a higher-dimensional basket such as (AAA, BBB, CCC).

construct_spreads(hedge_ratio_calculation: str) dict

Constructs a spread for each basket in self.baskets_to_filter and logs the hedge ratios computed with the method given by hedge_ratio_calculation.

Parameters:

hedge_ratio_calculation – (str) Defines how the hedge ratio is calculated. One of ‘OLS’, ‘TLS’ (Total Least Squares), ‘min_half_life’, ‘min_adf’, ‘johansen’, ‘box_tiao’.

Returns:

(dict) Dictionary of generated spreads (tuple: pd.Series).
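As an illustration of the ‘OLS’ option, the following is a minimal, dependency-free sketch of how a hedge ratio and a spread can be derived for one pair. It is not the library's implementation: the helper names `ols_hedge_ratio` and `build_spread`, the synthetic prices, and the convention that the spread equals the first leg minus the hedge ratio times the second leg are all assumptions made for this example.

```python
import random


def ols_hedge_ratio(y, x):
    """Slope of an OLS regression of y on x (intercept included)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def build_spread(y, x, hedge_ratio):
    """Assumed convention: spread = y - hedge_ratio * x."""
    return [b - hedge_ratio * a for a, b in zip(x, y)]


# Synthetic data: BBB follows a random walk, AAA = 2 * BBB + stationary noise.
rng = random.Random(0)
bbb = [100.0]
for _ in range(499):
    bbb.append(bbb[-1] + rng.gauss(0, 1))
aaa = [2.0 * p + rng.gauss(0, 0.5) for p in bbb]

beta = ols_hedge_ratio(aaa, bbb)

# Same shape as the documented return value: basket tuple -> spread series.
spreads = {("AAA", "BBB"): build_spread(aaa, bbb, beta)}
print(round(beta, 2))
```

Because AAA was generated as twice BBB plus noise, the estimated hedge ratio should come out close to 2, and the resulting spread contains only the stationary noise term.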

select_spreads(hedge_ratio_calculation: str = 'OLS', adf_cutoff_threshold: float = 0.95, hurst_exp_threshold: float = 0.5, min_crossover_threshold: int = 12, min_half_life: float = 365) list

Applies the cointegration selection rules (ADF, Hurst, Min SMA crossover, Min Half-Life) to filter out pairs/baskets.

Checks whether each pair complies with the criteria given in the paper: the pair is cointegrated, the Hurst exponent is below 0.5, the spread moves within convenient periods (half-life), and the spread reverts to the mean frequently enough.

Parameters:
  • hedge_ratio_calculation – (str) Defines how the hedge ratio is calculated. One of ‘OLS’, ‘TLS’ (Total Least Squares), ‘min_half_life’, ‘min_adf’, ‘johansen’, ‘box_tiao’.

  • adf_cutoff_threshold – (float) ADF test threshold (confidence level) used to decide whether the spread is cointegrated. Can be 0.99, 0.95 or 0.9.

  • hurst_exp_threshold – (float) Maximum allowed Hurst exponent value.

  • min_crossover_threshold – (int) Minimum number of mean crossovers per analysed period.

  • min_half_life – (float) Minimum half-life of mean reversion, in the units of time used.

Returns:

(list) List of tuples with the final selected pairs/baskets.

apply_filtering_rules(adf_cutoff_threshold: float = 0.95, hurst_exp_threshold: float = 0.5, min_crossover_threshold: int = 12, min_half_life: float = 365) list

Applies the cointegration selection rules (ADF, Hurst, Min SMA crossover, Min Half-Life) to filter out pairs/baskets.

Checks whether each pair complies with the criteria given in the paper: the pair is cointegrated, the Hurst exponent is below 0.5, the spread moves within convenient periods (half-life), and the spread reverts to the mean frequently enough.

Parameters:
  • adf_cutoff_threshold – (float) ADF test threshold (confidence level) used to decide whether the spread is cointegrated. Can be 0.99, 0.95 or 0.9.

  • hurst_exp_threshold – (float) Maximum allowed Hurst exponent value.

  • min_crossover_threshold – (int) Minimum number of mean crossovers per analysed period.

  • min_half_life – (float) Minimum half-life of mean reversion, in the units of time used.

Returns:

(list) List of tuples with the final selected pairs/baskets.
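The four rules can be pictured as a conjunction of threshold checks on per-spread statistics. The sketch below is only an illustration under stated assumptions, not the library's code: the function `passes_rules`, the statistics keys (`adf_stat`, `hurst`, `half_life`, `crossovers`) and the sample values are hypothetical, and the ADF critical values are the commonly quoted approximate asymptotic ones for a regression with a constant. For the half-life, the sketch follows the paper's rule that it should lie between one period and the supplied bound; how the library itself applies min_half_life may differ.

```python
# Approximate asymptotic ADF critical values (constant, no trend),
# keyed by confidence level. Assumed here for illustration only.
ADF_CRITICAL_VALUES = {0.99: -3.43, 0.95: -2.86, 0.90: -2.57}


def passes_rules(stats, adf_cutoff_threshold=0.95, hurst_exp_threshold=0.5,
                 min_crossover_threshold=12, min_half_life=365):
    """Return True if a spread's statistics satisfy all four rules."""
    critical = ADF_CRITICAL_VALUES[adf_cutoff_threshold]
    return (
        stats["adf_stat"] < critical                 # unit root rejected
        and stats["hurst"] < hurst_exp_threshold     # mean-reverting regime
        and 1 < stats["half_life"] < min_half_life   # assumed paper rule
        and stats["crossovers"] >= min_crossover_threshold
    )


# Hypothetical statistics for two candidate baskets.
candidates = {
    ("AAA", "BBB"): {"adf_stat": -3.9, "hurst": 0.31,
                     "half_life": 14.0, "crossovers": 40},
    ("AAA", "CCC"): {"adf_stat": -1.2, "hurst": 0.55,
                     "half_life": 600.0, "crossovers": 3},
}

selected = [basket for basket, stats in candidates.items()
            if passes_rules(stats)]
print(selected)  # [('AAA', 'BBB')]
```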

generate_spread_statistics(spread_series: pandas.Series, log_info: bool = True) dict

Generates the statistics used for spread filtering (Hurst exponent, ADF test, half-life, mean crossovers).

Parameters:
  • spread_series – (pd.Series) Spread values series.

  • log_info – (bool) Flag indicating whether the statistics should be logged into self.selection_logs.

Returns:

(dict) Dictionary with statistics.
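Two of these statistics are simple enough to sketch without any dependencies. The example below is a rough illustration, not the library's implementation: `half_life` uses the common approach of regressing the spread's one-step change on its lagged level, and `mean_crossovers` counts sign changes around the sample mean, which is a simplification of an SMA-based crossover count. The function names, dictionary keys, and the simulated AR(1) spread are assumptions for this example.

```python
import math
import random


def half_life(spread):
    """Half-life of mean reversion from the slope of delta_s on lagged s."""
    lagged = spread[:-1]
    delta = [b - a for a, b in zip(spread[:-1], spread[1:])]
    n = len(lagged)
    mean_l = sum(lagged) / n
    mean_d = sum(delta) / n
    slope = (
        sum((l - mean_l) * (d - mean_d) for l, d in zip(lagged, delta))
        / sum((l - mean_l) ** 2 for l in lagged)
    )
    # A negative slope means the spread is pulled back towards its mean.
    return -math.log(2) / slope


def mean_crossovers(spread):
    """Number of times the spread crosses its own sample mean."""
    mean = sum(spread) / len(spread)
    centered = [s - mean for s in spread]
    return sum(1 for a, b in zip(centered[:-1], centered[1:]) if a * b < 0)


# Simulated mean-reverting spread: s_t = 0.9 * s_{t-1} + noise.
rng = random.Random(42)
spread = [0.0]
for _ in range(999):
    spread.append(0.9 * spread[-1] + rng.gauss(0, 1))

stats = {"half_life": half_life(spread),
         "crossovers": mean_crossovers(spread)}
print(stats)
```

With an autoregressive coefficient of 0.9, the regression slope is near -0.1, so the estimated half-life should land around seven periods.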