arbitragelab.copula_approach.pairs_selection

Module for implementing some quick pairs selection algorithms for copula-based strategies.

Module Contents

Classes

PairsSelector

The class that quickly selects pairs for copula-based trading strategies.

class PairsSelector

The class that quickly selects pairs for copula-based trading strategies.

This class selects potential pairs for copula-based trading strategies. Methods include Spearman’s rho, Kendall’s tau and Euclidean distance on normalized prices. These methods are relatively quick to compute and are commonly used in the literature on copula-based pairs trading. For more sophisticated ML-based pairs selection methods, please refer to arbitragelab.ml_approach.

rank_pairs(stocks_universe: pandas.DataFrame, method: str = 'kendall tau', nan_option: str | None = 'forward fill', keep_num_pairs: int | None = None) pandas.Series

Rank all pairs from the stocks universe by a given method.

method choices are ‘spearman rho’, ‘kendall tau’, ‘euc distance’. nan_option choices are ‘forward fill’, ‘linear interp’, None. keep_num_pairs accepts any integer not greater than the number of available pairs; None means keeping all pairs.

Spearman’s rho is faster to compute, but it is more sensitive to outliers and tied ranks than Kendall’s tau. Euclidean distance is calculated on each pair’s cumulative returns (normalized prices). Keep in mind that Spearman’s rho and Kendall’s tau generally give similar results, but the top pairs they select may still drift apart in terms of normalized prices.

Note: ALL NaN values need to be filled, otherwise the scores will likely all be NaN. We suggest ‘forward fill’ to avoid look-ahead bias. Be aware that ‘linear interp’ linearly interpolates the NaN values and thus introduces look-ahead bias. This method fills the NaN values internally. Also, the very first row of data cannot contain NaN values.
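The difference between the two fill options can be illustrated with a minimal standard-library sketch. The helpers forward_fill and linear_interp below are hypothetical stand-ins written for this illustration; they are not the library's internals, which operate on a pandas.DataFrame.

```python
import math


def forward_fill(values):
    """Replace each NaN with the most recent non-NaN value seen so far."""
    filled, last = [], math.nan
    for v in values:
        if not math.isnan(v):
            last = v
        filled.append(last)
    return filled


def linear_interp(values):
    """Replace interior NaN runs with a straight line between their neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


prices = [100.0, math.nan, math.nan, 106.0]

ffilled = forward_fill(prices)    # [100.0, 100.0, 100.0, 106.0], past data only
interped = linear_interp(prices)  # [100.0, 102.0, 104.0, 106.0], uses the future 106.0

# A NaN in the very first row has no earlier value to copy, so it survives.
leading = forward_fill([math.nan, 1.0])
```

The interpolated values at positions 1 and 2 already "know" that the price will reach 106.0, which is the look-ahead bias the note warns about. The last line shows why the first row must be free of NaN values: forward filling cannot repair it.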

Parameters:
  • stocks_universe – (pd.DataFrame) The stocks universe to be analyzed. Columns must not be multi-indexed.

  • method – (str) Optional. The method to pick pairs. One can choose from [‘spearman rho’, ‘kendall tau’, ‘euc distance’] for Spearman’s rho, Kendall’s tau and Euclidean distance. Defaults to ‘kendall tau’.

  • nan_option – (Union[str, None]) Optional. The method to fill NaN values. One can choose from [‘forward fill’, ‘linear interp’, None]. Defaults to ‘forward fill’.

  • keep_num_pairs – (Union[int, None]) Optional. The number of top ranking pairs to keep. Defaults to None, which means all pairs will be returned.

Returns:

(pd.Series) The selected pairs ranked by score in descending order (the most ‘correlated’ pairs at the top).
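The ranking logic described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's implementation: rank_pairs_sketch and the toy score neg_abs_gap are hypothetical names, and a plain dict of price lists stands in for the pandas.DataFrame.

```python
from itertools import combinations


def rank_pairs_sketch(universe, score, keep_num_pairs=None):
    """Score every unordered pair of tickers and sort by score, highest first.

    universe: dict mapping ticker -> list of prices (stand-in for a DataFrame).
    score: callable (s1, s2) -> float, where a larger value means a better pair.
    """
    scores = {
        (a, b): score(universe[a], universe[b])
        for a, b in combinations(universe, 2)
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:keep_num_pairs]  # slicing with None keeps every pair


def neg_abs_gap(s1, s2):
    """Toy score: negative total absolute gap between two series."""
    return -sum(abs(x - y) for x, y in zip(s1, s2))


universe = {
    "A": [1.0, 2.0, 3.0],
    "B": [1.0, 2.0, 4.0],
    "C": [3.0, 1.0, 0.0],
}

top = rank_pairs_sketch(universe, neg_abs_gap, keep_num_pairs=2)
everything = rank_pairs_sketch(universe, neg_abs_gap)
```

A universe of n stocks yields n(n-1)/2 candidate pairs, so the three tickers here produce three pairs, and keep_num_pairs=2 retains the two with the highest scores.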

static spearman_rho(s1: pandas.Series, s2: pandas.Series) float

Calculates Spearman’s rho for a pair of stocks.

Complexity is O(N log N).

Parameters:
  • s1 – (pd.Series) Price series for the first stock.

  • s2 – (pd.Series) Price series for the second stock.

Returns:

(float) Spearman’s rho value.
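As a minimal sketch of what this statistic measures, Spearman’s rho can be computed as the Pearson correlation of the two series' ranks. The helper names below are hypothetical, and the library itself may delegate to an existing statistics routine. The sort inside average_ranks is what gives the O(N log N) cost.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho_sketch(s1, s2):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(s1), average_ranks(s2))


rho_monotone = spearman_rho_sketch([10, 11, 12, 13, 14], [1, 2, 4, 8, 100])
rho_swapped = spearman_rho_sketch([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

The first pair of series moves in the same order throughout, so rho_monotone is 1 even though the second series ends with an extreme value. The second pair has two adjacent swaps, which lowers the result to 0.8.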

static kendall_tau(s1: pandas.Series, s2: pandas.Series) float

Calculates Kendall’s tau for a pair of stocks.

Complexity is O(N^2).

Parameters:
  • s1 – (pd.Series) Price series for the first stock.

  • s2 – (pd.Series) Price series for the second stock.

Returns:

(float) Kendall’s tau value.
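The quadratic cost quoted above corresponds to comparing every pair of observations. The sketch below is a naive tau-a computation under the assumption that the data contain no ties. kendall_tau_sketch is a hypothetical name, and the library's own routine may handle ties differently.

```python
from itertools import combinations


def kendall_tau_sketch(s1, s2):
    """Naive Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    # Visit every pair of observations once: N * (N - 1) / 2 comparisons.
    for (x1, y1), (x2, y2) in combinations(zip(s1, s2), 2):
        direction = (x2 - x1) * (y2 - y1)
        if direction > 0:
            concordant += 1   # both series moved the same way
        elif direction < 0:
            discordant += 1   # the series moved in opposite ways
    n_pairs = len(s1) * (len(s1) - 1) // 2
    return (concordant - discordant) / n_pairs


tau_swapped = kendall_tau_sketch([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

For these five observations there are ten pairs, eight concordant and two discordant, so tau_swapped is 0.6. The same data gives a Spearman’s rho of 0.8, which illustrates that the two measures agree on direction but not on magnitude.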

static euc_distance(s1: pandas.Series, s2: pandas.Series) float

Calculates the negative Euclidean distance (2-norm) between the normalized prices of a pair of stocks.

Complexity is O(N). The result is multiplied by -1 so that pairs with the smallest distance receive the largest scores and therefore rank at the top.

Parameters:
  • s1 – (pd.Series) Price series for the first stock.

  • s2 – (pd.Series) Price series for the second stock.

Returns:

(float) Negative Euclidean distance value.
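A minimal sketch of this score follows. It assumes that "normalized prices" means each series divided by its first price (the cumulative return mentioned under rank_pairs); that normalization and the name euc_distance_sketch are assumptions made for illustration and are not confirmed by this page.

```python
import math


def euc_distance_sketch(s1, s2):
    """Negative 2-norm of the gap between two normalized price series."""
    # Assumed normalization: divide by the first price so both series start at 1.
    n1 = [p / s1[0] for p in s1]
    n2 = [p / s2[0] for p in s2]
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(n1, n2)))


# Same cumulative returns at different price levels: distance is zero.
same_path = euc_distance_sketch([10.0, 11.0, 12.0], [20.0, 22.0, 24.0])

# Flat series against one that rises to 1.6x and then 1.8x its start.
diverging = euc_distance_sketch([10.0, 10.0, 10.0], [5.0, 8.0, 9.0])
```

Because the score is negated, the pair that tracks most closely (same_path, a score of zero) outranks the pair that drifts apart (diverging, approximately -1.0) when scores are sorted in descending order.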