arbitragelab.distance_approach.basic_distance_approach
Implementation of the statistical arbitrage distance approach proposed by Gatev, E., Goetzmann, W. N., and Rouwenhorst, K. G. in “Pairs trading: Performance of a relative-value arbitrage rule.” (2006) https://papers.ssrn.com/sol3/papers.cfm?abstract_id=141615.
Module Contents
Classes
- class DistanceStrategy
Class for creation of trading signals following the strategy by Gatev, E., Goetzmann, W. N., and Rouwenhorst, K. G. in “Pairs trading: Performance of a relative-value arbitrage rule.” (2006) https://papers.ssrn.com/sol3/papers.cfm?abstract_id=141615.
- form_pairs(train_data, method='standard', industry_dict=None, num_top=5, skip_top=0, selection_pool=50, list_names=None)
Forms pairs based on input training data.
This method includes procedures from the pairs formation step of the distance strategy.
First, the input data is normalized using the max and min price values of each series: Normalized = (Price - Min(Price)) / (Max(Price) - Min(Price))
Second, the normalized data is used to find a pair for each element: another price series with the minimum sum of squared differences between normalized prices. Only unique pairs are picked in this step (pairs (‘AA’, ‘BD’) and (‘BD’, ‘AA’) are assumed to be one pair (‘AA’, ‘BD’)). If an industry dictionary is given and one decides to match pairs within the same industry group, the sum of squared differences is calculated only for pairs of prices within the same industry group.
Third, the chosen pairs are taken from the list created in the previous step, based on the desired number of top pairs to choose and the number of pairs to skip. Pairs are sorted so that those with a smaller sum of squared differences are placed at the top of the list.
Finally, the historical volatility of the portfolio of each chosen pair is calculated. A portfolio here is the difference between the normalized prices of the two elements in a pair. Historical volatility is used later in the testing (trading) step of the distance strategy. The formula for the portfolio price is: Portfolio_price = Normalized_price_A - Normalized_price_B
Note: The input dataframe to this method should not contain missing values, as observations with missing values will be dropped (otherwise elements with fewer observations would have smaller distance to all other elements).
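The effect described in the note can be seen in a small sketch. This is not the library's code: the tickers and prices are made up, and it assumes a sum that skips missing terms, as pandas does by default.

```python
import numpy as np
import pandas as pd

# Hypothetical normalized prices; 'C' is missing its last two observations.
normalized = pd.DataFrame({
    "A": [0.0, 0.5, 1.0, 0.5],
    "B": [0.2, 0.3, 0.6, 0.9],
    "C": [0.3, 0.2, np.nan, np.nan],
})


def ssd(x, y):
    """Sum of squared differences; pandas skips NaN terms by default."""
    return float(((x - y) ** 2).sum())


ssd_ab = ssd(normalized["A"], normalized["B"])  # uses all 4 observations
ssd_ac = ssd(normalized["A"], normalized["C"])  # silently uses only 2

# Dropping incomplete rows first puts every pair on the same observations.
complete = normalized.dropna()
ssd_ab_complete = ssd(complete["A"], complete["B"])
ssd_ac_complete = ssd(complete["A"], complete["C"])
```

With the gaps left in, 'C' appears closer to 'A' than 'B' does only because fewer terms enter its sum. On the shared observations the ranking reverses.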
- Parameters:
train_data – (pd.DataFrame/np.array) Dataframe with training data used to create asset pairs.
num_top – (int) Number of top pairs to use for portfolio formation.
skip_top – (int) Number of first top pairs to skip. For example, use skip_top=10 to skip the ten closest pairs and take the next num_top pairs.
list_names – (list) List containing names of elements if Numpy array is used as input.
method – (str) Method to use for sorting pairs [‘standard’ by default, ‘variance’, ‘zero_crossing’].
selection_pool – (int) Number of pairs to use before sorting them with the selection method.
industry_dict – (dict) Dictionary matching ticker to industry group.
- selection_method(method, num_top, skip_top)
Selects pairs based on the method. This method sorts the pairs selected in the formation period according to the given method.
- Parameters:
method – (str) Method to use for sorting pairs [‘standard’ by default, ‘variance’, ‘zero_crossing’].
num_top – (int) Number of top pairs to use for portfolio formation.
skip_top – (int) Number of first top pairs to skip. For example, use skip_top=10 to skip the ten closest pairs and take the next num_top pairs.
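A minimal sketch of how the three sorting methods could differ is given below. The helper `select_pairs` and its inputs are hypothetical, and the ordering directions for ‘variance’ (highest spread variance first) and ‘zero_crossing’ (most crossings first) are assumptions that this reference does not state.

```python
def select_pairs(distances, variances, crossings, method="standard",
                 num_top=5, skip_top=0, selection_pool=50):
    """Pick pairs from a pool of the closest ones (hypothetical helper).

    distances, variances, crossings: dicts keyed by pair tuple.
    """
    # The pool is always the `selection_pool` pairs with the smallest distance.
    pool = sorted(distances, key=distances.get)[:selection_pool]

    if method == "standard":
        ranked = pool  # keep the distance ordering
    elif method == "variance":
        # Assumption: pairs with a more volatile spread are preferred.
        ranked = sorted(pool, key=variances.get, reverse=True)
    elif method == "zero_crossing":
        # Assumption: pairs whose spread crosses zero more often are preferred.
        ranked = sorted(pool, key=crossings.get, reverse=True)
    else:
        raise ValueError(f"Unknown method: {method}")

    return ranked[skip_top:skip_top + num_top]


distances = {("A", "B"): 0.1, ("A", "C"): 0.2, ("B", "C"): 0.3, ("C", "D"): 0.9}
variances = {("A", "B"): 0.01, ("A", "C"): 0.05, ("B", "C"): 0.03, ("C", "D"): 0.50}
crossings = {("A", "B"): 4, ("A", "C"): 2, ("B", "C"): 7, ("C", "D"): 9}

standard = select_pairs(distances, variances, crossings, "standard",
                        num_top=2, selection_pool=3)
by_variance = select_pairs(distances, variances, crossings, "variance",
                           num_top=2, selection_pool=3)
by_crossing = select_pairs(distances, variances, crossings, "zero_crossing",
                           num_top=2, selection_pool=3)
```

In this sketch the pair ('C', 'D') never appears in any result: it falls outside the selection pool of the three closest pairs, even though it has the largest variance and the most crossings.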
- trade_pairs(test_data, divergence=2)
Generates trading signals for formed pairs based on new testing (trading) data.
This method includes procedures from the trading step of the distance strategy.
First, the input test data is normalized with the min and max price values from the pairs formation step (so no future data is used when creating signals): Normalized = (Test_Price - Min(Train_Price)) / (Max(Train_Price) - Min(Train_Price))
Second, pair portfolios (differences of normalized price series) are constructed based on the chosen top pairs from the pairs formation step.
Finally, trading signals are created for each pair portfolio. The logic of the trading strategy is the following: we open a position when the absolute portfolio value (difference between normalized prices) exceeds divergence * historical_standard_deviation, and we close the position when the portfolio price changes sign (when the normalized prices of the elements cross).
Positions can be opened in both directions. A portfolio is long the first element of the pair and short the second element, so its price is:
Portfolio_price = Normalized_price_A - Normalized_price_B
If Portfolio_price > divergence * st_deviation, we open a short position on this portfolio.
If Portfolio_price < - divergence * st_deviation, we open a long position on this portfolio.
Both these positions will be closed once Portfolio_price reaches zero.
- Parameters:
test_data – (pd.DataFrame/np.array) Dataframe with testing data used to create trading signals. This dataframe should contain the same columns as the dataframe used for pairs formation.
divergence – (float) Number of standard deviations used to open a position in a strategy. In the original example, 2 standard deviations were used.
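The open/close rule described for this method can be sketched as a small state machine over one portfolio series. The function `generate_signals` is a hypothetical illustration rather than the library's implementation. In particular, closing as soon as the value reaches or crosses zero and allowing a new position on the same observation are assumptions.

```python
def generate_signals(spread, st_deviation, divergence=2.0):
    """Return target positions (+1 long, -1 short, 0 flat) for one spread series."""
    threshold = divergence * st_deviation
    position = 0
    positions = []
    for value in spread:
        if position == 1 and value >= 0:
            position = 0   # long portfolio closed: the spread rose back to zero
        elif position == -1 and value <= 0:
            position = 0   # short portfolio closed: the spread fell back to zero
        if position == 0:
            if value > threshold:
                position = -1  # spread too high: short the portfolio (short A, long B)
            elif value < -threshold:
                position = 1   # spread too low: long the portfolio (long A, short B)
        positions.append(position)
    return positions


# Made-up portfolio values with a historical standard deviation of 0.1.
spread = [0.00, 0.05, 0.25, 0.15, 0.02, -0.01, -0.10, -0.30, -0.12, 0.03]
signals = generate_signals(spread, st_deviation=0.1, divergence=2.0)
```

Here the short position opened at 0.25 is held until the spread turns negative, and the long position opened at -0.30 is held until the spread turns positive again.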
- get_signals()
Outputs generated trading signals for pair portfolios.
- Returns:
(pd.DataFrame) Dataframe with trading signals for each pair. Trading signal here is the target quantity of portfolios to hold.
- get_portfolios()
Outputs pair portfolios used to generate trading signals.
- Returns:
(pd.DataFrame) Dataframe with portfolios for each pair.
- get_scaling_parameters()
Outputs minimum and maximum values used for normalizing each price series.
Formula used for normalization: Normalized = (Price - Min(Price)) / (Max(Price) - Min(Price))
- Returns:
(pd.DataFrame) Dataframe with columns ‘min_value’ and ‘max_value’ for each element.
- get_pairs()
Outputs pairs that were created in the pairs formation step and sorted by the method.
- Returns:
(list) List containing tuples of two strings, for names of elements in a pair.
- get_num_crossing()
Outputs pairs that were created in the pairs formation step together with their numbers of zero crossings.
- Returns:
(dict) Dictionary with keys as pairs and values as the number of zero crossings for pairs.
- count_number_crossing()
Calculates the number of zero crossings for the portfolio dataframe generated from the training dataset.
The number of zero crossings in the formation period has some usefulness in predicting future convergence. This method therefore counts the number of times the normalized spread crosses zero, which measures the frequency of divergence and convergence between two securities.
- Returns:
(dict) Dictionary with keys as pairs and values as the number of zero crossings for pairs.
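One way to count zero crossings is to count sign changes of the spread. The sketch below is an assumption about how this could be done, not the library's code. Ignoring observations that are exactly zero is a choice made here for illustration.

```python
import numpy as np


def count_zero_crossings(spread):
    """Count sign changes of a spread series, ignoring exact zeros."""
    signs = np.sign(np.asarray(spread, dtype=float))
    signs = signs[signs != 0]  # drop exact zeros so a touch is not counted twice
    return int(np.sum(signs[1:] != signs[:-1]))


# Made-up normalized spread for one pair.
spread = [0.10, 0.05, -0.02, -0.08, 0.00, 0.04, 0.01, -0.03]
n_crossings = count_zero_crossings(spread)
```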
- plot_portfolio(num_pair)
Plots a pair portfolio (difference between element prices) and trading signals generated for it.
- Parameters:
num_pair – (int) Number of the pair from the list to use for plotting.
- Returns:
(plt.Figure) Figure with portfolio plot and trading signals plot.
- plot_pair(num_pair)
Plots prices for a pair of elements and trading signals generated for their portfolio.
- Parameters:
num_pair – (int) Number of the pair from the list to use for plotting.
- Returns:
(plt.Figure) Figure with prices for pairs plot and trading signals plot.
- static normalize_prices(data, min_values=None, max_values=None)
Normalizes given dataframe of prices.
Formula used: Normalized = (Price - Min(Price)) / (Max(Price) - Min(Price))
- Parameters:
data – (pd.DataFrame) Dataframe with prices.
min_values – (pd.Series) Series with min values to use for price scaling. If None, will be calculated from the given dataset.
max_values – (pd.Series) Series with max values to use for price scaling. If None, will be calculated from the given dataset.
- Returns:
(pd.DataFrame, pd.Series, pd.Series) Dataframe with normalized prices and series with minimum and maximum values used to normalize price series.
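A minimal sketch of the formula is shown below. It also shows why passing stored minimum and maximum values matters: test data is scaled with the training-period extremes. The function name `min_max_normalize` and the price data are hypothetical.

```python
import pandas as pd


def min_max_normalize(data, min_values=None, max_values=None):
    """Scale each column with (price - min) / (max - min)."""
    if min_values is None:
        min_values = data.min()
    if max_values is None:
        max_values = data.max()
    normalized = (data - min_values) / (max_values - min_values)
    return normalized, min_values, max_values


train = pd.DataFrame({"A": [10.0, 12.0, 14.0], "B": [50.0, 40.0, 45.0]})
test = pd.DataFrame({"A": [13.0, 15.0], "B": [42.0, 38.0]})

train_norm, train_min, train_max = min_max_normalize(train)
# Reuse the training-period extremes so the test scaling uses no future data.
test_norm, _, _ = min_max_normalize(test, train_min, train_max)
```

Because the test prices can move beyond the training range, the normalized test values are not confined to the interval from 0 to 1.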
- static find_pair(data, industry_dict=None)
Finds the pairs with smallest distances in a given dataframe.
Closeness measure here is the sum of squared differences in prices. Duplicate pairs are dropped, and elements in pairs are sorted in alphabetical order. So pairs (‘AA’, ‘BC’) and (‘BC’, ‘AA’) are treated as one pair (‘AA’, ‘BC’).
- Parameters:
data – (pd.DataFrame) Dataframe with normalized price series.
industry_dict – (dict) Dictionary matching ticker to industry group.
- Returns:
(dict) Dictionary with keys as closest pairs and values as their distances.
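The matching step can be sketched with a plain double loop. This is an illustrative version under stated assumptions: the helper `closest_pairs`, the tickers, and the industry labels are made up, and every ticker is assumed to be present in the industry dictionary.

```python
import numpy as np
import pandas as pd


def closest_pairs(normalized, industry_dict=None):
    """Map each element's nearest partner to their distance, keeping unique pairs."""
    pairs = {}
    for ticker in normalized.columns:
        best_partner, best_distance = None, np.inf
        for other in normalized.columns:
            if other == ticker:
                continue
            # Optionally restrict candidates to the same industry group.
            if industry_dict is not None and industry_dict[other] != industry_dict[ticker]:
                continue
            distance = float(((normalized[ticker] - normalized[other]) ** 2).sum())
            if distance < best_distance:
                best_partner, best_distance = other, distance
        if best_partner is not None:
            # Sorting the names makes ('BD', 'AA') and ('AA', 'BD') the same key.
            pairs[tuple(sorted((ticker, best_partner)))] = best_distance
    return pairs


normalized = pd.DataFrame({
    "AA": [0.0, 0.5, 1.0],
    "BD": [0.1, 0.5, 0.9],
    "CX": [1.0, 0.4, 0.0],
    "DY": [0.9, 0.5, 0.1],
})
industries = {"AA": "tech", "BD": "energy", "CX": "tech", "DY": "energy"}

unrestricted = closest_pairs(normalized)
same_industry = closest_pairs(normalized, industries)
```

Without the industry restriction the sketch pairs 'AA' with 'BD' and 'CX' with 'DY'. With it, each element is matched only within its own group, so the pairs become ('AA', 'CX') and ('BD', 'DY') at much larger distances.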
- static sort_pairs(pairs, num_top=5, skip_top=0)
Sorts pairs of elements and returns num_top of the closest ones.
The skip_top parameter can be used to skip a number of first top portfolios. For example, if we’d like to pick pairs number 11-15 from the top list, we set num_top = 5, skip_top = 10.
- Parameters:
pairs – (dict) Dictionary with keys as pairs and values as distances between elements in a pair.
num_top – (int) Number of closest pairs to take.
skip_top – (int) Number of top closest pairs to skip.
- Returns:
(list) List containing sorted pairs as tuples of strings, representing elements in a pair.
- static find_volatility(data, pairs)
Calculates historical volatility of portfolios (differences of prices) for a set of pairs.
- Parameters:
data – (pd.DataFrame) Dataframe with price series to use for calculation.
pairs – (list) List of tuples with two elements to use for calculation.
- Returns:
(dict) Dictionary with keys as pairs of elements and values as their historical volatility.
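As a rough sketch, the volatility of a pair portfolio can be computed as the standard deviation of the difference between the two price series. The helper `spread_volatility` is hypothetical, and using the sample standard deviation (pandas default, ddof=1) is an assumption; the library may use a different convention.

```python
import pandas as pd


def spread_volatility(normalized, pairs):
    """Standard deviation of (first - second) prices for each pair."""
    return {
        (first, second): float((normalized[first] - normalized[second]).std())
        for first, second in pairs
    }


# Made-up normalized prices for one pair.
normalized = pd.DataFrame({
    "A": [0.0, 0.4, 1.0, 0.6],
    "B": [0.1, 0.3, 0.8, 0.8],
})
volatility = spread_volatility(normalized, [("A", "B")])
```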
- static find_portfolios(data, pairs)
Calculates portfolios (difference of price series) based on given prices dataframe and set of pairs to use.
When creating a portfolio, we long one share of the first element and short one share of the second element.
- Parameters:
data – (pd.DataFrame) Dataframe with price series for elements.
pairs – (list) List of tuples with two str elements to use for calculation.
- Returns:
(pd.DataFrame) Dataframe with pairs as columns and their portfolio values as rows.
- static signals(portfolios, variation, divergence)
Generates trading signals based on the idea described in the original paper.
A position is opened when the difference between prices (portfolio price) diverges by more than divergence (two in the original paper) historical standard deviations. The position is closed once the pair prices cross (portfolio price reaches zero).
Positions are opened in both buy and sell directions.
- Parameters:
portfolios – (pd.DataFrame) Dataframe with portfolio price series for pairs.
variation – (dict) Dictionary with keys as pairs and values as the historical standard deviations of their pair portfolio.
divergence – (float) Number of standard deviations used to open a position.
- Returns:
(pd.DataFrame) Dataframe with target quantity to hold for each portfolio.