arbitragelab.distance_approach.pearson_distance_approach

Implementation of the statistical arbitrage distance approach proposed by Chen, H., Chen, S. J., and Li, F. in “Empirical Investigation of an Equity Pairs Trading Strategy.” (2012) https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1361293.

Module Contents

Classes

PearsonStrategy

Class for creation of portfolios following the strategy by Chen, H., Chen, S. J., and Li, F.

class PearsonStrategy

Class for creation of portfolios following the strategy by Chen, H., Chen, S. J., and Li, F. in “Empirical Investigation of an Equity Pairs Trading Strategy.” (2012) https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1361293.

form_portfolio(train_data, risk_free=0.0, num_pairs=50, weight='equal')

Forms the portfolio from the input training data.

For each stock i in year t+1 (or the last month of year t), this method computes the Pearson correlation coefficients between the returns of stock i and the returns of every other stock in the given training data set. The formation period is usually set to 4 years, but it can be changed to suit the user’s needs.

The method then finds the top n stocks with the highest correlations to stock i and uses them as its pairs in the formation period. For each month in year t+1, it computes the pairs portfolio return as the equal-weighted average return of the n pair stocks from the previous month.
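The pair selection and the equal-weighted pairs portfolio return described above can be sketched as follows. This is a minimal illustration in plain Python, not the library's implementation: the helper names `pearson`, `top_pairs`, and `pairs_portfolio_return` are hypothetical, the return figures are made up, and dictionaries of lists stand in for the pd.DataFrame the class actually takes.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length return series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def top_pairs(returns, stock, n):
    """Hypothetical helper: the n stocks most correlated with `stock`."""
    others = [s for s in returns if s != stock]
    others.sort(key=lambda s: pearson(returns[stock], returns[s]), reverse=True)
    return others[:n]


def pairs_portfolio_return(returns, pairs, month):
    """Equal-weighted average return of the pair stocks in one month."""
    return sum(returns[p][month] for p in pairs) / len(pairs)


# Toy monthly returns (made-up numbers, for illustration only).
returns = {
    "A": [0.01, 0.02, -0.01, 0.03],
    "B": [0.02, 0.04, -0.02, 0.06],    # 2 * A, perfectly correlated with A
    "C": [-0.01, -0.02, 0.01, -0.03],  # -A, perfectly anti-correlated with A
    "D": [0.01, 0.03, 0.00, 0.02],     # positively but imperfectly correlated
}

pairs_of_a = top_pairs(returns, "A", n=2)
cret = pairs_portfolio_return(returns, pairs_of_a, month=3)
```

Here the two stocks selected for A are the ones with the highest correlations, and `cret` is the simple mean of their returns in the chosen month.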

The hypothesis behind this approach is that if a stock’s return deviates from its pairs portfolio return more than usual, the divergence is expected to reverse in the next month, so the stock’s returns are expected to be abnormally high or low in comparison to other stocks.

For stock i, this method uses a new variable, return difference, which captures the divergence between stock i’s return and its pairs portfolio return.
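The formula for the return difference is not spelled out here. The sketch below assumes the definition given in the cited paper, RetDiff = beta * (Cret - Rf) - (Lret - Rf), where Cret is the pairs portfolio return, Lret is the stock's own return in the previous month, Rf is the risk-free rate, and beta is the stock's regression coefficient. The function name `return_difference` and the numbers are illustrative assumptions, not part of the library.

```python
def return_difference(beta, pairs_ret, stock_ret, risk_free=0.0):
    """Assumed definition: beta * (Cret - Rf) - (Lret - Rf)."""
    return beta * (pairs_ret - risk_free) - (stock_ret - risk_free)


# Made-up monthly figures: the stock lagged behind its pairs portfolio.
ret_diff = return_difference(beta=1.2, pairs_ret=0.04, stock_ret=0.01, risk_free=0.002)
```

Under this definition, a positive value means the stock underperformed its pairs portfolio, which the strategy expects to be followed by a reversal.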

Parameters:
  • train_data – (pd.DataFrame) Daily price data with date in its index and stocks in its columns.

  • risk_free – (pd.Series/float) Daily risk-free rate data as a series or a float number.

  • num_pairs – (int) Number of top pairs to use for portfolio formation.

  • weight – (str) Weighting scheme for portfolio returns: 'equal' (default) or 'correlation'.
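The two weighting schemes named by the weight parameter can be illustrated as below. How the library computes the 'correlation' weights is not stated here, so this sketch assumes weights proportional to each pair stock's correlation with the target stock. The function name `weighted_pairs_return` and all numbers are hypothetical.

```python
def weighted_pairs_return(pair_returns, correlations=None):
    """Pairs portfolio return for one month.

    With no correlations given, an equal-weighted mean is used. Otherwise the
    weights are assumed to be proportional to the correlations (an assumption
    made for this sketch, not a statement about the library's internals).
    """
    if correlations is None:
        return sum(pair_returns.values()) / len(pair_returns)
    total = sum(correlations[s] for s in pair_returns)
    return sum(correlations[s] * r for s, r in pair_returns.items()) / total


# Made-up returns and correlations for two pair stocks.
pair_returns = {"B": 0.06, "D": 0.02}
correlations = {"B": 0.9, "D": 0.6}

equal_ret = weighted_pairs_return(pair_returns)
corr_ret = weighted_pairs_return(pair_returns, correlations)
```

With these numbers, the correlation-weighted return leans toward stock B, the more strongly correlated pair.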

trade_portfolio(test_data=None, test_risk_free=0.0, long_pct=0.1, short_pct=0.1)

Trades portfolios by generating trading signals for the test data.

In each month of the test period, all stocks are sorted in descending order by the previous month’s return divergence from their pairs portfolios, which were created in the formation period. A long-short portfolio is then constructed in which the top p % of the stocks are “long stocks” and the bottom q % are “short stocks”.

If no test data is given, the method generates signals from the last month of the training data.

Parameters:
  • test_data – (pd.DataFrame) Daily price data with date in its index and stocks in its columns.

  • test_risk_free – (pd.Series/float) Daily risk-free rate data as a series or a float number.

  • long_pct – (float) Fraction of stocks at the top of the sorted return divergence to go long (0.1 means the top 10 %).

  • short_pct – (float) Fraction of stocks at the bottom of the sorted return divergence to go short (0.1 means the bottom 10 %).
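The sorting step behind the long and short selection can be sketched as follows. This is an illustration only: the function name `long_short_signals` is hypothetical, the divergence values are made up, and truncating the stock counts with `int` is an assumption about rounding rather than documented library behavior.

```python
def long_short_signals(ret_diff, long_pct=0.1, short_pct=0.1):
    """Map each stock to 1 (long), -1 (short), or 0 (no position)."""
    # Sort stocks by return divergence, largest first.
    ranked = sorted(ret_diff, key=ret_diff.get, reverse=True)
    n_long = int(len(ranked) * long_pct)    # assumed rounding: truncate
    n_short = int(len(ranked) * short_pct)  # assumed rounding: truncate

    signals = {stock: 0 for stock in ranked}
    for stock in ranked[:n_long]:
        signals[stock] = 1
    for stock in ranked[len(ranked) - n_short:]:
        signals[stock] = -1
    return signals


# Made-up return divergences for ten stocks in one month.
ret_diff = {
    "S0": 0.05, "S1": -0.04, "S2": 0.01, "S3": 0.00, "S4": 0.02,
    "S5": -0.01, "S6": 0.03, "S7": -0.02, "S8": 0.04, "S9": -0.03,
}
signals = long_short_signals(ret_diff, long_pct=0.2, short_pct=0.2)
```

With ten stocks and both fractions set to 0.2, the two largest divergences are marked long, the two smallest are marked short, and the remaining six get no position.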

get_trading_signal()

Outputs trading signals on a monthly basis: 1 for a long position, -1 for a short position, and 0 for a closed position.

Returns:

(pd.DataFrame) A dataframe with a multi-index of year and month for the given test period.

get_beta_dict()

Outputs beta, a regression coefficient for each stock, estimated in the formation period.

Returns:

(dict) A dictionary with stocks as keys and betas as values.
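The exact regression used for beta is not specified here. A plausible reading, consistent with the cited paper, is the slope from regressing a stock's monthly returns on its pairs portfolio returns over the formation period. The sketch below computes that slope with ordinary least squares. The function name `ols_slope` and the data are illustrative assumptions.

```python
def ols_slope(x, y):
    """Slope of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov / var_x


# Made-up monthly returns: the stock moves exactly twice as much as its pairs.
pairs_ret = [0.01, 0.02, -0.01, 0.03]
stock_ret = [0.02, 0.04, -0.02, 0.06]

beta = ols_slope(pairs_ret, stock_ret)
beta_dict = {"A": beta}  # same shape as the documented return value
```

The resulting dictionary mirrors the documented output, with one beta per stock.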

get_pairs_dict()

Outputs the top n pairs selected during the formation period for each stock.

Returns:

(dict) A dictionary with stocks as keys and their pairs as values.