Note

The following documentation closely follows a paper by Chen et al. (2012): Empirical investigation of an equity pairs trading strategy.

As well as a paper by Perlin, M. S. (2009): Evaluation of pairs-trading strategy at the Brazilian financial market.

And also a paper by Christopher Krauss (2015): Statistical arbitrage pairs trading strategies: Review and outlook.

Pearson Approach



After the distance approach was introduced in the paper by Gatev et al. (2006), a lot of research has been conducted to further develop the original distance approach. One of the adjustments is the Pearson correlation approach proposed by Chen et al.(2012). In this paper, the authors use the same data set and time frame as in the work by Gatev et al.(2006), but they used Pearson correlation on return level for forming pairs. In the formation period(5 years in the paper), pairwise return correlations for all pairs in the universe are calculated based on monthly return data. Then the authors construct a new variable, Return Divergence (\(D_{i j t}\)), to capture the return divergence between a single stock’s return and its pairs-portfolio returns:

\[D_{i j t}=\beta\left(R_{i t}-R_{f}\right)-\left(R_{j t}-R_{f}\right)\]

where \(\beta\) denotes the regression coefficient of stock’s monthly return \(R_{i t}\) on its pairs-portfolio return \(R_{j t}\), and \(R_{f}\) is the risk-free rate.

The hypothesis in this approach is that if a stock’s return deviates from its pairs portfolio returns more than usual, this divergence is expected to be reversed in the next month. And the returns of this stock are expected to be abnormally high/low in comparison to other stocks.

Therefore, after calculating the return divergence of all the stocks, a long-short portfolio is constructed based on the long and short ratio given by the user.

Pairs Portfolio Formation

This stage of PearsonStrategy consists of the following steps:

  1. Data preprocessing

As the method has to compute all of the pairs’ correlation values in the following steps, for \(m\) stocks, there are \(\frac{m*(m-1)}{2}\) correlations to be computed in the formation period. As the number of observations for the correlations grows exponentially with the number of stocks, this estimation is computationally intensive.

Therefore, to reduce the computation burden, this method uses monthly stock returns data in the formation period (ex. 60 monthly stock returns if the formation period is 5 years). If the daily price data is given, the method calculates the monthly returns before moving into the next steps.

  1. Finding pairs

For each stock, the method finds \(n\) stocks with the highest correlations to the stock as its pairs using monthly stock returns. After pairs are formed, returns of pairs, which refer to as pairs portfolios in the paper, are needed to create \(beta\) in the following step. Therefore, this method uses two different weighting metrics in calculating the returns.

The first is an equal-weighted portfolio. The method by default computes the pairs portfolio returns as the equal-weighted average returns of the top n pairs of stocks. The second is a correlation-weighted portfolio. If this metric is chosen, the method uses the stock’s correlation values to each of the pairs and forms a portfolio weighted by these values and the weights are calculated by the formula:

\[w_{k}=\frac{\rho_{k}}{\sum_{i=1}^{n} \rho_{i}}\]

where \(w_{k}\) is the weight of stock k in the portfolio and \(\rho_{i}\) is a correlation of the stock and one of its pairs.

  1. Calculating beta

After pairs portfolio returns are calculated, the method derives beta from the monthly return of the stock and its pairs portfolio. By using linear regression, setting stock return as independent variable and pairs portfolio return as the dependent variable, the methods set beta as a regression coefficient. Then the beta is stored in a dictionary for future uses in trading. Below is a figure showing two stocks with high beta (ENPH) and low beta (RE).

../_images/pearson_approach_beta_stocks.png

Implementation

Implementation of the statistical arbitrage distance approach proposed by Chen, H., Chen, S. J., and Li, F. in “Empirical Investigation of an Equity Pairs Trading Strategy.” (2012) https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1361293.

class PearsonStrategy

Class for creation of portfolios following the strategy by Chen, H., Chen, S. J., and Li, F. in “Empirical Investigation of an Equity Pairs Trading Strategy.” (2012) https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1361293.

__init__()

Initialize Pearson strategy.

PearsonStrategy.form_portfolio(train_data, risk_free=0.0, num_pairs=50, weight='equal')

Forms portfolio based on the input train data.

For each stock i in year t+1(or the last month of year t), this method computes the Pearson correlation coefficients between the returns of stock i and returns of all other stocks in the given train data set. Usually, the formation period is set to 4 years but it may be changed upon the user’s needs.

Then the method finds top n stocks with the highest correlations to stock i as its pairs in the formation period. For each month in year t+1, this method computes the pairs portfolio return as the equal-weighted average return of the n pairs stocks from the previous month.

The hypothesis in this approach is that if a stock’s return deviates from its pairs portfolio returns more than usual, this divergence is expected to be reversed in the next month. And the returns of this stock are expected to be abnormally high/low in comparison to other stocks.

In this method, for stock i, this method uses a new variable, return difference, which captures the return divergence between i’s stock return and its pairs-portfolio return.

Parameters:
  • train_data – (pd.DataFrame) Daily price data with date in its index and stocks in its columns.

  • risk_free – (pd.Series/float) Daily risk-free rate data as a series or a float number.

  • num_pairs – (int) Number of top pairs to use for portfolio formation.

  • weight – (str) Weighting Scheme for portfolio returns [equal by default, correlation].

Trading Signal Generation

After calculating the betas for all of the stocks in the formation period, the next step is to generate trading signals by calculating the return divergence for each of the stocks. In this method, test data is not necessarily required if only a trading signal for the last month of the formation period is needed. However, if one wants to see the backtesting results of the method and test with test data, a successive dataset after the formation period is required to generate the proper trading signals. The steps are as follows:

  1. Data Preprocessing

The same data preprocessing is done with the formation period as the data needs to be in the same format. As in the formation period, the risk-free rate can be given in the form of either a float number or a series of data.

  1. Calculating the Return Divergence

For every month of test data, starting from the very last month of the train data, return divergences are calculated with the beta created in the formation period. The equation for calculating the return divergence is in the first section of this documentation. Note that while the return divergence between a portfolio of \(n\) most-highly correlated stocks with stock i and stock \(i\) is used as a sorting variable, only individual stock \(i\) enters the portfolio construction, not those \(n\) stocks. The portfolio of those \(n\) stocks only serves as a benchmark for portfolio sorting.

  1. Finding Trading Signals

Then, all stocks are sorted in descending order based on their previous month’s return divergence. If the percentages of long and short stocks are given, say \(p\%\) and \(q\%\), the top \(p\%\) of the sorted stocks are chosen for the long stocks and the bottom \(q\%\) of the sorted stocks are chosen for the short stocks. If a user wants to construct a dollar-neutral portfolio, one should choose the same percentage for \(p\) and \(q\). Finally, a new dataframe is created with all of the trading signals: 1 if a stock is in a long position, -1 if it is in a short position and 0 if it does not have any position.

Implementation

PearsonStrategy.trade_portfolio(test_data=None, test_risk_free=0.0, long_pct=0.1, short_pct=0.1)

Trade portfolios by generating trading signals in the test data.

In each month in the test period, all stocks are sorted in descending order based on their previous month’s return divergence from its pairs portfolio created in the formation period. Then a long-short portfolio is constructed with top p % of the stocks are “long stocks” and bottom q % of stocks are “short stocks”.

If the test data is not given in this method, it automatically results in signals from the last month of the training data.

Parameters:
  • test_data – (pd.DataFrame) Daily price data with date in its index and stocks in its columns.

  • test_risk_free – (pd.Series/float) Daily risk-free rate data as a series or a float number.

  • long_pct – (float) Percentage of long stocks in the sorted return divergence.

  • short_pct – (float) Percentage of short stocks in the sorted return divergence.

Results output

The PearsonStrategy class contains multiple methods to get results in the desired form.

Functions that can be used to get data:

  • get_trading_signal() outputs trading signal in monthly basis. 1 for a long position, -1 for a short position and 0 for closed position.

  • get_beta_dict() outputs beta, a regression coefficients for each stock, in the formation period.

  • get_pairs_dict() outputs top n pairs selected during the formation period for each of the stock.

Implementation

PearsonStrategy.get_trading_signal()

Outputs trading signal in monthly basis. 1 for a long position, -1 for a short position and 0 for closed position.

Returns:

(pd.DataFrame) A dataframe with multi index of year and month for given test period.

PearsonStrategy.get_beta_dict()

Outputs beta, a regression coefficients for each stock, in the formation period.

Returns:

(dict) A dictionary with stock in its key and beta in its value

PearsonStrategy.get_pairs_dict()

Outputs top n pairs selected during the formation period for each of the stock.

Returns:

(dict) A dictionary with stock in its key and pairs in its value

Examples

Code Example

# Importing packages
import pandas as pd
from arbitragelab.distance_approach.pearson_distance_approach import PearsonStrategy

# Getting the dataframe with price time series for a set of assets
data = pd.read_csv('X_FILE_PATH.csv', index_col=0, parse_dates = [0])
risk_free = pd.read_csv('Y_FILE_PATH.csv', index_col=0, parse_dates = [0])

# Dividing the dataset into two parts - the first one for pairs formation
train_data = data.loc[:'2019-01-01']
risk_free_train = risk_free.loc[:'2019-01-01']

# And the second one for signals generation
test_data = data.loc['2019-01-01':]
risk_free_test = risk_free.loc['2019-01-01':]

# Performing the portfolio formation stage of the PearsonStrategy
# Choosing top 50 pairs to calculate beta and set other parameters as default
strategy = PearsonStrategy()
strategy.form_portfolio(train_data, risk_free_train, num_pairs=50)

# Checking a list of beta that were calculated
beta_dict = strategy.get_beta_dict()

# Checking a list of pairs in the formation period
pairs = strategy.get_pairs_dict()

# Now generating signals for formed pairs and beta
strategy.trade_portfolio(test_data, risk_free_test)

# Checking generated trading signals in the test period
signals = strategy.get_trading_signal()

Research Notebooks

The following research notebook can be used to better understand the Pearson approach described above.

Research Article


Presentation Slides


References