arbitragelab.other_approaches.pca_approach

This module implements the PCA approach described by Marco Avellaneda and Jeong-Hyun Lee in “Statistical Arbitrage in the U.S. Equities Market”.

Module Contents

Classes

PCAStrategy

This strategy creates mean reverting portfolios using Principal Components Analysis.

class PCAStrategy(n_components: int = 15)

This strategy creates mean reverting portfolios using Principal Components Analysis. The idea of the strategy is to estimate PCA factors affecting the dynamics of assets in a portfolio. Thereafter, for each asset in a portfolio, we define OLS residuals by regressing asset returns on PCA factors. These residuals are used to calculate S-scores to generate trading signals and the regression coefficients are used to construct eigen portfolios for each asset. If the eigen portfolio shows good mean-reverting properties and the S-score deviates enough from its mean value, that eigen portfolio is traded. The output trading signals of this strategy are weights for each asset in a portfolio at each given time. These weights are a composition of all eigen portfolios that satisfy the required properties.

static standardize_data(matrix: pandas.DataFrame)

A function to standardize data (returns) that is being fed into the PCA.

The standardized returns (R) are calculated as:

R_standardized = (R - mean(R)) / st.d.(R)

Parameters:

matrix – (pd.DataFrame) DataFrame with returns that need to be standardized.

Returns:

(pd.DataFrame, pd.Series) A tuple with two elements: DataFrame with standardized returns and Series of standard deviations.
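The standardization formula above can be sketched with plain pandas. This is a minimal illustration, not the library's implementation: the helper name `standardize_returns` and the sample data are hypothetical, and the use of the sample standard deviation (pandas default, ddof=1) is an assumption.

```python
import numpy as np
import pandas as pd

def standardize_returns(returns: pd.DataFrame):
    """Column-wise z-score of returns, plus the per-asset standard deviations."""
    # Assumption: sample standard deviation (ddof=1), the pandas default.
    std = returns.std()
    standardized = (returns - returns.mean()) / std
    return standardized, std

# Hypothetical sample data: 100 observations of 3 assets.
rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.normal(0.0, 0.01, size=(100, 3)), columns=["A", "B", "C"])
standardized, std = standardize_returns(returns)
```

After this step every column has zero mean and unit standard deviation, while `std` keeps the original scale of each asset so it can be reused later.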

get_factorweights(matrix: pandas.DataFrame) → pandas.DataFrame

A function to calculate weights (scaled eigen vectors) to use for factor return calculation.

Weights are calculated from PCA components as:

Weight = Eigen vector / st.d.(R)

So the output is a dataframe containing the weight for each asset in a portfolio for each eigen vector.

Parameters:

matrix – (pd.DataFrame) Dataframe with index and columns containing asset returns.

Returns:

(pd.DataFrame) Weights (scaled PCA components) for each index from the matrix.
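As a rough sketch of the weight formula above, the eigenvectors of the correlation matrix can be divided by each asset's standard deviation. The function name `factor_weights`, the use of `numpy.linalg.eigh` instead of a PCA class, and the simulated data are assumptions made for illustration only.

```python
import numpy as np

def factor_weights(returns: np.ndarray, n_components: int) -> np.ndarray:
    """Return an (n_components, n_assets) array of volatility-scaled eigenvectors."""
    std = returns.std(axis=0, ddof=1)
    corr = np.corrcoef(returns, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)           # eigenvalues come back in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # indices of the largest eigenvalues
    components = eigvecs[:, order].T                  # one row per eigenvector
    return components / std                           # Weight = Eigen vector / st.d.(R)

# Hypothetical data: 4 assets driven by one common factor plus noise.
rng = np.random.default_rng(1)
market = rng.normal(0.0, 0.01, size=(500, 1))
returns = market + rng.normal(0.0, 0.005, size=(500, 4))
weights = factor_weights(returns, n_components=2)
factor_returns = returns @ weights.T                  # one factor return series per component
```

Multiplying the returns by the transposed weights gives the factor return series that the residual regression takes as input.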

get_residuals(matrix: pandas.DataFrame, pca_factorret: pandas.DataFrame)

A function to calculate residuals given matrix of returns and factor returns.

First, for each asset in a portfolio, we fit its returns to PCA factor returns as:

Returns = beta_0 + beta * PCA_factor_return + residual

Residuals are used to generate trading signals and beta coefficients are used as weights to later construct eigenportfolios for each asset.

Parameters:
  • matrix – (pd.DataFrame) Dataframe with index and columns containing asset returns.

  • pca_factorret – (pd.DataFrame) Dataframe with PCA factor returns for assets.

Returns:

(pd.DataFrame, pd.Series) Dataframe with residuals and series of beta coefficients.
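The regression step can be illustrated with an ordinary least squares fit in NumPy. This is a sketch under assumptions: `ols_residuals` is a hypothetical helper, and fitting all assets at once through `numpy.linalg.lstsq` is a choice made here rather than a documented detail of the library.

```python
import numpy as np

def ols_residuals(returns: np.ndarray, factor_returns: np.ndarray):
    """Regress every asset's returns on the factor returns, with an intercept."""
    n_obs = returns.shape[0]
    design = np.column_stack([np.ones(n_obs), factor_returns])  # first column fits beta_0
    coefs, *_ = np.linalg.lstsq(design, returns, rcond=None)
    residuals = returns - design @ coefs
    betas = coefs[1:]                                           # drop beta_0: (n_factors, n_assets)
    return residuals, betas

# Hypothetical data: 3 assets built from 2 factors with known betas.
rng = np.random.default_rng(2)
factor_returns = rng.normal(0.0, 0.01, size=(60, 2))
true_betas = np.array([[1.0, 0.5, -0.3], [0.2, 0.0, 0.8]])
returns = factor_returns @ true_betas + rng.normal(0.0, 0.001, size=(60, 3))
residuals, betas = ols_residuals(returns, factor_returns)
```

Because the regression includes an intercept, the residuals of each asset sum to zero over the estimation window.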

static get_sscores(residuals: pandas.DataFrame, k: float) → pandas.Series

A function to calculate S-scores for asset eigen portfolios given dataframes of residuals and a mean reversion speed threshold.

From residuals, a discrete version of the OU process is created for each asset eigen portfolio.

If the OU process of the asset shows a mean reversion speed above the given threshold k, it can be traded and the S-score is calculated for it.

The output of this function is a series of S-scores that are directly used to determine if the eigen portfolio of a given asset should be traded at this period.

The original paper advises that the characteristic mean reversion time (1/k, in years) be shorter than half of the window used for residual estimation. With a 60-day window, half of it is 30 days, so k > 252/30 = 8.4 (assuming 252 trading days in a year).

Parameters:
  • residuals – (pd.DataFrame) Dataframe with residuals after fitting returns to PCA factor returns.

  • k – (float) Required speed of mean reversion to use the eigen portfolio in trading.

Returns:

(pd.Series) Series of S-scores for each asset for a given residual dataframe.
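One way to picture the S-score calculation for a single asset is an AR(1) fit on the cumulative residuals. The sketch below is an assumption-laden illustration: the helper `s_score`, the use of `numpy.polyfit` for the AR(1) regression, and returning NaN for non-tradable series are choices made here, not confirmed details of the library.

```python
import numpy as np

def s_score(residuals: np.ndarray, k: float, periods_per_year: int = 252) -> float:
    """S-score of one residual series, or NaN if mean reversion is not faster than k."""
    x = np.cumsum(residuals)                    # discrete version of the OU process
    b, a = np.polyfit(x[:-1], x[1:], deg=1)     # AR(1) fit: X[t+1] = a + b * X[t] + noise
    if not 0.0 < b < 1.0:
        return float("nan")                     # no usable mean reversion
    kappa = -np.log(b) * periods_per_year       # annualised mean reversion speed
    if kappa <= k:
        return float("nan")                     # too slow to trade
    noise = x[1:] - (a + b * x[:-1])
    m = a / (1.0 - b)                           # equilibrium level
    sigma_eq = np.sqrt(noise.var(ddof=1) / (1.0 - b ** 2))
    return float((x[-1] - m) / sigma_eq)

# Hypothetical residuals: increments of a simulated mean-reverting series.
rng = np.random.default_rng(3)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + rng.normal(0.0, 0.01)
residuals = np.diff(x, prepend=0.0)
score = s_score(residuals, k=8.4)
```

Raising `k` makes the filter stricter: a series whose estimated speed does not exceed the threshold yields NaN instead of a score.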

get_signals(matrix: pandas.DataFrame, k: float = 8.4, corr_window: int = 252, residual_window: int = 60, sbo: float = 1.25, sso: float = 1.25, ssc: float = 0.5, sbc: float = 0.75, size: float = 1) → pandas.DataFrame

A function to generate trading signals for given returns matrix with parameters.

First, the correlation matrix used to get PCA components is calculated over the last corr_window observations. From this, we get weights to calculate PCA factor returns. These weights are recalculated after every residual_window generated signals.

It is expected that corr_window > residual_window. In the original paper, corr_window is set to 252 days and residual_window is set to 60 days. So with corr_window == 252, the first 252 observations will be used for estimation and the first signal will be generated for the 253rd observation.

Next, we pick the last (residual_window) observations to compute PCA factor returns and regress the asset returns from the same (residual_window) observations on them to get residuals and regression coefficients.
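The window bookkeeping described above can be sketched as an index schedule. The generator `rolling_schedule` is hypothetical and its exact re-estimation rule is an assumption based on the description; the library's internal loop may differ.

```python
def rolling_schedule(n_obs: int, corr_window: int = 252, residual_window: int = 60):
    """Yield (t, corr_slice, resid_slice, refit) for every observation that gets a signal."""
    for t in range(corr_window, n_obs):
        # Assumption: factor weights are re-estimated after every residual_window signals.
        refit = (t - corr_window) % residual_window == 0
        corr_slice = slice(t - corr_window, t)        # only used when refit is True
        resid_slice = slice(t - residual_window, t)   # last residual_window observations
        yield t, corr_slice, resid_slice, refit

schedule = list(rolling_schedule(n_obs=400))
first = schedule[0]
refit_points = [t for t, _, _, refit in schedule if refit]
```

With the default windows, the first entry has zero-based index 252, which is the 253rd observation, and the weights are refreshed every 60 signals after that.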

Based on the residuals, the S-scores are calculated as:

s_i = (X_i(t) - m_i) / sigma_i

where X_i(t) is the OU process generated from the residuals, and m_i and sigma_i are the estimated equilibrium mean and standard deviation of this process.

The S-score is calculated only for eigen portfolios that show mean reversion speed above the given threshold k.

The original paper advises that the characteristic mean reversion time (1/k, in years) be shorter than half of the window used for residual estimation. With a 60-day window, half of it is 30 days, so k > 252/30 = 8.4 (assuming 252 trading days in a year).
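The threshold arithmetic above can be written as a small helper. The function name `min_reversion_speed` is hypothetical and only restates the calculation from the text for other window lengths.

```python
def min_reversion_speed(residual_window: int, periods_per_year: int = 252) -> float:
    """Smallest mean reversion speed k for a given residual estimation window."""
    half_window = residual_window / 2      # reversion time must be shorter than this
    return periods_per_year / half_window

k_default = min_reversion_speed(60)        # 60-day window from the original paper
k_longer = min_reversion_speed(120)        # a longer window allows slower reversion
```

A longer residual window therefore lowers the threshold, which admits eigen portfolios that revert more slowly.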

So, we can have mean-reverting eigen portfolios for each asset in our portfolio. But this portfolio is worth investing in only if it shows good mean reversion speed and the S-score has deviated enough from its mean value. Based on this logic we pick promising eigen portfolios and invest in them. The trading signals we get are the target weights for each of the assets in our portfolio at any given time.

Trading rules to enter a mean-reverting portfolio based on the S-score are:

  • Enter a long position if s-score < −sbo

  • Close a long position if s-score > −ssc

  • Enter a short position if s-score > +sso

  • Close a short position if s-score < +sbc

The authors empirically chose the optimal values for the above parameters based on stock prices for years 2000-2004 as: sbo = sso = 1.25; sbc = 0.75; ssc = 0.5.
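The entry and exit rules can be read as a small state machine over one asset's S-scores. The function `positions_from_sscores` is a hypothetical sketch; in particular, not re-opening a position in the same step in which one is closed is an assumption made here.

```python
def positions_from_sscores(sscores, sbo=1.25, sso=1.25, ssc=0.5, sbc=0.75):
    """Map a sequence of S-scores to positions: +1 long, -1 short, 0 flat."""
    position, out = 0, []
    for s in sscores:
        if position == 0:
            if s < -sbo:
                position = 1        # enter long
            elif s > sso:
                position = -1       # enter short
        elif position == 1 and s > -ssc:
            position = 0            # close long
        elif position == -1 and s < sbc:
            position = 0            # close short
        out.append(position)
    return out

positions = positions_from_sscores([0.0, -1.5, -1.0, -0.4, 1.3, 1.0, 0.7, 0.2])
```

In this example the long position opened at -1.5 is held through -1.0 and closed at -0.4, and the short opened at 1.3 is closed once the score drops to 0.7.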

Opening a long position on an eigen portfolio means buying one dollar of the corresponding asset and selling beta_i1 dollars of weights of other assets from component1, beta_i2 dollars of weights of other assets from component2 and so on. Opening a short position means selling the corresponding asset and buying betas of other assets.
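The composition of one eigen portfolio into per-asset target weights might look as follows. This is a sketch under assumptions: the helper `eigen_portfolio`, the array layout of `betas` and `weights`, and the tiny numeric example are all hypothetical.

```python
import numpy as np

def eigen_portfolio(asset: int, betas: np.ndarray, weights: np.ndarray,
                    side: int, size: float = 1.0) -> np.ndarray:
    """Target asset weights for a long (side=+1) or short (side=-1) eigen portfolio.

    betas:   (n_factors, n_assets) regression coefficients
    weights: (n_factors, n_assets) factor weights (scaled eigenvectors)
    """
    hedge = betas[:, asset] @ weights   # exposure to each asset through the factors
    target = -hedge                     # sell the factor exposure
    target[asset] += 1.0                # buy one unit of the asset itself
    return side * size * target

# Hypothetical numbers: one factor, two assets.
betas = np.array([[0.5, 1.0]])
weights = np.array([[0.2, 0.3]])
long_weights = eigen_portfolio(0, betas, weights, side=1)
short_weights = eigen_portfolio(0, betas, weights, side=-1)
```

Summing such vectors over all eigen portfolios that are currently open would give a combined target weight per asset.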

Parameters:
  • matrix – (pd.DataFrame) Dataframe with returns for assets.

  • k – (float) Required speed of mean reversion to use the eigen portfolio in trading.

  • corr_window – (int) Look-back window used for correlation matrix estimation.

  • residual_window – (int) Look-back window used for residuals calculation.

  • sbo – (float) S-score threshold to open a long position (enter when s-score < −sbo).

  • sso – (float) S-score threshold to open a short position (enter when s-score > +sso).

  • ssc – (float) S-score threshold to close a long position (close when s-score > −ssc).

  • sbc – (float) S-score threshold to close a short position (close when s-score < +sbc).

  • size – (float) Number of units invested in assets when opening trades. When opening a long position, the strategy buys (size) units of the stock and sells (size) * betas units of other stocks.

Returns:

(pd.DataFrame) DataFrame with target weights for each asset at every observation. It is calculated as a combination of all eigen portfolios that satisfy the mean reversion speed requirement and the S-score conditions.