Mispricing Index Copula Trading Strategy



Note

The following strategy closely follows the implementations:

Note

The authors claimed relatively robust returns of 8-10% from this strategy in the formation period (6 months). We are fairly confident that the rules proposed in the paper were implemented correctly in the MPICopulaTradingRule module, with thorough unit testing of every possible case, so logical mistakes are very unlikely. However, the P&L is very sensitive to the opening and exiting logic, the parameter values, the input data and the choice of copula, and we could not reproduce the claimed returns after trying all possible interpretations of the ambiguities.

We found that using AND for opening and OR for exiting leads to a much less sensitive strategy that generally performs much better, and we provide this as an option in the module.

We still provide this module for those who want to explore what copulas can offer, but users should be aware of the nature of the proposed framework. Interested readers may go through the Possible Issues section to see where this strategy can be improved, and are encouraged to modify the source code, where the opening and exiting logic is grouped in one function for ease of alteration.

Introduction to the Strategy Concepts

For convenience, the mispricing index used in the strategy will be referred to as MPI when no ambiguity arises.

A Quick Review of the Basic Copula Strategy

Before we introduce the MPI strategy, let’s recall the basic copula strategy and understand its pros and cons.

The basic copula strategy proposed by [Liew et al. 2013] works with price series (or log-price series, which has identical suggested trading signals under the copula framework) and looks at conditional probabilities. For example, if \(P(X \le x_t | Y = y_t)\) is small for stock pair \((X, Y)\) with prices \((x_t, y_t)\), then stock \(X\) is considered undervalued given the current price of \(Y\). Then we derive long/short positions regarding the spread based on this conditional probability. This conditional probability is calculated from some copula fitted to the training data, and generally other models cannot produce this value.

Although this approach is reasonably sound in general, it has one critical drawback: price series are in general not stationary. For example, if one assumes that stock prices follow a lognormal process (an assumption that can itself be situational; we won't go into the details here), then almost surely the price will eventually leave any fixed price range.

This implies that if the basic copula framework is applied to two stocks with an upward (or downward) drift during the trading period, prices may move outside the range seen in the training period. The conditional probabilities calculated from such prices will then sit at the extreme values \(0\) or \(1\), producing meaningless trading signals. One way to mitigate this is to keep the training set up to date, so that new prices are less likely to fall out of range. Another is to work with a time series that is more likely to be stationary, for example returns.
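To make the out-of-range problem concrete, here is a minimal sketch on a hypothetical, deterministic upward-drifting series (not real data). It uses a plain step ECDF written for this illustration, not any library function. Every test-period price lies above the training range, so its quantile saturates at 1, while the returns of the same series still spread across the unit interval.

```python
import bisect
import math


def ecdf(train):
    """Return a step empirical CDF built from a training sample (illustrative only)."""
    data = sorted(train)
    n = len(data)
    # Fraction of training points <= x; saturates at 0 or 1 outside the sample range.
    return lambda x: bisect.bisect_right(data, x) / n


# Hypothetical series: 0.5% daily drift plus a bounded deterministic wiggle.
returns = [0.005 + 0.01 * math.sin(t) for t in range(400)]
prices = [100.0]
for r in returns[1:]:
    prices.append(prices[-1] * (1 + r))

train, test = slice(0, 200), slice(300, 400)

price_cdf = ecdf(prices[train])
return_cdf = ecdf(returns[train])

price_q = [price_cdf(p) for p in prices[test]]
return_q = [return_cdf(r) for r in returns[test]]

print(set(price_q))                       # every test price maps to 1.0
print(round(min(return_q), 3), round(max(return_q), 3))
```

Because of the drift, the price quantiles carry no information in the test period, whereas the return quantiles remain usable.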

How is the MPI Strategy Constructed?

At first glance, the MPI strategy documented in [Xie et al. 2016] looks quite bizarre. However, it is reasonably consistent once one follows the logic of its construction: to use returns to generate trading signals, one needs to be creative about how the information is used. Knowing the dependence structure of a pair of stocks is one thing; trading on it is another, because stocks are intrinsically traded on prices, not returns.

If one regards using conditional probabilities as a distance measure, then it is natural to think about how far the returns have cumulatively driven the prices apart (or together), thereby introducing trading opportunities.

Hence we introduce the following concepts for the strategy framework:

Mispricing Index

MPI is defined as the conditional probability of returns, i.e.,

\[MI_t^{X\mid Y} = P(R_t^X < r_t^X \mid R_t^Y = r_t^Y)\]
\[MI_t^{Y\mid X} = P(R_t^Y < r_t^Y \mid R_t^X = r_t^X)\]

for stocks \((X, Y)\), where \((R_t^X, R_t^Y)\) are the return random variables on day \(t\) and \((r_t^X, r_t^Y)\) are the observed returns on day \(t\). These two values measure how mispriced each stock is, based on that day's return. So far only a single day's return contributes; we want to accumulate this information so that returns gauge how mispriced the stocks are over time. Therefore we introduce the flag series:

Flag and Raw Flag

A more descriptive name than flag, in my opinion, would be cumulative mispricing index. The raw flag series (with a star) is the cumulative sum of the daily MPIs, each centered by subtracting 0.5, i.e.,

\[FlagX^*(t) = FlagX^*(t-1) + (MI_t^{X\mid Y} - 0.5), \quad FlagX^*(0) = 0.\]
\[FlagY^*(t) = FlagY^*(t-1) + (MI_t^{Y\mid X} - 0.5), \quad FlagY^*(0) = 0.\]

Or equivalently

\[FlagX^*(t) = \sum_{s=1}^t (MI_s^{X\mid Y} - 0.5)\]
\[FlagY^*(t) = \sum_{s=1}^t (MI_s^{Y\mid X} - 0.5)\]

If one plots the raw flag series, they look quite similar to the cumulative returns of the price series, which is what they were designed to do: accumulate information from daily returns to reflect information about prices. You may therefore consider them a fancy way to represent the returns series.
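A minimal sketch of the raw flag recursion, assuming the daily MPIs for one stock are already available as a list of floats. `raw_flags` is a hypothetical helper for illustration, not part of the module; the first output element corresponds to \(t = 1\), since \(Flag^*(0) = 0\).

```python
from itertools import accumulate


def raw_flags(mpis):
    """Raw flags Flag*(1), ..., Flag*(T) from daily MPIs, starting from Flag*(0) = 0."""
    # Each step adds the centered MPI, following Flag*(t) = Flag*(t-1) + (MI_t - 0.5).
    return list(accumulate(mi - 0.5 for mi in mpis))


flags_x = raw_flags([0.9, 0.8, 0.3, 0.5])
print([round(f, 10) for f in flags_x])  # [0.4, 0.7, 0.5, 0.5]
```

An MPI of exactly 0.5 (the return is "as expected" given the other stock) leaves the flag unchanged.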

However, the real flag series (without a star, \(FlagX(t)\), \(FlagY(t)\)) will be reset to 0 whenever there is an exiting signal, which brings us to the trading logic.

Trading Logic

Default Opening and Exiting Rules

The authors propose a dollar-neutral trade scheme worded as follows:

Suppose stock \(X\), \(Y\) are associated with \(FlagX\), \(FlagY\) respectively.

Opening rules: (\(D = 0.6\) in the paper)

  • When \(FlagX\) reaches \(D\), short \(X\) and buy \(Y\) in equal amounts. (\(-1\) Position)

  • When \(FlagX\) reaches \(-D\), short \(Y\) and buy \(X\) in equal amounts. (\(1\) Position)

  • When \(FlagY\) reaches \(D\), short \(Y\) and buy \(X\) in equal amounts. (\(1\) Position)

  • When \(FlagY\) reaches \(-D\), short \(X\) and buy \(Y\) in equal amounts. (\(-1\) Position)

Exiting rules: (\(S = 2\) in the paper)

  • If trades are opened based on \(FlagX\), then they are closed if \(FlagX\) returns to zero or reaches stop-loss position \(S\) or \(-S\).

  • If trades are opened based on \(FlagY\), then they are closed if \(FlagY\) returns to zero or reaches stop-loss position \(S\) or \(-S\).

  • After trades are closed, both \(FlagX\) and \(FlagY\) are reset to \(0\).

The rationale behind the dollar-neutral choice might be (the authors did not mention this) that, because the signals are generated from returns, it makes sense to “reset” returns when entering a long/short position.

Ambiguities

The authors did not specify what will happen if the following occurs:

  1. When \(FlagX\) reaches \(D\) (or \(-D\)) and \(FlagY\) reaches \(D\) (or \(-D\)) at the same time.

  2. When a short (or long) trigger is received while in a long (or short) position.

  3. When an opening and an exiting signal are received at the same time.

  4. When the position was opened based on \(FlagX\) (or \(FlagY\)) and \(FlagY\) (or \(FlagX\)) reaches \(S\) or \(-S\).

Here is our take on the above issues:

  1. Do nothing.

  2. Change to the trigger position. For example, a long position with a short trigger will go short.

  3. Go for the exiting signal.

  4. Do nothing.
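The default rules together with the choices above can be sketched as a small state machine. This is a hypothetical illustration, not the module's `get_positions_and_flags`. It makes three assumptions that should be read as our own interpretations: "returns to zero" means the opening flag crosses back through zero, choice 1 applies when the two flags give opposite signals, and when both flags fire in the same direction the trade is attributed to \(FlagX\).

```python
def or_or_positions(mpi_x, mpi_y, d=0.6, s=2.0):
    """Sketch of the default OR-OR rules plus the ambiguity choices listed above.

    Returns (positions, flags), with positions in {-1, 0, 1} and flags as
    (FlagX, FlagY) tuples recorded after each day's update.
    """
    flag_x = flag_y = 0.0
    position, based_on, side = 0, None, 0
    positions, flags = [], []
    for mx, my in zip(mpi_x, mpi_y):
        flag_x += mx - 0.5
        flag_y += my - 0.5

        # Exit: only the flag that opened the trade is checked (choice 4).
        # Assumption: "returns to zero" means crossing back through zero.
        if position != 0:
            f = flag_x if based_on == "x" else flag_y
            if f * side <= 0 or abs(f) >= s:
                position, based_on, side = 0, None, 0
                flag_x = flag_y = 0.0               # reset BOTH flags
                positions.append(position)
                flags.append((flag_x, flag_y))
                continue                            # exit wins over open (choice 3)

        # Opening triggers as (flag name, sign of the flag that fired).
        short_hits = [h for h, hit in ((("x", 1), flag_x >= d), (("y", -1), flag_y <= -d)) if hit]
        long_hits = [h for h, hit in ((("x", -1), flag_x <= -d), (("y", 1), flag_y >= d)) if hit]

        if short_hits and long_hits:
            pass                                    # conflicting triggers: do nothing (choice 1)
        elif short_hits and position != -1:
            position = -1                           # open short, or flip from long (choice 2)
            based_on, side = short_hits[0]          # X takes precedence if both fire (arbitrary)
        elif long_hits and position != 1:
            position = 1
            based_on, side = long_hits[0]

        positions.append(position)
        flags.append((flag_x, flag_y))
    return positions, flags
```

Note that in this sketch a flip from long to short (choice 2) does not reset the flags; only an exit does, which is one possible reading of the rules.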

Choices for Open and Exit Logic

The above default logic is essentially an OR-OR logic for opening and exiting: when at least one of the 4 opening conditions is satisfied, an opening signal (long or short) is triggered; similarly, only one of the exit conditions needs to be satisfied to exit. The opening trigger is in general too sensitive and leads to too many trades, and [Rad et al. 2016] suggested using AND-OR logic instead. For more flexibility, we allow the user to choose AND or OR for both the opening and the exit logic, giving 4 possible combinations. Based on our tests, AND-OR is the most reasonable choice in general, but in certain situations other choices may have an edge.

The default is OR-OR, as suggested in [Xie et al. 2014], and you can switch to other logic in the get_positions_and_flags method by setting open_rule and exit_rule to your own liking. For instance open_rule='and', exit_rule='or'.
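To show how the choice of `open_rule` changes when a position opens, here is a hypothetical helper that combines the same four opening conditions as the default rules with either AND or OR. It is an illustration only: it does not cover the exit rule, and the module's internal logic may differ in detail.

```python
def open_signal(flag_x, flag_y, d=0.6, rule="or"):
    """Hypothetical helper: combine the per-stock opening conditions.

    Returns -1 (short X / long Y), 1 (long X / short Y) or 0 (no signal).
    """
    if rule not in ("and", "or"):
        raise ValueError("rule must be 'and' or 'or'")
    combine = all if rule == "and" else any
    short = combine([flag_x >= d, flag_y <= -d])    # X looks rich and/or Y looks cheap
    long_ = combine([flag_x <= -d, flag_y >= d])    # X looks cheap and/or Y looks rich
    if short and long_:
        return 0                                    # conflicting signals: no new signal
    return -1 if short else (1 if long_ else 0)


print(open_signal(0.7, 0.1, rule="or"))     # -1: FlagX alone is enough
print(open_signal(0.7, 0.1, rule="and"))    # 0: FlagY has not confirmed
print(open_signal(0.7, -0.65, rule="and"))  # -1: both stocks agree
```

Requiring both stocks to agree is what makes AND opening much less trigger-happy than OR.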

Note

There are some nuances in how the logic is carried out. In the paper [Xie et al. 2014], the authors track which stock led to the opening of a position, and this influences the exit. This tracking procedure makes no sense for the other 3 trading logics. The variable open_based_on, present in lower-level (Python-private) functions, tracks which stock triggered the last opening. It is therefore not used with the other logics (it is still calculated, but its value is likely meaningless).

Samples drawn from the various fitted copulas, plotted with the empirical density of the BKD and ESC training data.

Visualized output of the normalized prices, flags, positions and units to hold, using a Student-t copula. The stock pair considered is BKD and ESC.

Implementation

Module that uses copulas for a trading strategy based on the (cumulative) mispricing index.

class MPICopulaTradingRule(opening_triggers: tuple = (-0.6, 0.6), stop_loss_positions: tuple = (-2, 2))

Copula trading strategy based on the mispricing index (MPI).

This strategy uses mispricing indices from a pair of stocks to form positions. It is more specific than the original BasicCopulaTradingRule, as its logic is built upon the return series of the stocks, not their price series. Specifically, it uses flag series, defined as the cumulative centered mispricing index, with certain reset conditions to form positions. An important note is that the flag series are not uniquely defined by the authors' description: in some cases the reset conditions depend on whether resetting or opening a position takes priority. In this implementation, resetting has the highest priority. If one wishes to change the precedence, it is in the method _get_position_and_reset_flag.

The implementation is based on the following paper: Xie, W., Liew, R.Q., Wu, Y. and Zou, X., 2014. Pairs Trading with Copulas.

Compared to the original BasicCopulaTradingRule class, it includes the following fundamental functionalities:

  1. Convert price series to return series.

  2. Calculate MPI and flags (essentially cumulative mispricing index).

  3. Use flags to form positions.

__init__(opening_triggers: tuple = (-0.6, 0.6), stop_loss_positions: tuple = (-2, 2))

Initialize an MPICopulaTradingRule instance.

One can choose to initialize it with no arguments, and later set a copula as the system's copula.

Parameters:
  • opening_triggers – (tuple) Optional. The thresholds for MPI to trigger a long/short position for the pair’s trading framework. Format is (long trigger, short trigger). Defaults to (-0.6, 0.6).

  • stop_loss_positions – (tuple) Optional. One of the conditions for MPI to trigger an exiting trading signal. Defaults to (-2, 2).

calc_mpi(returns: DataFrame) -> DataFrame

Calculate mispricing indices from returns.

Mispricing indices are conditional cumulative probabilities calculated from a copula fitted to returns data, i.e., MPI_1(r1, r2) = P(R1 <= r1 | R2 = r2), where r1, r2 are the values of the returns for the two stocks. Similarly, MPI_2(r1, r2) = P(R2 <= r2 | R1 = r1).

Parameters:

returns – (pd.DataFrame) Return data frame for the stock pair.

Returns:

(pd.DataFrame) Mispricing indices for the pair of stocks.
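As an illustration of what these conditional probabilities look like, here is a minimal sketch assuming a bivariate Gaussian copula with correlation `rho`. The Gaussian copula is chosen only because its conditional distribution has a closed form; calc_mpi works with whatever copula has been set. `gaussian_mpi` is a hypothetical helper, and the inputs must be marginal quantiles strictly inside (0, 1), otherwise `NormalDist.inv_cdf` raises an error.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gaussian_mpi(u, v, rho):
    """P(U <= u | V = v) for a bivariate Gaussian copula with correlation rho.

    u, v are the marginal quantiles (e.g. ECDF values) of the two stocks' returns.
    """
    x = STD_NORMAL.inv_cdf(u)
    y = STD_NORMAL.inv_cdf(v)
    return STD_NORMAL.cdf((x - rho * y) / (1.0 - rho ** 2) ** 0.5)


# Hypothetical day: X's return sits at its 20% quantile, Y's at its 80% quantile.
u_x, u_y, rho = 0.2, 0.8, 0.7
mi_x_given_y = gaussian_mpi(u_x, u_y, rho)   # small: X looks cheap given Y
mi_y_given_x = gaussian_mpi(u_y, u_x, rho)   # large: Y looks rich given X
print(round(mi_x_given_y, 4), round(mi_y_given_x, 4))
```

With positively correlated returns, a weak day for X combined with a strong day for Y pushes MPI_1 towards 0 and MPI_2 towards 1.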

get_condi_probs(quantile_data: DataFrame) -> DataFrame

Get conditional probabilities given the data. The input data needs to be in quantiles, and the system should have a fitted copula to use. Make sure the quantile data does not have any NaN values.

Parameters:

quantile_data – (pd.DataFrame) Data frame in quantiles with two columns.

Returns:

(pd.DataFrame) The conditional probabilities calculated.

get_positions_and_flags(returns: DataFrame, init_pos: int = 0, enable_reset_flag: bool = True, open_rule: str = 'or', exit_rule: str = 'or') -> (Series, DataFrame)

Get the positions and flag series based on returns series.

Flags are defined as the cumulative, centered MPIs, i.e., flag(t) = flag(t-1) + (mpi(t) - 0.5). Note that flags reset when an exiting signal is present, so the flag series is not a Markov chain, i.e., it depends on history. This method first calculates the MPIs from the return series, then loops through the MPI series to form the flag series and positions. Suppose the upper opening trigger is D_u and the lower opening trigger is D_l, and the stop-loss has upper threshold slp_u and lower threshold slp_l.

For the open OR and exit OR logic (method default) as described in [Xie et al. 2014], it goes as follows:

  • If flag1 >= D_u, short stock 1 and long stock 2. i.e., position = -1;

  • If flag1 <= D_l, short stock 2 and long stock 1. i.e., position = 1;

  • If flag2 >= D_u, short stock 2 and long stock 1. i.e., position = 1;

  • If flag2 <= D_l, short stock 1 and long stock 2. i.e., position = -1;

  • If trades are opened based on flag1, then exit if flag1 returns to 0, or reaches slp_u or slp_l;

  • If trades are opened based on flag2, then exit if flag2 returns to 0, or reaches slp_u or slp_l;

  • Once an exit trigger is activated, then BOTH flag1 and flag2 are reset to 0.

We also implemented OR-AND, AND-OR and AND-AND options for the open-exit logic. For these three options, the method does not keep track of which stock opened the position, since that makes no logical sense. The AND-OR logic is the one more often used in other literature, such as [Rad et al. 2016], and is much more stable.

Note 1: The original description of the strategy in the paper states that the position should be interpreted as dollar neutral, i.e., buying stock A and selling stock B in equal dollar amounts. This method does not calculate the ratios for forming such positions; it still uses -1, 1, 0 to indicate short, long and no position, as we think this offers the user more flexibility.

Note 2: The positions calculated on a given day correspond to the information available on THAT DAY. Thus, for forming an equity curve, backtesting or actual trading, one should shift the positions forward by at least one period.

Parameters:
  • returns – (pd.DataFrame) Returns data frame for the stock pair.

  • init_pos – (int) Optional. Initial position. Takes value 0, 1, -1, corresponding to no position, long or short. Defaults to 0.

  • enable_reset_flag – (bool) Optional. Whether allowing the flag series to be reset by exit triggers. Defaults to True.

  • open_rule – (str) Optional. The logic for deciding to open a position from combining mispricing info from the two stocks. Choices are [‘and’, ‘or’]. ‘and’ means both stocks need to be mispriced to justify an opening. ‘or’ means only one stock needs to be mispriced to open a position. Defaults to ‘or’.

  • exit_rule – (str) Optional. The logic for deciding to exit a position from combining mispricing info from the two stocks. Choices are [‘and’, ‘or’]. ‘and’ means both stocks need to be considered to justify an exit. ‘or’ means only one stock needs to be considered to exit a position. Defaults to ‘or’.

Returns:

(pd.Series, pd.DataFrame) The calculated position series in a pd.Series, and the two flag series in a pd.DataFrame.
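Following Note 2, a minimal pandas sketch of shifting the positions forward by one period before using them for backtesting. The positions below are hypothetical stand-ins for the series returned by get_positions_and_flags.

```python
import pandas as pd

# Hypothetical positions, each computed from information available on that day.
positions = pd.Series([0, 1, 1, 0, -1, -1, 0],
                      index=pd.date_range("2020-01-01", periods=7, freq="D"))

# Act on the NEXT period: shift forward by one and treat the first period as flat.
positions_to_trade = positions.shift(1, fill_value=0)
print(positions_to_trade.tolist())  # [0, 0, 1, 1, 0, -1, -1]
```

Using `fill_value=0` keeps the integer dtype and avoids a leading NaN.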

static positions_to_units_dollar_neutral(prices_df: DataFrame, positions: Series, multiplier: float = 1) -> DataFrame

Change the positions series into units held for each security for a dollar neutral strategy.

Originally, the positions calculated by this strategy take values in {0, 1, -1}. To actually trade the dollar-neutral strategy given by the authors in the paper, one needs to know at any given time how many units of each stock to hold. The result is returned in a pd.DataFrame. The user can also scale the final result through the multiplier input; by default the calculation uses 1 dollar. There is also no reinvestment of gains.

Note: This method assumes the 0th column in prices_df is the long unit (call it stock 1) and the 1st column is the short unit (call it stock 2). For example, 1 in positions means buying stock 1 with 0.5 dollar and selling stock 2 to gain 0.5 dollar.

Note 2: Short units are given with their actual sign, i.e., shorting 0.54 units is given as -0.54 in the output.

Parameters:
  • prices_df – (pd.DataFrame) Prices data frame for the two securities.

  • positions – (pd.Series) The suggested positions with values in {0, 1, -1}. Need to have the same length as prices_df.

  • multiplier – (float) Optional. Multiply the calculated result by this amount. Defaults to 1.

Returns:

(pd.DataFrame) The calculated positions for each security. The row and column index will be taken from prices_df.
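A plain-Python sketch of the dollar-neutral conversion described above, for illustration only. `units_dollar_neutral` is a hypothetical function, not the module's implementation. It assumes each leg gets 0.5 × multiplier dollars, sized at the price on the day the position changes, and that those units are held unchanged (no rebalancing or reinvestment) until the next change.

```python
def units_dollar_neutral(prices_1, prices_2, positions, multiplier=1.0):
    """Hypothetical sketch: units of (stock 1, stock 2) for a dollar-neutral pair.

    Each leg gets 0.5 * multiplier dollars, sized at the price on the day the
    position changes, then held unchanged until the next change.
    """
    units = []
    held = (0.0, 0.0)
    prev = 0
    for p1, p2, pos in zip(prices_1, prices_2, positions):
        if pos != prev:
            half = 0.5 * multiplier
            # pos = 1: long stock 1, short stock 2; pos = -1: the reverse; 0: flat.
            held = (pos * half / p1, -pos * half / p2)
        units.append(held)
        prev = pos
    return units


print(units_dollar_neutral([10, 20, 25, 20], [50, 40, 50, 40], [0, 1, 1, 0], multiplier=100))
```

Short units come out negative, matching Note 2 above.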

set_cdf(cdf_x: Callable[[float], float], cdf_y: Callable[[float], float])

Set the marginal C.D.F. functions, which transform X, Y values into probabilities; usually ECDFs are used. One can use the construct_ecdf_lin function from the copula_calculations module.

Parameters:
  • cdf_x – (func) Marginal C.D.F. for series X.

  • cdf_y – (func) Marginal C.D.F. for series Y.
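For intuition about what a linearly interpolated ECDF does, here is a minimal NumPy sketch. `ecdf_lin` is a hypothetical stand-in; the library's construct_ecdf_lin may handle ties and the boundaries differently. Callables like this are what set_cdf expects, and they should be built from training data only.

```python
import numpy as np


def ecdf_lin(train):
    """Hypothetical linearly interpolated ECDF fitted on a training sample."""
    xs = np.sort(np.asarray(train, dtype=float))
    ys = np.arange(1, len(xs) + 1) / len(xs)   # ECDF value i/n at the i-th sorted point
    # np.interp is flat outside the sample: 1/n below the minimum, 1.0 above the maximum.
    return lambda x: np.interp(x, xs, ys)


cdf_x = ecdf_lin([0.01, -0.02, 0.03, 0.0])
print(cdf_x(-0.01), cdf_x(0.02), cdf_x(0.5))
```

The flat extension outside the sample is exactly where out-of-range inputs saturate, so values far beyond the training range all map to the same quantile.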

set_copula(copula: object)

Set a fitted copula as self.copula.

Parameters:

copula – (object) Fitted copula object.

static to_returns(pair_prices: DataFrame, fill_init_nan: Sequence[float] = (0, 0)) -> DataFrame

Convert a pair’s prices DataFrame to its returns DataFrame.

Simple returns are defined as: r(t) = P(t) / P(t-1) - 1.

Note that the 0th row will be NaN and needs to be filled.

Parameters:
  • pair_prices – (pd.DataFrame) Prices data frame of the stock pair.

  • fill_init_nan – (Sequence[float]) Optional. What to fill the NaN value at the initial row. Defaults to (0, 0).

Returns:

(pd.DataFrame) Returns data frame for the stock pair.
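A short pandas sketch of the same formula on hypothetical prices, with the initial NaN row filled the way fill_init_nan=(0, 0) describes. This illustrates the definition and is not the library's code.

```python
import pandas as pd

prices = pd.DataFrame({"BKD": [10.0, 11.0, 9.9], "ESC": [20.0, 19.0, 19.95]})
fill_init_nan = (0, 0)

# r(t) = P(t) / P(t-1) - 1; the first row has no previous price and is NaN.
returns = prices / prices.shift(1) - 1
returns.iloc[0] = list(fill_init_nan)   # fill the initial row
print(returns.round(4))
```

Writing the ratio explicitly, rather than calling `pct_change`, keeps the formula visible and avoids any implicit forward-filling of missing prices.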

Example

# Importing the module and other libraries
from arbitragelab.trading.copula_strategy_mpi import MPICopulaTradingRule
from arbitragelab.copula_approach import construct_ecdf_lin
from arbitragelab.copula_approach.archimedean import N14
import matplotlib.pyplot as plt
import pandas as pd

# Instantiating the module
CSMPI = MPICopulaTradingRule(opening_triggers=(-0.6, 0.6), stop_loss_positions=(-2, 2))

# Loading the data in prices of stock X and stock Y
prices = pd.read_csv('FILE_PATH' + 'stock_X_Y_prices.csv').set_index('Date').dropna()

# Convert prices to returns
returns = CSMPI.to_returns(prices)

# Split data into train and test sets
training_len = int(len(prices) * 0.7)
returns_train = returns.iloc[:training_len, :]
returns_test = returns.iloc[training_len:, :]
prices_train = prices.iloc[:training_len, :]
prices_test = prices.iloc[training_len:, :]

# Adding the N14 copula (it can be fitted with tools from the Copula Approach)
cop = N14(theta=2)
CSMPI.set_copula(cop)

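# Constructing the marginal CDFs for x and y from the training data only,
# so that no test-period information leaks into the model
cdf_x = construct_ecdf_lin(returns_train['BKD'])
cdf_y = construct_ecdf_lin(returns_train['ESC'])
CSMPI.set_cdf(cdf_x, cdf_y)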

# Forming positions and flags using trading data, assuming holding no position initially.
# Default uses OR-OR logic for open-exit.
positions, flags = CSMPI.get_positions_and_flags(returns=returns_test)

# Use AND-OR logic.
positions_and_or, flags_and_or = CSMPI.get_positions_and_flags(returns=returns_test,
                                                               open_rule='and',
                                                               exit_rule='or')

# Changing the positions series to units to hold for
# a dollar-neutral strategy for $10000 investment
units = CSMPI.positions_to_units_dollar_neutral(prices_df=prices_test,
                                                positions=positions,
                                                multiplier=10000)

Possible Issues

The following are critiques of the default strategy. For a thorough comparison of the AND-OR strategy with other common strategies, across a large number of stocks over several decades, read [Rad et al. 2016].

  1. The default strategy’s outcome is quite sensitive to the values of opening and exiting triggers to the point that a well-fitted copula with a not-so-good set of parameters can actually lose money.

  2. The trading signal is generated from the flag series, which is calculated from the copula used for modeling, so explainability suffers. Moreover, the signal depends on the model only indirectly (the copula gives the MPIs, which are then accumulated into flags), so the flag series and the suggested positions can differ considerably across copulas, making them unstable and not directly comparable with each other.

  3. The way the flag series are defined does not handle well the case where both stocks are underpriced/overpriced concurrently.

  4. Because flags are reset to 0 whenever there is an exiting signal, the strategy implicitly models the returns as martingales that do not depend on the current price level of the stock itself or of the other stock. Such an assumption may be situational, and the user should be aware of it. (White noise returns do not imply that the prices are well cointegrated.)

  5. The strategy bets on the flag series exhibiting dominant mean-reverting behavior for a pair of cointegrated stocks. It is not mathematically clear what justifies this rationale.

  6. If accumulating mispricing index is basically using returns to reflect prices, and the raw flags look basically the same as normalized prices, why not just directly use normalized prices instead?

Research Notebooks

The following research notebook can be used to better understand the copula strategy described above.


References