arbitragelab.trading.copula_strategy_mpi
Module that implements a copula trading strategy based on the (cumulative) mispricing index.
Module Contents
Classes
MPICopulaTradingRule: Copula trading strategy based on mispricing index (MPI).
- class MPICopulaTradingRule(opening_triggers: tuple = (-0.6, 0.6), stop_loss_positions: tuple = (-2, 2))
Copula trading strategy based on mispricing index (MPI).
This strategy uses the mispricing indices of a pair of stocks to form positions. It is more specific than the original BasicCopulaStrategy because its logic is built on return series, not price series. It forms positions from flag series, defined as the cumulative centered mispricing index, subject to certain reset conditions. An important caveat is that the flag series are not uniquely defined by the authors’ description: in some cases the outcome depends on whether resetting the flags or opening a position takes priority. In this implementation, as CopulaStrategyMPI, the reset has the highest priority. To change the precedence, modify the method _get_position_and_reset_flag.
The implementation is based on the following paper: Xie, W., Liew, R.Q., Wu, Y. and Zou, X., 2014. Pairs Trading with Copulas.
Compared to the original BasicCopulaTradingRule class, it includes the following fundamental functionalities:
Convert price series to return series.
Calculate MPI and flags (essentially cumulative mispricing index).
Use flags to form positions.
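The three steps above correspond to a fixed call order on the class. A minimal sketch of that order, assuming `rule` is an MPICopulaTradingRule instance and that the copula and the marginal CDFs were fitted elsewhere; the helper name `run_mpi_rule` is hypothetical and not part of the library:

```python
def run_mpi_rule(rule, fitted_copula, cdf_x, cdf_y, pair_prices):
    """Drive an MPICopulaTradingRule-like object through its documented methods.

    `run_mpi_rule` is an illustrative helper, not a library function.
    """
    # 1. Attach the fitted copula and the two marginal CDFs.
    rule.set_copula(fitted_copula)
    rule.set_cdf(cdf_x, cdf_y)

    # 2. Convert prices to returns (the strategy works on returns, not prices).
    returns = rule.to_returns(pair_prices)

    # 3. Compute MPIs, accumulate flags and form positions.
    positions, flags = rule.get_positions_and_flags(returns)
    return positions, flags
```

Only method names and arguments listed on this page are used; how the copula and CDFs are fitted is outside the scope of this sketch.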
- set_copula(copula: object)
Set the fitted copula as self.copula.
- Parameters:
copula – (object) Fitted copula object.
- set_cdf(cdf_x: Callable[[float], float], cdf_y: Callable[[float], float])
Set the marginal C.D.F. functions, which transform X and Y values into probabilities. Empirical C.D.F.s (ECDFs) are usually used; the construct_ecdf_lin function from the copula_calculations module can build them.
- Parameters:
cdf_x – (func) Marginal C.D.F. for series X.
cdf_y – (func) Marginal C.D.F. for series Y.
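To illustrate what a marginal C.D.F. callable looks like, here is a minimal step-function ECDF in plain Python. It is a sketch only: `make_ecdf` and its `eps` clamp are hypothetical names, and this is not how construct_ecdf_lin is implemented.

```python
from bisect import bisect_right


def make_ecdf(sample, eps=1e-5):
    """Return a callable x -> P(X <= x) estimated from `sample`.

    The output is clamped to [eps, 1 - eps] because probabilities of exactly
    0 or 1 map to infinite quantiles for many copula families.
    """
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")

    def ecdf(x):
        # Fraction of observations less than or equal to x.
        prob = bisect_right(ordered, x) / n
        return min(max(prob, eps), 1.0 - eps)

    return ecdf
```

Any pair of callables with this shape (float in, probability out) matches the `Callable[[float], float]` signature expected by set_cdf.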
- static to_returns(pair_prices: pandas.DataFrame, fill_init_nan: Sequence[float] = (0, 0)) pandas.DataFrame
Convert a pair’s prices DataFrame to its returns DataFrame.
The (excess) return is defined as r(t) = P(t) / P(t-1) - 1.
Note that the 0th row will be NaN and needs to be filled.
- Parameters:
pair_prices – (pd.DataFrame) Prices data frame of the stock pair.
fill_init_nan – (Sequence[float]) Optional. Values used to fill the NaN entries in the initial row. Defaults to (0, 0).
- Returns:
(pd.DataFrame) Returns data frame for the stock pair.
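The return formula can be checked by hand on a single price series. The sketch below uses plain Python lists instead of a DataFrame; `to_simple_returns` is a hypothetical helper, not the library method.

```python
def to_simple_returns(prices, fill_init=0.0):
    """Compute r(t) = P(t) / P(t-1) - 1 for one price series.

    The first return is undefined (there is no P(-1)), so it is filled with
    `fill_init`, mirroring the role of `fill_init_nan`.
    """
    if not prices:
        return []
    returns = [fill_init]
    for prev, cur in zip(prices, prices[1:]):
        returns.append(cur / prev - 1.0)
    return returns
```

For prices 100, 110, 99 this gives returns of approximately 0, 0.1 and -0.1.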
- calc_mpi(returns: pandas.DataFrame) pandas.DataFrame
Calculate mispricing indices from returns.
Mispricing indices are cumulative conditional probabilities calculated from a copula fitted on returns data: MPI_1(r1, r2) = P(R1 <= r1 | R2 = r2), where r1 and r2 are the returns of the two stocks. Similarly, MPI_2(r1, r2) = P(R2 <= r2 | R1 = r1).
- Parameters:
returns – (pd.DataFrame) Return data frame for the stock pair.
- Returns:
(pd.DataFrame) Mispricing indices for the pair of stocks.
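For one concrete copula family the conditional probability has a closed form. The sketch below assumes a Gaussian copula with correlation `rho`, which is an assumption made for illustration; the class itself works with whichever fitted copula is passed to set_copula. The inputs `u1` and `u2` are the quantiles of the two returns, i.e. the marginal C.D.F.s applied to r1 and r2.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def gaussian_condi_prob(u1, u2, rho):
    """P(U1 <= u1 | U2 = u2) under a Gaussian copula with correlation rho.

    Requires 0 < u1, u2 < 1 and -1 < rho < 1; inv_cdf raises otherwise.
    """
    z1 = _STD_NORMAL.inv_cdf(u1)
    z2 = _STD_NORMAL.inv_cdf(u2)
    # Conditional on Z2 = z2, Z1 is normal with mean rho * z2
    # and variance 1 - rho**2.
    return _STD_NORMAL.cdf((z1 - rho * z2) / sqrt(1.0 - rho * rho))
```

With positive `rho`, a high `u2` pulls MPI_1 below `u1`: stock 1's return looks low relative to what stock 2's return suggests. Swapping the arguments gives MPI_2.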
- get_condi_probs(quantile_data: pandas.DataFrame) pandas.DataFrame
Get conditional probabilities given the data. The input data needs to be in quantiles, and a copula must already be fitted. Make sure the quantile data does not contain any NaN values.
- Parameters:
quantile_data – (pd.DataFrame) Data frame in quantiles with two columns.
- Returns:
(pd.DataFrame) The conditional probabilities calculated.
- static positions_to_units_dollar_neutral(prices_df: pandas.DataFrame, positions: pandas.Series, multiplier: float = 1) pandas.DataFrame
Change the positions series into units held for each security for a dollar neutral strategy.
The positions calculated by this strategy take values in {0, 1, -1}. To actually trade the dollar neutral strategy described by the authors in the paper, one needs to know how many units of each stock to hold at any given time. The result is returned in a pd.DataFrame. The user can scale the final result through the multiplier input. By default the calculation uses 1 dollar, which also means there is no reinvestment of gains.
Note: This method assumes the 0th column in prices_df is the long unit (call it stock 1) and the 1st column is the short unit (call it stock 2). For example, 1 in positions means buying stock 1 with 0.5 dollar and selling stock 2 to gain 0.5 dollar.
Note 2: Short units are given as signed values, i.e., short 0.54 units is given as -0.54 in the output.
- Parameters:
prices_df – (pd.DataFrame) Prices data frame for the two securities.
positions – (pd.Series) The suggested positions with values in {0, 1, -1}. Need to have the same length as prices_df.
multiplier – (float) Optional. Multiply the calculated result by this amount. Defaults to 1.
- Returns:
(pd.DataFrame) The calculated positions for each security. The row and column index will be taken from prices_df.
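The arithmetic behind the dollar neutral conversion can be sketched in plain Python. This version assumes units are recomputed only when the position changes and are held constant otherwise, which is one reading of "no reinvestment on gains"; `dollar_neutral_units` is a hypothetical helper and may differ from the library's behavior in details.

```python
def dollar_neutral_units(prices, positions, multiplier=1.0):
    """Convert positions in {0, 1, -1} into signed units for two securities.

    prices: list of (price_1, price_2) tuples, one per time step.
    positions: list of 0, 1 or -1 with the same length as `prices`.
    """
    if len(prices) != len(positions):
        raise ValueError("prices and positions must have the same length")

    half = 0.5 * multiplier  # dollars allocated to each leg
    units, current, prev_pos = [], (0.0, 0.0), 0
    for (p1, p2), pos in zip(prices, positions):
        if pos != prev_pos:
            if pos == 0:
                current = (0.0, 0.0)
            else:
                # pos = 1: long stock 1, short stock 2; pos = -1: the reverse.
                current = (pos * half / p1, -pos * half / p2)
        units.append(current)
        prev_pos = pos
    return units
```

With prices (10, 25) at the moment a long position opens, the result is 0.05 units of stock 1 and -0.02 units of stock 2, i.e. 0.5 dollar on each leg.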
- get_positions_and_flags(returns: pandas.DataFrame, init_pos: int = 0, enable_reset_flag: bool = True, open_rule: str = 'or', exit_rule: str = 'or')
Get the positions and flag series based on returns series.
Flags are defined as the cumulative, centered MPIs, i.e., flag(t) = flag(t-1) + (mpi(t) - 0.5). Note that flags reset when an exit signal is present, so the flag series is not a Markov chain: it depends on history. This method first calculates the MPIs from the return series, then loops through the MPI series to form the flag series and positions. Suppose the upper opening trigger is D_u, the lower opening trigger is D_l, and the stop-loss has upper threshold slp_u and lower threshold slp_l.
For the open OR and exit OR logic (method default) as described in [Xie et al. 2014], it goes as follows:
If flag1 >= D_u, short stock 1 and long stock 2. i.e., position = -1;
If flag1 <= D_l, short stock 2 and long stock 1. i.e., position = 1;
If flag2 >= D_u, short stock 2 and long stock 1. i.e., position = 1;
If flag2 <= D_l, short stock 1 and long stock 2. i.e., position = -1;
If trades are open based on flag1, then exit if flag1 returns to 0, or reaches slp_u or slp_l;
If trades are open based on flag2, then exit if flag2 returns to 0, or reaches slp_u or slp_l;
Once an exit trigger is activated, then BOTH flag1 and flag2 are reset to 0.
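The open OR, exit OR rules above can be sketched as a single loop over precomputed MPI pairs. This is a simplified illustration under stated assumptions, not the library's implementation: "returns to 0" is read as the flag touching or crossing zero, exit checks take precedence over opening, flag1 is checked before flag2 when opening, and `or_or_positions` is a hypothetical name.

```python
def or_or_positions(mpis, open_triggers=(-0.6, 0.6), stop_loss=(-2.0, 2.0)):
    """Form positions and flags from (mpi_1, mpi_2) pairs, open OR / exit OR."""
    d_l, d_u = open_triggers
    slp_l, slp_u = stop_loss
    flag1 = flag2 = 0.0
    position, opened_by = 0, None  # opened_by: 0 for flag1, 1 for flag2
    positions, flags = [], []

    for mpi1, mpi2 in mpis:
        prev = (flag1, flag2)
        flag1 += mpi1 - 0.5  # accumulate the centered MPI
        flag2 += mpi2 - 0.5
        cur = (flag1, flag2)

        if position != 0:
            f_prev, f_cur = prev[opened_by], cur[opened_by]
            back_to_zero = f_prev * f_cur <= 0  # touched or crossed zero
            stopped_out = f_cur >= slp_u or f_cur <= slp_l
            if back_to_zero or stopped_out:
                position, opened_by = 0, None
                flag1 = flag2 = 0.0  # an exit resets BOTH flags
        elif flag1 >= d_u:
            position, opened_by = -1, 0  # short stock 1, long stock 2
        elif flag1 <= d_l:
            position, opened_by = 1, 0   # long stock 1, short stock 2
        elif flag2 >= d_u:
            position, opened_by = 1, 1   # long stock 1, short stock 2
        elif flag2 <= d_l:
            position, opened_by = -1, 1  # short stock 1, long stock 2

        positions.append(position)
        flags.append((flag1, flag2))
    return positions, flags
```

Because the flags are reset on exit, the same MPI value can lead to different positions depending on what happened earlier, which is the history dependence noted above.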
We also implemented OR-AND, AND-OR and AND-AND options for the open-exit logic. For those three options, the method does not keep track of which stock opened the position, since doing so makes no logical sense. The AND-OR logic is used more often in other literature, such as [Rad et al. 2016], and is much more stable.
Note 1: The original description of the strategy in the paper states that the position should be interpreted as dollar neutral, i.e., buying stock A and selling stock B in equal dollar amounts. This method does not calculate ratios for forming positions; it uses -1, 1 and 0 to indicate short, long and no position, as we think this offers better flexibility for the user.
Note 2: The positions calculated on a certain day correspond to information available on THAT DAY. Thus, when forming an equity curve, backtesting or actually trading, one should forward-roll the positions by at least 1.
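The forward roll in Note 2 amounts to shifting the position series by one or more steps so that a position decided with day t's information is first held on day t+1. A minimal sketch on a plain list; `lag_positions` is a hypothetical helper:

```python
def lag_positions(positions, lag=1):
    """Shift positions forward by `lag` steps, padding the start with 0."""
    if lag < 1:
        raise ValueError("lag must be at least 1 to avoid look-ahead bias")
    # Prepend `lag` flat (0) positions and keep the original length.
    return ([0] * lag + list(positions))[:len(positions)]
```

For example, positions 0, 1, 1, -1 become 0, 0, 1, 1 with the default lag of 1.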
- Parameters:
returns – (pd.DataFrame) Returns data frame for the stock pair.
init_pos – (int) Optional. Initial position. Takes value 0, 1, -1, corresponding to no position, long or short. Defaults to 0.
enable_reset_flag – (bool) Optional. Whether to allow the flag series to be reset by exit triggers. Defaults to True.
open_rule – (str) Optional. The logic for deciding to open a position by combining mispricing info from the two stocks. Choices are [‘and’, ‘or’]. ‘and’ means both stocks need to be mispriced to justify an opening. ‘or’ means only one stock needs to be mispriced to open a position. Defaults to ‘or’.
exit_rule – (str) Optional. The logic for deciding to exit a position by combining mispricing info from the two stocks. Choices are [‘and’, ‘or’]. ‘and’ means both stocks need to be considered to justify an exit. ‘or’ means only one stock needs to be considered to exit a position. Defaults to ‘or’.
- Returns:
(pd.Series, pd.DataFrame) The calculated position series in a pd.Series, and the two flag series in a pd.DataFrame.