arbitragelab.copula_approach.mixed_copulas.ctg_mix_copula

Module that implements Clayton, Student-t and Gumbel mixed copula.

Module Contents

Classes

CTGMixCop

Clayton, Student-t and Gumbel mixed copula.

class CTGMixCop(cop_params: tuple = None, weights: tuple = None)

Bases: arbitragelab.copula_approach.mixed_copulas.base.MixedCopula

Clayton, Student-t and Gumbel mixed copula.

Mixed copula for the trading strategy described in the following article: Sabino da Silva, F., Ziegelmann, F. and Caldeira, J., 2017. Mixed Copula Pairs Trading Strategy on the S&P 500.
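As background, a mixed copula of this kind is usually defined as a convex combination of its component copulas. The formula below is written in our own notation (w_C, w_t, w_G for the weights; θ_C, ρ, ν, θ_G for the component parameters). How these symbols map onto the entries of cop_params and weights in this class is an assumption, not something stated in this reference.

```latex
\begin{aligned}
C_{\mathrm{mix}}(u, v) &= w_C \, C_{\mathrm{Clayton}}(u, v; \theta_C) + w_t \, C_{t}(u, v; \rho, \nu) + w_G \, C_{\mathrm{Gumbel}}(u, v; \theta_G), \\
c_{\mathrm{mix}}(u, v) &= w_C \, c_{\mathrm{Clayton}}(u, v; \theta_C) + w_t \, c_{t}(u, v; \rho, \nu) + w_G \, c_{\mathrm{Gumbel}}(u, v; \theta_G), \\
& \text{with } w_C + w_t + w_G = 1 \text{ and } w_C, w_t, w_G \ge 0.
\end{aligned}
```

The Clayton component contributes lower-tail dependence, the Gumbel component upper-tail dependence, and the Student-t component symmetric tail dependence, which is the usual motivation for combining the three.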

Note: The algorithm for fitting the mixed copula was adapted from

Cai, Z. and Wang, X., 2014. Selection of mixed copula model via penalized likelihood. Journal of the American Statistical Association, 109(506), pp.788-801.

__slots__ = ()
fit(data: pandas.DataFrame, max_iter: int = 25, gamma_scad: float = 0.6, a_scad: float = 6, weight_margin: float = 0.01) -> float

Fit cop_params and weights to real data by expectation maximization (EM).

Updates the mixed copula's weights and copula parameters in place and returns the sum of the log-likelihood. The data are first converted to quantiles (pseudo-observations) using the empirical cumulative distribution function.
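The conversion to quantiles can be sketched with NumPy as follows. This is a minimal sketch, assuming the empirical CDF is rescaled by n/(n+1) so that no value reaches exactly 1; the library's own ECDF construction may differ (for example, it may interpolate between observations), and the helper names to_pseudo_obs and to_quantiles are hypothetical.

```python
import numpy as np


def to_pseudo_obs(x):
    """Map a 1-D sample into (0, 1) with the empirical CDF, rescaled by n / (n + 1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Number of observations <= each x_i, i.e. n * ECDF(x_i); ties share the same value
    counts = np.searchsorted(np.sort(x), x, side="right")
    # Dividing by n + 1 instead of n keeps the largest value strictly below 1
    return counts / (n + 1)


def to_quantiles(data):
    """Apply the transform column by column to an (n, 2) array of prices or returns."""
    data = np.asarray(data, dtype=float)
    return np.column_stack([to_pseudo_obs(data[:, j]) for j in range(data.shape[1])])
```

For a two-column pandas DataFrame df, to_quantiles(df.to_numpy()) gives the pseudo-observations.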

The EM implementation is based on a non-parametric adaptation of the article: Cai, Z. and Wang, X., 2014. Selection of mixed copula model via penalized likelihood. Journal of the American Statistical Association, 109(506), pp.788-801.

It consists of the following steps:

  1. The expectation step computes and updates the weights conditional on the current copula parameters, using an iterative method.

  2. The maximization step maximizes an adapted log-likelihood function Q, which includes penalty terms, over cop_params given the weights. It uses a truncated Newton method, implemented as a minimization (of the negative of Q) over cop_params.
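To make the expectation step concrete, here is a sketch of the classic, unpenalized EM update of mixture weights for fixed copula parameters. It is not the library's code: the method described above adds SCAD penalty terms following Cai and Wang (2014), which change the update, and em_weight_step is a hypothetical helper. The final thresholding only mimics the role that weight_margin plays.

```python
import numpy as np


def em_weight_step(densities, weights, weight_margin=0.01):
    """One classic (unpenalized) EM update of mixture weights.

    densities: (n, k) array; column j holds c_j(u_i, v_i) for component j,
               evaluated at the current copula parameters.
    weights:   (k,) array of current mixture weights summing to 1.
    """
    densities = np.asarray(densities, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # E-step: posterior probability that observation i came from component j
    weighted = densities * weights
    resp = weighted / weighted.sum(axis=1, keepdims=True)
    # Weight update: each new weight is the average responsibility of its component
    new_weights = resp.mean(axis=0)
    # Treat negligible weights as 0 and renormalize, in the spirit of weight_margin
    new_weights[new_weights < weight_margin] = 0.0
    return new_weights / new_weights.sum()
```

In a full EM loop this update would alternate with the maximization step over cop_params. In SciPy, one truncated Newton option is scipy.optimize.minimize(..., method="TNC") applied to the negative of the objective; whether the library uses exactly this routine is an assumption.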

Note: The final result is relatively sensitive to the values of the tuning parameters gamma_scad and a_scad. The default values were tested on a limited number of data sets of stock price series and return series, so the user is expected to tune them when necessary, for example by cross-validation. As a sanity check, it is good practice to plot samples drawn from the fitted copula against the actual data.

Parameters:
  • data – (pd.DataFrame) Data in (n, 2) pd.DataFrame used to fit the mixed copula.

  • max_iter – (int) Optional. Maximum number of iterations for the EM method. The default of 25 is an empirical choice, and the user is expected to change it when needed.

  • gamma_scad – (float) Optional. Tuning parameter for the SCAD penalty term. Defaults to 0.6.

  • a_scad – (float) Optional. Tuning parameter for the SCAD penalty term. Defaults to 6.

  • weight_margin – (float) Optional. Threshold below which a weight is considered 0. Defaults to 1e-2.
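gamma_scad and a_scad presumably correspond to the λ and a parameters of the SCAD (smoothly clipped absolute deviation) penalty introduced by Fan and Li (2001), which Cai and Wang (2014) apply to the mixture weights. The sketch below evaluates the standard SCAD penalty; the mapping of gamma_scad to λ and a_scad to a is an assumption based on the parameter names.

```python
import numpy as np


def scad_penalty(x, gamma, a):
    """SCAD penalty of Fan and Li (2001), evaluated elementwise; requires a > 2."""
    x = np.abs(np.asarray(x, dtype=float))
    # Region 1: lasso-like linear penalty for small values
    linear = gamma * x
    # Region 2: quadratic piece joining the linear and constant regions continuously
    quadratic = (2 * a * gamma * x - x ** 2 - gamma ** 2) / (2 * (a - 1))
    # Region 3: constant, so large values are not shrunk further
    constant = np.full_like(x, gamma ** 2 * (a + 1) / 2)
    return np.where(x <= gamma, linear, np.where(x <= a * gamma, quadratic, constant))
```

The penalty grows linearly for small arguments, which pushes small weights toward zero, and becomes constant beyond a·λ, so large weights are not shrunk.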

Returns:

(float) Sum of log likelihood for the fit.

describe() -> pandas.Series

Describe the components and coefficients of the mixed copula.

The description includes a descriptive name, the class name, and the dependency parameter and weight of each mixed copula component.

Returns:

(pd.Series) The description of the specific mixed copula.

get_cop_density(u: float, v: float, eps: float = 1e-05) -> float

Calculate the probability density of the bivariate copula: P(U=u, V=v).

The result is analytical. u and v are first remapped into [eps, 1-eps] to avoid edge values that may result in infinity or NaN.

Parameters:
  • u – (float) A real number in [0, 1].

  • v – (float) A real number in [0, 1].

  • eps – (float) Optional. Distance from the boundaries 0 and 1; values of u and v closer to a boundary than this are mapped back into [eps, 1-eps]. Defaults to 1e-5.

Returns:

(float) The probability density (aka copula density).
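The density of a mixed copula is the weighted sum of its component densities. The NumPy sketch below shows this for the Clayton and Gumbel components, together with the [eps, 1-eps] remapping. The Student-t term is omitted to keep the sketch NumPy-only (its density needs the Student-t quantile function); the function names and the two-weight signature are hypothetical and not this class's API.

```python
import numpy as np


def clayton_density(u, v, theta):
    """Clayton copula density, theta > 0."""
    s = u ** -theta + v ** -theta - 1
    return (1 + theta) * (u * v) ** (-1 - theta) * s ** (-1 / theta - 2)


def gumbel_density(u, v, theta):
    """Gumbel copula density, theta >= 1."""
    x, y = -np.log(u), -np.log(v)
    a = (x ** theta + y ** theta) ** (1 / theta)
    cdf = np.exp(-a)
    return cdf / (u * v) * (x * y) ** (theta - 1) * a ** (1 - 2 * theta) * (a + theta - 1)


def mixed_density(u, v, weights, theta_clayton, theta_gumbel, eps=1e-5):
    """Weighted sum of component densities (Student-t term omitted in this sketch)."""
    # Remap inputs into [eps, 1 - eps] so that 0 or 1 cannot yield inf or NaN
    u = np.clip(u, eps, 1 - eps)
    v = np.clip(v, eps, 1 - eps)
    w_clayton, w_gumbel = weights
    return (w_clayton * clayton_density(u, v, theta_clayton)
            + w_gumbel * gumbel_density(u, v, theta_gumbel))
```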

get_cop_eval(u: float, v: float, eps: float = 0.0001) -> float

Calculate the cumulative distribution function of the bivariate copula: P(U<=u, V<=v).

The result is analytical except for the Student-t copula. u and v are first remapped into [eps, 1-eps] to avoid edge values that may result in infinity or NaN.

Parameters:
  • u – (float) A real number in [0, 1].

  • v – (float) A real number in [0, 1].

  • eps – (float) Optional. Distance from the boundaries 0 and 1; values of u and v closer to a boundary than this are mapped back into [eps, 1-eps]. Defaults to 1e-4.

Returns:

(float) The cumulative probability.
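Similarly, the mixed CDF is the weighted sum of the component CDFs. Clayton and Gumbel have closed forms, as sketched below; the Student-t copula CDF has no closed form and needs numerical evaluation of the bivariate t distribution, which is why the result above is not fully analytical. The Student-t term is omitted here, and the function names are hypothetical.

```python
import numpy as np


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v), theta > 0."""
    return (u ** -theta + v ** -theta - 1) ** (-1 / theta)


def gumbel_cdf(u, v, theta):
    """Gumbel copula C(u, v), theta >= 1."""
    x, y = -np.log(u), -np.log(v)
    return np.exp(-((x ** theta + y ** theta) ** (1 / theta)))


def mixed_cdf(u, v, weights, theta_clayton, theta_gumbel, eps=1e-4):
    """Weighted sum of component CDFs (Student-t term omitted in this sketch)."""
    # Remap inputs into [eps, 1 - eps] before evaluating
    u = np.clip(u, eps, 1 - eps)
    v = np.clip(v, eps, 1 - eps)
    w_clayton, w_gumbel = weights
    return (w_clayton * clayton_cdf(u, v, theta_clayton)
            + w_gumbel * gumbel_cdf(u, v, theta_gumbel))
```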

get_condi_prob(u: float, v: float, eps: float = 1e-05) -> float

Calculate the conditional probability function: P(U<=u | V=v).

The result is analytical. u and v are first remapped into [eps, 1-eps] to avoid edge values that may result in infinity or NaN.

Note: This probability is symmetric about (u, v).

Parameters:
  • u – (float) A real number in [0, 1].

  • v – (float) A real number in [0, 1].

  • eps – (float) Optional. Distance from the boundaries 0 and 1; values of u and v closer to a boundary than this are mapped back into [eps, 1-eps]. Defaults to 1e-5.

Returns:

(float) The conditional probability.
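P(U<=u | V=v) is the partial derivative ∂C(u, v)/∂v. Because differentiation is linear, the mixed copula's conditional probability is the weighted sum of the components' partial derivatives. A sketch for the Clayton and Gumbel components, with the Student-t term omitted and hypothetical function names:

```python
import numpy as np


def clayton_condi_cdf(u, v, theta):
    """P(U <= u | V = v) = dC/dv for the Clayton copula, theta > 0."""
    s = u ** -theta + v ** -theta - 1
    return v ** (-theta - 1) * s ** (-1 / theta - 1)


def gumbel_condi_cdf(u, v, theta):
    """P(U <= u | V = v) = dC/dv for the Gumbel copula, theta >= 1."""
    x, y = -np.log(u), -np.log(v)
    s = x ** theta + y ** theta
    return np.exp(-(s ** (1 / theta))) / v * y ** (theta - 1) * s ** (1 / theta - 1)


def mixed_condi_prob(u, v, weights, theta_clayton, theta_gumbel, eps=1e-5):
    """Weighted sum of component conditional CDFs (Student-t term omitted)."""
    # Remap inputs into [eps, 1 - eps] before evaluating
    u = np.clip(u, eps, 1 - eps)
    v = np.clip(v, eps, 1 - eps)
    w_clayton, w_gumbel = weights
    return (w_clayton * clayton_condi_cdf(u, v, theta_clayton)
            + w_gumbel * gumbel_condi_cdf(u, v, theta_gumbel))
```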

sample(num: int) -> numpy.array

Generate pairs according to P.D.F., stored in a 2D np.array of shape (num, 2).

Parameters:

num – (int) Number of points to generate.

Returns:

(np.array) Shape=(num, 2) array, sampled data for this copula.
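One common way to sample a mixed copula is to draw a component label for each point according to the weights and then sample from that component. The sketch below uses conditional inversion for Clayton and the Marshall-Olkin algorithm (with Kanter's representation of a positive stable variable) for Gumbel. It omits the Student-t component, the library's sampler may use different algorithms, and all names are hypothetical.

```python
import numpy as np


def sample_clayton(n, theta, rng):
    """Clayton pairs by conditional inversion (theta > 0)."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    # Solve P(V <= v | U = u) = w for v in closed form
    v = ((w ** (-theta / (1 + theta)) - 1) * u ** -theta + 1) ** (-1 / theta)
    return np.column_stack([u, v])


def sample_gumbel(n, theta, rng):
    """Gumbel pairs by the Marshall-Olkin algorithm (theta >= 1)."""
    alpha = 1.0 / theta
    # Positive stable variable with Laplace transform exp(-t**alpha) (Kanter's representation)
    phi = rng.uniform(0.0, np.pi, size=n)
    e = rng.exponential(size=n)
    s = (np.sin(alpha * phi) / np.sin(phi) ** (1 / alpha)
         * (np.sin((1 - alpha) * phi) / e) ** ((1 - alpha) / alpha))
    # U_i = psi(E_i / S) with the Gumbel generator psi(t) = exp(-t**alpha)
    e_pair = rng.exponential(size=(n, 2))
    return np.exp(-((e_pair / s[:, None]) ** alpha))


def sample_mixture(num, weights, theta_clayton, theta_gumbel, seed=None):
    """Draw a component label per point from the weights, then sample that component."""
    rng = np.random.default_rng(seed)
    labels = rng.choice(2, size=num, p=weights)
    pairs = np.empty((num, 2))
    is_clayton = labels == 0
    pairs[is_clayton] = sample_clayton(int(is_clayton.sum()), theta_clayton, rng)
    pairs[~is_clayton] = sample_gumbel(int((~is_clayton).sum()), theta_gumbel, rng)
    return pairs
```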

static theta_hat(tau: float) -> float

Calculate theta hat from Kendall’s tau of the sample data.

Parameters:

tau – (float) Kendall’s tau from sample data.

Returns:

(float) The associated theta hat for this copula.
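For single-family copulas, theta hat is typically obtained by inverting the closed-form relationship between Kendall's tau and the dependence parameter. The three inversions below are standard results for the component families; which of them, if any, this mixed-copula class uses in theta_hat is not stated above, so treat this as background rather than the library's implementation. The function names are hypothetical.

```python
import math


def clayton_theta_from_tau(tau):
    # Kendall's tau of the Clayton copula is theta / (theta + 2)
    return 2 * tau / (1 - tau)


def gumbel_theta_from_tau(tau):
    # Kendall's tau of the Gumbel copula is 1 - 1 / theta
    return 1 / (1 - tau)


def student_t_rho_from_tau(tau):
    # For elliptical copulas such as Student-t, tau = (2 / pi) * arcsin(rho)
    return math.sin(math.pi * tau / 2)
```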

get_log_likelihood_sum(u: numpy.array, v: numpy.array) -> float

Get the sum of the log-likelihood values.

Parameters:
  • u – (np.array) 1D vector of X pseudo-observations. Must be uniformly distributed on [0, 1].

  • v – (np.array) 1D vector of Y pseudo-observations. Must be uniformly distributed on [0, 1].

Returns:

(float) Log-likelihood sum value.
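The log-likelihood sum is Σ log c(u_i, v_i) over the paired pseudo-observations. A minimal sketch, assuming a vectorized density callable; the density parameter and the Clayton example are hypothetical stand-ins for the mixed copula density.

```python
import numpy as np


def log_likelihood_sum(u, v, density):
    """Sum of log copula densities over paired pseudo-observations.

    density: hypothetical vectorized callable density(u, v) -> array of densities.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.sum(np.log(density(u, v))))


def clayton_density(u, v, theta=2.0):
    # Example density: Clayton copula with theta = 2
    s = u ** -theta + v ** -theta - 1
    return (1 + theta) * (u * v) ** (-1 - theta) * s ** (-1 / theta - 2)
```

When comparing fits on the same data, the parameter set with the larger sum fits better in the likelihood sense.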

c(u: float, v: float) -> float

Placeholder for calculating copula density.

Parameters:
  • u – (float) A real number in [0, 1].

  • v – (float) A real number in [0, 1].

C(u: float, v: float) -> float

Placeholder for calculating copula evaluation.

Parameters:
  • u – (float) A real number in [0, 1].

  • v – (float) A real number in [0, 1].

condi_cdf(u: float, v: float) -> float

Placeholder for calculating copula conditional probability.

Parameters:
  • u – (float) A real number in [0, 1].

  • v – (float) A real number in [0, 1].

plot_cdf(plot_type: str = '3d', grid_size: int = 50, levels: list = None, **kwargs) -> matplotlib.pyplot.axis

Plot either ‘3d’ or ‘contour’ plot of copula CDF.

Parameters:
  • plot_type – (str) Either ‘3d’ or ‘contour’(2D) plot.

  • grid_size – (int) Mesh grid granularity.

  • levels – (list) List of float values that determine the number and levels of lines in a contour plot. If not provided, these are calculated automatically.

  • kwargs – (dict) User-specified params for ‘ax.plot_surface’/‘plt.contour’.

Returns:

(plt.axis) Axis object.
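The grid_size parameter controls how finely the unit square is sampled for plotting. A sketch of the grid evaluation, assuming interior points are used so that 0 and 1 are avoided; cdf_grid is a hypothetical helper, not part of this class.

```python
import numpy as np


def cdf_grid(cdf, grid_size=50):
    """Evaluate a vectorized copula CDF on a grid_size x grid_size mesh in (0, 1)^2."""
    # Interior points only, so boundary values 0 and 1 are never evaluated
    axis = np.linspace(0, 1, grid_size + 2)[1:-1]
    u, v = np.meshgrid(axis, axis)
    return u, v, cdf(u, v)
```

The resulting arrays can be passed to Matplotlib, for example ax.contour(u, v, z, levels=levels) for a contour plot, or ax.plot_surface(u, v, z) on an axis created with projection="3d".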

plot_scatter(num_points: int = 100) -> matplotlib.axes.Axes

Plot a scatter plot of pseudo-observations generated from the copula.

Parameters:

num_points – (int) Number of samples to generate.

Returns:

(plt.axis) Axis object.

plot_pdf(plot_type: str = '3d', grid_size: int = 50, levels: list = None, **kwargs) -> matplotlib.figure.Figure

Plot either ‘3d’ or ‘contour’ plot of copula PDF.

Parameters:
  • plot_type – (str) Either ‘3d’ or ‘contour’(2D) plot.

  • grid_size – (int) Mesh grid granularity.

  • levels – (list) List of float values that determine the number and levels of lines in a contour plot. If not provided, these are calculated automatically.

Returns:

(plt.axis) Axis object.