arbitragelab.copula_approach.copula_calculation

Back end module that handles maximum likelihood related copula calculations.

Functions include:

  • Finding (marginal) cumulative distribution function from data.

  • Finding empirical cumulative distribution function from data with linear interpolation.

  • Maximum likelihood estimation of theta_hat (empirical theta) from data.

  • Calculating the sum log-likelihood given a copula and data.

  • Calculating SIC (Schwarz information criterion).

  • Calculating AIC (Akaike information criterion).

  • Calculating HQIC (Hannan-Quinn information criterion).

  • Fitting Student-t Copula.

  • SCAD penalty functions.

  • Adjusting weights for mixed copulas for normalization.

For more information about the SCAD penalty functions on fitting mixed copulas, please refer to Cai, Z. and Wang, X., 2014. Selection of mixed copula model via penalized likelihood. Journal of the American Statistical Association, 109(506), pp.788-801.

Module Contents

Functions

find_marginal_cdf(→ Callable[[float], float])

Find the cumulative distribution function (CDF), i.e., P(X<=x).

construct_ecdf_lin(→ Callable)

Construct an empirical cumulative distribution function with linear interpolation between data points.

to_quantile(→ Tuple[pandas.DataFrame, list])

Convert the data frame to quantile by row.

sic(→ float)

Schwarz information criterion (SIC), aka Bayesian information criterion (BIC).

aic(→ float)

Akaike information criterion.

hqic(→ float)

Hannan-Quinn information criterion.

scad_penalty(→ float)

SCAD (smoothly clipped absolute deviation) penalty function.

scad_derivative(→ float)

The derivative of SCAD (smoothly clipped absolute deviation) penalty function w.r.t x.

adjust_weights(→ numpy.array)

Adjust the weights of mixed copula components.

fit_copula_to_empirical_data(→ tuple)

Fit copula to empirical data and generate goodness-of-fit statistics as well as empirical CDFs used in estimation.

find_marginal_cdf(x: numpy.array, empirical: bool = True, **kwargs) → Callable[[float], float]

Find the cumulative distribution function (CDF), i.e., P(X<=x).

The user can choose between an empirical CDF and a CDF selected by maximum likelihood.

Parameters:
  • x – (np.array) Data. Will be scaled to [0, 1].

  • empirical – (bool) Whether to use empirical estimation for CDF.

  • kwargs – (dict) Optional floor and cap for the returned probability. prob_floor: (float) Probability floor. prob_cap: (float) Probability cap.

Returns:

(func) The cumulative distribution function estimated from the data.
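As a rough illustration of the empirical option with a probability floor and cap, the sketch below builds a step ECDF in plain Python and clips its output. It is not the library's implementation: the helper name clipped_step_ecdf and the default bounds are assumptions made for this example only.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def clipped_step_ecdf(data: Sequence[float],
                      prob_floor: float = 1e-5,
                      prob_cap: float = 1 - 1e-5) -> Callable[[float], float]:
    """Hypothetical helper: step ECDF P(X <= x), clipped to [prob_floor, prob_cap]."""
    sorted_data = sorted(data)
    n = len(sorted_data)

    def cdf(x: float) -> float:
        # Fraction of observations less than or equal to x.
        raw = bisect_right(sorted_data, x) / n
        # Keep the result strictly inside (0, 1).
        return min(max(raw, prob_floor), prob_cap)

    return cdf


cdf = clipped_step_ecdf([3.0, 1.0, 2.0, 4.0])
print(cdf(2.0))   # half of the sample is <= 2.0
print(cdf(-1.0))  # clipped to the floor instead of 0
print(cdf(10.0))  # clipped to the cap instead of 1
```

Clipping matters downstream because copula densities are typically evaluated at these probabilities, and many of them are undefined or infinite at exactly 0 or 1.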

construct_ecdf_lin(train_data: numpy.array, upper_bound: float = 1 - 1e-05, lower_bound: float = 1e-05) → Callable

Construct an empirical cumulative distribution function with linear interpolation between data points.

The returned function agrees in value with the ECDF function from statsmodels, but also applies linear interpolation to fill the gaps. Features include:

  • Allowing the training data to contain nan values.

  • Allowing the output to have an upper and a lower bound, which avoids singularities in applications that cannot handle a probability of exactly 0 or 1.

Parameters:
  • train_data – (np.array) The data to train the output ecdf function.

  • upper_bound – (float) The upper bound value for the returned ecdf function.

  • lower_bound – (float) The lower bound value for the returned ecdf function.

Returns:

(Callable) The constructed ecdf function.
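The behaviour described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the library's code: the name ecdf_linear is hypothetical, values below the smallest observation are mapped to 0 before clipping, and values at or above the largest observation are mapped to 1 before clipping.

```python
import math
from bisect import bisect_right
from typing import Callable, Sequence


def ecdf_linear(train_data: Sequence[float],
                upper_bound: float = 1 - 1e-5,
                lower_bound: float = 1e-5) -> Callable[[float], float]:
    """Hypothetical sketch: ECDF with linear interpolation between data points."""
    # Drop NaN entries, then sort the remaining observations.
    clean = sorted(v for v in train_data if not math.isnan(v))
    n = len(clean)
    # Unique data points (knots) and the step-ECDF value at each knot.
    knots = sorted(set(clean))
    heights = [bisect_right(clean, k) / n for k in knots]

    def ecdf(x: float) -> float:
        if x < knots[0]:
            raw = 0.0
        elif x >= knots[-1]:
            raw = 1.0
        else:
            # Locate the interval knots[i - 1] <= x < knots[i] and interpolate.
            i = bisect_right(knots, x)
            x0, x1 = knots[i - 1], knots[i]
            y0, y1 = heights[i - 1], heights[i]
            raw = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        # Clip to avoid returning exactly 0 or 1.
        return min(max(raw, lower_bound), upper_bound)

    return ecdf


ecdf = ecdf_linear([1.0, 2.0, float("nan"), 3.0, 4.0])
print(ecdf(2.0))  # equals the step ECDF at a data point
print(ecdf(2.5))  # interpolated between the values at 2.0 and 3.0
```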

to_quantile(data: pandas.DataFrame) → Tuple[pandas.DataFrame, list]

Convert the data frame to quantile by row.

The conversion is not done in place. The marginal CDF of each column is also returned. The function works with more than two columns.

The method returns:

  • quantile_data: (pd.DataFrame) The calculated quantile data in a data frame with the original indexing.

  • cdf_list: (list) The list of marginal cumulative distribution functions.

Parameters:

data – (pd.DataFrame) The original data in DataFrame.

Returns:

(tuple) quantile_data: (pd.DataFrame) The calculated quantile data in a data frame with the original indexing; cdf_list: (list) The list of marginal cumulative distribution functions.

sic(log_likelihood: float, n: int, k: int = 1) → float

Schwarz information criterion (SIC), aka Bayesian information criterion (BIC).

Parameters:
  • log_likelihood – (float) Sum of log-likelihood of some data.

  • n – (int) Number of instances.

  • k – (int) Number of parameters estimated by max likelihood.

Returns:

(float) Value of SIC.

aic(log_likelihood: float, n: int, k: int = 1) → float

Akaike information criterion.

Parameters:
  • log_likelihood – (float) Sum of log-likelihood of some data.

  • n – (int) Number of instances.

  • k – (int) Number of parameters estimated by max likelihood.

Returns:

(float) Value of AIC.

hqic(log_likelihood: float, n: int, k: int = 1) → float

Hannan-Quinn information criterion.

Parameters:
  • log_likelihood – (float) Sum of log-likelihood of some data.

  • n – (int) Number of instances.

  • k – (int) Number of parameters estimated by max likelihood.

Returns:

(float) Value of HQIC.
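For orientation, the three criteria above can be written out in their textbook forms as in the sketch below. The function names with the _sketch suffix are hypothetical, and the formulas are the standard uncorrected definitions. Because the library's aic also accepts n, its exact formula may differ (for example, by a small-sample correction); that is an assumption and is not confirmed here.

```python
import math


def sic_sketch(log_likelihood: float, n: int, k: int = 1) -> float:
    """Textbook SIC/BIC: k * ln(n) - 2 * ln(L)."""
    return k * math.log(n) - 2 * log_likelihood


def aic_sketch(log_likelihood: float, k: int = 1) -> float:
    """Textbook (uncorrected) AIC: 2 * k - 2 * ln(L)."""
    return 2 * k - 2 * log_likelihood


def hqic_sketch(log_likelihood: float, n: int, k: int = 1) -> float:
    """Textbook HQIC: 2 * k * ln(ln(n)) - 2 * ln(L)."""
    return 2 * k * math.log(math.log(n)) - 2 * log_likelihood


# Illustrative numbers only: a sum log-likelihood of 100 over 1000 observations.
log_lik, n_obs, n_params = 100.0, 1000, 1
print(sic_sketch(log_lik, n_obs, n_params))
print(aic_sketch(log_lik, n_params))
print(hqic_sketch(log_lik, n_obs, n_params))
```

For all three criteria a lower value indicates a better trade-off between fit and model complexity, so they are compared across candidate copulas fitted to the same data.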

scad_penalty(x: float, gamma: float, a: float) → float

SCAD (smoothly clipped absolute deviation) penalty function.

It encourages sparse solutions when fitting data to models. Although SCAD is a piecewise function, this implementation is branchless.

Parameters:
  • x – (float) The variable.

  • gamma – (float) One of the parameters in SCAD.

  • a – (float) One of the parameters in SCAD.

Returns:

(float) Evaluated result.
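To make the piecewise structure concrete, here is a minimal sketch of the standard SCAD penalty assembled without if/else branches. It assumes gamma > 0 and a > 1, and the name scad_penalty_sketch is hypothetical; the library's implementation may differ in detail.

```python
def scad_penalty_sketch(x: float, gamma: float, a: float) -> float:
    """Standard SCAD penalty; assumes gamma > 0 and a > 1 (illustrative only)."""
    abs_x = abs(x)
    # Indicators for the three regions; exactly one of them is True.
    is_linear = abs_x <= gamma
    is_quadratic = gamma < abs_x <= a * gamma
    is_constant = abs_x > a * gamma

    linear_part = gamma * abs_x
    quadratic_part = (2 * a * gamma * abs_x - abs_x ** 2 - gamma ** 2) / (2 * (a - 1))
    constant_part = gamma ** 2 * (a + 1) / 2

    # Booleans act as 0/1 multipliers, so only the active region contributes.
    return (is_linear * linear_part
            + is_quadratic * quadratic_part
            + is_constant * constant_part)


# With gamma = 1 and a = 3 the regions are [0, 1], (1, 3] and (3, inf).
for value in (0.5, 2.0, 10.0):
    print(value, scad_penalty_sketch(value, gamma=1.0, a=3.0))
```

The penalty grows linearly near zero (which pushes small values to exactly zero), bends over in the middle region, and is flat for large values, so large parameters are not shrunk further.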

scad_derivative(x: float, gamma: float, a: float) → float

The derivative of SCAD (smoothly clipped absolute deviation) penalty function w.r.t x.

It encourages sparse solutions for fitting data to models.

Parameters:
  • x – (float) The variable.

  • gamma – (float) One of the parameters in SCAD.

  • a – (float) One of the parameters in SCAD.

Returns:

(float) Evaluated result.
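A minimal sketch of the derivative of the standard SCAD penalty follows. It is restricted to x >= 0, on the assumption that the argument is a non-negative quantity such as a mixture weight, and it assumes gamma > 0 and a > 1. The name scad_derivative_sketch is hypothetical.

```python
def scad_derivative_sketch(x: float, gamma: float, a: float) -> float:
    """Derivative of the standard SCAD penalty for x >= 0 (illustrative only)."""
    # Constant slope gamma on [0, gamma].
    flat_slope = gamma * (x <= gamma)
    # Slope decays linearly to zero on (gamma, a * gamma] and stays zero afterwards.
    decaying_slope = max(a * gamma - x, 0.0) / (a - 1) * (x > gamma)
    return flat_slope + decaying_slope


# With gamma = 1 and a = 3: slope 1 near zero, 0.5 at x = 2, and 0 beyond x = 3.
for value in (0.5, 2.0, 10.0):
    print(value, scad_derivative_sketch(value, gamma=1.0, a=3.0))
```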

adjust_weights(weights: numpy.array, threshold: float) → numpy.array

Adjust the weights of mixed copula components.

Weights smaller than or equal to the given threshold are dropped, and the remaining weight is redistributed. For example, if the threshold is 0.02 and the original weights are [0.49, 0.02, 0.49], they are re-adjusted to [0.5, 0, 0.5].

Parameters:
  • weights – (np.array) The original weights to be adjusted.

  • threshold – (float) The threshold at or below which a weight is treated as 0.

Returns:

(np.array) The readjusted weights.
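The rule described above can be sketched in a few lines. This is a hypothetical re-implementation on plain Python lists (the library works with numpy arrays), and it assumes that redistribution means rescaling the surviving weights so they sum to one, which is consistent with the documented example.

```python
from typing import List, Sequence


def adjust_weights_sketch(weights: Sequence[float], threshold: float) -> List[float]:
    """Hypothetical sketch: zero out small weights, then renormalize the rest."""
    # Weights at or below the threshold are treated as exactly zero.
    kept = [w if w > threshold else 0.0 for w in weights]
    total = sum(kept)
    if total == 0:
        raise ValueError("all weights are at or below the threshold")
    # Rescale the surviving weights so that they sum to one again.
    return [w / total for w in kept]


print(adjust_weights_sketch([0.49, 0.02, 0.49], threshold=0.02))  # → [0.5, 0.0, 0.5]
```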

fit_copula_to_empirical_data(x: numpy.array, y: numpy.array, copula: arbitragelab.copula_approach.base.Copula) → tuple

Fit copula to empirical data and generate goodness-of-fit statistics as well as empirical CDFs used in estimation.

If fitting a Student-t copula, the procedure also includes a maximum likelihood fit for nu, using the COBYLA method from scipy.optimize.minimize. The fit range for nu is [1, 15]. If nu > 15 is desired, use a Gaussian copula instead. This step is relatively slow.

The output returns:
  • result_dict: (dict) The name of the copula and its SIC, AIC, HQIC values;

  • copula: (Copula) The fitted copula with parameters satisfying maximum likelihood;

  • s1_cdf: (func) The cumulative distribution function for stock 1, using training data;

  • s2_cdf: (func) The cumulative distribution function for stock 2, using training data.

Parameters:
  • x – (np.array) 1D stock time series data in desired form.

  • y – (np.array) 1D stock time series data in desired form.

  • copula – (Copula) Copula class to fit.

Returns:

(dict, Copula, func, func) The name of the copula and its SIC, AIC, HQIC values; the fitted copula with parameters satisfying maximum likelihood; the cumulative distribution function for series 1, using training data; the cumulative distribution function for series 2, using training data.
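The overall workflow (marginal ECDFs, quantile conversion, parameter estimation, sum log-likelihood, information criteria) can be sketched end to end without the library. Everything below is an assumption made for illustration: a Clayton copula is used as the example family, theta is obtained by inverting Kendall's tau rather than by the library's own routine, the criteria use their textbook forms, and all function names and dictionary keys are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Callable, Dict, Sequence, Tuple

Cdf = Callable[[float], float]


def step_ecdf(data: Sequence[float], floor: float = 1e-5, cap: float = 1 - 1e-5) -> Cdf:
    """Step ECDF clipped to [floor, cap] so that log(u) stays finite."""
    ordered = sorted(data)
    n = len(ordered)
    return lambda v: min(max(bisect_right(ordered, v) / n, floor), cap)


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau by direct pair counting (O(n^2); tied pairs count as zero)."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


def clayton_log_density(u: float, v: float, theta: float) -> float:
    """log c(u, v) for the Clayton copula with theta > 0."""
    return (math.log(theta + 1)
            - (theta + 1) * (math.log(u) + math.log(v))
            - (2 + 1 / theta) * math.log(u ** -theta + v ** -theta - 1))


def fit_clayton_sketch(x: Sequence[float],
                       y: Sequence[float]) -> Tuple[Dict[str, object], float, Cdf, Cdf]:
    """Hypothetical pipeline: ECDFs -> quantiles -> theta -> log-likelihood -> criteria."""
    cdf_x, cdf_y = step_ecdf(x), step_ecdf(y)
    tau = kendall_tau(x, y)
    if not 0 < tau < 1:
        raise ValueError("this sketch needs positive, imperfect dependence")
    theta = 2 * tau / (1 - tau)  # inverts tau = theta / (theta + 2) for Clayton
    log_lik = sum(clayton_log_density(cdf_x(a), cdf_y(b), theta) for a, b in zip(x, y))
    n, k = len(x), 1
    result = {
        "Copula Name": "Clayton",
        "SIC": k * math.log(n) - 2 * log_lik,
        "AIC": 2 * k - 2 * log_lik,
        "HQIC": 2 * k * math.log(math.log(n)) - 2 * log_lik,
    }
    return result, theta, cdf_x, cdf_y


# Toy data with strong but imperfect positive dependence.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 1, 4, 3, 6, 5, 8, 7]
result, theta, cdf_x, cdf_y = fit_clayton_sketch(xs, ys)
print(result, theta)
```

Returning the ECDFs alongside the statistics mirrors the documented output: the same marginal functions fitted on training data are needed later to map new observations into quantiles.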