arbitragelab.copula_approach.vinecop_generate
Module that generates vine copulas.
Built on top of the pyvinecopulib package. See https://github.com/vinecopulib/pyvinecopulib for more details.
Module Contents
Classes
RVineCop | R-Vine copula class, housing C-vine copula and D-vine copula as special sub cases.
CVineCop | Class for C-vine copulas.
- class RVineCop
R-Vine copula class, housing C-vine copula and D-vine copula as special sub cases.
- class CVineCop(cvine_cop: pyvinecopulib.Vinecop = None)
Bases: RVineCop
Class for C-vine copulas.
This is a wrapper class of pv.Vinecop that provides useful methods for statistical arbitrage. One key note: to keep the notation consistent with the literature, all variables whose names start with pv are indexed from 1, not 0. This makes it easier to see the nodes of the tree.
- fit_auto(data: pandas.DataFrame, pv_target_idx: int = 1, if_renew: bool = True, alt_cvine_structure=False)
Auto fit data to the C-vine copula by selecting the center stock.
The method loops through all possible C-vine structures and chooses the best fit by AIC. The targeted stock will never show up in the center of the tree, so some C-vine structures need not be included. This method is relatively slow: the complexity is O(n!), with n being the number of stocks, and there is no way to work around this. Hence, please keep n <= 7.
The original method suggested in [Stubinger et al. 2016] says the target stock should not be at the center of each level of the C-vine tree, due to the way conditional probabilities are calculated. However, since the C-vine structure effectively orders the variables by the importance of their dependency, it also makes sense for the target stock to be the most important node. For example, for the C-vine structure (4, 2, 1, 3), variable 3 would be the target stock, in contrast to variable 4 as suggested in [Stubinger et al. 2016]. Both options are therefore provided and can be selected by toggling the alt_cvine_structure variable.
The original pv.Vinecop will be returned just in case there is some usage not covered by this class.
- Parameters:
data – (pd.DataFrame) The quantile data to be used to fit the C-vine copula.
pv_target_idx – (int) Optional. The stock to be targeted for trading. This is indexed from 1, hence 1 corresponds to the 0th column data in the data frame. Defaults to 1.
if_renew – (bool) Optional. Whether to update the class attribute cvine_cop. Defaults to True.
alt_cvine_structure – (bool) Optional. Whether to use the alternative method to generate possible C-vines. Defaults to False.
- Returns:
(pv.Vinecop) The fitted pv.Vinecop object.
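The size of the search can be pictured with a short sketch. It is not the library's implementation: `candidate_orders` and its `target_last` flag are hypothetical names, and pinning the target to the first or last slot of the order tuple is an assumption drawn from the (4, 2, 1, 3) example in the description. With the target pinned, the remaining n - 1 variables can be permuted freely, which is where the factorial growth comes from.

```python
from itertools import permutations
from math import factorial


def candidate_orders(n, pv_target_idx=1, target_last=False):
    """Yield 1-indexed variable orders with the target pinned to one end.

    Hypothetical helper for illustration only; the actual enumeration
    inside fit_auto may differ.
    """
    others = [i for i in range(1, n + 1) if i != pv_target_idx]
    for perm in permutations(others):
        if target_last:
            yield perm + (pv_target_idx,)
        else:
            yield (pv_target_idx,) + perm


# Four stocks, variable 4 targeted and pinned to the first slot.
orders = list(candidate_orders(4, pv_target_idx=4))
print(len(orders), factorial(4 - 1))

# Growth of the candidate count with the number of stocks.
for n in range(3, 9):
    print(n, factorial(n - 1))
```

Each candidate would then be fitted and scored by AIC, so the fitting cost is multiplied by the candidate count, which is why small n is recommended.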
- get_condi_probs(u: pandas.DataFrame | numpy.array, pv_target_idx: int = 1, eps: float = 0.0001) pandas.Series | float
Get the conditional probabilities of the C-vine copula for a target.
- For example, if we have 5 stocks and pv_target_idx = 2, then this method calculates:
P(U2 <= u2 | U1=u1, U3=u3, U4=u4, U5=u5)
The calculation is numerical, integrating along the margin of the target variable. By default it uses the 0th element of u as the target. The result is also clipped to [eps, 1-eps] by default to avoid potential edge values.
- Parameters:
u – (Union[pd.DataFrame, np.array]) The quantiles data to be used. The input can be a pandas dataframe or a numpy array vector. The former case yields a pandas series with matching indices, and the latter yields a single float.
pv_target_idx – (int) Optional. The stock to be targeted for trading. This is indexed from 1, hence 1 corresponds to the 0th column data in a pandas data frame. In this case it is the only variable not conditioned on. Defaults to 1.
eps – (float) Optional. The small value that keeps results within [eps, 1-eps]. Defaults to 1e-4.
- Returns:
(Union[pd.Series, float]) The calculated conditional probabilities. If the input is a dataframe then the result is a series with matching indices. If the input is a 1D np.array then the result is a float.
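To make "integrating along the margin" and the [eps, 1-eps] clipping concrete, here is a minimal sketch on a toy two-variable case. It assumes a bivariate FGM copula (chosen only because its density is simple) and a plain trapezoid rule; the function names are hypothetical, and the integration scheme used inside get_condi_probs is not documented here and may differ.

```python
def fgm_density(u, v, theta=0.5):
    """Density of the bivariate FGM copula, used here as a toy stand-in."""
    return 1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)


def trapezoid(f, a, b, steps=1000):
    """Composite trapezoid rule for f on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h


def condi_prob(u_target, u_given, eps=1e-4, theta=0.5):
    """P(U1 <= u_target | U2 = u_given), integrating the density over U1."""
    dens = lambda t: fgm_density(t, u_given, theta)
    numer = trapezoid(dens, 0.0, u_target)  # mass up to the target quantile
    denom = trapezoid(dens, 0.0, 1.0)       # total mass along the margin
    prob = numer / denom
    return min(max(prob, eps), 1.0 - eps)   # keep result in [eps, 1 - eps]


p = condi_prob(0.3, 0.8)
# Closed form for the FGM copula: u + theta * u * (1 - u) * (1 - 2 * v).
closed_form = 0.3 + 0.5 * 0.3 * 0.7 * (1.0 - 2.0 * 0.8)
print(p, closed_form)
```

The clipping matters at the edges: a target quantile of exactly 0 or 1 would otherwise give a probability of 0 or 1, which can cause trouble downstream when the value is fed into logarithms or inverse CDFs.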
- get_cop_densities(u: pandas.DataFrame | numpy.array, num_threads: int = 1) pandas.Series | float
Calculate probability density of the vine copula.
The result is analytical. You may also take advantage of multi-thread calculation. The result will be either a pandas series of numbers or a single number, depending on the dimension of the input.
- Parameters:
u – (Union[pd.DataFrame, np.array]) The quantiles data to be used. The input can be a pandas dataframe or a numpy array vector. The former case yields a pandas series with matching indices, and the latter yields a single float.
num_threads – (int) Optional. The number of threads to use for calculation. Defaults to 1.
- Returns:
(Union[pd.Series, float]) The calculated pdf. If the input is a dataframe then the result is a series with matching indices. If the input is a 1D np.array then the result is a float.
- get_cop_evals(u: pandas.DataFrame) pandas.Series
Calculate the cumulative distribution function (CDF) of the vine copula.
The result is numerical, obtained through Monte-Carlo integration. You may also take advantage of multi-thread calculation. The result will be either a pandas series of numbers or a single number, depending on the dimension of the input.
- Parameters:
u – (Union[pd.DataFrame, np.array]) The quantiles data to be used. The input can be a pandas dataframe or a numpy array vector. The former case yields a pandas series with matching indices, and the latter yields a single float.
- Returns:
(Union[pd.Series, float]) The calculated cdf. If the input is a dataframe then the result is a series with matching indices. If the input is a 1D np.array then the result is a float.
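The idea behind a Monte-Carlo CDF estimate can be sketched as follows: draw samples from the copula and count the fraction that fall component-wise below the query point. This is an illustration only. It uses the independence copula as a stand-in for a fitted vine (so the exact answer is the product of the quantiles), and `mc_copula_cdf` and `independence_sampler` are hypothetical names, not part of the library.

```python
import random


def independence_sampler(rng, dim):
    """Draw one sample from the independence copula (stand-in for a vine)."""
    return [rng.random() for _ in range(dim)]


def mc_copula_cdf(u, sampler, n_samples=20000, seed=0):
    """Estimate C(u) as the share of samples that are <= u in every dimension."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        draw = sampler(rng, len(u))
        if all(d <= q for d, q in zip(draw, u)):
            hits += 1
    return hits / n_samples


estimate = mc_copula_cdf([0.5, 0.4], independence_sampler)
exact = 0.5 * 0.4  # closed form for the independence copula
print(estimate, exact)
```

Because the estimate is a sample proportion, its error shrinks roughly with the square root of the sample count, so CDF values from a Monte-Carlo method carry some noise.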
- simulate(n: int, qrn: bool = False, num_threads: int = 1, seeds: List[int] = None) numpy.array
Simulate from a vine copula model.
- Parameters:
n – (int) Number of observations.
qrn – (bool) Optional. Set to True for quasi-random numbers. Defaults to False.
num_threads – (int) Optional. The number of threads to use for calculation. Defaults to 1.
seeds – (List[int]) Optional. Seeds of the random number generator. If empty then the random generator will be seeded randomly. Defaults to None.
- Returns:
(np.array) The generated random samples from the vine copula.
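For readers unfamiliar with the qrn option, the sketch below shows what quasi-random numbers look like, using a two-dimensional Halton sequence. This is only an illustration of the concept: which low-discrepancy sequence the underlying library uses is not stated in this reference, and `van_der_corput` and `halton_points` are hypothetical helpers.

```python
def van_der_corput(index, base=2):
    """Radical inverse of a positive integer index in the given base."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value


def halton_points(n, bases=(2, 3)):
    """First n points of a Halton sequence, one coprime base per dimension."""
    return [tuple(van_der_corput(i, b) for b in bases) for i in range(1, n + 1)]


points = halton_points(8)
for point in points:
    print(point)
```

Unlike pseudo-random draws, these points are deterministic and fill the unit square evenly, which tends to reduce the variance of simulation-based estimates.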
- aic(u: pandas.DataFrame, num_threads: int = 1) float
Evaluates the Akaike information criterion (AIC).
- Parameters:
u – (pd.DataFrame) The quantile data used for evaluation.
num_threads – (int) Optional. The number of threads to use for calculation. Defaults to 1.
- Returns:
(float) Calculated AIC value.
- bic(u: pandas.DataFrame, num_threads: int = 1) float
Evaluates the Bayesian information criterion (BIC).
- Parameters:
u – (pd.DataFrame) The quantile data used for evaluation.
num_threads – (int) Optional. The number of threads to use for calculation. Defaults to 1.
- Returns:
(float) Calculated BIC value.
- loglik(u: pandas.DataFrame, num_threads: int = 1) float
Evaluates the sum of log-likelihoods.
- Parameters:
u – (pd.DataFrame) The quantile data used for evaluation.
num_threads – (int) Optional. The number of threads to use for calculation. Defaults to 1.
- Returns:
(float) Calculated sum of log-likelihood.
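The three criteria are related through the textbook definitions AIC = -2 * loglik + 2 * k and BIC = -2 * loglik + ln(n) * k, where k is the number of model parameters and n the number of observations. The sketch below applies those formulas to made-up numbers. How the underlying library counts parameters is not covered in this reference, so `n_params` is an assumption, and the helper names are hypothetical.

```python
from math import log


def aic_from_loglik(loglik, n_params):
    """Textbook AIC: lower is better."""
    return -2.0 * loglik + 2.0 * n_params


def bic_from_loglik(loglik, n_params, n_obs):
    """Textbook BIC: penalizes each parameter by ln(n_obs) instead of 2."""
    return -2.0 * loglik + log(n_obs) * n_params


# Illustrative values only, not output from a fitted model.
loglik, n_params, n_obs = 120.5, 6, 500
aic_value = aic_from_loglik(loglik, n_params)
bic_value = bic_from_loglik(loglik, n_params, n_obs)
print(aic_value, bic_value)
```

For any sample with more than seven observations, ln(n) exceeds 2, so BIC penalizes extra parameters more heavily than AIC and tends to favor simpler models.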