Codependence Matrix
The functions in this part of the module generate dependence and distance matrices using the codependence and distance metrics described previously.
The dependence matrix function computes the codependence between every pair of elements in a given dataframe, using metrics such as mutual information, variation of information, distance correlation, Spearman’s rho, GPR distance, GNPR distance, and optimal transport.
The distance matrix function computes a distance matrix from a given codependence matrix, using the angular, squared angular, or absolute angular distance.
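For a symmetric measure such as distance correlation, a dependence matrix holds one entry per pair of columns, and a distance matrix is then derived from it entry by entry. The NumPy sketch below illustrates the first step. It is not ArbitrageLab's implementation. `distance_correlation` and `pairwise_matrix` are hypothetical helpers, and the estimator shown is the standard biased (V-statistic) sample distance correlation built from double-centered distance matrices.

```python
import numpy as np


def distance_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Sample distance correlation of two 1-D samples (biased V-statistic estimator)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # Matrices of pairwise absolute differences within each sample
    a = np.abs(x - x.T)
    b = np.abs(y - y.T)
    # Double-center: remove column and row means, add back the grand mean
    a = a - a.mean(axis=0) - a.mean(axis=1, keepdims=True) + a.mean()
    b = b - b.mean(axis=0) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2_xy = (a * b).mean()
    dvar2_x = (a * a).mean()
    dvar2_y = (b * b).mean()
    if dvar2_x == 0.0 or dvar2_y == 0.0:
        return 0.0  # a constant series carries no measurable dependence
    return float(np.sqrt(max(dcov2_xy, 0.0) / np.sqrt(dvar2_x * dvar2_y)))


def pairwise_matrix(data: np.ndarray, measure) -> np.ndarray:
    """Fill a symmetric matrix by applying `measure` to every pair of columns."""
    n_cols = data.shape[1]
    out = np.empty((n_cols, n_cols))
    for i in range(n_cols):
        for j in range(i, n_cols):
            out[i, j] = out[j, i] = measure(data[:, i], data[:, j])
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=500)
# Columns: x, a purely non-linear function of x, and an independent series
data = np.column_stack([x, x ** 2, rng.normal(size=500)])
dep = pairwise_matrix(data, distance_correlation)
print(np.round(dep, 2))
```

The second column depends on the first only non-linearly (their Pearson correlation is zero in theory), yet its distance correlation with the first column is clearly positive, while the independent third column scores near zero.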
Note
Underlying Literature
The following sources elaborate extensively on the topic:
Codependence (Presentation Slides) by Marcos Lopez de Prado.
Implementation
- get_dependence_matrix(df: DataFrame, dependence_method: str, theta: float = 0.5, n_bins: int | None = None, normalize: bool = True, estimator: str = 'standard', target_dependence: str = 'comonotonicity', gaussian_corr: float = 0.7, var_threshold: float = 0.2) -> DataFrame
This function returns a dependence matrix for elements given in the dataframe using the chosen dependence method.
List of supported algorithms to use for generating the dependence matrix: information_variation, mutual_information, distance_correlation, spearmans_rho, gpr_distance, gnpr_distance, and optimal_transport.
- Parameters:
df – (pd.DataFrame) Features.
dependence_method – (str) Algorithm to be used for generating the dependence matrix.
theta – (float) Type of information being tested in the GPR and GNPR distances. Falls in range [0, 1]. (0.5 by default)
n_bins – (int) Number of bins for discretization in information_variation and mutual_information. If None, the optimal number will be calculated. (None by default)
normalize – (bool) Flag used to normalize the result to [0, 1] in information_variation and mutual_information. (True by default)
estimator – (str) Estimator to be used for calculation in mutual_information. [standard, standard_copula, copula_entropy] (standard by default)
target_dependence – (str) Type of target dependence to use in optimal_transport. [comonotonicity, countermonotonicity, gaussian, positive_negative, different_variations, small_variations] (comonotonicity by default)
gaussian_corr – (float) Correlation coefficient to use when creating the gaussian and small_variations copulas. [from 0 to 1] (0.7 by default)
var_threshold – (float) Variation threshold to use in small_variations. Sets the relative area of correlation in a copula. [from 0 to 1] (0.2 by default)
- Returns:
(pd.DataFrame) Dependence matrix.
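The n_bins and normalize parameters refer to a histogram-based estimate of mutual information and variation of information. The NumPy sketch below shows one way such an estimate can be computed; it is an illustration, not ArbitrageLab's code. It assumes equal-width bins with a fixed n_bins (it does not reproduce the optimal-bin calculation used when n_bins is None), and it assumes normalization by min(H(X), H(Y)) for mutual information and by the joint entropy H(X, Y) for variation of information. These are common choices but may differ from the library's. The function name `binned_mi_vi` is hypothetical.

```python
import numpy as np


def _entropy(counts: np.ndarray) -> float:
    """Shannon entropy (in nats) of a histogram given as raw counts."""
    counts = counts[counts > 0]
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


def binned_mi_vi(x, y, n_bins: int = 10, normalize: bool = True):
    """Mutual information and variation of information from an equal-width 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=n_bins)
    h_x = _entropy(joint.sum(axis=1))   # marginal entropy of x
    h_y = _entropy(joint.sum(axis=0))   # marginal entropy of y
    h_xy = _entropy(joint.ravel())      # joint entropy
    mutual_info = h_x + h_y - h_xy
    var_info = h_xy - mutual_info       # equals H(X) + H(Y) - 2 I(X; Y)
    if normalize:
        # Assumed normalizations; both map the scores into [0, 1]
        mutual_info /= min(h_x, h_y)
        var_info /= h_xy
    return mutual_info, var_info


rng = np.random.default_rng(42)
x = rng.normal(size=2000)
noise = rng.normal(size=2000)
print(binned_mi_vi(x, x))                # identical series
print(binned_mi_vi(x, x + 0.5 * noise))  # strongly related series
print(binned_mi_vi(x, noise))            # independent series
```

Identical series give a normalized mutual information of 1 and a variation of information of 0; as the relationship weakens, mutual information falls and variation of information rises. The sketch divides by zero for a constant series, which a production implementation would need to guard against.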
- get_distance_matrix(X: DataFrame, distance_metric: str = 'angular') -> DataFrame
Applies a distance operator to a dependence matrix.
This turns a correlation matrix into a distance matrix. The distances used are true metrics.
List of supported distance metrics to use for generating the distance matrix: angular, squared_angular, and absolute_angular.
- Parameters:
X – (pd.DataFrame) Dataframe to which the distance operator is applied.
distance_metric – (str) The distance metric to be used for generating the distance matrix.
- Returns:
(pd.DataFrame) Distance matrix.
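The distance operator is applied element-wise to the correlation matrix. The NumPy sketch below is an illustration under an assumption, not the library's code: it uses the forms sqrt((1 - rho) / 2), sqrt((1 - rho^2) / 2) and sqrt((1 - |rho|) / 2) for the three metrics. Some sources drop the factor of 1/2 in the last two, so check the library source before relying on exact values. The function name `corr_to_distance` is hypothetical.

```python
import numpy as np


def corr_to_distance(corr, metric: str = "angular") -> np.ndarray:
    """Element-wise angular-type distance from a correlation matrix (illustrative)."""
    corr = np.asarray(corr, dtype=float)
    if metric == "angular":
        inner = 1.0 - corr
    elif metric == "squared_angular":
        inner = 1.0 - corr ** 2
    elif metric == "absolute_angular":
        inner = 1.0 - np.abs(corr)
    else:
        raise ValueError(f"Unknown distance metric: {metric}")
    # Clip tiny negative values caused by floating-point error before the square root
    return np.sqrt(np.clip(0.5 * inner, 0.0, None))


corr = np.array([[1.0, 0.5, -1.0],
                 [0.5, 1.0, -0.5],
                 [-1.0, -0.5, 1.0]])
for name in ("angular", "squared_angular", "absolute_angular"):
    print(name)
    print(np.round(corr_to_distance(corr, name), 3))
```

Under the angular distance, perfectly anti-correlated series are maximally far apart (distance 1), whereas the squared and absolute variants ignore the sign of the correlation and place them at distance 0. To keep the asset labels of a DataFrame input, wrap the result as `pd.DataFrame(result, index=corr.index, columns=corr.columns)`.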
Example
import pandas as pd
from arbitragelab.codependence import get_dependence_matrix, get_distance_matrix

# Placeholder path to a CSV file of asset returns; replace with your own file
DATA_PATH = 'asset_returns.csv'

# Import dataframe of returns for assets in a portfolio
asset_returns = pd.read_csv(DATA_PATH, index_col='Date', parse_dates=True)

# Calculate distance correlation matrix
distance_corr = get_dependence_matrix(asset_returns, dependence_method='distance_correlation')

# Calculate Pearson correlation matrix
pearson_corr = asset_returns.corr()

# Calculate absolute angular distance from a Pearson correlation matrix
abs_angular_dist = get_distance_matrix(pearson_corr, distance_metric='absolute_angular')