Codependence Matrix



The functions in this part of the module are used to generate dependence and distance matrices using the codependency and distance metrics described previously.

  1. The Dependence Matrix function computes codependences between the elements of a given dataframe using codependence metrics such as Mutual Information, Variation of Information, Distance Correlation, Spearman’s Rho, GPR distance, GNPR distance, and Optimal Transport.

  2. The Distance Matrix function computes a distance matrix from a given codependence matrix using the angular, squared angular, or absolute angular distance metric.

Note

Underlying Literature

The following sources elaborate extensively on the topic:

Implementation

get_dependence_matrix(df: DataFrame, dependence_method: str, theta: float = 0.5, n_bins: int | None = None, normalize: bool = True, estimator: str = 'standard', target_dependence: str = 'comonotonicity', gaussian_corr: float = 0.7, var_threshold: float = 0.2) → DataFrame

This function returns a dependence matrix for elements given in the dataframe using the chosen dependence method.

List of supported algorithms to use for generating the dependence matrix: information_variation, mutual_information, distance_correlation, spearmans_rho, gpr_distance, gnpr_distance, optimal_transport.
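As a rough illustration of what building a dependence matrix involves, the sketch below applies a pairwise metric to every pair of columns and fills a symmetric matrix. It uses the sample distance correlation of Székely et al. as the metric. `dependence_matrix` and `distance_correlation` are hypothetical helpers written for this example, not the library's functions, and the library's internals may differ.

```python
import itertools

import numpy as np
import pandas as pd


def distance_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Sample distance correlation of two 1-D samples, in [0, 1] (hypothetical helper)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # Pairwise absolute distances within each sample
    a = np.abs(x - x.T)
    b = np.abs(y - y.T)
    # Double-centre each distance matrix (subtract row and column means, add grand mean)
    a_c = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    b_c = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    # Squared sample distance covariances and variances
    dcov_xy = max((a_c * b_c).mean(), 0.0)  # guard against tiny negative round-off
    dvar_x = (a_c * a_c).mean()
    dvar_y = (b_c * b_c).mean()
    denom = np.sqrt(dvar_x * dvar_y)
    return float(np.sqrt(dcov_xy / denom)) if denom > 0 else 0.0


def dependence_matrix(df: pd.DataFrame, metric) -> pd.DataFrame:
    """Apply a pairwise metric to all column pairs (hypothetical helper)."""
    values = df.to_numpy(dtype=float)
    n = values.shape[1]
    mat = np.zeros((n, n))
    # Include i == j so the diagonal is whatever the metric returns for a column with itself
    for i, j in itertools.combinations_with_replacement(range(n), 2):
        mat[i, j] = mat[j, i] = metric(values[:, i], values[:, j])
    return pd.DataFrame(mat, index=df.columns, columns=df.columns)


# Usage: x and x**2 are nonlinearly dependent; "noise" is independent of both
rng = np.random.default_rng(0)
x = rng.normal(size=500)
df = pd.DataFrame({"x": x, "x_squared": x ** 2, "noise": rng.normal(size=500)})
print(dependence_matrix(df, distance_correlation).round(2))
```

Distance correlation is used here because, unlike Pearson correlation, it picks up the nonlinear link between `x` and `x_squared`.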

Parameters:
  • df – (pd.DataFrame) Features.

  • dependence_method – (str) Algorithm to be used for generating the dependence matrix.

  • theta – (float) Type of information being tested in the GPR and GNPR distances. Falls in range [0, 1]. (0.5 by default)

  • n_bins – (int) Number of bins for discretization in information_variation and mutual_information. If None, the optimal number of bins is calculated. (None by default)

  • normalize – (bool) Flag used to normalize the result to [0, 1] in information_variation and mutual_information. (True by default)

  • estimator – (str) Estimator to be used for calculation in mutual_information. [standard, standard_copula, copula_entropy] (standard by default)

  • target_dependence – (str) Type of target dependence to use in optimal_transport. [comonotonicity, countermonotonicity, gaussian, positive_negative, different_variations, small_variations] (comonotonicity by default)

  • gaussian_corr – (float) Correlation coefficient to use when creating gaussian and small_variations copulas. [from 0 to 1] (0.7 by default)

  • var_threshold – (float) Variation threshold to use in small_variations. Sets the relative area of correlation in a copula. [from 0 to 1] (0.2 by default)

Returns:

(pd.DataFrame) Dependence matrix.

get_distance_matrix(X: DataFrame, distance_metric: str = 'angular') → DataFrame

Applies a distance operator to a dependence matrix.

This turns a correlation matrix into a distance matrix. The distances used are true metrics.

List of supported distance metrics to use for generating the distance matrix: angular, squared_angular, and absolute_angular.
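Each of these operators maps a correlation ρ to a distance. A minimal sketch, assuming the conventions of López de Prado in which the angular distance is d = sqrt((1 − ρ)/2) and the absolute and squared variants replace ρ with |ρ| and ρ² respectively; the library's exact scaling is an assumption here, and `correlation_to_distance` is a hypothetical helper rather than `get_distance_matrix` itself.

```python
import numpy as np
import pandas as pd


def correlation_to_distance(corr: pd.DataFrame, metric: str = "angular") -> pd.DataFrame:
    """Map a correlation matrix to a distance matrix (hypothetical helper)."""
    rho = corr.to_numpy(dtype=float)
    if metric == "angular":
        inner = 0.5 * (1.0 - rho)            # 0 for rho = 1, 1 for rho = -1
    elif metric == "absolute_angular":
        inner = 0.5 * (1.0 - np.abs(rho))    # treats rho and -rho as equally close
    elif metric == "squared_angular":
        inner = 0.5 * (1.0 - rho ** 2)       # likewise sign-agnostic
    else:
        raise ValueError(f"Unknown distance metric: {metric}")
    # Clip tiny negative values caused by round-off (e.g. rho = 1.0000000002)
    dist = np.sqrt(np.clip(inner, 0.0, None))
    return pd.DataFrame(dist, index=corr.index, columns=corr.columns)


corr = pd.DataFrame([[1.0, 0.5], [0.5, 1.0]], index=["A", "B"], columns=["A", "B"])
print(correlation_to_distance(corr, "angular"))  # off-diagonal entries are 0.5
```

The absolute and squared variants are useful when strong negative correlation should count as "close" (for example, when a short position can hedge a long one), whereas the plain angular distance places perfectly anti-correlated assets furthest apart.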

Parameters:
  • X – (pd.DataFrame) Dataframe to which the distance operator is applied.

  • distance_metric – (str) The distance metric to be used for generating the distance matrix. [angular, squared_angular, absolute_angular] (angular by default)

Returns:

(pd.DataFrame) Distance matrix.

Example

import pandas as pd
from arbitragelab.codependence import get_dependence_matrix, get_distance_matrix

# Import dataframe of returns for assets in a portfolio
# (DATA_PATH is a placeholder: set it to the path of your returns CSV file)
DATA_PATH = 'asset_returns.csv'
asset_returns = pd.read_csv(DATA_PATH, index_col='Date', parse_dates=True)

# Calculate distance correlation matrix
distance_corr = get_dependence_matrix(asset_returns, dependence_method='distance_correlation')

# Calculate Pearson correlation matrix
pearson_corr = asset_returns.corr()

# Calculate absolute angular distance from a Pearson correlation matrix
abs_angular_dist = get_distance_matrix(pearson_corr, distance_metric='absolute_angular')

Presentation Slides

[Image: Codependence presentation slides]

References