`arbitragelab.codependence.gnpr_distance`

Implementation of distance using the Generic Non-Parametric Representation approach from “Some contributions to the clustering of financial time series and applications to credit default swaps” by Gautier Marti https://www.researchgate.net/publication/322714557

Module Contents

Functions

`spearmans_rho`(→ float)	Calculates a statistical estimate of Spearman's rho - a copula-based dependence measure.
`gpr_distance`(→ float)	Calculates the distance between two Gaussians under the Generic Parametric Representation (GPR) approach.
`gnpr_distance`(→ float)	Calculates the empirical distance between two random variables under the Generic Non-Parametric Representation (GNPR) approach.

spearmans_rho(x: numpy.array, y: numpy.array) → float

Calculates a statistical estimate of Spearman’s rho - a copula-based dependence measure.

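Formula for calculation: rho = 1 - 6 / (T * (T^2 - 1)) * Sum((X_t - Y_t)^2), where T is the number of observations and X_t, Y_t are the ranks of the t-th observations of x and y.

Compared with Pearson's correlation, Spearman's rho is more robust to noise and remains well defined even when the variables have an infinite second moment. This statistic is described in more detail in the work by Gautier Marti https://www.researchgate.net/publication/322714557 (p.54)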

This method is a wrapper for the scipy spearmanr function. For more details about the function and its parameters, please visit scipy documentation https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.spearmanr.html

Parameters:

x – (np.array/pd.Series) X vector
y – (np.array/pd.Series) Y vector (same number of observations as X)

Returns:

(float) Spearman’s rho statistical estimate
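
The rank-difference formula above can be checked against scipy directly. The sketch below is illustrative only: `spearman_from_ranks` is a hypothetical helper name, not part of arbitragelab, and the closed-form formula matches scipy only when neither vector contains ties.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr


def spearman_from_ranks(x, y):
    """Hypothetical helper: Spearman's rho from the rank-difference formula."""
    rank_x = rankdata(x)  # ranks 1..T (average ranks if ties occur)
    rank_y = rankdata(y)
    t = len(rank_x)
    return 1.0 - 6.0 / (t * (t**2 - 1)) * np.sum((rank_x - rank_y) ** 2)


# Synthetic, continuous data (so no ties are expected)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)

rho_formula = spearman_from_ranks(x, y)
rho_scipy, _ = spearmanr(x, y)
print(rho_formula, rho_scipy)
```

On tie-free data the two values should agree up to floating-point error.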

gpr_distance(x: numpy.array, y: numpy.array, theta: float) → float

Calculates the distance between two Gaussians under the Generic Parametric Representation (GPR) approach.

According to the original work https://www.researchgate.net/publication/322714557 (p.70): “This is a fast and good proxy for distance d_theta when the first two moments … predominate”. However, it is not a good metric for heavy-tailed distributions.

Parameter theta defines what type of information dependency is being tested:

- for theta = 0 the distribution information is tested
- for theta = 1 the dependence information is tested
- for theta = 0.5 a mix of both information types is tested

With theta in [0, 1] the distance lies in range [0, 1] and is a metric. (See original work for proof, p.71)

Parameters:

x – (np.array/pd.Series) X vector.
y – (np.array/pd.Series) Y vector (same number of observations as X).
theta – (float) Type of information being tested. Falls in range [0, 1].

Returns:

(float) Distance under GPR approach.
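
To make the role of theta concrete, here is a minimal sketch of a GPR-style distance. It assumes the squared distance is a theta-weighted blend of a dependence term, (1 - rho_S) / 2 with rho_S being Spearman's rho, and a distribution term given by the squared Hellinger distance between two Gaussians fitted to the samples. `gpr_sketch` is a hypothetical name, and this is not necessarily how arbitragelab computes the value.

```python
import numpy as np
from scipy.stats import spearmanr


def gpr_sketch(x, y, theta):
    """Hypothetical GPR-style distance; assumptions are stated in the comments."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Dependence term (assumed form): (1 - Spearman's rho) / 2, lies in [0, 1]
    rho, _ = spearmanr(x, y)
    dependence = (1.0 - rho) / 2.0

    # Distribution term: squared Hellinger distance between two Gaussians
    # fitted to the sample means and standard deviations, lies in [0, 1]
    std_x, std_y = x.std(), y.std()
    var_sum = std_x**2 + std_y**2
    distribution = 1.0 - np.sqrt(2.0 * std_x * std_y / var_sum) * np.exp(
        -0.25 * (x.mean() - y.mean()) ** 2 / var_sum
    )

    # Blend the two terms; clip to absorb floating-point noise around 0 and 1
    squared = theta * dependence + (1.0 - theta) * distribution
    return float(np.sqrt(np.clip(squared, 0.0, 1.0)))


rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.3 * x + rng.normal(loc=1.0, scale=2.0, size=500)

for theta in (0.0, 0.5, 1.0):
    print(theta, gpr_sketch(x, y, theta))
```

With theta = 1 only the rank dependence matters, so any increasing transformation of x is at distance 0 from x; with theta = 0 only the fitted means and standard deviations matter.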

gnpr_distance(x: numpy.array, y: numpy.array, theta: float, n_bins: int = 50) → float

Calculates the empirical distance between two random variables under the Generic Non-Parametric Representation (GNPR) approach.

Formula for the distance is taken from https://www.researchgate.net/publication/322714557 (p.72).

Parameter theta defines what type of information dependency is being tested:

- for theta = 0 the distribution information is tested
- for theta = 1 the dependence information is tested
- for theta = 0.5 a mix of both information types is tested

With theta in [0, 1] the distance lies in the range [0, 1] and is a metric. (See original work for proof, p.71)

This implementation departs from the original formula: it uses the 1D Optimal Transport distance to measure the distance between distributions, which avoids the problem of defining a support and choosing a number of bins. The number of bins can still be given as an input to speed up calculations; a large number of bins can take a long time to compute.

Parameters:

x – (np.array/pd.Series) X vector.
y – (np.array/pd.Series) Y vector (same number of observations as X).
theta – (float) Type of information being tested. Falls in range [0, 1].
n_bins – (int) Number of bins to use to split the X and Y vector observations. (50 by default)

Returns:

(float) Distance under GNPR approach.
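
The blend of a rank-based dependence term with a 1D optimal transport distribution term can be sketched as follows. Everything here is an assumption made for illustration: `gnpr_sketch` is a hypothetical name, the dependence term is taken as 3 / (T * (T^2 - 1)) * Sum of squared rank differences, the transport cost is computed with `scipy.stats.wasserstein_distance` on a shared histogram, and the bin positions are rescaled to [0, 1] so that the result stays in [0, 1]. The library's own normalization and cost may differ.

```python
import numpy as np
from scipy.stats import rankdata, wasserstein_distance


def gnpr_sketch(x, y, theta, n_bins=50):
    """Hypothetical GNPR-style distance; not the arbitragelab implementation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t = x.shape[0]

    # Dependence term (assumed form): scaled squared rank differences, in [0, 1]
    dependence = 3.0 / (t * (t**2 - 1)) * np.sum((rankdata(x) - rankdata(y)) ** 2)

    # Distribution term: 1D optimal transport between histograms on shared bins
    edges = np.histogram_bin_edges(np.concatenate([x, y]), bins=n_bins)
    hist_x, _ = np.histogram(x, bins=edges)
    hist_y, _ = np.histogram(y, bins=edges)
    # Assumption: bin positions rescaled to [0, 1], so the cost is at most 1
    positions = np.linspace(0.0, 1.0, n_bins)
    distribution = wasserstein_distance(
        positions, positions, u_weights=hist_x, v_weights=hist_y
    )

    # Blend the two terms; clip to absorb floating-point noise
    squared = theta * dependence + (1.0 - theta) * distribution
    return float(np.sqrt(np.clip(squared, 0.0, 1.0)))


rng = np.random.default_rng(2)
x = rng.standard_t(df=3, size=500)  # heavy-tailed sample
y = np.exp(x)                       # increasing transform: same ranks, new marginal

print(gnpr_sketch(x, y, theta=1.0))  # dependence information only
print(gnpr_sketch(x, y, theta=0.0))  # distribution information only
```

Because y is an increasing function of x, the two series share the same ranks, so only the distribution term separates them.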
