arbitragelab.cointegration_approach.sparse_mr_portfolio
This module selects sparse mean-reverting portfolios out of an asset universe. The methods implemented in this module include the following:
Box-Tiao canonical decomposition.
Greedy search.
Semidefinite relaxation.
Graphical LASSO regression for sparse covariance selection.
Column-wise LASSO and multi-task LASSO regression for sparse VAR(1) coefficient matrix estimation.
Semidefinite programming approach to predictability optimization under a minimum volatility constraint.
Semidefinite programming approach to portmanteau statistics optimization under a minimum volatility constraint.
Semidefinite programming approach to crossing statistics optimization under a minimum volatility constraint.
Module Contents
Classes
SparseMeanReversionPortfolio | Module for sparse mean reversion portfolio selection.
- class SparseMeanReversionPortfolio(assets: pandas.DataFrame)
Module for sparse mean reversion portfolio selection.
- property assets: pandas.DataFrame
Getter for the asset price data.
- Returns:
(pd.DataFrame) The price history of each asset.
- property demeaned: pandas.DataFrame
Getter for the demeaned price data.
- Returns:
(pd.DataFrame) The processed price history of each asset with zero mean.
- property standardized: pandas.DataFrame
Getter for the standardized price data.
- Returns:
(pd.DataFrame) The standardized price history of each asset with zero mean and unit variance.
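The two processed views can be pictured with a minimal NumPy sketch. The price values are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption rather than a statement about the class internals.

```python
import numpy as np

# Hypothetical price history: 5 observations (rows) of 2 assets (columns).
prices = np.array([[10.0, 20.0],
                   [11.0, 19.0],
                   [12.0, 21.0],
                   [11.0, 22.0],
                   [10.0, 20.0]])

# Demeaned data: subtract each column's sample mean.
demeaned = prices - prices.mean(axis=0)

# Standardized data: additionally divide by each column's sample
# standard deviation (ddof=1 is an assumption).
standardized = demeaned / prices.std(axis=0, ddof=1)
```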
- static mean_rev_coeff(weights: numpy.array, assets: pandas.DataFrame, interval: str = 'D') → Tuple[float, float]
Calculate the Ornstein-Uhlenbeck model mean reversion speed and half-life.
- Parameters:
weights – (np.array) The weightings for each asset.
assets – (pd.DataFrame) The price history of each asset.
interval – (str) The time interval, or the frequency, of the price data. Options are [‘D’, ‘M’, ‘Y’].
- Returns:
(float, float) Mean reversion coefficient; half-life of the OU process.
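The relationship between the mean reversion speed and the half-life can be sketched as follows. This is an illustrative AR(1) regression on a simulated series, not necessarily the estimator the library uses; the time step `dt = 1.0` is an assumption, and how `interval` maps to a time step is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a hypothetical mean-reverting portfolio value: AR(1) with coefficient 0.9.
n_obs, b_true = 500, 0.9
spread = np.zeros(n_obs)
for t in range(1, n_obs):
    spread[t] = b_true * spread[t - 1] + rng.normal(scale=0.1)

# Regress s_{t+1} on s_t (with intercept) to estimate the AR(1) coefficient b.
X = np.column_stack([np.ones(n_obs - 1), spread[:-1]])
intercept, b_hat = np.linalg.lstsq(X, spread[1:], rcond=None)[0]

# For an OU process sampled every dt, b = exp(-theta * dt).
dt = 1.0  # one observation per time unit (assumption)
theta = -np.log(b_hat) / dt     # mean reversion speed
half_life = np.log(2) / theta   # time for a deviation to decay by half
```

With real data, the portfolio value would be the asset price matrix multiplied by the weight vector before the regression.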
- autocov(nlags: int, symmetrize: bool = True, use_standardized: bool = True) → numpy.array
Calculate the autocovariance matrix.
- Parameters:
nlags – (int) Lag of autocovariance. If nlags = 0, return the covariance matrix.
symmetrize – (bool) If True, symmetrize the autocovariance matrix \(\frac{A^T + A}{2}\); otherwise, return the original autocovariance matrix.
use_standardized – (bool) If True, use standardized data; otherwise, use demeaned data.
- Returns:
(np.array) Autocovariance or covariance matrix.
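A minimal sketch of a lagged sample autocovariance with optional symmetrization. The function name `sample_autocov` and the normalization by `n_obs - nlags - 1` are assumptions made for illustration.

```python
import numpy as np

def sample_autocov(data, nlags, symmetrize=True):
    """Lag-`nlags` sample autocovariance of zero-mean data (rows are time steps)."""
    n_obs = data.shape[0]
    if nlags == 0:
        # Lag 0 reduces to the ordinary sample covariance matrix.
        return data.T @ data / (n_obs - 1)
    acov = data[:-nlags].T @ data[nlags:] / (n_obs - nlags - 1)
    if symmetrize:
        # (A^T + A) / 2, as described for the symmetrize flag.
        acov = (acov.T + acov) / 2
    return acov

# Hypothetical zero-mean data: 200 observations of 3 assets.
rng = np.random.default_rng(2)
data = rng.normal(size=(200, 3))
data -= data.mean(axis=0)

gamma_0 = sample_autocov(data, 0)
gamma_1 = sample_autocov(data, 1)
```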
- least_square_VAR_fit(use_standardized: bool = False) → numpy.array
Calculate the least square estimate of the VAR(1) matrix.
- Parameters:
use_standardized – (bool) If True, use standardized data; otherwise, use demeaned data.
- Returns:
(np.array) Least square estimate of VAR(1) matrix.
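A least squares VAR(1) fit can be sketched with NumPy alone. The row-vector convention \(S_t = S_{t-1} A + Z_t\) and the simulated coefficient matrix `A_true` are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a hypothetical 2-asset VAR(1) process S_t = S_{t-1} A + Z_t.
A_true = np.array([[0.5, 0.1],
                   [0.0, 0.3]])
n_obs = 2000
data = np.zeros((n_obs, 2))
for t in range(1, n_obs):
    data[t] = data[t - 1] @ A_true + rng.normal(size=2)

# Work on demeaned data, then regress S_t on S_{t-1}.
data -= data.mean(axis=0)
A_hat = np.linalg.lstsq(data[:-1], data[1:], rcond=None)[0]
```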
- box_tiao(threshold: int = 7) → numpy.array
Perform Box-Tiao canonical decomposition on the assets dataframe.
- Parameters:
threshold – (int) Round precision cutoff threshold. For example, a threshold of n means that a number less than \(10^{-n}\) will be rounded to zero.
- Returns:
(np.array) The weighting of each asset in the portfolio. There will be N decompositions for N assets, where each column vector corresponds to one portfolio. The columns are ordered by descending eigenvalue.
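The idea behind the decomposition can be sketched as a generalized eigenproblem on the predictability ratio \(x^T A^T \Gamma A x / x^T \Gamma x\). The function `box_tiao_sketch`, the row-vector VAR(1) convention, and the unit-norm scaling of the columns are assumptions; the library method may differ in details such as rounding.

```python
import numpy as np

def box_tiao_sketch(data):
    """Return portfolio weights (columns) sorted by descending predictability."""
    # Least squares VAR(1) estimate and sample covariance of zero-mean data.
    A = np.linalg.lstsq(data[:-1], data[1:], rcond=None)[0]
    gamma = data.T @ data / (data.shape[0] - 1)
    M = A.T @ gamma @ A

    # Solve M x = lambda * gamma x via a Cholesky change of variables.
    L_inv = np.linalg.inv(np.linalg.cholesky(gamma))
    C = L_inv @ M @ L_inv.T
    eigvals, eigvecs = np.linalg.eigh((C + C.T) / 2)

    order = np.argsort(eigvals)[::-1]        # descending eigenvalues
    weights = L_inv.T @ eigvecs[:, order]    # map back to portfolio weights
    weights /= np.linalg.norm(weights, axis=0)
    return weights, eigvals[order]

# Hypothetical data: a 3-asset VAR(1) process with different persistence per asset.
rng = np.random.default_rng(5)
A_true = np.diag([0.9, 0.5, 0.1])
data = np.zeros((500, 3))
for t in range(1, 500):
    data[t] = data[t - 1] @ A_true + rng.normal(size=3)
data -= data.mean(axis=0)

weights, predictability = box_tiao_sketch(data)
```

Under this convention the last column is the least predictable portfolio, which is the natural candidate for a mean-reverting one.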
- static greedy_search(cardinality: int, var_est: numpy.array, cov_est: numpy.array, threshold: int = 7, maximize: bool = False) → numpy.array
Greedy search algorithm for sparse decomposition.
- Parameters:
cardinality – (int) Number of assets to include in the portfolio.
var_est – (np.array) Estimated VAR(1) coefficient matrix.
cov_est – (np.array) Estimated covariance matrix.
threshold – (int) Round precision cutoff threshold. For example, a threshold of n means that a number less than \(10^{-n}\) will be treated as zero.
maximize – (bool) If True, maximize predictability; otherwise, minimize predictability.
- Returns:
(np.array) Weights of the selected assets.
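A forward greedy selection can be sketched as follows: starting from an empty set, add the asset that gives the smallest generalized eigenvalue on the enlarged support, until the target cardinality is reached. The helper names are hypothetical, only the minimizing case is covered, and the library routine may differ in its exact update rule.

```python
import numpy as np

def min_gen_eig(M, G):
    """Smallest generalized eigenpair of M x = lambda * G x (G positive definite)."""
    L_inv = np.linalg.inv(np.linalg.cholesky(G))
    C = L_inv @ M @ L_inv.T
    vals, vecs = np.linalg.eigh((C + C.T) / 2)
    x = L_inv.T @ vecs[:, 0]
    return vals[0], x / np.linalg.norm(x)

def greedy_search_sketch(cardinality, var_est, cov_est):
    """Greedily pick `cardinality` assets that minimize predictability."""
    n = cov_est.shape[0]
    M = var_est.T @ cov_est @ var_est
    selected, best_vec = [], None
    for _ in range(cardinality):
        best_val, best_idx = np.inf, None
        for j in range(n):
            if j in selected:
                continue
            idx = selected + [j]
            val, vec = min_gen_eig(M[np.ix_(idx, idx)], cov_est[np.ix_(idx, idx)])
            if val < best_val:
                best_val, best_idx, best_vec = val, j, vec
        selected.append(best_idx)
    # Embed the small weight vector into the full asset universe.
    weights = np.zeros(n)
    weights[selected] = best_vec
    return weights

# Hypothetical estimates for 5 assets.
rng = np.random.default_rng(4)
B = rng.normal(size=(5, 5))
cov_est = B @ B.T + np.eye(5)          # positive definite covariance
var_est = 0.3 * rng.normal(size=(5, 5))
weights = greedy_search_sketch(2, var_est, cov_est)
```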
- sdp_predictability_vol(rho: float, variance: float, use_standardized: bool = True, verbose: bool = True, max_iter: int = 10000) → numpy.array
Semidefinite relaxation optimization of predictability with a volatility threshold following the formulation of Cuturi and d’Aspremont (2015).
\begin{align*} \text{minimize } & \mathbf{Tr}(\gamma_1 \gamma_0^{-1} \gamma_1^T Y) + \rho \lVert Y \rVert_1 \\ \text{subject to } & \mathbf{Tr}(\gamma_0 Y) \geq V \\ & \mathbf{Tr}(Y) = 1 \\ & Y \succeq 0 \end{align*}
where \(\gamma_i\) is the lag-\(i\) sample autocovariance (\(\gamma_0\) is the sample covariance) and \(V\) is the variance lower bound of the portfolio.
- Parameters:
rho – (float) Regularization parameter of the \(l_1\)-norm in the objective function.
variance – (float) Variance lower bound for the portfolio.
verbose – (bool) If True, print the SDP solver iteration details for debugging; otherwise, suppress the debug output.
use_standardized – (bool) If True, use standardized data for optimization; otherwise, use de-meaned data.
max_iter – (int) Maximum number of iterations for the SDP solver.
- Returns:
(np.array) The optimized matrix \(Y\).
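The matrix variable \(Y\) can be read as a lifted version of a weight vector \(x\). A sketch of the standard lifting argument, with \(M = \gamma_1 \gamma_0^{-1} \gamma_1^T\) introduced here only as shorthand:

\begin{align*} x^T M x &= \mathbf{Tr}(M x x^T) = \mathbf{Tr}(M Y), & Y &= x x^T \\ x^T \gamma_0 x &= \mathbf{Tr}(\gamma_0 Y) \geq V, & \lVert x \rVert_2^2 &= \mathbf{Tr}(Y) = 1 \end{align*}

The condition \(Y = x x^T\) is equivalent to \(Y \succeq 0\) with \(\mathbf{rank}(Y) = 1\). Dropping the rank constraint and encouraging sparsity through the penalty \(\rho \lVert Y \rVert_1\) gives a convex program. Its solution is generally not rank one, so a sparse weight vector still has to be extracted from \(Y\) afterwards, for instance with sparse_eigen_deflate.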
- sdp_portmanteau_vol(rho: float, variance: float, nlags: int = 3, use_standardized: bool = True, verbose: bool = True, max_iter: int = 10000) → numpy.array
Semidefinite relaxation optimization of portmanteau statistic with a volatility threshold following the formulation of Cuturi and d’Aspremont (2015).
\begin{align*} \text{minimize } & \sum_{i=1}^p \mathbf{Tr}(\gamma_i Y)^2 + \rho \lVert Y \rVert_1 \\ \text{subject to } & \mathbf{Tr}(\gamma_0 Y) \geq V \\ & \mathbf{Tr}(Y) = 1 \\ & Y \succeq 0 \end{align*}
where \(\gamma_i\) is the lag-\(i\) sample autocovariance (when \(i=0\), it is the sample covariance) and \(V\) is the variance lower bound of the portfolio.
- Parameters:
rho – (float) Regularization parameter of the \(l_1\)-norm in the objective function.
variance – (float) Variance lower bound for the portfolio.
nlags – (int) Order of portmanteau statistic \(p\).
verbose – (bool) If True, print the SDP solver iteration details for debugging; otherwise, suppress the debug output.
use_standardized – (bool) If True, use standardized data for optimization; otherwise, use de-meaned data.
max_iter – (int) Maximum number of iterations for the SDP solver.
- Returns:
(np.array) The optimized matrix \(Y\).
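The terms of the portmanteau objective can be evaluated directly for a candidate weight vector. This sketch does not solve the SDP; it only shows how the trace expressions relate to a rank-one \(Y = x x^T\). The helper `autocov`, the data, and the weights are hypothetical, and \(\lVert Y \rVert_1\) is taken to be the elementwise sum of absolute values, which is an assumption.

```python
import numpy as np

def autocov(data, lag):
    """Symmetrized lag-`lag` sample autocovariance of zero-mean data."""
    n_obs = data.shape[0]
    if lag == 0:
        return data.T @ data / (n_obs - 1)
    acov = data[:-lag].T @ data[lag:] / (n_obs - lag - 1)
    return (acov + acov.T) / 2

# Hypothetical zero-mean data: 300 observations of 4 assets.
rng = np.random.default_rng(3)
data = rng.normal(size=(300, 4))
data -= data.mean(axis=0)

p = 3                                    # order of the portmanteau statistic
x = np.array([0.5, -0.5, 0.5, 0.5])      # unit-norm weights, so Tr(Y) = 1
Y = np.outer(x, x)
gammas = [autocov(data, i) for i in range(p + 1)]

portmanteau = sum(np.trace(gammas[i] @ Y) ** 2 for i in range(1, p + 1))
variance = np.trace(gammas[0] @ Y)       # must be at least V in the SDP
l1_norm = np.abs(Y).sum()                # sparsity penalty term (elementwise norm)
```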
- sdp_crossing_vol(rho: float, mu: float, variance: float, nlags: int = 3, use_standardized: bool = True, verbose: bool = True, max_iter: int = 10000) → numpy.array
Semidefinite relaxation optimization of crossing statistic with a volatility threshold following the formulation of Cuturi and d’Aspremont (2015).
\begin{align*} \text{minimize } & \mathbf{Tr}(\gamma_1 Y) + \mu \sum_{i=2}^p \mathbf{Tr}(\gamma_i Y)^2 + \rho \lVert Y \rVert_1 \\ \text{subject to } & \mathbf{Tr}(\gamma_0 Y) \geq V \\ & \mathbf{Tr}(Y) = 1 \\ & Y \succeq 0 \end{align*}
where \(\gamma_i\) is the lag-\(i\) sample autocovariance (when \(i=0\), it is the sample covariance) and \(V\) is the variance lower bound of the portfolio.
- Parameters:
rho – (float) Regularization parameter of the \(l_1\)-norm in the objective function.
mu – (float) Regularization parameter of higher-order autocovariance.
variance – (float) Variance lower bound for the portfolio.
nlags – (int) Highest autocovariance lag \(p\) included in the objective function.
verbose – (bool) If True, print the SDP solver iteration details for debugging; otherwise, suppress the debug output.
use_standardized – (bool) If True, use standardized data for optimization; otherwise, use de-meaned data.
max_iter – (int) Maximum number of iterations for the SDP solver.
- Returns:
(np.array) The optimized matrix \(Y\).
- LASSO_VAR_tuning(sparsity: float, multi_task_lasso: bool = False, alpha_min: float = -5.0, alpha_max: float = 0.0, n_alphas: int = 100, max_iter: int = 1000, use_standardized: bool = True) → float
Tune the l1-regularization coefficient (alpha) of LASSO regression for a sparse estimate of the VAR(1) matrix.
- Parameters:
sparsity – (float) Percentage of zeros required in the VAR(1) matrix.
multi_task_lasso – (bool) If True, use multi-task LASSO for sparse estimate, where the LASSO will yield full columns of zeros; otherwise, do LASSO column-wise.
alpha_min – (float) Minimum l1-regularization coefficient.
alpha_max – (float) Maximum l1-regularization coefficient.
n_alphas – (int) Number of l1-regularization coefficients for the parameter search.
max_iter – (int) Maximum number of iterations for LASSO regression.
use_standardized – (bool) If True, use standardized data for optimization; otherwise, use de-meaned data.
- Returns:
(float) Minimum alpha that satisfies the sparsity requirement.
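The search for the minimum alpha that meets a sparsity target can be sketched as a scan over an ascending grid. To keep the example free of extra dependencies, elementwise soft-thresholding of a dense estimate stands in for the LASSO fit; this is a deliberate simplification, not the library's estimator, and the matrix, grid, and function names are hypothetical.

```python
import numpy as np

def soft_threshold(matrix, alpha):
    """Shrink entries toward zero by alpha; a stand-in for a LASSO fit."""
    return np.sign(matrix) * np.maximum(np.abs(matrix) - alpha, 0.0)

def tune_alpha(dense_est, sparsity, alphas):
    """Return the smallest alpha whose estimate has at least `sparsity` zeros (as a fraction)."""
    for alpha in np.sort(alphas):
        sparse_est = soft_threshold(dense_est, alpha)
        zero_fraction = np.mean(sparse_est == 0)
        if zero_fraction >= sparsity:
            return alpha
    return None  # no alpha on the grid reaches the target

# Hypothetical dense VAR(1) estimate and search grid.
dense_est = np.array([[0.5, 0.055],
                      [-0.2, 0.013]])
alphas = np.linspace(0.0, 0.3, 31)
best_alpha = tune_alpha(dense_est, sparsity=0.5, alphas=alphas)
```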
- LASSO_VAR_fit(alpha: float, multi_task_lasso: bool = True, max_iter: int = 1000, threshold: int = 10, use_standardized: bool = True) → numpy.array
Fit the LASSO model with the designated alpha for a sparse VAR(1) coefficient matrix estimate.
- Parameters:
alpha – (float) Optimized l1-regularization coefficient.
multi_task_lasso – (bool) If True, use multi-task LASSO for sparse estimate, where the LASSO will yield full columns of zeros; otherwise, do LASSO column-wise.
max_iter – (int) Maximum number of iterations of LASSO regression.
threshold – (int) Round precision cutoff threshold. For example, a threshold of n means that a number less than \(10^{-n}\) will be treated as zero.
use_standardized – (bool) If True, use standardized data for optimization; otherwise, use de-meaned data.
- Returns:
(np.array) Sparse estimate of VAR(1) matrix.
- covar_sparse_tuning(max_iter: int = 1000, alpha_min: float = 0.0, alpha_max: float = 1.0, n_alphas: int = 100, clusters: int = 3, use_standardized: bool = True) → float
Tune the regularization parameter (alpha) of the graphical LASSO model for a sparse estimate of the covariance matrix.
- Parameters:
max_iter – (int) Maximum number of iterations for graphical LASSO fit.
alpha_min – (float) Minimum regularization parameter.
alpha_max – (float) Maximum regularization parameter.
n_alphas – (int) Number of regularization parameters for the parameter search.
clusters – (int) Number of smaller clusters desired from the precision matrix. The higher the number, the larger the best alpha will be. This parameter cannot exceed the number of assets.
use_standardized – (bool) If True, use standardized data for optimization; otherwise, use de-meaned data.
- Returns:
(float) Optimal alpha to split the graph representation of the inverse covariance matrix into the designated number of clusters.
- covar_sparse_fit(alpha: float, max_iter: int = 1000, threshold: int = 10, use_standardized: bool = True) → Tuple[numpy.array, numpy.array]
Fit the graphical LASSO model using the optimized alpha for a sparse covariance matrix estimate.
- Parameters:
alpha – (float) Optimized regularization coefficient of graphical LASSO.
max_iter – (int) Maximum number of iterations for graphical LASSO fit.
threshold – (int) Round precision cutoff threshold. For example, a threshold of n means that a number less than \(10^{-n}\) will be treated as zero.
use_standardized – (bool) If True, use standardized data for optimization; otherwise, use de-meaned data.
- Returns:
(np.array, np.array) Sparse estimate of the covariance matrix; inverse of the sparse covariance matrix, i.e. the precision matrix, used as a graph representation.
- find_clusters(precision_matrix: numpy.array, var_estimate: numpy.array) → networkx.Graph
Use the intersection of the graph \(\Gamma^{-1}\) and the graph \(A^T A\) to pinpoint the clusters of assets to perform greedy search or semidefinite relaxation on.
- Parameters:
precision_matrix – (np.array) The inverse of the estimated sparse covariance matrix.
var_estimate – (np.array) The sparse estimate of VAR(1) coefficient matrix.
- Returns:
(networkx.Graph) A graph representation of the clusters.
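The graph intersection can be sketched without a graph library: two assets are linked only if the corresponding off-diagonal entry is nonzero in both \(\Gamma^{-1}\) and \(A^T A\), and clusters are the connected components of that graph. The function `cluster_sketch`, the tolerance, and the return type (a list of index sets rather than a networkx.Graph) are assumptions for illustration.

```python
import numpy as np

def cluster_sketch(precision_matrix, var_estimate, tol=1e-10):
    """Connected components of the intersection of the two sparsity graphs."""
    n = precision_matrix.shape[0]
    adj_prec = np.abs(precision_matrix) > tol
    adj_var = np.abs(var_estimate.T @ var_estimate) > tol
    adj = adj_prec & adj_var          # keep edges present in both graphs
    np.fill_diagonal(adj, False)      # ignore self-loops

    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search from an unvisited node.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(int(k) for k in np.flatnonzero(adj[node]) if k not in component)
        seen |= component
        clusters.append(component)
    return clusters

# Hypothetical block-structured inputs: assets {0, 1} and {2, 3} form two clusters.
block = np.array([[0.5, 0.2],
                  [0.1, 0.4]])
var_estimate = np.block([[block, np.zeros((2, 2))],
                         [np.zeros((2, 2)), block]])
precision_matrix = np.array([[2.0, -0.5, 0.0, 0.0],
                             [-0.5, 2.0, 0.0, 0.0],
                             [0.0, 0.0, 2.0, -0.3],
                             [0.0, 0.0, -0.3, 2.0]])
clusters = cluster_sketch(precision_matrix, var_estimate)
```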
- sparse_eigen_deflate(sdp_result: numpy.array, cardinality: int, tol: float = 1e-06, max_iter: int = 100, verbose: bool = True) → numpy.array
Calculate the leading sparse eigenvector of the SDP result. Deflate the original leading eigenvector to the input cardinality using the Truncated Power method (Yuan and Zhang, 2013).
The Truncated Power method is ported from the Matlab code provided by the original authors.
- Parameters:
sdp_result – (np.array) The optimization result from semidefinite relaxation.
cardinality – (int) Desired cardinality of the sparse eigenvector.
tol – (float) Convergence tolerance of the Truncated Power method.
max_iter – (int) Maximum number of iterations for the Truncated Power method.
verbose – (bool) If True, print the Truncated Power method iteration details; otherwise, suppress the debug output.
- Returns:
(np.array) Leading sparse eigenvector of the SDP result.
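The Truncated Power iteration alternates a power step with a hard truncation to the largest-magnitude entries. The sketch below is an independent minimal version, not the ported Matlab code; the initialization from the dense leading eigenvector and the stopping rule on the change in the iterate are assumptions.

```python
import numpy as np

def truncate(vector, cardinality):
    """Keep the `cardinality` largest-magnitude entries and renormalize."""
    out = np.zeros_like(vector)
    keep = np.argsort(np.abs(vector))[-cardinality:]
    out[keep] = vector[keep]
    return out / np.linalg.norm(out)

def truncated_power_sketch(matrix, cardinality, tol=1e-6, max_iter=100):
    """Approximate the leading sparse eigenvector of a symmetric PSD matrix."""
    # Start from the dense leading eigenvector, truncated to the target cardinality.
    _, vecs = np.linalg.eigh(matrix)
    x = truncate(vecs[:, -1], cardinality)
    for _ in range(max_iter):
        x_new = truncate(matrix @ x, cardinality)   # power step, then truncation
        converged = np.linalg.norm(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x

# Hypothetical SDP result whose leading eigenvector is supported on assets 0 and 1.
v = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2)
sdp_result = 5.0 * np.outer(v, v) + 0.1 * np.eye(4)
sparse_vec = truncated_power_sketch(sdp_result, cardinality=2)
```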
- static check_symmetric(matrix: numpy.array, rtol: float = 1e-05, atol: float = 1e-08) → bool
Check if a matrix is symmetric.
- Parameters:
matrix – (np.array) The matrix under inspection.
rtol – (float) Relative tolerance for np.allclose.
atol – (float) Absolute tolerance for np.allclose.
- Returns:
(bool) True if the matrix is symmetric, False otherwise.
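Since the tolerances are those of np.allclose, a symmetry check of this kind can be sketched in one line. The function name is hypothetical and this is not necessarily the library's exact implementation.

```python
import numpy as np

def check_symmetric_sketch(matrix, rtol=1e-05, atol=1e-08):
    """Return True if the matrix equals its transpose within tolerance."""
    return bool(np.allclose(matrix, matrix.T, rtol=rtol, atol=atol))

symmetric = check_symmetric_sketch(np.array([[1.0, 2.0], [2.0, 3.0]]))
asymmetric = check_symmetric_sketch(np.array([[1.0, 2.0], [0.0, 3.0]]))
```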
- static is_semi_pos_def(matrix: numpy.array) → bool
Check if a matrix is positive semidefinite.
- Parameters:
matrix – (np.array) The matrix under inspection.
- Returns:
(bool) True if the matrix is positive semidefinite, False otherwise.
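One common way to test for positive semidefiniteness is to check that all eigenvalues of the symmetrized matrix are non-negative up to a small tolerance. The function name and the tolerance below are assumptions; the library method may use a different criterion.

```python
import numpy as np

def is_psd_sketch(matrix, tol=1e-10):
    """Return True if all eigenvalues of the symmetrized matrix are >= -tol."""
    sym = (matrix + matrix.T) / 2
    return bool(np.all(np.linalg.eigvalsh(sym) >= -tol))

# Semidefinite but singular: one eigenvalue is exactly zero.
psd = is_psd_sketch(np.array([[1.0, 0.0], [0.0, 0.0]]))
# Indefinite: one eigenvalue is negative.
indefinite = is_psd_sketch(np.array([[1.0, 0.0], [0.0, -1.0]]))
```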