arbitragelab.util.generate_dataset
This module generates a synthetic classification dataset of INFORMED, REDUNDANT, and NOISE explanatory variables, based on the book Machine Learning for Asset Managers (code snippet 6.1).
Module Contents
Functions
get_classification_data | A function to generate synthetic classification data sets.
- get_classification_data(n_features=100, n_informative=25, n_redundant=25, n_samples=10000, random_state=0, sigma=0.0)
A function to generate synthetic classification data sets.
- Parameters:
n_features – (int) Total number of features to be generated (i.e. informative + redundant + noisy).
n_informative – (int) Number of informative features.
n_redundant – (int) Number of redundant features.
n_samples – (int) Number of samples (rows) to generate.
random_state – (int) Random seed.
sigma – (float) Controls the substitution effect among the redundant features in the dataset by adding Gaussian noise to them. The lower the value of sigma, the greater the substitution effect.
- Returns:
(pd.DataFrame, pd.Series) X and y as features and labels respectively.
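The effect of sigma described above can be illustrated with a minimal NumPy-only sketch. It does not call the library. It assumes that a redundant feature is built as a copy of an informative feature plus Gaussian noise scaled by sigma, which is an assumption about the construction rather than something this page states. The helper name redundant_correlation is hypothetical and exists only for this illustration.

```python
import numpy as np


def redundant_correlation(sigma, n_samples=10_000, random_state=0):
    """Correlation between an informative feature and a noisy copy of it.

    Hypothetical helper for illustration only; not part of arbitragelab.
    """
    rng = np.random.default_rng(random_state)
    informative = rng.normal(size=n_samples)
    # Assumption: redundant feature = informative feature + sigma * Gaussian noise.
    redundant = informative + sigma * rng.normal(size=n_samples)
    return float(np.corrcoef(informative, redundant)[0, 1])


# Lower sigma keeps the redundant feature closer to its source,
# so the two are easier to substitute for one another.
for sigma in (0.0, 0.5, 2.0):
    print(f"sigma={sigma}: correlation={redundant_correlation(sigma):.3f}")
```

Under that assumption, sigma=0.0 yields an exact copy of the informative feature, and the correlation falls as sigma grows. With the library installed, the corresponding call would be get_classification_data(sigma=0.5), imported from arbitragelab.util.generate_dataset, which returns X and y as described above.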