Warning

In order to use this module, you should additionally install TensorFlow v2.8.0 and Keras v2.3.1. For more details, please visit our ArbitrageLab installation guide.

Neural Networks



Introduction

Neural networks exist in a variety of different architectures and have been implemented in numerous financial applications. However, the most widely used architecture for the analysis of stock markets is the multilayer perceptron (MLP).

A generic neural network is built with at least three layers: an input, a hidden and an output layer. The structure of the input layer is determined by the number of explanatory variables, depicted as nodes in the architecture. The hidden layer determines the amount of complexity the model can support, or ‘fit’. Both the input and hidden layers also contain what is known as a bias node, whose value is fixed and equal to one; its purpose is similar to that of the intercept in more traditional regression models. The third and final layer of a standard neural network, the output layer, contains a number of nodes corresponding to the number of response variables. Each of these layers is linked via a node-to-node interconnecting system, enabling a functional network of ‘neurons’.
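A minimal Keras sketch of such a three-layer structure is shown below (the layer sizes and activation choices are illustrative assumptions only, not part of this module; the bias nodes correspond to the use_bias=True terms):

# Illustrative three-layer network: input, hidden and output layers.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    # Hidden layer: three hidden units fed by two explanatory variables (the input layer).
    # use_bias=True adds the fixed-value bias node described above.
    Dense(3, activation="relu", use_bias=True, input_shape=(2,)),
    # Output layer: a single response variable, also with a bias term.
    Dense(1, activation="linear", use_bias=True),
])
model.compile(optimizer="adam", loss="mean_squared_error")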

On the whole, neural networks learn and identify relationships in data using neurons, similar to how the human brain works. They are a non-parametric tool that uses layers of weighted connections between neurons to capture even very complex relationships between the predictor inputs and the target variables. They can cope with messy data, such as noise and imprecision in the measurement system, and are appropriate for regression as well as classification, time series analysis and clustering.

Multi Layer Perceptron

The MLP allows the user to select a set of activation functions to explore, including the identity, logistic, hyperbolic tangent, negative exponential and sine functions. These activation functions can be used for both hidden and output neurons. The MLP also trains networks using a variety of algorithms, such as gradient descent and conjugate gradient descent.

Implementation

class MultiLayerPerceptron(frame_size: int, hidden_size: int = 2, num_outputs: int = 1, loss_fn: str = 'mean_squared_error', optmizer: str = 'adam', metrics: str = 'accuracy', hidden_layer_activation_function: str = 'relu', output_layer_act_func: str = 'linear')

Vanilla Multi Layer Perceptron implementation.

__init__(frame_size: int, hidden_size: int = 2, num_outputs: int = 1, loss_fn: str = 'mean_squared_error', optmizer: str = 'adam', metrics: str = 'accuracy', hidden_layer_activation_function: str = 'relu', output_layer_act_func: str = 'linear')

Initialization of variables.

Parameters:
  • frame_size – (int) The size of the input dataset.

  • hidden_size – (int) Number of hidden units.

  • num_outputs – (int) Number of output units.

  • loss_fn – (str) String name of loss function to be used during training and testing.

  • optmizer – (str) String (name of optimizer) or optimizer instance.

  • metrics – (str) Metric to be used when evaluating the model during training and testing.

  • hidden_layer_activation_function – (str) String name of the activation function used by the hidden layer.

  • output_layer_act_func – (str) String name of the activation function used by the output layer.

build()

Builds and compiles model architecture.

Returns:

(Model) Resulting model.

Example

# Import package necessary for splitting the dataset.
from sklearn.model_selection import train_test_split
# Import package to generate a synthetic dataset.
from sklearn.datasets import make_regression
# Import package to quantify final prediction score.
from sklearn.metrics import r2_score

# Import the mlp implementation from arbitragelab.
from arbitragelab.ml_approach.neural_networks import MultiLayerPerceptron

# Generate 500 samples with 100 features, to be used as our dataset.
X, y = make_regression(500)

# Get the input frame size (number of features) to be given to the network.
_, frame_size = X.shape

# Initialize a basic regression neural network.
regressor = MultiLayerPerceptron(frame_size, num_outputs=1, loss_fn="mean_squared_error",
                                 optmizer="adam", metrics=[],
                                 hidden_layer_activation_function="relu",
                                 output_layer_act_func="linear")

# This will compile the keras model structure implemented.
regressor.build()

# Will supply information about the structure of the model.
regressor.summary()

# Prepare dataset for training and testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)

# Fit compiled model with training data.
regressor.fit(X_train, y_train,
              batch_size=20, epochs=400,
              verbose=1)

# Plot loss vs epochs.
regressor.plot_loss()

# Finally, use the fitted model to predict the test set and compute the R^2 score.
predictions = regressor.predict(X_test)
score = r2_score(y_test, predictions)

Recurrent Neural Network (LSTM)

Recurrent neural networks (RNNs) are neural networks that leverage the backpropagation through time (BPTT) algorithm to determine the gradients. Through this process, RNNs tend to run into two problems, known as exploding gradients and vanishing gradients. These issues are defined by the size of the gradient, that is, the slope of the loss function along the error curve. When the gradient is too small, it continues to shrink and the weight parameter updates become insignificant; when that occurs, the algorithm is no longer learning. Exploding gradients occur when the gradient is too large, creating an unstable model.
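A common mitigation for exploding gradients is gradient clipping. As a minimal sketch (the clipnorm threshold of 1.0 is an arbitrary illustrative choice), a clipped Keras optimizer instance can be built and passed wherever an optimizer name or instance is accepted, such as the optmizer parameter of the networks in this module:

# Gradient clipping to limit the global norm of the gradients during training.
from tensorflow.keras.optimizers import Adam

clipped_adam = Adam(learning_rate=0.001, clipnorm=1.0)
# clipped_adam can then be supplied as the 'optmizer' argument of the networks below.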

Long Short-Term Memory (LSTM) was first introduced by (Hochreiter and Schmidhuber 1997) as a solution to overcome error back-flow problems in RNNs. An LSTM is capable of retaining and propagating information through the dynamics of the LSTM memory cell, hidden state, and gating mechanism.
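For reference, the commonly used formulation of the LSTM cell update (a standard statement of the gating mechanism, with illustrative notation rather than the exact notation of the original paper) is:

\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory cell)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}

where \(\sigma\) is the logistic sigmoid and \(\odot\) denotes element-wise multiplication.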

Figure: Visual interpretation of the internal structures of RNNs and LSTMs (Olah 2015).

Implementation

class RecurrentNeuralNetwork(input_shape: tuple, hidden_size: int = 10, num_outputs: int = 1, loss_fn: str = 'mean_squared_error', optmizer: str = 'adam', metrics: str = 'accuracy', hidden_layer_activation_function: str = 'relu', output_layer_act_func: str = 'linear')

Recurrent Neural Network implementation.

__init__(input_shape: tuple, hidden_size: int = 10, num_outputs: int = 1, loss_fn: str = 'mean_squared_error', optmizer: str = 'adam', metrics: str = 'accuracy', hidden_layer_activation_function: str = 'relu', output_layer_act_func: str = 'linear')

Initialization of Variables.

Parameters:
  • input_shape – (tuple) Three-dimensional tuple describing the structure of the windowed data, e.g. (no_of_samples, time_steps, no_of_features).

  • hidden_size – (int) Number of hidden units.

  • num_outputs – (int) Number of output units.

  • loss_fn – (str) String name of loss function to be used during training and testing.

  • optmizer – (str) String (name of optimizer) or optimizer instance.

  • metrics – (str) Metric to be used when evaluating the model during training and testing.

  • hidden_layer_activation_function – (str) String name of the activation function used by the hidden layer.

  • output_layer_act_func – (str) String name of the activation function used by the output layer.

build()

Builds and compiles model architecture.

Returns:

(Model) Resulting model.

Example

# Import package necessary for splitting the dataset.
from sklearn.model_selection import train_test_split
# Import package to generate a synthetic dataset.
from sklearn.datasets import make_regression
# Import package to quantify final prediction score.
from sklearn.metrics import r2_score

# Import the rnn implementation from arbitragelab.
from arbitragelab.ml_approach.neural_networks import RecurrentNeuralNetwork

# Generate 500 samples with 100 features, to be used as our dataset.
X, y = make_regression(500)

# Prepare dataset for training and testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)

n_features = 1

# Reshape from [samples, timesteps] into [samples, timesteps, features].
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], n_features))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], n_features))

_, frame_size, no_features = X_train.shape

# Initialize a basic regression recurrent neural network.
regressor = RecurrentNeuralNetwork((frame_size, no_features), num_outputs=1,
                                   loss_fn="mean_squared_error", optmizer="adam",
                                   metrics=[], hidden_layer_activation_function="relu",
                                   output_layer_act_func="linear")

# This will compile the keras model structure implemented.
regressor.build()

# Will supply information about the structure of the model.
regressor.summary()

# Fit compiled model with training data.
regressor.fit(X_train, y_train,
              batch_size=20, epochs=400,
              verbose=1)

# Plot loss vs epochs.
regressor.plot_loss()

# Finally, use the fitted model to predict the test set and compute the R^2 score.
predictions = regressor.predict(X_test)
score = r2_score(y_test, predictions)

Higher Order Neural Network

As explained by (Giles and Maxwell 1987), HONNs exhibit adequate learning and storage capabilities because the order of the network can be structured in a manner that resembles the order of the problem.

Although the extent of their use in finance has so far been limited, (Knowles et al. 2009) show that, with shorter computational times and limited input variables, ‘the best HONN models show a profit increase over the MLP of around 8%’.

An example showing the capability of HONNs is the XOR problem. The Exclusive OR problem cannot be solved by a network without a hidden layer or by a single layer of first-order units, as it is not linearly separable. However, the same problem is easily solved if the patterns are represented in three dimensions in terms of an enhanced representation (Pao, 1989), using just a single-layer network with second-order terms.
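This effect can be reproduced with a small sketch (sklearn's Perceptron is used here only to illustrate linear separability; it is not part of this module):

# XOR is not linearly separable in two dimensions, but adding the second-order
# term x1 * x2 yields a three-dimensional representation that a single-layer
# (first-order) model can separate.
import numpy as np
from sklearn.linear_model import Perceptron

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR targets

# First-order inputs only: a single-layer model cannot reach perfect accuracy.
print(Perceptron().fit(X, y).score(X, y))

# Augmented with the product term, the problem becomes linearly separable.
X_aug = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])
print(Perceptron().fit(X_aug, y).score(X_aug, y))  # 1.0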

Figure: Loss and decision region of an MLP trying to solve the Exclusive OR problem.

Figure: Loss and decision region of a HONN on the same Exclusive OR problem.

Typically, HONNs are split into two types: the first type uses feature engineering to expand the input dataset so that it represents the higher-order relationships in the original dataset; the second type uses architectural modifications to augment the ability of the network to find higher-order relations in the dataset.
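As a conceptual sketch of the first (feature-engineering) type, sklearn's PolynomialFeatures is used below purely as an analogy for a second-order input expansion; the module's own FeatureExpander class is documented further down:

# Expanding a two-feature input with second-order terms (powers and products).
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.array([[2.0, 3.0]])
expanded = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
print(expanded)  # [[2. 3. 4. 6. 9.]] -> x1, x2, x1^2, x1*x2, x2^2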

Figure: HONNs can be classified into single and multiple layer structures, as explained by (Liatsis et al. 2009).

Implementation

class FeatureExpander(methods: list = [], n_orders: int = 1)

Higher-order term Feature Expander implementation. The implementation consists of two major parts. The first part uses a collection of orthogonal polynomials’ coefficients, ordered from the lowest-order term to the highest. The implemented series are the [‘chebyshev’, ‘legendre’, ‘laguerre’, ‘power’] polynomials. The second part is a combinatorial version of feature crossing, which involves generating feature collections of the n-th order and multiplying them together. This can be used by adding [‘product’] to the ‘methods’ parameter in the constructor.

__init__(methods: list = [], n_orders: int = 1)

Initializes main variables.

Parameters:
  • methods – (list) Possible expansion methods are [‘chebyshev’, ‘legendre’, ‘laguerre’, ‘power’, ‘product’].

  • n_orders – (int) Number of orders.

fit(frame: DataFrame)

Stores the dataset inside the class object.

Parameters:

frame – (pd.DataFrame) Dataset to store.

transform() DataFrame

Returns the original dataframe with features requested from the ‘methods’ parameter in the constructor.

Returns:

(pd.DataFrame) Original DataFrame with the expanded values appended to it.

Example

# Import package necessary for splitting the dataset.
from sklearn.model_selection import train_test_split
# Import package to generate a synthetic dataset.
from sklearn.datasets import make_regression
# Import package to quantify final prediction score.
from sklearn.metrics import r2_score

# Import the feature expander implementation from arbitragelab.
from arbitragelab.ml_approach.feature_expander import FeatureExpander

# Import the mlp implementation from arbitragelab.
from arbitragelab.ml_approach.neural_networks import MultiLayerPerceptron

# Generate 500 samples with 100 features, to be used as our dataset.
X, y = make_regression(500)

expanded_X = FeatureExpander(methods=['product', 'power'], n_orders=2).fit(X).transform()

# Get the number of samples and the expanded frame size (number of features).
n_frames, frame_size = expanded_X.shape

# Initialize a basic regression neural network.
regressor = MultiLayerPerceptron(frame_size, num_outputs=1, loss_fn="mean_squared_error",
                                 optmizer="adam", metrics=[],
                                 hidden_layer_activation_function="relu",
                                 output_layer_act_func="linear")

# This will compile the keras model structure implemented.
regressor.build()

# Will supply information about the structure of the model.
regressor.summary()

# Prepare dataset for training and testing.
X_train, X_test, y_train, y_test = train_test_split(expanded_X, y, test_size=0.3, shuffle=False)

# Fit compiled model with training data.
regressor.fit(X_train, y_train,
              batch_size=20, epochs=100,
              verbose=1)

# Plot loss vs epochs.
regressor.plot_loss()

# Finally, use the fitted model to predict the test set and compute the R^2 score.
predictions = regressor.predict(X_test)
score = r2_score(y_test, predictions)

Multiple Layer NNs (Ghazali et al. 2009)

Multilayered HONNs incorporate hidden layers in addition to the output layer. A popular example of such structures is the sigma-pi network, which consists of layers of sigma-pi units (Rumelhart, Hinton & Williams, 1986). A sigma-pi unit consists of a summing unit connected to a number of product units, whose order is determined by the number of input connections. Another architecture that belongs to this category is the pi-sigma network (Shin & Ghosh, 1992). This consists of a layer of summing units connected to a single product unit. The output of the product unit is usually passed through a nonlinear transfer function.
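As a sketch of the pi-sigma computation described above (a standard formulation with illustrative notation), a network with K summing units produces:

h_j = \sum_{i} w_{ji} x_i + b_j, \qquad j = 1, \dots, K

y = \sigma\Big( \prod_{j=1}^{K} h_j \Big)

where only the weights w_{ji} of the summing layer are trainable and \sigma is the nonlinear transfer function applied to the output of the product unit.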

The main difference between the pi-sigma and the sigma-pi networks is that the former utilise a smaller number of weights; however, they are not universal approximators. To address this disadvantage, (Shin & Ghosh 1991) proposed an extension to the pi-sigma network, the so-called ridge polynomial neural network (RPN), which consists of a number of increasing-order pi-sigma units. Most of the above networks have one layer of trainable weights, and hence simple weight-updating procedures can be used for their training.

Figure: Visual representation of the Pi-Sigma Neural Network architecture (Ghazali and Al-Jumeily, 2009).

Implementation

class PiSigmaNeuralNetwork(frame_size: int, hidden_size: int = 2, num_outputs: int = 1, loss_fn: str = 'mean_squared_error', optmizer: str = 'sgd', metrics: str = 'accuracy', hidden_layer_activation_function: str = 'linear', output_layer_act_func: str = 'sigmoid')

Pi Sigma Neural Network implementation.

__init__(frame_size: int, hidden_size: int = 2, num_outputs: int = 1, loss_fn: str = 'mean_squared_error', optmizer: str = 'sgd', metrics: str = 'accuracy', hidden_layer_activation_function: str = 'linear', output_layer_act_func: str = 'sigmoid')

Initialization of variables.

Parameters:
  • frame_size – (int) The size of the input dataset.

  • hidden_size – (int) Number of hidden units.

  • num_outputs – (int) Number of output units.

  • loss_fn – (str) String name of loss function to be used during training and testing.

  • optmizer – (str) String (name of optimizer) or optimizer instance.

  • metrics – (str) Metric to be used when evaluating the model during training and testing.

  • hidden_layer_activation_function – (str) String name of the activation function used by the hidden layer.

  • output_layer_act_func – (str) String name of the activation function used by the output layer.

build()

Builds and compiles model architecture.

Returns:

(Model) Resulting model.

Example

# Import package necessary for splitting the dataset.
from sklearn.model_selection import train_test_split
# Import package to generate a synthetic dataset.
from sklearn.datasets import make_regression
# Import package to quantify final prediction score.
from sklearn.metrics import r2_score

# Import the psnn implementation from arbitragelab.
from arbitragelab.ml_approach.neural_networks import PiSigmaNeuralNetwork

# Generate 500 samples with 100 features, to be used as our dataset.
X, y = make_regression(500)

# Get the number of samples and the input frame size (number of features).
n_frames, frame_size = X.shape

# Initialize a basic regression pi sigma neural network.
regressor = PiSigmaNeuralNetwork(frame_size, num_outputs=1, loss_fn="mean_squared_error",
                                 optmizer="adam", metrics=[],
                                 hidden_layer_activation_function="relu",
                                 output_layer_act_func="linear")

# This will compile the keras model structure implemented.
regressor.build()

# Will supply information about the structure of the model.
regressor.summary()

# Prepare dataset for training and testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)

# Fit compiled model with training data.
regressor.fit(X_train, y_train,
              batch_size=20, epochs=100,
              verbose=1)

# Plot loss vs epochs.
regressor.plot_loss()

# Finally, use the fitted model to predict the test set and compute the R^2 score.
predictions = regressor.predict(X_test)
score = r2_score(y_test, predictions)
