Before you start working with neural networks, choose the device to be used for calculations. This can be a CPU or a GPU. Create a math engine for the required device and pass a reference to it when creating the network and its layers.
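For example, a CPU-based setup looks like this (a minimal sketch based on the full sample at the end of this section):

```cpp
// Use the default CPU math engine for all calculations
IMathEngine& mathEngine = GetDefaultCpuMathEngine();
CRandom random( 0x123 );
CDnn net( random, mathEngine ); // the network and its layers will run on this engine
```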
All data used in the network operation (inputs, outputs, trainable parameters) is stored in blobs. A blob is a 7-dimensional array, and each of its dimensions has a specific meaning:
- BatchLength is a "time" axis used to denote data sequences; it is mainly used in recurrent networks
- BatchWidth corresponds to the batch: it is used to pass several independent objects together
- ListSize is the dimension for objects that are connected (for example, the pixels of one image) but do not form a sequence
- Height is the height of a matrix or an image
- Width is the width of a matrix or an image
- Depth is the depth of a 3-dimensional image
- Channels corresponds to channels in multi-channel image formats; it is also used to work with one-dimensional vectors
The blobs may contain one of two types of data: float (CT_Float) and integer (CT_Int). Both data types are 32-bit. If the data type is not specified directly anywhere in this documentation, float is used.
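For instance, continuing the sketch above, a blob for a batch of 16 RGB images of size 32x32 can be created like this (the Create2DImageBlob call mirrors the sample at the end of this section; the batch size is arbitrary):

```cpp
// BatchLength = 1, BatchWidth = 16, Height = 32, Width = 32, Channels = 3
CPtr<CDnnBlob> imageBatch = CDnnBlob::Create2DImageBlob( mathEngine, CT_Float, 1, 16, 32, 32, 3 );
imageBatch->Fill( 0.f ); // fill with a constant value
```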
A layer is an element of the network that performs some operation: anything from reshaping the input data or calculating a simple math function to convolution or LSTM (long short-term memory).
If the operation needs input data, it will be taken from the layer input. Each layer input contains one data blob, and if several blobs are needed, the layer will have several inputs. Each layer input should be connected to another layer's output.
If the operation returns results that should be used by other layers, they will be passed to the layer outputs. Each layer output contains one data blob, and depending on the operation it performs, the layer may have several outputs. Several other layers' inputs may be connected to the same output, but you may not leave an output unconnected.
In addition, the layer may have settings specified by the user before starting calculations, and trainable parameters that are optimized during network training.
The layers also have names that can be used to find a layer in the network. The name should be set at layer creation or before adding it to the network.
See below for the full list of available layers with links to the detailed descriptions.
The neural network is implemented by the CDnn class. A neural network is a directed graph with the vertices corresponding to layers and the arcs corresponding to the connections along which the data is passed from one layer's output to another's input.
Each layer should be added to the network after you assign a unique name to it. A layer may not be connected to several networks at once.
Source layers are used to pass the data into the network. A source layer has no inputs and passes the data blob specified by the user to its only output.
Sink layers with no outputs are used to retrieve the result of the network operation. They provide a function that returns the blob with data.
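Putting this together, a minimal network with a source layer, one fully-connected layer, and a sink layer could be assembled as follows (a sketch; the layer classes are described in the list at the end of this section, and the exact Connect/AddLayer signatures should be checked against the headers):

```cpp
// Source layer: the entry point for user data
CPtr<CSourceLayer> source = new CSourceLayer( mathEngine );
source->SetName( "source" );
net.AddLayer( *source );

// A fully-connected layer with 10 output elements
CPtr<CFullyConnectedLayer> fc = new CFullyConnectedLayer( mathEngine );
fc->SetName( "fc" );
fc->SetNumberOfElements( 10 );
fc->Connect( *source ); // connect input 0 of "fc" to output 0 of "source"
net.AddLayer( *fc );

// Sink layer: retrieves the network response
CPtr<CSinkLayer> sink = new CSinkLayer( mathEngine );
sink->SetName( "sink" );
sink->Connect( *fc );
net.AddLayer( *sink );
```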
After all the layers are added and connected the network may be set up for training.
To train the network you will need:
- a layer (or several layers) that calculates the loss function to be optimized (see the sketch after this list)
- additional source layers that contain the correct labels for the input data and the object weights
- the initializer that will be used to assign the initial values to the weights before optimization starts
- the optimizer mechanism that will be used for training
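For example, a cross-entropy loss with its labels source could be attached to the small network above like this (a sketch; the two-argument Connect call with an input number is assumed to exist alongside the single-argument form):

```cpp
// An extra source layer that will hold the correct class labels
CPtr<CSourceLayer> labels = new CSourceLayer( mathEngine );
labels->SetName( "labels" );
net.AddLayer( *labels );

// The loss layer compares the network response ("fc") with the labels
CPtr<CCrossEntropyLossLayer> loss = new CCrossEntropyLossLayer( mathEngine );
loss->SetName( "loss" );
loss->Connect( 0, *fc );     // first input: the network response
loss->Connect( 1, *labels ); // second input: the correct labels
net.AddLayer( *loss );
```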
Before the first training iteration the layers' weights (trainable parameters) are initialized using the CDnnInitializer object. It has three implementations:

- CDnnUniformInitializer generates the weights using a uniform distribution over a segment from GetLowerBound to GetUpperBound.
- CDnnXavierInitializer generates the weights using the normal distribution N(0, 1/n), where n is the input size.
- CDnnXavierUniformInitializer generates the weights using the uniform distribution U(-sqrt(1/n), sqrt(1/n)), where n is the input size.
To select the preferred initializer, create an instance of one of these classes and pass it to the network using the CDnn::SetInitializer method. The default initialization is Xavier.
The initializer is the same for all the network trainable weights, except for the free term vectors that are initialized with zeros.
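For example, to switch to uniform initialization (a sketch; the constructor arguments besides the random generator are an assumption, check the CDnnUniformInitializer header for the exact signature):

```cpp
// Replace the default Xavier initialization with a uniform one over [-0.1, 0.1]
net.SetInitializer( new CDnnUniformInitializer( random, -0.1f, 0.1f ) );
```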
The optimizer sets the rules for updating the weights during training. It is represented by the CDnnSolver class, which has four implementations:

- CDnnSimpleGradientSolver - gradient descent with momentum
- CDnnAdaptiveGradientSolver - gradient descent with adaptive momentum (Adam)
- CDnnNesterovGradientSolver - Adam with Nesterov momentum (Nadam)
- CDnnLambGradientSolver - LAMB
To select the preferred optimizer, create an instance of one of these classes and pass it to the network using the CDnn::SetSolver
method.
The additional settings for the optimizer are:
- learning rate (CDnnSolver::SetLearningRate)
- regularization factors (CDnnSolver::SetL2Regularization and CDnnSolver::SetL1Regularization)
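For example, to train with Adam (a sketch continuing the setup above):

```cpp
// Adam with a custom learning rate and L2 regularization
CPtr<CDnnSolver> solver = new CDnnAdaptiveGradientSolver( mathEngine );
solver->SetLearningRate( 0.001f );
solver->SetL2Regularization( 0.0001f );
net.SetSolver( solver );
```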
After the initializer and the optimizer have been set, you may start the learning process. To do that, set the input data blobs for all source layers and call the CDnn::RunAndLearnOnce
method.
The method call will perform three internal operations:
- Reshape calculates the sizes and allocates memory for the output blobs of every layer, using the source blobs' sizes.
- RunOnce performs all calculations on the source blob data.
- BackwardAndLearnOnce calculates the loss function gradient for all trainable weights and updates the trainable weights through backpropagation.
The learning process consists of many iterations, each calling CDnn::RunAndLearnOnce
for new source data.
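A typical training loop then looks like this (a sketch; the iteration count is a placeholder, the layer names follow the sketches above, and filling the batch blobs is left as a comment):

```cpp
CPtr<CSourceLayer> data = CheckCast<CSourceLayer>( net.GetLayer( "source" ) );
CPtr<CSourceLayer> labels = CheckCast<CSourceLayer>( net.GetLayer( "labels" ) );

const int iterationCount = 1000; // placeholder: the number of training iterations
for( int i = 0; i < iterationCount; ++i ) {
	CPtr<CDnnBlob> dataBlob;  // to be filled with the next batch of training data
	CPtr<CDnnBlob> labelBlob; // to be filled with the matching labels
	// ... load the next batch into dataBlob and labelBlob ...
	data->SetBlob( dataBlob );
	labels->SetBlob( labelBlob );
	net.RunAndLearnOnce(); // forward pass, backpropagation, and weights update
}
```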
Sometimes during learning you will need to get the network response without changing the current parameters, for example, on test data for validation. In this case, use the CDnn::RunOnce
method, which, unlike CDnn::RunAndLearnOnce
, does not calculate the gradients and update the trainable parameters. This method is also used for working with the trained network.
Two classes are defined for serializing the network:
- CArchiveFile represents the file used for serialization
- CArchive represents the archive used to write to and read from a CArchiveFile
The serializing direction is determined by the settings with which the file and the archive instances are created:
- to save the network into a file, create a CArchiveFile with the CArchive::store flag and an archive over it with the CArchive::SD_Storing flag
- to read the network from a file, use the CArchive::load and CArchive::SD_Loading flags instead
Once the archive has been created, call the CDnn::Serialize
method to serialize the network. The direction will be chosen automatically.
See also more details about the classes used for serialization.
For example, saving a trained network to a file looks like this:

```cpp
CRandom random( 0x123 );
CDnn net( random, GetDefaultCpuMathEngine() );
/*
... Build and train the network ...
*/
CArchiveFile file( "my_net.archive", CArchive::store );
CArchive archive( &file, CArchive::SD_Storing );
archive.Serialize( net );
archive.Close();
file.Close();
```
Loading the network and running it on new data (here on a GPU) looks like this:

```cpp
// A math engine working on the GPU that uses no more than 1 GB of GPU RAM
IMathEngine* gpuMathEngine = CreateGpuMathEngine( 1024 * 1024 * 1024, GetFmlExceptionHandler() );
{
	CRandom random( 0x123 );
	CDnn net( random, *gpuMathEngine );

	// Load the network
	{
		CArchiveFile file( "my_net.archive", CArchive::load );
		CArchive archive( &file, CArchive::SD_Loading );
		archive.Serialize( net );
		// file and archive will be closed in destructors
	}

	// The blob to store a single 32x32 RGB image
	CPtr<CDnnBlob> dataBlob = CDnnBlob::Create2DImageBlob( *gpuMathEngine, CT_Float, 1, 1, 32, 32, 3 );
	dataBlob->Fill( 0.5f ); // Filling with a constant value

	// Get the pointers to the source and the sink layers
	CPtr<CSourceLayer> src = CheckCast<CSourceLayer>( net.GetLayer( "source" ) );
	CPtr<CSinkLayer> sink = CheckCast<CSinkLayer>( net.GetLayer( "sink" ) );

	src->SetBlob( dataBlob ); // setting the input data
	net.RunOnce(); // running the network
	CPtr<CDnnBlob> resultBlob = sink->GetBlob(); // getting the response

	// Extract the data and put it in an array
	CArray<float> result;
	result.SetSize( resultBlob->GetDataSize() );
	resultBlob->CopyTo( result.GetPtr() );

	// Analyze the network response

	// Destroy all blobs and the network object
}
// Delete the engine after all blobs are deleted
delete gpuMathEngine;
```
- CBaseLayer is the base class for common layer functionality
- The layers used to pass the data to and from the network:
- CSourceLayer transmits a blob of user data into the network
- CSinkLayer is used to retrieve a blob of data with the network response
- CProblemSourceLayer transmits the data from IProblem into the network
- CFullyConnectedSourceLayer transmits the data from IProblem into the network, multiplying the vectors by a trainable weights matrix
- CDataLayer transmits a blob of fixed data into the network
- CFullyConnectedLayer is the fully-connected layer
- Activation functions:
- CLinearLayer - a linear activation function ax + b
- CELULayer - ELU activation function
- CReLULayer - ReLU activation function
- CLeakyReLULayer - LeakyReLU activation function
- CAbsLayer - abs(x) activation function
- CSigmoidLayer - sigmoid activation function
- CTanhLayer - tanh activation function
- CHardTanhLayer - HardTanh activation function
- CHardSigmoidLayer - HardSigmoid activation function
- CPowerLayer - pow(x, exp) activation function
- CHSwishLayer - h-swish activation function
- CGELULayer - x * sigmoid(1.702 * x) activation function
- CExpLayer - exp activation function
- Convolution layers:
- CConvLayer - 2-dimensional convolution
- CRleConvLayer - convolution for 2-dimensional images in RLE format
- C3dConvLayer - 3-dimensional convolution
- CTransposedConvLayer - transposed 2-dimensional convolution
- C3dTransposedConvLayer - transposed 3-dimensional convolution
- CChannelwiseConvLayer - channelwise convolution
- CTimeConvLayer - sequence convolution along the "time" axis
- Pooling layers:
- CMaxPoolingLayer - 2-dimensional max pooling
- CMeanPoolingLayer - 2-dimensional mean pooling
- C3dMaxPoolingLayer - 3-dimensional max pooling
- C3dMeanPoolingLayer - 3-dimensional mean pooling
- CGlobalMaxPoolingLayer - max pooling over whole objects
- CMaxOverTimePoolingLayer - max pooling over sequences along the "time" axis
- CProjectionPoolingLayer - mean pooling along one of the blob dimensions
- CSoftmaxLayer calculates softmax function
- CDropoutLayer implements random dropout
- CBatchNormalizationLayer implements batch normalization
- CObjectNormalizationLayer implements normalization over the objects
- CLrnLayer implements local response normalization
- Elementwise operations with data blobs:
- CEltwiseSumLayer - elementwise sum
- CEltwiseSubLayer - elementwise subtraction
- CEltwiseMulLayer - elementwise product
- CEltwiseDivLayer - elementwise division
- CEltwiseMaxLayer - elementwise maximum
- CEltwiseNegMulLayer calculates the elementwise product of (1 - first input) and the other inputs
- Auxiliary operations:
- CTransformLayer changes the blob shape
- CTransposeLayer switches the blob dimensions
- CArgmaxLayer finds the positions of the maximum values along the given dimension
- CImageResizeLayer changes the size of images in the blob
- CSubSequenceLayer extracts subsequences
- CDotProductLayer calculates the dot product of its inputs
- CAddToObjectLayer adds the content of one input to each of the objects of the other
- CMatrixMultiplicationLayer - multiplication of two sets of matrices
- CCastLayer - data type conversion
- CInterpolationLayer changes the blob size along specified dimensions, filling in the new elements by linear interpolation
- Blob concatenation:
- CConcatChannelsLayer concatenates along the Channels dimension
- CConcatDepthLayer concatenates along the Depth dimension
- CConcatWidthLayer concatenates along the Width dimension
- CConcatHeightLayer concatenates along the Height dimension
- CConcatBatchWidthLayer concatenates along the BatchWidth dimension
- CConcatObjectLayer concatenates the objects
- Blob splitting:
- CSplitChannelsLayer splits along the Channels dimension
- CSplitDepthLayer splits along the Depth dimension
- CSplitWidthLayer splits along the Width dimension
- CSplitHeightLayer splits along the Height dimension
- CSplitListSizeLayer splits along the ListSize dimension
- CSplitBatchWidthLayer splits along the BatchWidth dimension
- CSplitBatchLengthLayer splits along the BatchLength dimension
- Working with pixel lists:
- CPixelToImageLayer creates images from the pixel lists
- CImageToPixelLayer extracts pixel lists from the images
- Repeating data:
- CRepeatSequenceLayer repeats sequences several times
- CUpsampling2DLayer scales up two-dimensional images
- CReorgLayer transforms a multi-channel image into several smaller images with more channels
- CSpaceToDepthLayer splits images into squared blocks and flattens each block
- CDepthToSpaceLayer transforms pixels of images into squared blocks
- Loss functions:
- For binary classification:
- CBinaryCrossEntropyLossLayer - cross-entropy
- CHingeLossLayer - hinge loss function
- CSquaredHingeLossLayer - modified squared hinge loss function
- CBinaryFocalLossLayer - focal loss function (modified cross-entropy)
- For multi-class classification:
- CCrossEntropyLossLayer - cross-entropy
- CMultiHingeLossLayer - hinge loss function
- CMultiSquaredHingeLossLayer - modified squared hinge loss function
- CFocalLossLayer - focal loss function (modified cross-entropy)
- For regression:
- CEuclideanLossLayer - Euclidean distance
- CL1LossLayer - L1 distance
- Additionally:
- CCenterLossLayer - the auxiliary center loss function that penalizes large variance inside a class
- Working with discrete features:
- CMultichannelLookupLayer - vector representation of discrete features
- CAccumulativeLookupLayer - the sum of vector representations of a discrete feature
- CPositionalEmbeddingLayer - the vector representations of a position in sequence
- CEnumBinarizationLayer converts enumeration values to one-hot encoding
- CBitSetVectorizationLayer converts a bitset into a vector of ones and zeros
- Recurrent layers:
- CLstmLayer implements long short-term memory (LSTM)
- CGruLayer implements a gated recurrent unit (GRU)
- CQrnnLayer implements a quasi-recurrent layer
- CIrnnLayer implements IRNN
- CIndRnnLayer implements IndRNN
- Conditional random field (CRF):
- CCrfLayer represents a CRF
- CCrfLossLayer calculates the loss function for training CRF
- CBestSequenceLayer finds optimal sequences in the results of CRF processing
- Connectionist temporal classification (CTC):
- CCtcLossLayer calculates the loss function
- CCtcDecodingLayer finds the optimal sequences in CTC response
- Classification quality assessment:
- CAccuracyLayer calculates the proportion of the objects classified correctly
- CPrecisionRecallLayer calculates the proportion of correctly classified objects for each of the two classes in binary classification
- CConfusionMatrixLayer calculates the confusion matrix for multi-class classification