Before you start working with neural networks, choose the device to be used for calculations. This can be a CPU or a GPU. Create a math engine for the required device and pass a reference to it when creating the network and its layers.
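For example, a CPU-based setup looks like this (a minimal sketch based on the full sample at the end of this section):

```cpp
// Use the default CPU math engine for all calculations
IMathEngine& mathEngine = GetDefaultCpuMathEngine();
CRandom random( 0x123 );
CDnn net( random, mathEngine ); // the network and its layers will run on this engine
```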
All data used in the network operation (inputs, outputs, trainable parameters) is stored in blobs. A blob is a 7-dimensional array, and each of its dimensions has a specific meaning:
- BatchLength is a "time" axis used to denote data sequences; it is mainly used in recurrent networks
- BatchWidth corresponds to the batch: it is used to pass several independent objects together
- ListSize is the dimension for objects that are connected (for example, the pixels of one image) but do not form a sequence
- Height is the height of a matrix or an image
- Width is the width of a matrix or an image
- Depth is the depth of a 3-dimensional image
- Channels corresponds to channels in multi-channel image formats; it is also used to work with one-dimensional vectors
The blobs may contain one of two types of data: float (CT_Float) and integer (CT_Int). Both data types are 32-bit. If the data type is not specified directly anywhere in this documentation, float is used.
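For instance, continuing the sketch above, a blob for a batch of 16 RGB images of size 32x32 can be created like this (the Create2DImageBlob call mirrors the sample at the end of this section; the batch size is arbitrary):

```cpp
// BatchLength = 1, BatchWidth = 16, Height = 32, Width = 32, Channels = 3
CPtr<CDnnBlob> imageBatch = CDnnBlob::Create2DImageBlob( mathEngine, CT_Float, 1, 16, 32, 32, 3 );
imageBatch->Fill( 0.f ); // fill with a constant value
```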
A layer is an element of the network that performs some operation: anything from reshaping the input data or calculating a simple math function to convolution or LSTM (long short-term memory).
If the operation needs input data, it will be taken from the layer input. Each layer input contains one data blob, and if several blobs are needed, the layer will have several inputs. Each layer input should be connected to another layer's output.
If the operation returns results that should be used by other layers, they will be passed to the layer outputs. Each layer output contains one data blob, and depending on the operation it performs, the layer may have several outputs. Several other layers' inputs may be connected to the same output, but you may not leave an output unconnected.
In addition, the layer may have settings specified by the user before starting calculations, and trainable parameters that are optimized during network training.
The layers also have names that can be used to find a layer in the network. The name should be set at layer creation or before adding it to the network.
See below for the full list of available layers with links to the detailed descriptions.
The neural network is implemented by the CDnn class. A neural network is a directed graph with the vertices corresponding to layers and the arcs corresponding to the connections along which the data is passed from one layer's output to another's input.
Each layer should be added to the network after you assign a unique name to it. A layer may not be connected to several networks at once.
Source layers are used to pass the data into the network. A source layer has no inputs and passes the data blob specified by the user to its only output.
Sink layers with no outputs are used to retrieve the result of the network operation. They provide a function that returns the blob with data.
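Putting this together, a minimal network with a source layer, one fully-connected layer, and a sink layer could be assembled as follows (a sketch; the layer classes are described in the list at the end of this section, and the exact Connect/AddLayer signatures should be checked against the headers):

```cpp
// Source layer: the entry point for user data
CPtr<CSourceLayer> source = new CSourceLayer( mathEngine );
source->SetName( "source" );
net.AddLayer( *source );

// A fully-connected layer with 10 output elements
CPtr<CFullyConnectedLayer> fc = new CFullyConnectedLayer( mathEngine );
fc->SetName( "fc" );
fc->SetNumberOfElements( 10 );
fc->Connect( *source ); // connect input 0 of "fc" to output 0 of "source"
net.AddLayer( *fc );

// Sink layer: retrieves the network response
CPtr<CSinkLayer> sink = new CSinkLayer( mathEngine );
sink->SetName( "sink" );
sink->Connect( *fc );
net.AddLayer( *sink );
```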
After all the layers are added and connected the network may be set up for training.
To train the network you will need:
- a layer (or several layers) that calculates the loss function to be optimized (see the sketch after this list)
- additional source layers that contain the correct labels for the input data and the object weights
- the initializer that will be used to assign the initial values to the weights before optimization starts
- the optimizer mechanism that will be used for training
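For example, a cross-entropy loss with its labels source could be attached to the small network above like this (a sketch; the two-argument Connect call with an input number is assumed to exist alongside the single-argument form):

```cpp
// An extra source layer that will hold the correct class labels
CPtr<CSourceLayer> labels = new CSourceLayer( mathEngine );
labels->SetName( "labels" );
net.AddLayer( *labels );

// The loss layer compares the network response ("fc") with the labels
CPtr<CCrossEntropyLossLayer> loss = new CCrossEntropyLossLayer( mathEngine );
loss->SetName( "loss" );
loss->Connect( 0, *fc );     // first input: the network response
loss->Connect( 1, *labels ); // second input: the correct labels
net.AddLayer( *loss );
```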
Before the first training iteration the layers' weights (trainable parameters) are initialized using the CDnnInitializer object. It has three implementations:

- CDnnUniformInitializer generates the weights using a uniform distribution over a segment from GetLowerBound to GetUpperBound.
- CDnnXavierInitializer generates the weights using the normal distribution N(0, 1/n), where n is the input size.
- CDnnXavierUniformInitializer generates the weights using the uniform distribution U(-sqrt(1/n), sqrt(1/n)), where n is the input size.
To select the preferred initializer, create an instance of one of these classes and pass it to the network using the CDnn::SetInitializer method. The default initialization is Xavier.
The initializer is the same for all the network trainable weights, except for the free term vectors that are initialized with zeros.
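For example, to switch to uniform initialization (a sketch; the constructor arguments besides the random generator are an assumption, check the CDnnUniformInitializer header for the exact signature):

```cpp
// Replace the default Xavier initialization with a uniform one over [-0.1, 0.1]
net.SetInitializer( new CDnnUniformInitializer( random, -0.1f, 0.1f ) );
```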
The optimizer sets the rules for updating the weights during training. It is represented by the CDnnSolver class, which has four implementations:

- CDnnSimpleGradientSolver - gradient descent with momentum
- CDnnAdaptiveGradientSolver - gradient descent with adaptive momentum (Adam)
- CDnnNesterovGradientSolver - Adam with Nesterov momentum (Nadam)
- CDnnLambGradientSolver - LAMB
To select the preferred optimizer, create an instance of one of these classes and pass it to the network using the CDnn::SetSolver
method.
The additional settings for the optimizer are:
- learning rate (CDnnSolver::SetLearningRate)
- regularization factors (CDnnSolver::SetL2Regularization and CDnnSolver::SetL1Regularization)
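For example, to train with Adam (a sketch continuing the setup above):

```cpp
// Adam with a custom learning rate and L2 regularization
CPtr<CDnnSolver> solver = new CDnnAdaptiveGradientSolver( mathEngine );
solver->SetLearningRate( 0.001f );
solver->SetL2Regularization( 0.0001f );
net.SetSolver( solver );
```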
After the initializer and the optimizer have been set, you may start the learning process. To do that, set the input data blobs for all source layers and call the CDnn::RunAndLearnOnce
method.
The method call will perform three internal operations:
- Reshape calculates the sizes and allocates memory for the output blobs of every layer, using the source blobs' sizes.
- RunOnce performs all calculations on the source blob data.
- BackwardAndLearnOnce calculates the loss function gradient for all trainable weights and updates the trainable weights through backpropagation.
The learning process consists of many iterations, each calling CDnn::RunAndLearnOnce
for new source data.
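A typical training loop then looks like this (a sketch; the iteration count is a placeholder, the layer names follow the sketches above, and filling the batch blobs is left as a comment):

```cpp
CPtr<CSourceLayer> data = CheckCast<CSourceLayer>( net.GetLayer( "source" ) );
CPtr<CSourceLayer> labels = CheckCast<CSourceLayer>( net.GetLayer( "labels" ) );

const int iterationCount = 1000; // placeholder: the number of training iterations
for( int i = 0; i < iterationCount; ++i ) {
	CPtr<CDnnBlob> dataBlob;  // to be filled with the next batch of training data
	CPtr<CDnnBlob> labelBlob; // to be filled with the matching labels
	// ... load the next batch into dataBlob and labelBlob ...
	data->SetBlob( dataBlob );
	labels->SetBlob( labelBlob );
	net.RunAndLearnOnce(); // forward pass, backpropagation, and weights update
}
```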
Sometimes during learning you will need to get the network response without changing the current parameters, for example, on test data for validation. In this case, use the CDnn::RunOnce
method, which, unlike CDnn::RunAndLearnOnce
, does not calculate the gradients and update the trainable parameters. This method is also used for working with the trained network.
Two classes are defined for serializing the network:
- CArchiveFile represents the file used for serialization
- CArchive represents the archive used to write to and read from a CArchiveFile
The serializing direction is determined by the settings with which the file and the archive instances are created:
- to save the network into a file, create a CArchiveFile with the CArchive::store flag and an archive over it with the CArchive::SD_Storing flag
- to read the network from a file, use the CArchive::load and CArchive::SD_Loading flags instead
Once the archive has been created, call the CDnn::Serialize
method to serialize the network. The direction will be chosen automatically.
See also more details about the classes used for serialization.
For example, saving a trained network to a file looks like this:

```cpp
CRandom random( 0x123 );
CDnn net( random, GetDefaultCpuMathEngine() );
/*
... Build and train the network ...
*/
CArchiveFile file( "my_net.archive", CArchive::store );
CArchive archive( &file, CArchive::SD_Storing );
archive.Serialize( net );
archive.Close();
file.Close();
```
Loading the network and running it on new data (here on a GPU) looks like this:

```cpp
// A math engine working on the GPU that uses no more than 1 GB of GPU RAM
IMathEngine* gpuMathEngine = CreateGpuMathEngine( 1024 * 1024 * 1024, GetFmlExceptionHandler() );
{
	CRandom random( 0x123 );
	CDnn net( random, *gpuMathEngine );

	// Load the network
	{
		CArchiveFile file( "my_net.archive", CArchive::load );
		CArchive archive( &file, CArchive::SD_Loading );
		archive.Serialize( net );
		// file and archive will be closed in destructors
	}

	// The blob to store a single 32x32 RGB image
	CPtr<CDnnBlob> dataBlob = CDnnBlob::Create2DImageBlob( *gpuMathEngine, CT_Float, 1, 1, 32, 32, 3 );
	dataBlob->Fill( 0.5f ); // Filling with a constant value

	// Get the pointers to the source and the sink layers
	CPtr<CSourceLayer> src = CheckCast<CSourceLayer>( net.GetLayer( "source" ) );
	CPtr<CSinkLayer> sink = CheckCast<CSinkLayer>( net.GetLayer( "sink" ) );

	src->SetBlob( dataBlob ); // setting the input data
	net.RunOnce(); // running the network
	CPtr<CDnnBlob> resultBlob = sink->GetBlob(); // getting the response

	// Extract the data and put it in an array
	CArray<float> result;
	result.SetSize( resultBlob->GetDataSize() );
	resultBlob->CopyTo( result.GetPtr() );

	// Analyze the network response

	// Destroy all blobs and the network object
}
// Delete the engine after all blobs are deleted
delete gpuMathEngine;
```
- CBaseLayer is the base class for common layer functionality
- The layers used to pass the data to and from the network:
- CSourceLayer transmits a blob of user data into the network
- CSinkLayer is used to retrieve a blob of data with the network response
- CProblemSourceLayer transmits the data from IProblem into the network
- CFullyConnectedSourceLayer transmits the data from IProblem into the network, multiplying the vectors by a trainable weights matrix
- CDataLayer transmits a blob of fixed data into the network
- CFullyConnectedLayer is the fully-connected layer
- Activation functions:
- CLinearLayer - a linear activation function ax + b
- CELULayer - ELU activation function
- CReLULayer - ReLU activation function
- CLeakyReLULayer - LeakyReLU activation function
- CAbsLayer - abs(x) activation function
- CSigmoidLayer - sigmoid activation function
- CTanhLayer - tanh activation function
- CHardTanhLayer - HardTanh activation function
- CHardSigmoidLayer - HardSigmoid activation function
- CPowerLayer - pow(x, exp) activation function
- CHSwishLayer - h-swish activation function
- CGELULayer - x * sigmoid(1.702 * x) activation function
- CExpLayer - exp activation function
- Convolution layers:
- CConvLayer - 2-dimensional convolution
- CRleConvLayer - convolution for 2-dimensional images in RLE format
- C3dConvLayer - 3-dimensional convolution
- CTransposedConvLayer - transposed 2-dimensional convolution
- C3dTransposedConvLayer - transposed 3-dimensional convolution
- CChannelwiseConvLayer - channelwise convolution
- CTimeConvLayer - sequence convolution along the "time" axis
- Pooling layers:
- CMaxPoolingLayer - 2-dimensional max pooling
- CMeanPoolingLayer - 2-dimensional mean pooling
- C3dMaxPoolingLayer - 3-dimensional max pooling
- C3dMeanPoolingLayer - 3-dimensional mean pooling
- CGlobalMaxPoolingLayer - max pooling over whole objects
- CMaxOverTimePoolingLayer - max pooling over sequences along the "time" axis
- CProjectionPoolingLayer - mean pooling along one of the blob dimensions
- CSoftmaxLayer calculates softmax function
- CDropoutLayer implements random dropout
- CBatchNormalizationLayer implements batch normalization
- CObjectNormalizationLayer implements normalization over the objects
- CLrnLayer implements local response normalization
- Elementwise operations with data blobs:
- CEltwiseSumLayer - elementwise sum
- CEltwiseSubLayer - elementwise subtraction
- CEltwiseMulLayer - elementwise product
- CEltwiseDivLayer - elementwise division
- CEltwiseMaxLayer - elementwise maximum
- CEltwiseNegMulLayer calculates the elementwise product of (1 - first input) and the other inputs
- Auxiliary operations:
- CTransformLayer changes the blob shape
- CTransposeLayer switches the blob dimensions
- CArgmaxLayer finds the positions of the maximum values along the given dimension
- CImageResizeLayer changes the size of images in the blob
- CSubSequenceLayer extracts subsequences
- CDotProductLayer calculates the dot product of its inputs
- CAddToObjectLayer adds the content of one input to each of the objects of the other
- CMatrixMultiplicationLayer - multiplication of two sets of matrices
- CCastLayer - data type conversion
- CInterpolationLayer changes the blob size along specified dimensions, filling in the new elements by linear interpolation
- Blob concatenation:
- CConcatChannelsLayer concatenates along the Channels dimension
- CConcatDepthLayer concatenates along the Depth dimension
- CConcatWidthLayer concatenates along the Width dimension
- CConcatHeightLayer concatenates along the Height dimension
- CConcatBatchWidthLayer concatenates along the BatchWidth dimension
- CConcatObjectLayer concatenates the objects
- Blob splitting:
- CSplitChannelsLayer splits along the Channels dimension
- CSplitDepthLayer splits along the Depth dimension
- CSplitWidthLayer splits along the Width dimension
- CSplitHeightLayer splits along the Height dimension
- CSplitListSizeLayer splits along the ListSize dimension
- CSplitBatchWidthLayer splits along the BatchWidth dimension
- CSplitBatchLengthLayer splits along the BatchLength dimension
- Working with pixel lists:
- CPixelToImageLayer creates images from the pixel lists
- CImageToPixelLayer extracts pixel lists from the images
- Repeating data:
- CRepeatSequenceLayer repeats sequences several times
- CUpsampling2DLayer scales up two-dimensional images
- CReorgLayer transforms a multi-channel image into several smaller images with more channels
- CSpaceToDepthLayer splits images into squared blocks and flattens each block
- CDepthToSpaceLayer transforms pixels of images into squared blocks
- Loss functions:
- For binary classification:
- CBinaryCrossEntropyLossLayer - cross-entropy
- CHingeLossLayer - hinge loss function
- CSquaredHingeLossLayer - modified squared hinge loss function
- CBinaryFocalLossLayer - focal loss function (modified cross-entropy)
- For multi-class classification:
- CCrossEntropyLossLayer - cross-entropy
- CMultiHingeLossLayer - hinge loss function
- CMultiSquaredHingeLossLayer - modified squared hinge loss function
- CFocalLossLayer - focal loss function (modified cross-entropy)
- For regression:
- CEuclideanLossLayer - Euclidean distance
- CL1LossLayer - L1 distance
- Additionally:
- CCenterLossLayer - the auxiliary center loss function that penalizes large variance inside a class
- Working with discrete features:
- CMultichannelLookupLayer - vector representation of discrete features
- CAccumulativeLookupLayer - the sum of vector representations of a discrete feature
- CPositionalEmbeddingLayer - the vector representations of a position in sequence
- CEnumBinarizationLayer converts enumeration values to one-hot encoding
- CBitSetVectorizationLayer converts a bitset into a vector of ones and zeros
- Recurrent layers:
- CLstmLayer implements long short-term memory (LSTM)
- CGruLayer implements a gated recurrent unit (GRU)
- CQrnnLayer implements a quasi-recurrent layer
- CIrnnLayer implements IRNN
- CIndRnnLayer implements IndRNN
- Conditional random field (CRF):
- CCrfLayer represents a CRF
- CCrfLossLayer calculates the loss function for training CRF
- CBestSequenceLayer finds optimal sequences in the results of CRF processing
- Connectionist temporal classification (CTC):
- CCtcLossLayer calculates the loss function
- CCtcDecodingLayer finds the optimal sequences in CTC response
- Classification quality assessment:
- CAccuracyLayer calculates the proportion of the objects classified correctly
- CPrecisionRecallLayer calculates the proportion of correctly classified objects for each of the two classes in binary classification
- CConfusionMatrixLayer calculates the confusion matrix for multi-class classification