This class implements a long short-term memory (LSTM) layer that can be applied to a set of vector sequences.
The output is a sequence containing the same number of vectors, each of `GetHiddenSize()` size.
Hidden layer size
```c++
void SetHiddenSize(int size);
```
Sets the hidden layer size. It affects the output size and the size of the state vector inside the LSTM.
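For illustration, here is a minimal sketch of creating the layer and setting its hidden size. The class name `CLstmLayer`, the math engine setup, and `CDnn` are assumptions about the surrounding API; only `SetHiddenSize()` is taken from this page, and the value `64` is arbitrary.

```c++
#include <NeoML/NeoML.h>

using namespace NeoML;

void BuildLstmExample()
{
    // Assumed setup: a CPU math engine and a network to add the layer to.
    IMathEngine& mathEngine = GetDefaultCpuMathEngine();
    CRandom random( 0x1234 );
    CDnn dnn( random, mathEngine );

    // Create the LSTM layer and set its hidden size.
    CPtr<CLstmLayer> lstm = new CLstmLayer( mathEngine );
    lstm->SetName( "lstm" );
    lstm->SetHiddenSize( 64 ); // each output vector and the internal state will have 64 elements

    // ... the sketches below continue inside this function ...
}
```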
```c++
void SetDropoutRate(float newDropoutRate);
```
Sets the dropout probability. If this value is set, dropout will be applied to the input combined with the output of the previous step before the result is passed to the fully connected layer.
```c++
void SetRecurrentActivation( TActivationFunction newActivation );
```
Sets the activation function that is used in the `forget`, `reset`, and `input` gates. By default, `AF_Sigmoid` is used.
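Continuing the sketch above, a hedged example of these two settings. The values are arbitrary, and `AF_HardSigmoid` is assumed to be among the available `TActivationFunction` values; only `AF_Sigmoid` is confirmed by this page.

```c++
    // Optional settings (example values only).
    lstm->SetDropoutRate( 0.2f );                   // drop 20% of the combined input/recurrent vector
    lstm->SetRecurrentActivation( AF_HardSigmoid ); // replace the default AF_Sigmoid in the gates
```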
```c++
CPtr<CDnnBlob> GetWeightsData() const;
```
The weight matrix containing the weights for each gate. The matrix is represented by a blob of the following dimensions:
- `BatchLength * BatchWidth * ListSize` is equal to `4 * GetHiddenSize()`.
- `Height * Width * Depth * Channels` is equal to the sum of the same dimension of the input and `GetHiddenSize()`.
The `BatchLength * BatchWidth * ListSize` axis corresponds to the gate weights, in the following order:
```c++
G_Main = 0, // The main output data
G_Forget,   // Forget gate
G_Input,    // Input gate
G_Reset,    // Reset gate
```
The `Height * Width * Depth * Channels` axis corresponds to the weights:

- coordinates from `0` to the input size correspond to the weights that serve as coefficients for the vectors of the input sequence;
- the remaining `HiddenSize` coordinates correspond to the weights that serve as coefficients for the output of the previous step (see the sketch below).
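To make the layout concrete, here is a sketch that locates the forget-gate weights of one hidden unit inside this blob. The gate order and the input/recurrent split come from the description above; the `CDnnBlob` accessors (`GetObjectSize`, `GetDataSize`, `CopyTo`) and the `CArray` container are assumptions about the surrounding API.

```c++
    // Continuing the sketch: inspect the weight matrix returned by GetWeightsData().
    // (Assumes the network has already been built and run, so the weights are allocated.)
    CPtr<CDnnBlob> weights = lstm->GetWeightsData();
    const int hiddenSize = lstm->GetHiddenSize();
    const int rowSize = weights->GetObjectSize(); // Height * Width * Depth * Channels == inputSize + hiddenSize
    const int inputSize = rowSize - hiddenSize;

    CArray<float> buffer;
    buffer.SetSize( weights->GetDataSize() );     // 4 * hiddenSize rows of rowSize elements each
    weights->CopyTo( buffer.GetPtr() );

    const int forgetGate = 1; // G_Forget: the second block of hiddenSize rows, as listed above
    const int unit = 0;       // the first hidden unit of that gate
    const float* row = buffer.GetPtr() + ( forgetGate * hiddenSize + unit ) * rowSize;
    // row[0 .. inputSize - 1]       - coefficients applied to the input vector
    // row[inputSize .. rowSize - 1] - coefficients applied to the previous step's output
```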
```c++
CPtr<CDnnBlob> GetFreeTermData() const;
```
The free terms are represented by a blob of the total size `4 * GetHiddenSize()`. The order in which they correspond to the gates is the same as above.
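A similar hedged sketch for the free terms (again, `CopyTo` and `CArray` are assumed accessors):

```c++
    // Continuing the sketch: read the free term of the first forget-gate unit.
    CPtr<CDnnBlob> freeTerms = lstm->GetFreeTermData();
    CArray<float> terms;
    terms.SetSize( freeTerms->GetDataSize() );    // 4 * GetHiddenSize() values, in the same gate order
    freeTerms->CopyTo( terms.GetPtr() );
    const float forgetBias = terms[forgetGate * hiddenSize + unit];
```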
The layer may have 1 to 3 inputs:
- The set of vector sequences.
- [Optional] The initial state of the LSTM layer before the first step. If this input is not specified, the initial state is all zeros.
- [Optional] The initial value of the "previous output" to be used on the first step. If this input is not specified, all zeros are used.
The first input should have the following dimensions (see the sketch after this list):

- `BatchLength` - the length of one vector sequence.
- `BatchWidth` - the number of vector sequences in the input set.
- `ListSize` should be `1`.
- `Height * Width * Depth * Channels` - the size of each vector in the sequence.
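A sketch of building such an input blob and connecting it through a source layer. `CDnnBlob::CreateDataBlob`, `CSourceLayer`, and `CT_Float` are assumptions about the surrounding API; the dimension values are arbitrary examples.

```c++
    // Continuing the sketch: the first input - a batch of vector sequences.
    const int sequenceLength = 50; // BatchLength
    const int batchSize = 16;      // BatchWidth
    const int vectorSize = 8;      // Height * Width * Depth * Channels (ListSize is 1)

    CPtr<CDnnBlob> inputBlob = CDnnBlob::CreateDataBlob( mathEngine, CT_Float, sequenceLength, batchSize, vectorSize );
    // ... fill inputBlob with the sequence data ...

    CPtr<CSourceLayer> dataSource = new CSourceLayer( mathEngine );
    dataSource->SetName( "data" );
    dnn.AddLayer( *dataSource );
    dataSource->SetBlob( inputBlob );

    dnn.AddLayer( *lstm );
    lstm->Connect( 0, *dataSource ); // first input: the set of vector sequences
```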
The second and third inputs should have the following dimensions (see the sketch after this list):

- `BatchLength` and `ListSize` should be `1`.
- `BatchWidth` should be equal to the `BatchWidth` of the first input.
- `Height * Width * Depth * Channels` must be equal to `GetHiddenSize()`.
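When the optional initial state and initial "previous output" are needed, they can be fed through additional source layers with the blob sizes listed above. As before, `CreateDataBlob`, `CSourceLayer`, and `Clear` are assumed API names.

```c++
    // Continuing the sketch: optional second and third inputs.
    CPtr<CDnnBlob> initialState = CDnnBlob::CreateDataBlob( mathEngine, CT_Float, 1, batchSize, lstm->GetHiddenSize() );
    CPtr<CDnnBlob> initialOutput = CDnnBlob::CreateDataBlob( mathEngine, CT_Float, 1, batchSize, lstm->GetHiddenSize() );
    initialState->Clear();  // all zeros - equivalent to omitting this input
    initialOutput->Clear();

    CPtr<CSourceLayer> stateSource = new CSourceLayer( mathEngine );
    stateSource->SetName( "initialState" );
    dnn.AddLayer( *stateSource );
    stateSource->SetBlob( initialState );

    CPtr<CSourceLayer> outputSource = new CSourceLayer( mathEngine );
    outputSource->SetName( "initialOutput" );
    dnn.AddLayer( *outputSource );
    outputSource->SetBlob( initialOutput );

    lstm->Connect( 1, *stateSource );  // second input: the initial state
    lstm->Connect( 2, *outputSource ); // third input: the initial "previous output"
```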
The layer has two outputs:
- The result of the current step.
- The layer history.
Both outputs are of the following size:
- `BatchLength` and `BatchWidth` are equal to the same sizes of the first input.
- `ListSize`, `Height`, `Width`, and `Depth` equal `1`.
- `Channels` equals `GetHiddenSize()` (see the sketch below).
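Finally, a sketch of reading the first output and checking its dimensions. `CSinkLayer`, `GetBlob`, `RunOnce`, `DimSize`, and the `BD_*` dimension constants are assumptions about the surrounding API.

```c++
    // Continuing the sketch: attach a sink to the first output and run the network.
    CPtr<CSinkLayer> sink = new CSinkLayer( mathEngine );
    sink->SetName( "out" );
    dnn.AddLayer( *sink );
    sink->Connect( 0, *lstm, 0 ); // first output: the result of the current step
    // sink->Connect( 0, *lstm, 1 ) would read the second output: the layer history.

    dnn.RunOnce();

    CPtr<CDnnBlob> result = sink->GetBlob();
    // Expected, per the list above:
    //   BatchLength == sequenceLength, BatchWidth == batchSize,
    //   ListSize == Height == Width == Depth == 1, Channels == lstm->GetHiddenSize().
    const int outBatchLength = result->DimSize( BD_BatchLength ); // == sequenceLength
    const int outChannels = result->DimSize( BD_Channels );       // == lstm->GetHiddenSize()
```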