This project consists of a deep learning model that predicts the normal flow between two consecutive frames, where the normal flow is the projection of the optical flow onto the gradient direction of the image. The model was trained on the TartanAir dataset, and its architecture is an encoder-decoder with residual blocks based on EVPropNet.
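Because TartanAir provides ground-truth optical flow, a ground-truth normal flow can be obtained from it by projecting each flow vector onto the local gradient direction, $\mathbf{u}_n = \frac{\mathbf{u}\cdot\nabla I}{\lVert\nabla I\rVert^2}\nabla I$. The NumPy sketch below shows this projection. The function name, the central-difference gradient (`np.gradient`), the `(H, W, 2)` flow layout and the `eps` threshold are assumptions for illustration, not necessarily what this repository's code does.

```python
import numpy as np


def project_flow_on_gradient(image, flow, eps=1e-6):
    """Project a dense optical flow field onto the image gradient direction.

    image: (H, W) grayscale frame.
    flow:  (H, W, 2) optical flow with (u, v) per pixel (assumed layout).
    Returns the (H, W, 2) normal flow; pixels with |grad I|^2 <= eps get zero.
    """
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    Iy, Ix = np.gradient(image.astype(np.float64))
    grad_sq = Ix**2 + Iy**2
    dot = flow[..., 0] * Ix + flow[..., 1] * Iy  # u . grad(I)
    # Guard the division: flat regions have no defined gradient direction.
    scale = np.where(grad_sq > eps, dot / np.maximum(grad_sq, eps), 0.0)
    return np.stack([scale * Ix, scale * Iy], axis=-1)


# Usage: intensity increases along x, and every pixel moves by (u, v) = (2, 3).
image = np.tile(np.arange(8, dtype=np.float64), (6, 1))
flow = np.broadcast_to(np.array([2.0, 3.0]), (6, 8, 2))
normal = project_flow_on_gradient(image, flow)
print(normal[3, 3])  # [2. 0.]
```

Only the horizontal component survives because the gradient points along x: the vertical motion is invisible to the brightness pattern, which is exactly the information the normal flow discards.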
The normal flow is the projection of the optical flow onto the gradient direction of the image and serves as a representation of image motion. To compute it, the brightness constancy constraint has to be applied. The brightness constancy constraint is one of the fundamental assumptions in optical flow computation and computer vision. It is based on the idea that the intensity (or a function of the intensity) of a point remains constant as it moves between two consecutive frames. The mathematical expression of this constraint is:
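$$I(x, y, t) = I(x + \delta x,\, y + \delta y,\, t + \delta t)$$

where $I(x, y, t)$ is the image intensity at pixel $(x, y)$ and time $t$, and $(\delta x, \delta y)$ is the displacement of that pixel during the time interval $\delta t$.

Approximating the right-hand side of the previous equation with a first-order Taylor expansion, we obtain:

$$I(x + \delta x,\, y + \delta y,\, t + \delta t) \approx I(x, y, t) + I_x\,\delta x + I_y\,\delta y + I_t\,\delta t$$

where $I_x = \partial I/\partial x$, $I_y = \partial I/\partial y$ and $I_t = \partial I/\partial t$ are the partial derivatives of the image intensity.

Subtracting $I(x, y, t)$ from both sides of the brightness constancy equation then gives:

$$I_x\,\delta x + I_y\,\delta y + I_t\,\delta t = 0$$

Finally, keeping in mind that the optical flow is $\mathbf{u} = (u, v) = (\delta x/\delta t,\, \delta y/\delta t)$, dividing by $\delta t$ yields:

$$I_x u + I_y v + I_t = 0 \quad\Longleftrightarrow\quad \nabla I \cdot \mathbf{u} = -I_t$$

This equation represents the constraint line. That is, for any point $(x, y)$ of the image, the optical flow $(u, v)$ must lie somewhere on the line $I_x u + I_y v + I_t = 0$ in velocity space. A single equation cannot determine both flow components (the aperture problem); only the component along the gradient $\nabla I = (I_x, I_y)$, which is perpendicular to the constraint line, can be recovered, and this component is the normal flow.

Therefore, to compute the normal flow vector, it is necessary to calculate the unit vector perpendicular to the constraint line and the magnitude of the normal flow, which corresponds to the distance from the origin to the constraint line (with a sign that gives its direction along the gradient). The mathematical expressions of these components are:

$$\mathbf{n} = \frac{\nabla I}{\lVert \nabla I \rVert} = \frac{(I_x, I_y)}{\sqrt{I_x^2 + I_y^2}}, \qquad u_n = \frac{-I_t}{\lVert \nabla I \rVert}$$

Multiplying both gives the normal flow vector:

$$\mathbf{u}_n = u_n\,\mathbf{n} = \frac{-I_t}{I_x^2 + I_y^2}\,(I_x, I_y)$$

Since $-I_t = \nabla I \cdot \mathbf{u}$, this is exactly the projection of the optical flow onto the gradient direction, $\mathbf{u}_n = \frac{\mathbf{u} \cdot \nabla I}{\lVert \nabla I \rVert^2}\,\nabla I$.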
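As a numerical check of the final expression, the NumPy sketch below estimates the normal flow directly from two grayscale frames, approximating $I_x$ and $I_y$ with central differences (`np.gradient`) and $I_t$ with the frame difference (taking $\delta t$ as one frame). The function name and the `eps` threshold for near-zero gradients are hypothetical choices. Because it relies on the first-order Taylor expansion, this estimate is only accurate for small displacements.

```python
import numpy as np


def normal_flow_from_frames(frame0, frame1, eps=1e-6):
    """Normal flow u_n = -I_t * grad(I) / |grad(I)|^2 between two grayscale frames.

    Returns an array of shape (2, H, W) holding (u_n, v_n) for every pixel.
    """
    I0 = frame0.astype(np.float64)
    I1 = frame1.astype(np.float64)
    # Spatial derivatives (central differences); np.gradient returns I_y (rows) first.
    Iy, Ix = np.gradient(I0)
    # Temporal derivative, taking delta t = 1 frame.
    It = I1 - I0
    grad_sq = Ix**2 + Iy**2
    # Where the gradient vanishes the normal direction is undefined: leave zeros.
    valid = grad_sq > eps
    scale = np.zeros_like(It)
    scale[valid] = -It[valid] / grad_sq[valid]
    return np.stack([scale * Ix, scale * Iy])


# Usage: intensity grows along x and the pattern moves 1 pixel to the right.
ramp = np.tile(np.arange(8, dtype=np.float64), (6, 1))
n = normal_flow_from_frames(ramp, ramp - 1.0)
print(n[:, 3, 3])  # [1. 0.]
```

Here $I_x = 1$, $I_y = 0$ and $I_t = -1$, so $\mathbf{u}_n = (1, 0)$: the recovered motion is one pixel to the right, as expected.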
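The deep learning model chosen to predict the normal flow between two consecutive frames is an autoencoder (encoder-decoder) based on EVPropNet, which is in turn based on ResNet. The encoder contains residual blocks with convolutional layers, and the decoder contains residual blocks with transposed convolutional layers. The network is trained by backpropagating a mean squared error (MSE) loss computed between the ground-truth and predicted normal flow:

$$\mathcal{L}_{\text{MSE}} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2$$

where the sum runs over all $N$ elements (both channels of every pixel) of the predicted normal flow $\hat{y}$ and the ground truth $y$.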
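The model takes as input two concatenated frames and outputs a two-channel tensor containing the two components of the normal flow at each pixel. That is, the input and output tensors have dimensions $2C \times H \times W$ and $2 \times H \times W$, respectively, where $C$ is the number of channels of each frame and $H \times W$ is the image resolution.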
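The PyTorch sketch below illustrates the overall structure described above: residual blocks with strided convolutions in the encoder, residual blocks with transposed convolutions in the decoder, two frames concatenated along the channel axis as input, a two-channel output and an MSE loss. It is a hypothetical, simplified model assuming PyTorch, not the repository's network: the class names, the three-stage depth, the channel widths, batch normalization and the Adam optimizer are all assumptions and do not reproduce EVPropNet's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResBlock(nn.Module):
    """Hypothetical residual block with a 1x1 projection shortcut.

    transpose=False: Conv2d layers (encoder), stride 2 halves H and W.
    transpose=True:  ConvTranspose2d layers (decoder), stride 2 doubles H and W.
    """

    def __init__(self, in_ch, out_ch, stride=1, transpose=False):
        super().__init__()
        Conv = nn.ConvTranspose2d if transpose else nn.Conv2d
        # output_padding = stride - 1 makes a transposed conv upsample by exactly `stride`.
        extra = {"output_padding": stride - 1} if transpose else {}
        self.conv1 = Conv(in_ch, out_ch, 3, stride, padding=1, **extra)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = Conv(out_ch, out_ch, 3, 1, padding=1)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.shortcut = Conv(in_ch, out_ch, 1, stride, **extra)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))


class NormalFlowNet(nn.Module):
    """Hypothetical encoder-decoder: two frames (B, C, H, W) -> normal flow (B, 2, H, W).

    H and W must be divisible by 8 (three stride-2 stages).
    """

    def __init__(self, frame_channels=3, base=32):
        super().__init__()
        self.encoder = nn.Sequential(
            ResBlock(2 * frame_channels, base, stride=2),
            ResBlock(base, 2 * base, stride=2),
            ResBlock(2 * base, 4 * base, stride=2),
        )
        self.decoder = nn.Sequential(
            ResBlock(4 * base, 2 * base, stride=2, transpose=True),
            ResBlock(2 * base, base, stride=2, transpose=True),
            ResBlock(base, base, stride=2, transpose=True),
        )
        # Linear 1x1 head: the normal-flow components are signed, so no final activation.
        self.head = nn.Conv2d(base, 2, kernel_size=1)

    def forward(self, frame0, frame1):
        x = torch.cat([frame0, frame1], dim=1)  # (B, 2C, H, W)
        return self.head(self.decoder(self.encoder(x)))


# Usage: one training step on random tensors (shapes only, no real frames).
model = NormalFlowNet(frame_channels=3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
frame0, frame1 = torch.rand(2, 3, 64, 96), torch.rand(2, 3, 64, 96)
target = torch.zeros(2, 2, 64, 96)  # stand-in for the ground-truth normal flow

pred = model(frame0, frame1)
loss = F.mse_loss(pred, target)  # mean over all B*2*H*W elements
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(pred.shape)  # torch.Size([2, 2, 64, 96])
```

The 1x1 shortcut convolutions let every block change both the channel count and the resolution while keeping the residual addition shape-compatible, which is the usual ResNet-style choice when a block downsamples or upsamples.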
As mentioned before, the model was trained on the TartanAir dataset. This dataset provides many image sequences of different scenarios created in Unreal Engine, together with depth maps, optical flow, the camera position and orientation for each image, and more. You need to visit their website, download the scenarios you want, and organize the images and their optical flow data in the same way as in the dataset/train/ folder of this repository. When the data is correctly organized, run:
python dataset.py
This creates a JSON file, like the one included in the repository, with the paths to all images and optical flow data. Then run:
python train.py
and the model will start training. A folder called autoencoder/, like the one in this repository, will be created, and the training checkpoints and loss values are saved there. At the end of training, an image showing the loss curves is also saved. You can check the folder in this repository to see what it looks like and the loss curves I obtained.
To test the model, organize the dataset in the same way as before but using the dataset/test/ folder instead, and run the test file:
python test.py
You only have to pass the name of the checkpoint you want to use through the --checkpoint argument. In my case, I ran:
python test.py --checkpoint=checkpoint_395_best.pth
because that is the checkpoint with the lowest loss value in my training.
The following video shows the results obtained on the hospital scenario of the dataset. It is split into four panels: the top left shows the original image, the top right the optical flow, the bottom left the ground-truth normal flow, and the bottom right the predicted normal flow.