This project is implemented in Python using a Jupyter Notebook (.ipynb file).
Face recognition is the process of detecting faces and then classifying them into different classes. The methods used here are the Viola-Jones algorithm, which detects the face using a cascade classifier, and the Eigenfaces algorithm, which classifies the faces and is based on PCA. PCA stands for principal component analysis, which is a feature reduction (dimensionality reduction) algorithm.
The Viola-Jones algorithm works for frontal face detection rather than for faces looking sideways. It detects the face location on the grayscale image first and then marks it on the colored one. A sliding box is used to scan the whole image, and Haar features are obtained for each region.
These are the features named edge, line, and four-sided features.
They help the machine to understand the structure of the image. For example, the edge feature has one dark side and one light side, so this filter is good at detecting edges such as the nose or the line of the lips. To obtain the feature value, the feature is placed over the region in the sliding rectangle, and the sum of the pixels in the light region is subtracted from the sum of the pixels in the dark region.
An integral image is used to obtain the Haar-like feature values, because summing the pixels directly is computationally very expensive given the large number of image pixels.
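The integral image idea can be sketched in a few lines of NumPy. This is only an illustration, not the code from the notebook: the function names and the choice of "dark left half, light right half" for the edge feature are assumptions made for the example.

```python
import numpy as np

def integral_image(gray):
    """Summed-area table with an extra zero row and column at the top-left."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle using only four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def edge_feature(ii, top, left, height, width):
    """Two-rectangle edge feature (assumed layout: dark left half, light right half).

    Following the description above, the light sum is subtracted from the dark sum.
    """
    half = width // 2
    dark = rect_sum(ii, top, left, height, half)
    light = rect_sum(ii, top, left + half, height, half)
    return dark - light

# Tiny 4x4 patch: left half has value 10, right half has value 200.
patch = np.array([[10, 10, 200, 200]] * 4, dtype=np.uint8)
value = edge_feature(integral_image(patch), 0, 0, 4, 4)
```

Once the table is built, every rectangle sum costs four lookups regardless of the rectangle size, which is what makes evaluating many features per window affordable.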
During training, the algorithm sets a minimum threshold to determine whether something can be classified as a feature or not. The computation can be really expensive because of the many possible combinations of features for each region of the sliding rectangle.
For each region, a linear combination of the weak classifiers (on the right) is obtained to form a strong classifier (on the left). AdaBoost is used for this task. AdaBoost works as follows: first, the images are classified using the feature that looks most important. For the images that were classified wrongly, AdaBoost uses another feature and gives more importance to those images; in other words, it increases their weight in the overall algorithm. The same is repeated with the next feature, each time giving larger weight to the images that are still misclassified. In this way AdaBoost forms the strong classifier from these weak classifiers.
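The reweighting described above can be illustrated with a generic discrete AdaBoost over threshold classifiers ("stumps"). This is a minimal sketch and not the exact Viola-Jones training procedure: the function names, the use of +1/-1 labels, and the brute-force threshold search are assumptions made for the example.

```python
import numpy as np

def train_adaboost(features, labels, rounds=3):
    """features: (n_samples, n_features) feature values; labels: +1 (face) / -1."""
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    n_samples, n_features = features.shape
    weights = np.full(n_samples, 1.0 / n_samples)  # start with equal weights
    stumps = []
    for _ in range(rounds):
        best = None
        # Pick the single feature/threshold with the lowest weighted error.
        for j in range(n_features):
            for thresh in np.unique(features[:, j]):
                for polarity in (1, -1):
                    pred = np.where(polarity * features[:, j] >= polarity * thresh, 1, -1)
                    err = weights[pred != labels].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thresh, polarity, pred)
        err, j, thresh, polarity, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)       # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)      # weight of this weak classifier
        # Misclassified samples get a larger weight for the next round.
        weights *= np.exp(-alpha * labels * pred)
        weights /= weights.sum()
        stumps.append((alpha, j, thresh, polarity))
    return stumps

def predict(stumps, features):
    """Strong classifier: sign of the weighted sum of the weak classifiers."""
    features = np.asarray(features, dtype=np.float64)
    score = np.zeros(features.shape[0])
    for alpha, j, thresh, polarity in stumps:
        score += alpha * np.where(polarity * features[:, j] >= polarity * thresh, 1, -1)
    return np.where(score >= 0, 1, -1)
```

In a Viola-Jones setting each column of `features` would hold the value of one Haar-like feature for every training window.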
- Cascading: It is a hack to increase the speed and accuracy of boosting. Each feature is looked for in a sub-window of the image; if feature i (Fi) is not present in the sub-window, that sub-window is rejected and processing stops for it, which avoids checking the remaining features. Without cascading, every feature would have to be checked for every sub-window, which can be slow.
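The early rejection can be sketched as a loop that stops at the first failed stage. The names `cascade_classify` and `stages` are hypothetical, and each stage is simplified here to a scoring function plus a threshold.

```python
def cascade_classify(window, stages):
    """Return True only if the sub-window passes every stage of the cascade.

    stages: list of (score_function, threshold) pairs, cheapest stage first.
    """
    for score_function, threshold in stages:
        if score_function(window) < threshold:
            return False  # rejected early: later stages are never evaluated
    return True

# Toy stages working on a list of numbers instead of real image windows.
toy_stages = [
    (lambda w: sum(w), 10),  # stage 1: cheap check
    (lambda w: max(w), 5),   # stage 2: only runs if stage 1 passed
]
is_face = cascade_classify([4, 6, 8], toy_stages)
```

Because most sub-windows of an image contain no face, most of them are discarded by the first few stages and never reach the more expensive ones.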
How the Eigenfaces algorithm classifies the faces (all images are obtained from my facial recognition code*):
- 1.) Suppose we have M images of dimensions NxN.
- 2.) The images are flattened and stored in an array of images.
- 3.) The average of all the images is calculated, with shape (N*N, 1). This average image is the so-called average face.
- 4.) The average image is subtracted from all the images to obtain a difference matrix.
- 5.) The shape of the difference matrix is (N*N, M).
- 6.) The covariance matrix is obtained from the difference matrix by multiplying its transpose with itself.
- 7.) The eigenvectors and eigenvalues of this matrix are computed. The eigenvectors give the directions of maximum variability in the space, so the top K eigenvectors are picked, ranked by their eigenvalues.
The eigenvectors (eigenfaces in this case) obtained are:
- 8.) The training images are then expressed as linear combinations of these top K eigenfaces. The weights for each image are stored in a weight array.
Testing: To test an image, the image is flattened and the average face of the training images is subtracted from it. The weights of the test image are then obtained, and the distance between these weights and the weights of each training image is calculated. The test image receives the label of the training image whose weights are closest, and thus the image is classified.
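The training and testing steps above can be sketched with NumPy as follows. This is an illustration under assumptions, not the notebook's code: the function names are hypothetical, and step 6 is read here as the small (M, M) product of the transposed difference matrix with the difference matrix, whose eigenvectors are mapped back to image space to get the eigenfaces.

```python
import numpy as np

def train_eigenfaces(images, k):
    """images: array of shape (M, N, N). k should be at most M - 1."""
    m = images.shape[0]
    diff = images.reshape(m, -1).T.astype(np.float64)  # flattened, shape (N*N, M)
    mean_face = diff.mean(axis=1, keepdims=True)       # average face, (N*N, 1)
    diff = diff - mean_face                            # difference matrix
    small_cov = diff.T @ diff                          # (M, M) matrix
    eigvals, eigvecs = np.linalg.eigh(small_cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]              # indices of the top k
    eigenfaces = diff @ eigvecs[:, order]              # back to image space, (N*N, k)
    eigenfaces /= np.linalg.norm(eigenfaces, axis=0, keepdims=True)
    weights = eigenfaces.T @ diff                      # (k, M), one column per image
    return mean_face, eigenfaces, weights

def classify(image, mean_face, eigenfaces, weights, labels):
    """Label of the training image whose weights are closest to the test image's."""
    diff = image.reshape(-1, 1).astype(np.float64) - mean_face
    test_weights = eigenfaces.T @ diff                 # (k, 1)
    distances = np.linalg.norm(weights - test_weights, axis=0)
    return labels[int(np.argmin(distances))]
```

The distance used here is Euclidean; other distance measures would fit the same structure.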
Results and Observations: (images are also classified with their label in the code*)
The algorithm was tested on different values of K, i.e. the number of eigenfaces used for the linear combination. The graph I got is:
The graph changes with each run of the algorithm, because the images are shuffled randomly before being split into training and testing sets. In all graphs, however, the algorithm performs fairly well even on a small number of images for values of K between 5 and 80. The accuracy peaks at some values of K and stays almost the same at higher values of K.
NOTE Further details are explained within the code.
NOTE Haar-cascade classifier training is done using the OpenCV tools opencv_createsamples and opencv_traincascade. These tools aren't available in the OpenCV that is installed using python, so we first need to install the full version of OpenCV, using CMake, to be able to use them for training.
- 1.) Install CMake-Gui on Windows/Linux
- 2.) Clone OpenCV repo:
$ git clone https://github.com/opencv/opencv.git
$ mkdir build
$ cmake-gui
- 3.) Now CMake-Gui will be opened like:
- 4.) Set the source code directory in CMake-Gui to the cloned OpenCV folder (created in step 2).
- 5.) Set the build binaries directory in CMake-Gui to the build directory (created in step 2).
- 6.) Press Configure in CMake-Gui and select Visual Studio in the dropdown menu (I selected Sublime in the picture):
- 7.) Now, Press Generate to generate the files in the build directory.
- 8.) Go to the directory build now (build created in step 2).
- 9.) Now, search for createsamples in this directory and open the location of createsamples.
NOTE Now opencv_createsamples and opencv_traincascade are available to use somewhere in the build directory (you can search for them).
- 1.) Create Positive and Negative Images for training.
NOTE Positive images contain faces (in face recognition) and negative images contain no face.
- 2.) Label the positive images using MakeSense.Ai.
- 3.) After labelling all the positive images manually, you will get a CSV file like CSV File
- 4.) Now, create a txt file for the negative images containing only the names of the negative images.
NOTE The CSV file for positive images also contains the labels of the images, which OpenCV cascade training doesn't require. So make sure that the CSV file contains row entries like
pos/pos-1.pgm 2 0 0 100 40 1 1 80 60
where pos/pos-1.pgm is the path of the image. In 2 0 0 100 40 1 1 80 60, 2 is the number of faces; in 0 0 100 40, 0 0 is the top-left coordinate of the rectangle drawn around the face using MakeSense.Ai and 100 40 is the width and height of the rectangle. All of this information is extracted automatically from MakeSense.Ai. Only make sure the order is path num_objects x y w h in the CSV file.
- 5.) Now we have positive images and negative images with their CSV and txt (containing the paths of the negative images only) files respectively.
- 6.) Make sure you have converted the .csv file to a .info file. Then run
opencv_createsamples -info yalefaces.info -num 550 -w 48 -h 24 -vec yalefaces.vec
- 7.) Now, run the following to view the samples stored in the .vec file
opencv_createsamples -vec yalefaces.vec -w 48 -h 24
- 8.) Now, run
opencv_traincascade -data data -vec yalefaces.vec -bg negative.txt -numPos 500 -numNeg 500 -numStages 10 -w 48 -h 24 -featureType LBP
- 9.) Now our data folder contains cascade.xml, which is our final cascade and can be used for face detection. The folder also contains the stage-wise xml files written after each stage. In my case I trained a HOG-feature classifier, which can be found in Cascade Classifier
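Steps 4 to 6 above need a negative list file and a .info file in the order path num_objects x y w h. A small script can produce both. This is a sketch under assumptions: the function names and the pos folder prefix are hypothetical, and the CSV export is assumed to have one rectangle per row with the columns label, x, y, width, height, image name, image width, image height and no header. Check your own export before relying on it.

```python
import csv
from collections import defaultdict
from pathlib import Path

def csv_to_info(csv_path, info_path, image_dir="pos"):
    """Write one 'path num_objects x y w h ...' line per positive image."""
    boxes = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 6:
                continue  # skip incomplete rows
            try:
                # Assumed column order: label, x, y, w, h, image_name, ...
                rect = [str(round(float(v))) for v in row[1:5]]
            except ValueError:
                continue  # skip a header or malformed row
            boxes[row[5]].append(rect)
    with open(info_path, "w") as f:
        for image_name, rects in boxes.items():
            coords = " ".join(" ".join(r) for r in rects)
            f.write(f"{image_dir}/{image_name} {len(rects)} {coords}\n")

def write_negative_list(negative_dir, out_path):
    """Write the path of every file in negative_dir, one per line."""
    paths = sorted(p for p in Path(negative_dir).iterdir() if p.is_file())
    Path(out_path).write_text("".join(f"{p.as_posix()}\n" for p in paths))
```

For example, `csv_to_info("labels.csv", "yalefaces.info")` and `write_negative_list("neg", "negative.txt")` would produce the two files used by the commands above, where labels.csv and neg are placeholder names.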
NOTE To use your own classifier for face detection, load the cascade.xml file like this:
import cv2
face_cascade = cv2.CascadeClassifier('cascade.xml')
Feel free to contribute anything useful.
NOTE A great idea: you can implement this on a large dataset and in real time. We can do it together, just contact [email protected] or [email protected].