A people counter app optimized for low-latency inference on edge devices. The model is optimized with the Intel OpenVINO toolkit. This repo also demonstrates how inference can be performed on a video.
Let us first look at the workflow of how the OpenVINO toolkit converts models for us. First, the Model Optimizer searches the list of known layers for each layer in the input model. The Inference Engine then loads the layers from the input model IR files into the specified device plugin, which searches a list of known layer implementations for the device. If your model architecture contains a layer or layers that are not in the list of known layers for the device, the Inference Engine considers the layer unsupported and reports an error. To see the layers that are supported by each device plugin for the Inference Engine, refer to the Supported Devices documentation.
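As a minimal sketch of this check (assuming the pre-2021 `openvino.inference_engine` Python API; the file names are placeholders), you can ask a device plugin which layers it supports before loading the network:

```python
from openvino.inference_engine import IECore

ie = IECore()
# Read the IR files produced by the Model Optimizer (placeholder paths)
net = ie.read_network(model="model.xml", weights="model.bin")

# Ask the CPU plugin which layers it can execute
supported = ie.query_network(network=net, device_name="CPU")
unsupported = [layer for layer in net.layers if layer not in supported]
if unsupported:
    print("Unsupported layers:", unsupported)  # these need a custom extension

exec_net = ie.load_network(network=net, device_name="CPU")
```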
When implementing a custom layer for your pre-trained model with the OpenVINO toolkit, you will need to add extensions to both the Model Optimizer and the Inference Engine.
The following figure shows the basic processing steps for the Model Optimizer highlighting the two necessary custom layer extensions, the Custom Layer Extractor and the Custom Layer Operation.
The Model Optimizer first extracts information from the input model, including the topology of the model layers along with the parameters, input and output format, etc., for each layer. The model is then optimized using the various known characteristics of the layers, interconnects, and data flow, which partly come from the layer operation providing details such as the shape of the output for each layer. Finally, the optimized model is output to the model IR files needed by the Inference Engine to run the model.
Two custom layer extensions are required:
- Custom Layer Extractor
Responsible for identifying the custom layer operation and extracting the parameters for each instance of the custom layer. The layer parameters are stored per instance and used by the layer operation before finally appearing in the output IR. Typically the input layer parameters are unchanged, which is the case covered by this tutorial.
- Custom Layer Operation
Responsible for specifying the attributes that are supported by the custom layer and computing the output shape for each instance of the custom layer from its parameters. The `--mo-op` command-line argument shown in the examples below generates a custom layer operation for the Model Optimizer.
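For a flavor of what these two extensions look like, here is a heavily simplified sketch modeled on the templates the Model Extension Generator produces (this assumes the legacy `mo` Python package bundled with the toolkit, and `MyCustomLayer` is a hypothetical layer name):

```python
from mo.front.extractor import FrontExtractorOp
from mo.ops.op import Op

class MyCustomLayerOp(Op):
    """Custom Layer Operation: declares attributes and infers output shape."""
    op = 'MyCustomLayer'

    def __init__(self, graph, attrs):
        super().__init__(graph, dict(type=self.op, op=self.op,
                                     infer=MyCustomLayerOp.infer), attrs)

    @staticmethod
    def infer(node):
        # Assumption: this layer is element-wise, so output shape == input shape
        node.out_node().shape = node.in_node().shape

class MyCustomLayerFrontExtractor(FrontExtractorOp):
    """Custom Layer Extractor: pulls per-instance parameters from the model."""
    op = 'MyCustomLayer'
    enabled = True

    @staticmethod
    def extract(node):
        attrs = {}  # per-instance parameters would be read from node.pb here
        MyCustomLayerOp.update_node_stat(node, attrs)
        return MyCustomLayerFrontExtractor.enabled
```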
The following figure shows the basic flow for the Inference Engine highlighting two custom layer extensions for the CPU and GPU Plugins, the Custom Layer CPU extension and the Custom Layer GPU Extension.
Each device plugin includes a library of optimized implementations to execute known layer operations which must be extended to execute a custom layer. The custom layer extension is implemented according to the target device:
- Custom Layer CPU Extension
A compiled shared library (`.so` or `.dll` binary) needed by the CPU Plugin for executing the custom layer on the CPU.
- Custom Layer GPU Extension
OpenCL source code (`.cl`) for the custom layer kernel that will be compiled to execute on the GPU, along with a layer description file (`.xml`) needed by the GPU Plugin for the custom layer kernel.
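At runtime, both kinds of extension are handed to the Inference Engine before the network is loaded. A short sketch (again the pre-2021 Python API; file names are placeholders):

```python
from openvino.inference_engine import IECore

ie = IECore()

# CPU: register the compiled shared library containing the custom kernel
ie.add_extension(extension_path="libmy_custom_layer.so", device_name="CPU")

# GPU: point the plugin at the OpenCL kernel's layer description file
ie.set_config({"CONFIG_FILE": "my_custom_layer.xml"}, device_name="GPU")
```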
The Model Extension Generator tool generates template source code files for each of the extensions needed by the Model Optimizer and the Inference Engine.
The script for this is available at `/opt/intel/openvino/deployment_tools/tools/extension_generator/extgen.py`.
You can use this script as follows:
```
usage: You can use any combination of the following arguments:

Arguments to configure extension generation in the interactive mode:

optional arguments:
  -h, --help            show this help message and exit
  --mo-caffe-ext        generate a Model Optimizer Caffe* extractor
  --mo-mxnet-ext        generate a Model Optimizer MXNet* extractor
  --mo-tf-ext           generate a Model Optimizer TensorFlow* extractor
  --mo-op               generate a Model Optimizer operation
  --ie-cpu-ext          generate an Inference Engine CPU extension
  --ie-gpu-ext          generate an Inference Engine GPU extension
  --output_dir OUTPUT_DIR
                        set an output directory. If not specified, the current
                        directory is used by default.
```
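For example, to generate the TensorFlow extractor, the operation, and both device extensions in one go, an invocation along the lines of `python3 extgen.py --mo-tf-ext --mo-op --ie-cpu-ext --ie-gpu-ext --output_dir=/home/workspace/extensions` should work (the output directory here is a placeholder, and some releases of the tool expect a `new` sub-command before the flags).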
In industry problems it becomes very important to be able to convert custom layers: your team might be developing something new or researching an idea, and for your application to work smoothly you need to know how to add support for custom layers.
Another common use case is `lambda` layers. These layers let you add an arbitrary piece of code to your model implementation, and custom layer support is your way of handling them (see the sketch below).
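As a purely illustrative sketch, here is a minimal Keras model containing a `Lambda` layer; the custom scaling inside it has no standard layer equivalent, so converting such a model is where custom layer support comes in:

```python
import tensorflow as tf

# A Lambda layer wraps an arbitrary piece of code (here, a custom scaling)
model = tf.keras.Sequential([
    tf.keras.layers.Lambda(lambda x: x * 0.5 + 1.0, input_shape=(128,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()
```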
I ended up using a model from the Intel OpenVINO Model Zoo due to the poor performance of the converted models. I focused primarily on model accuracy and inference time, and included model size as a secondary metric. The models I experimented with are listed below; for more information, take a look at Model Research.
Model size:

| | SSD MobileNet V2 | SSD Inception V2 | SSD Coco MobileNet V1 |
|---|---|---|---|
| Before Conversion | 67 MB | 98 MB | 28 MB |
| After Conversion | 65 MB | 96 MB | 26 MB |
Inference time:

| | SSD MobileNet V2 | SSD Inception V2 | SSD Coco MobileNet V1 |
|---|---|---|---|
| Before Conversion | 50 ms | 150 ms | 55 ms |
| After Conversion | 60 ms | 155 ms | 60 ms |
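For context, a minimal sketch of how per-frame inference time can be measured (assuming the 2020-era OpenVINO Python API; file names are placeholders):

```python
import time

import cv2
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")
exec_net = ie.load_network(network=net, device_name="CPU")
input_blob = next(iter(net.input_info))

# Resize one frame to the network's expected NCHW input shape
n, c, h, w = net.input_info[input_blob].input_data.shape
frame = cv2.imread("frame.jpg")
blob = cv2.resize(frame, (w, h)).transpose((2, 0, 1)).reshape(n, c, h, w)

start = time.perf_counter()
exec_net.infer({input_blob: blob})
print(f"Inference took {(time.perf_counter() - start) * 1000:.1f} ms")
```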
I believe a similar application could be widely useful; here are a few use cases I feel would have a great impact:
- Business decisions in retail stores
A reliable people counting application in stores could help businesses make effective decisions and help to:
- Compare performance across different areas
- Analyze patterns in people traffic
- Optimize product placement strategies
- Optimize staff placement
- Queue management techniques
We all might have experienced leaving a store due to long queues; a similar application could help with that.
- Increase customer retention
- Calculate percentage of people who leave stores due to long queues
- Optimize staffing
- Help with the right placement of billing desks
- Space management
This application could also help in making the best use of space in public areas and aid in the placement of furniture and/or artifacts.
- Know the areas of a space where people crowd the most
- Discover when stores generate the most traffic, which might open up sales opportunities
- In Airports
Airports are among the busiest places, and with so much data, an application like this could yield some wonderful insights.
- Identify which shops perform well
- Make sure each shop is placed at the optimal position
- Manage queues
- As a security system
This application could also be used in security systems: it could identify if a person has broken into a shop and, in that case, capture a photo or send alerts. This could be an automated and effective solution to theft.
Lighting, model accuracy, and camera focal length/image size have different effects on a deployed edge model. The potential effects of each of these are as follows:
- In case of poor lighting, the model's accuracy may fail dramatically or even drop close to zero. However, this can be mitigated with good hardware that can process the images from poorly lit regions before passing them to the model (a software-side preprocessing sketch follows this list).
- A natural decrease in model accuracy during conversion or other stages may make the model unusable if it doesn't perform the required task, such as detecting the risk of crimes as mentioned in the use cases above. An effective solution would be to put the model into a validation mode for an hour or so, during which the various devices could perform federated learning; this might drastically improve performance.
- Distorted input from the camera due to a change in focal length and/or image size will affect the model, because it may fail to make sense of the input and may not detect objects in the distorted input properly. One approach would be to use some augmented images while training models and to specify threshold skews; this could be a potential solution. However, the ranges would need to be selected correctly or this could lead to a loss of accuracy.
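One software-side way to implement the lighting mitigation mentioned above is local contrast enhancement before inference, e.g. with OpenCV's CLAHE (the parameter values below are illustrative):

```python
import cv2

def enhance_low_light(frame_bgr):
    """Boost local contrast in poorly lit frames before inference."""
    # Operate on the luminance channel only so colors are preserved
    lab = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)
```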
In investigating potential people counter models, I tried each of the following three models:
- Model 1: SSD MobileNet
- Model Source
- I converted the model to an Intermediate Representation with the following arguments:

```sh
python mo_tf.py --input_model frozen_inference_graph.pb --tensorflow_object_detection_api_pipeline_config pipeline.config --reverse_input_channels --tensorflow_use_custom_operations_config extensions/front/tf/ssd_v2_support.json
```
- The model was insufficient for the app because it wasn't very accurate during inference. Here's an image showing a misclassification by the model:
- I tried to improve the model for the app using some transfer learning techniques: I retrained a few of the model layers with some additional data, but that did not work well for this use case.
- Model 2: SSD Inception V2
- Model Source
- I converted the model to an Intermediate Representation with the following arguments:

```sh
python mo_tf.py --input_model frozen_inference_graph.pb --tensorflow_object_detection_api_pipeline_config pipeline.config --reverse_input_channels --tensorflow_use_custom_operations_config extensions/front/tf/ssd_v2_support.json
```
- The model was insufficient for the app because it had pretty high latency in making predictions (~155 ms), whereas the model I now use takes just ~40 ms. It made accurate predictions, but due to the very large tradeoff in inference time, the model could not be used.
- I tried to improve the model for the app by reducing the precision of the weights; however, this had a very large impact on the accuracy.
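For reference, one standard way to reduce weight precision at conversion time is the Model Optimizer's `--data_type` flag, e.g. appending `--data_type FP16` to the `mo_tf.py` command above (whether that matches the exact procedure used here is an assumption).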
- Model 3: SSD Coco MobileNet V1
- Model Source
- I converted the model to an Intermediate Representation with the following arguments:

```sh
python mo_tf.py --input_model frozen_inference_graph.pb --tensorflow_object_detection_api_pipeline_config pipeline.config --reverse_input_channels --tensorflow_use_custom_operations_config extensions/front/tf/ssd_v2_support.json
```
- The model was insufficient for the app because it had a very low inference accuracy. In particular, I observed that it was unable to identify people with their backs facing the camera, making this model unusable.
Having explained above the issues I faced with other models, I ended up using models from the OpenVINO Model Zoo; I found two models in particular that seemed best for the job.
These models are in fact based on MobileNet; the MobileNet model performed well for me in terms of latency and size, apart from a few inference errors, and these models have fixed those errors.
I found that `person-detection-retail-0013` had a higher overall accuracy and ended up using it.
Download all the prerequisite libraries and source the OpenVINO installation using the following commands:

```sh
pip install requests pyyaml -t /usr/local/lib/python3.5/dist-packages && clear &&
source /opt/intel/openvino/bin/setupvars.sh -pyver 3.5
```
Navigate to the directory containing the Model Downloader:

```sh
cd /opt/intel/openvino/deployment_tools/open_model_zoo/tools/downloader
```
Within there, you'll notice a `downloader.py` file; you can use the `-h` argument with it to see the available arguments. The important ones are `--name`, for the model name, and `--precisions`, used when only certain precisions are desired. Use the following command to download the model:
```sh
sudo ./downloader.py --name person-detection-retail-0013 --precisions FP16 -o /home/workspace
```
Open a new terminal and execute the following commands:

```sh
cd webservice/server
npm install
```
After installation, run:

```sh
cd node-server
node ./server.js
```
If successful, you should receive the message:

```
Mosca Server Started.
```
Open another terminal. These commands will compile the UI for you:

```sh
cd webservice/ui
npm install
```
After installation, run:

```sh
npm run dev
```
If successful, you should receive the message:

```
webpack: Compiled successfully
```
Open another terminal and run the following command to set up `ffmpeg`:

```sh
sudo ffserver -f ./ffmpeg/server.conf
```
Finally, execute the following command in another terminal. It runs the app on the test video provided in the `resources/` folder and streams the output on port 3004:

```sh
python main.py -i resources/Pedestrian_Detect_2_1_1.mp4 -m person-detection-retail-0013/FP32/person-detection-retail-0013.xml -l /opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so -d CPU -pt 0.6 | ffmpeg -v warning -f rawvideo -pixel_format bgr24 -video_size 768x432 -framerate 24 -i - http://0.0.0.0:3004/fac.ffm
```
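For context, the pipe into `ffmpeg` works because the app writes raw BGR frames to stdout, matching the `-pixel_format bgr24 -video_size 768x432` arguments above. A simplified, illustrative sketch of such a frame loop (not the exact contents of `main.py`):

```python
import sys

import cv2

cap = cv2.VideoCapture("resources/Pedestrian_Detect_2_1_1.mp4")
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # ... run inference here and draw boxes/stats on the frame ...
    frame = cv2.resize(frame, (768, 432))  # match ffmpeg's -video_size
    sys.stdout.buffer.write(frame.tobytes())  # raw bgr24 bytes for ffmpeg
    sys.stdout.buffer.flush()
cap.release()
```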