This repository is the official implementation of *ESOD: Efficient Small Object Detection on High-Resolution Images*.
Python>=3.6.0 is required, with all dependencies in `requirements.txt` installed, including PyTorch>=1.7:

```bash
# (optional) conda install cuda-toolkit=11.8 -c pytorch
pip install -r requirements.txt
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install setuptools==59.5.0
```
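Before moving on, you may want to confirm that the expected PyTorch build and CUDA device are visible. A minimal sanity check (illustrative only, not part of this repo):

```python
# Minimal environment check (illustrative, not part of this repo).
import torch
import torchvision

print(f"torch {torch.__version__}, torchvision {torchvision.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"device: {torch.cuda.get_device_name(0)}")
```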
We currently support the VisDrone, UAVDT, and TinyPerson datasets. Follow the instructions below to prepare them.
Data Preparation
- Darknet Format: The Darknet framework locates labels automatically for each image by replacing the last instance of `/images/` in each image path with `/labels/`. For example:

```
dataset/images/im0.jpg
dataset/labels/im0.txt
```
The images and labels from VisDrone, UAVDT, and TinyPerson are all organized in this format (a small path-mapping sketch in Python follows this list).
- Ground-Truth Heatmap: We recommend leveraging the Segment Anything Model (SAM) to introduce precise shape priors into the GT heatmaps for training. Install SAM first (a usage sketch follows this list):

```bash
cd third_party/segment-anything
pip install -e .
```
- Dataset - VisDrone: Download the data and ensure the subsets under the `/path/to/visdrone` directory are as follows:

```
VisDrone2019-DET-train  VisDrone2019-DET-val  VisDrone2019-DET-test-dev  VisDrone2019-DET-test-challenge
```

Then make a soft link to your directory and run the `scripts/data_prepare.py` script to reorganize the images and labels:

```bash
ln -sf /path/to/visdrone VisDrone
python scripts/data_prepare.py --dataset VisDrone
```
- Dataset - UAVDT: Download the data and perform similar preprocessing:

```bash
ln -sf /path/to/uavdt UAVDT
python scripts/data_prepare.py --dataset UAVDT
```
- Dataset - TinyPerson: Download the data and perform similar preprocessing:

```bash
ln -sf /path/to/tinyperson TinyPerson
python scripts/data_prepare.py --dataset TinyPerson
```
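For illustration, the image-to-label path mapping described above can be expressed in a few lines of Python (a minimal sketch; `img2label_path` is a hypothetical helper, not a function exported by this repo):

```python
def img2label_path(img_path: str) -> str:
    """Replace the last occurrence of /images/ with /labels/ and swap
    the image suffix for .txt, mirroring the Darknet convention."""
    label_path = "/labels/".join(img_path.rsplit("/images/", 1))
    return label_path.rsplit(".", 1)[0] + ".txt"

assert img2label_path("dataset/images/im0.jpg") == "dataset/labels/im0.txt"
```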
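As a rough illustration of how SAM can turn GT boxes into shape priors for the heatmaps (a minimal sketch: the checkpoint path, model type, and box prompt are assumptions, and the repo's own data-preparation scripts remain the reference):

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Checkpoint path and model type are assumptions; download from the SAM repo.
sam = sam_model_registry["vit_h"](checkpoint="weights/sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("dataset/images/im0.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Prompt SAM with a GT box (x0, y0, x1, y1, in pixels) to obtain a precise
# object mask, which can serve as a shape prior in the GT heatmap
# instead of a plain rectangle.
box = np.array([100, 150, 180, 260])
masks, _, _ = predictor.predict(box=box, multimask_output=False)
shape_prior = masks[0].astype(np.float32)  # HxW binary mask
```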
Run the commands below to reproduce results on the datasets, e.g., VisDrone. Download the pretrained weights (e.g., YOLOv5m) and put them in the `weights/pretrained/` directory first.
- Training on a Single GPU
Here is the default setting to adapt YOLOv5m to VisDrone using our ESOD framework:

```bash
DATASET=visdrone MODEL=yolov5m GPUS=0 BATCH_SIZE=8 IMAGE_SIZE=1536 EPOCHS=50 bash ./scripts/train.sh
```
- DDP Training
When multiple GPUs are available, the DistributedDataParallel mode can be applied to speed up training. Simply set `GPUS` according to your devices, e.g., `GPUS=0,1,2,3`:

```bash
DATASET=visdrone MODEL=yolov5m GPUS=0,1,2,3 BATCH_SIZE=32 IMAGE_SIZE=1536 EPOCHS=50 bash ./scripts/train.sh
```
- Model Card
We support the YOLOv5 series (YOLOv5s, YOLOv5m, YOLOv5l, YOLOv5x). After downloading them to `weights/`, simply change `MODEL` (as well as `IMAGE_SIZE`) to apply a different model:

```bash
DATASET=visdrone MODEL=yolov5s GPUS=0 BATCH_SIZE=8 IMAGE_SIZE=1536 EPOCHS=50 bash ./scripts/train.sh
```
| MODEL | BATCH_SIZE | IMAGE_SIZE |
| --- | --- | --- |
| yolov5s | 8 | 1536 |
| yolov5m | 8 | 1536 |
| yolov5l | 4 | 1920 |
| yolov5x | 2 | 1920 |
Besides, we also support the RetinaNet, RTMDet, YOLOv8, and GPViT models. You can download the pre-trained weights, convert them to the ESOD initialization, and train the models on specific datasets:

```bash
python scripts/model_convert.py --model retinanet
DATASET=visdrone MODEL=retinanet GPUS=0 BATCH_SIZE=8 IMAGE_SIZE=1536 EPOCHS=50 bash ./scripts/train.sh
```
Feel free to set `MODEL` to `retinanet`, `rtmdet`, or `gpvit` (`yolov8m` does not require model conversion). Detailed instructions will come soon.
- Datasets Supported
Besides VisDrone, we also support model training on the UAVDT and TinyPerson datasets:

```bash
DATASET=uavdt MODEL=yolov5m GPUS=0 BATCH_SIZE=8 IMAGE_SIZE=1280 EPOCHS=50 bash ./scripts/train.sh
```

| DATASET | IMAGE_SIZE |
| --- | --- |
| uavdt | 1280 |
| tinyperson | 2048 |
Run the commands below to compute evaluation results (AP, AP50, empty rate, missing rate) with the integrated `utils/metrics.py`.

```bash
python test.py --data data/visdrone.yaml --weights weights/yolov5m.pt --batch-size 8 --img-size 1536 --device 0
```
For computational analysis (including GFLOPs and FPS), use the following command:

```bash
python test.py --data data/visdrone.yaml --weights weights/yolov5m.pt --batch-size 1 --img-size 1536 --device 0 --task measure
```
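If you want to cross-check the reported numbers independently, a rough profile can be obtained with `thop` (a minimal sketch: it assumes `thop` is installed and a YOLOv5-style checkpoint layout, and the repo's `--task measure` path remains the authoritative measurement):

```python
import time
import torch
from thop import profile

# Assumes a YOLOv5-style checkpoint layout ({"model": ...}); adjust to your setup.
ckpt = torch.load("weights/yolov5m.pt", map_location="cpu")
model = ckpt["model"].float().eval().cuda()

img = torch.zeros(1, 3, 1536, 1536, device="cuda")
with torch.no_grad():
    macs, params = profile(model, inputs=(img,))  # thop counts multiply-accumulates
    print(f"GMACs: {macs / 1e9:.1f}, params: {params / 1e6:.1f}M")

    for _ in range(10):  # warm-up
        model(img)
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(100):
        model(img)
    torch.cuda.synchronize()
    print(f"FPS: {100 / (time.time() - t0):.1f}")
```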
The data organization expected by the official evaluation tools differs from the Darknet format, so an intermediate conversion is required. Run the command below to save the predictions in Darknet format:

```bash
python test.py --data data/visdrone.yaml --weights weights/yolov5m.pt --batch-size 8 --img-size 1536 --device 0 --task test --save-txt --save-conf
```

Then run the `data_convert.py` script for the corresponding data format and perform the official evaluation (a sketch of the underlying conversion follows the list below).
- VisDrone test-dev set:

```bash
cp -r ./evaluate/VisDrone2018-DET-toolkit ./VisDrone/
python data_convert.py --dataset VisDrone --pred runs/test/exp/labels
cd ./VisDrone/VisDrone2018-DET-toolkit
matlab -nodesktop -nosplash -r evalDET
```
- UAVDT: coming soon.
- TinyPerson: coming soon.
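For reference, the per-line conversion behind such a script might look as follows (a minimal sketch: the VisDrone result format is `<bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>`, and the `+1` category offset here is an assumption; rely on `data_convert.py` for the actual mapping):

```python
def darknet_to_visdrone(line: str, img_w: int, img_h: int) -> str:
    """Convert one Darknet prediction line ('cls x_c y_c w h conf', normalized)
    to a VisDrone result line 'left,top,width,height,score,category,-1,-1'."""
    cls, xc, yc, w, h, conf = map(float, line.split())
    bw, bh = w * img_w, h * img_h
    left, top = xc * img_w - bw / 2, yc * img_h - bh / 2
    # The +1 offset (skipping VisDrone's "ignored regions" class) is an
    # assumption; truncation and occlusion are set to -1 for detections.
    return f"{left:.1f},{top:.1f},{bw:.1f},{bh:.1f},{conf:.4f},{int(cls) + 1},-1,-1"

print(darknet_to_visdrone("0 0.5 0.5 0.1 0.2 0.9", 1920, 1080))
# -> 864.0,432.0,192.0,216.0,0.9000,1,-1,-1
```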
The script `detect.py` runs inference and saves the results to `runs/detect`.

```bash
python detect.py --weights weights/yolov5m.pt --source data/images/visdrone.txt --img-size 1536 --device 0 --view-cluster --line-thickness 1
```
The `--view-cluster` flag draws the generated patches as green boxes and saves the heatmaps from both the prediction and the ground truth.
- [ ] Coming soon.
A large part of the code is borrowed from YOLO. Many thanks for this wonderful work.
If you find this work useful in your research, please kindly cite the paper:
```bibtex
@article{liu2024esod,
  title={ESOD: Efficient Small Object Detection on High-Resolution Images},
  author={Liu, Kai and Fu, Zhihang and Jin, Sheng and Chen, Ze and Zhou, Fan and Jiang, Rongxin and Chen, Yaowu and Ye, Jieping},
  journal={IEEE Transactions on Image Processing},
  year={2024},
  publisher={IEEE},
  url={https://arxiv.org/abs/2407.16424},
}
```