YOLOv8

Abstract

Ultralytics YOLOv8, developed by Ultralytics, is a cutting-edge, state-of-the-art (SOTA) model that builds upon the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility. YOLOv8 is designed to be fast, accurate, and easy to use, making it an excellent choice for a wide range of object detection, image segmentation and image classification tasks.

[Figure: performance]
[Figure: YOLOv8-P5 model structure]

Results and models

COCO

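| Backbone | Arch | size | Mask Refine | SyncBN | AMP | Mem (GB) | box AP | TTA box AP | Config | Download |
| :------: | :--: | :--: | :---------: | :----: | :-: | :------: | :----: | :--------: | :----: | :------: |
| YOLOv8-n | P5 | 640 | No | Yes | Yes | 2.8 | 37.2 | | config | model / log |
| YOLOv8-n | P5 | 640 | Yes | Yes | Yes | 2.5 | 37.4 (+0.2) | 39.9 | config | model / log |
| YOLOv8-s | P5 | 640 | No | Yes | Yes | 4.0 | 44.2 | | config | model / log |
| YOLOv8-s | P5 | 640 | Yes | Yes | Yes | 4.0 | 45.1 (+0.9) | 46.8 | config | model / log |
| YOLOv8-m | P5 | 640 | No | Yes | Yes | 7.2 | 49.8 | | config | model / log |
| YOLOv8-m | P5 | 640 | Yes | Yes | Yes | 7.0 | 50.6 (+0.8) | 52.3 | config | model / log |
| YOLOv8-l | P5 | 640 | No | Yes | Yes | 9.8 | 52.1 | | config | model / log |
| YOLOv8-l | P5 | 640 | Yes | Yes | Yes | 9.1 | 53.0 (+0.9) | 54.4 | config | model / log |
| YOLOv8-x | P5 | 640 | No | Yes | Yes | 12.2 | 52.7 | | config | model / log |
| YOLOv8-x | P5 | 640 | Yes | Yes | Yes | 12.4 | 54.0 (+1.3) | 55.0 | config | model / log |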

Note

  1. We train on 8x A100 GPUs with a per-GPU batch size of 16 (total batch size 128). This differs from the official code but has no effect on performance.
  2. Performance is unstable and may fluctuate by about 0.3 mAP, and the best-performing weights in YOLOv8 COCO training may not come from the last epoch. The results shown above are from the best model.
  3. We provide scripts to convert official weights to MMYOLO.
  4. SyncBN indicates training with synchronized batch normalization; AMP indicates training with mixed precision.
  5. The Mask Refine results correspond to the performance of the weights officially released by YOLOv8. Mask Refine means refining bounding boxes with their masks when loading annotations and again after the YOLOv5RandomAffine transform. The L and X models also use Copy Paste.
  6. TTA means Test Time Augmentation. It applies 3 multi-scale transformations to the image, followed by 2 flipping transformations (flipped and not flipped). To enable it, specify --tta when testing. See TTA for details.
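
To make the TTA description in note 6 concrete, here is a minimal sketch that enumerates the augmented views produced for a single image: 3 scales, each with and without a flip. The scale values and the `tta_views` helper are assumptions for illustration only; the actual settings are defined in the TTA config.

```python
from itertools import product

# Assumed scales, for illustration only; the real values live in the TTA config.
IMG_SCALES = [(640, 640), (320, 320), (960, 960)]
# Each scale is evaluated twice: flipped and not flipped.
FLIPS = [True, False]


def tta_views(scales=IMG_SCALES, flips=FLIPS):
    """Hypothetical helper: list every (scale, flip) combination for one image."""
    return [{"scale": scale, "flip": flip} for scale, flip in product(scales, flips)]


views = tta_views()
for view in views:
    print(view)
print(len(views))  # 3 scales x 2 flip states = 6 views per image
```

Each image is therefore run through the model several times, and the predictions from all views are merged into one result, which is why the TTA box AP column is higher than the single-pass box AP at the cost of slower testing.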

Citation