Prerequisite
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
master branch https://github.com/open-mmlab/mmrotate
Environment
sys.platform: linux
Python: 3.10.11 (main, May 16 2023, 00:28:57) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA RTX A6000
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.7, V11.7.99
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 1.11.0+cu113
PyTorch compiling details: PyTorch built with:
TorchVision: 0.12.0+cu113
OpenCV: 4.9.0
MMEngine: 0.10.4
MMRotate: 1.0.0rc1+
Reproduces the problem - code sample
# dataset settings
dataset_type = 'DIORDataset'
data_root = '/work/data/datasets/DIOR/'
backend_args = None
train_pipeline = [
dict(type='mmdet.LoadImageFromFile', backend_args=backend_args),
dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
dict(type='mmdet.Resize', scale=(800, 800), keep_ratio=True),
dict(
type='mmdet.RandomFlip',
prob=0.75,
direction=['horizontal', 'vertical', 'diagonal']),
dict(type='mmdet.PackDetInputs')
]
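
One thing worth ruling out before touching the model: the ConvertBoxType step above turns each quadrilateral into a rotated box, and a degenerate quadrilateral (near-zero area, or corners outside the image) can yield invalid regression targets that eventually blow up the loss. Below is a minimal sanity-check sketch, separate from the config; it assumes the usual DIOR-R VOC-style XML layout in which each object carries a robndbox element with eight corner coordinates, and the annotation directory name is a guess to adjust for your copy of the dataset.

# Hypothetical sanity check for DIOR-R annotations: flags oriented boxes with
# near-zero area that can produce NaN losses once converted from 'qbox' to
# 'rbox'. The directory name and the corner tag names are assumptions about
# the standard DIOR-R layout; verify them against your annotation files.
import glob
import os
import xml.etree.ElementTree as ET

ANN_DIR = '/work/data/datasets/DIOR/Annotations'  # assumption: adjust to your layout

CORNERS = ['x_left_top', 'y_left_top', 'x_right_top', 'y_right_top',
           'x_right_bottom', 'y_right_bottom', 'x_left_bottom', 'y_left_bottom']

def polygon_area(pts):
    # Shoelace formula for a quadrilateral given as [(x, y), ...].
    area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

for xml_path in glob.glob(os.path.join(ANN_DIR, '*.xml')):
    root = ET.parse(xml_path).getroot()
    for obj in root.iter('object'):
        rbb = obj.find('robndbox')
        if rbb is None:
            continue
        vals = [float(rbb.find(k).text) for k in CORNERS]
        pts = list(zip(vals[0::2], vals[1::2]))
        if polygon_area(pts) < 1.0:  # flag near-degenerate boxes
            print(f'{os.path.basename(xml_path)}: degenerate box {pts}')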
val_pipeline = [
dict(type='mmdet.LoadImageFromFile', backend_args=backend_args),
dict(type='mmdet.Resize', scale=(800, 800), keep_ratio=True),
# avoid bboxes being resized
dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
dict(
type='mmdet.PackDetInputs',
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
'scale_factor'))
]
test_pipeline = [
dict(type='mmdet.LoadImageFromFile', backend_args=backend_args),
dict(type='mmdet.Resize', scale=(800, 800), keep_ratio=True),
dict(
type='mmdet.PackDetInputs',
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
'scale_factor'))
]
train_dataloader = dict(
batch_size=2,
num_workers=2,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
batch_sampler=None,
dataset=dict(
type='ConcatDataset',
ignore_keys=['DATASET_TYPE'],
datasets=[
dict(
type=dataset_type,
data_root=data_root,
ann_file='ImageSets/Main/train.txt',
data_prefix=dict(img_path='JPEGImages-trainval'),
filter_cfg=dict(filter_empty_gt=True),
pipeline=train_pipeline),
dict(
type=dataset_type,
data_root=data_root,
ann_file='ImageSets/Main/val.txt',
data_prefix=dict(img_path='JPEGImages-trainval'),
filter_cfg=dict(filter_empty_gt=True),
pipeline=train_pipeline,
backend_args=backend_args)
]))
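
If the annotations check out, exploding gradients are the next suspect, especially since the divergence only shows up after several epochs. The config above omits the optimizer settings, so here is a minimal sketch of enabling gradient clipping through MMEngine's OptimWrapper; the SGD hyper-parameters mirror the common Oriented R-CNN defaults and are assumptions, so keep your own values and just add clip_grad.

# Sketch only: gradient clipping via MMEngine's OptimWrapper. The SGD values
# below are assumed defaults; the relevant addition is the clip_grad entry.
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001),
    clip_grad=dict(max_norm=35, norm_type=2))  # clip gradient L2 norm at 35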
val_dataloader = dict(
batch_size=1,
num_workers=2,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file='ImageSets/Main/test.txt',
data_prefix=dict(img_path='JPEGImages-test'),
test_mode=True,
pipeline=val_pipeline,
backend_args=backend_args))
test_dataloader = val_dataloader
val_evaluator = dict(type='DOTAMetric', metric='mAP')
test_evaluator = val_evaluator
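
To catch the exact operation that first produces a NaN instead of waiting for the logged loss to report it, PyTorch's anomaly detection can be switched on for a debugging run, e.g. near the top of tools/train.py:

import torch

# Debug-only: makes autograd raise an error, with a traceback to the offending
# forward op, as soon as a backward pass produces NaN/Inf. This slows training
# considerably, so use it only to pin down the failing iteration.
torch.autograd.set_detect_anomaly(True)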
Reproduces the problem - command or script
Reproduces the problem - error message
I'm training Oriented R-CNN on the DIOR dataset, but the model's loss becomes NaN at epoch 7. The problem does not occur on the DOTA-v1.0 and DOTA-v1.5 datasets. How can I solve this?
Additional information
No response