SEGR: Semantic Enhancement and Graph Reasoning for Irregular Scene Text Recognition

Scene text recognition is an important research area in visual understanding that involves cross-modal processing of visual and textual semantic information. Accurately recognizing irregular scene text, which suffers from low resolution, blur, deformation, uneven illumination, and similar problems, remains a common challenge for existing scene text recognition methods. In this paper, we propose a novel scene text recognition method based on text semantic enhancement and character graph reasoning (SEGR) to improve the accuracy of irregular text recognition. Specifically, SEGR consists of a visual recognition branch, which performs preliminary recognition from visual features, and an iterative correction branch, which corrects the preliminary result by mining semantic information and the relationships between characters. The iterative correction branch consists of a transformer-based text semantic enhancement module and a relational reasoning module based on a character graph.

Requirements

pip install torch==1.7.1 torchvision==0.8.2 fastai==1.0.60 opencv-python tensorboardX lmdb pillow

Datasets

We use datasets in LMDB format for training and evaluation. The synthetic datasets MJSynth and SynthText and the WikiText text corpus are used for training; three regular and three irregular text datasets are used for evaluation.

Training datasets
Evaluation datasets
The evaluation datasets can be downloaded from Google Drive or from the corresponding official websites.
- Regular scene text datasets: IC13, SVT, IIIT5k
- Irregular scene text datasets: IC15, SVTP, CUTE80

The directory structure of the dataset is as follows:

data
├── charset_36.txt
├── evaluation
│   ├── CUTE80
│   ├── IC13_857
│   ├── IC15_1811
│   ├── IIIT5k_3000
│   ├── SVT
│   └── SVTP
├── training
│   ├── MJ
│   │   ├── MJ_test
│   │   ├── MJ_train
│   │   └── MJ_valid
│   └── ST
├── WikiText-103.csv
└── WikiText-103_eval_d1.csv
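
Before training or evaluating, it can help to confirm that the data directory matches the tree above. The following is a minimal sketch that uses only the Python standard library; the function name `find_missing_entries` and the default root `data` are illustrative assumptions and are not part of the SEGR code base.

```python
from pathlib import Path

# Entries copied from the directory tree above. The helper below is a
# hypothetical convenience check, not something shipped with SEGR.
EXPECTED_ENTRIES = [
    "charset_36.txt",
    "evaluation/CUTE80",
    "evaluation/IC13_857",
    "evaluation/IC15_1811",
    "evaluation/IIIT5k_3000",
    "evaluation/SVT",
    "evaluation/SVTP",
    "training/MJ/MJ_test",
    "training/MJ/MJ_train",
    "training/MJ/MJ_valid",
    "training/ST",
    "WikiText-103.csv",
    "WikiText-103_eval_d1.csv",
]


def find_missing_entries(data_root="data"):
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    missing = find_missing_entries("data")
    if missing:
        print("Missing entries:")
        for entry in missing:
            print("  " + entry)
    else:
        print("Dataset layout looks complete.")
```

The check only tests that each path exists; it does not open the LMDB files or validate their contents.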

Models

The pretrained SEGR model is available on BaiduNetdisk (password: ph33). Its performance on the evaluation datasets is shown in the following table:

Model	IC13	SVT	IIIT	IC15	SVTP	CUTE
SEGR	97.7	94.1	96.4	86.0	90.1	92.7

Training

To train the model, run the following command:

CUDA_VISIBLE_DEVICES=0,1 python main.py --config=configs/train_segr.yaml

Evaluation

To evaluate the model, run the following command:

CUDA_VISIBLE_DEVICES=0,1 python main.py --config=configs/train_segr.yaml --phase test --image_only
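
The README does not spell out how the reported scores are computed. As a rough illustration, the sketch below computes word-level accuracy after mapping both prediction and label to a 36-character set. It assumes that this set is the digits 0-9 plus lowercase a-z and that matching is case-insensitive; both are assumptions suggested by the file name `charset_36.txt`, not confirmed by the source. The function names are hypothetical and do not come from the SEGR code.

```python
import string

# Assumption: the 36-character set is the digits 0-9 plus lowercase a-z.
CHARSET_36 = set(string.digits + string.ascii_lowercase)


def normalize(text):
    """Lowercase the text and drop characters outside the 36-character set."""
    return "".join(ch for ch in text.lower() if ch in CHARSET_36)


def word_accuracy(predictions, labels):
    """Return the percentage of predictions that equal their label exactly
    after normalization."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(normalize(p) == normalize(g) for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    preds = ["Hello", "w0rld", "text"]
    truth = ["hello", "world", "text!"]
    print(f"{word_accuracy(preds, truth):.1f}")
```

In this toy input, two of the three predictions match after normalization, since case and the trailing punctuation are ignored while the digit substitution in the second word counts as an error.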

Acknowledgements

This PyTorch implementation is based on ABINet.

