Scene text recognition is an important research field in visual understanding that involves cross-modal processing of visual and textual semantic information. Accurately recognizing irregular scene text, which suffers from low resolution, blur, deformation, uneven illumination, and similar problems, remains a common challenge for existing scene text recognition methods. In this paper, we propose a novel scene text recognition method based on text semantic enhancement and character graph reasoning (SEGR) to improve the accuracy of irregular text recognition. Specifically, SEGR consists of a visual recognition branch, which performs preliminary recognition based on visual features, and an iterative correction branch, which corrects the preliminary recognition by mining semantic information and relationships between characters. The iterative correction branch consists of a transformer-based text semantic enhancement module and a relational reasoning module based on a character graph.
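The two-branch flow described above can be pictured with a small control-flow sketch. This is only an illustration under stated assumptions, not the SEGR implementation: the names `recognize`, `visual`, `correct`, and `num_iters` are hypothetical, the branches are stand-in callables rather than neural networks, and the early stop on a stable prediction is an assumption of this sketch.

```python
from typing import Any, Callable

# Hypothetical stand-ins, not taken from the SEGR code base.
Recognizer = Callable[[Any], str]   # visual branch: image -> preliminary text
Corrector = Callable[[str], str]    # correction branch: text -> refined text


def recognize(image: Any, visual: Recognizer, correct: Corrector,
              num_iters: int = 3) -> str:
    """Run a preliminary visual recognition, then refine it iteratively."""
    text = visual(image)
    for _ in range(num_iters):
        refined = correct(text)
        if refined == text:
            # Assumption of this sketch: stop once the prediction is stable.
            break
        text = refined
    return text


if __name__ == "__main__":
    # Toy branches: the "visual" branch confuses 'l' with '1',
    # and the "correction" branch repairs one such character per pass.
    toy_visual = lambda image: "he11o"
    toy_correct = lambda text: text.replace("1", "l", 1)
    print(recognize(None, toy_visual, toy_correct))
```

In the toy run, each correction pass fixes one character, so the loop needs more than one iteration before the prediction settles.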
pip install torch==1.7.1 torchvision==0.8.2 fastai==1.0.60 opencv-python tensorboardX lmdb pillow
We used datasets in LMDB format for training and evaluation. The synthetic datasets MJSynth and SynthText, together with the WikiText corpus, were used for training, and three irregular text datasets and three regular text datasets were used for evaluation.
- Training datasets
- Evaluation datasets

  The evaluation datasets can be downloaded from GoogleDrive. They can also be downloaded from the corresponding official websites.
  - Regular scene text datasets
  - Irregular scene text datasets
The directory structure of the dataset is as follows:
    data
    ├── charset_36.txt
    ├── evaluation
    │   ├── CUTE80
    │   ├── IC13_857
    │   ├── IC15_1811
    │   ├── IIIT5k_3000
    │   ├── SVT
    │   └── SVTP
    ├── training
    │   ├── MJ
    │   │   ├── MJ_test
    │   │   ├── MJ_train
    │   │   └── MJ_valid
    │   └── ST
    ├── WikiText-103.csv
    └── WikiText-103_eval_d1.csv
Our pretrained SEGR model is available for download on BaiduNetdisk (passwd: ph33). Its performance on the evaluation datasets is shown in the following table:
Model | IC13 | SVT | IIIT | IC15 | SVTP | CUTE |
---|---|---|---|---|---|---|
SEGR | 97.7 | 94.1 | 96.4 | 86.0 | 90.1 | 92.7 |
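To summarize the table in a single number, one option is to average the six accuracies, either unweighted or weighted by test-set size. The sketch below does both. The sample counts for IC13, IC15, and IIIT come from the directory names in this README; the counts for SVT (647), SVTP (645), and CUTE (288) are the commonly reported benchmark sizes and are an assumption here, as are the helper names.

```python
from typing import Dict

# Accuracies (%) copied from the table in this README.
ACCURACY: Dict[str, float] = {
    "IC13": 97.7, "SVT": 94.1, "IIIT": 96.4,
    "IC15": 86.0, "SVTP": 90.1, "CUTE": 92.7,
}

# Test-set sizes. IC13, IC15 and IIIT follow the directory names;
# SVT, SVTP and CUTE are assumed from commonly reported benchmark sizes.
NUM_SAMPLES: Dict[str, int] = {
    "IC13": 857, "SVT": 647, "IIIT": 3000,
    "IC15": 1811, "SVTP": 645, "CUTE": 288,
}


def mean_accuracy(acc: Dict[str, float]) -> float:
    """Unweighted mean over datasets."""
    return sum(acc.values()) / len(acc)


def weighted_accuracy(acc: Dict[str, float], sizes: Dict[str, int]) -> float:
    """Mean weighted by the number of test samples in each dataset."""
    total = sum(sizes[name] for name in acc)
    return sum(acc[name] * sizes[name] for name in acc) / total


if __name__ == "__main__":
    print(f"unweighted mean: {mean_accuracy(ACCURACY):.2f}")
    print(f"weighted mean:   {weighted_accuracy(ACCURACY, NUM_SAMPLES):.2f}")
```

The weighted figure is dominated by IIIT and IC15 because they contribute the most samples.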
If you want to train the model, you can use the following command:
CUDA_VISIBLE_DEVICES=0,1 python main.py --config=configs/train_segr.yaml
If you want to evaluate the model, you can use the following command:
CUDA_VISIBLE_DEVICES=0,1 python main.py --config=configs/train_segr.yaml --phase test --image_only
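When launching runs from a script, note that the GPU list in `CUDA_VISIBLE_DEVICES` must be comma-separated with no spaces; otherwise the shell treats the text after the space as the command to run. The sketch below assembles the same training and evaluation commands programmatically. The helper name `build_launch` is hypothetical, and the flags are only those shown in the commands above.

```python
import os
import subprocess
from typing import Dict, List, Sequence, Tuple


def build_launch(gpu_ids: Sequence[int],
                 config: str = "configs/train_segr.yaml",
                 evaluate: bool = False) -> Tuple[Dict[str, str], List[str]]:
    """Return the environment and argument list for a training or evaluation run."""
    env = dict(os.environ)
    # Join GPU ids with commas only, e.g. "0,1" (no spaces).
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    argv = ["python", "main.py", f"--config={config}"]
    if evaluate:
        argv += ["--phase", "test", "--image_only"]
    return env, argv


def launch(gpu_ids: Sequence[int], evaluate: bool = False) -> None:
    """Run main.py from the repository root; raises if the process fails."""
    env, argv = build_launch(gpu_ids, evaluate=evaluate)
    subprocess.run(argv, env=env, check=True)
```

Calling `launch([0, 1])` corresponds to the training command, and `launch([0, 1], evaluate=True)` to the evaluation command.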
This PyTorch implementation is based on ABINet.