support cls predict module #769

zhangjunlongtech · 2024-11-15T02:37:36Z

Thank you for your contribution to the MindOCR repo.
Before submitting this PR, please make sure:

[✔] You have read the Contributing Guidelines on pull requests
[✔] Your code builds clean without any errors or warnings
[✔] You are using approved terminology
[✔] You have added unit tests

Motivation

This PR implements the CLS online inference component, adding text direction classification as an optional step (disabled by default) between text detection and text recognition in the end-to-end pipeline.
The component classifies the orientation of each text crop produced by detection, corrects it by rotation where needed, and passes the result to the text recognition model. This can improve recognition accuracy when the text is not upright.
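As a rough illustration of where the optional step sits, the sketch below uses a hypothetical `correct_orientation` helper (not the function added in this PR) on toy crops stored as nested lists. It assumes the classifier returns a `(label, score)` pair per crop, as in the `[('180', 1.0)]` output shown in the test plan, and it introduces a hypothetical confidence threshold to gate the rotation.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]            # toy stand-in for an H x W image
ClsResult = Tuple[str, float]      # (angle label, confidence)


def rotate_180(img: Image) -> Image:
    """Rotate a 2-D image by 180 degrees (reverse rows, then columns)."""
    return [row[::-1] for row in img[::-1]]


def correct_orientation(
    crops: List[Image],
    classifier: Optional[Callable[[Image], ClsResult]] = None,
    thresh: float = 0.9,  # hypothetical confidence threshold
) -> Tuple[List[Image], int]:
    """Return (possibly rotated crops, number of crops rotated).

    With classifier=None the step is skipped, mirroring the
    "disabled by default" behaviour described above.
    """
    if classifier is None:
        return crops, 0
    fixed, n_rotated = [], 0
    for crop in crops:
        label, score = classifier(crop)
        if label == "180" and score >= thresh:
            crop = rotate_180(crop)
            n_rotated += 1
        fixed.append(crop)
    return fixed, n_rotated


# Toy classifier: treat a crop as upside down when its first pixel is 0.
def toy_classifier(img: Image) -> ClsResult:
    return ("180", 1.0) if img[0][0] == 0 else ("0", 1.0)


crops = [[[0, 1], [2, 3]], [[9, 8], [7, 6]]]
fixed, n = correct_orientation(crops, toy_classifier)
print(f"corrected by rotation: {n}/{len(crops)}")
```

Passing no classifier returns the crops untouched, so the recognition stage can consume the output of this step either way.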

Test Plan

1. Test the TextClassifier class separately

This PR provides a test script that tests the TextClassifier class on its own. The test proceeds as follows:
(Because of how the script references other directories, it cannot be run directly from the test directory.)

Move the script to the 'tools/infer/text' directory
Run the following command:

python tools/infer/text/test_cls.py  --image_dir {path_to_img or dir_to_imgs} --cls_algorithm MV3

In the example above, image_dir can be either a directory of images or the path to a single image file.

The target classification image is: [image]

The cls task output should look like this:

mindocr INFO - All cls res: [('180', 1.0)]

2. End-to-end testing

Run the following command for an end-to-end test:

python tools/infer/text/predict_system.py --image_dir {path_to_img} \
                                          --det_algorithm DB++  \
                                          --rec_algorithm CRNN  \
                                          --cls_algorithm M3

The target image is: [image]

The e2e task output should look like this:

[2024-11-29 17:47:11] mindocr INFO - Original image shape: (350, 350, 3)
[2024-11-29 17:47:11] mindocr INFO - After det preprocess: (3, 352, 352)
[2024-11-29 17:47:20] mindocr INFO - Num detected text boxes: 7
Det time: 8.79443621635437
[2024-11-29 17:47:20] mindocr INFO - num images for cls: 7
[2024-11-29 17:47:20] mindocr INFO - CLS img idx range: [0, 7)
[2024-11-29 17:47:26] mindocr INFO - The number of images corrected by rotation is 6/7
CLS time: 5.613073825836182
[2024-11-29 17:47:26] mindocr INFO - num images for rec: 7
[2024-11-29 17:47:26] mindocr INFO - Rec img idx range: [0, 7)
[2024-11-29 17:47:32] mindocr INFO - Recognized texts: 
warning 1.0
prevent 0.9961774945259094
snoles  0.9545091986656189
injury  0.9999916553497314
read    1.0
the     1.0
manual  0.9999873042106628
Rec time: 6.5265583992004395
[2024-11-29 17:47:32] mindocr INFO - Total time:20.936989784240723
[2024-11-29 17:47:32] mindocr INFO - Average FPS: 0.04776235792753265
[2024-11-29 17:47:32] mindocr INFO - Averge time cost: {'det': 8.79443621635437, 'cls': 5.613073825836182, 'rec': 6.5265583992004395, 'all': 20.936989784240723}
[2024-11-29 17:47:32] mindocr INFO - Done! Results saved in ./inference_results/system_results.txt

If --cls_algorithm is not set, the cls step is skipped (this is the default):

python tools/infer/text/predict_system.py --image_dir {path_to_img} \
                                          --det_algorithm DB++  \
                                          --rec_algorithm CRNN

Without the cls step, recognition accuracy on text that is not upright is low:

[2024-11-29 17:49:37] mindocr INFO - Original image shape: (350, 350, 3)
[2024-11-29 17:49:37] mindocr INFO - After det preprocess: (3, 352, 352)
[2024-11-29 17:49:46] mindocr INFO - Num detected text boxes: 7
Det time: 9.030900239944458
[2024-11-29 17:49:46] mindocr INFO - num images for rec: 7
[2024-11-29 17:49:46] mindocr INFO - Rec img idx range: [0, 7)
[2024-11-29 17:49:53] mindocr INFO - Recognized texts: 
oninyvm 0.9057610630989075
quareud 0.8522056937217712
snoles  0.9545091986656189
unful   0.8281314969062805
pead    0.9597490429878235
aul     0.7256217002868652
jenuew  0.9899640679359436
Rec time: 6.544724464416504
[2024-11-29 17:49:53] mindocr INFO - Total time:15.577537059783936
[2024-11-29 17:49:53] mindocr INFO - Average FPS: 0.0641950005422661
[2024-11-29 17:49:53] mindocr INFO - Averge time cost: {'det': 9.030900239944458, 'rec': 6.544724464416504, 'all': 15.577537059783936}
[2024-11-29 17:49:53] mindocr INFO - Done! Results saved in ./inference_results/system_results.txt

3. e2e multi-image online inference

e2e online inference over multiple images was also tested. The test file directory is as follows:

|-test_cls
|- example_img03_for_e2e.png
|- example_img04_for_e2e.png

The test command is as follows:

python tools/infer/text/predict_system.py --image_dir {dir_to_imgs} \
                                          --det_algorithm DB++  \
                                          --rec_algorithm CRNN  \
                                          --cls_algorithm M3

The test results are as follows and match expectations:


INFO: Infering [1/2]: /home/nginx/work/zhangjunlong/mindocr_main/tests/st/test_cls/img_for_e2e/example_img03_for_e2e.png
[2024-11-29 17:41:35] mindocr INFO - Original image shape: (350, 350, 3)
[2024-11-29 17:41:35] mindocr INFO - After det preprocess: (3, 352, 352)
[2024-11-29 17:41:43] mindocr INFO - Num detected text boxes: 7
Det time: 8.909371614456177
[2024-11-29 17:41:43] mindocr INFO - num images for cls: 7
[2024-11-29 17:41:43] mindocr INFO - CLS img idx range: [0, 7)
[2024-11-29 17:41:49] mindocr INFO - The number of images corrected by rotation is 0/7
CLS time: 5.688442230224609
[2024-11-29 17:41:49] mindocr INFO - num images for rec: 7
[2024-11-29 17:41:49] mindocr INFO - Rec img idx range: [0, 7)
[2024-11-29 17:41:56] mindocr INFO - Recognized texts: 
manual  0.9541060328483582
read    1.0
the     0.8334037661552429
injury  1.0
serious 1.0
prevent 1.0
warning 1.0
Rec time: 6.5503151416778564

INFO: Infering [2/2]: /home/nginx/work/zhangjunlong/mindocr_main/tests/st/test_cls/img_for_e2e/example_img04_for_e2e.png
[2024-11-29 17:41:56] mindocr INFO - Original image shape: (350, 350, 3)
[2024-11-29 17:41:56] mindocr INFO - After det preprocess: (3, 352, 352)
[2024-11-29 17:41:56] mindocr INFO - Num detected text boxes: 7
Det time: 0.02867865562438965
[2024-11-29 17:41:56] mindocr INFO - num images for cls: 7
[2024-11-29 17:41:56] mindocr INFO - CLS img idx range: [0, 7)
[2024-11-29 17:41:56] mindocr INFO - The number of images corrected by rotation is 6/7
CLS time: 0.007650613784790039
[2024-11-29 17:41:56] mindocr INFO - num images for rec: 7
[2024-11-29 17:41:56] mindocr INFO - Rec img idx range: [0, 7)
[2024-11-29 17:41:56] mindocr INFO - Recognized texts: 
warning 1.0
prevent 0.9961774945259094
snoles  0.9545091986656189
injury  0.9999916553497314
read    1.0
the     1.0
manual  0.9999873042106628
Rec time: 0.009536981582641602

[2024-11-29 17:41:56] mindocr INFO - Total time:21.19942545890808
[2024-11-29 17:41:56] mindocr INFO - Average FPS: 0.0943421794084326
[2024-11-29 17:41:56] mindocr INFO - Averge time cost: {'det': 4.469025135040283, 'cls': 2.8480464220046997, 'rec': 3.279926061630249, 'all': 10.59971272945404}
[2024-11-29 17:41:56] mindocr INFO - Done! Results saved in ./inference_results/system_results.txt
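The summary lines in this log appear consistent with simple averages over the two images. The check below uses only values printed in the log and is an illustration of that reading, not the code that produced the log.

```python
# Per-image stage times copied from the log above (seconds).
det_times = [8.909371614456177, 0.02867865562438965]
cls_times = [5.688442230224609, 0.007650613784790039]
rec_times = [6.5503151416778564, 0.009536981582641602]
total_time = 21.19942545890808
num_images = 2


def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


avg = {
    "det": mean(det_times),
    "cls": mean(cls_times),
    "rec": mean(rec_times),
    "all": total_time / num_images,
}
fps = num_images / total_time

print(avg)  # close to the logged per-stage averages
print(fps)  # close to the logged Average FPS
```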

CaitinZhao · 2024-11-15T08:57:50Z

tools/infer/text/README.md

@@ -238,6 +238,56 @@ Evaluation of the text spotting inference results on Ascend 910 with MindSpore 2
 2. Unless extra inidication, all experiments are run with `--det_limit_type`="min" and `--det_limit_side`=720.
 3. SVTR is run in mixed precision mode (amp_level=O2) since it is optimized for O2.

+## Text Direction Classification


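This doesn't need to be presented separately; just add it in the e2e section.

I have restructured the code: the cls module is now an optional component (disabled by default) inside the detection-recognition e2e pipeline and is no longer presented separately. Please review.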

alien-0119 · 2024-11-25T07:46:55Z

tools/infer/text/config.py

+        "--save_cls_result",
+        type=str2bool,
+        default=True,
+        help="whether to use cls model",


About the wording here: the help text for --save_cls_result shouldn't be "whether to use cls model", should it?

alien-0119 · 2024-11-25T09:21:33Z

tools/infer/text/predict_system.py

+                img_pred = f"{fn}_crop_{i}" + "\t" + cls_res[0] + "\n"
+            lines.append(img_pred)
+
+        with open(save_path, "w", encoding="utf-8") as f_cls:


The with open here should use mode "a". When inferring multiple images, using "w" leaves only one image's information in the results.
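A small sketch of the issue, using a hypothetical `save_results` helper and made-up file names rather than the PR's actual function: when the save routine is called once per image, mode `"w"` truncates the file on every call, while `"a"` accumulates lines.

```python
import os
import tempfile


def save_results(save_path, lines, mode):
    """Write one image's result lines using the given file mode."""
    with open(save_path, mode, encoding="utf-8") as f:
        f.writelines(lines)


def run(mode):
    """Simulate saving results for two images, one call per image."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cls_results.txt")
        save_results(path, ["img1_crop_0\t180\n"], mode)
        save_results(path, ["img2_crop_0\t0\n"], mode)
        with open(path, encoding="utf-8") as f:
            return f.read().splitlines()


print(run("w"))  # only the second image's line survives
print(run("a"))  # both images' lines are kept
```

With `"a"`, lines from earlier runs also persist, so a caller may want to clear the file once before the per-image loop starts.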

CaitinZhao · 2024-11-28T10:56:20Z

tests/st/test_cls.py

+
+In the example above, image_dir can be either the file directory of images set or the address of a single image file
+
+Test image files address see: https://github.com/zhangjunlongtech/Material/tree/main/CLS/test_for_cls


Delete this.

CaitinZhao · 2024-11-29T08:18:37Z

tools/infer/text/README.md

+python tools/infer/text/predict_system.py --image_dir {path_to_img or dir_to_imgs} \
+                                          --det_algorithm DB++  \
+                                          --rec_algorithm CRNN  \
+                                          --use_cls True


This still isn't ideal. Change it to --cls_algorithm M3, in the same style as the parameters above; the default is None, and the only choice for now is M3.
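The suggestion above could be expressed with `argparse` roughly as follows. This is a minimal sketch with a hypothetical `build_parser` function and help text, not the actual contents of tools/infer/text/config.py.

```python
import argparse


def build_parser():
    """Illustrative subset of options; only --cls_algorithm is shown."""
    parser = argparse.ArgumentParser(description="illustrative subset of options")
    parser.add_argument(
        "--cls_algorithm",
        type=str,
        default=None,      # classification stays off unless this is set
        choices=["M3"],    # only one choice at present, per the review
        help="text direction classification algorithm; omit to skip cls",
    )
    return parser


with_cls = build_parser().parse_args(["--cls_algorithm", "M3"])
without_cls = build_parser().parse_args([])
print(with_cls.cls_algorithm)     # the selected algorithm name
print(without_cls.cls_algorithm)  # None, so the cls step would be skipped
```

A single option then both enables the step and selects the model, which avoids a separate boolean such as `--use_cls`.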

Add online prediction of text direction classification model

4093e6c

zhangjunlongtech changed the title ~~Add online prediction of text direction classification (CLS) model~~ Support online cls model prediction Nov 15, 2024

zhangjunlongtech added 2 commits November 15, 2024 15:21

fix code specification

0c52d67

Merge https://github.com/mindspore-lab/mindocr into cls

02c0193

CaitinZhao reviewed Nov 15, 2024

zhangjunlongtech added 3 commits November 19, 2024 10:59

resolve conflict

97b2b27

add cls to predict_system

41cd79b

Deleted redundant images

ff16eef

zhangjunlongtech changed the title ~~Support online cls model prediction~~ support cls infer module Nov 21, 2024

zhangjunlongtech changed the title ~~support cls infer module~~ support cls predict module Nov 21, 2024

zhangjunlongtech added 2 commits November 22, 2024 09:25

adjust img catalog

76798f5

Resolve conflicts and add tests

0f78c11

alien-0119 reviewed Nov 25, 2024

zhangjunlongtech added 4 commits November 26, 2024 16:04

fix the config description

8fc10b6

fix some code details

2cb55c7

delete test_rotation

4044d90

delete imgs

0cc59f0

CaitinZhao reviewed Nov 28, 2024

zhangjunlongtech added 2 commits November 28, 2024 22:25

delete test_cls

e2fe480

fix use_cls description

0e567bb

CaitinZhao reviewed Nov 29, 2024

zhangjunlongtech added 2 commits November 29, 2024 17:55

Modify the cls start method

a821983

fix description

e18bfab

CaitinZhao approved these changes Nov 30, 2024

Merge branch 'main' into cls

150691b

SamitHuang approved these changes Dec 2, 2024

CaitinZhao merged commit 8e0e58a into mindspore-lab:main Dec 2, 2024
2 checks passed
