support cls predict module #769

Merged
merged 17 commits into from
Dec 2, 2024

zhangjunlongtech
Contributor

@zhangjunlongtech zhangjunlongtech commented Nov 15, 2024


Motivation

  • This PR implements the CLS online inference component. It adds text direction classification as an optional step (disabled by default) between text detection and text recognition in the end-to-end pipeline.
  • The component classifies the orientation of each text crop produced by detection, corrects it if needed, and passes the result to the text recognition model. This can improve recognition accuracy when the text is not upright.
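
As an illustration of the flow described above, here is a minimal sketch of an optional direction-correction step placed between detection and recognition. It is not the code from this PR: `correct_orientation`, `fake_classifier`, and the 0.9 threshold are hypothetical, crops are plain nested lists rather than image arrays, and the classifier is assumed to return one `(label, score)` pair per crop, in the same shape as the `[('180', 1.0)]` output shown in the test plan.

```python
def rotate_180(crop):
    # Rotate a 2D grid by 180 degrees: reverse the rows, then each row.
    return [row[::-1] for row in crop[::-1]]


def correct_orientation(crops, classify=None, threshold=0.9):
    """Rotate crops classified as upside down (hypothetical helper).

    `classify` maps a list of crops to a list of (label, score) pairs.
    When it is None, the cls step is skipped and crops pass through unchanged.
    """
    if classify is None:
        return list(crops), 0

    corrected, num_rotated = [], 0
    for crop, (label, score) in zip(crops, classify(crops)):
        if label == "180" and score >= threshold:
            crop = rotate_180(crop)
            num_rotated += 1
        corrected.append(crop)
    return corrected, num_rotated


def fake_classifier(crops):
    # Stand-in for a real model: flags every crop whose first pixel is 9.
    return [("180", 1.0) if c[0][0] == 9 else ("0", 1.0) for c in crops]


upright = [[1, 2], [3, 4]]
flipped = [[9, 8], [7, 6]]

out, n = correct_orientation([upright, flipped], classify=fake_classifier)
print(n)       # number of crops that were rotated
print(out[1])  # the flipped crop after correction
```

Passing `classify=None` mirrors the "disabled by default" behaviour: the crops reach recognition untouched.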

Test Plan

1. Test the TextClassifier class separately

This PR provides a test script that tests the TextClassifier class on its own. Run it as follows:
(Because of the directory structure its imports rely on, the test script cannot be run directly from the test directory.)

  1. Move the script to the 'tools/infer/text' directory
  2. Run the command:
python tools/infer/text/test_cls.py  --image_dir {path_to_img or dir_to_imgs} --cls_algorithm MV3

In the example above, image_dir can be either a directory of images or the path to a single image file.

The target classification image is
CRNN_t2

The cls task output should look like this:

mindocr INFO - All cls res: [('180', 1.0)]

2. End-to-end testing

Run the following command for an end-to-end test:

python tools/infer/text/predict_system.py --image_dir {path_to_img} \
                                          --det_algorithm DB++  \
                                          --rec_algorithm CRNN  \
                                          --cls_algorithm M3

The target image is
sys_test8

The e2e task output should look like this:

[2024-11-29 17:47:11] mindocr INFO - Original image shape: (350, 350, 3)
[2024-11-29 17:47:11] mindocr INFO - After det preprocess: (3, 352, 352)
[2024-11-29 17:47:20] mindocr INFO - Num detected text boxes: 7
Det time: 8.79443621635437
[2024-11-29 17:47:20] mindocr INFO - num images for cls: 7
[2024-11-29 17:47:20] mindocr INFO - CLS img idx range: [0, 7)
[2024-11-29 17:47:26] mindocr INFO - The number of images corrected by rotation is 6/7
CLS time: 5.613073825836182
[2024-11-29 17:47:26] mindocr INFO - num images for rec: 7
[2024-11-29 17:47:26] mindocr INFO - Rec img idx range: [0, 7)
[2024-11-29 17:47:32] mindocr INFO - Recognized texts: 
warning 1.0
prevent 0.9961774945259094
snoles  0.9545091986656189
injury  0.9999916553497314
read    1.0
the     1.0
manual  0.9999873042106628
Rec time: 6.5265583992004395
[2024-11-29 17:47:32] mindocr INFO - Total time:20.936989784240723
[2024-11-29 17:47:32] mindocr INFO - Average FPS: 0.04776235792753265
[2024-11-29 17:47:32] mindocr INFO - Averge time cost: {'det': 8.79443621635437, 'cls': 5.613073825836182, 'rec': 6.5265583992004395, 'all': 20.936989784240723}
[2024-11-29 17:47:32] mindocr INFO - Done! Results saved in ./inference_results/system_results.txt

If --cls_algorithm is not specified, the cls step is skipped:

python tools/infer/text/predict_system.py --image_dir {path_to_img} \
                                          --det_algorithm DB++  \
                                          --rec_algorithm CRNN

In this case, recognition accuracy on text that is not upright is low:

[2024-11-29 17:49:37] mindocr INFO - Original image shape: (350, 350, 3)
[2024-11-29 17:49:37] mindocr INFO - After det preprocess: (3, 352, 352)
[2024-11-29 17:49:46] mindocr INFO - Num detected text boxes: 7
Det time: 9.030900239944458
[2024-11-29 17:49:46] mindocr INFO - num images for rec: 7
[2024-11-29 17:49:46] mindocr INFO - Rec img idx range: [0, 7)
[2024-11-29 17:49:53] mindocr INFO - Recognized texts: 
oninyvm 0.9057610630989075
quareud 0.8522056937217712
snoles  0.9545091986656189
unful   0.8281314969062805
pead    0.9597490429878235
aul     0.7256217002868652
jenuew  0.9899640679359436
Rec time: 6.544724464416504
[2024-11-29 17:49:53] mindocr INFO - Total time:15.577537059783936
[2024-11-29 17:49:53] mindocr INFO - Average FPS: 0.0641950005422661
[2024-11-29 17:49:53] mindocr INFO - Averge time cost: {'det': 9.030900239944458, 'rec': 6.544724464416504, 'all': 15.577537059783936}
[2024-11-29 17:49:53] mindocr INFO - Done! Results saved in ./inference_results/system_results.txt

3. e2e multi-image online inference

e2e online inference over multiple images was also tested. The test file directory is as follows:

|- test_cls
   |- example_img03_for_e2e.png
   |- example_img04_for_e2e.png

The test command is as follows:

python tools/infer/text/predict_system.py --image_dir {dir_to_imgs} \
                                          --det_algorithm DB++  \
                                          --rec_algorithm CRNN  \
                                          --cls_algorithm M3

The test results are as follows and match expectations:


INFO: Infering [1/2]: /home/nginx/work/zhangjunlong/mindocr_main/tests/st/test_cls/img_for_e2e/example_img03_for_e2e.png
[2024-11-29 17:41:35] mindocr INFO - Original image shape: (350, 350, 3)
[2024-11-29 17:41:35] mindocr INFO - After det preprocess: (3, 352, 352)
[2024-11-29 17:41:43] mindocr INFO - Num detected text boxes: 7
Det time: 8.909371614456177
[2024-11-29 17:41:43] mindocr INFO - num images for cls: 7
[2024-11-29 17:41:43] mindocr INFO - CLS img idx range: [0, 7)
[2024-11-29 17:41:49] mindocr INFO - The number of images corrected by rotation is 0/7
CLS time: 5.688442230224609
[2024-11-29 17:41:49] mindocr INFO - num images for rec: 7
[2024-11-29 17:41:49] mindocr INFO - Rec img idx range: [0, 7)
[2024-11-29 17:41:56] mindocr INFO - Recognized texts: 
manual  0.9541060328483582
read    1.0
the     0.8334037661552429
injury  1.0
serious 1.0
prevent 1.0
warning 1.0
Rec time: 6.5503151416778564

INFO: Infering [2/2]: /home/nginx/work/zhangjunlong/mindocr_main/tests/st/test_cls/img_for_e2e/example_img04_for_e2e.png
[2024-11-29 17:41:56] mindocr INFO - Original image shape: (350, 350, 3)
[2024-11-29 17:41:56] mindocr INFO - After det preprocess: (3, 352, 352)
[2024-11-29 17:41:56] mindocr INFO - Num detected text boxes: 7
Det time: 0.02867865562438965
[2024-11-29 17:41:56] mindocr INFO - num images for cls: 7
[2024-11-29 17:41:56] mindocr INFO - CLS img idx range: [0, 7)
[2024-11-29 17:41:56] mindocr INFO - The number of images corrected by rotation is 6/7
CLS time: 0.007650613784790039
[2024-11-29 17:41:56] mindocr INFO - num images for rec: 7
[2024-11-29 17:41:56] mindocr INFO - Rec img idx range: [0, 7)
[2024-11-29 17:41:56] mindocr INFO - Recognized texts: 
warning 1.0
prevent 0.9961774945259094
snoles  0.9545091986656189
injury  0.9999916553497314
read    1.0
the     1.0
manual  0.9999873042106628
Rec time: 0.009536981582641602

[2024-11-29 17:41:56] mindocr INFO - Total time:21.19942545890808
[2024-11-29 17:41:56] mindocr INFO - Average FPS: 0.0943421794084326
[2024-11-29 17:41:56] mindocr INFO - Averge time cost: {'det': 4.469025135040283, 'cls': 2.8480464220046997, 'rec': 3.279926061630249, 'all': 10.59971272945404}
[2024-11-29 17:41:56] mindocr INFO - Done! Results saved in ./inference_results/system_results.txt
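
The summary lines in the log above are consistent with simple averages over the two images. The sketch below recomputes them from the per-image timings; it is only a reading of the logged numbers and makes no claim about how predict_system.py computes them internally.

```python
# Per-image stage timings in seconds, copied from the log above.
timings = [
    {"det": 8.909371614456177, "cls": 5.688442230224609, "rec": 6.5503151416778564},
    {"det": 0.02867865562438965, "cls": 0.007650613784790039, "rec": 0.009536981582641602},
]
total_time = 21.19942545890808  # "Total time" from the log

num_images = len(timings)

# Mean time per stage across images.
avg_cost = {
    stage: sum(t[stage] for t in timings) / num_images
    for stage in ("det", "cls", "rec")
}

# Images processed per second over the whole run.
avg_fps = num_images / total_time

print(avg_cost)
print(avg_fps)
```

Nearly all of the time is spent on the first image, so the averages say little about steady-state throughput. A one-off startup cost on the first call is a plausible explanation, though the log itself does not confirm it.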

@zhangjunlongtech zhangjunlongtech changed the title from "Add online prediction of text direction classification (CLS) model" to "Support online cls model prediction" Nov 15, 2024
@@ -238,6 +238,56 @@ Evaluation of the text spotting inference results on Ascend 910 with MindSpore 2
2. Unless extra inidication, all experiments are run with `--det_limit_type`="min" and `--det_limit_side`=720.
3. SVTR is run in mixed precision mode (amp_level=O2) since it is optimized for O2.

## Text Direction Classification
Collaborator

This doesn't need to be presented as a separate section; just add it to the e2e part.

Contributor Author

@zhangjunlongtech zhangjunlongtech Nov 21, 2024

I have restructured the code: the cls module is now an optional component (disabled by default) inside the detection-recognition e2e pipeline and is no longer presented separately. Please review.

@zhangjunlongtech zhangjunlongtech changed the title from "Support online cls model prediction" to "support cls infer module" Nov 21, 2024
@zhangjunlongtech zhangjunlongtech changed the title from "support cls infer module" to "support cls predict module" Nov 21, 2024
"--save_cls_result",
type=str2bool,
default=True,
help="whether to use cls model",
Contributor

About the wording here: the help text for --save_cls_result shouldn't be "whether to use cls model", should it?

Contributor Author

Fixed.

img_pred = f"{fn}_crop_{i}" + "\t" + cls_res[0] + "\n"
lines.append(img_pred)

with open(save_path, "w", encoding="utf-8") as f_cls:
Contributor

The with open here should use mode "a". With "w", when inference runs on multiple images, the result file ends up containing only one image's information.

Contributor Author

Corrected.
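
To make the reviewer's point concrete, here is a small self-contained sketch of the difference between the two file modes when results are saved once per image. `save_cls_result` and `count_lines` are hypothetical helpers written for this illustration; only the line format follows the snippet under review.

```python
import os
import tempfile


def save_cls_result(save_path, fn, cls_results, mode):
    # Hypothetical helper: one line per crop, "<name>_crop_<i>\t<label>".
    lines = [f"{fn}_crop_{i}\t{res[0]}\n" for i, res in enumerate(cls_results)]
    with open(save_path, mode, encoding="utf-8") as f_cls:
        f_cls.writelines(lines)


def count_lines(mode):
    # Save results for two images with the given mode, then count the lines.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cls_results.txt")
        for fn in ("img03", "img04"):  # one save call per image
            save_cls_result(path, fn, [("180", 1.0), ("0", 0.99)], mode)
        with open(path, encoding="utf-8") as f:
            return len(f.readlines())


print(count_lines("w"))  # each call truncates the file, so only the last image remains
print(count_lines("a"))  # each call appends, so both images are kept
```

One consequence of append mode is that the file also keeps growing across separate runs, so a caller that wants fresh results per run would need to clear it at startup.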


In the example above, image_dir can be either the file directory of images set or the address of a single image file

Test image files address see: https://github.com/zhangjunlongtech/Material/tree/main/CLS/test_for_cls
Collaborator

Delete this.

Contributor Author

Deleted.

python tools/infer/text/predict_system.py --image_dir {path_to_img or dir_to_imgs} \
--det_algorithm DB++ \
--rec_algorithm CRNN \
--use_cls True
Collaborator

This still isn't ideal. Change it to --cls_algorithm M3, in line with the parameters above; the default is None, and M3 is currently the only choice.

Contributor Author

Modified.
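
The two argument changes requested in this review can be sketched with argparse as follows. This is an assumption-laden illustration, not the merged code: `build_parser` and this `str2bool` are hypothetical stand-ins, and both help strings are invented wording.

```python
import argparse


def str2bool(value):
    # Hypothetical stand-in for the repo's str2bool helper.
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--cls_algorithm",
        type=str,
        default=None,    # the cls step is skipped unless an algorithm is named
        choices=["M3"],  # the only choice mentioned in the review
        help="text direction classification algorithm",
    )
    parser.add_argument(
        "--save_cls_result",
        type=str2bool,
        default=True,
        help="whether to save the cls results to a file",
    )
    return parser


enabled = build_parser().parse_args(["--cls_algorithm", "M3"])
print(enabled.cls_algorithm, enabled.save_cls_result)

disabled = build_parser().parse_args([])
print(disabled.cls_algorithm is None)
```

Using a single option whose default is None means one flag both enables the step and selects the model, which is why it can replace a separate boolean such as --use_cls.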

@CaitinZhao CaitinZhao merged commit 8e0e58a into mindspore-lab:main Dec 2, 2024
2 checks passed