support cls predict module #769

Merged
merged 17 commits into from
Dec 2, 2024
47 changes: 47 additions & 0 deletions tools/infer/text/README.md
@@ -248,6 +248,53 @@ Evaluation of the text spotting inference results on Ascend 910 with MindSpore 2
2. Unless otherwise indicated, all experiments are run with `--det_limit_type`="min" and `--det_limit_side`=720.
3. SVTR is run in mixed precision mode (amp_level=O2) since it is optimized for O2.

### Text direction classification

If the image contains text that is not upright, a text direction classifier can be applied after detection to classify the orientation of each detected text region and correct it. To run text direction classification and correction on an input image, run:
```shell
python tools/infer/text/predict_system.py --image_dir {path_to_img or dir_to_imgs} \
--det_algorithm DB++ \
--rec_algorithm CRNN \
--cls_algorithm M3
```
The default value of `--cls_algorithm` is None, which means that text direction classification is not performed. Setting `--cls_algorithm` enables text direction classification in the text detection and recognition pipeline: the classifier assigns an orientation to each text image cropped from the detection results and corrects the orientation of those that are not upright. Some example results are shown below.
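The correction step described above can be sketched in plain Python. This is a minimal illustration, not MindOCR's implementation: a crop is modelled as a list of pixel rows, and the confidence threshold `thresh` is a hypothetical parameter rather than a documented flag. A crop predicted as `"180"` is rotated by reversing both the row order and each row.

```python
from typing import List, Sequence, Tuple

# A crop is modelled as a list of pixel rows (hypothetical stand-in for an image array).
Image = List[List[int]]


def rotate_180(img: Image) -> Image:
    """Rotate an image by 180 degrees: reverse the row order and each row."""
    return [list(reversed(row)) for row in reversed(img)]


def correct_orientation(
    crops: Sequence[Image],
    preds: Sequence[Tuple[str, float]],
    thresh: float = 0.9,  # hypothetical confidence threshold, not a documented flag
) -> List[Image]:
    """Rotate every crop that is predicted as '180' with sufficient confidence."""
    corrected = []
    for img, (angle, score) in zip(crops, preds):
        if angle == "180" and score >= thresh:
            corrected.append(rotate_180(img))
        else:
            corrected.append(img)  # upright or low-confidence crops are left unchanged
    return corrected
```

With `preds = [("0", 1.0), ("180", 1.0)]`, the first crop is kept as-is and the second is flipped; the corrected crops would then be passed on to recognition.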

- Text direction classification

<p align="center">
<img src="https://raw.githubusercontent.com/zhangjunlongtech/Material/refs/heads/main/CRNN_t1.png" width=150 />
</p>
<p align="center">
<em> word_01.png </em>
</p>

<p align="center">
<img src="https://raw.githubusercontent.com/zhangjunlongtech/Material/refs/heads/main/CRNN_t2.png" width=150 />
</p>
<p align="center">
<em> word_02.png </em>
</p>

Classification results (image name, predicted angle in degrees, confidence):
```text
word_01.png 0 1.0
word_02.png 180 1.0
```
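The results above appear to follow an `<image> <angle> <score>` layout. A minimal sketch for reading such lines back in Python, assuming whitespace-separated fields as displayed (`ClsResult` and `parse_cls_results` are hypothetical helpers, not part of MindOCR):

```python
from typing import List, NamedTuple


class ClsResult(NamedTuple):
    image: str    # crop file name, e.g. "word_01.png"
    angle: int    # predicted rotation in degrees (0 or 180 by default)
    score: float  # classifier confidence for that angle


def parse_cls_results(text: str) -> List[ClsResult]:
    """Parse '<image> <angle> <score>' lines; blank lines are skipped."""
    results = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # rsplit keeps file names that contain spaces intact
        name, angle, score = line.strip().rsplit(None, 2)
        results.append(ClsResult(name, int(angle), float(score)))
    return results
```

For the two example images, `[r.image for r in parse_cls_results(text) if r.angle == 180]` would select only `word_02.png`.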

The currently supported text direction classification network is `mobilenet_v3`, selected by setting `--cls_algorithm` to `M3`. The automatic mixed precision level and the weight file of the classifier are set with `--cls_amp_level` and `--cls_model_dir`, respectively. Default weights are already configured, the default mixed precision level of the network is `O0`, and under the default configuration the classifier distinguishes between `0` and `180` degrees. Classification of other orientations will be supported in the future.
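If the pipeline is launched from a Python script, the flags described above can be assembled programmatically. This is a hedged sketch: `build_cls_command` is a hypothetical helper, and the detection and recognition algorithms are fixed to the values used in the example command.

```python
from typing import List, Optional

# AMP levels accepted by --cls_amp_level (per tools/infer/text/config.py)
AMP_LEVELS = {"O0", "O1", "O2", "O3"}


def build_cls_command(
    image_dir: str,
    cls_amp_level: str = "O0",
    cls_model_dir: Optional[str] = None,
) -> List[str]:
    """Assemble the predict_system.py argv with direction classification enabled."""
    if cls_amp_level not in AMP_LEVELS:
        raise ValueError(f"unsupported amp level: {cls_amp_level}")
    cmd = [
        "python", "tools/infer/text/predict_system.py",
        "--image_dir", image_dir,
        "--det_algorithm", "DB++",
        "--rec_algorithm", "CRNN",
        "--cls_algorithm", "M3",
        "--cls_amp_level", cls_amp_level,
    ]
    if cls_model_dir is not None:
        # Only override the weights when a checkpoint is given explicitly
        cmd += ["--cls_model_dir", cls_model_dir]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` or displayed with `shlex.join(cmd)`. Omitting `cls_model_dir` leaves the default weights in place.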

<center>

|**Algorithm Name**|**Network Name**|**Language**|
| :------: | :------: | :------: |
| M3 | mobilenet_v3 | CH/EN|

</center>

In addition, when `--save_cls_result` is `True` (the default), the text orientation classification results are saved to `{args.crop_res_save_dir}/cls_results.txt`, where `--crop_res_save_dir` is the directory in which results are saved.

For more parameter descriptions and usage information, please refer to `tools/infer/text/config.py`.

## Table Structure Recognition

To run table structure recognition on an input image or multiple images in a directory, please run:
46 changes: 46 additions & 0 deletions tools/infer/text/README_CN.md
@@ -230,6 +230,52 @@ python deploy/eval_utils/eval_pipeline.py --gt_path path/to/gt.txt --pred_path p

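3. SVTR is run in mixed precision mode (amp_level=O2) since it is optimized for O2.

### Text direction classification

If the image contains text that is not upright, a text direction classifier can be used to classify and correct the orientation of the detected text images. To run text direction classification and correction on an input image, run: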
```shell
python tools/infer/text/predict_system.py --image_dir {path_to_img or dir_to_imgs} \
--det_algorithm DB++ \
--rec_algorithm CRNN \
--cls_algorithm M3
```
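The parameter `--cls_algorithm` defaults to None, meaning that text direction classification is not performed; setting `--cls_algorithm` enables text direction classification in the text detection and recognition pipeline. During execution, the classifier determines the orientation of each image obtained from text detection and corrects those that are not upright. Some example results are shown below.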

- Text direction classification

<p align="center">
<img src="https://raw.githubusercontent.com/zhangjunlongtech/Material/refs/heads/main/CRNN_t1.png" width=150 />
</p>
<p align="center">
<em> word_01.png </em>
</p>

<p align="center">
<img src="https://raw.githubusercontent.com/zhangjunlongtech/Material/refs/heads/main/CRNN_t2.png" width=150 />
</p>
<p align="center">
<em> word_02.png </em>
</p>

Classification results:
```text
word_01.png 0 1.0
word_02.png 180 1.0
```
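The currently supported text direction classification network is `mobilenet_v3`, selected by setting `--cls_algorithm` to `M3`; `--cls_amp_level` and `--cls_model_dir` set the automatic mixed precision level and the weight file of the classifier. Default weights are already configured, the default mixed precision level of the network is `O0`, and under the default configuration direction classification supports `0` and `180` degrees. Classification of other orientations will be supported in the future.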

<center>

|**Algorithm Name**|**Network Name**|**Language**|
| :------: | :------: | :------: |
| M3 | mobilenet_v3 | CH/EN|

</center>

In addition, setting `--save_cls_result` to `True` saves the text direction classification results to `{args.crop_res_save_dir}/cls_results.txt`, where `--crop_res_save_dir` is the directory in which results are saved.

For more parameter descriptions and usage information, please refer to `tools/infer/text/config.py`.

## Table Structure Recognition

To run table structure recognition on an input image or a directory containing multiple images, please run:
29 changes: 29 additions & 0 deletions tools/infer/text/config.py
@@ -233,6 +233,35 @@ def create_parser():
        help="Auto Mixed Precision level. This setting only works on GPU and Ascend",
    )

    parser.add_argument(
        "--cls_algorithm",
        type=str,
        default=None,
        choices=["M3"],
        help="classification algorithm. The default is None, "
        "meaning that text orientation classification is not performed",
    )
    parser.add_argument(
        "--cls_amp_level",
        type=str,
        default="O0",
        choices=["O0", "O1", "O2", "O3"],
        help="Auto Mixed Precision level. This setting only works on GPU and Ascend",
    )
    parser.add_argument(
        "--cls_model_dir",
        type=str,
        help="directory containing the classification model checkpoint best.ckpt "
        "or path to a specific checkpoint file.",
    )
    parser.add_argument("--cls_batch_num", type=int, default=8)
    parser.add_argument(
        "--save_cls_result",
        type=str2bool,
        default=True,
        help="whether to save the text direction classification result",
    )

    return parser
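To check how these options resolve when no flags are given, here is a reduced, self-contained parser that mirrors only the classification arguments above. It is a sketch for illustration: `create_cls_parser` is hypothetical, and `str2bool` is a minimal stand-in whose behaviour may differ from the helper that `config.py` actually imports.

```python
import argparse


def str2bool(v: str) -> bool:
    """Minimal stand-in for the str2bool helper used in config.py."""
    if v.lower() in ("true", "t", "yes", "y", "1"):
        return True
    if v.lower() in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"boolean value expected, got {v!r}")


def create_cls_parser() -> argparse.ArgumentParser:
    """Reduced parser holding only the classification flags added in this PR."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--cls_algorithm", type=str, default=None, choices=["M3"])
    parser.add_argument("--cls_amp_level", type=str, default="O0", choices=["O0", "O1", "O2", "O3"])
    parser.add_argument("--cls_model_dir", type=str)  # no default: None unless given
    parser.add_argument("--cls_batch_num", type=int, default=8)
    parser.add_argument("--save_cls_result", type=str2bool, default=True)
    return parser


args = create_cls_parser().parse_args([])
print(args.cls_algorithm, args.cls_amp_level, args.save_cls_result)  # None O0 True
```

Classification stays off until `--cls_algorithm M3` is passed, while result saving is on by default and can be disabled with `--save_cls_result false`.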


5 changes: 5 additions & 0 deletions tools/infer/text/postprocess.py
@@ -105,6 +105,8 @@ def __init__(self, task="det", algo="DB", rec_char_dict_path=None, **kwargs):
                merge_no_span_structure=True,
                box_shape="pad",
            )
        elif task == "cls":
            postproc_cfg = dict(name="ClsPostprocess", label_list=["0", "180"])

        postproc_cfg.update(kwargs)
        self.task = task
@@ -172,3 +174,6 @@ def __call__(self, pred, data=None, **kwargs):
        elif self.task == "layout":
            output = self.postprocess(pred, img_shape=kwargs.get("img_shape"), meta_info=kwargs.get("meta_info"))
            return output
        elif self.task == "cls":
            output = self.postprocess(pred)
            return output
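With `label_list=["0", "180"]`, the `cls` post-processing step maps each row of network scores to an angle label. The pure-Python sketch below illustrates that mapping under stated assumptions: it assumes the network emits raw logits (if it already outputs probabilities, the softmax should be dropped), and the `cls_postprocess` name and its `{"angles", "scores"}` return shape are hypothetical, not the actual `ClsPostprocess` interface.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cls_postprocess(
    pred: Sequence[Sequence[float]],
    label_list: Sequence[str] = ("0", "180"),
) -> Dict[str, list]:
    """Map per-crop class scores to an angle label and its confidence."""
    angles, scores = [], []
    for row in pred:
        probs = softmax(row)
        idx = max(range(len(probs)), key=probs.__getitem__)  # argmax; ties pick the first label
        angles.append(label_list[idx])
        scores.append(probs[idx])
    return {"angles": angles, "scores": scores}
```

For two crops with logits `[[2.0, 0.0], [0.0, 3.0]]`, the predicted angles are `"0"` and `"180"`.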