mindspore-lab · CaitinZhao · Dec 2, 2024 · Nov 15, 2024 · Nov 15, 2024 · Nov 15, 2024
diff --git a/tools/infer/text/README.md b/tools/infer/text/README.md
@@ -248,6 +248,53 @@ Evaluation of the text spotting inference results on Ascend 910 with MindSpore 2
 2. Unless otherwise indicated, all experiments are run with `--det_limit_type`="min" and `--det_limit_side`=720.
 3. SVTR is run in mixed precision mode (amp_level=O2) since it is optimized for O2.
 
+### Text direction classification
+
+If an image contains non-upright text, a text direction classifier can classify and correct the orientation of the text after detection. To run text direction classification and correction on an input image, run:
+```shell
+python tools/infer/text/predict_system.py --image_dir {path_to_img or dir_to_imgs} \
+                                          --det_algorithm DB++  \
+                                          --rec_algorithm CRNN  \
+                                          --cls_algorithm M3
+```
+The default value of `--cls_algorithm` is None, which means that text direction classification is not performed. Setting `--cls_algorithm` enables text direction classification within the text detection and recognition flow. During execution, the text direction classifier classifies the list of images produced by text detection and corrects the direction of the non-upright ones. Here are some example results.
+
+- Text direction classification
+
+<p align="center">
+  <img src="https://raw.githubusercontent.com/zhangjunlongtech/Material/refs/heads/main/CRNN_t1.png" width=150 />
+</p>
+<p align="center">
+  <em> word_01.png </em>
+</p>
+
+<p align="center">
+  <img src="https://raw.githubusercontent.com/zhangjunlongtech/Material/refs/heads/main/CRNN_t2.png" width=150 />
+</p>
+<p align="center">
+  <em> word_02.png </em>
+</p>
+
+Classification results:
+```text
+word_01.png   0     1.0
+word_02.png   180   1.0
+```
+
+The currently supported text direction classification network is `mobilenet_v3`, which is selected by setting `--cls_algorithm` to `M3`. Use `--cls_amp_level` and `--cls_model_dir` to set the classifier's automatic mixed precision level and weight file. A default weight file is already configured, the default mixed precision level of the network is `O0`, and direction classification supports `0` and `180` degrees under the default configuration. Support for other directions is planned.
+
+<center>
+
+  |**Algorithm Name**|**Network Name**|**Language**|
+  | :------: | :------: | :------: |
+  | M3 | mobilenet_v3 | CH/EN|
+
+</center>
+
+In addition, when `--save_cls_result` is `True` (the default in `tools/infer/text/config.py`), the text direction classification results are saved to `{args.crop_res_save_dir}/cls_results.txt`, where `--crop_res_save_dir` is the directory in which results are saved.
+
+For more parameter descriptions and usage information, please refer to `tools/infer/text/config.py`.
+
 ## Table Structure Recognition
 
 To run table structure recognition on an input image or multiple images in a directory, please run:
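
The correction step described in the "Text direction classification" section of `README.md` can be illustrated with a minimal, dependency-free sketch. It assumes one `(angle_label, score)` prediction per detected crop and only the `0`/`180` labels of the default configuration. The function names `rotate_180` and `correct_orientation` are hypothetical and are not taken from `predict_system.py`.

```python
# Illustrative sketch only: images are plain nested lists (rows of pixels),
# so the example runs without any image library.


def rotate_180(image):
    """Rotate a row-major image by 180 degrees (reverse rows and columns)."""
    return [row[::-1] for row in image[::-1]]


def correct_orientation(crops, predictions):
    """Return upright crops, given one (angle_label, score) pair per crop.

    Only the labels "0" and "180" are handled, matching the default
    label list; any other label leaves the crop unchanged.
    """
    corrected = []
    for crop, (angle, _score) in zip(crops, predictions):
        corrected.append(rotate_180(crop) if angle == "180" else crop)
    return corrected


crops = [
    [[1, 2], [3, 4]],  # already upright
    [[4, 3], [2, 1]],  # upside down
]
predictions = [("0", 1.0), ("180", 1.0)]
upright = correct_orientation(crops, predictions)
print(upright)
```

Both crops end up identical here because the second one is the first rotated by 180 degrees.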

diff --git a/tools/infer/text/README_CN.md b/tools/infer/text/README_CN.md
@@ -230,6 +230,52 @@ python deploy/eval_utils/eval_pipeline.py --gt_path path/to/gt.txt --pred_path p
 
 3. SVTR is run in mixed precision mode (amp_level=O2) since it is optimized for O2.
 
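+### Text direction classification
+
+If an image contains non-upright text, a text direction classifier can classify and correct the orientation of the images obtained after detection. To run text direction classification and correction on an input image, run: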
+```shell
+python tools/infer/text/predict_system.py --image_dir {path_to_img or dir_to_imgs} \
+                                          --det_algorithm DB++  \
+                                          --rec_algorithm CRNN  \
+                                          --cls_algorithm M3
+```
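+The default value of `--cls_algorithm` is None, which means that text direction classification is not performed. Setting `--cls_algorithm` enables text direction classification within the text detection and recognition flow. During execution, the text direction classifier classifies the list of images produced by text detection and corrects the direction of the non-upright ones. Here are some example results.
+
+- Text direction classification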
+
+<p align="center">
+  <img src="https://raw.githubusercontent.com/zhangjunlongtech/Material/refs/heads/main/CRNN_t1.png" width=150 />
+</p>
+<p align="center">
+  <em> word_01.png </em>
+</p>
+
+<p align="center">
+  <img src="https://raw.githubusercontent.com/zhangjunlongtech/Material/refs/heads/main/CRNN_t2.png" width=150 />
+</p>
+<p align="center">
+  <em> word_02.png </em>
+</p>
+
+Classification results:
+```text
+word_01.png   0     1.0
+word_02.png   180   1.0
+```
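+The currently supported text direction classification network is `mobilenet_v3`, which is selected by setting `--cls_algorithm` to `M3`. Use `--cls_amp_level` and `--cls_model_dir` to set the classifier's automatic mixed precision level and weight file. A default weight file is already configured, the default mixed precision level of the network is `O0`, and direction classification supports `0` and `180` degrees under the default configuration. Support for other directions is planned.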
+
+<center>
+
+  |**Algorithm Name**|**Network Name**|**Language**|
+  | :------: | :------: | :------: |
+  | M3 | mobilenet_v3 | CH/EN|
+
+</center>
+
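+In addition, setting `--save_cls_result` to `True` saves the text direction classification results to `{args.crop_res_save_dir}/cls_results.txt`, where `--crop_res_save_dir` is the directory in which results are saved.
+
+For more parameter descriptions and usage information, please refer to `tools/infer/text/config.py`.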
+
 ## Table Structure Recognition
 
 To run table structure recognition on an input image or multiple images in a directory, please run:
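
Saving the classification results to `cls_results.txt` could look roughly like the sketch below. The tab-separated `name, angle, score` layout is an assumption based on the example output in the README; the diff does not show the actual writer. The helper `save_cls_results` is hypothetical.

```python
import os
import tempfile


def save_cls_results(results, save_dir, filename="cls_results.txt"):
    """Write one 'name<TAB>angle<TAB>score' line per image; return the path.

    The column layout is assumed for illustration, not taken from the repo.
    """
    os.makedirs(save_dir, exist_ok=True)
    path = os.path.join(save_dir, filename)
    with open(path, "w", encoding="utf-8") as f:
        for name, angle, score in results:
            f.write(f"{name}\t{angle}\t{score}\n")
    return path


results = [("word_01.png", "0", 1.0), ("word_02.png", "180", 1.0)]

# A temporary directory stands in for the --crop_res_save_dir location.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = save_cls_results(results, tmp_dir)
    with open(out_path, encoding="utf-8") as f:
        saved_lines = f.read().splitlines()

print(saved_lines)
```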

diff --git a/tools/infer/text/config.py b/tools/infer/text/config.py
@@ -233,6 +233,35 @@ def create_parser():
         help="Auto Mixed Precision level. This setting only works on GPU and Ascend",
     )
 
+    parser.add_argument(
+        "--cls_algorithm",
+        type=str,
+        default=None,
+        choices=["M3"],
+        help="classification algorithm. The default is None, "
+        "meaning that text orientation classification is not performed",
+    )
+    parser.add_argument(
+        "--cls_amp_level",
+        type=str,
+        default="O0",
+        choices=["O0", "O1", "O2", "O3"],
+        help="Auto Mixed Precision level. This setting only works on GPU and Ascend",
+    )
+    parser.add_argument(
+        "--cls_model_dir",
+        type=str,
+        help="directory containing the classification model checkpoint best.ckpt "
+        "or path to a specific checkpoint file.",
+    )
+    parser.add_argument("--cls_batch_num", type=int, default=8)
+    parser.add_argument(
+        "--save_cls_result",
+        type=str2bool,
+        default=True,
+        help="whether to save the text direction classification result",
+    )
+
     return parser
 
 

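The new classifier options in `config.py` can be exercised in isolation with a small standalone parser. This is a sketch that mirrors the flags and defaults added in the hunk above. The local `str2bool` is a stand-in whose accepted spellings are assumed (the repo's own helper is not shown in this diff), and `build_cls_parser` is a hypothetical name.

```python
import argparse


def str2bool(value):
    """Local stand-in for the repo's str2bool helper (behavior assumed)."""
    if isinstance(value, bool):
        return value
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_cls_parser():
    """Build a parser containing only the classifier-related options."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--cls_algorithm", type=str, default=None, choices=["M3"])
    parser.add_argument(
        "--cls_amp_level", type=str, default="O0", choices=["O0", "O1", "O2", "O3"]
    )
    parser.add_argument("--cls_model_dir", type=str)
    parser.add_argument("--cls_batch_num", type=int, default=8)
    parser.add_argument("--save_cls_result", type=str2bool, default=True)
    return parser


parser = build_cls_parser()

# With no flags, classification is disabled (cls_algorithm is None).
defaults = parser.parse_args([])
# Enabling the classifier and turning off result saving.
enabled = parser.parse_args(["--cls_algorithm", "M3", "--save_cls_result", "False"])

print(defaults.cls_algorithm, defaults.cls_amp_level, defaults.save_cls_result)
print(enabled.cls_algorithm, enabled.save_cls_result)
```

Because `--cls_algorithm` defaults to `None`, a caller can skip the classification stage whenever the option is omitted.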
diff --git a/tools/infer/text/postprocess.py b/tools/infer/text/postprocess.py
@@ -105,6 +105,8 @@ def __init__(self, task="det", algo="DB", rec_char_dict_path=None, **kwargs):
                 merge_no_span_structure=True,
                 box_shape="pad",
             )
+        elif task == "cls":
+            postproc_cfg = dict(name="ClsPostprocess", label_list=["0", "180"])
 
         postproc_cfg.update(kwargs)
         self.task = task
@@ -172,3 +174,6 @@ def __call__(self, pred, data=None, **kwargs):
         elif self.task == "layout":
             output = self.postprocess(pred, img_shape=kwargs.get("img_shape"), meta_info=kwargs.get("meta_info"))
             return output
+        elif self.task == "cls":
+            output = self.postprocess(pred)
+            return output
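
The diff configures `ClsPostprocess` with `label_list=["0", "180"]` but does not show its implementation. A common way to turn classifier outputs into `(label, score)` pairs is an argmax over per-class probabilities, sketched below. The function `cls_postprocess` is hypothetical and may differ from what `ClsPostprocess` actually does.

```python
def cls_postprocess(probabilities, label_list=("0", "180")):
    """Map each row of class probabilities to a (label, score) pair.

    Assumes one probability per entry in `label_list`, in the same order.
    """
    results = []
    for row in probabilities:
        # Index of the most probable class for this image.
        best = max(range(len(row)), key=row.__getitem__)
        results.append((label_list[best], row[best]))
    return results


batch = [[0.98, 0.02], [0.1, 0.9]]
decoded = cls_postprocess(batch)
print(decoded)
```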