diff --git a/examples/license_plate_detection_and_recognition/README.md b/examples/license_plate_detection_and_recognition/README.md index f2cd57c77..748ee7060 100644 --- a/examples/license_plate_detection_and_recognition/README.md +++ b/examples/license_plate_detection_and_recognition/README.md @@ -4,9 +4,9 @@ English | [中文](./README_CN.md) # Dataset processing -## Dataset introduction +## Introduction to CCPD -Due to the lack of publicly available large and diverse datasets, most current license plate detection and recognition methods are evaluated on small and often unrepresentative datasets. This paper propose a large and comprehensive license plate dataset, CCPD, where all images are manually captured and carefully annotated by workers from a roadside parking management company. CCPD is the largest publicly available license plate dataset to date, with more than 250,000 unique car images, and the only dataset that provides vertex position annotations. +Due to the lack of publicly available large and diverse datasets, most current license plate detection and recognition methods are evaluated on small and often unrepresentative datasets. This paper proposes a large and comprehensive license plate dataset, CCPD (Chinese City Parking Dataset), all images of which are manually captured and carefully annotated by workers from a roadside parking management company. CCPD is the largest publicly available license plate dataset to date, with more than 250,000 unique car images, and the only dataset that provides vertex position annotations. 
Paper: [Towards End-to-End License Plate Detection and Recognition: A Large Dataset and Baseline](https://openaccess.thecvf.com/content_ECCV_2018/papers/Zhenbo_Xu_Towards_End-to-End_License_ECCV_2018_paper.pdf) @@ -14,15 +14,13 @@ Code repository: [https://github.com/detectRecog/CCPD](https://github.com/detect ## Dataset download -Download the dataset following the instructions on the [CCPD official project website](https://github.com/detectRecog/CCPD): - -Unzip the dataset into the CCPD_Tutorial/datasets directory: - -```txt -Decompression Command: +Download the dataset following the instructions on the [CCPD official project website](https://github.com/detectRecog/CCPD), then unzip the dataset into the CCPD_Tutorial/datasets directory: +```shell tar xf CCPD2019.tar.xz +``` -Directory Structure: +The directory structure is as follows: +```text CCPD_Tutorial └── datasets └── CCPD2019 # Number of Images Description @@ -44,19 +42,19 @@ CCPD_Tutorial The CCPD dataset does not have a dedicated annotation file. The file name of each image is the corresponding data label. -For example: 025-95_113-154&383_386&473-386&473_177&454_154&383_363&402-0_0_22_27_27_33_16-37-15.jpg is divided into seven parts by the separator '-': +For example, `025-95_113-154&383_386&473-386&473_177&454_154&383_363&402-0_0_22_27_27_33_16-37-15.jpg` is divided into seven parts by the separator '-': -* **Area: ​**The ratio of the license plate area to the entire image area. 025 means 2.5%. -* **Tilt: ​**Horizontal tilt and vertical tilt. 95_113 corresponds to two angles, horizontal 95° and vertical 113°. -* **Bounding box coordinates:** The coordinates of the upper left and lower right vertices. 154&383_386&473 correspond to the bounding box coordinates: upper left (154, 383), lower right (386, 473). -* **Four vertex positions:** The exact (x, y) coordinates of the four vertices of the LP in the entire image. These coordinates start from the vertex in the lower right corner. 
386&473_177&454_154&383_363&402 correspond to the coordinates of the four corner points. -* **License plate number:** There is only one LP for each image in CCPD. Each LP number consists of a Chinese character, a letter, and five letters or numbers. A valid Chinese license plate consists of 7 characters: province (1 character), letter (1 character), letter + number (5 characters). "0_0_22_27_27_33_16" is the index of each character. The three arrays are defined as follows. The last character of each array is the letter O, not the number 0. We use O as a sign of "no character" because there is no O in the Chinese license plate characters. -* **Brightness: ​**The brightness of the license plate area. 37 indicates brightness. -* **Blur:** The blurriness of the license plate area. 15 indicates blurriness. +* **Area:** The ratio of the license plate area to the entire image area. `025` means 2.5%. +* **Tilt:** Horizontal tilt and vertical tilt. `95_113` corresponds to two angles, horizontal 95° and vertical 113°. +* **Bounding box coordinates:** The coordinates of the upper left and lower right vertices. `154&383_386&473` corresponds to the bounding box coordinates: upper left (154, 383), lower right (386, 473). +* **Four vertex positions:** The exact (x, y) coordinates of the four vertices of the LP (License Plate) in the entire image. These coordinates start from the vertex in the lower right corner. `386&473_177&454_154&383_363&402` corresponds to the coordinates of the four corner points. +* **License plate number:** There is only one LP in each image of CCPD. Each LP number consists of a Chinese character, a letter, and five letters or numbers. A valid Chinese license plate consists of 7 characters: province (1 character), letter (1 character), letter + number (5 characters). "0_0_22_27_27_33_16" is the index of each character. The three arrays are defined as follows. The last character of each array is the letter O, not the number 0. 
We use O as a sign of "no character" because there is no O in the Chinese license plate characters. +* **Brightness:** The brightness of the license plate area. `37` indicates brightness. +* **Blur:** The blurriness of the license plate area. `15` indicates blurriness. ## Map license plate character to array -```txt +```python provinces = ["皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑", "苏", "浙", "京", "闽", "赣", "鲁", "豫", "鄂", "湘", "粤", "桂", "琼", "川", "贵", "云", "藏", "陕", "甘", "青", "宁", "新", "警", "学", "O"] alphabets = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'O'] @@ -73,21 +71,22 @@ Split the ccpd_base dataset into training, testing, and validation datasets acco ## Requirements ### Ascend + |mindspore|ascend driver|firmware|cann toolkit/kernel| | :---------: | :-------------: | :-----------: | :-------------------: | |2.2.14|23.0.3|7.1.0.5.220|7.0.0.beta1| ### GPU -|mindspore|gpu driver|cuda version|firmware| -| :---------: | :----------: | :------------: | :----------------: | -|2.2.14|535.183.06|cuda11.6|RTX 4090| +|mindspore|gpu driver|cuda version| gpu type | +| :---------: | :----------: | :------------: |:----------------:| +|2.2.14|535.183.06|cuda11.6| GeForce RTX 4090 | ## Installation steps ### Installation environment dependencies -1. Creating a Virtual Environment with Conda +1. Create a Python virtual environment with Conda ```shell conda create -n mindspore2.2.14_mindocr python=3.9 @@ -97,13 +96,13 @@ conda create -n mindspore2.2.14_mindocr python=3.9 According to the guidelines on the [MindSpore official website](https://www.mindspore.cn/install/), install MindSpore version 2.2.14 along with the corresponding GPU or Ascend AI processor software packages. -3. [Install openmpi 4.0.3](https://www.open-mpi.org/software/ompi/v4.0/) (For distributed training and evaluation, if distributed training is not required, you can skip it) +3. 
[Install Open MPI 4.0.3](https://www.open-mpi.org/software/ompi/v4.0/) (For distributed training and evaluation, you can skip it if distributed training is not required) -Find version 4.0.3 on the openmpi official website, download the tar.gz file and unzip it to the project-related folder: +Download Open MPI v4.0.3 from the official website, and then unzip the tar.gz file to the project-related folder: ​![image](pic/install_openmpi.png)​ -Unzip the Openmpi source package: +Unzip the Open MPI source package: ```shell tar -xvf openmpi-4.0.3.tar.gz @@ -122,9 +121,15 @@ Configure environment variables: ```shell vim /etc/profile +``` + +```text ## OpenMPI ## export PATH=$PATH:/installation_directory/openmpi/bin export LD_LIBRARY_PATH=/installation_directory/openmpi/lib +``` + +```shell source /etc/profile ``` @@ -155,7 +160,7 @@ pip install -r requirements.txt pip install -e . ``` -# Training DBNet model for text detection +# Training [DBNet](https://github.com/mindspore-lab/mindocr/blob/main/configs/det/dbnet/README.md) model for text detection ## Dataset preparation @@ -223,7 +228,7 @@ dataset: 5. Change the value of IOU from 0.5(default) to 0.7. -Location of code:./mindocr/metrics/det_metrics.py 33 +Location of code:./mindocr/metrics/det_metrics.py:L33 ```python ... 
@@ -236,7 +241,7 @@ def __init__(self, min_iou: float = 0.7, min_intersect: float = 0.5): ## Train ```shell -# Single Ascend/GPU Training (May Cause Memory Issues) +# Single Ascend/GPU Training (May Fail Due to Insufficient GPU/NPU On-Device Memory) python tools/train.py --config configs/det/dbnet/db_r50_ccpd.yaml --device_target Ascend/GPU # Multi-Ascend/GPU Training (Requires Correct OpenMPI Installation and Root Privileges) mpirun --allow-run-as-root -n 2 python tools/train.py --config configs/det/dbnet/db_r50_ccpd.yaml --device_target Ascend/GPU @@ -258,8 +263,6 @@ Validation set test results: ## Inference -### Inference command - ```shell python tools/infer/text/predict_det.py --image_dir path/to/image or path/to/image_dir \ --det_algorithm DB \ @@ -271,7 +274,7 @@ python tools/infer/text/predict_det.py --image_dir path/to/image or path/to/ima ​![1_det_res](pic/det.png)​ -# Training [SVTR model](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/README_CN.md) for text recognition +# Training [SVTR](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/README.md) model for text recognition ## [Dataset processing](https://github.com/mindspore-lab/mindocr/blob/main/docs/zh/tutorials/training_recognition_custom_dataset.md) @@ -288,7 +291,7 @@ Please place all training images in the same folder and specify a text file in t **Note:** Use `\tab`​ as the separator between the image name and label, avoiding spaces or other separators. -The final structure of the training dataset will be as follows: +The final directory structure of the training dataset will be as follows: ```txt |-data @@ -304,9 +307,9 @@ The preparation method for the testing and validation datasets is the same. ## Dictionary preparation -Run the code in `generate_dict.py`​ with the following character set to generate the dictionary `ccpd_dict.txt`​, and place it in the `mindocr/utils/dict`​ directory. 
+Run the code in `generate_dict.py`​ with the following character set to generate the dictionary file `ccpd_dict.txt`​. Then place the dictionary file in the `mindocr/utils/dict`​ directory. -```txt +```python provinces = ["皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑", "苏", "浙", "京", "闽", "赣", "鲁", "豫", "鄂", "湘", "粤", "桂", "琼", "川", "贵", "云", "藏", "陕", "甘", "青", "宁", "新", "警", "学", "O"] alphabets = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'O'] @@ -314,7 +317,7 @@ ads = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q' 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'O'] ``` -## Configuration file preparation(Refer to the complete configuration file in `svtr_ccpd.yaml`​) +## Prepare the configuration file (refer to the complete configuration file in `svtr_ccpd.yaml`​) 1. Copy the file `mindocr/configs/rec/svtr/svtr_tiny_ch.yaml`​ to a new file. 2. Modify the following parameters in the new configuration file: @@ -349,7 +352,7 @@ eval: ... ``` -4. Add `lower: false`​ to the metric​ section​: +4. Set `lower` under the `metric` section to `false`: ```yaml metric: @@ -377,11 +380,11 @@ metric: python tools/train.py --config configs/rec/svtr/svtr_tiny_ccpd.yaml --device_target Ascend/GPU ``` -### Training strategy +### Modify training configurations 1. **Modify the Configuration File**: Change `loss`​ section's `pred_seq_len`​ to 10. -```java +```text valid res: [2024-09-10 15:16:38] mindocr.metrics.rec_metrics INFO - correct num: 23, total num: 99996.0 [2024-09-10 15:16:38] mindocr.eval INFO - Performance: {'acc': 0.00023000920191407204, 'norm_edit_distance': 0.5451045036315918} @@ -389,7 +392,7 @@ valid res: 2. **Adjust image_shape**​: Set the 'img_size' of the 'model' section to [32, 80]. 
-```java +```text valid res: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1923/1923 [01:40<00:00, 19.07it/s] [2024-09-10 19:14:02] mindocr.metrics.rec_metrics INFO - correct num: 6940, total num: 99996.0 @@ -398,7 +401,7 @@ valid res: 3. **Resize Strategy**: `Resize`​ all text images to `32 * 100`​ without considering the aspect ratio and without padding; set `max_text_length`​ to 25. -```java +```text valid res: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1923/1923 [01:59<00:00, 16.05it/s] [2024-09-10 19:16:59] mindocr.metrics.rec_metrics INFO - correct num: 98681, total num: 99996.0 @@ -407,7 +410,7 @@ valid res: 4. **Modify the Base YAML File**: Change to `svtr_tiny.yaml`​ and add the STN module. -```java +```text valid res: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1923/1923 [05:02<00:00, 6.36it/s] [2024-09-10 23:01:26] mindocr.metrics.rec_metrics INFO - correct num: 97956, total num: 99996.0 @@ -416,7 +419,7 @@ valid res: 5. **Increase the intensity of data augmentation**​: Set the `aug_type`​ in the 'SVTRRecAug' section of the configuration file to 1​. -```java +```text valid res: 100%|████████████████████████████████████████████████████████████████████████████████████████| 1923/1923 [05:55<00:00, 5.42it/s] [2024-09-11 17:08:48] mindocr.metrics.rec_metrics INFO - correct num: 96064, total num: 99996.0 @@ -425,7 +428,7 @@ valid res: 6. **Increase the intensity of data augmentation**: Set `deterioration_p`​ and `colorjitter_p`​ to 0.5 in the `SVTRRecAug` section of the configuration file​. 
-```java +```text valid res: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 1923/1923 [05:40<00:00, 5.65it/s] [2024-09-11 20:12:32] mindocr.metrics.rec_metrics INFO - correct num: 97973, total num: 99996.0 @@ -444,13 +447,9 @@ python tools/eval.py --config configs/rec/svtr/svtr_tiny_ccpd.yaml --device_targ ## Inference -### Code modification - -Modify the file `/mindocr/tools/infer/text/predict_rec.py`​: +### Modify inference code -1. Locate the `algo_to_model_name`​ mapping. - -2. Change the module corresponding to SVTR to `svtr_ccpd`​. +Locate the `algo_to_model_name` dict in the file `/mindocr/tools/infer/text/predict_rec.py`, and then change the module corresponding to `SVTR` to `svtr_ccpd`. ```python algo_to_model_name = { @@ -525,7 +524,7 @@ def svtr_ccpd(pretrained=False, **kwargs): return model ``` -### Inference command +### Execute inference command ```shell python tools/infer/text/predict_rec.py --image_dir path/to/image_path \ @@ -546,9 +545,7 @@ python tools/infer/text/predict_rec.py --image_dir path/to/image_path \ ​![image](pic/rec_res.png)​ -# Joint inference of DBNet and SVTR - -**Inference Commands: ​** +# DBNet and SVTR joint inference ```shell python tools/infer/text/predict_system.py --image_dir path/to/image_path or image_dir \ @@ -565,20 +562,20 @@ python tools/infer/text/predict_system.py --image_dir path/to/image_path or ima ​![image](pic/det_rec_res.png)​ -Visualizing Results: +**Visualized Result**: ​![1_res](pic/det_res.png)​ # Performance -Experiments are tested on ascend 910* with mindspore 2.2.14 graph mode : +Test results on Ascend 910* with MindSpore 2.2.14 graph mode: |model name|cards|batch size|resolution|jit level|graph compile|s/step|img/s| | :----------: | :-----: | :----------: | :----------: | :---------: | :-------------: | :------: | :------: | |dbnet|1|16|640x640|O0|43.50s|0.26|61.59| |svtr|1|256|64x256|O2|202.20s|0.77|331.70| -Experiments are tested on 
GeForce RTX 4090 with mindspore 2.2.14 graph mode : +Test results on GeForce RTX 4090 with MindSpore 2.2.14 graph mode : |model name|cards|batch size|resolution|jit level|graph compile|s/step|img/s| | :----------: | :-----: | :----------: | :----------: | :---------: | :-------------: | :------: | :------: | diff --git a/examples/license_plate_detection_and_recognition/README_CN.md b/examples/license_plate_detection_and_recognition/README_CN.md index 70a7336b0..38347063d 100644 --- a/examples/license_plate_detection_and_recognition/README_CN.md +++ b/examples/license_plate_detection_and_recognition/README_CN.md @@ -4,9 +4,9 @@ # 数据集处理 -## 数据集介绍 +## CCPD数据集介绍 -由于没有公开可用的大型多样化数据集,当前大多数的车牌检测和识别方法都是在一些小且通常不具代表性的数据集进行评估。本文提出了一种大型且全面的车牌数据集CCPD,该数据集的所有图像都是由路边停车管理公司的工人手工拍摄并仔细标注的。CCPD是迄今为止最大的公开可用车牌数据集,拥有超过25万张独特的汽车图像,并且是唯一提供顶点位置标注的数据集。本文基于CCPD提出了一种新颖的网络模型,可以同时以高速和高精度预测边界框并识别相应的车牌号码。 +由于没有公开可用的大型多样化数据集,当前大多数的车牌检测和识别方法都是在一些小且通常不具代表性的数据集进行评估。本文提出了一种大型且全面的车牌数据集CCPD (Chinese City Parking Dataset),该数据集的所有图像都是由路边停车管理公司的工人手工拍摄并仔细标注的。CCPD是迄今为止最大的公开可用车牌数据集,拥有超过25万张独特的汽车图像,并且是唯一提供顶点位置标注的数据集。本文基于CCPD提出了一种新颖的网络模型,可以同时以高速和高精度预测边界框并识别相应的车牌号码。 论文:[Towards End-to-End License Plate Detection and Recognition: A Large Dataset and Baseline](https://openaccess.thecvf.com/content_ECCV_2018/papers/Zhenbo_Xu_Towards_End-to-End_License_ECCV_2018_paper.pdf) @@ -14,14 +14,13 @@ ## 数据集下载 -按照[CCPD官方项目网址](https://github.com/detectRecog/CCPD)的指引,下载数据集: - -解压数据集到CCPD_Tutorial/datasets目录下: - -```txt -解压命令: +按照[CCPD官方项目](https://github.com/detectRecog/CCPD)网页内的指引,下载数据集。然后,解压数据集到CCPD_Tutorial/datasets目录下: +```shell tar xf CCPD2019.tar.xz -目录结构: +``` + +解压后的目录结构: +```text CCPD_Tutorial └── datasets └── CCPD2019 # 图片数 说明 @@ -43,19 +42,19 @@ CCPD_Tutorial CCPD数据集没有专门的标注文件,每张图像的文件名就是对应的数据标注(label)。 -例如 : 025-95_113-154&383_386&473-386&473_177&454_154&383_363&402-0_0_22_27_27_33_16-37-15.jpg由分隔符'-'分为七个部分: +例如:`025-95_113-154&383_386&473-386&473_177&454_154&383_363&402-0_0_22_27_27_33_16-37-15.jpg`由分隔符'-'分为七个部分: -1. 
面积 : 车牌面积与整个画面面积的面积比。025表示占比2.5%。 -2. 倾斜度 : 水平倾斜度和垂直倾斜度。95_113 对应两个角度, 水平95°, 竖直113°。 -3. 边界框坐标 : 左上和右下顶点的坐标。154&383_386&473对应边界框坐标:左上(154, 383), 右下(386, 473)。 -4. 四个顶点位置 : LP的四个顶点在整幅图像中的确切(x, y)坐标。这些坐标从右下角的顶点开始。386&473_177&454_154&383_363&402对应四个角点坐标。 -5. 车牌号码 : CCPD中的每个图像只有一个LP。每个LP号码由一个汉字、一个字母和五个字母或数字组成。有效的中国车牌由省份(1个字符)、字母(1个字符)、字母+数字(5个字符)7个字符组成。"0_0_22_27_27_33_16"是每个字符的索引。这三个数组的定义如下。每个数组的最后一个字符是字母O,而不是数字0。我们用O作为“无字符”的标志,因为中国车牌字符中没有O。 -6. 亮度:车牌区域的亮度。37表示亮度。 -7. 模糊性:车牌区域的模糊性。15表示模糊度。 +1. 面积:车牌面积与整个画面面积的面积比。`025`表示占比2.5%。 +2. 倾斜度:水平倾斜度和垂直倾斜度。`95_113` 对应两个角度, 水平95°, 竖直113°。 +3. 边界框坐标:左上和右下顶点的坐标。`154&383_386&473`对应边界框坐标,即左上(154, 383)、右下(386, 473)。 +4. 四个顶点位置:车牌区域的四个顶点在整幅图像中的确切(x, y)坐标。这些坐标从右下角的顶点开始。`386&473_177&454_154&383_363&402`对应四个角点坐标。 +5. 车牌号码:CCPD的每个图像样本中只有一个车牌。每个车牌号码由一个汉字、一个字母和五个字母或数字组成。有效的中国车牌由省份(1个字符)、字母(1个字符)、字母+数字(5个字符)7个字符组成。`0_0_22_27_27_33_16`是每个字符的索引。这三个数组的定义如下。每个数组的最后一个字符是字母O,而不是数字0。我们用O作为“无字符”的标志,因为中国车牌字符中没有O。 +6. 亮度:车牌区域的亮度。`37`表示亮度。 +7. 模糊性:车牌区域的模糊性。`15`表示模糊度。 ### 车牌字符映射数组 -```txt +```python provinces = ["皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑", "苏", "浙", "京", "闽", "赣", "鲁", "豫", "鄂", "湘", "粤", "桂", "琼", "川", "贵", "云", "藏", "陕", "甘", "青", "宁", "新", "警", "学", "O"] alphabets = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'O'] @@ -65,7 +64,7 @@ ads = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q' ## 数据集分割 -根据spilt文件夹中的train.txt、test.txt和val.txt对ccpd_base数据集进行分割,分割出训练数据集、测试数据集和验证数据集。分割代码见spilt.py。 +根据spilt文件夹中的`train.txt`、`test.txt`和`val.txt`,将ccpd_base数据集分割成训练数据集、测试数据集和验证数据集。分割代码参考`spilt.py`。 # [MindOCR环境安装](https://github.com/mindspore-lab/mindocr) @@ -79,17 +78,17 @@ ads = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q' ### GPU -|mindspore|gpu driver|cuda version|firmware| -| :---------: | :----------: | :------------: | :----------------: | -|2.2.14|535.183.06|cuda11.6| RTX 4090| +|mindspore|gpu driver|cuda 
version| gpu type | +| :---------: | :----------: | :------------: |:--------:| +|2.2.14|535.183.06|cuda11.6| GeForce RTX 4090 | ## 安装步骤 ### 安装环境依赖 -1. conda创建虚拟环境 +1. conda创建Python虚拟环境: -```txt +```shell conda create -n mindspore2.2.14_mindocr python=3.9 ``` @@ -97,21 +96,21 @@ conda create -n mindspore2.2.14_mindocr python=3.9 按照[MindSpore官网](https://www.mindspore.cn/install/)指引,安装MindSpore 2.2.14版本及其配套的GPU或昇腾AI处理器配套软件包。 -3. [安装openmpi 4.0.3](https://www.open-mpi.org/software/ompi/v4.0/) (for distributed training/evaluation)【为了分布式的训练和评估,如不需要分布式训练,可跳过】 +3. [安装Open MPI v4.0.3](https://www.open-mpi.org/software/ompi/v4.0/) (for distributed training/evaluation)【用于模型分布式训练和评估,如不需要分布式训练,可跳过】 -在openmpi下载官网找到4.0.3版本,下载tar.gz文件解压到项目相关文件夹 +从Open MPI官网下载v4.0.3版本的tar.gz文件,并将其解压到项目相关文件夹 ​![image](pic/install_openmpi.png)​ -解压Openmpi源码包 +解压Open MPI源码包 -```text +```shell tar -xvf openmpi-4.0.3.tar.gz ``` -安装OpenMPI、进入源码根目录,运行配置文件,开始安装: +进入源码根目录,运行配置文件执行Open MPI安装: -```text +```shell cd openmpi-4.0.0/ ./configure --prefix=/安装目录/openmpi make @@ -120,7 +119,7 @@ make install 配置环境变量 -```text +```shell vim /etc/profile ``` @@ -130,13 +129,13 @@ export PATH=$PATH:/安装目录/openmpi/bin export LD_LIBRARY_PAHT=/安装目录/openmpii/lib ``` -```text +```shell source /etc/profile ``` 测试 -```text +```shell cd /安装目录/openmpi/examples make ./hello_c @@ -153,7 +152,7 @@ make |0.3|2.2.10| |0.1|1.8| -```txt +```shell git clone https://github.com/mindspore-lab/mindocr.git git checkout v0.3.2 cd mindocr @@ -161,14 +160,15 @@ pip install -r requirements.txt pip install -e . ``` -# 训练DBNet模型做文本检测 +# 训练[DBNet](https://github.com/mindspore-lab/mindocr/blob/main/configs/det/dbnet/README_CN.md)文本检测模型 ## 数据集准备 -1. 将ccpd_train/ccpd_test/ccpd_val数据集分别放到train/test/val下的images中 +1. 将ccpd_train、ccpd_test、ccpd_val数据集分别置于train、test、val路径下的images文件夹中 + 2. 
运行[mindocr提供的脚本](https://github.com/mindspore-lab/mindocr/blob/main/docs/zh/datasets/ccpd.md)转换数据标注格式 -```txt +```shell python tools/dataset_converters/convert.py \ --dataset_name ccpd --task det \ --image_dir path/to/CCPD2019/ccpd_base \ --label_dir path/to/CCPD2019/splits/train.txt \ --output_path path/to/CCPD2019/det_gt.txt @@ -204,7 +204,7 @@ python tools/dataset_converters/convert.py \ 1. 在mindocr/configs/det/dbnet下创建db_r50_ccpd.yaml文件 2. 复制db_r50_ctw1500.ymal文件的内容到db_r50_ccpd.yaml文件 -3. 修改`postprocess`​下的`box_type`​和`box_thresh`​分别为`quad`​和`0.7`​ +3. 将`postprocess`​下的`box_type`​和`box_thresh`​分别修改为`quad`​和`0.7`​ ```yaml postprocess: @@ -228,7 +228,7 @@ dataset: 5. 默认测试的IOU为0.5,修改为0.7 -代码位置:./mindocr/metrics/det_metrics.py 33 +代码位置:./mindocr/metrics/det_metrics.py:L33 ```python ... @@ -240,8 +240,8 @@ def __init__(self, min_iou: float = 0.7, min_intersect: float = 0.5): ## 训练 -```txt -# 单卡训练,容易爆内存 +```shell +# 单卡训练,可能因GPU或NPU片上内存不足而失败 python tools/train.py --config configs/det/dbnet/db_r50_ccpd.yaml --device_target Ascend/GPU # 多卡训练,需要正确安装opemmpi和使用root权限 mpirun --allow-run-as-root -n 2 python tools/train.py --config configs/det/dbnet/db_r50_ccpd.yaml --device_target Ascend/GPU @@ -249,7 +249,7 @@ mpirun --allow-run-as-root -n 2 python tools/train.py --config configs/det/dbnet ## 测试 -```txt +```shell python tools/eval.py -c=configs/det/dbnet/db_r50_ccpd.yaml --device_target Ascend/GPU ``` @@ -263,8 +263,6 @@ python tools/eval.py -c=configs/det/dbnet/db_r50_ccpd.yaml --device_target Ascen ## 推理 -### 推理命令 - ```shell python tools/infer/text/predict_det.py --image_dir path/to/image or path/to/image_dir \ --det_algorithm DB \ @@ -276,7 +274,7 @@ python tools/infer/text/predict_det.py --image_dir path/to/image or path/to/ima ​![1_det_res](pic/det.png)​ -# 训练[SVTR模型](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/README_CN.md)做文本识别 +# 训练[SVTR](https://github.com/mindspore-lab/mindocr/blob/main/configs/rec/svtr/README_CN.md)文本识别模型 ## 
[数据集处理](https://github.com/mindspore-lab/mindocr/blob/main/docs/zh/tutorials/training_recognition_custom_dataset.md) @@ -293,7 +291,7 @@ python tools/infer/text/predict_det.py --image_dir path/to/image or path/to/ima *注意*:请将图片名和标签以 \tab 作为分隔,避免使用空格或其他分隔符。 -最终训练集存放会是以下形式: +最终训练集将以以下形式存放: ``` |-data @@ -305,13 +303,13 @@ python tools/infer/text/predict_det.py --image_dir path/to/image or path/to/ima | ... ``` -测试集和验证集的准备方式同理。 +测试集和验证集的准备采用类似方式。 ## 字典准备 -根据以下字符集运行代码generate_dict.py生成字典ccpd_dict.txt,并将其放到mindocr/utils/dict目录下。 +根据以下字符集运行代码generate_dict.py,生成字典文件ccpd_dict.txt,并将其置于mindocr/utils/dict目录下。 -```txt +```python provinces = ["皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑", "苏", "浙", "京", "闽", "赣", "鲁", "豫", "鄂", "湘", "粤", "桂", "琼", "川", "贵", "云", "藏", "陕", "甘", "青", "宁", "新", "警", "学", "O"] alphabets = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'O'] @@ -319,7 +317,7 @@ ads = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q' 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'O'] ``` -## 配置文件准备(完整配置文件见svtr_ccpd.yaml) +## 准备配置文件(完整配置文件见svtr_ccpd.yaml) 1. 复制一份mindocr/configs/rec/svtr/svtr_tiny_ch.yaml文件,并对其进行修改 2. 修改字典配置`character_dict_path`​和字符种类数量`num_classes`​,以及最大字符长度`max_text_len`​的值,如下所示: @@ -354,7 +352,7 @@ eval: ... ``` -4. ​`metric`​中添加`lower`​为`false`​ +4. ​`metric`​中设置`lower`​为`false`​ ```yaml metric: @@ -378,15 +376,15 @@ metric: ## 训练 -```txt +```shell python tools/train.py --config configs/rec/svtr/svtr_tiny_ccpd.yaml --device_target Ascend/GPU ``` -### 训练策略 +### 修改训练配置 1. 修改配置文件`loss`​部分的`pred_seq_len`​为10 -```java +```text valid res: [2024-09-10 15:16:38] mindocr.metrics.rec_metrics INFO - correct num: 23, total num: 99996.0 [2024-09-10 15:16:38] mindocr.eval INFO - Performance: {'acc': 0.00023000920191407204, 'norm_edit_distance': 0.5451045036315918} @@ -394,16 +392,16 @@ valid res: 3. 
修改配置文件`model`部分的`img_size`为[32,80] -```java +```text valid res: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1923/1923 [01:40<00:00, 19.07it/s] [2024-09-10 19:14:02] mindocr.metrics.rec_metrics INFO - correct num: 6940, total num: 99996.0 [2024-09-10 19:14:02] mindocr.eval INFO - Performance: {'acc': 0.069402776658535, 'norm_edit_distance': 0.765773355960846} ``` -4. Resize策略: 直接将所有文本图像`resize`​到`32 * 100`​,`Resize`​时不使用`Padding`​;`max_text_length`设置为25​; +4. Resize策略:直接将所有文本图像`resize`​到`32 * 100`​,`Resize`​时不使用`Padding`​;`max_text_length`设置为25​; -```java +```text valid res: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1923/1923 [01:59<00:00, 16.05it/s] [2024-09-10 19:16:59] mindocr.metrics.rec_metrics INFO - correct num: 98681, total num: 99996.0 @@ -412,16 +410,16 @@ valid res: 5. 修改基础yaml文件为`svtr_tiny.yaml`​,增加`STN`​模块 -```java +```text valid res: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1923/1923 [05:02<00:00, 6.36it/s] [2024-09-10 23:01:26] mindocr.metrics.rec_metrics INFO - correct num: 97956, total num: 99996.0 [2024-09-10 23:01:26] mindocr.eval INFO - Performance: {'acc': 0.9795991778373718, 'norm_edit_distance': 0.995379626750946} ``` -6. 增加数据增强强度:将​配置文件`SVTRRecAug`部分​的`aug_type`修改为1 +6. 增加数据增强强度:将​配置文件`SVTRRecAug`部分​的`aug_type`修改为1 -```java +```text valid res: 100%|████████████████████████████████████████████████████████████████████████████████████████| 1923/1923 [05:55<00:00, 5.42it/s] [2024-09-11 17:08:48] mindocr.metrics.rec_metrics INFO - correct num: 96064, total num: 99996.0 @@ -430,7 +428,7 @@ valid res: 7. 
增加数据增强强度:在​配置文件`SVTRRecAug`部分​增加`deterioration_p: 0.5`​、`colorjitter_p: 0.5`​ -```java +```text valid res: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 1923/1923 [05:40<00:00, 5.65it/s] [2024-09-11 20:12:32] mindocr.metrics.rec_metrics INFO - correct num: 97973, total num: 99996.0 @@ -439,7 +437,7 @@ valid res: ## 测试 -```txt +```shell python tools/eval.py --config configs/rec/svtr/svtr_tiny_ccpd.yaml --device_target Ascend/GPU ``` @@ -447,9 +445,9 @@ python tools/eval.py --config configs/rec/svtr/svtr_tiny_ccpd.yaml --device_targ * ​`acc`​: 97.12% -## 模型推理 +## 推理 -### 代码修改 +### 修改推理代码 修改`/mindocr/tools/infer/text/predict_rec.py`​中的`algo_to_model_name`​,将`SVTR`​对应的模块修改为`svtr_ccpd`​ @@ -526,9 +524,9 @@ def svtr_ccpd(pretrained=False, **kwargs): return model ``` -### 推理命令 +### 执行推理命令 -```txt +```shell python tools/infer/text/predict_rec.py --image_dir path/to/image_path \ --rec_algorithm SVTR \ --rec_image_shape "3,32,100" \ @@ -549,7 +547,7 @@ python tools/infer/text/predict_rec.py --image_dir path/to/image_path \ # 联合DBNet和SVTR推理 -```txt +```shell python tools/infer/text/predict_system.py --image_dir path/to/image_path or image_dir \ --det_algorithm DB \ --det_model_dir path/to/dbnet/best.ckpt \ @@ -560,19 +558,23 @@ python tools/infer/text/predict_system.py --image_dir path/to/image_path or ima --rec_image_shape "3,64,256" --max_text_length 24 --rec_amp_level O2 --visualize_output true ``` +**输出:** + ​![image](pic/det_rec_res.png)​ +**可视化结果:** + ​![1_res](pic/det_res.png)​ # 性能表现 -实验在 ascend 910* 上使用 MindSpore 2.2.14 的图模式进行测试: +在 Ascend 910* 上使用 MindSpore 2.2.14 的图模式进行实验测试: |model name|cards|batch size|resolution|jit level|graph compile|s/step|img/s| | :----------: | :-----: | :----------: | :----------: | :---------: | :-------------: | :------: | :------: | |dbnet|1|16|640x640|O0|43.50s|0.26|61.59| |svtr|1|256|64x256|O2|202.20s|0.77|331.70| -实验在 GPU 上使用 MindSpore 2.2.14 
的图模式进行测试: +在 GeForce RTX 4090 上使用 MindSpore 2.2.14 的图模式进行实验测试: |model name|cards|batch size|resolution|jit level|graph compile|s/step|img/s| | :----------: | :-----: | :----------: | :----------: | :---------: | :-------------: | :------: | :------: |
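The file-name annotation scheme and mapping arrays that both READMEs in this patch describe can be sanity-checked with a short standalone sketch. `decode_ccpd_name` below is illustrative only (it is not part of this example's scripts); it splits a CCPD file name on `-` and decodes the seven fields — the first plate index selects from `provinces`, the second from `alphabets`, and the remaining five from `ads`:

```python
# Illustrative sketch (hypothetical helper, not shipped with the example):
# decode a CCPD file name into its seven annotation fields.
provinces = ["皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑", "苏", "浙", "京", "闽", "赣", "鲁",
             "豫", "鄂", "湘", "粤", "桂", "琼", "川", "贵", "云", "藏", "陕", "甘", "青", "宁", "新", "警", "学", "O"]
alphabets = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T',
             'U', 'V', 'W', 'X', 'Y', 'Z', 'O']
ads = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T',
       'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'O']

def decode_ccpd_name(filename):
    """Split a CCPD file name on '-' and decode each of the seven fields."""
    stem = filename.rsplit('.', 1)[0]
    area, tilt, bbox, vertices, plate, brightness, blur = stem.split('-')
    h_tilt, v_tilt = map(int, tilt.split('_'))
    # "x&y" pairs: the bounding box has 2 points, the vertex field has 4
    box = [tuple(map(int, p.split('&'))) for p in bbox.split('_')]
    corners = [tuple(map(int, p.split('&'))) for p in vertices.split('_')]
    # plate indices: first -> provinces, second -> alphabets, rest -> ads
    idx = [int(i) for i in plate.split('_')]
    text = provinces[idx[0]] + alphabets[idx[1]] + ''.join(ads[i] for i in idx[2:])
    return {'area': int(area) / 1000.0, 'tilt': (h_tilt, v_tilt),
            'bbox': box, 'vertices': corners, 'plate': text,
            'brightness': int(brightness), 'blur': int(blur)}

name = '025-95_113-154&383_386&473-386&473_177&454_154&383_363&402-0_0_22_27_27_33_16-37-15.jpg'
print(decode_ccpd_name(name)['plate'])  # -> 皖AY339S
```

For the example file name used throughout the annotation section, this yields area 0.025 (2.5%), bounding box (154, 383)–(386, 473), and plate string 皖AY339S, matching the field-by-field description above.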