AIAS/ocr_sdk at commit f6bd604a29 (mirror of https://gitee.com/mymagicpower/AIAS.git)
Last commit: Calvin, 2022-07-23, "update pre/post model process to adapt the new version paddleOCR."


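Optical Character Recognition (OCR) Toolkit

OCR is now widely used across many industries: document recognition and data entry in finance, invoice recognition in catering, ticket recognition in transportation, recognition of all kinds of forms in enterprises, and recognition of ID cards, driver's licenses, and passports in everyday work and life. It is one of the most commonly used AI capabilities today.

OCR toolkit features:

1. Orientation detection

0°
90°
180°
270°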

2. Image rotation

3. Text recognition (four model sets are provided; see the documentation)

mobile model
light model
server model
v3 model

Model list (swap in whichever set you need):

  mobile model:
    # mobile detection model URI
    detection: https://aias-home.oss-cn-beijing.aliyuncs.com/models/ocr_models/ch_ppocr_mobile_v2.0_det_infer.zip
    # mobile recognition model URI
    recognition: https://aias-home.oss-cn-beijing.aliyuncs.com/models/ocr_models/ch_ppocr_mobile_v2.0_rec_infer.zip
  light model:
    # light detection model URI
    detection: https://aias-home.oss-cn-beijing.aliyuncs.com/models/ocr_models/ch_PP-OCRv2_det_infer.zip
    # light recognition model URI
    recognition: https://aias-home.oss-cn-beijing.aliyuncs.com/models/ocr_models/ch_PP-OCRv2_rec_infer.zip
  server model:
    # server detection model URI
    detection: https://aias-home.oss-cn-beijing.aliyuncs.com/models/ocr_models/ch_ppocr_server_v2.0_det_infer.zip
    # server recognition model URI
    recognition: https://aias-home.oss-cn-beijing.aliyuncs.com/models/ocr_models/ch_ppocr_server_v2.0_rec_infer.zip
  v3 model:
    # v3 detection model URI
    detection: https://aias-home.oss-cn-beijing.aliyuncs.com/models/ocr_models/ch_PP-OCRv3_det_infer.zip
    # v3 recognition model URI
    recognition: https://aias-home.oss-cn-beijing.aliyuncs.com/models/ocr_models/ch_PP-OCRv3_rec_infer.zip
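Because the four model sets differ only in their archive names, one way to switch between them in your own code is a small lookup table. The following is a minimal sketch; `OcrModelSet` and `OcrModelUris` are hypothetical names, not part of the SDK, and the archive names are copied from the list above.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical enum naming the four model sets listed above (not an SDK type).
enum OcrModelSet { MOBILE, LIGHT, SERVER, V3 }

// Hypothetical helper that builds download URIs from the archive names in the model list.
final class OcrModelUris {
    private static final String BASE =
            "https://aias-home.oss-cn-beijing.aliyuncs.com/models/ocr_models/";

    // Each entry holds {detection archive, recognition archive}.
    private static final Map<OcrModelSet, String[]> ARCHIVES = new EnumMap<>(OcrModelSet.class);

    static {
        ARCHIVES.put(OcrModelSet.MOBILE, new String[] {
                "ch_ppocr_mobile_v2.0_det_infer.zip", "ch_ppocr_mobile_v2.0_rec_infer.zip"});
        ARCHIVES.put(OcrModelSet.LIGHT, new String[] {
                "ch_PP-OCRv2_det_infer.zip", "ch_PP-OCRv2_rec_infer.zip"});
        ARCHIVES.put(OcrModelSet.SERVER, new String[] {
                "ch_ppocr_server_v2.0_det_infer.zip", "ch_ppocr_server_v2.0_rec_infer.zip"});
        ARCHIVES.put(OcrModelSet.V3, new String[] {
                "ch_PP-OCRv3_det_infer.zip", "ch_PP-OCRv3_rec_infer.zip"});
    }

    private OcrModelUris() {}

    // URI of the text detection model for the chosen set.
    static String detectionUri(OcrModelSet set) {
        return BASE + ARCHIVES.get(set)[0];
    }

    // URI of the text recognition model for the chosen set.
    static String recognitionUri(OcrModelSet set) {
        return BASE + ARCHIVES.get(set)[1];
    }
}
```

For example, `OcrModelUris.detectionUri(OcrModelSet.V3)` returns the v3 detection URI from the list. Keeping detection and recognition in one entry makes it harder to pair a detector from one set with a recognizer from another by accident.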

4. Layout analysis (five categories supported; used together with text recognition and table recognition in a processing pipeline)

Text
Title
List
Table
Figure

    # layout analysis model URI
    layout: https://aias-home.oss-cn-beijing.aliyuncs.com/models/ocr_models/ppyolov2_r50vd_dcn_365e_publaynet_infer.zip

5. Table recognition

Generates HTML tables
Generates Excel files

    # table recognition model URI
    table-en: https://aias-home.oss-cn-beijing.aliyuncs.com/models/ocr_models/en_table.zip

Running the OCR examples

1. Text orientation detection:

Example code: OcrDetectionExample.java
After a successful run, the console should show the following output:

[INFO ] - Result image has been saved in: build/output/detect_result.png
[INFO ] - [
	class: "0", probability: 1.00000, bounds: [x=0.073, y=0.069, width=0.275, height=0.026]
	class: "0", probability: 1.00000, bounds: [x=0.652, y=0.158, width=0.222, height=0.040]
	class: "0", probability: 1.00000, bounds: [x=0.143, y=0.252, width=0.144, height=0.026]
	class: "0", probability: 1.00000, bounds: [x=0.628, y=0.328, width=0.168, height=0.026]
	class: "0", probability: 1.00000, bounds: [x=0.064, y=0.330, width=0.450, height=0.023]
]

The output image looks like this:
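The detector reports one orientation class per text box, so a caller still has to settle on a single angle for the whole image before rotating it. Below is a minimal sketch of one possible policy, a probability-weighted vote. It assumes the class labels are the angle strings "0", "90", "180" and "270" (the log above only shows "0"); `OrientationVote` and `Box` are hypothetical names, not SDK types.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical page-level orientation vote over per-box classes (not an SDK API).
final class OrientationVote {
    // One detected text box: its orientation class label and probability.
    static final class Box {
        final String label;
        final double probability;

        Box(String label, double probability) {
            this.label = label;
            this.probability = probability;
        }
    }

    private OrientationVote() {}

    // ASSUMPTION: labels are angle strings such as "0", "90", "180", "270".
    // Sums the probabilities per label and returns the angle with the largest total.
    // Returns 0 for an empty list; ties are broken arbitrarily.
    static int dominantAngle(List<Box> boxes) {
        Map<String, Double> totals = new HashMap<>();
        for (Box box : boxes) {
            totals.merge(box.label, box.probability, Double::sum);
        }
        String best = "0";
        double bestTotal = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : totals.entrySet()) {
            if (entry.getValue() > bestTotal) {
                best = entry.getKey();
                bestTotal = entry.getValue();
            }
        }
        return Integer.parseInt(best);
    }
}
```

Weighting by probability rather than counting boxes lets a few confident boxes outvote many uncertain ones; a plain majority count is an equally reasonable choice.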

2. Image rotation:

Each call to the rotateImg method rotates the image 90 degrees counterclockwise.

Example code: RotationExample.java
Image before rotation:
Image after rotation:
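The rotation step can be sketched with plain `java.awt.image.BufferedImage`. This is not the SDK's `rotateImg`; it is a stand-in that performs the same documented operation (a 90 degree counterclockwise turn per call), so the number of calls needed can be reasoned about. The angle-to-turns mapping assumes a detected angle means the content is rotated that many degrees clockwise; if the model uses the opposite convention, use `(4 - angle / 90) % 4` instead. `Rotate90` is a hypothetical name.

```java
import java.awt.image.BufferedImage;

// Stand-in for a 90-degree counterclockwise rotation (not the SDK's rotateImg).
final class Rotate90 {
    private Rotate90() {}

    // Returns a new image rotated 90 degrees counterclockwise.
    static BufferedImage rotateCounterClockwise(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst = new BufferedImage(h, w, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // A CCW quarter turn moves source pixel (x, y) to (y, w - 1 - x).
                dst.setRGB(y, w - 1 - x, src.getRGB(x, y));
            }
        }
        return dst;
    }

    // ASSUMPTION: angle (a multiple of 90) is the clockwise rotation of the content,
    // so angle / 90 counterclockwise turns bring it back to 0 degrees.
    static int quarterTurnsToUpright(int angle) {
        return ((angle / 90) % 4 + 4) % 4;
    }

    // Applies as many counterclockwise turns as needed to make the content upright.
    static BufferedImage toUpright(BufferedImage image, int angle) {
        BufferedImage result = image;
        int turns = quarterTurnsToUpright(angle);
        for (int i = 0; i < turns; i++) {
            result = rotateCounterClockwise(result);
        }
        return result;
    }
}
```

Under this assumption, an image detected at 270 degrees needs three counterclockwise turns, which is the same as one clockwise turn.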

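3. Text recognition:

Before using this method, call the methods above to bring the text in the image to the horizontal (0°) orientation.

Example code: OcrV3RecognitionExample.java
After a successful run, the console should show the following output: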

[INFO ] - [
	class: "你", probability: -1.0e+00, bounds: [x=0.319, y=0.164, width=0.050, height=0.057]
	class: "永远都", probability: -1.0e+00, bounds: [x=0.329, y=0.349, width=0.206, height=0.044]
	class: "无法叫醒一个", probability: -1.0e+00, bounds: [x=0.328, y=0.526, width=0.461, height=0.044]
	class: "装睡的人", probability: -1.0e+00, bounds: [x=0.330, y=0.708, width=0.294, height=0.043]
]

The output image looks like this:
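Each recognized line comes back as a separate object with a normalized bounding box, so turning the result into plain text means choosing a reading order. The sketch below uses a naive rule, top to bottom and then left to right by the box's top-left corner, which suits single-column text like the example but not multi-column layouts. `ReadingOrder` and `Line` are hypothetical names, not SDK types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical post-processing step that joins recognized lines into plain text.
final class ReadingOrder {
    // A recognized line: its text plus the normalized top-left corner of its box.
    static final class Line {
        final String text;
        final double x;
        final double y;

        Line(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    private ReadingOrder() {}

    // Sorts by top edge, then by left edge, and joins the texts with newlines.
    // Naive: assumes a single column of non-overlapping lines.
    static String join(List<Line> lines) {
        List<Line> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparingDouble((Line l) -> l.y).thenComparingDouble(l -> l.x));
        StringBuilder text = new StringBuilder();
        for (Line line : sorted) {
            if (text.length() > 0) {
                text.append('\n');
            }
            text.append(line.text);
        }
        return text.toString();
    }
}
```

Given the four boxes from the log above, this yields the lines in the order 你, 永远都, 无法叫醒一个, 装睡的人.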

4. Layout analysis:

After a successful run, the console should show the following output:

[INFO ] - [
	class: "Text", probability: 0.98750, bounds: [x=0.081, y=0.620, width=0.388, height=0.183]
	class: "Text", probability: 0.98698, bounds: [x=0.503, y=0.464, width=0.388, height=0.167]
	class: "Text", probability: 0.98333, bounds: [x=0.081, y=0.465, width=0.387, height=0.121]
	class: "Figure", probability: 0.97186, bounds: [x=0.074, y=0.091, width=0.815, height=0.304]
	class: "Table", probability: 0.96995, bounds: [x=0.506, y=0.712, width=0.382, height=0.143]
]

The output image looks like this:
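The layout bounds in the log appear to be fractions of the image width and height rather than pixels (every value lies between 0 and 1). Assuming that, the sketch below converts them to pixel rectangles and selects one category, for example to crop the Table regions before handing them to table recognition. `LayoutRegions` and `Region` are hypothetical names, not SDK types.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for layout-analysis results (not an SDK API).
final class LayoutRegions {
    // One layout result. ASSUMPTION: coordinates are fractions of the image size.
    static final class Region {
        final String label;
        final double x;
        final double y;
        final double width;
        final double height;

        Region(String label, double x, double y, double width, double height) {
            this.label = label;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    private LayoutRegions() {}

    // Scales a normalized box to pixels and clamps it to the image bounds.
    // Degenerate boxes may come back with zero width or height.
    static Rectangle toPixels(Region r, int imageWidth, int imageHeight) {
        int x = clamp((int) Math.round(r.x * imageWidth), 0, imageWidth);
        int y = clamp((int) Math.round(r.y * imageHeight), 0, imageHeight);
        int w = clamp((int) Math.round(r.width * imageWidth), 0, imageWidth - x);
        int h = clamp((int) Math.round(r.height * imageHeight), 0, imageHeight - y);
        return new Rectangle(x, y, w, h);
    }

    // Returns pixel rectangles for all regions with the given label, e.g. "Table".
    static List<Rectangle> regionsWithLabel(
            String label, List<Region> regions, int imageWidth, int imageHeight) {
        List<Rectangle> result = new ArrayList<>();
        for (Region r : regions) {
            if (r.label.equals(label)) {
                result.add(toPixels(r, imageWidth, imageHeight));
            }
        }
        return result;
    }

    private static int clamp(int value, int min, int max) {
        return Math.max(min, Math.min(value, max));
    }
}
```

A non-empty rectangle returned here can be cropped with `BufferedImage.getSubimage(r.x, r.y, r.width, r.height)`. For the Table region in the log on a 1000 x 1000 image, `toPixels` gives x=506, y=712, width=382, height=143.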

5. Table recognition:

After a successful run, the console should show the following output:

<html>
 <body>
  <table>
   <thead>
    <tr>
     <td>Methods</td>
     <td>R</td>
     <td>P</td>
     <td>F</td>
     <td>FPS</td>
    </tr>
   </thead>
   <tbody>
    <tr>
     <td>SegLink[26]</td>
     <td>70.0</td>
     <td>86.0</td>
     <td>770</td>
     <td>89</td>
    </tr>
    <tr>
     <td>PixelLink[4j</td>
     <td>73.2</td>
     <td>83.0</td>
     <td>77.8</td>
     <td></td>
    </tr>
...
   </tbody>
  </table> 
 </body>
</html>

The output image looks like this:
The generated Excel file looks like this:
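As a lightweight illustration of turning the recognized table into a spreadsheet-friendly file, and not the SDK's own Excel export, the sketch below pulls rows and cells out of the HTML with regular expressions and writes them as CSV, which spreadsheet programs can open. It assumes simple markup like the output above (`<td>` cells, no nested tables); attributes such as `colspan` are ignored rather than expanded, and HTML entities are not decoded. `HtmlTableToCsv` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical HTML-table-to-CSV converter for simple, non-nested tables.
final class HtmlTableToCsv {
    private static final Pattern ROW = Pattern.compile("<tr[^>]*>(.*?)</tr>", Pattern.DOTALL);
    private static final Pattern CELL = Pattern.compile("<td[^>]*>(.*?)</td>", Pattern.DOTALL);

    private HtmlTableToCsv() {}

    // Extracts the trimmed text of every <td> in every <tr>, in document order.
    static List<List<String>> parse(String html) {
        List<List<String>> rows = new ArrayList<>();
        Matcher rowMatcher = ROW.matcher(html);
        while (rowMatcher.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cellMatcher = CELL.matcher(rowMatcher.group(1));
            while (cellMatcher.find()) {
                cells.add(cellMatcher.group(1).trim());
            }
            rows.add(cells);
        }
        return rows;
    }

    // Writes rows as CSV, one line per row, quoting fields that need it.
    static String toCsv(List<List<String>> rows) {
        StringBuilder csv = new StringBuilder();
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    csv.append(',');
                }
                csv.append(escape(row.get(i)));
            }
            csv.append('\n');
        }
        return csv.toString();
    }

    // RFC 4180 style: wrap in quotes and double inner quotes when needed.
    private static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }
}
```

Empty cells, such as the missing FPS value in the PixelLink row above, become empty CSV fields, so column alignment is preserved.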

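Important note:

PaddleOCR's text detection and recognition handles rotated and skewed text by default (that is, it straightens text automatically). This feature has not yet been adapted in this SDK, so make sure the image is upright, or use the orientation detection and image rotation preprocessing described above to straighten the whole image (although the result is not as good as the original algorithm's).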
