Finally, an OCR Product That Works Well

Original
liuzhen007
Published 2025-01-12 11:57:07
Included in the column: Streaming Media, Audio and Video

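Preface

In today's digital era, every industry faces a massive demand for processing files and data. Traditional general-purpose text recognition, however, often falls short of the accuracy enterprises require, especially for complex bills, forms, documents, and contracts. Such files commonly have complicated layouts, mixed Chinese and English text, several bills pasted together, a mix of printed and handwritten text, widely varying styles, and English fields that are hard to parse. To address these common pain points, Tencent Cloud launched its Smart Structured Recognition product, which aims to provide more accurate and efficient document processing.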

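I. Smart Structured Recognition

Tencent Cloud's Smart Structured Recognition (Smart Structure Optical Character Recognition) product combines industry-leading deep learning, image detection, and OCR large-model capabilities to extract structured information regardless of layout. It handles both fixed-format cards and certificates and complex logistics documents. The product pre-learns key-value correspondences and supports customer-defined templates, which improves the efficiency of data extraction and entry. It suits scenarios such as government services, bill verification, industry forms, and international logistics.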

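1. Advantages

1.1 High accuracy

It supports recognition of cards, certificates, and bills in any fixed layout. Accuracy on every field is at an industry-leading level, with recognition accuracy above 90%.

1.2 Generalization

It supports structured data extraction from many common fixed layouts, such as police IDs, teacher qualification certificates, and road transport certificates, and applies to scenarios across many industries.

1.3 Ease of use

With just a few simple configuration steps, users can customize how structure is extracted and pull out data quickly. No training is required, which makes data entry efficient.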

2. Calling the API

2.1 Certificates

Take a photo of an electronic certificate, for example https://ocr-demo-1254418846.cos.ap-guangzhou.myqcloud.com/document/SmartStructuralOCR/SmartStructuralOCR4.png, and call the following API:

https://aistudio.cloud.tencent.com/demo-common/democase/getDemoResultByDefaultImage

Request parameters:

{"IsPdf":true,"ConfigId":"General","ItemNames":[],"ImageUrl":"https://ocr-demo-1254418846.cos.ap-guangzhou.myqcloud.com/document/SmartStructuralOCR/SmartStructuralOCR4.png"}
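The request above can be sketched in Python using only the standard library. The endpoint URL and the payload fields are taken from the article; the HTTP method (POST), the `Content-Type` header, and the absence of authentication are assumptions, and the helper names `build_request` and `send` are hypothetical.

```python
import json
import urllib.request

# Endpoint and payload fields are copied from the article. The HTTP method,
# the header, and the lack of authentication are assumptions.
DEMO_ENDPOINT = (
    "https://aistudio.cloud.tencent.com/demo-common/democase/"
    "getDemoResultByDefaultImage"
)


def build_request(image_url: str, config_id: str = "General") -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the demo endpoint."""
    payload = {
        "IsPdf": True,
        "ConfigId": config_id,
        "ItemNames": [],
        "ImageUrl": image_url,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        DEMO_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made. A call would look like `send(build_request(image_url))`.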

Response:

{
    "code": 0,
    "message": "success",
    "requestId": "0af32a80-8bd0-426e-a8d0-3b92cf8c9d1a",
    "data": {
        "Response": {
            "Angle": -0.07058890908956528,
            "RequestId": "393cc1cf-c13e-4f70-8be6-71125de79a24",
            "StructuralList": [
                {
                    "Groups": [
                        {
                            "Lines": [
                                {
                                    "Key": {
                                        "AutoName": "标题"
                                    },
                                    "Value": {
                                        "AutoContent": "中华人民共和国道路运输证",
                                        "Coord": {
                                            "LeftBottom": {
                                                "X": 175,
                                                "Y": 145
                                            },
                                            "LeftTop": {
                                                "X": 175,
                                                "Y": 90
                                            },
                                            "RightBottom": {
                                                "X": 326,
                                                "Y": 144
                                            },
                                            "RightTop": {
                                                "X": 326,
                                                "Y": 89
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                },
                {
                    "Groups": [
                        {
                            "Lines": [
                                {
                                    "Key": {
                                        "AutoName": "证字号"
                                    },
                                    "Value": {
                                        "AutoContent": "浙交运管货字305943888055号",
                                        "Coord": {
                                            "LeftBottom": {
                                                "X": 209,
                                                "Y": 165
                                            },
                                            "LeftTop": {
                                                "X": 209,
                                                "Y": 150
                                            },
                                            "RightBottom": {
                                                "X": 410,
                                                "Y": 164
                                            },
                                            "RightTop": {
                                                "X": 410,
                                                "Y": 149
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                },
                {
                    "Groups": [
                        {
                            "Lines": [
                                {
                                    "Key": {
                                        "AutoName": "业户名称"
                                    },
                                    "Value": {
                                        "AutoContent": "浙江平运物流有限公司",
                                        "Coord": {
                                            "LeftBottom": {
                                                "X": 173,
                                                "Y": 219
                                            },
                                            "LeftTop": {
                                                "X": 173,
                                                "Y": 205
                                            },
                                            "RightBottom": {
                                                "X": 301,
                                                "Y": 218
                                            },
                                            "RightTop": {
                                                "X": 301,
                                                "Y": 204
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                },
                {
                    "Groups": [
                        {
                            "Lines": [
                                {
                                    "Key": {
                                        "AutoName": "地址"
                                    },
                                    "Value": {
                                        "AutoContent": "杭州市萧山区萧绍路1128号",
                                        "Coord": {
                                            "LeftBottom": {
                                                "X": 172,
                                                "Y": 242
                                            },
                                            "LeftTop": {
                                                "X": 172,
                                                "Y": 228
                                            },
                                            "RightBottom": {
                                                "X": 327,
                                                "Y": 241
                                            },
                                            "RightTop": {
                                                "X": 327,
                                                "Y": 227
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                },
                {
                    "Groups": [
                        {
                            "Lines": [
                                {
                                    "Key": {
                                        "AutoName": "车牌号码"
                                    },
                                    "Value": {
                                        "AutoContent": "浙A.46735",
                                        "Coord": {
                                            "LeftBottom": {
                                                "X": 172,
                                                "Y": 281
                                            },
                                            "LeftTop": {
                                                "X": 172,
                                                "Y": 266
                                            },
                                            "RightBottom": {
                                                "X": 233,
                                                "Y": 281
                                            },
                                            "RightTop": {
                                                "X": 233,
                                                "Y": 266
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                },
                {
                    "Groups": [
                        {
                            "Lines": [
                                {
                                    "Key": {
                                        "AutoName": "经营许可证号"
                                    },
                                    "Value": {
                                        "AutoContent": "305943089235",
                                        "Coord": {
                                            "LeftBottom": {
                                                "X": 172,
                                                "Y": 303
                                            },
                                            "LeftTop": {
                                                "X": 172,
                                                "Y": 290
                                            },
                                            "RightBottom": {
                                                "X": 252,
                                                "Y": 303
                                            },
                                            "RightTop": {
                                                "X": 252,
                                                "Y": 290
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                },
                {
                    "Groups": [
                        {
                            "Lines": [
                                {
                                    "Key": {
                                        "AutoName": "车辆类型"
                                    },
                                    "Value": {
                                        "AutoContent": "集装箱挂车",
                                        "Coord": {
                                            "LeftBottom": {
                                                "X": 174,
                                                "Y": 327
                                            },
                                            "LeftTop": {
                                                "X": 173,
                                                "Y": 314
                                            },
                                            "RightBottom": {
                                                "X": 238,
                                                "Y": 327
                                            },
                                            "RightTop": {
                                                "X": 237,
                                                "Y": 314
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                },
                {
                    "Groups": [
                        {
                            "Lines": [
                                {
                                    "Key": {
                                        "AutoName": "吨(座)位"
                                    },
                                    "Value": {
                                        "AutoContent": "33000",
                                        "Coord": {
                                            "LeftBottom": {
                                                "X": 174,
                                                "Y": 350
                                            },
                                            "LeftTop": {
                                                "X": 174,
                                                "Y": 337
                                            },
                                            "RightBottom": {
                                                "X": 208,
                                                "Y": 350
                                            },
                                            "RightTop": {
                                                "X": 208,
                                                "Y": 337
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                },
                {
                    "Groups": [
                        {
                            "Lines": [
                                {
                                    "Key": {
                                        "AutoName": "经营范围"
                                    },
                                    "Value": {
                                        "AutoContent": "货运:普通货运、货物专用运输(集装箱)。",
                                        "Coord": {
                                            "LeftBottom": {
                                                "X": 174,
                                                "Y": 373
                                            },
                                            "LeftTop": {
                                                "X": 174,
                                                "Y": 357
                                            },
                                            "RightBottom": {
                                                "X": 422,
                                                "Y": 372
                                            },
                                            "RightTop": {
                                                "X": 422,
                                                "Y": 356
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                },
                {
                    "Groups": [
                        {
                            "Lines": [
                                {
                                    "Key": {
                                        "AutoName": "车辆尺寸"
                                    },
                                    "Value": {
                                        "AutoContent": "12391mmX2480mmX1565mm",
                                        "Coord": {
                                            "LeftBottom": {
                                                "X": 174,
                                                "Y": 423
                                            },
                                            "LeftTop": {
                                                "X": 174,
                                                "Y": 410
                                            },
                                            "RightBottom": {
                                                "X": 322,
                                                "Y": 422
                                            },
                                            "RightTop": {
                                                "X": 322,
                                                "Y": 409
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                },
                {
                    "Groups": [
                        {
                            "Lines": [
                                {
                                    "Key": {
                                        "AutoName": "发证日期"
                                    },
                                    "Value": {
                                        "AutoContent": "2022年11月11日",
                                        "Coord": {
                                            "LeftBottom": {
                                                "X": 173,
                                                "Y": 448
                                            },
                                            "LeftTop": {
                                                "X": 173,
                                                "Y": 433
                                            },
                                            "RightBottom": {
                                                "X": 263,
                                                "Y": 447
                                            },
                                            "RightTop": {
                                                "X": 263,
                                                "Y": 432
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                },
                {
                    "Groups": [
                        {
                            "Lines": [
                                {
                                    "Key": {
                                        "AutoName": "有效期至"
                                    },
                                    "Value": {
                                        "AutoContent": "2025年11月1日",
                                        "Coord": {
                                            "LeftBottom": {
                                                "X": 173,
                                                "Y": 471
                                            },
                                            "LeftTop": {
                                                "X": 173,
                                                "Y": 455
                                            },
                                            "RightBottom": {
                                                "X": 264,
                                                "Y": 470
                                            },
                                            "RightTop": {
                                                "X": 264,
                                                "Y": 454
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                },
                {
                    "Groups": [
                        {
                            "Lines": [
                                {
                                    "Key": {
                                        "AutoName": "核发机关"
                                    },
                                    "Value": {
                                        "AutoContent": "杭州市交通运输局",
                                        "Coord": {
                                            "LeftBottom": {
                                                "X": 173,
                                                "Y": 494
                                            },
                                            "LeftTop": {
                                                "X": 173,
                                                "Y": 479
                                            },
                                            "RightBottom": {
                                                "X": 314,
                                                "Y": 493
                                            },
                                            "RightTop": {
                                                "X": 314,
                                                "Y": 478
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                },
                {
                    "Groups": [
                        {
                            "Lines": [
                                {
                                    "Key": {
                                        "AutoName": "审验有效期至"
                                    },
                                    "Value": {
                                        "AutoContent": "2023年1月1日",
                                        "Coord": {
                                            "LeftBottom": {
                                                "X": 173,
                                                "Y": 530
                                            },
                                            "LeftTop": {
                                                "X": 173,
                                                "Y": 515
                                            },
                                            "RightBottom": {
                                                "X": 264,
                                                "Y": 529
                                            },
                                            "RightTop": {
                                                "X": 264,
                                                "Y": 514
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                },
                {
                    "Groups": [
                        {
                            "Lines": [
                                {
                                    "Key": {
                                        "AutoName": "技术等级评定"
                                    },
                                    "Value": {
                                        "AutoContent": "一级2022年1月1日",
                                        "Coord": {
                                            "LeftBottom": {
                                                "X": 174,
                                                "Y": 553
                                            },
                                            "LeftTop": {
                                                "X": 174,
                                                "Y": 539
                                            },
                                            "RightBottom": {
                                                "X": 303,
                                                "Y": 552
                                            },
                                            "RightTop": {
                                                "X": 303,
                                                "Y": 538
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    ]
                }
            ],
            "WordList": []
        }
    }
}
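The nested response is easier to consume once flattened. The sketch below assumes the response has exactly the shape shown above (`data.Response.StructuralList[].Groups[].Lines[]` with `Key.AutoName`, `Value.AutoContent`, and `Value.Coord`); the function names `flatten_fields` and `bounding_box` are hypothetical helpers, not part of any SDK.

```python
def flatten_fields(result: dict) -> dict:
    """Map each recognized field name (AutoName) to its text (AutoContent)."""
    fields = {}
    response = result.get("data", {}).get("Response", {})
    for item in response.get("StructuralList", []):
        for group in item.get("Groups", []):
            for line in group.get("Lines", []):
                name = line.get("Key", {}).get("AutoName", "")
                text = line.get("Value", {}).get("AutoContent", "")
                fields[name] = text
    return fields


def bounding_box(coord: dict) -> tuple:
    """Return (left, top, right, bottom) enclosing the four corner points."""
    corners = [coord[k] for k in ("LeftTop", "RightTop", "RightBottom", "LeftBottom")]
    xs = [point["X"] for point in corners]
    ys = [point["Y"] for point in corners]
    return min(xs), min(ys), max(xs), max(ys)


# The first entry of the response above, reduced to the fields used here.
title_coord = {
    "LeftBottom": {"X": 175, "Y": 145},
    "LeftTop": {"X": 175, "Y": 90},
    "RightBottom": {"X": 326, "Y": 144},
    "RightTop": {"X": 326, "Y": 89},
}
sample = {
    "data": {
        "Response": {
            "StructuralList": [
                {
                    "Groups": [
                        {
                            "Lines": [
                                {
                                    "Key": {"AutoName": "标题"},
                                    "Value": {
                                        "AutoContent": "中华人民共和国道路运输证",
                                        "Coord": title_coord,
                                    },
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    }
}

print(flatten_fields(sample))
print(bounding_box(title_coord))
```

If two lines share the same `AutoName`, the later one overwrites the earlier one in this sketch; collecting the values into a list would be the safer choice for layouts with repeated fields.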

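Result:

As the figure above shows, every field is recognized correctly, so the accuracy is quite high. The documentation at https://cloud.tencent.com/document/product/866/112179 covers the API and SDK in more detail and is especially useful when a call fails. For example, the error FailedOperation.ImageDecodeFailed means that the uploaded image could not be decoded; you may need to convert the image to another format, for instance from JPEG to PNG, and try again.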
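A failed call can be detected before the result is parsed. The sketch below assumes that errors appear as `Response.Error.Code` and `Response.Error.Message`, following the usual Tencent Cloud API 3.0 convention, and that the demo endpoint keeps the same `data.Response` wrapper as the successful response above. Both points are assumptions to verify against the documentation, and `OcrError`, `raise_for_error`, and `is_decode_failure` are hypothetical names.

```python
class OcrError(Exception):
    """Raised when the response carries an error object instead of a result."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def raise_for_error(result: dict) -> dict:
    """Return the inner Response dict, or raise OcrError if it holds an Error."""
    response = result.get("data", {}).get("Response", {})
    error = response.get("Error")  # assumed error location
    if error:
        raise OcrError(error.get("Code", ""), error.get("Message", ""))
    return response


def is_decode_failure(exc: OcrError) -> bool:
    """True for the image-decoding error named in the article."""
    return exc.code == "FailedOperation.ImageDecodeFailed"
```

When `is_decode_failure` returns true, the caller can re-encode the image in another format and retry rather than treat the failure as fatal.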

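2.2 Documents

This is the feature that impressed me most. It is very friendly to students, engineers, literature researchers, and anyone else who reads documents often: much of what you find online is a PDF or a set of plain images, and copying text out of it is tedious. Here is an example, starting with the original:

Recognition result: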

Similarity Matching(BSM), eliminating the need for text encoder in the inference process, leading to a remarkable speedup. Moreover, it dynamically augments text em-beddings with visual modalities, enhancing the overall performance. Specifically, it brings about substantial improvements in inference speed(by 48.5%) while enhancing performance.

  1. Extensive experiments were conducted to evaluate the performance of TCM and FastTCM in different settings.We explored their utility in boosting the efficacy of exist-ing text detectors and spotters, their competence in few-shot learning, and their domain adaptation capabilities.Our thorough ablation studies offered insights into the contributions of our method in harnessing pretrained CLIP knowledge to elevate the performance of text detectors and spotters.
  2. Our method exhibited impressive adaptability across diverse tasks. The proposed FastTCM-CR50 showed their efficacy in scene text spotting and complex tasks like oriented, dense, and small object detection in aerial imagery.

3 METHODOLOGY

An overview of our approach is shown in Fig.3. In essence,we repurpose the CLIP model to serve as the backbone,utilizing the FastTCM as a bridge between the CLIP backbone and the detection/spotter heads.

3.1 Prerequisite: CLIP Model

The CLIP model[1] has demonstrated substantial potential in the realm of learning transferable knowledge and open-set visual concepts, given its capacity to analyze 400 million unannotated image-text pairs during its pretraining phase.Prior research[53] reveals that CLIP's individual neurons are adept at capturing concepts in literal, symbolic, and conceptual manners, which serves as an innately text-friendly model, capable of effectively mapping the space between image and text[54]. During its training phase, CLIP learns a joint embedding space for two modalities through a contrastive loss. Given a batch of image-text pairs, the model maximizes the cosine similarity with matching text and minimizes the similarity with all other unmatched text for each image.The same process applies to each piece of text,which has allowed CLIP to be utilized for zero-shot image recognition[2]. However, leveraging the valuable insights generated by such a model presents two prerequisites.First, an effective method is required to access the prior knowledge stored within the CLIP model. Second, while the original model is designed to measure the similarity between a complete image and a single word or sentence,scene text detection and spotting usually involve numerous text instances per image, all of which need to be equivalently recalled.

3.2 FastTCM

FastTCM, designed to enhance the CLIP model, serves as a robust foundation for boosting existing scene text detectors and spotters. It achieves this by extracting both image and text embeddings from CLIP's image and text encoders,

respectively. The first step in the process is designing a cross-modal interaction mechanism. We do this via visual prompt learning which restores the locality feature from CLIP's image encoder.The enhanced locality feature allows for capturing fine-grained data to effectively respond to a more general text region, setting the stage for subsequent matches between text instances and language. Next, to better channel pre-trained knowledge, we build a language prompt generator.This generator produces a contextual cue for each image.For the efficient extraction of interactions between the image and text encoder, all while enabling faster inferences,we use a method called Bimodal Similarity matching.This method allows for the offline computation of inferences using the CLIP text encoder, along with the dynamic generation of language prompts that are based on the conditions of the image.Finally,an instance-language matching technique is employed to align the image and text embeddings.This encourages the image encoder to meticulously refine text regions from the cross-modal visual-language priors.

Figure 3-The overall framework of our approach.

Figure 4-The details of the FastTCM. The image encoder and text encoder are directly from the CLIP model. The red dashed arrows represent training-only operators, with the correspond-ing upstream calculation procedure performed offline during the inference stage.

3.2.1 Image Encoder

We use the pretrained ResNet50[55] of CLIP as the image encoder, which produces an embedding vector for every input pixel. Given the input image $I' \in \mathbb{R}^{H \times W \times 3}$, image encoder outputs image embedding $I \in \mathbb{R}^{\bar{H} \times \bar{W} \times C}$, where $\bar{H} = \frac{H}{s}$, $\bar{W} = \frac{W}{s}$, and C is the image embedding dimension(C is set to 1024) and s is the downsampling ratio(s is empirically set to 32), which can be expressed as:

$$I = \operatorname{ImageEncoder}(I'). \tag{1}$$

3.2.2 Text Encoder

The text encoder takes input a number of of K classes prompt and embeds it into a continuous vector space $\mathbb{R}^{C}$, producing

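The above is the recognition result. Not only is the text correct, even the math formulas and figures were recognized. Impressive, isn't it? And with a small modification, the same approach can handle an entire PDF document.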
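One way to sketch the whole-PDF case is to issue one request per page. The article does not show how this is done, so everything here is an assumption: the `PdfPageNumber` field is assumed to select a 1-based page when `IsPdf` is true, the page count is assumed to be known by the caller, and `build_pdf_payloads` is a hypothetical helper. Check the field name and numbering against the API documentation before relying on it.

```python
def build_pdf_payloads(pdf_url: str, page_count: int, config_id: str = "General") -> list:
    """Build one request payload per PDF page (pages numbered from 1)."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    return [
        {
            "IsPdf": True,
            "PdfPageNumber": page,  # assumed parameter name, not shown in the article
            "ConfigId": config_id,
            "ItemNames": [],
            "ImageUrl": pdf_url,
        }
        for page in range(1, page_count + 1)
    ]


# Example: payloads for a three-page document at a placeholder URL.
payloads = build_pdf_payloads("https://example.com/paper.pdf", 3)
print([p["PdfPageNumber"] for p in payloads])
```

Each payload can then be sent in turn and the per-page results concatenated in page order.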

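Conclusion

Overall, Tencent Cloud's Smart Structured Recognition product builds on large models and, with real-world image-and-text recognition needs in mind, derives several types of small recognition models that are more specialized and efficient. As for drawbacks, the accuracy stated on the official site has not yet reached 100%, so there is still room to refine it toward 99.9%. I hope this technology helps the industry produce more and better OCR products.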

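Original content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, contact cloudcommunity@tencent.com for removal.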
