In this paper, our goal is to address the problems of generalization and flexibility for text recognition in documents. We introduce a new model that exploits the repetitive nature of characters in a language and decouples the visual representation learning stage from the linguistic modelling stage. In this way, text recognition is turned into a shape-matching problem, achieving generalization in appearance and flexibility in the set of classes. We evaluate the new model on synthetic and real datasets composed of different alphabets, and the results show that it can handle challenges that traditional architectures can only solve through expensive retraining, including: (1) it generalizes to unseen new fonts without requiring exemplars from them; (2) it can flexibly change the number of classes simply by changing the exemplars provided; and (3) given a new glyph set, it generalizes to new languages and new characters it was never trained on. In all of these cases, we show significant improvements over state-of-the-art models.
Original title: Adaptive Text Recognition through Visual Matching
Original abstract: In this work, our objective is to address the problems of generalization and flexibility for text recognition in documents. We introduce a new model that exploits the repetitive nature of characters in languages, and decouples the visual representation learning and linguistic modelling stages. By doing this, we turn text recognition into a shape matching problem, and thereby achieve generalization in appearance and flexibility in classes. We evaluate the new model on both synthetic and real datasets across different alphabets and show that it can handle challenges that traditional architectures are not able to solve without expensive retraining, including: (i) it can generalize to unseen fonts without new exemplars from them; (ii) it can flexibly change the number of classes, simply by changing the exemplars provided; and (iii) it can generalize to new languages and new characters that it has not been trained for by providing a new glyph set. We show significant improvements over state-of-the-art models for all these cases.
Original authors: Chuhan Zhang, Ankush Gupta, Andrew Zisserman
Original link: https://arxiv.org/abs/2009.06610
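To make the shape-matching idea in the abstract concrete, below is a minimal, illustrative sketch and not the authors' implementation: given per-column visual features of a text-line image and one feature vector per exemplar glyph, recognition reduces to finding the best-matching glyph for each column of the similarity map. All names (`match_by_shape`, `line_feats`, `glyph_feats`, `alphabet`) and the greedy argmax-and-collapse decoding are assumptions made for illustration; the actual model decodes the similarity map with learned components.

```python
# Illustrative sketch only (not the paper's architecture): text recognition
# as matching image features against exemplar glyph features.
import torch
import torch.nn.functional as F


def match_by_shape(line_feats: torch.Tensor,
                   glyph_feats: torch.Tensor,
                   alphabet: list) -> str:
    """Greedy shape-matching decoder (hypothetical helper).

    line_feats:  (T, D) per-column features of the text-line image.
    glyph_feats: (K, D) one feature vector per exemplar glyph (K classes).
    alphabet:    list of K characters, aligned with glyph_feats.
    """
    # Cosine-similarity map between every image column and every glyph.
    sim = F.normalize(line_feats, dim=-1) @ F.normalize(glyph_feats, dim=-1).T  # (T, K)
    # Naive per-column argmax; the paper instead learns to decode this map.
    ids = sim.argmax(dim=-1).tolist()
    # Collapse consecutive repeats (CTC-style) to form the output string.
    chars = [alphabet[i] for j, i in enumerate(ids) if j == 0 or i != ids[j - 1]]
    return "".join(chars)
```

Because the class set lives entirely in `glyph_feats` and `alphabet`, swapping in a new glyph set changes the fonts, the number of classes, or even the language without retraining the visual encoder, which is the flexibility the abstract describes.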