Venue: International Conference on Frontiers in Handwriting Recognition

Zero-Shot Learning Based Approach For Medieval Word Recognition using Deep-Learned Features


Abstract

Historical manuscripts reflect our past. Large quantities of historical handwritten documents are now being digitized and archived all over the world. From these digital repositories, an automatic text indexing and retrieval system fetches only the documents that an end user is interested in. Regular OCR technology cannot provide this service reliably. Instead, a word recognition or word spotting algorithm performs the task. Word recognition systems require enough labelled data per class for training, and all word classes must be taught beforehand. Although word spotting avoids this need for prior training, such systems often carry additional overhead, such as a language model, to deal with "out of lexicon" words. Zero-shot learning is a possible alternative in this situation. A zero-shot learning algorithm can handle unseen classes, provided that it has been equipped during training with rich discriminating features and a reliable "attribute description" per class. Since deeply learned features have enough discriminating power, a deep learning framework is used here for feature extraction. To the best of our knowledge, this is the first work on "out of lexicon" medieval word recognition using a zero-shot learning framework. We obtained very encouraging results (accuracy ≈57% for "out of lexicon" classes) with 166 training classes and 50 unseen test classes.
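The general idea of attribute-based zero-shot recognition can be illustrated with a minimal sketch. This is not the method of the paper, whose details are not given in the abstract. It assumes that each word class is described by a hypothetical character-presence vector (`char_attributes`), that image features are already available as fixed-length vectors (synthetic here, standing in for deep-learned features), and that a plain ridge regression maps features to attribute space. All function names are illustrative.

```python
import numpy as np


def char_attributes(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Hypothetical attribute description: 1.0 if a letter occurs in the word."""
    return np.array([1.0 if ch in word else 0.0 for ch in alphabet])


def fit_feature_to_attribute_map(X, A, reg=1.0):
    """Ridge regression from features to attributes.

    X: (n_samples, n_features) features of seen-class samples.
    A: (n_samples, n_attributes) attribute vector of each sample's class.
    Returns W minimising ||X W - A||^2 + reg * ||W||^2.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ A)


def predict_unseen(X, W, class_attributes):
    """Assign each sample to the unseen class with the most similar attributes.

    Similarity is the cosine between the projected feature X W and each
    row of class_attributes (one row per unseen class).
    """
    P = X @ W
    P = P / (np.linalg.norm(P, axis=1, keepdims=True) + 1e-12)
    C = class_attributes / (
        np.linalg.norm(class_attributes, axis=1, keepdims=True) + 1e-12
    )
    return np.argmax(P @ C.T, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seen = ["rex", "lex", "dux", "pax", "lupus", "deus"]
    unseen = ["lux", "pes"]  # classes never shown during training

    # Synthetic stand-in for deep features: a random linear image of the
    # attribute vector plus a little noise (an assumption for this demo).
    M = rng.normal(size=(26, 64))
    A_seen = np.stack([char_attributes(w) for w in seen])
    X_seen = A_seen @ M + 0.01 * rng.normal(size=(len(seen), 64))

    W = fit_feature_to_attribute_map(X_seen, A_seen, reg=0.1)

    A_unseen = np.stack([char_attributes(w) for w in unseen])
    X_unseen = A_unseen @ M + 0.01 * rng.normal(size=(len(unseen), 64))
    for idx in predict_unseen(X_unseen, W, A_unseen):
        print(unseen[idx])
```

The point of the design is that the unseen classes take part only through their attribute vectors at prediction time, so no labelled images of those classes are needed for training.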
