Conference: International Conference on Natural Language Processing and Chinese Computing

Knowledge Inference Model of OCR Conversion Error Rules Based on Chinese Character Construction Attributes Knowledge Graph


Abstract

OCR is a character conversion method based on image recognition. Character complexity and image quality play a key role in conversion accuracy. OCR conversion errors are irregular, and in certain text scenarios an incorrectly converted word still fits semantically into the context of its original position. In this paper, we propose an inference model for OCR conversion error rules, based on a knowledge graph of Chinese character construction attributes, to analyze and reason about the structure and complexity of Chinese characters. The model integrates several encoding methods, uses a different encoder for each data type to extract features of the entities and relations in the knowledge graph, and uses convolutional neural networks to learn and infer unknown error rules in OCR conversion. In addition, so that the triple feature matrix fully captures the construction attribute information of Chinese characters, we introduce a feature crossover algorithm for feature diffusion over the triple feature matrix. The algorithm crosses the relation matrix with the entity matrix to generate a new feature matrix that better represents a knowledge graph triple. Experimental results show that, compared with current mainstream knowledge inference models, the OCR conversion error rule inference model with the feature crossover algorithm achieves significant improvements in MRR, Hits@1, Hits@2, and other evaluation metrics on public datasets and task-related datasets.
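
The abstract reports results in MRR and Hits@k, the standard ranking metrics for knowledge graph inference. As a reminder of what they measure, here is a minimal sketch using their usual definitions. It is not the paper's evaluation code: the function names and the sample ranks are illustrative assumptions, and each rank is taken to be the 1-based position of the correct entity among the model's scored candidates.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all test triples (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct answer for five test triples.
sample_ranks = [1, 3, 2, 1, 5]

mrr = mean_reciprocal_rank(sample_ranks)
hits1 = hits_at_k(sample_ranks, 1)
hits2 = hits_at_k(sample_ranks, 2)
print(f"MRR={mrr:.4f} Hits@1={hits1:.2f} Hits@2={hits2:.2f}")
```

Higher is better for both metrics. MRR rewards placing the correct answer near the top of the ranking, while Hits@k only counts whether it falls within the first k candidates.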
