Knowledge Inference Model of OCR Conversion Error Rules Based on Chinese Character Construction Attributes Knowledge Graph


Abstract

OCR is a character conversion method based on image recognition. The complexity of the characters and the image quality play a key role in conversion accuracy. OCR conversion errors are irregular, and in certain text scenarios an incorrectly converted word still combines with the context at its original location in a semantically valid way. In this paper, we propose an OCR conversion error rule inference model based on a knowledge graph of Chinese character construction attributes, which analyzes and infers the structure and complexity of Chinese characters. The model integrates a variety of encoding methods, extracts features of entities and relations of different data types in the knowledge graph with different encoders, and uses convolutional neural networks to learn and infer the unknown error rules in OCR conversion. In addition, so that the triple feature matrix fully contains the construction attribute information of the Chinese characters, a feature crossover algorithm for feature diffusion over the triple feature matrix is introduced. In this algorithm, the relation matrix and the entity matrix are crossed to generate a new feature matrix that better represents a triple of the knowledge graph. Experimental results show that, compared with current mainstream knowledge inference models, the OCR conversion error rule inference model incorporating the feature crossover algorithm achieves notable improvements in MRR, Hits@1, Hits@2, and other evaluation metrics on public datasets and task-related datasets.
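For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how MRR and Hits@k are conventionally computed for knowledge graph inference. It is not taken from the paper: the function names and the example ranks are hypothetical, and it assumes each test triple has already been assigned the 1-based rank of its correct entity among all candidates.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all test triples (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct entity for four test triples.
ranks = [1, 2, 4, 1]

mrr = mean_reciprocal_rank(ranks)  # (1 + 1/2 + 1/4 + 1) / 4
hits1 = hits_at_k(ranks, 1)        # two of four ranks are <= 1
hits2 = hits_at_k(ranks, 2)        # three of four ranks are <= 2
```

Higher values are better for both metrics. MRR rewards placing the correct entity near the top of the ranking, while Hits@k only checks whether it falls within the first k candidates.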


Bibliographic details

Source
International Conference on Natural Language Processing and Chinese Computing | 2020 | pp. 415-425 | 11 pages
Authors
Xiaowen Zhang; Hairong Wang; Wenjie Gu
Original format
PDF
Keywords
Knowledge inference; Knowledge graph; OCR; Convolutional neural network; Text error correction


