Proceedings of the IASTED international conferences on informatics

ADAPTIVE GENERIC CLASSIFIER FOR STRUCTURED DOCUMENTS

Abstract

Structured documents such as forms, cheques, and slips are used widely in all sectors and have an inherently high error rate (ERR). This is mainly due to factors such as human inconsistency when filling in documents manually, the different written languages used around the world to enter the required information, and the different structure and layout of each document. In document classification systems, keeping the ERR low is difficult, and finding features that differentiate nearly identical documents is a further challenge. Finding a generic solution for forms in different written languages that also overcomes these obstacles is a major challenge in developing more robust structured document classification systems. In this paper, an adaptive generic document classification engine is proposed. It builds a unique sequence of discrete symbols from the structured document's features and applies a dynamic time warping (DTW) algorithm to calculate the similarity between the symbol sequence of the tested document and the saved symbol sequences of all templates, and then provides the decision. This novel technique of building a sequence of symbols extracted from unique features and using a DTW algorithm to classify the input shows a higher level of robustness with an improved ERR.
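The matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the symbol alphabet, the local cost, or the decision rule, so everything below is an assumption made for illustration: symbols are single characters, the local cost is 0 for equal symbols and 1 otherwise, and the decision is simply the template with the smallest DTW distance. The names `dtw_distance`, `classify`, and the template labels are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def dtw_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Classic DTW between two symbol sequences.

    Assumption: local cost is 0 when symbols match and 1 otherwise;
    the paper's actual cost function is not stated in the abstract.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 0.0 if a[i - 1] == b[j - 1] else 1.0
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also aligned with the previous b symbol
                cost[i][j - 1],      # b[j-1] also aligned with the previous a symbol
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def classify(doc: Sequence[str],
             templates: Dict[str, Sequence[str]]) -> Tuple[str, float]:
    """Return the label and distance of the closest stored template."""
    distances = {name: dtw_distance(doc, seq) for name, seq in templates.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]


# Hypothetical template sequences; real ones would come from document features.
templates = {"cheque": "ABCCD", "slip": "AXYZD"}
label, distance = classify("ABBCD", templates)
```

Because DTW allows one symbol to align with several consecutive symbols in the other sequence, a repeated or stretched feature in the tested document does not by itself increase the distance, which is one plausible reason for choosing it over a position-by-position comparison.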
