Information visualization for document classification

Abstract

This project seeks to combine state-of-the-art information visualization techniques with the Cannon Quality Factors for text images in order to characterize and discriminate among text documents and their digital images. It will provide an effective tool for characterizing and managing a test corpus of more than 1,200 documents. The basic concept is that, once the documents have been characterized by their Cannon Quality Factors, it should be possible to visually identify regions of the OCR Test Corpus that differ in expected OCR accuracy and in degree of OCR difficulty. We have been working with an information visualization tool (dubbed "Parentage") to identify the appropriate metric data for these purposes. Two important potential applications of this work are (1) identifying new research directions for OCR development and (2) identifying the most appropriate commercial OCR system or engine to use with a given set of documents.
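The abstract does not say how regions of expected OCR accuracy are derived from the Cannon Quality Factors. As a minimal sketch, assuming each document already has numeric quality-factor scores and a measured OCR accuracy, one could bin the corpus over two chosen factors and average the accuracy in each cell. The function accuracy_grid, the factor names "touching" and "broken", and the sample values are all hypothetical, and grid binning is a stand-in technique, not the project's method or the behavior of the Parentage tool.

```python
from collections import defaultdict
from statistics import mean


def accuracy_grid(docs, x_factor, y_factor, bins=4):
    """Group documents into a bins x bins grid over two quality factors and
    return the mean measured OCR accuracy of each non-empty cell.

    Each document is a dict holding numeric quality-factor scores plus an
    "ocr_accuracy" value (all key names here are hypothetical).
    """
    if not docs:
        return {}
    x_lo = min(d[x_factor] for d in docs)
    x_hi = max(d[x_factor] for d in docs)
    y_lo = min(d[y_factor] for d in docs)
    y_hi = max(d[y_factor] for d in docs)

    def to_bin(value, lo, hi):
        # Map value linearly onto 0..bins-1; a constant factor falls in bin 0.
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    cells = defaultdict(list)
    for d in docs:
        cell = (to_bin(d[x_factor], x_lo, x_hi), to_bin(d[y_factor], y_lo, y_hi))
        cells[cell].append(d["ocr_accuracy"])
    return {cell: mean(scores) for cell, scores in cells.items()}


# Hypothetical corpus: two made-up quality-factor scores and a measured OCR
# accuracy per document (not data from the project).
docs = [
    {"touching": 0.0, "broken": 0.0, "ocr_accuracy": 0.99},
    {"touching": 0.1, "broken": 0.1, "ocr_accuracy": 0.97},
    {"touching": 0.9, "broken": 0.2, "ocr_accuracy": 0.80},
    {"touching": 1.0, "broken": 1.0, "ocr_accuracy": 0.60},
]

for cell, acc in sorted(accuracy_grid(docs, "touching", "broken", bins=2).items()):
    print(cell, round(acc, 2))
```

Binning was chosen here only because it needs nothing beyond the standard library; coloring each cell by its mean accuracy yields a simple heat map that any plotting library could render.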