Published in: International Journal on Document Analysis and Recognition

An experimental evaluation of OCR text representations for learning document classifiers



Abstract

In the literature, many feature types have been proposed for document classification. However, an extensive and systematic evaluation of the various approaches has not yet been carried out, and evaluations on OCR documents in particular are very rare. In this paper we investigate seven text representations based on n-grams and single words. We compare their effectiveness in classifying OCR texts and the corresponding correct ASCII texts in two domains: business letters and abstracts of technical reports. Our results indicate that n-grams are an attractive technique that can even compete with techniques relying on morphological analysis. This holds for OCR texts as well as for correct ASCII texts.
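The abstract does not spell out the seven representations, so the following is only a minimal sketch of the general contrast between single-word features and n-gram features. It assumes whitespace tokenization, space-padded character trigrams and a simple multiset-overlap score. These choices, along with the helper names `word_features`, `char_ngram_features`, `overlap` and the example sentence, are hypothetical and not taken from the paper. It shows why n-grams can degrade more gracefully on OCR output: a single misrecognized character destroys a whole word feature but only a few of that word's trigrams.

```python
from collections import Counter


def word_features(text: str) -> Counter:
    """Bag of lowercase single words (hypothetical whitespace tokenization)."""
    return Counter(text.lower().split())


def char_ngram_features(text: str, n: int = 3) -> Counter:
    """Bag of overlapping character n-grams, each word padded with spaces."""
    grams = Counter()
    for word in text.lower().split():
        padded = f" {word} "
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def overlap(a: Counter, b: Counter) -> float:
    """Fraction of a's feature counts also present in b (multiset intersection)."""
    total = sum(a.values())
    return sum((a & b).values()) / total if total else 0.0


# Hypothetical example: OCR misreads the 'l' in "delivery" as the digit '1'.
clean = "invoice for the delivery of spare parts"
ocr = "invoice for the de1ivery of spare parts"

print(round(overlap(word_features(clean), word_features(ocr)), 3))              # 0.857
print(round(overlap(char_ngram_features(clean), char_ngram_features(ocr)), 3))  # 0.909
```

Here the word bag loses one of seven features (6/7 survive), while the trigram bag loses only three of 33 (30/33 survive), because the trigrams of "delivery" that do not touch the corrupted character still match.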
