On-line Handwritten Text Categorization

Abstract

As new devices that accept or produce on-line documents emerge, facilities for managing such documents, such as topic spotting, are required. This means that we should be able to perform text categorization of on-line documents. The textual data in on-line documents can be extracted through on-line recognition, a process that introduces noise (i.e., errors) into the resulting text. This work reports experiments on the categorization of on-line handwritten documents based on their textual content. We analyze the effect of the word recognition rate on categorization performance by comparing the performance of a categorization system on texts obtained through on-line handwriting recognition with its performance on the same texts available as ground truth. Two categorization algorithms (kNN and SVM) are compared. A subset of the Reuters-21578 corpus, consisting of more than 2000 handwritten documents, was collected for this study. Results show that the loss in accuracy is not significant, and that the loss in precision is significant only for recall values of 60%-80%, depending on the noise level.
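The comparison described in the abstract (the same classifier evaluated on ground-truth text and on noisy recognized text) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state the features or similarity measure, so the sketch uses raw term counts with cosine similarity for kNN; the `corrupt` function is a hypothetical stand-in for a handwriting recognizer that replaces words with a junk token; and the toy documents are invented, not taken from Reuters-21578. The SVM side of the comparison is not shown.

```python
import math
import random
from collections import Counter


def bag_of_words(text):
    """Term-frequency vector of a whitespace-tokenized, lowercased text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, text, k=3):
    """Majority label among the k training documents most similar to text."""
    query = bag_of_words(text)
    ranked = sorted(train, key=lambda d: cosine(bag_of_words(d[0]), query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def corrupt(text, error_rate, rng):
    """Hypothetical noise model: each word is misrecognized (replaced by a
    junk token) with probability error_rate."""
    return " ".join("<err>" if rng.random() < error_rate else w
                    for w in text.split())


def word_recognition_rate(truth, recognized):
    """Fraction of ground-truth words reproduced at the same position."""
    t, r = truth.split(), recognized.split()
    return sum(a == b for a, b in zip(t, r)) / len(t)


def accuracy(train, test):
    """Share of test documents assigned their true label."""
    return sum(knn_predict(train, x) == y for x, y in test) / len(test)


# Invented toy corpus (illustrative only).
TRAIN = [
    ("wheat harvest export tonnes", "grain"),
    ("corn wheat crop shipment", "grain"),
    ("grain crop export tonnes", "grain"),
    ("oil barrel price opec", "crude"),
    ("crude oil output barrel", "crude"),
    ("opec crude price output", "crude"),
]
TEST = [
    ("wheat crop tonnes shipment", "grain"),
    ("crude barrel opec output", "crude"),
]

if __name__ == "__main__":
    rng = random.Random(0)
    for rate in (0.0, 0.25, 0.5):
        noisy = [(corrupt(x, rate, rng), y) for x, y in TEST]
        wrr = sum(word_recognition_rate(t[0], n[0])
                  for t, n in zip(TEST, noisy)) / len(TEST)
        print(f"error rate {rate:.2f}: word recognition rate {wrr:.2f}, "
              f"accuracy {accuracy(TRAIN, noisy):.2f}")
```

Running the script prints the word recognition rate and the categorization accuracy at several simulated noise levels, which mirrors the kind of analysis the abstract describes, on a far smaller scale and with a synthetic noise model.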
