Conference: Document Recognition and Retrieval XVI

On-line Handwritten Text Categorization


Abstract

As new devices that accept or produce on-line documents emerge, facilities for managing such documents, such as topic spotting, are required. This means that we should be able to perform text categorization of on-line documents. The textual data in on-line documents can be extracted through on-line recognition, a process that introduces noise, i.e. errors, into the resulting text. This work reports experiments on the categorization of on-line handwritten documents based on their textual content. We analyze the effect of the word recognition rate on categorization performance by comparing the performance of a categorization system on texts obtained through on-line handwriting recognition with its performance on the same texts available as ground truth. Two categorization algorithms (kNN and SVM) are compared. A subset of the Reuters-21578 corpus consisting of more than 2000 handwritten documents was collected for this study. Results show that the accuracy loss is not significant, and that the precision loss is significant only for recall values of 60%-80%, depending on the noise level.
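
The comparison described in the abstract (categorizing the same texts once as ground truth and once with recognition errors) can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the authors' system: the toy documents and labels are invented, the noise is simulated by replacing words with a junk token at a fixed rate instead of running a handwriting recognizer, and only a plain cosine-similarity kNN is shown (the SVM is omitted). The names `corrupt`, `knn_predict`, and `accuracy` are hypothetical helpers.

```python
import math
import random
from collections import Counter

def bag_of_words(text):
    """Lowercased term-frequency vector for a whitespace-tokenized text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def knn_predict(train, text, k=3):
    """Label `text` by majority vote among its k most similar training documents."""
    query = bag_of_words(text)
    ranked = sorted(train,
                    key=lambda item: cosine(bag_of_words(item[0]), query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def corrupt(text, word_error_rate, rng):
    """Simulated recognition noise (an assumption): each word is replaced
    by a junk token with probability `word_error_rate`."""
    return " ".join("<err>" if rng.random() < word_error_rate else w
                    for w in text.split())

def accuracy(train, test, word_error_rate=0.0, seed=0):
    """Fraction of test documents labeled correctly after corrupting them."""
    rng = random.Random(seed)
    hits = sum(knn_predict(train, corrupt(text, word_error_rate, rng)) == label
               for text, label in test)
    return hits / len(test)

# Invented toy corpus with two topics; not taken from Reuters-21578.
TRAIN = [
    ("wheat corn harvest crop export", "grain"),
    ("corn crop yield farm wheat", "grain"),
    ("grain shipment wheat tonnes harvest", "grain"),
    ("oil crude barrel opec price", "crude"),
    ("crude oil output barrel refinery", "crude"),
    ("opec oil price cut production", "crude"),
]
TEST = [
    ("wheat harvest tonnes export", "grain"),
    ("crude barrel price opec", "crude"),
]

if __name__ == "__main__":
    # Compare ground-truth input (rate 0.0) with increasingly noisy input.
    for rate in (0.0, 0.2, 0.4):
        print(f"word error rate {rate:.1f}: accuracy {accuracy(TRAIN, TEST, rate):.2f}")
```

Sweeping the error rate in this way mirrors the shape of the experiment only; measuring precision at fixed recall levels, as the abstract reports, would additionally require ranked per-category scores.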
