A statistical refinement method for word shape token querying of document images

Abstract

Word Shape Tokens (WSTs) are tokens that represent words by the overall shape or contour of a word as it appears in printed text. A character shape code (CSC) mapping function aggregates similarly shaped letters, such as "g" and "y", into a single code that represents all of them. The rationale is that it is far easier and more accurate to map a scanned image of a word or letter to its WST representation than to its full ASCII representation. In previous work we showed that user-mediated selection of WSTs for querying document images improved system performance. In the work reported here we use a statistically derived dataset to help determine whether a particular WST from a scanned document image actually matches a query term WST. We do this by comparing the preceding and following WSTs of each WST in a document against previously collected frequency data for a large set of WST occurrences.
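The abstract combines two ideas. The first collapses letters into shape codes. The second checks a candidate match against the WSTs that usually surround the query term. Both can be sketched in a few lines of Python.

This is a minimal sketch under stated assumptions, not the authors' method:

- The shape classes are an assumed, simplified CSC mapping. `A` covers tall characters, `g` covers descenders, `x` covers x-height letters, and `i`/`j` are kept separate.
- The functions `context_profile`, `context_score` and `refined_matches` are hypothetical.
- The averaged relative-frequency score and the `threshold` parameter are also hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter

# Hypothetical, simplified CSC classes (an assumption, not the paper's mapping).
ASCENDERS = set("bdfhklt") | set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
DESCENDERS = set("gpqy")


def char_shape_code(ch):
    """Map one character to a coarse shape class."""
    if ch in ASCENDERS:
        return "A"  # tall: lowercase ascenders, capitals, digits
    if ch in DESCENDERS:
        return "g"  # drops below the baseline, so "g" and "y" share a code
    if ch in "ij":
        return ch   # dotted letters keep their own codes
    if ch.isalpha():
        return "x"  # remaining x-height letters such as a, c, e, m, n, o
    return ch       # punctuation passes through unchanged


def word_shape_token(word):
    """Build a word shape token (WST) by concatenating per-character codes."""
    return "".join(char_shape_code(ch) for ch in word)


def context_profile(reference_words, query):
    """Count the WSTs seen immediately before and after `query` in an ASCII corpus."""
    before, after = Counter(), Counter()
    for i, word in enumerate(reference_words):
        if word != query:
            continue
        if i > 0:
            before[word_shape_token(reference_words[i - 1])] += 1
        if i + 1 < len(reference_words):
            after[word_shape_token(reference_words[i + 1])] += 1
    return before, after


def context_score(doc_wsts, i, before, after):
    """Average relative frequency of the WSTs neighbouring position i (hypothetical score)."""
    parts = []
    if i > 0 and before:
        parts.append(before[doc_wsts[i - 1]] / sum(before.values()))
    if i + 1 < len(doc_wsts) and after:
        parts.append(after[doc_wsts[i + 1]] / sum(after.values()))
    return sum(parts) / len(parts) if parts else 0.0


def refined_matches(doc_wsts, reference_words, query, threshold=0.1):
    """Keep WST matches for `query` whose neighbours resemble the query's usual context."""
    target = word_shape_token(query)
    before, after = context_profile(reference_words, query)
    return [
        i for i, wst in enumerate(doc_wsts)
        if wst == target and context_score(doc_wsts, i, before, after) >= threshold
    ]


if __name__ == "__main__":
    reference = "the cat sat on the mat the cat ran".split()
    doc = [word_shape_token(w) for w in "the cat ran and a mat on".split()]
    print(refined_matches(doc, reference, "cat"))  # → [1]
```

In the toy example, "cat", "and" and "mat" all map to the WST `xxA`, so a pure shape match returns three hits. Only the first hit has neighbours (`AAx` before, `xxx` after) that occurred around "cat" in the reference text, so the refinement keeps position 1 alone. A realistic setup would likely need a much larger reference corpus and some smoothing for neighbours never seen in it.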