Conference on Document Recognition and Retrieval XI; Jan 21-22, 2004; San Jose, California, USA

Style-Independent Document Labeling: Design and Performance Evaluation



Abstract

The Medical Article Records System (MARS) was developed at the U.S. National Library of Medicine (NLM) to automate the entry of bibliographic information from medical journals into MEDLINE®, NLM's premier bibliographic citation database. Currently, a rule-based algorithm called ZoneCzar labels the important bibliographic fields (title, author, affiliation, and abstract) on page images of medical journal articles. Rules exist for medical journals with regular layout types, but new rules must be created manually for any input journal with an arbitrary or new layout type. It is therefore of interest to label journal articles independently of their layout styles. In this paper, we first describe a system called ZoneMatch that automatically generates crucial geometric and non-geometric features of the important bibliographic fields using string-matching and clustering techniques. The rule-based algorithm is then modified to use these features to perform style-independent labeling. We then describe a performance evaluation method for quantitatively evaluating our algorithm and characterizing its error distributions. Experimental results show that the labeling performance of the rule-based algorithm improves significantly when the generated features are used.
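The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of using string matching to label OCR zones and then reading geometric features off the labeled zones. It is not ZoneMatch itself: the `Zone` structure, the `match_zones` and `geometric_features` helpers, the 0.8 similarity threshold, and the sample data are all hypothetical, and the clustering step mentioned in the abstract is omitted.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Zone:
    """One OCR text zone on a page image (hypothetical structure)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_zones(zones, reference, threshold=0.8):
    """Map each reference field to its best-matching zone, if similar enough.

    `reference` holds known field text (for example, from an existing
    citation record); the threshold value is an assumption.
    """
    labels = {}
    for field_name, ref_text in reference.items():
        best = max(zones, key=lambda z: similarity(z.text, ref_text), default=None)
        if best is not None and similarity(best.text, ref_text) >= threshold:
            labels[field_name] = best
    return labels


def geometric_features(zone, page_width, page_height):
    """Position and size of a zone, normalized by the page dimensions."""
    return {
        "x": zone.left / page_width,
        "y": zone.top / page_height,
        "w": zone.width / page_width,
        "h": zone.height / page_height,
    }


# Made-up sample data for illustration only.
zones = [
    Zone("Style-Independent Document Labeling", 100, 80, 800, 60),
    Zone("J. Smith and A. Jones", 100, 160, 400, 30),
    Zone("Page 12", 450, 1500, 100, 20),
]
reference = {
    "title": "Style-independent document labeling",
    "author": "J. Smith, A. Jones",
}

labels = match_zones(zones, reference)
features = {name: geometric_features(z, 1000, 1600) for name, z in labels.items()}
```

In a sketch like this, features collected over many pages of one journal could then be grouped (for instance by clustering) to describe where each field typically appears, which is the kind of input a rule-based labeler might consume.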
