A Domain Meta-wrapper Using Seeds for Intelligent Author List Extraction in the Domain of Scholarly Articles

Conference: International Conference on Theory and Practice of Digital Libraries

Abstract

In this paper we investigate the automated extraction of author lists in the domain of scientific digital libraries. Given a list of known "seed" authors, we aim to extract complete lists of co-authors from Web pages in arbitrary formats. We adopt a methodology that embeds domain knowledge in a unique "meta-wrapper", which requires no training, has negligible maintenance costs, and combines several extraction techniques. These techniques are applied at the structural level, the character level, and the annotation level. We describe the methodology, illustrate our tool, compare it with known approaches, and measure the accuracy of our techniques experimentally.
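
To make the seed idea concrete, here is a minimal sketch of how a known author name might anchor extraction at the character level and the structural level. It is not the paper's meta-wrapper: the names `TextChunks`, `split_names`, and `extract_authors` are hypothetical, the delimiter set and the "same tag path" heuristic are assumptions made for illustration, and the annotation level is not covered.

```python
import re
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TextChunks(HTMLParser):
    """Collect (tag_path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            self.chunks.append((tuple(self.stack), text))


def split_names(text):
    """Character level (assumed delimiters): commas, semicolons, '&', 'and'."""
    parts = re.split(r"\s*(?:,|;|&|\band\b)\s*", text)
    return [p.strip() for p in parts if p.strip()]


def extract_authors(html, seed):
    """Return the author list around the first text node containing `seed`."""
    parser = TextChunks()
    parser.feed(html)
    parser.close()
    chunks = parser.chunks
    for i, (path, text) in enumerate(chunks):
        if seed not in text:
            continue
        names = split_names(text)
        if len(names) > 1:
            # The seed shares one text node with other names: a flat list.
            return names
        # Structural level: gather adjacent text nodes with the same tag path.
        lo = i
        while lo > 0 and chunks[lo - 1][0] == path:
            lo -= 1
        hi = i
        while hi + 1 < len(chunks) and chunks[hi + 1][0] == path:
            hi += 1
        return [t for _, t in chunks[lo:hi + 1]]
    return []


flat_page = '<div><h1>Title</h1><p>Ann Lee, Bo Chen and Carla Diaz</p></div>'
list_page = ('<ul><li><a href="#">Ann Lee</a></li>'
             '<li><a href="#">Bo Chen</a></li>'
             '<li><a href="#">Carla Diaz</a></li></ul><p>Abstract text</p>')
print(extract_authors(flat_page, "Bo Chen"))
print(extract_authors(list_page, "Ann Lee"))
```

The sketch needs no training data: the seed alone locates the relevant region, and the page layout decides whether the names are split by delimiters or gathered from sibling elements. A real tool would have to handle many more layouts and name formats than these two toy pages.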