
Automatic term list generation for entity tagging.

       

Abstract

MOTIVATION: Many entity taggers and information extraction systems make use of term lists for entities such as people, places, genes or chemicals. These lists have traditionally been constructed manually. We show that distributional clustering methods, which group words based on the contexts in which they appear (including neighboring words and syntactic relations extracted using a shallow parser), can be used to aid in the construction of term lists.

RESULTS: Experiments on learning term lists and using them as part of a gene tagger on a corpus of abstracts from the scientific literature show that our automatically generated term lists significantly boost the precision of a state-of-the-art CRF-based gene tagger, to a degree that is competitive with using hand-curated lists, and boost recall to a degree that surpasses that of the hand-curated lists. Our results also show that these distributional clustering methods do not generate lists as helpful as those generated by supervised techniques, but that they can be used to complement supervised techniques so as to obtain better performance.

AVAILABILITY: The code used in this paper is available from http://www.cis.upenn.edu/datamining/software_dist/autoterm/
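The idea of grouping words by the contexts they appear in can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy corpus, uses only immediate neighboring words as context (no shallow-parser relations), and swaps in a simple greedy "leader" clustering with cosine similarity in place of whatever clustering method the paper uses. All function names, the threshold, and the example sentences are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=1):
    """Map each word to counts of its neighboring words, tagged by side."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for offset in range(-window, window + 1):
                j = i + offset
                if offset != 0 and 0 <= j < len(tokens):
                    side = "L" if offset < 0 else "R"
                    vectors[word][(side, tokens[j])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[feature] for feature, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(vectors, threshold=0.9):
    """Greedy clustering: a word joins the first cluster whose first member
    (the 'leader') has a similar context vector, else it starts a new cluster."""
    clusters = []
    for word in sorted(vectors):
        for cluster in clusters:
            if cosine(vectors[word], vectors[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def term_list_for(seed, clusters):
    """Return the cluster containing a known seed term, as a candidate term list."""
    for cluster in clusters:
        if seed in cluster:
            return cluster
    return []


# Toy corpus (hypothetical): gene names share the context "the _ gene".
sentences = [
    "the BRCA1 gene is expressed".split(),
    "the TP53 gene is expressed".split(),
    "the MYC gene is mutated".split(),
    "patients in Boston were treated".split(),
    "patients in Paris were treated".split(),
]

clusters = leader_cluster(context_vectors(sentences))
gene_like = term_list_for("BRCA1", clusters)
print(gene_like)
```

In a tagger, membership in such a cluster could then serve as a binary feature for each token, in the same way a hand-curated list would. On real text the context vectors are far noisier than in this toy corpus, so the threshold and the choice of context features would need tuning.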
