Massive Biomedical Term Discovery

Abstract

Most technical and scientific terms consist of complex, multiword noun phrases, but not all noun phrases are technical or scientific terms. Specific terminology can be distinguished from common, non-specific noun phrases on the basis of the observation that terms exhibit a much lower degree of distributional variation than non-specific noun phrases. We formalize this limited paradigmatic modifiability of terms and then test the corresponding algorithm on bigram, trigram and quadgram noun phrases extracted from a 104-million-word biomedical text corpus. Using an existing, community-wide curated biomedical terminology as the gold standard for evaluation, we show that our algorithm significantly outperforms standard term identification measures and therefore qualifies as a high-performing building block for any terminology identification system.
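The abstract does not give the formula behind "limited paradigmatic modifiability". The sketch below shows one plausible way to operationalize the idea. It is an assumption and not necessarily the paper's exact measure. For every choice of `k` token positions that are allowed to vary, it divides the n-gram's frequency by the summed frequency of all same-length n-grams that agree with it on the remaining positions, and it then multiplies these ratios. A phrase whose slots are rarely filled by other words scores close to 1, and a phrase whose slots vary freely scores close to 0. The function name `p_mod`, the parameter `k`, and the toy counts are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import prod


def p_mod(ngram, counts, k=1):
    """Hypothetical modifiability score for an n-gram (a tuple of tokens).

    Assumed formulation, not necessarily the paper's exact measure: for each
    choice of k positions allowed to vary, divide the n-gram's own frequency
    by the total frequency of all same-length n-grams that share its tokens
    at the other positions, then multiply these ratios together.
    """
    n = len(ngram)
    f = counts[ngram]  # Counter returns 0 for unseen n-grams
    if f == 0:
        return 0.0
    ratios = []
    for varied in combinations(range(n), k):
        fixed = [i for i in range(n) if i not in varied]
        # Frequency of the "paradigm": every n-gram sharing the fixed tokens.
        # It always includes the n-gram itself, so it is never zero here.
        paradigm = sum(
            c for g, c in counts.items()
            if len(g) == n and all(g[i] == ngram[i] for i in fixed)
        )
        ratios.append(f / paradigm)
    return prod(ratios)


# Hypothetical toy bigram counts, not taken from the paper's corpus.
counts = Counter({
    ("lymph", "node"): 40,
    ("lymph", "vessel"): 10,
    ("sentinel", "node"): 10,
    ("large", "tumor"): 5,
    ("large", "sample"): 20,
    ("large", "dose"): 25,
    ("primary", "tumor"): 15,
    ("small", "tumor"): 10,
})

for candidate in [("lymph", "node"), ("large", "tumor")]:
    print(" ".join(candidate), round(p_mod(candidate, counts), 3))
```

With these toy counts, "lymph node" scores 0.64 and "large tumor" scores about 0.017, so the term-like phrase ranks far higher. The linear scan over all counts is there only for readability. On a corpus the size of the 104-million-word one described above, you would instead index n-grams by their fixed-position patterns.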