Conference: IEEE International Conference on Tools with Artificial Intelligence

Expanding Science and Technology Thesauri from Bibliographic Datasets Using Word Embedding

Abstract

The use of thesauri and taxonomies for science and technology information in scientometrics has been attracting attention. However, manual construction and maintenance of thesauri is expensive and time-consuming; thus, methods for semi-automatic construction and maintenance are being actively studied. We propose a method to expand an existing thesaurus using the abstracts of articles from state-of-the-art technological domains with limited structured information. Specifically, we consider a method for properly allocating new terms to the hierarchical structure of an existing thesaurus using word embedding, a rapidly evolving technique. In an experiment, 500-dimensional word vectors are constructed from 567,000 biomedical articles and are clustered after dimensionality reduction using principal component analysis. Then, semantic relations are estimated based on the spatial relations between a new term and any of the terms in the thesaurus. We then compared the results with those obtained from three experts. In the future, we will develop a recommendation system for new terms related to existing terms to support semi-automatic thesaurus maintenance.
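
The allocation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "spatial relations" can be approximated by cosine similarity, and that a new term is a candidate sibling of its nearest thesaurus term. The toy 3-dimensional vectors stand in for the reduced word vectors, and the names `cosine`, `suggest_placement`, `term_vectors`, and `parent_of` are hypothetical. Embedding training, principal component analysis, and clustering are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def suggest_placement(new_vec, term_vectors, parent_of, top_k=2):
    """Rank existing thesaurus terms by similarity to a new term.

    Returns up to top_k tuples (nearest_term, parent_of_nearest, similarity).
    Assumption: the parent of the nearest term is a candidate broader term
    for the new term. The abstract does not specify the actual rule.
    """
    scored = [(term, cosine(new_vec, vec)) for term, vec in term_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(term, parent_of.get(term), sim) for term, sim in scored[:top_k]]


# Toy data (invented for illustration, not from the paper).
term_vectors = {
    "carcinoma": [0.9, 0.1, 0.0],
    "neoplasm": [0.8, 0.3, 0.1],
    "antibiotic": [0.0, 0.2, 0.9],
}
parent_of = {
    "carcinoma": "neoplasm",
    "neoplasm": "disease",
    "antibiotic": "drug",
}

# Hypothetical vector for a new term that is not yet in the thesaurus.
new_term_vector = [0.85, 0.15, 0.05]

suggestions = suggest_placement(new_term_vector, term_vectors, parent_of)
for term, parent, sim in suggestions:
    print(f"near '{term}' (broader term: {parent}), similarity {sim:.3f}")
```

In this toy setup the new term lands closest to "carcinoma", so "neoplasm" would be suggested as a candidate broader term for a human editor to confirm, in line with the semi-automatic maintenance the abstract describes.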