Conference: International Conference on Applications of Natural Language to Information Systems

Technical Term Recognition with Semi-supervised Learning Using Hierarchical Bayesian Language Models



Abstract

Recognizing technical terms requires term dictionaries or tagged corpora, but compiling them is costly. Moreover, a term may have several surface representations and new terms are continually coined, which complicates the problem further: simply building a dictionary cannot solve it. In this research, to reduce the cost of creating dictionaries, we aimed to build a system that learns to recognize terminology from a small tagged corpus using semi-supervised learning. We approached the problem by combining a tag-level language model and a character-level language model based on HPYLM. We performed experiments on the recognition of biomedical terms. In supervised learning, we achieved an F-measure of 65%, which is 8 percentage points behind the best existing system, which relies on many domain-specific heuristics. In semi-supervised learning, our method maintained accuracy better than existing methods as the amount of supervised data was reduced.
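
HPYLM stands for hierarchical Pitman-Yor language model. As a rough illustration of the kind of smoothing such a model performs at the character level, the sketch below computes the predictive probability at a single level of the hierarchy, using the standard Chinese restaurant formula with a discount and a strength parameter. It is not the authors' implementation: the function name, the parameter values, the example string, and the one-table-per-symbol simplification are all assumptions made for illustration.

```python
from collections import Counter


def pitman_yor_prob(symbol, counts, tables, base_prob, discount=0.5, strength=1.0):
    """Predictive probability of `symbol` at one level of a Pitman-Yor hierarchy.

    counts[s]  number of observations ("customers") of symbol s in this context
    tables[s]  number of tables serving s (assumed given; a full sampler infers it)
    base_prob  probability of `symbol` under the parent (backoff) distribution
    """
    total_customers = sum(counts.values())
    total_tables = sum(tables.values())
    if total_customers == 0:
        # Nothing observed in this context: fall back entirely on the parent.
        return base_prob
    # Discounted count mass for the symbol itself.
    seen = max(counts.get(symbol, 0) - discount * tables.get(symbol, 0), 0.0)
    # Mass handed to the parent distribution.
    backoff = strength + discount * total_tables
    return (seen + backoff * base_prob) / (strength + total_customers)


# Hypothetical usage: characters observed in one term, with a uniform base
# distribution over 26 lowercase letters standing in for the top of the hierarchy.
alphabet = "abcdefghijklmnopqrstuvwxyz"
counts = Counter("kinase")
tables = Counter(set("kinase"))  # simplification: one table per distinct symbol
uniform = 1.0 / len(alphabet)

for ch in "kz":
    print(ch, pitman_yor_prob(ch, counts, tables, uniform))
```

In a full hierarchical model the `base_prob` argument would itself come from the same formula applied to a shorter context, so that unseen characters or tags still receive probability mass from the levels above.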
