Conference: International Conference on Text, Speech and Dialogue

Semantic Splitting of German Medical Compounds

Abstract

Compounding is widespread in highly inflectional languages, where a quarter of all nouns are created by composition. In our field of study, German medical language, the share of compounds is markedly higher, at 64 %. Their correct splitting is therefore a high-impact preprocessing step for any NLP-based application. In this work we address two challenges of medical decomposition. First, we take unknown constituents into account in order to split compounds that have so far not been recognized as such. Second, our approach builds on the corpus-based approach of Koehn and Knight and adds semantic knowledge from domain ontologies to increase accuracy when disambiguating among the possible splits. Using this first-of-a-kind semantic approach in a study on the decomposition of German medical compounds, we outperform existing approaches by a wide margin.
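
To make the disambiguation step more concrete, here is a minimal Python sketch. It assumes the frequency-based ranking usually attributed to Koehn and Knight (score each candidate split by the geometric mean of its parts' corpus frequencies) and adds a purely hypothetical ontology re-ranking. The abstract does not say how the semantic knowledge is actually combined, so `ONTOLOGY_TERMS`, `coverage`, and the `use_ontology` flag are illustrative assumptions, as are all counts in `FREQ`. Handling of unknown constituents is not modelled here.

```python
from math import prod

# Toy corpus counts; the numbers are invented for illustration only.
FREQ = {
    "magen": 120, "schmerzen": 250,
    "harn": 100, "stoff": 500, "harnstoff": 40,
    "schwangerschaft": 60, "test": 200,
}
# Hypothetical stand-in for concepts found in a domain ontology.
ONTOLOGY_TERMS = {"magen", "schmerzen", "harn", "harnstoff", "schwangerschaft"}

LINKERS = ("", "s", "es")  # linking elements allowed after a non-final part
MIN_LEN = 3                # shortest constituent considered


def candidate_splits(word):
    """Yield every segmentation of `word` into parts known to FREQ."""
    def rec(rest):
        if rest in FREQ:
            yield [rest]  # keeping the remainder whole is also a candidate
        for i in range(MIN_LEN, len(rest) - MIN_LEN + 1):
            head = rest[:i]
            for link in LINKERS:
                if not head.endswith(link):
                    continue
                stem = head[:len(head) - len(link)]
                if len(stem) >= MIN_LEN and stem in FREQ:
                    for tail in rec(rest[i:]):
                        yield [stem] + tail
    yield from rec(word.lower())


def geo_mean(parts):
    """Geometric mean of the parts' corpus frequencies."""
    return prod(FREQ[p] for p in parts) ** (1.0 / len(parts))


def coverage(parts):
    """Fraction of parts that are ontology concepts (assumed heuristic)."""
    return sum(p in ONTOLOGY_TERMS for p in parts) / len(parts)


def best_split(word, use_ontology=False):
    cands = list(candidate_splits(word))
    if not cands:
        return [word.lower()]  # nothing known: leave the word unsplit
    if use_ontology:
        # Rank by ontology coverage first, corpus frequency second.
        return max(cands, key=lambda p: (coverage(p), geo_mean(p)))
    return max(cands, key=geo_mean)


print(best_split("Magenschmerzen"))
print(best_split("Schwangerschaftstest"))
print(best_split("Harnstoff"))
print(best_split("Harnstoff", use_ontology=True))
```

With these toy counts, the frequency-only ranking splits "Harnstoff" (urea) into "harn" + "stoff", because both parts are individually frequent. The assumed ontology re-ranking keeps the lexicalized term whole. This is the kind of ambiguity that semantic knowledge is meant to resolve, although the concrete scoring in the paper may differ.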
