Journal: BMC Medical Informatics and Decision Making

Creating a medical dictionary using word alignment: The influence of sources and resources

Abstract

Background: Automatic word alignment of parallel texts, that is, texts with the same content in different languages, is used among other things to generate dictionaries for new translations. The quality of the generated word alignment depends on the quality of the input resources. In this paper we report on automatic word alignment of the English and Swedish versions of the medical terminology systems ICD-10, ICF, NCSP, KSH97-P and parts of MeSH, and on how the terminology systems and the types of resources influence the quality.

Methods: We automatically word-aligned the terminology systems using static resources, such as dictionaries; statistical resources, such as statistically derived dictionaries; and training resources, which were generated from manual word alignment. We varied which parts of the terminology systems we used to generate the resources, which parts we word-aligned, and which types of resources we used in the alignment process, in order to explore the influence of the different terminology systems and resources on recall and precision. After the analysis, we used the best configuration of the automatic word alignment to generate candidate term pairs. We then manually verified the candidate term pairs and included the correct pairs in an English-Swedish dictionary.

Results: The results indicate that more resources and more resource types give better results, but the size of the parts used to generate the resources only partly affects the quality. The most generally useful resources were generated from ICD-10, and resources generated from MeSH were not as general as the other resources. Systematic inter-language differences in the structure of the terminology system rubrics make the rubrics harder to align. Manually created training resources give nearly as good results as a union of static, statistical and training resources, and noticeably better results than a union of static and statistical resources. The verified English-Swedish dictionary contains 24,000 term pairs in base forms.

Conclusion: More resources give better results in the automatic word alignment, but some resources give only small improvements. The most important type of resource is training resources, and the most general resources were generated from ICD-10.
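The abstract compares resource configurations by recall and precision without giving formulas. As a rough illustration only, the sketch below scores predicted alignment links against a manually aligned gold standard. The function `precision_recall`, the toy word pairs and the configuration names are all hypothetical and not taken from the paper. Treating each resource type as a set of proposed links that can be unioned is a simplifying assumption, not the authors' alignment method.

```python
from typing import Dict, Set, Tuple

# An alignment link pairs an English token with a Swedish token.
Link = Tuple[str, str]


def precision_recall(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted links against gold links."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data; a real gold standard would come from manual alignment.
gold: Set[Link] = {
    ("heart", "hjärta"),
    ("disease", "sjukdom"),
    ("chronic", "kronisk"),
    ("kidney", "njure"),
}

# Assumption: each resource type is modelled as the set of links it proposes.
static_links: Set[Link] = {("heart", "hjärta"), ("disease", "sjukdom")}
statistical_links: Set[Link] = {("disease", "sjukdom"), ("chronic", "njure")}
training_links: Set[Link] = {("chronic", "kronisk"), ("kidney", "njure")}

# Configurations to compare, mirroring the unions mentioned in the abstract.
configurations: Dict[str, Set[Link]] = {
    "static": static_links,
    "static + statistical": static_links | statistical_links,
    "training": training_links,
    "static + statistical + training": static_links | statistical_links | training_links,
}

for name, links in configurations.items():
    p, r = precision_recall(links, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Representing links as sets keeps the union of resources and the overlap with the gold standard to single set operations, which is convenient when many configurations are compared.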
