Conference: Annual Meeting of the Association for Computational Linguistics

On Learning Better Word Embeddings from Chinese Clinical Records: Study on Combining In-Domain and Out-Domain Data

Abstract

High-quality word embeddings are of great significance for advancing applications of biomedical natural language processing. In recent years, interest in how to learn good embeddings and evaluate embedding quality from English medical text has surged; however, only a limited number of studies have been performed on Chinese medical text, particularly Chinese clinical records. Herein, we propose a novel approach to improving the quality of learned embeddings that uses out-domain data as a supplement when Chinese clinical records are limited. Moreover, embedding quality was evaluated based on the Medical Conceptual Similarity Property. The experimental results revealed that selecting good training samples was necessary, and that collecting the right amount of out-domain data and trading off embedding quality against training time were essential factors for better embeddings.
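
The abstract does not define the Medical Conceptual Similarity Property. One plausible reading, assumed here and not confirmed by the source, is that terms from the same medical concept category should lie closer together in embedding space than terms from different categories. The sketch below scores a set of embeddings under that assumption. The function names `cosine` and `similarity_gap`, the toy vectors, and the category labels are all hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_gap(embeddings, categories):
    """Mean same-category cosine minus mean cross-category cosine.

    Assumed scoring rule (not the paper's): a larger gap means the
    embeddings separate the concept categories more cleanly.
    Requires at least one same-category and one cross-category pair.
    """
    same, cross = [], []
    for a, b in combinations(sorted(embeddings), 2):
        sim = cosine(embeddings[a], embeddings[b])
        if categories[a] == categories[b]:
            same.append(sim)
        else:
            cross.append(sim)
    return sum(same) / len(same) - sum(cross) / len(cross)


# Toy, made-up vectors and labels, for illustration only.
toy_embeddings = {
    "fever": [0.9, 0.1, 0.0],
    "cough": [0.8, 0.2, 0.1],
    "aspirin": [0.1, 0.9, 0.2],
    "ibuprofen": [0.0, 0.8, 0.3],
}
toy_categories = {
    "fever": "symptom",
    "cough": "symptom",
    "aspirin": "drug",
    "ibuprofen": "drug",
}

gap = similarity_gap(toy_embeddings, toy_categories)
```

Under this reading, embeddings trained on different mixes of in-domain and out-domain data could be compared by computing the same gap for each and weighing it against training time.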
