Conference: iCatse international conference on information science and applications

Improving Medical Short Text Classification with Semantic Expansion Using Word-Cluster Embedding


Abstract

Automatic text classification (TC) can be applied to real-world problems such as classifying in-patient discharge summaries and medical text reports, which helps make medical documents easier for doctors to understand. However, sentences in electronic medical records (EMR) are shorter than those in general-domain text, which leads to a lack of semantic features and to semantic ambiguity. To tackle this challenge, we propose adding word-cluster embeddings to a deep neural network to improve short text classification. Concretely, we first use hierarchical agglomerative clustering to cluster the word vectors in the semantic space. We then calculate each cluster's center vector, which represents the implicit topic information of the words in that cluster. Finally, we expand each word vector with its cluster center vector and implement classifiers using CNN and LSTM, respectively. To evaluate the proposed method, we conduct experiments on the public TREC data set and on a medical short-sentence data set that we constructed and released. The experimental results demonstrate that the proposed method outperforms state-of-the-art baselines in short sentence classification in both the medical domain and the general domain.
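
The three steps described in the abstract (cluster the word vectors, compute each cluster center, expand each word vector with its center) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not specify the linkage criterion, the distance measure, the number of clusters, or how the expansion is performed, so the centroid linkage, Euclidean distance, `n_clusters=2`, and expansion by concatenation used here are all assumptions. The function names and the two-dimensional `toy_vectors` are hypothetical and exist only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Component-wise mean of a list of vectors: the cluster center."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def agglomerative_clusters(vectors, n_clusters):
    """Bottom-up clustering: start with one cluster per vector and repeatedly
    merge the two clusters whose centroids are closest (assumed linkage)
    until only n_clusters remain. Returns lists of vector indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci = centroid([vectors[k] for k in clusters[i]])
                cj = centroid([vectors[k] for k in clusters[j]])
                d = euclidean(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def expand_embeddings(word_vectors, n_clusters):
    """Map each word to its vector concatenated with its cluster center.
    Concatenation is an assumed reading of 'expand' in the abstract."""
    words = list(word_vectors)
    vectors = [word_vectors[w] for w in words]
    expanded = {}
    for members in agglomerative_clusters(vectors, n_clusters):
        center = centroid([vectors[k] for k in members])
        for k in members:
            expanded[words[k]] = vectors[k] + center  # list concatenation
    return expanded


# Hypothetical 2-dimensional word vectors, invented for illustration only.
toy_vectors = {
    "fever": [0.9, 0.1],
    "cough": [0.8, 0.2],
    "aspirin": [0.1, 0.9],
    "ibuprofen": [0.2, 0.8],
}

expanded = expand_embeddings(toy_vectors, n_clusters=2)
for word, vector in expanded.items():
    print(word, [round(v, 2) for v in vector])
```

In this toy run, "fever" and "cough" fall into one cluster and "aspirin" and "ibuprofen" into the other, so words in the same cluster share the same appended center vector. Under the assumptions above, these expanded vectors would be the input to a downstream classifier such as the CNN or LSTM mentioned in the abstract. The pairwise search recomputes centroids on every pass, which is fine for a sketch but too slow for a real vocabulary.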
