Conference paper, International Conference on Applications of Natural Language to Information Systems

Virus Causes Flu: Identifying Causality in the Biomedical Domain Using an Ensemble Approach with Target-Specific Semantic Embeddings

Abstract

Identification of Cause-Effect (CE) relations is crucial for creating a scientific knowledge base and facilitating question answering in the biomedical domain. An example sentence with a CE relation in the biomedical domain (specifically, leukemia) is: "viability of THP-1 cells was inhibited by COR." Here, COR is the cause argument, viability of THP-1 cells is the effect argument, and inhibited is the trigger word that creates the causal scenario. Notably, a CE relation has a temporal order between its cause and effect arguments. In this paper, we harness this property and hypothesize that the temporal order of a CE relation can be captured well by a Long Short-Term Memory (LSTM) network using independently obtained semantic word embeddings trained on data for the targeted disease. These focused embeddings overcome the LSTM network's requirement for labeled data. We extensively validate our hypothesis using three types of word embeddings, viz., GloVe, PubMed, and target-specific, where the target (focus) is leukemia. With GloVe and target-specific embeddings, the LSTM achieves a statistically significant improvement in performance over other baseline models. Furthermore, we show that an ensemble of LSTM models gives a significant improvement (~3%) over the individual models according to a t-test. Using a rule-based approach, the results of our CE relation classification system generate a knowledge base of 277,478 CE relation mentions.
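The abstract does not state how the outputs of the LSTM models in the ensemble are combined. As a minimal sketch, assuming soft voting (averaging each model's predicted probability that a sentence expresses a CE relation, then thresholding at 0.5), the combination step could look like the following. The function name `soft_vote`, the model names, and the probabilities are hypothetical, and the LSTM models themselves are not shown.

```python
from typing import Dict, List


def soft_vote(probs_per_model: Dict[str, List[float]], threshold: float = 0.5) -> List[int]:
    """Hypothetical soft-voting ensemble: average each sentence's P(CE relation)
    across all models, then map the mean to a 0/1 label with `threshold`."""
    names = list(probs_per_model)
    if not names:
        raise ValueError("need at least one model")
    n_sentences = len(probs_per_model[names[0]])
    # Every model must have scored the same sentences, in the same order.
    if any(len(p) != n_sentences for p in probs_per_model.values()):
        raise ValueError("all models must score the same sentences")

    labels = []
    for i in range(n_sentences):
        mean = sum(probs_per_model[m][i] for m in names) / len(names)
        labels.append(1 if mean >= threshold else 0)
    return labels


if __name__ == "__main__":
    # Hypothetical scores from three LSTMs that differ only in their embeddings.
    scores = {
        "glove": [0.9, 0.4, 0.2],
        "pubmed": [0.7, 0.6, 0.3],
        "leukemia_specific": [0.8, 0.7, 0.1],
    }
    print(soft_vote(scores))  # [1, 1, 0]
```

Soft voting lets agreeing models outweigh a weakly dissenting one: in the hypothetical example, the GloVe-based model alone would reject the second sentence (0.4), but the averaged score of about 0.57 accepts it. Majority voting over hard labels is an equally plausible reading of "ensemble" here.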