Source: Database: The Journal of Biological Databases and Curation (NIH literature)
Hierarchical bi-directional attention-based RNNs for supporting document classification on protein–protein interactions affected by genetic mutations

Abstract

In this paper, we describe a hierarchical bi-directional attention-based Recurrent Neural Network (RNN) as a reusable sequence encoder architecture, which we use as both the sentence encoder and the document encoder for document classification. The sequence encoder is composed of two bi-directional RNNs equipped with an attention mechanism that identifies and captures the most important elements (words or sentences) in a document, followed by a dense layer for the classification task. Our approach exploits the hierarchical nature of documents: a document is a sequence of sentences, and each sentence is a sequence of words. In our model, we use word embeddings to project the words into a low-dimensional vector space, and we initialize the embedding layer of our network with word embeddings trained on PubMed. We apply this model to biomedical literature, specifically to paper abstracts published in PubMed. We argue that the title of a paper usually contains important information that is more salient than a typical sentence in the abstract. For this reason, we propose a shortcut connection that integrates the title vector representation directly into the final feature representation of the document: we concatenate the sentence vector that represents the title with the vectors of the abstract to form the document feature vector used as input to the task classifier. With this system we participated in the Document Triage Task of the BioCreative VI Precision Medicine Track and achieved 0.6289 Precision, 0.7656 Recall and 0.6906 F1-score, with Precision and F1-score ranking first among the participating systems. Database URL:
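
The data flow described in the abstract can be illustrated with a minimal, untrained sketch in plain Python. It is not the authors' implementation. Every name, dimension and weight below is hypothetical, a simple tanh recurrent cell stands in for whatever RNN cell the paper uses, and the attention scoring (a learned context vector applied to a tanh projection) is one common formulation assumed here. The abstract also does not say whether the title is additionally fed to the sentence-level encoder; this sketch keeps it out and uses it only through the shortcut.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class SequenceEncoder:
    """Bi-directional RNN plus attention pooling, reused for words and sentences."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One (input, recurrent) weight pair per direction.
        # Assumption: plain tanh cells, chosen only to keep the sketch short.
        self.fwd = (rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng))
        self.bwd = (rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng))
        self.W_att = rand_matrix(2 * hid_dim, 2 * hid_dim, rng)
        self.context = [rng.uniform(-0.1, 0.1) for _ in range(2 * hid_dim)]

    def _run(self, params, seq):
        W_in, W_rec = params
        h, states = [0.0] * self.hid_dim, []
        for x in seq:
            h = [math.tanh(a + b) for a, b in zip(matvec(W_in, x), matvec(W_rec, h))]
            states.append(h)
        return states

    def encode(self, seq):
        fwd = self._run(self.fwd, seq)
        bwd = self._run(self.bwd, seq[::-1])[::-1]   # re-align to input order
        states = [f + b for f, b in zip(fwd, bwd)]   # concatenate both directions
        # Attention: score each position, softmax, then take the weighted sum.
        scores = [sum(c * math.tanh(u) for c, u in zip(self.context, matvec(self.W_att, s)))
                  for s in states]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        pooled = [sum(w * s[i] for w, s in zip(weights, states))
                  for i in range(2 * self.hid_dim)]
        return pooled, weights

def classify(title, abstract, embed, word_enc, sent_enc, w_out, b_out):
    """title: list of tokens; abstract: list of sentences, each a list of tokens."""
    def sentence_vector(tokens):
        return word_enc.encode([embed[t] for t in tokens])[0]

    title_vec = sentence_vector(title)
    sent_vecs = [sentence_vector(s) for s in abstract]
    abstract_vec, sent_weights = sent_enc.encode(sent_vecs)
    features = title_vec + abstract_vec              # title shortcut: concatenation
    logit = sum(w * f for w, f in zip(w_out, features)) + b_out
    return 1.0 / (1.0 + math.exp(-logit)), sent_weights   # dense layer + sigmoid

# Toy setup: random "embeddings" stand in for the PubMed-trained vectors.
rng = random.Random(0)
vocab = ["mutation", "disrupts", "binding", "of", "protein", "a", "b"]
embed = {w: [rng.uniform(-1, 1) for _ in range(4)] for w in vocab}
word_enc = SequenceEncoder(in_dim=4, hid_dim=3, rng=rng)   # words -> sentence vector (6-d)
sent_enc = SequenceEncoder(in_dim=6, hid_dim=3, rng=rng)   # sentences -> abstract vector (6-d)
w_out = [rng.uniform(-0.1, 0.1) for _ in range(12)]        # 6 (title) + 6 (abstract)

title = ["mutation", "disrupts", "binding"]
abstract = [["protein", "a", "binding"], ["mutation", "of", "protein", "b"]]
prob, sent_weights = classify(title, abstract, embed, word_enc, sent_enc, w_out, 0.0)
print("relevance probability:", prob)
print("sentence attention weights:", sent_weights)
```

Because the weights are random, the printed probability carries no meaning. The sketch only shows how the same encoder class serves both levels and how the title vector bypasses sentence-level attention to reach the classifier directly.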
