IEEE International Conference on Semantic Computing

Siamese Discourse Structure Recursive Neural Network for Semantic Representation



Abstract

Finding a highly informative, low-dimensional representation for texts, specifically long texts, is one of the main challenges for efficient information storage and retrieval. This representation should capture the semantic and syntactic information of the text while retaining relevance for large-scale similarity search. We propose the utilization of Rhetorical Structure Theory (RST) to incorporate text structure into the representation. In addition, to embed document relevance in the distributed representation, we use a Siamese neural network to jointly learn document representations. Our Siamese network consists of two sub-networks of recursive neural networks built over the RST tree. We evaluate our approach on two datasets, a subset of the Reuters corpus and the BBC news dataset. Our model outperforms latent Dirichlet allocation document modeling on both datasets. Our method also outperforms latent semantic analysis document representation by 3% and 6% on the BBC and Reuters datasets, respectively. The proposed method further outperforms TF-IDF representations by 11% and 15%, and the word-embedding-averaging representation by 6% and 7%, in precision at k retrieved documents on the Reuters and BBC datasets, respectively.
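The core idea above, two weight-sharing recursive sub-networks that fold a document's RST tree bottom-up into a single vector, can be sketched minimally as follows. This is an illustrative sketch only, not the authors' implementation: the composition function, dimensionality, and random "EDU" leaf vectors are assumptions, and no training loss is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
# One shared composition matrix: both branches of the Siamese
# network use the same W, which is what makes it "Siamese".
W = rng.normal(scale=0.1, size=(DIM, 2 * DIM))

def compose(tree):
    """Recursively embed a binary RST-style tree.

    Leaves are elementary-discourse-unit (EDU) vectors; each internal
    node combines its two children with the shared weights W.
    """
    if isinstance(tree, np.ndarray):
        return tree
    left, right = tree
    children = np.concatenate([compose(left), compose(right)])
    return np.tanh(W @ children)

def cosine(a, b):
    """Similarity between two document embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two toy documents as nested tuples of random leaf vectors
# (stand-ins for real RST parses).
leaf = lambda: rng.normal(size=DIM)
doc_a = (leaf(), (leaf(), leaf()))
doc_b = ((leaf(), leaf()), leaf())

sim = cosine(compose(doc_a), compose(doc_b))
```

In a trained model, `sim` would be pushed toward 1 for relevant document pairs and toward -1 (or 0) for irrelevant ones; here it only demonstrates that both documents are mapped through the same recursive network into comparable fixed-size vectors.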
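The reported gains are measured in precision at k retrieved documents, i.e. the fraction of the top-k results that are actually relevant. A minimal reference computation (the document IDs are hypothetical):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant.

    retrieved: ranked list of document IDs returned by the system.
    relevant:  set of document IDs judged relevant to the query.
    """
    top_k = retrieved[:k]
    return sum(1 for doc in top_k if doc in relevant) / k

# Two of the top four retrieved documents are relevant -> 0.5.
p = precision_at_k(["d1", "d2", "d3", "d4"], {"d1", "d3"}, k=4)
```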
