Arabian Journal for Science and Engineering

Sentence Embedding and Convolutional Neural Network for Semantic Textual Similarity Detection in Arabic Language

Abstract

The continuous growth of textual sources on the web has made paraphrasing easier, and detecting paraphrase has become a challenge in many natural language processing applications (e.g., plagiarism detection, information retrieval and extraction, and question answering). Unlike for Western languages such as English, few works have addressed the problem of extrinsic paraphrase detection in Arabic. In this context, we propose a deep learning-based approach to assess how far original and suspect documents express the same meaning. First, the word2vec algorithm extracts relevant features by learning to predict the neighbors of each word. The resulting word vectors are then averaged, which efficiently produces sentence vector representations. Next, a convolutional neural network captures further contextual information and computes the degree of semantic relatedness. Given the lack of publicly available resources, a paraphrased corpus was built using the skip-gram model, which performed better at replacing an original word with its most similar vocabulary word of the same grammatical class. Finally, the proposed system outperformed previous studies in detecting contextual relationships between Arabic documents, achieving a precision of 85% and a recall of 86.8%.
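The first two steps of the pipeline (skip-gram feature learning and averaging word vectors into sentence vectors) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The helper names `skipgram_pairs` and `sentence_vector`, the window size, and the toy vectors are hypothetical. Training the embeddings themselves is omitted: the sketch only generates the (center, context) pairs a skip-gram model learns from, then averages already-trained vectors.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """(center, context) training pairs: skip-gram learns to predict each
    context word from the center word within +/- `window` positions."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def sentence_vector(tokens: List[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Average the word vectors of in-vocabulary tokens.
    Out-of-vocabulary tokens are skipped; a zero vector is returned if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


if __name__ == "__main__":
    # [('w1', 'w2'), ('w2', 'w1'), ('w2', 'w3'), ('w3', 'w2')]
    print(skipgram_pairs(["w1", "w2", "w3"], window=1))
    # Hypothetical 2-d vectors standing in for trained word2vec embeddings.
    toy = {"w1": [1.0, 0.0], "w2": [0.0, 1.0]}
    print(sentence_vector(["w1", "w2", "w3"], toy, dim=2))  # [0.5, 0.5]
```

Skipping unknown tokens rather than failing is a design choice of this sketch; the abstract does not say how out-of-vocabulary words are handled.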
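The abstract does not describe the CNN architecture, so the following is only a generic sketch of how a 1-D convolution can capture local context over a sequence of vectors (here assumed to be the sentence vectors of a document) and how two documents might then be compared. The filter weights are hypothetical and untrained, all filters are assumed to share one width, and cosine similarity between the max-pooled feature vectors is a stand-in for the learned scoring layer a trained network would use.

```python
import math
from typing import List

Vector = List[float]
Filter = List[Vector]  # `width` weight vectors, each the same length as the inputs


def conv1d_relu(seq: List[Vector], filters: List[Filter]) -> List[Vector]:
    """Valid 1-D convolution with ReLU: one feature vector (one value per
    filter) for every window of `width` consecutive input vectors."""
    width = len(filters[0])
    out: List[Vector] = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        feats = []
        for f in filters:
            s = sum(w * x for w_vec, x_vec in zip(f, window) for w, x in zip(w_vec, x_vec))
            feats.append(max(0.0, s))
        out.append(feats)
    return out


def max_pool(features: List[Vector]) -> Vector:
    """Max-over-time pooling: keep each filter's strongest response."""
    return [max(column) for column in zip(*features)]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def relatedness(doc_a: List[Vector], doc_b: List[Vector], filters: List[Filter]) -> float:
    """Score two documents by comparing their pooled convolutional features."""
    return cosine(max_pool(conv1d_relu(doc_a, filters)),
                  max_pool(conv1d_relu(doc_b, filters)))


if __name__ == "__main__":
    # Two hypothetical width-2 filters over 2-d inputs.
    filters = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
    original = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    suspect = [[1.0, 0.0], [1.0, 0.0]]
    print(relatedness(original, original, filters))  # identical inputs: close to 1.0
    print(relatedness(original, suspect, filters))
```

Each window spans several consecutive vectors, which is how the convolution sees more context than a single averaged vector; in a real model the filters would be learned from labeled paraphrase pairs.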

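The corpus-construction step (replacing each word with its most similar vocabulary word of the same grammatical class) can be sketched as follows. Assumptions: the embeddings would come from a trained skip-gram model and the part-of-speech tags from an Arabic tagger; both are replaced here by hypothetical toy dictionaries with English placeholder words, cosine similarity is assumed as the similarity measure, and words with no same-class candidate are left unchanged. The function names are illustrative only.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar_same_pos(word: str, embeddings: Dict[str, Vector],
                          pos_tags: Dict[str, str]) -> Optional[str]:
    """Most cosine-similar vocabulary word sharing `word`'s POS tag, or None."""
    if word not in embeddings or word not in pos_tags:
        return None
    best, best_sim = None, -math.inf
    for cand, vec in embeddings.items():
        if cand == word or pos_tags.get(cand) != pos_tags[word]:
            continue  # skip the word itself and candidates of another class
        sim = cosine(embeddings[word], vec)
        if sim > best_sim:
            best, best_sim = cand, sim
    return best


def paraphrase(tokens: List[str], embeddings: Dict[str, Vector],
               pos_tags: Dict[str, str]) -> List[str]:
    """Substitute every token that has a same-class neighbour; keep the rest."""
    out = []
    for t in tokens:
        sub = most_similar_same_pos(t, embeddings, pos_tags)
        out.append(sub if sub is not None else t)
    return out


if __name__ == "__main__":
    # Hypothetical toy vectors and tags.
    emb = {"big": [1.0, 0.1], "large": [0.9, 0.2], "house": [0.1, 1.0],
           "home": [0.2, 0.9], "quickly": [0.95, 0.15]}
    pos = {"big": "ADJ", "large": "ADJ", "house": "NOUN",
           "home": "NOUN", "quickly": "ADV"}
    print(paraphrase(["big", "house", "quickly"], emb, pos))  # ['large', 'home', 'quickly']
```

In this toy table, `quickly` is actually slightly closer to `big` than `large` is (cosine similarity about 0.998 versus 0.993), so it is the grammatical-class filter that selects `large`.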