Journal: International Journal of Cognitive Informatics and Natural Intelligence

Distributional Semantic Model Based on Convolutional Neural Network for Arabic Textual Similarity


Abstract

The problem addressed is to develop a model that can reliably identify whether a previously unseen pair of documents is a paraphrase. Paraphrase detection in Arabic documents is challenging because of the variability of the language's features and the lack of publicly available corpora. To address these problems, the authors propose a semantic approach. At the feature extraction level, they use a global vectors (GloVe) representation, which combines global co-occurrence counts with a contextual skip-gram model. At the paraphrase identification level, they apply a convolutional neural network model to learn more contextual and semantic information between documents. For the experiments, the authors use the Open Source Arabic Corpora as the source corpus and collect additional datasets to build a vocabulary model. To construct the paraphrased corpus, they replace each word in the source corpus with its most similar word of the same grammatical class, identified using the word2vec algorithm and part-of-speech annotation. Experiments show that the model achieves promising precision and recall compared to existing approaches in the literature.
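
The corpus-construction step described in the abstract (replace each word with its nearest neighbour in embedding space, restricted to the same part of speech) can be illustrated with a minimal sketch. This is not the authors' implementation: the embedding table, the tag dictionary, and the function names below are hypothetical stand-ins, and English tokens are used only for readability. In practice the vectors would come from a word2vec model trained on Arabic text and the tags from an Arabic part-of-speech tagger.

```python
import math

# Hypothetical toy vectors standing in for trained word2vec embeddings.
EMBEDDINGS = {
    "big":   [0.90, 0.10, 0.00],
    "grow":  [0.90, 0.10, 0.01],  # closest to "big", but a different POS
    "large": [0.80, 0.30, 0.00],
    "house": [0.10, 0.90, 0.10],
    "home":  [0.12, 0.88, 0.12],
    "the":   [0.00, 0.10, 0.95],
}

# Hypothetical part-of-speech annotation for the same vocabulary.
POS_TAGS = {
    "big": "ADJ", "large": "ADJ", "grow": "VERB",
    "house": "NOUN", "home": "NOUN", "the": "DET",
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_same_pos(word):
    """Return the most similar word sharing the POS tag, or the word itself
    when it is out of vocabulary or has no same-POS candidate."""
    if word not in EMBEDDINGS:
        return word
    candidates = [
        w for w in EMBEDDINGS
        if w != word and POS_TAGS.get(w) == POS_TAGS.get(word)
    ]
    if not candidates:
        return word
    return max(candidates, key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]))


def paraphrase(tokens):
    """Build a paraphrased token sequence by word-level substitution."""
    return [most_similar_same_pos(t) for t in tokens]


print(paraphrase(["the", "big", "house"]))
```

In this toy vocabulary "grow" is the nearest vector to "big", yet the part-of-speech filter rejects it and "large" is chosen instead, which is the role the grammatical-class constraint plays in the described procedure.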