Duplicate Questions Pair Detection Using Siamese MaLSTM


Abstract

Quora is a growing platform comprising a user-generated collection of questions and answers, which users create, edit, and organize. The enormous number of users on Quora makes it unavoidable that different users post multiple questions with similar intent, which raises the issue of duplicate questions. Detecting duplicate questions effectively would make it easier to find high-quality answers and save time, which in turn would improve the experience of writers and readers on Quora. In this paper, the Quora Question Pairs dataset from Kaggle is used for the detection of duplicate questions. First, three types of word embeddings (Google News vectors, FastText crawl embeddings with 300 dimensions, and FastText crawl subword embeddings with 300 dimensions) are each used individually to vectorize all the questions and train the model. The final features used for prediction are a blend of these three types of word embeddings. Then, a Siamese MaLSTM (“Ma” for Manhattan distance) neural network model is applied to predict duplicate questions in the dataset. Finally, the model is tested on 100,000 pairs of questions. The experiments show that the proposed model achieves 91.14% accuracy, which is better than the state-of-the-art models.
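
The abstract does not spell out how the Manhattan distance becomes a prediction. A minimal sketch, assuming the commonly used MaLSTM formulation in which the two question encodings h_left and h_right are compared as exp(-||h_left - h_right||_1), could look as follows. The function names, the toy vectors, and the 0.5 decision threshold are illustrative assumptions, not details taken from the paper; in a real model the encodings would come from a shared (Siamese) LSTM.

```python
import math
from typing import Sequence


def manhattan_similarity(h_left: Sequence[float], h_right: Sequence[float]) -> float:
    """Return exp(-L1 distance) between two encodings.

    The result lies in (0, 1]; 1.0 means the encodings are identical.
    """
    if len(h_left) != len(h_right):
        raise ValueError("encodings must have the same length")
    l1_distance = sum(abs(a - b) for a, b in zip(h_left, h_right))
    return math.exp(-l1_distance)


def is_duplicate(h_left: Sequence[float], h_right: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: flag a pair as duplicate when the
    similarity reaches the (assumed) threshold."""
    return manhattan_similarity(h_left, h_right) >= threshold


if __name__ == "__main__":
    # Toy vectors standing in for the final LSTM hidden states of two questions.
    q1 = [0.10, -0.30, 0.25]
    q2 = [0.12, -0.28, 0.20]
    print(manhattan_similarity(q1, q2), is_duplicate(q1, q2))
```

Because the similarity is bounded in (0, 1], it can be trained directly against the 0/1 duplicate label, which is one reason this form is a natural fit for a Siamese architecture.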

Bibliographic details

  • Source
    Quality Control, Transactions | 2020, Issue 2020 | pp. 21932-21942 | 11 pages
  • Author affiliations

    Khwaja Fareed Univ Engn & Informat Technol Dept Comp Sci Rahim Yar Khan 64200 Pakistan;

    Khwaja Fareed Univ Engn & Informat Technol Dept Comp Sci Rahim Yar Khan 64200 Pakistan;

    Khwaja Fareed Univ Engn & Informat Technol Dept Comp Engn Rahim Yar Khan 64200 Pakistan|Univ Messina Dipartimento Matemat & Informat MIFT I-98122 Messina Italy;

    Khwaja Fareed Univ Engn & Informat Technol Dept Comp Sci Rahim Yar Khan 64200 Pakistan;

    Yeungnam Univ Dept Informat & Commun Engn Gyongsan 38542 South Korea;

    Khwaja Fareed Univ Engn & Informat Technol Dept Comp Sci Rahim Yar Khan 64200 Pakistan|Islamia Univ Bahawalpur Dept Comp Sci & Informat Technol Bahawalpur 63100 Pakistan;

  • Original format: PDF
  • Language of text: English
  • Keywords

    Duplicate question pair detection; text mining; deep learning; MaLSTM; word embedding;
