Conference: Forum for Information Retrieval Evaluation

Anuj@DPIL-FIRE2016: A Novel Paraphrase Detection Method in Hindi Language Using Machine Learning

Abstract

Every language admits several plausible interpretations. With the evolution of the web, smart devices and social media, identifying these syntactic or semantic ambiguities has become a challenging task. In Natural Language Processing, two statements that use different words but convey the same meaning are called paraphrases. At FIRE 2016, we worked on the DPIL (Detecting Paraphrases in Indian Languages) shared task, specifically for Hindi. This paper proposes a novel approach to determining whether two statements are paraphrases, using machine learning algorithms such as Random Forest, Support Vector Machine, Gradient Boosting and Gaussian Naive Bayes trained on the data sets provided for the task's two subtasks. In cross-validation experiments, Random Forest outperformed the other methods with an F1-score of 0.94. We then extended the work by adding a few more features and reusing the previously best classifier, which improved the F1-score by 1%. The experimental results show that our system achieved the highest F1-score and accuracy, and it therefore ranked first for Hindi among all participants in the shared task. The approach can be applied in many settings, such as question-answering systems, document clustering, machine translation, text summarization and plagiarism detection.
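The abstract does not list the features the classifiers were trained on. As a rough illustration only, the Python sketch below assumes a few hypothetical lexical-overlap features for a sentence pair (Jaccard similarity, word overlap in each direction, length ratio), a plain k-fold splitter for cross-validation, and per-label and macro-averaged F1. None of these names (`pair_features`, `k_fold_indices`, `f1_for_label`, `macro_f1`) come from the paper, and whether the task was scored with binary or macro-averaged F1 is an assumption here. The classifiers themselves are not reimplemented; in scikit-learn they correspond to `RandomForestClassifier`, `SVC`, `GradientBoostingClassifier` and `GaussianNB`, any of which could be fitted on such feature vectors.

```python
from collections.abc import Iterator


def tokenize(sentence: str) -> list[str]:
    # Hindi (Devanagari) separates words with spaces, so whitespace splitting
    # is a crude baseline; no stemming or stop-word removal is applied.
    return sentence.split()


def pair_features(s1: str, s2: str) -> list[float]:
    """Hypothetical lexical-overlap features for one sentence pair:
    [Jaccard similarity, min overlap, max overlap, length ratio]."""
    t1, t2 = tokenize(s1), tokenize(s2)
    set1, set2 = set(t1), set(t2)
    common = set1 & set2
    union = set1 | set2
    jaccard = len(common) / len(union) if union else 0.0
    # Fraction of each sentence's distinct words that also occur in the other.
    ov1 = len(common) / len(set1) if set1 else 0.0
    ov2 = len(common) / len(set2) if set2 else 0.0
    # Token-count ratio in [0, 1]; 1.0 means both sentences have equal length.
    len_ratio = min(len(t1), len(t2)) / max(len(t1), len(t2)) if t1 and t2 else 0.0
    return [jaccard, min(ov1, ov2), max(ov1, ov2), len_ratio]


def k_fold_indices(n: int, k: int) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists for k contiguous, unshuffled folds.
    Fold sizes differ by at most one."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    base, extra = divmod(n, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size


def f1_for_label(y_true: list[str], y_pred: list[str], label: str) -> float:
    """F1-score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-label F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)
```

For example, `pair_features("राम घर गया", "राम अपने घर गया")` ("Ram went home" vs. "Ram went to his home") returns `[0.75, 0.75, 1.0, 0.75]`, since three of the four distinct words are shared. Each fold from `k_fold_indices` would train one classifier on the feature vectors of the training indices and score its predictions on the held-out indices with `f1_for_label` or `macro_f1`.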