Anuj@DPIL-FIRE2016: A Novel Paraphrase Detection Method in Hindi Language Using Machine Learning

Abstract

Every language allows a statement to have several plausible interpretations. With the growth of the web, smart devices and social media, identifying these syntactic and semantic ambiguities has become a challenging task. In Natural Language Processing, two statements that use different words to convey the same meaning are called paraphrases. At FIRE 2016, we worked on the shared task DPIL (Detecting Paraphrases in Indian Languages), specifically for Hindi. This paper proposes a novel approach to decide whether two statements are paraphrases, using machine learning algorithms such as Random Forest, Support Vector Machine, Gradient Boosting and Gaussian Naive Bayes on the training data sets provided for the two subtasks. In cross-validation experiments, Random Forest outperforms the other methods with an F1-score of 0.94. We extended our work by adding a few more features and reusing the best-performing classifier, which improved the F1-score by 1%. The experimental results show that our algorithm achieved the highest F1-score and accuracy, and hence secured first rank for Hindi among all participants in this shared task. Our approach can be used in applications such as question answering, document clustering, machine translation, text summarization, plagiarism detection and many more.
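The abstract reports results as F1-scores but this record does not include the feature set or the evaluation code. The following is a minimal sketch, assuming binary labels (1 = paraphrase, 0 = not a paraphrase). The `jaccard_overlap` feature and both function names are hypothetical illustrations of a lexical-similarity feature and of the metric; they are not the authors' features or implementation.

```python
def jaccard_overlap(sentence_a: str, sentence_b: str) -> float:
    """Hypothetical lexical feature: shared tokens / all distinct tokens."""
    tokens_a = set(sentence_a.split())
    tokens_b = set(sentence_b.split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty sentences are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def f1_score(y_true, y_pred, positive=1) -> float:
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the shared task.
    gold = [1, 1, 0, 1, 0]
    predicted = [1, 0, 0, 1, 1]
    print(f1_score(gold, predicted))
    print(jaccard_overlap("a b c", "b c d"))
```

In a fuller pipeline, features such as this one would be computed per sentence pair and passed to a classifier; the metric function only scores the resulting predictions.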

Bibliographic details

Source
Forum for Information Retrieval Evaluation | 2018 | 205p | 12 pages
Authors
Anuj Saini; Aayushi Verma
Original format
PDF
Chinese Library Classification
TP391-53
Keywords
Paraphrase detection; Machine learning; Natural language processing; Soundex; Semantic similarity; Random forest

