Conference: IEEE International Conference on Computational Intelligence and Computing Research

Machine learning based paraphrase identification system using lexical syntactic features


Abstract

In natural language communication, understanding meaning is a complex task that humans learn from childhood, and automating this process for computers has significant real-world applications. Simple text processing tasks are not enough to uncover the meaning of unstructured natural language text. Our current research focuses on issues pertaining to this problem. Paraphrase identification, the task of identifying the similarity in meaning between two text segments, is one such important task in a natural language understanding system. We propose a machine learning system that uses lexical features and dependency-based features for sentence-level paraphrase identification. The performance of the proposed system is evaluated by conducting experiments on the standard Microsoft paraphrase corpus. Moreover, a comparative study of the current system with other machine learning based systems for paraphrase identification on the Microsoft paraphrase corpus is carried out. The proposed system achieves competitive results compared to other state-of-the-art machine learning systems while using simple linguistic features. The system using an SVM classifier achieves an F-score of 81.41% using simple lexical features only. The voting-based classifier scores 80.97% with lexical features. Results with dependency features are highly sensitive to minor syntactic changes.
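
The abstract does not list the specific lexical features used, so the following is only a minimal sketch of what sentence-pair lexical features could look like. The tokenizer, the feature names (`jaccard`, `overlap_s1`, `overlap_s2`, `length_ratio`), and the function `lexical_features` are all assumptions made for illustration, not the authors' feature set. In a setup like the one described, such a feature vector would be passed to a classifier such as an SVM.

```python
import re


def tokenize(sentence):
    """Lowercase the sentence and extract alphanumeric word tokens.

    Assumption: a plain regex tokenizer; the paper's preprocessing is not
    described in the abstract.
    """
    return re.findall(r"[a-z0-9]+", sentence.lower())


def lexical_features(s1, s2):
    """Compute simple word-overlap features for a sentence pair.

    All features here are hypothetical examples of "simple lexical
    features"; none of them is confirmed by the source.
    """
    t1, t2 = tokenize(s1), tokenize(s2)
    set1, set2 = set(t1), set(t2)
    shared = set1 & set2
    union = set1 | set2

    return {
        # Shared vocabulary relative to the combined vocabulary.
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Fraction of each sentence's vocabulary found in the other.
        "overlap_s1": len(shared) / len(set1) if set1 else 0.0,
        "overlap_s2": len(shared) / len(set2) if set2 else 0.0,
        # Ratio of the shorter token count to the longer one.
        "length_ratio": (
            min(len(t1), len(t2)) / max(len(t1), len(t2))
            if t1 and t2
            else 0.0
        ),
    }


if __name__ == "__main__":
    pair = ("The cat sat on the mat", "A cat sat on a mat")
    for name, value in lexical_features(*pair).items():
        print(f"{name}: {value:.3f}")
```

Features of this kind depend only on surface word overlap, which is consistent with the abstract's remark that lexical features alone are already competitive, whereas dependency-based features react strongly to small syntactic changes.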
