Classification of Sentimental Reviews Using Natural Language Processing Concepts and Machine Learning Techniques

Abstract

Natural language processing (NLP) is a theoretically motivated range of computational techniques for representing and analyzing naturally occurring text at multiple levels of linguistic analysis, with the goal of achieving automatic language processing for a variety of tasks and applications. From an industry perspective, one of the most important applications of NLP is sentiment analysis. Sentiment analysis is a prominent branch of NLP because it can classify a textual document as having either positive or negative polarity. With the proliferation of the World Wide Web, a huge volume of unstructured text is available in the form of tweets, messages, articles, social networking discussions, and reviews of products and movies, from which the right information must be extracted. There is therefore a need to analyze this data and bring out hidden facts based on the intention of the author of the text. The intention can be criticism (negative) of a product or movie, or it can be admiration (positive); it can also vary in strength, from strongly positive to positive and from strongly negative to negative. This thesis focuses on classifying movie reviews as either positive or negative using machine learning techniques such as Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Naive Bayes (NB) classifiers. Further, an n-gram model is proposed in which documents are classified based on the unigram, bigram, and trigram composition of the words in a sentence. Two datasets are considered for this study: one is a labeled polarity dataset in which each movie review is labeled as either positive or negative, and the other is the IMDb movie reviews dataset. Finally, the prediction accuracy of these machine learning algorithms under different manipulations of the same dataset is studied, and a comparative analysis is made for critical examination.
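
To make the n-gram idea in the abstract concrete, the following is a minimal sketch, not the thesis's implementation. It extracts unigram, bigram, and trigram features from whitespace-tokenized text and feeds them to a small multinomial Naive Bayes classifier with add-one smoothing. The names `ngrams` and `NaiveBayes`, the tokenization, the smoothing choice, and the four toy reviews are all assumptions made for illustration; SVM and KNN, which the thesis also evaluates, are not shown.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, max_n=3):
    """Return all word n-grams of length 1..max_n from a lowercased text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels, max_n=3):
        self.max_n = max_n
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(ngrams(text, max_n))
        self.vocab = set()
        for counts in self.feature_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for gram in ngrams(text, self.max_n):
                if gram in self.vocab:  # skip n-grams never seen in training
                    score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration only.
train_texts = [
    "a truly wonderful and moving film",
    "great acting and a wonderful story",
    "a dull and boring film",
    "terrible acting and a boring story",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels, max_n=3)
prediction = model.predict("a wonderful story")
```

Setting `max_n` to 1, 2, or 3 restricts the features to unigrams, unigrams plus bigrams, or all three, which is one simple way to compare feature sets on the same data.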
Bibliographic record

  • Author: Agrawal Ankit
  • Year: 2015
  • Original format: PDF
