Conference: International Conference on Pattern Recognition and Machine Intelligence

Lexical TF-IDF: An n-gram Feature Space for Cross-Domain Classification of Sentiment Reviews



Abstract

Feature extraction and selection are vital steps in sentiment classification with machine learning. Existing methods use only the TF-IDF rating to represent either unigram or n-gram feature vectors. Some approaches instead rely on existing sentiment dictionaries, using the score of a unigram sentiment word as the feature value and ignoring the TF-IDF rating. In this work, we construct n-gram sentiment features by extracting the sentiment words and their intensifiers or negations from a review. The score of an n-gram, constructed from a semantic unigram in the lexicon and its intensifier or negation, is then multiplied by the TF-IDF rating to determine the feature score. We experiment with two benchmark data sets for sentiment classification using Support Vector Machine and Maximum Entropy classifiers under cross-domain validation, in which training and testing data come from two different sets, and obtain a substantial improvement on various performance measures compared to existing methods. Cross-domain validation ensures that the proposed method can be applied to sentiment classification of data sets for which example patterns are not available, which is typically the case with commercial data sets.
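The feature score described in the abstract (n-gram lexicon score multiplied by the TF-IDF rating) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the toy `LEXICON`, `INTENSIFIERS`, and `NEGATIONS` tables, the rule that an intensifier multiplies the score and a negation flips its sign, the helper names `sentiment_ngrams` and `lexical_tfidf`, and the particular TF-IDF variant (relative term frequency times `log(N / df) + 1`).

```python
import math
from collections import Counter

# Hypothetical toy resources; the paper's actual lexicon and weights are not given here.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}
NEGATIONS = {"not", "never"}


def sentiment_ngrams(tokens):
    """Return (ngram, lexicon_score) pairs for each sentiment word and its modifiers."""
    pairs = []
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        score = LEXICON[tok]
        start = i
        # Absorb contiguous intensifiers/negations directly before the sentiment word.
        while start > 0 and (tokens[start - 1] in INTENSIFIERS
                             or tokens[start - 1] in NEGATIONS):
            start -= 1
            # Assumed rule: intensifiers scale the score, negations flip its sign.
            score *= INTENSIFIERS.get(tokens[start], -1.0)
        pairs.append((" ".join(tokens[start:i + 1]), score))
    return pairs


def lexical_tfidf(docs):
    """Map each document to {ngram: lexicon_score * tf * idf}."""
    extracted = [sentiment_ngrams(d.lower().split()) for d in docs]
    # Document frequency: number of documents containing each n-gram.
    df = Counter()
    for pairs in extracted:
        df.update({ng for ng, _ in pairs})
    n_docs = len(docs)
    features = []
    for pairs in extracted:
        counts = Counter(ng for ng, _ in pairs)
        scores = dict(pairs)
        total = sum(counts.values()) or 1
        features.append({
            ng: scores[ng] * (c / total) * (math.log(n_docs / df[ng]) + 1.0)
            for ng, c in counts.items()
        })
    return features


docs = ["the plot is not good", "very good acting and good music", "a bad film"]
feats = lexical_tfidf(docs)
```

In this sketch, "not good" receives a negative feature score and "very good" a larger positive score than plain "good" in the same review. The resulting per-document dictionaries could then be vectorized and passed to a classifier such as a Support Vector Machine, trained on one data set and tested on another for cross-domain validation.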
