Published in: International Conference on Advanced Informatics: Concept Theory and Applications

Classifying Positive or Negative Text Using Features Based on Opinion Words and Term Frequency - Inverse Document Frequency


Abstract

Content on websites and social networks is generated rapidly. Opinions and reviews can be analyzed and classified into two classes, positive or negative, by machine learning methods. The main issue, however, is how to represent each text as a proper set of variables (a p-feature vector) so that a successful classifier can be obtained by a supervised learning approach with suitable parameter settings. In this study, a two-feature vector representing the positive and negative moods in each text was prepared using lists of positive and negative words, and then combined with term frequency-inverse document frequency (TF-IDF) features. kNN and SVM classifiers were built on this feature set and on another baseline set for comparison, then used to predict each test vector and measure effectiveness. Text reviews from Yelp, Amazon, and IMDB were evaluated with 10-fold cross-validation while varying parameters and reducing the feature set with PCA. The best accuracy across the three datasets, approximately 0.81 to 0.87, was achieved by SVM classifiers, in each case with a reduced feature set much smaller than the original.
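The feature construction described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the word lists, the toy reviews, the whitespace tokenizer, and the helper names are all hypothetical, and the TF-IDF variant (relative term frequency times log(N/df)) is an assumption because the abstract does not say which weighting was used. Only a simple kNN vote is shown; the SVM classifier, the PCA reduction, and the 10-fold cross-validation are omitted.

```python
import math
from collections import Counter

# Hypothetical, tiny opinion-word lists for illustration only;
# the lists used in the study are not given in the abstract.
POSITIVE_WORDS = {"good", "great", "love", "excellent", "tasty"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible", "slow"}


def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation."""
    tokens = (tok.strip(".,!?;:\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def opinion_features(tokens):
    """Two-feature vector: counts of positive and negative opinion words."""
    pos = sum(1 for tok in tokens if tok in POSITIVE_WORDS)
    neg = sum(1 for tok in tokens if tok in NEGATIVE_WORDS)
    return [float(pos), float(neg)]


def fit_idf(tokenized_docs):
    """Assumed variant: idf(t) = log(N / df(t)) over the training documents."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf_features(tokens, vocabulary, idf):
    """TF-IDF over a fixed vocabulary; terms outside it are ignored."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[term] / total * idf[term] for term in vocabulary]


def build_vector(text, vocabulary, idf):
    """Concatenate the two opinion-word features with the TF-IDF features."""
    tokens = tokenize(text)
    return opinion_features(tokens) + tfidf_features(tokens, vocabulary, idf)


def knn_predict(train_vectors, train_labels, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean)."""
    order = sorted(
        range(len(train_vectors)),
        key=lambda i: math.dist(train_vectors[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy training data, invented for this sketch.
train_texts = [
    "Great food and excellent service",
    "I love this phone, good battery",
    "Terrible plot and bad acting",
    "Slow delivery, poor quality",
]
train_labels = ["positive", "positive", "negative", "negative"]

tokenized = [tokenize(t) for t in train_texts]
idf = fit_idf(tokenized)
vocabulary = sorted(idf)
train_vectors = [build_vector(t, vocabulary, idf) for t in train_texts]

query = build_vector("good movie, great acting", vocabulary, idf)
prediction = knn_predict(train_vectors, train_labels, query)
print(prediction)
```

In this sketch the two opinion-word counts are left unscaled, so they dominate the Euclidean distance relative to the much smaller TF-IDF weights. Whether and how the study scaled its features before classification is not stated in the abstract.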
