Journal: Journal of information and computational science

Study on the Application of Feature Selection for Big Text Data Using Expected Cross Entropy



Abstract

In practice, microblog text data are sparse, high-dimensional and large in volume; in this paper they are called microblog big text data. Microblog big text data are an important web data source containing a wealth of user information. This paper studies feature selection for microblog big text data. The study is worthwhile because many features are not useful for text classification, and removing them reduces dimensionality and therefore data complexity. This paper introduces Expected Cross Entropy (ECE) to select effective features from microblog big text data; variance score (VS), information gain (IG) and mutual information (MI) are also applied to the same feature selection task. The four methods are compared by the classification accuracy of the resulting models. The experiments indicate that ECE performs better than the other three methods for feature selection on microblog big text data.
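The abstract does not spell out how ECE scores a term. Below is a minimal Python sketch, assuming the definition of expected cross entropy commonly used in text categorization, ECE(t) = P(t) * sum_i P(c_i | t) * log(P(c_i | t) / P(c_i)), with probabilities estimated from document counts and term presence (not frequency) as the event. The function names `ece_scores` and `select_top_k` and the toy corpus are hypothetical illustrations, not the paper's implementation.

```python
import math
from collections import Counter


def ece_scores(docs, labels):
    """Score each term by expected cross entropy (ECE).

    Assumed definition (common in text categorization):
        ECE(t) = P(t) * sum_c P(c|t) * log(P(c|t) / P(c))
    Probabilities are estimated from document counts; a term counts
    at most once per document (presence, not frequency).
    """
    n = len(docs)
    class_count = Counter(labels)   # documents per class
    term_df = Counter()             # documents containing each term
    term_class_df = Counter()       # (term, class) -> documents of that class containing the term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            term_df[term] += 1
            term_class_df[(term, label)] += 1

    scores = {}
    for term, df in term_df.items():
        p_t = df / n
        total = 0.0
        for label, nc in class_count.items():
            p_c_given_t = term_class_df[(term, label)] / df
            if p_c_given_t > 0:  # treat 0 * log(0) as 0
                total += p_c_given_t * math.log(p_c_given_t / (nc / n))
        scores[term] = p_t * total
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical toy corpus: each document is a list of tokens.
    docs = [["good", "movie"], ["good", "plot"], ["bad", "movie"], ["bad", "plot"]]
    labels = ["pos", "pos", "neg", "neg"]
    scores = ece_scores(docs, labels)
    print(select_top_k(scores, 2))  # → ['bad', 'good']
```

In the toy corpus, "good" and "bad" each occur in only one class and receive the maximum score, while "movie" and "plot" are spread evenly across classes and score 0. ECE scores only the term-present event; information gain adds the analogous term for absence, IG(t) = ECE(t) + ECE(not t). On sparse text, where most terms are absent from most documents, the two measures can therefore rank terms differently.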
