International Conference on Computing, Communication and Automation

An ensemble based NLP feature assessment in binary classification

Abstract

Text feature selection plays an important role in text mining. Terms are the key elements of document representation, which supports applications such as indexing, summarization, classification, clustering and filtering. Text instances pose the challenge of a high-dimensional feature space, yet such features can be extremely useful in text analysis; hence it is important to extract the important terms from a document. In this paper, we examine the impact of NLP features (stop words, stemmer and the combination of both) on the predictive performance of base classifiers and of ensembles built on Naive Bayes. We selected base classifiers of different categories, namely NB, SVM, KNN and J48, as these are frequently used by researchers in text mining. The IMDb movie review dataset is used as the standard dataset for the experimental work. We prepared ensembles of Naive Bayes with the base classifiers and found that the ensembles give better performance than the base classifiers on every NLP category of the dataset. The ensemble of NB with SVM outperformed the other ensembles across the different dataset categories.
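The setup described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the stop-word list, the suffix-stripping `toy_stem` (a stand-in for a real stemmer), the tiny training set, and every function name are hypothetical. The abstract does not say how the ensembles combine their members, so `soft_vote` (averaging class probabilities) is an assumption, and only the Naive Bayes member is implemented here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list and suffix set, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "it", "this", "to"}
SUFFIXES = ("ing", "ed", "ly", "s")

def tokenize(text):
    """Lowercase the text and split on anything that is not a letter."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()

def toy_stem(token):
    """Strip one common suffix, keeping at least three letters (not Porter)."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text, remove_stops=False, stem=False):
    """Apply the selected NLP feature steps to one document."""
    tokens = tokenize(text)
    if remove_stops:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return tokens

# The feature variants named in the abstract, plus an unprocessed baseline.
VARIANTS = {
    "raw": {},
    "stop_words": {"remove_stops": True},
    "stemmer": {"stem": True},
    "both": {"remove_stops": True, "stem": True},
}

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, token_lists, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(token_lists, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for c in self.word_counts.values() for t in c}
        return self

    def predict_proba(self, tokens):
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][t] + 1) / denom)
            log_scores[label] = score
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(exp.values())
        return {k: v / z for k, v in exp.items()}

def soft_vote(probabilities):
    """Average per-class probabilities from several classifiers (assumed rule)."""
    labels = probabilities[0].keys()
    mean = {k: sum(p[k] for p in probabilities) / len(probabilities) for k in labels}
    return max(mean, key=mean.get)

# Made-up reviews standing in for the movie review dataset.
train = [
    ("The acting was amazing and the story is moving", "pos"),
    ("It was a boring and tiring movie", "neg"),
    ("Loved the acting, loved the ending", "pos"),
    ("Boring plot, wasted evening", "neg"),
]
labels = [y for _, y in train]

for name, opts in VARIANTS.items():
    token_lists = [preprocess(text, **opts) for text, _ in train]
    vocab_size = len({t for toks in token_lists for t in toks})
    model = NaiveBayes().fit(token_lists, labels)
    nb_probs = model.predict_proba(preprocess("amazing acting", **opts))
    # Placeholder for a second base classifier's output (hypothetical values).
    other_probs = {"pos": 0.4, "neg": 0.6}
    print(name, vocab_size, soft_vote([nb_probs, other_probs]))
```

Each pass through the loop rebuilds the vocabulary under one feature variant, so the printed size shows how stop-word removal and stemming shrink the feature space before any classifier is trained. A real experiment would replace the toy pieces with a full stop list, an established stemmer, and actual SVM, KNN, and decision-tree implementations.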
