International Conference on Advanced Science and Engineering

Term Weighting for Feature Extraction on Twitter: A Comparison Between BM25 and TF-IDF


Abstract

Feature extraction transforms a text document in any format into a list of features that text classification techniques can process easily. It is one of the significant preprocessing techniques in data mining and text classification, and it computes feature values in documents. Hence, efficient feature extraction techniques such as BM25 and term frequency-inverse document frequency (TF-IDF) are commonly used for term weighting. Nevertheless, BM25 is not a single function, and it can over-correct for very long documents. As a result, its weights may fail to reflect the usefulness or importance of certain features, which decreases classification efficiency. This paper presents a comparative study of feature extraction techniques. Two techniques, BM25 and TF-IDF, were evaluated for weighting terms on Twitter. The TF-IDF feature extraction technique is presented, and the two techniques are compared. The experiments show that TF-IDF performs better in the evaluation: the maximum F1-measure is 89.77 for TF-IDF and 89.16 for BM25.
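
To make the two weighting schemes concrete, here is a minimal sketch of how TF-IDF and BM25 term weights could be computed for a handful of tokenized tweets. It is not the paper's implementation: the abstract does not state which formula variants, parameters, or preprocessing the authors used. The function names `tfidf_weights` and `bm25_weights`, the sample tweets, the IDF variants, and the BM25 parameters `k1=1.2` and `b=0.75` (common defaults) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """One {term: weight} dict per tokenized document.

    Assumed variant: raw term frequency times (log(N / df) + 1).
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def bm25_weights(docs, k1=1.2, b=0.75):
    """One {term: weight} dict per tokenized document.

    Assumed variant: idf = log(1 + (N - df + 0.5) / (df + 0.5)), with
    term-frequency saturation (k1) and length normalization (b).
    """
    n = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n  # average document length
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(1.0 + (n - df[t] + 0.5) / (df[t] + 0.5)) for t in df}
    weighted = []
    for doc in docs:
        # Longer-than-average documents get a larger denominator term.
        norm = k1 * (1.0 - b + b * len(doc) / avgdl)
        weighted.append(
            {t: idf[t] * tf * (k1 + 1.0) / (tf + norm)
             for t, tf in Counter(doc).items()}
        )
    return weighted


# Hypothetical, already-tokenized tweets (assumes a non-empty corpus).
tweets = [
    ["good", "phone"],
    ["bad", "phone", "phone", "phone"],
    ["good", "day"],
]
tfidf = tfidf_weights(tweets)
bm25 = bm25_weights(tweets)
print(tfidf[1]["phone"], bm25[1]["phone"])
```

In this sketch the TF-IDF weight of a term grows linearly with its count in a tweet, while the BM25 weight saturates as the count rises and is scaled down for tweets longer than the average. That length normalization is the mechanism behind the abstract's remark about BM25 and very long documents.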
