An evolutionary approach for discovering effective composite features for text categorization

Abstract

The study of text categorization has assumed special significance in the Internet era, helping us navigate an ocean of web pages and emails that continues to grow at an unrelenting pace. Many previous works on text classification have shown that composite features consisting of multiple word tokens, such as statistical phrases, can contribute effectively to the classification task. However, finding useful composite features through a comprehensive search of the vast number of possibilities is often prohibitive in terms of computing resources. In the past, the search was often made feasible by limiting the search space with parametric constraints such as a minimum frequency and/or the number of words in the composite feature. In this paper we propose a new evolutionary approach to finding effective composite features for classification, one that combines probabilistic feature generation with error-biased sampling. We demonstrate the effectiveness of our approach using the Reuters-21578 test collection.
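As a rough illustration of the constrained search the abstract mentions (not the paper's evolutionary method), the sketch below enumerates contiguous word n-grams as candidate composite features and keeps only those that meet a minimum document frequency and a maximum length. The function name `candidate_phrases`, its parameters, and the toy documents are assumptions made for this example and do not come from the paper.

```python
from collections import Counter


def candidate_phrases(docs, max_words=2, min_doc_freq=2):
    """Return contiguous n-grams (2..max_words tokens) that occur in at
    least min_doc_freq documents. Hypothetical helper for illustration."""
    doc_freq = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        seen = set()  # count each phrase at most once per document
        for n in range(2, max_words + 1):
            for i in range(len(tokens) - n + 1):
                seen.add(tuple(tokens[i:i + n]))
        doc_freq.update(seen)
    # Apply the minimum-frequency constraint to prune the candidate set.
    return {phrase: df for phrase, df in doc_freq.items() if df >= min_doc_freq}


# Toy documents, invented for this example.
docs = [
    "crude oil prices rise",
    "crude oil exports fall",
    "wheat prices rise",
]
phrases = candidate_phrases(docs, max_words=2, min_doc_freq=2)
```

Raising `max_words` or lowering `min_doc_freq` enlarges the candidate set quickly, which is the kind of cost that motivates such constraints.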

Bibliographic record

  • Authors

    Wong AKS; Lee JWT;

  • Affiliation
  • Year: 2007
  • Total pages
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
