Unsupervised Feature Selection for Text Data


Abstract

Feature selection for unsupervised tasks is particularly challenging, especially when dealing with text data. The increase in online documents and email communication creates a need for tools that can operate without the supervision of the user. In this paper we look at novel feature selection techniques that address this need. A distributional similarity measure from information theory is applied to measure feature utility. This utility informs the search for both representative and diverse features in two complementary ways: CLUSTER partitions the entire feature space and then selects one feature to represent each cluster, while GREEDY grows the feature subset one greedily selected feature at a time. In particular, we found that GREEDY's local search is suited to learning smaller feature subsets, while CLUSTER is able to improve the global quality of larger feature sets. Experiments with four email data sets show significant improvement in retrieval accuracy with nearest-neighbour-based search methods compared to an existing frequency-based method. Importantly, both GREEDY and CLUSTER make significant progress towards the upper-bound performance set by a standard supervised feature selection method.
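
The abstract names the two search strategies but does not give the similarity measure or the selection criteria, so the following is only a rough sketch under stated assumptions, not the authors' method. It assumes each feature is described by its count distribution over documents, uses Jensen-Shannon divergence as a stand-in for the distributional similarity measure, and uses hypothetical function names (greedy_select, cluster_select) and toy data invented for illustration.

```python
import math


def normalize(counts):
    """Turn raw per-document counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions.
    ASSUMPTION: a stand-in for the measure used in the paper."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def greedy_select(features, k):
    """Hypothetical greedy search: start from the most frequent feature,
    then repeatedly add the feature whose smallest divergence to the
    already-selected features is largest (farthest-first)."""
    dists = {name: normalize(counts) for name, counts in features.items()}
    selected = [max(features, key=lambda n: sum(features[n]))]
    while len(selected) < min(k, len(features)):
        candidates = (n for n in features if n not in selected)
        best = max(
            candidates,
            key=lambda n: min(js_divergence(dists[n], dists[s]) for s in selected),
        )
        selected.append(best)
    return selected


def cluster_select(features, k):
    """Hypothetical cluster-based search: partition all features around k
    seed features, then keep one representative (the medoid) per cluster."""
    dists = {name: normalize(counts) for name, counts in features.items()}
    seeds = greedy_select(features, k)
    clusters = {s: [] for s in seeds}
    for name in features:
        nearest = min(seeds, key=lambda s: js_divergence(dists[name], dists[s]))
        clusters[nearest].append(name)
    representatives = []
    for members in clusters.values():
        if not members:  # skip a seed that attracted no features
            continue
        medoid = min(
            members,
            key=lambda c: sum(js_divergence(dists[c], dists[o]) for o in members),
        )
        representatives.append(medoid)
    return representatives


# Toy data (invented): counts of each word in four documents.
features = {
    "meeting": [5, 5, 0, 0],
    "agenda": [3, 4, 0, 0],
    "invoice": [0, 0, 6, 5],
    "payment": [0, 0, 3, 4],
}
print(greedy_select(features, 2))
print(cluster_select(features, 2))
```

On this toy data both functions should return one word from each topic, which is the "representative and diverse" behaviour the abstract describes; which of the two strategies works better at a given subset size is an empirical question the sketch does not address.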


Bibliographic details

Source
European Conference on Advances in Case-Based Reasoning (ECCBR 2006); 4-7 September 2006; Fethiye (TR) | 2006 | pp. 340-354 | 15 pages
Conference location: Fethiye (TR)
Authors
Nirmalie Wiratunga; Rob Lothian; Stewart Massie
Author affiliation

School of Computing, The Robert Gordon University, Aberdeen AB25 1HG, Scotland, UK

Original format: PDF
Language of text: English (eng)
Chinese Library Classification: Artificial intelligence theory

Similar literature


1. Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering [J]. Abualigah Laith Mohammad, Khader Ahamad Tajudin. Journal of Supercomputing. 2017, no. 11

2. Helmholtz principle based supervised and unsupervised feature selection methods for text mining [J]. Melike Tutkan, Murat Can Ganiz, Selim Akyokus. Information Processing & Management. 2016, no. 5

3. A new unsupervised feature selection method for text clustering based on genetic algorithms [J]. Pirooz Shamsinejadbabki, Mohammad Saraee. Journal of Intelligent Information Systems. 2012, no. 3

4. Unsupervised Feature Selection for Text Data [C]. Nirmalie Wiratunga, Rob Lothian, Stewart Massie. European Conference on Advances in Case-Based Reasoning (ECCBR 2006); 4-7 September 2006; Fethiye (TR). 2006

5. Unsupervised data mining methods for functional data analysis and feature selection [D]. Rattakorn, Panaya. 2009

6. The Unsupervised Feature Selection Algorithms Based on Standard Deviation and Cosine Similarity for Genomic Data Analysis [O]. Juanying Xie, Mingzhao Wang, Shengquan Xu. 2021

7. Unsupervised Feature Selection for Text Data [O]. Nirmalie Wiratunga, Rob Lothian, Stewart Massie. 2006

8. Unsupervised Feature Selection on Data Streams [R]. H., H., S., Y., Kasiviswanathan, S. 2015

