Integrating LDA with Clustering Technique for Relevance Feature Selection


Abstract

Selecting features from documents that describe user information needs is challenging because of the nature of text, where redundancy, synonymy, polysemy, noise and high dimensionality are common problems. The assumption that clustered documents describe only one topic can be too simple, given that most long documents discuss multiple topics. LDA-based models show significant improvement over cluster-based models in information retrieval (IR). However, the integration of the two techniques for feature selection (FS) is still limited. In this paper, we propose an innovative and effective cluster- and LDA-based model for relevance FS. The model also integrates a new extended random set theory to generalise the LDA local weights for document terms. It can assign a more discriminative weight to terms based on their appearance in LDA topics and the clustered documents. The experimental results, based on the RCV1 dataset and TREC topics for information filtering (IF), show that our model significantly outperforms eight state-of-the-art baseline models in five standard performance measures.
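
The abstract does not give the weighting formula, so the following is only a toy sketch of the general idea of weighting terms by their appearance in both LDA topics and clustered documents. Everything in it is an assumption made for illustration: the function name `term_weights`, the inputs (`doc_topic` for per-document topic proportions, `topic_word` for per-topic term probabilities, `clusters` for cluster labels), and the scoring rule itself, which sums a term's LDA mass over the documents that contain it and scales it by the fraction of clusters in which the term occurs. It is not the paper's extended random set model.

```python
from collections import defaultdict


def term_weights(docs, doc_topic, topic_word, clusters):
    """Hypothetical score: LDA mass of a term, scaled by its cluster coverage."""
    lda_mass = defaultdict(float)
    term_clusters = defaultdict(set)
    for d, terms in enumerate(docs):
        for t in set(terms):
            # Local LDA weight of term t in document d: sum_z P(z | d) * P(t | z).
            lda_mass[t] += sum(
                p_z * topic_word[z].get(t, 0.0)
                for z, p_z in enumerate(doc_topic[d])
            )
            # Remember which clusters contain the term.
            term_clusters[t].add(clusters[d])
    n_clusters = len(set(clusters))
    # Assumed rule: terms that appear in more clusters keep more of their LDA mass.
    return {
        t: mass * len(term_clusters[t]) / n_clusters
        for t, mass in lda_mass.items()
    }


# Toy inputs (made up): three tokenised documents, two topics, two clusters.
docs = [["oil", "price"], ["oil", "export"], ["football", "price"]]
doc_topic = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
topic_word = [
    {"oil": 0.5, "price": 0.3, "export": 0.2},
    {"football": 0.7, "price": 0.3},
]
clusters = [0, 1, 1]

weights = term_weights(docs, doc_topic, topic_word, clusters)
ranked = sorted(weights, key=weights.get, reverse=True)
print(ranked)
```

In practice the topic distributions would come from a fitted LDA model and the cluster labels from a clustering step over the relevant documents. Both are hard-coded here to keep the sketch self-contained.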


Bibliographic details

Source: Australasian Joint Conference on Artificial Intelligence | 2017 | 376p | 13 pages
Authors: Abdullah Semran Alharbi; Yuefeng Li; Yue Xu
Original format: PDF
Chinese Library Classification: TP18-53
Keywords: Feature selection; Term weighting; LDA; Extended random set; Intra-Inter-cluster features; Information filtering


