Conference: Australasian Joint Conference on Artificial Intelligence

Integrating LDA with Clustering Technique for Relevance Feature Selection

Abstract

Selecting features from documents that describe user information needs is challenging due to the nature of text, where redundancy, synonymy, polysemy, noise and high dimensionality are common problems. The assumption that clustered documents describe only one topic can be too simplistic, given that most long documents discuss multiple topics. LDA-based models show significant improvement over cluster-based models in information retrieval (IR). However, the integration of the two techniques for feature selection (FS) is still limited. In this paper, we propose an innovative and effective cluster- and LDA-based model for relevance FS. The model also integrates a new extended random set theory to generalise the LDA local weights for document terms. It can assign a more discriminative weight to terms based on their appearance in LDA topics and the clustered documents. The experimental results, based on the RCV1 dataset and TREC topics for information filtering (IF), show that our model significantly outperforms eight state-of-the-art baseline models on five standard performance measures.
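
The abstract does not give the model's weighting formula, so the following is only a minimal sketch of the general idea of scoring terms from both LDA topics and document clusters. It assumes the topic-term and document-topic distributions have already been estimated, and it uses the standard LDA identity P(term | doc) = sum over topics of P(term | topic) * P(topic | doc), averaged over the documents in each cluster. The function name `cluster_term_weights`, the averaging step, and all toy numbers are illustrative assumptions, not the paper's method, and the extended random set theory is not modelled here.

```python
from collections import defaultdict


def cluster_term_weights(doc_topic, topic_term, clusters):
    """Average P(term | doc) over the documents of each cluster.

    doc_topic:  list of per-document topic distributions, P(topic | doc)
    topic_term: list of per-topic term distributions, P(term | topic)
    clusters:   mapping from cluster id to a list of document indices
    (Hypothetical helper; the averaging scheme is an assumption.)
    """
    result = {}
    for cluster_id, members in clusters.items():
        weights = defaultdict(float)
        for doc in members:
            for topic, p_topic in enumerate(doc_topic[doc]):
                for term, p_term in topic_term[topic].items():
                    # P(term | doc) contribution, averaged over the cluster
                    weights[term] += p_topic * p_term / len(members)
        result[cluster_id] = dict(weights)
    return result


# Toy, hand-written distributions (not fitted to any corpus).
topic_term = [
    {"lda": 0.5, "topic": 0.5},
    {"cluster": 0.5, "kmeans": 0.5},
]
doc_topic = [
    [1.0, 0.0],  # document 0 is entirely about topic 0
    [0.5, 0.5],  # document 1 mixes both topics
    [0.0, 1.0],  # document 2 is entirely about topic 1
]
clusters = {"a": [0, 1], "b": [2]}

weights = cluster_term_weights(doc_topic, topic_term, clusters)
for cluster_id, scores in weights.items():
    ranked = sorted(scores.items(), key=lambda item: -item[1])
    print(cluster_id, ranked)
```

Because document 1 mixes two topics, cluster "a" gives non-zero weight to terms from both topics, which illustrates why treating a cluster as single-topic can be too coarse.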
