Conference: International conference on the computer science and engineering

A Novel Weighting Scheme Applied to Improve the Text Document Clustering Techniques


Abstract

Text clustering is an efficient analysis technique used in text mining to arrange a large number of unorganized text documents into a set of coherent clusters, where similar documents are placed in the same cluster. In this paper, we propose a novel term weighting scheme, namely length feature weight (LFW), to improve text document clustering algorithms based on new factors. The proposed scheme assigns a favorable term weight according to information obtained from the document collection. It recognizes the terms that are particular to each cluster and enhances their weights based on the proposed factors at the document level. The P-hill climbing technique is used to validate the proposed scheme in text clustering. The proposed weighting scheme is compared with the existing weighting scheme (TF-IDF) to validate its results in that domain. Experiments are conducted on eight standard benchmark text datasets taken from the Laboratory of Computational Intelligence (LABIC). The results show that the proposed weighting scheme, LFW, outperforms the existing weighting scheme and improves the results of text document clustering in terms of F-measure, precision, and recall.
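For orientation, the baseline and the evaluation measures named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the LFW formula, so only the TF-IDF baseline is shown, using the common raw-count times log(N/df) variant, which may differ from the exact variant used in the paper. The function names `tf_idf` and `precision_recall_f` and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = raw term count * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def precision_recall_f(cluster, klass):
    """Score one cluster (a set of document ids) against one reference class."""
    overlap = len(cluster & klass)
    if overlap == 0:
        return 0.0, 0.0, 0.0
    p = overlap / len(cluster)  # fraction of the cluster that belongs to the class
    r = overlap / len(klass)    # fraction of the class captured by the cluster
    return p, r, 2 * p * r / (p + r)


# Toy corpus (illustrative only, not from the LABIC datasets).
docs = [
    "text clustering groups similar documents".split(),
    "term weighting improves text clustering".split(),
    "hill climbing is a local search".split(),
]
w = tf_idf(docs)
```

In this sketch, a term that occurs in every document receives weight zero, while a term confined to one document receives the largest weight. According to the abstract, LFW adjusts term weights using additional document-level factors.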
