Conference: Innovations in Information Technology

A Feature Reduction Technique for Improved Web Page Clustering

Abstract

This paper presents a new approach to text feature reduction that can be used to speed up web page clustering. The technique uses a classified corpus to build a dictionary that captures the importance of various terms in different categories. The dictionary is then used to translate an input document's feature vector into a smaller one. Two experiments carried out to evaluate the technique are also presented. The results show that clustering with the presented technique is much faster and more accurate than clustering without it. They also show that, despite being simpler, the technique can give results comparable to those of widely used feature reduction techniques.
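
The dictionary-based reduction described in the abstract could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not say how term importance is measured, so the weighting used here (the share of a term's corpus occurrences that fall in each category) is an assumption, as are the whitespace tokenization, the toy corpus, and the hypothetical helper names `build_dictionary` and `reduce_features`.

```python
from collections import Counter, defaultdict


def build_dictionary(labeled_docs):
    """Map each term to {category: weight}.

    ASSUMPTION: weight = share of the term's corpus occurrences that fall in
    that category. The abstract does not specify the actual importance measure.
    """
    counts = defaultdict(Counter)  # term -> Counter(category -> occurrences)
    for text, category in labeled_docs:
        for term in text.lower().split():
            counts[term][category] += 1

    dictionary = {}
    for term, per_category in counts.items():
        total = sum(per_category.values())
        dictionary[term] = {cat: n / total for cat, n in per_category.items()}
    return dictionary


def reduce_features(text, dictionary, categories):
    """Translate a term-frequency vector into one score per category."""
    position = {cat: i for i, cat in enumerate(categories)}
    term_freq = Counter(text.lower().split())
    reduced = [0.0] * len(categories)
    for term, freq in term_freq.items():
        # Terms missing from the dictionary contribute nothing.
        for cat, weight in dictionary.get(term, {}).items():
            reduced[position[cat]] += freq * weight
    return reduced


# Toy classified corpus (hypothetical data, for illustration only).
corpus = [
    ("goal match team goal", "sports"),
    ("team vote election", "politics"),
]
categories = ["sports", "politics"]

dictionary = build_dictionary(corpus)
vector = reduce_features("team goal goal", dictionary, categories)
print(vector)  # [2.5, 0.5]
```

Under these assumptions the reduced vector has one dimension per category rather than one per vocabulary term, so a clustering algorithm run on it handles far fewer features per document.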
