Proceedings of the 2009 International Conference on Machine Learning and Cybernetics

MINING THE HOTTEST TOPICS ON CHINESE WEBPAGE BASED ON THE IMPROVED K-MEANS PARTITIONING


Abstract

This paper presents a new method for mining the hottest topics on Chinese webpages, based on an improved k-means partitioning algorithm. The dictionary used for word segmentation is reduced by deleting words that are useless for clustering, and a dictionary tree is built from the reduced dictionary, which speeds up word segmentation. Each word is coded as an integer, establishing a correspondence between words and integers, so that each title can be expressed as a set of integers; this greatly reduces the space and time cost of clustering. The need to determine the value of k is a shortcoming of k-means-based stream data mining. In the proposed method, the value of k is adjusted during clustering, which improves both accuracy and speed.
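The pipeline described in the abstract can be illustrated with a small Python sketch. It is not the paper's algorithm: the abstract gives no details of the improved k-means step, so a simple single-pass ("leader") clustering with a Jaccard similarity threshold stands in for it, purely to show how k can grow during clustering instead of being fixed up front. The names `Trie`, `segment`, `encode`, `cluster`, the threshold of 0.3, and the sample dictionary and titles are all assumptions made for illustration.

```python
END = ""  # end-of-word marker; never collides with a real character


class Trie:
    """Dictionary tree built from the reduced segmentation dictionary."""

    def __init__(self, words):
        self.root = {}
        for word in words:
            node = self.root
            for ch in word:
                node = node.setdefault(ch, {})
            node[END] = True

    def longest_match(self, text, start):
        """End index of the longest dictionary word at `start` (or `start` if none)."""
        node, end = self.root, start
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if END in node:
                end = i + 1
        return end


def segment(title, trie):
    """Forward maximum matching; characters outside the dictionary are skipped."""
    words, i = [], 0
    while i < len(title):
        end = trie.longest_match(title, i)
        if end > i:
            words.append(title[i:end])
            i = end
        else:
            i += 1
    return words


def encode(words, codes):
    """Map each word to an integer code and return the title as an integer set."""
    return frozenset(codes.setdefault(w, len(codes)) for w in words)


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(title_sets, threshold=0.3):
    """Single pass: join the most similar cluster, else open a new one (k grows)."""
    clusters = []  # each cluster: {"words": set of codes, "members": title indices}
    for idx, codes in enumerate(title_sets):
        best, best_sim = None, 0.0
        for c in clusters:
            sim = jaccard(codes, c["words"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["words"] |= codes
            best["members"].append(idx)
        else:
            clusters.append({"words": set(codes), "members": [idx]})
    return clusters


# Hypothetical reduced dictionary and titles, for illustration only.
trie = Trie(["四川", "地震", "救援", "股市", "上涨", "下跌"])
titles = ["四川地震救援进展", "四川地震最新消息", "股市今日上涨",
          "股市午后下跌", "地震救援持续"]
codes = {}
title_sets = [encode(segment(t, trie), codes) for t in titles]
clusters = cluster(title_sets)
hottest = max(clusters, key=lambda c: len(c["members"]))
print([c["members"] for c in clusters])  # [[0, 1, 4], [2, 3]]
print([titles[i] for i in hottest["members"]])
```

Representing each title as a small set of integers keeps similarity computation to cheap set intersections and unions, which is the space and time saving the abstract attributes to word coding. The number of clusters here ends at 2 without ever being specified, which mirrors the idea of adjusting k during clustering.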
