An Improved k-means Text Clustering Algorithm Based on MapReduce

Published in: 信息技术 (Information Technology)

Abstract

To address the limited scalability of the traditional k-means text clustering algorithm on large-scale text data, this paper proposes a parallel k-means text clustering algorithm based on the MapReduce programming model. Clustering quality is improved by removing outliers and by using an efficient strategy for selecting the initial centroids, and scalability is improved by designing a parallel large-scale text clustering model on the MapReduce framework. Experiments show that the algorithm achieves good clustering quality and scalability on large-scale text clustering.
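
As an illustration only, one k-means iteration can be phrased as a map step (assign each document vector to its nearest centroid) followed by a reduce step (average the vectors assigned to each centroid). The sketch below simulates that pattern in a single Python process. It is not the paper's implementation: the abstract does not describe the outlier-removal or initial-centroid procedures, so both are omitted here, and the use of cosine distance as well as the names `map_phase`, `reduce_phase`, and `kmeans` are assumptions made for this example.

```python
import math
from collections import defaultdict


def cosine_distance(a, b):
    """Cosine distance between two dense vectors (assumed metric, not from the paper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def map_phase(vectors, centroids):
    """Map: emit (index of nearest centroid, (vector, 1)) for every document vector."""
    for v in vectors:
        nearest = min(range(len(centroids)),
                      key=lambda i: cosine_distance(v, centroids[i]))
        yield nearest, (v, 1)


def reduce_phase(pairs, centroids):
    """Reduce: sum the vectors per centroid index and divide by the count."""
    sums = {}
    counts = defaultdict(int)
    for idx, (v, n) in pairs:
        if idx in sums:
            sums[idx] = [s + x for s, x in zip(sums[idx], v)]
        else:
            sums[idx] = list(v)
        counts[idx] += n
    updated = [list(c) for c in centroids]  # empty clusters keep their old centroid
    for idx, total in sums.items():
        updated[idx] = [s / counts[idx] for s in total]
    return updated


def kmeans(vectors, centroids, max_iter=20, tol=1e-9):
    """Repeat map + reduce until the centroids stop moving or max_iter is reached."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(vectors, centroids), centroids)
        shift = max(math.dist(old, new) for old, new in zip(centroids, updated))
        centroids = updated
        if shift < tol:
            break
    return centroids


if __name__ == "__main__":
    # Toy term-weight vectors standing in for documents (made-up data).
    docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(kmeans(docs, [[1.0, 0.0], [0.0, 1.0]]))
```

In an actual MapReduce job the map step would run in parallel over partitions of the corpus and the framework would group the emitted pairs by centroid index before the reduce step; here the generator and dictionary stand in for that shuffle.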
