Conference: IEEE International Congress on Big Data

Scalable k-NN based text clustering



Abstract

Clustering items by their textual features is an important problem with many applications, such as root-cause analysis of spam campaigns and identifying common topics in social media. Because of the sheer size of such data, algorithmic scalability becomes a major concern. In this work, we present a text clustering approach that builds an approximate k-NN graph and then computes its connected components, each of which represents a cluster. Our focus is the scalability/accuracy trade-off underlying our method: through an extensive experimental campaign on real-life datasets, we show that even rough approximations of the k-NN graph suffice to identify valid clusters. Our method is scalable and can easily be tuned to meet the requirements of different application domains.
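The pipeline described above (approximate k-NN graph, then connected components as clusters) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the approximate graph is built, so this sketch swaps in MinHash with LSH banding to pick candidate neighbours and Jaccard similarity over lowercased word sets. The names `knn_text_clusters` and `minhash_signature` and the parameters `bands`, `rows`, and `min_sim` are all hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def tokens_of(text: str) -> set[str]:
    # Assumption: textual features are lowercased whitespace-separated words.
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def minhash_signature(tokens: set[str], num_hashes: int) -> list[int]:
    # One seeded 64-bit hash per signature position; blake2b keeps results
    # deterministic across runs (unlike Python's salted built-in hash()).
    return [
        min(
            int.from_bytes(
                hashlib.blake2b(f"{i}:{t}".encode(), digest_size=8).digest(), "big"
            )
            for t in tokens
        )
        for i in range(num_hashes)
    ]


def knn_text_clusters(docs, k=3, bands=8, rows=2, min_sim=0.3):
    """Hypothetical sketch: approximate k-NN graph plus connected components."""
    toks = [tokens_of(d) for d in docs]

    # 1) LSH banding: documents sharing any band bucket become candidate
    #    neighbours. Only candidates are compared, so the graph is approximate.
    buckets = defaultdict(list)
    for idx, t in enumerate(toks):
        if not t:
            continue  # empty documents stay singleton clusters
        sig = minhash_signature(t, bands * rows)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(idx)
    candidates = defaultdict(set)
    for members in buckets.values():
        for i, j in combinations(members, 2):
            candidates[i].add(j)
            candidates[j].add(i)

    # 2) Union-find over undirected k-NN edges (top-k candidates per node,
    #    filtered by a hypothetical similarity floor min_sim).
    parent = list(range(len(docs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, cands in candidates.items():
        scored = sorted(((jaccard(toks[i], toks[j]), j) for j in cands), reverse=True)
        for sim, j in scored[:k]:
            if sim >= min_sim:
                parent[find(i)] = find(j)

    # 3) Each connected component is one cluster.
    groups = defaultdict(list)
    for idx in range(len(docs)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


if __name__ == "__main__":
    docs = [
        "cheap pills buy now limited offer",
        "buy cheap pills now limited offer",
        "Buy CHEAP pills NOW limited offer",
        "election results announced tonight in the capital",
        "In the capital election results announced tonight",
        "recipe for banana bread with walnuts",
    ]
    print(knn_text_clusters(docs))  # [[0, 1, 2], [3, 4], [5]]
```

In this sketch, `bands` and `rows` are the knobs for the scalability/accuracy trade-off: fewer bands or more rows per band produce fewer candidate pairs (cheaper, rougher graph), while `k` bounds the out-degree of each node. Documents with identical token sets always share every MinHash bucket, which is why the example output above is deterministic; near-duplicates are linked only with high probability.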
