...
Journal: Information Systems

A K-partitioning algorithm for clustering large-scale spatio-textual data


Abstract

The volume of spatio-textual data is increasing drastically, which makes it increasingly essential to process such large-scale spatio-textual datasets. Although numerous studies have addressed various kinds of spatio-textual queries, methods for analyzing spatio-textual data have rarely been considered so far. Motivated by this, this paper proposes a k-means based clustering algorithm specialized for massive spatio-textual data. One of the strengths of the k-means algorithm lies in its efficiency and scalability, which makes it appropriate for large-scale data. However, it is challenging to apply the standard k-means algorithm to spatio-textual data, since each spatio-textual object has non-numeric attributes (the textual dimension) as well as numeric attributes (the spatial dimension), and a centroid cannot be computed directly for the non-numeric part. We address this problem by using the expected distance between a random pair of objects rather than constructing an actual centroid for each cluster. Our experimental results show that the clustering quality of our algorithm is comparable to that of other k-partitioning algorithms that can process spatio-textual data, and that its efficiency is superior to that of those competitors.
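The idea of replacing the centroid with an expected distance can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not give the distance function or the update rule, so everything below is an assumption made for illustration. It assumes each object is an `(x, y, keywords)` tuple, that the distance is a weighted mix of normalized Euclidean distance and Jaccard distance (weight `alpha`, normalizer `max_spatial`), and that an object is assigned to the cluster whose members are closest on average. All function names are hypothetical. This naive version enumerates cluster members, so it is quadratic per iteration and says nothing about how the paper achieves its efficiency.

```python
import math
import random

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty keyword sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def distance(p, q, alpha=0.5, max_spatial=1.0):
    """Assumed mixed distance: alpha * spatial part + (1 - alpha) * textual part."""
    (px, py, pw), (qx, qy, qw) = p, q
    spatial = min(math.hypot(px - qx, py - qy) / max_spatial, 1.0)
    return alpha * spatial + (1.0 - alpha) * jaccard_distance(pw, qw)

def expected_distance(obj, members, **kw):
    """Mean distance from obj to a uniformly random member of the cluster."""
    return sum(distance(obj, m, **kw) for m in members) / len(members)

def k_partition(objects, k, iterations=10, seed=0, **kw):
    """Centroid-free k-partitioning: reassign by smallest expected distance."""
    rng = random.Random(seed)
    # Balanced random initial assignment.
    labels = [i % k for i in range(len(objects))]
    rng.shuffle(labels)
    for _ in range(iterations):
        clusters = [[o for o, l in zip(objects, labels) if l == c]
                    for c in range(k)]
        # Empty clusters are skipped; no centroid is ever constructed.
        new_labels = [
            min((c for c in range(k) if clusters[c]),
                key=lambda c: expected_distance(obj, clusters[c], **kw))
            for obj in objects
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Made-up points of interest: two spatially and textually distinct groups.
pois = [
    (0.0, 0.0, {"coffee", "cafe"}),
    (0.1, 0.0, {"coffee", "cafe"}),
    (0.0, 0.1, {"coffee"}),
    (10.0, 10.0, {"museum", "art"}),
    (10.1, 10.0, {"museum", "art"}),
    (10.0, 10.1, {"museum"}),
]
labels = k_partition(pois, k=2, max_spatial=15.0)
```

Because the textual part is handled through pairwise Jaccard distances, the sketch never needs a "mean" of keyword sets, which is the obstacle the abstract describes for standard k-means.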
