A K-partitioning algorithm for clustering large-scale spatio-textual data

Choi Dong-Wan; Chung Chin-Wan

Abstract

The volume of spatio-textual data is increasing drastically, which makes it more and more essential to process large-scale spatio-textual datasets. Even though numerous studies have addressed various kinds of spatio-textual queries, methods for analyzing spatio-textual data have rarely been considered so far. Motivated by this, this paper proposes a k-means based clustering algorithm specialized for massive spatio-textual data. One of the strong points of the k-means algorithm lies in its efficiency and scalability, implying that it is appropriate for large-scale data. However, it is challenging to apply the normal k-means algorithm to spatio-textual data, since each spatio-textual object has non-numeric attributes (the textual dimension) as well as numeric attributes (the spatial dimension). We address this problem by using the expected distance between a random pair of objects rather than constructing an actual centroid for each cluster. Based on our experimental results, we show that the clustering quality of our algorithm is comparable to that of other k-partitioning algorithms that can process spatio-textual data, and that it is more efficient than those competitors.
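The idea of assigning objects by expected distance instead of by centroid can be illustrated with a small sketch. This is not the paper's algorithm: the abstract gives neither the distance function nor the update rule, so everything below is an assumption made for illustration. The hypothetical `st_distance` mixes normalized Euclidean distance with Jaccard distance over keyword sets (the weight `alpha` and the scale `max_spatial` are invented parameters), and `expected_distance` is the plain average distance from an object to a cluster's members, that is, the expected distance to a uniformly random member. The naive loop is quadratic per iteration and makes no attempt at the efficiency the abstract reports.

```python
import math
import random

def st_distance(a, b, alpha=0.5, max_spatial=1.0):
    """Hypothetical spatio-textual distance (an assumption, not from the paper):
    weighted sum of normalized Euclidean distance and Jaccard distance."""
    (ax, ay), a_words = a
    (bx, by), b_words = b
    spatial = min(math.hypot(ax - bx, ay - by) / max_spatial, 1.0)
    union = a_words | b_words
    textual = 1.0 - len(a_words & b_words) / len(union) if union else 0.0
    return alpha * spatial + (1.0 - alpha) * textual

def expected_distance(obj, members, **kw):
    """Average distance from obj to a cluster's members, i.e. the expected
    distance to a member drawn uniformly at random. No centroid is built."""
    return sum(st_distance(obj, m, **kw) for m in members) / len(members)

def cluster(objects, k, iters=10, seed=0, **kw):
    """Naive k-partitioning: start from a random balanced assignment, then
    repeatedly move every object to the cluster with the smallest expected
    distance, using the memberships from the start of the iteration."""
    rng = random.Random(seed)
    labels = [i % k for i in range(len(objects))]
    rng.shuffle(labels)
    for _ in range(iters):
        groups = [[o for o, lab in zip(objects, labels) if lab == c]
                  for c in range(k)]
        non_empty = [c for c in range(k) if groups[c]]
        new_labels = [
            min(non_empty, key=lambda c: expected_distance(o, groups[c], **kw))
            for o in objects
        ]
        if new_labels == labels:  # assignment is stable, stop early
            break
        labels = new_labels
    return labels

# Two well-separated groups with disjoint vocabularies (made-up data).
objects = [
    ((0.0, 0.0), {"cafe", "coffee"}),
    ((0.1, 0.0), {"cafe"}),
    ((0.0, 0.1), {"coffee", "cafe"}),
    ((10.0, 10.0), {"museum"}),
    ((10.1, 10.0), {"museum", "art"}),
    ((10.0, 10.1), {"art", "museum"}),
]
labels = cluster(objects, k=2, max_spatial=15.0)
```

Because the average is taken over real members, the textual dimension never needs a synthetic "mean document"; only pairwise distances are required. Note that an object's distance to itself (zero) is included in the average for its current cluster, which slightly favors staying put.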

Bibliographic details

Source
Information Systems | 2017, Issue 3 | pp. 1-11 | 11 pages
Authors
Choi Dong-Wan; Chung Chin-Wan
Affiliations

Simon Fraser Univ, Sch Comp Sci, Burnaby, BC, Canada;

Chongqing Univ Technol, Chongqing Liangjiang KAIST Int Program, Chongqing, Peoples R China|Korea Adv Inst Sci & Technol, Sch Comp, Daejeon, South Korea;

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
Spatio-textual similarity; K-means clustering; K-medoids clustering; K-prototypes clustering; Expected distance; Grid partitioning;

