IEEE Conference on Industrial Electronics and Applications

Clustering based semantic data summarization technique: A new approach

Abstract

Advances in computing and the proliferation of data repositories call for efficient data mining techniques that extract meaningful information. Summarization is one such data analysis technique, and its methods fall broadly into two categories: semantic and syntactic. Syntactic methods treat a dataset as a sequence of bytes, whereas semantic methods convert a large dataset into a much smaller one while keeping information loss low. Clustering algorithms, such as basic k-means, are widely used for semantic summarization. Existing clustering-based summarization techniques assume that a summary is represented by the cluster centroids. However, a centroid might not coincide with any actual data point in the dataset. In addition, many clustering algorithms, including the popular k-means algorithm, require the number of clusters as an input, which is not available for unsupervised summarization of unlabeled data. To address these issues, we propose a clustering-based semantic summarization technique that combines the x-means and k-medoid clustering algorithms. Our experimental analysis shows that the proposed algorithm outperforms k-means-based summarization techniques.
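The two issues named in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the abstract gives no implementation details, so everything below is an assumption made for illustration. A plain k-means scan scored with a simplified BIC-style criterion stands in for x-means when choosing the number of clusters, and a single medoid pick per cluster stands in for a full k-medoid run. The function names (`farthest_first`, `bic_score`, `summarize`), the `k_max` parameter, and the toy data are all hypothetical.

```python
import math


def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers


def assign(points, centers):
    """Group every point with its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda c: dist2(p, centers[c]))
        clusters[j].append(p)
    return clusters


def kmeans(points, k, max_iter=100):
    """Basic Lloyd iteration; returns (centroids, clusters)."""
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        clusters = assign(points, centers)
        updated = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)


def bic_score(points, centers, clusters):
    """Simplified BIC-style score (lower is better). It is a stand-in for
    the criterion x-means uses, not a reproduction of it."""
    n, d, k = len(points), len(points[0]), len(centers)
    sse = sum(dist2(p, c) for c, cl in zip(centers, clusters) for p in cl)
    variance = max(sse / (n * d), 1e-12)  # guard against log(0)
    return n * d * math.log(variance) + k * (d + 1) * math.log(n)


def medoid(cluster):
    """The member with the smallest total distance to the other members."""
    return min(cluster, key=lambda m: sum(math.sqrt(dist2(m, p)) for p in cluster))


def summarize(points, k_max):
    """Choose k by the score above, then summarize each cluster by a medoid."""
    fits = (kmeans(points, k) for k in range(1, k_max + 1))
    centers, clusters = min(fits, key=lambda fit: bic_score(points, *fit))
    return [medoid(cl) for cl in clusters if cl], centers


# Toy data: three well-separated groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (0, 10), (0, 11), (1, 10), (1, 11),
]
summary, centroids = summarize(points, k_max=6)
```

On this toy data the number of clusters is never passed in, every point in `summary` is an actual record from `points`, and the k-means centroids are not. The trade-off is that a medoid search costs time quadratic in the cluster size, whereas a centroid is a single averaging pass.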