IEEE Conference on Industrial Electronics and Applications

Clustering based semantic data summarization technique: A new approach

Abstract

Due to advances in computing and the proliferation of data repositories, efficient data mining techniques are required to extract meaningful information. Summarization is one such important data analysis technique; it can be broadly classified into two categories: semantic and syntactic methods. Syntactic methods treat a dataset as a sequence of bytes, whereas semantic methods convert a large dataset into a much smaller one while keeping information loss low. Clustering algorithms such as basic k-means are widely used for semantic summarization. Existing clustering-based summarization techniques assume that a summary is represented by the cluster centroids. However, the centroids may not be actual data points, so a centroid-based summary can contain points that do not exist in the dataset. In addition, many clustering algorithms, such as the popular k-means algorithm, require the number of clusters as an input, and this number is not available for unsupervised summarization of unlabeled data. To address these issues, we propose a clustering-based semantic summarization technique that combines the x-means and k-medoid clustering algorithms. Our experimental analysis shows that the proposed algorithm outperforms k-means-based summarization techniques.
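The abstract names the two ingredients but gives no code. As a rough illustration only (not the authors' implementation), the Python sketch below chooses the number of clusters with a Bayesian information criterion (BIC) of the kind x-means uses: spherical Gaussians with one shared variance. It then summarizes each cluster by its medoid, which is always an actual data point. Two simplifications are assumptions of this sketch. First, instead of x-means' local cluster-splitting search, it scans k = 1..k_max with k-means and keeps the k with the best BIC. Second, it uses simple alternating (Voronoi-iteration) k-medoids rather than the PAM swap procedure. The helper names, the deterministic farthest-first seeding and the default `k_max = 6` are hypothetical choices.

```python
import math

# Hedged sketch: a BIC scan over k stands in for x-means; names and defaults are hypothetical.


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point whose distance to its nearest chosen seed is largest."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds


def nearest(p, centers):
    """Index of the center closest to p (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))


def kmeans(points, k, iters=100):
    """Lloyd's k-means; returns (centers, labels)."""
    centers = farthest_first(points, k)
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        new = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Centroid = coordinate-wise mean; an empty cluster keeps its old center.
            new.append(tuple(sum(xs) / len(members) for xs in zip(*members)) if members else c)
        if new == centers:
            break
        centers = new
    return centers, [nearest(p, centers) for p in points]


def bic(points, centers, labels):
    """BIC (higher is better) of a hard-assignment spherical Gaussian model
    with one shared variance, the kind of score x-means uses."""
    r, m, k = len(points), len(points[0]), len(centers)
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    var = max(sse / (m * max(r - k, 1)), 1e-12)  # pooled per-dimension variance; floor avoids log(0)
    sizes = [labels.count(j) for j in range(k)]
    loglik = (sum(n * math.log(n / r) for n in sizes if n)  # mixing-weight term
              - r * m / 2 * math.log(2 * math.pi * var)
              - sse / (2 * var))
    n_params = k * (m + 1)  # (k - 1) weights + k * m mean coordinates + 1 variance
    return loglik - n_params / 2 * math.log(r)


def kmedoids(points, k, iters=100):
    """Alternating (Voronoi-iteration) k-medoids: every representative is an
    actual data point, unlike a k-means centroid."""
    medoids = farthest_first(points, k)
    for _ in range(iters):
        labels = [nearest(p, medoids) for p in points]
        new = []
        for j, med in enumerate(medoids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # New medoid: the member with the smallest total distance to its cluster.
            new.append(min(members, key=lambda c: sum(math.dist(c, q) for q in members))
                       if members else med)
        if new == medoids:
            break
        medoids = new
    return medoids


def summarize(points, k_max=6):
    """Choose k by BIC over k-means runs (a stand-in for x-means), then
    return k medoids as the semantic summary."""
    k_max = min(k_max, len(points))
    best_k = max(range(1, k_max + 1), key=lambda k: bic(points, *kmeans(points, k)))
    return kmedoids(points, best_k)


if __name__ == "__main__":
    # Toy data: three tight 3x3 grids around (0, 0), (10, 0) and (0, 10).
    grid = [(dx * 0.1, dy * 0.1) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    data = [(cx + x, cy + y) for cx, cy in [(0, 0), (10, 0), (0, 10)] for x, y in grid]
    print(summarize(data))
```

On the toy input in the `__main__` block, BIC should favor k = 3, and the summary should contain one grid center per cluster. Scanning every k costs k_max full k-means runs, which is the price of keeping the sketch short. x-means is designed to avoid this by testing cluster splits locally.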
