Clustering based semantic data summarization technique: A new approach


Abstract

With advances in computing and the proliferation of data repositories, efficient data mining techniques are required to extract meaningful information. Summarization is one such important data analysis technique, and its methods can be broadly classified into two categories: semantic and syntactic. Syntactic methods treat a dataset as a sequence of bytes, whereas semantic methods convert a large dataset into a much smaller one while keeping information loss low. Clustering algorithms, such as basic k-means, are widely used for semantic summarization. Existing clustering-based summarization techniques assume that a summary is represented by the cluster centroids. However, a centroid might not correspond to any actual data point, so the summary may contain points that do not occur in the data. In addition, many clustering algorithms, including the popular k-means algorithm, require the number of clusters as an input, which is not available for unsupervised summarization of unlabeled data. To address these issues, we propose a clustering-based semantic summarization technique that combines the x-means and k-medoid clustering algorithms. Our experimental analysis shows that the proposed algorithm outperforms k-means-based summarization techniques.
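
The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract gives no implementation details, so the x-means step (which chooses the number of clusters) is replaced here by a simpler stand-in that tries several values of k and keeps the one with the highest mean silhouette score. The k-medoid step is a basic alternating variant with deterministic farthest-first seeding. All function names, the `k_max` parameter, and the toy data are assumptions made for illustration.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point that is farthest from its nearest already-chosen medoid."""
    medoids = [points[0]]
    while len(medoids) < k:
        medoids.append(
            max(points, key=lambda p: min(math.dist(p, m) for m in medoids))
        )
    return medoids


def assign(points, medoids):
    """Label each point with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda i: math.dist(p, medoids[i]))
        for p in points
    ]


def k_medoids(points, k, max_iter=100):
    """Alternate between assigning points and re-picking each cluster's
    medoid (the member with the smallest total distance to the others)."""
    medoids = farthest_first(points, k)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        updated = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if not members:  # keep the old medoid if a cluster is empty
                updated.append(medoids[i])
                continue
            updated.append(
                min(members, key=lambda c: sum(math.dist(c, q) for q in members))
            )
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(points, medoids)


def mean_silhouette(points, labels, k):
    """Mean silhouette score; singleton clusters contribute 0."""
    n = len(points)
    total = 0.0
    for i in range(n):
        sums, counts = [0.0] * k, [0] * k
        for j in range(n):
            if i != j:
                sums[labels[j]] += math.dist(points[i], points[j])
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue
        a = sums[own] / counts[own]
        b = min(
            (sums[c] / counts[c] for c in range(k) if c != own and counts[c]),
            default=a,
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def summarize(points, k_max=6):
    """Stand-in for x-means: pick k by silhouette, return the medoids.
    Assumes at least three distinct points."""
    best = None
    for k in range(2, min(k_max, len(points) - 1) + 1):
        medoids, labels = k_medoids(points, k)
        score = mean_silhouette(points, labels, k)
        if best is None or score > best[0]:
            best = (score, medoids)
    return best[1]


# Toy data (hypothetical): three well-separated groups of five points each.
data = [(x + dx, y + dy)
        for x, y in [(0, 0), (10, 10), (20, 0)]
        for dx, dy in [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]]

summary = summarize(data)
print(sorted(summary))
```

Because medoids are chosen from the data itself, every point in the summary is an actual record, which is the property the abstract contrasts with centroid-based summaries.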


Bibliographic details

Source
IEEE Conference on Industrial Electronics and Applications | 2014 | pp. 1780-1785 | 6 pages
Authors
Ahmed Mohiuddin; Mahmood Abdun Naser
Original format: PDF
Keywords
Conferences; Decision support systems; Industrial electronics; Clustering; Data Summarization


Similar literature


1. A coherent graph-based semantic clustering and summarization approach for biomedical literature and a new summarization evaluation method [J]. Illhoi Yoo, Xiaohua Hu, Il-Yeol Song. BMC Bioinformatics. 2007, Suppl. 9
2. Big Data Summarization Using Novel Clustering Algorithm and Semantic Feature Approach [J]. Shilpa G. Kolte, Jagdish W. Bakal. International Journal of Rough Sets and Data Analysis. 2017, No. 3
3. Sentence Embedding Based Semantic Clustering Approach for Discussion Thread Summarization [J]. Atif Khan, Qaiser Shah, M. Irfan Uddin. Complexity. 2020, No. 1
4. Clustering based semantic data summarization technique: A new approach [C]. Ahmed Mohiuddin, Mahmood Abdun Naser. IEEE Conference on Industrial Electronics and Applications. 2014
5. High-Dimensional Data Clustering and Statistical Analysis of Clustering-based Data Summarization Products [D]. Zhou, Dunke. 2012
6. A coherent graph-based semantic clustering and summarization approach for biomedical literature and a new summarization evaluation method [O]. Illhoi Yoo, Xiaohua Hu, Il-Yeol Song. 2007
