Journal: Pattern Recognition: The Journal of the Pattern Recognition Society

Textual data summarization using the Self-Organized Co-Clustering model

Abstract

Recently, different studies have demonstrated the use of co-clustering, a data mining technique which simultaneously produces row-clusters of observations and column-clusters of features. The present work introduces a novel co-clustering model to easily summarize textual data in a document-term format. In addition to highlighting homogeneous co-clusters, as other existing algorithms do, we also distinguish noisy co-clusters from significant co-clusters, which is particularly useful for sparse document-term matrices. Furthermore, our model proposes a structure among the significant co-clusters, thus providing improved interpretability to users. The proposed approach contends with state-of-the-art methods for document and term clustering and offers user-friendly results. The model relies on the Poisson distribution and on a constrained version of the Latent Block Model, which is a probabilistic approach for co-clustering. A Stochastic Expectation-Maximization algorithm is proposed to run the model's inference, as well as a model selection criterion to choose the number of co-clusters. Both simulated and real data sets illustrate the efficiency of this model by its ability to easily identify relevant co-clusters. © 2020 Elsevier Ltd. All rights reserved.
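
To make the idea of a Poisson block model on a document-term matrix concrete, the following is a minimal sketch and not the constrained model or the inference algorithm of the paper. It assumes the simplest generic setting, in which every count in block (k, l) is drawn from a Poisson distribution with a single rate, and in which the row and column partitions are already known. Under those assumptions the maximum-likelihood rate of a block is just its mean count. The function name `poisson_block_rates` and the toy matrix are hypothetical and chosen only for illustration.

```python
import numpy as np

def poisson_block_rates(X, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Estimate one Poisson rate per co-cluster, given fixed partitions.

    Assumption (not taken from the paper): X[i, j] ~ Poisson(rate[k, l])
    when document i is in row-cluster k and term j is in column-cluster l.
    The maximum-likelihood estimate of rate[k, l] is then the block mean.
    """
    X = np.asarray(X, dtype=float)
    # One-hot membership matrices: Z is (documents x K), W is (terms x L).
    Z = np.eye(n_row_clusters)[np.asarray(row_labels)]
    W = np.eye(n_col_clusters)[np.asarray(col_labels)]
    block_counts = Z.T @ X @ W                            # total count per block
    block_sizes = np.outer(Z.sum(axis=0), W.sum(axis=0))  # number of cells per block
    # Empty blocks get a rate of 0 instead of a division by zero.
    return np.divide(block_counts, block_sizes,
                     out=np.zeros_like(block_counts),
                     where=block_sizes > 0)

# Toy document-term matrix: 3 documents, 4 terms (made-up counts).
X = [[5, 4, 0, 0],
     [6, 5, 0, 1],
     [0, 0, 3, 3]]
rates = poisson_block_rates(X, row_labels=[0, 0, 1], col_labels=[0, 0, 1, 1],
                            n_row_clusters=2, n_col_clusters=2)
print(rates)
```

In this toy case the two diagonal blocks have clearly higher rates than the off-diagonal ones, which is the kind of contrast between significant and near-empty co-clusters that the abstract describes for sparse matrices. A full method would also have to infer the partitions themselves, for example with the Stochastic Expectation-Maximization algorithm mentioned above.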