A Clustering Algorithm for Asymmetrically Related Data with Applications to Text Mining

Abstract

Clustering techniques find a collection of subsets of a data set such that the collection satisfies a criterion that depends on a relation defined on the data set. Traditionally, the underlying relation is assumed to be symmetric. However, there are many practical scenarios in which the underlying relation is asymmetric. One example of an asymmetric relation in text analysis is the inclusion relation, i.e., the inclusion of the meaning of one block of text in the meaning of another. In this paper, we consider the general problem of clustering asymmetrically related data and propose an algorithm to cluster such data. To demonstrate its usefulness, we consider two applications in text mining: (1) summarization of short documents, and (2) generation of a concept hierarchy from a set of documents. Our experiments show that the performance of the proposed algorithm is superior to that of more traditional algorithms.
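The abstract does not describe the proposed algorithm itself, but the asymmetry of an inclusion relation is easy to illustrate. The following is a minimal Python sketch, assuming a crude word-overlap proxy for "meaning inclusion" that is not taken from the paper: the hypothetical function `inclusion(a, b)` returns the fraction of the distinct words of `a` that also occur in `b`. The example sentences are likewise made up for illustration.

```python
def inclusion(a: str, b: str) -> float:
    """Hypothetical proxy for 'the meaning of a is included in b'.

    Assumption: word overlap stands in for meaning. Returns the fraction
    of the distinct (lower-cased) words of `a` that also occur in `b`.
    """
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a:
        # An empty text includes nothing; define the score as 0.0.
        return 0.0
    return len(words_a & words_b) / len(words_a)


short_text = "clustering text"
long_text = "clustering of asymmetrically related text data"

# The relation is asymmetric: every word of the short text appears in the
# long one, but only 2 of the 6 distinct words of the long text appear in
# the short one.
print(inclusion(short_text, long_text))  # 1.0
print(inclusion(long_text, short_text))  # 0.3333333333333333
```

Because a short phrase can be fully contained in a longer passage but not the other way round, scores of this kind cannot be treated as an ordinary symmetric similarity, which is the situation the paper's clustering problem addresses.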
