Finding hierarchical structures of document collections by using tolerance relations

Abstract

We develop a hierarchical clustering algorithm based on the Tolerance Rough Set Model (TRSM). Text clustering is one way to find the structure of a text collection. The quality of text clustering depends not only on the clustering algorithm but also on the document representation model. We aim to enrich the representations of documents, and the distances between them, using the semantic relations introduced by TRSM. The model offers a way of considering semantic relatedness between documents. It extends the equivalence rough set model by employing tolerance relations instead of equivalence relations. The main advantages of the proposed model are that it is more appropriate for textual data and that the computation can be done efficiently. Based on the tolerance rough set model, we develop a hierarchical document clustering algorithm. The algorithm is evaluated and validated experimentally on test collections. The results suggest that this clustering algorithm can be well adapted to text mining.
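
The abstract does not spell out how the tolerance relation is built, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes a formulation often used for TRSM: two terms are tolerant when they co-occur in at least `theta` documents, and a document's enriched representation is its upper approximation, meaning every term whose tolerance class overlaps the document. The function names, the `theta` parameter, the toy documents, and the use of Jaccard similarity are all hypothetical choices made for this example.

```python
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to the terms that co-occur with it in >= theta documents.

    Assumed definition of the tolerance relation; the abstract does not give one.
    """
    vocab = set().union(*docs)
    cooc = {}
    for doc in docs:
        # Count each unordered term pair once per document.
        for a, b in combinations(sorted(doc), 2):
            cooc[(a, b)] = cooc.get((a, b), 0) + 1
    classes = {t: {t} for t in vocab}  # reflexive: each term tolerates itself
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)  # symmetric: record both directions
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return all terms whose tolerance class shares a term with the document."""
    return {t for t, cls in classes.items() if cls & doc}


def jaccard(a, b):
    """Set-overlap similarity, used here only as a stand-in similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy collection (hypothetical data): each document is a set of terms.
docs = [
    {"rough", "set", "tolerance"},
    {"rough", "set", "model"},
    {"rough", "clustering"},
    {"set", "clustering"},
]
classes = tolerance_classes(docs, theta=2)
enriched = [upper_approximation(d, classes) for d in docs]

raw_sim = jaccard(docs[2], docs[3])
enriched_sim = jaccard(enriched[2], enriched[3])
print(raw_sim, enriched_sim)
```

In this toy collection "rough" and "set" co-occur in two documents, so they fall into each other's tolerance class. The last two documents share only "clustering" in their raw form, but their upper approximations coincide, which is the kind of semantic relatedness the abstract refers to. A standard agglomerative procedure could then be run on the enriched similarities to obtain a hierarchy.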
