
Fuzzy discrete correlation for document clustering


Abstract

The quantity of text documents on the Internet, in digital libraries, and in news sources has grown enormously. This growth has increased interest in methods that help users navigate, summarize, and organize this information effectively. A recent method based on neighbor and link concepts performs better than earlier methods in this field. Two documents are neighbors if their similarity exceeds a defined threshold. If they are neighbors, the corresponding element of the neighbor matrix is set to one; otherwise it is set to zero. This binary encoding discards some information about document similarity and therefore reduces accuracy. To overcome this problem, we propose two methods, "discrete correlation" and "fuzzy correlation", both of which aim to refine the definition of a neighbor and thereby achieve better clustering results. To evaluate our work, we used the k-means algorithm to determine the initial cluster centers and the similarity criteria between documents and centers. Applied to real-world document data sets and measured by information retrieval measures, the proposed methods perform better than traditional algorithms and previous work.
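
The binary neighbor matrix described in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: cosine similarity as the similarity measure, the toy term-count vectors, the threshold value 0.5, the function names, and the definition of a link as the number of common neighbors (as in link-based clustering such as ROCK). The abstract does not define "discrete correlation" or "fuzzy correlation", so the sketch does not attempt to implement them; it only shows how thresholding maps different similarity values to the same matrix entry.

```python
import math


def cosine(u, v):
    """Cosine similarity of two term-count vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(docs):
    """Pairwise similarities between all documents."""
    return [[cosine(u, v) for v in docs] for u in docs]


def binary_neighbors(sim, theta):
    """Entry is 1 if similarity exceeds theta, else 0."""
    return [[1 if s > theta else 0 for s in row] for row in sim]


def links(neighbors):
    """Number of common neighbors for each pair (neighbor matrix squared)."""
    n = len(neighbors)
    return [
        [sum(neighbors[i][k] * neighbors[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Toy term-count vectors (hypothetical data, not from the paper).
docs = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [3, 0, 0],
]

sim = similarity_matrix(docs)
nb = binary_neighbors(sim, theta=0.5)

# Documents 0 and 1 are moderately similar, documents 0 and 3 are
# identical in direction, yet both pairs get the same entry of 1.
print(round(sim[0][1], 3), nb[0][1])
print(round(sim[0][3], 3), nb[0][3])
print(links(nb)[0][1])
```

In this toy example the pairs (0, 1) and (0, 3) have clearly different similarities but identical neighbor entries, which is the information loss the abstract refers to.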
