Published in: International Conference on Intelligent Computing and Control Systems

An Improved Document Clustering Approach with Multi-Viewpoint Based on Different Similarity Measures



Abstract

Electronic information, such as online newspapers, journals, conference proceedings, websites, and e-mails, is growing very rapidly and in extremely large volumes. At this scale, neither humans nor search engines can effectively control, index, or search all of this information, so automatic document organization has become a critical issue. Document clustering methods help us understand how data are distributed, and they can also preprocess data for other applications. For instance, a search engine can return results more effectively and efficiently if it searches over documents that have already been clustered. Document clustering is an automatic, unsupervised learning technique: it places related documents in the same cluster and unrelated documents in different clusters, so that the documents within a cluster are related to one another and unrelated to the documents in other clusters. Any clustering method requires a similarity measure, which quantifies the degree of closeness, or similarity, between the target objects. In this paper, we introduce document clustering based on a multi-viewpoint-based similarity measure, together with two related document clustering methods. Existing dissimilarity/similarity measures for document clustering use only a single viewpoint, the origin, which means they rely on just one reference point, whereas ours uses many different viewpoints as references.
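The contrast between a single viewpoint (the origin) and many viewpoints can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes documents are already term-weight vectors (for example TF-IDF) normalized to unit length, and it assumes one common form of multi-viewpoint similarity in which the similarity of documents d_i and d_j is the average, over a set of viewpoint documents d_h, of the dot product (d_i - d_h)·(d_j - d_h). The function names and the choice of viewpoints are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sub(u: Vector, v: Vector) -> List[float]:
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def normalize(v: Vector) -> List[float]:
    """Scale v to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity(di: Vector, dj: Vector) -> float:
    """Single-viewpoint similarity: the angle between di and dj seen from the origin."""
    ni, nj = math.sqrt(dot(di, di)), math.sqrt(dot(dj, dj))
    if ni == 0 or nj == 0:
        return 0.0
    return dot(di, dj) / (ni * nj)


def multi_viewpoint_similarity(di: Vector, dj: Vector,
                               viewpoints: Sequence[Vector]) -> float:
    """Hypothetical multi-viewpoint similarity (assumed form):
    average of (di - dh) . (dj - dh) over every viewpoint dh."""
    if not viewpoints:
        raise ValueError("at least one viewpoint is required")
    total = sum(dot(sub(di, dh), sub(dj, dh)) for dh in viewpoints)
    return total / len(viewpoints)


if __name__ == "__main__":
    # Toy unit-length "documents" over a 3-term vocabulary.
    d1 = normalize([1, 1, 0])
    d2 = normalize([1, 0, 1])
    d3 = normalize([0, 0, 1])
    print("cosine (origin only):", cosine_similarity(d1, d2))
    print("MVS, origin viewpoint:", multi_viewpoint_similarity(d1, d2, [[0, 0, 0]]))
    print("MVS, viewpoint d3:", multi_viewpoint_similarity(d1, d2, [d3]))
```

With only the origin as a viewpoint and unit-length vectors, `multi_viewpoint_similarity` reduces to the plain dot product, which equals the cosine similarity; this is the single-reference-point case the abstract describes. Adding other documents as viewpoints lets the score also reflect how the pair looks when measured from those documents.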
