Journal: Research journal of applied science, engineering and technology

Improvement Tfidf for News Document Using Efficient Similarity


Abstract

This study proposes a new method for document clustering. Clustering is a powerful data mining technique for discovering topics in document collections. In a good clustering, documents within the same cluster should be highly similar to one another, while documents in different clusters should be dissimilar. The cosine function measures the similarity of two documents. When clusters are not well separated, partitioning them on pairwise similarity alone is not good enough, because some documents in different clusters may be similar to each other, and the cosine function alone is then not effective. To address this problem, a similarity measure based on the concepts of neighbors and links is used. This study proposes an efficient similarity measure with more accurate weighting for use in the bisecting k-means algorithm. The method is evaluated on a document data set and compared with the cosine similarity criterion and traditional methods. Experimental results show a marked improvement in efficiency when the proposed criterion is applied.
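The abstract does not give the formula for the proposed measure, so the following is only a minimal sketch of the general idea of combining cosine similarity with neighbors and links. It assumes a common formulation from the neighbor-based clustering literature: two documents are neighbors when their cosine similarity reaches a threshold `theta`, the link between two documents is their number of common neighbors, and the combined score is a weighted sum controlled by `alpha`. The function names, `theta`, `alpha`, the normalization by the number of documents, and the plain `tf * log(n / df)` weighting are all assumptions for illustration, not the paper's definitions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for tokenized documents.

    Uses the plain tf * log(n / df) weighting (an assumption; the paper's
    improved weighting is not specified in the abstract).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def neighbor_matrix(vectors, theta):
    """M[i][j] = 1 if documents i and j are neighbors (cosine >= theta).

    Every document is treated as its own neighbor.
    """
    n = len(vectors)
    return [
        [1 if i == j or cosine(vectors[i], vectors[j]) >= theta else 0
         for j in range(n)]
        for i in range(n)
    ]


def link(M, i, j):
    """Number of common neighbors of documents i and j."""
    return sum(a * b for a, b in zip(M[i], M[j]))


def combined_similarity(vectors, M, i, j, alpha):
    """Weighted sum of normalized link count and cosine similarity.

    alpha = 0 reduces to plain cosine; alpha = 1 uses links only.
    """
    n = len(vectors)
    return alpha * link(M, i, j) / n + (1 - alpha) * cosine(vectors[i], vectors[j])
```

In a bisecting k-means loop, a score like `combined_similarity` would take the place of plain cosine when assigning documents to the two candidate sub-clusters. The link term lets documents that share many neighbors score higher than their direct cosine similarity alone would suggest.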
