Journal: Current Journal of Applied Science and Technology

Similarity Measure Algorithm for Text Document Clustering, Using Singular Value Decomposition



Abstract

We examined a similarity measure for text document clustering. Data mining is a challenging field with many research and application areas. Text document clustering, a subset of data mining, groups and organizes a large quantity of unstructured text documents into a small number of meaningful clusters. An algorithm that calculates the degree of closeness of documents from their document matrix was used to query the terms/words in each document. We also determined whether the documents in a given set are similar to or different from one another when these terms are queried. We found that the ability to rank and approximate documents using a matrix allows Singular Value Decomposition (SVD) to be used as an enhanced text data mining algorithm. Also, applying SVD to a high-dimensional matrix yields a matrix of lower dimension, which exposes the relationships in the original matrix by ordering them from the most variation to the least.
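The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it describes: build a term-document matrix, apply SVD, keep the largest singular values, and compare documents in the reduced space. The toy corpus, the function names, the use of raw term counts, the choice of k = 2, and cosine similarity as the closeness measure are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

# Hypothetical toy corpus (not from the paper).
docs = [
    "data mining finds patterns in text data",
    "text mining clusters text documents",
    "clustering groups similar text documents",
    "matrix decomposition ranks text documents",
]

def term_document_matrix(docs):
    """Build a raw-count matrix with one row per term and one column per document."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    index = {word: i for i, word in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.split():
            A[index[word], j] += 1.0
    return A, vocab

def reduced_document_vectors(A, k):
    """Project documents onto the k largest singular directions of A."""
    # np.linalg.svd returns singular values sorted from largest to smallest.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T  # shape: (n_docs, k)
    return doc_vecs, s

def cosine_similarity_matrix(X, eps=1e-12):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms < eps] = 1.0  # leave (near-)zero vectors unscaled
    Xn = X / norms
    return Xn @ Xn.T

A, vocab = term_document_matrix(docs)
doc_vecs, singular_values = reduced_document_vectors(A, k=2)
sim = cosine_similarity_matrix(doc_vecs)

print(np.round(singular_values, 3))
print(np.round(sim, 3))
```

Entry `sim[i, j]` is the similarity between documents `i` and `j` in the reduced space; a clustering step (for example, grouping documents whose similarity exceeds a chosen threshold) could then be applied to this matrix. In practice, weighted counts such as TF-IDF are often used in place of raw counts.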
