
Decomposition of term-document matrix representation for clustering analysis


Abstract

Latent Semantic Indexing (LSI) is an information retrieval technique that uses a low-rank singular value decomposition (SVD) of the term-document matrix. The aim of the method is to reduce the dimension of the matrix by finding patterns of co-occurring terms in the document collection. The method is used to compute term-document weights in the vector space model (VSM) for document clustering with a fuzzy clustering algorithm. LSI attempts to exploit the underlying semantic structure of word usage in documents. During the query-matching phase of LSI, a user's query is first projected into the term-document space and then compared with all terms and documents represented in the vector space. Using some similarity measure, the nearest (most relevant) terms and documents are identified and returned to the user. The current LSI query-matching method requires computing the similarity between the query and every term and document in the vector space. In this paper, the Maximal Tree Algorithm is used within a recent LSI implementation to reduce the computation time and computational complexity of query matching. The Maximal Tree data structure stores the term and document vectors in such a way that only those terms and documents most likely to qualify as nearest neighbors of the query are examined and retrieved. In short, this novel algorithm is suitable for improving the accuracy of data miners.
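The exhaustive query-matching baseline described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes NumPy, a tiny made-up term-document matrix, the standard fold-in projection q_hat = q^T U_k S_k^{-1}, and cosine similarity as the similarity measure. The function names (lsi_fit, project_query, cosine_rank) are hypothetical. The Maximal Tree structure itself is not sketched, because the abstract does not specify it.

```python
import numpy as np

def lsi_fit(A, k):
    """Rank-k truncated SVD of a term-document matrix A (terms x documents)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the k largest singular triplets; rows of V_k are document vectors.
    return U[:, :k], s[:k], Vt[:k, :].T

def project_query(q, U_k, s_k):
    """Fold a term-space query vector into the k-dimensional latent space."""
    return (q @ U_k) / s_k

def cosine_rank(q_hat, V_k):
    """Compare the query with EVERY document (the exhaustive baseline)."""
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat)
    scores = (V_k @ q_hat) / np.where(norms == 0, 1.0, norms)
    return np.argsort(-scores), scores

# Toy data (assumed): 4 terms x 4 documents, two unrelated topics.
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

U_k, s_k, V_k = lsi_fit(A, k=2)
q = np.array([1.0, 0.0, 0.0, 0.0])   # query that mentions only term 0
order, scores = cosine_rank(project_query(q, U_k, s_k), V_k)
print(order[:2], np.round(scores, 3))
```

In this toy case the query contains only the first term, yet both documents of the first topic receive the same high score, which is the latent-structure effect LSI aims for. The cost of the loop-free `cosine_rank` is still linear in the number of documents, which is the overhead the abstract says the Maximal Tree is meant to reduce.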
