International Conference on Information Systems Architecture and Technology

Use of the EPSILON Decomposition and the SVD Based LSI Techniques for Reduction of the Large Indexing Structures

Abstract

Storing indexing structures in Vector Space Model (VSM) form has a number of advantages. When text documents are indexed, the indexing structure is the Term-By-Document (TBD) matrix. Its size is proportional to the product of the number of indexed documents and the number of keywords. For large text document databases, the size of the indexing structure is a serious limitation: a TBD matrix that is too large may not fit in memory, or searching for documents may take too much time. The article presents a methodology for reducing the size of a large TBD matrix. The operation performed on the TBD matrix is the Singular Value Decomposition (SVD), which transforms the original indexing structure vectors into a space with fewer dimensions. As a result of this operation, the keywords used in the indexing process are generalized. This is a desirable effect; methods that generalize keywords are called Latent Semantic Indexing (LSI) methods. Despite its undeniable advantages, the SVD has a major disadvantage: its computational complexity is O(n^3). In practice, this prevents the method from being applied to a large indexing structure. The methodology presented in the article uses the Epsilon decomposition to divide the original TBD matrix into parts before the reduction process. The proposed modification allows the SVD to be used for an indexing structure of any size.
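The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes the Epsilon decomposition takes its common form: entries of the TBD matrix with magnitude at most eps are ignored, and the connected components of the remaining term-document graph define the blocks. Each block is then reduced separately with a truncated SVD. The function names `epsilon_blocks` and `reduce_block`, the toy matrix, and the values of `eps` and `k` are all hypothetical.

```python
import numpy as np


def epsilon_blocks(tbd, eps):
    """Split a term-by-document matrix into weakly coupled blocks.

    Assumed form of the Epsilon decomposition: entries with |value| <= eps
    are ignored, and the connected components of the remaining bipartite
    term-document graph are returned as (term_rows, document_columns) pairs.
    Terms or documents left with no entry above eps are dropped.
    """
    n_terms, n_docs = tbd.shape
    parent = list(range(n_terms + n_docs))  # union-find: terms first, then documents

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    strong_rows, strong_cols = np.nonzero(np.abs(tbd) > eps)
    for t, d in zip(strong_rows.tolist(), strong_cols.tolist()):
        parent[find(t)] = find(n_terms + d)

    groups = {}
    for node in range(n_terms + n_docs):
        rows, cols = groups.setdefault(find(node), ([], []))
        if node < n_terms:
            rows.append(node)
        else:
            cols.append(node - n_terms)
    return [(rows, cols) for rows, cols in groups.values() if rows and cols]


def reduce_block(block, k):
    """Rank-k LSI reduction of one block via the SVD.

    Returns the term basis U_k and the k-dimensional document coordinates
    S_k V_k^T, so that U_k @ coords is the rank-k approximation of the block.
    """
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    k = min(k, len(s))
    return u[:, :k], s[:k, None] * vt[:k, :]


# Toy TBD matrix (rows: terms, columns: documents) with two weak couplings
# (0.05 and 0.02) between otherwise separate topic groups.
tbd = np.array([
    [2.0, 1.0, 0.0, 0.0, 0.05],
    [1.0, 3.0, 0.0, 0.0, 0.0],
    [0.0, 0.02, 1.0, 2.0, 1.0],
    [0.0, 0.0, 2.0, 1.0, 3.0],
])

blocks = epsilon_blocks(tbd, eps=0.1)

reduced = []
for rows, cols in blocks:
    block = tbd[np.ix_(rows, cols)]
    term_basis, doc_coords = reduce_block(block, k=1)
    reduced.append((rows, cols, term_basis, doc_coords))
```

In this sketch the SVD cost depends on the size of the largest block rather than on the full matrix, which is the motivation the abstract gives for splitting before reduction. The weak couplings below eps are discarded, so the choice of eps trades approximation quality against block size.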