
INFORMATION RETRIEVAL AND TEXT MINING USING DISTRIBUTED LATENT SEMANTIC INDEXING


Abstract

The use of latent semantic indexing (LSI) for information retrieval and text mining operations is adapted to work on large heterogeneous data sets by first partitioning the data set into a number of smaller partitions having similar concept domains. A similarity graph network is generated in order to expose links between concept domains, which are then exploited in determining which domains to query as well as in expanding the query vector. LSI is performed on those partitioned data sets most likely to contain information related to the user query or text mining operation. In this manner, LSI can be applied to data sets that heretofore presented scalability problems. Additionally, the singular value decomposition of the term-by-document matrix can be computed on various distributed computers, increasing the robustness of the retrieval and text mining system while decreasing search times.
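The abstract outlines a pipeline: partitioned concept domains, a similarity graph linking them, graph-guided selection of domains and query expansion, and a separate SVD per partition. The Python sketch below shows one way those steps could fit together under stated assumptions; it is not the patented method. The function names (`truncated_svd`, `lsi_rank`, `similarity_graph`, `search`), the use of centroid cosine similarity to build the graph, the `threshold` and `alpha` parameters, the additive query expansion, and the use of threads in place of remote machines are all hypothetical choices made for illustration. The SVD is computed with power iteration plus deflation purely to keep the example dependency-free.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of partitioned LSI; not the patented implementation.
# Assumed: partitions already exist, each a dense term-by-document matrix
# (one row per term of a shared vocabulary, one column per document).


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def truncated_svd(a, k, iters=500):
    """Rank-k SVD A ~ U S V^T via power iteration with deflation on A^T A."""
    docs = [list(col) for col in zip(*a)]  # document (column) vectors
    n = len(docs)
    g = [[dot(docs[i], docs[j]) for j in range(n)] for i in range(n)]
    us, sigmas, vs = [], [], []
    for _ in range(min(k, n)):
        v = [1.0 + 0.1 * i for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [dot(row, v) for row in g]
            nw = math.sqrt(dot(w, w))
            if nw == 0.0:
                break
            v = [x / nw for x in w]
        lam = dot(v, [dot(row, v) for row in g])  # eigenvalue of A^T A = sigma^2
        if lam <= 1e-12:
            break  # no meaningful rank left
        sigma = math.sqrt(lam)
        us.append([dot(row, v) / sigma for row in a])  # u = A v / sigma
        sigmas.append(sigma)
        vs.append(v)
        # Deflate so the next pass finds the next-largest singular value.
        g = [[g[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return us, sigmas, vs


def lsi_rank(a, query, k):
    """Fold a query into one partition's k-dim LSI space; score its documents."""
    us, sigmas, vs = truncated_svd(a, k)
    q_hat = [dot(u, query) / s for u, s in zip(us, sigmas)]  # q^T U_k S_k^-1
    return [cosine(q_hat, [v[j] for v in vs]) for j in range(len(a[0]))]


def centroid(a):
    return [sum(row) / len(row) for row in a]


def similarity_graph(parts, threshold):
    """Link concept domains whose centroid vectors are similar enough."""
    cents = [centroid(p) for p in parts]
    edges = {i: set() for i in range(len(parts))}
    for i in range(len(parts)):
        for j in range(i + 1, len(parts)):
            if cosine(cents[i], cents[j]) >= threshold:
                edges[i].add(j)
                edges[j].add(i)
    return cents, edges


def search(parts, query, k=2, top_parts=1, threshold=0.3, alpha=0.5):
    cents, edges = similarity_graph(parts, threshold)
    ranked = sorted(range(len(parts)), key=lambda i: cosine(query, cents[i]), reverse=True)
    best = ranked[:top_parts]
    neighbours = set().union(*(edges[i] for i in best)) - set(best)
    # Hypothetical expansion: add weighted centroids of linked domains.
    expanded = list(query)
    for j in neighbours:
        expanded = [x + alpha * c for x, c in zip(expanded, cents[j])]
    chosen = sorted(set(best) | neighbours)
    # Per-partition SVDs are independent; threads stand in for remote nodes.
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda p: lsi_rank(parts[p], expanded, k), chosen))
    hits = [(s, p, d) for p, ds in zip(chosen, scores) for d, s in enumerate(ds)]
    return sorted(hits, reverse=True)  # (score, partition, document)


if __name__ == "__main__":
    # Toy vocabulary: car, engine, wheel, apple, fruit, juice
    vehicles = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
    fruit = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [1, 1, 0], [1, 0, 0], [0, 1, 1]]
    repairs = [[0, 0], [1, 1], [1, 0], [0, 0], [0, 0], [0, 0]]
    for hit in search([vehicles, fruit, repairs], [1, 0, 0, 0, 0, 0]):
        print(hit)
```

In the toy run, the query "car" is routed to the vehicle partition, and the similarity graph also pulls in the linked repair partition. The unrelated fruit partition is never decomposed, which is the scalability gain the abstract describes. A real deployment would more likely use a library SVD routine and run each partition's decomposition on a separate machine.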

Bibliographic data

  • Publication number: EP1618467A2

  • Publication date: 2006-01-25

  • Applicant/Assignee: TELCORDIA TECHNOLOGIES INC.

  • Application number: EP20040750497

  • Inventors: BEHRENS CLIFFORD A.; BASSU DEVASIS

  • Filing date: 2004-04-23

  • Classification: G06F7/00

  • Country: EP
