Techniques for improved LSI text retrieval.

Abstract

This work identifies and studies four major issues in LSI (Latent Semantic Indexing) text retrieval: the multiplicity of standard query methods, alternative non-standard query methods, the issue of Generic Terms, and the lack of Structural Data.

First, three commonly used standard query methods (versions A, B and B') are identified, compared, analyzed, and tested. Both mathematical analysis and experimental results show that version B is a better choice than version A, and that versions B and B' are essentially equivalent provided that the Equivalency Principle is satisfied. This finding should eliminate the confusion and arbitrariness that arise when LSI researchers apply possibly incompatible query methods, and help restore the comparability of their work.

Second, several novel non-standard query methods based on a newly discovered technique, singular value rescaling (SVR), are proposed and studied. Test results from both prototype experimental environments and the standardized TREC data sets confirm the effectiveness of SVR. The practical significance of this finding is that current information retrieval techniques may be significantly improved simply by adopting a novel query method that is computationally as efficient as the best standard query method.

Third, this work studies the effects on LSI models of Generic Terms, a minority group of terms that are distributed relatively uniformly across all document topics. Generic Terms are characterized and defined, and an iterative algorithm is designed and implemented to identify them. Experimental results strongly suggest that identifying and excluding Generic Terms helps improve LSI text retrieval performance.

Fourth, this work studies how to integrate Structural Data (loosely defined as sentence structure) into LSI models. Four major characteristics of Structural Data are identified: derivativity, maneuverability, language dependency, and updatability/downdatability. The qualifications of two candidate forms of Structural Data, word order and non-word-order syntax (both for English), are carefully studied. A complete series of procedures is developed to fully integrate Structural Data, in its most qualified form of word order data, into LSI models. Experimental results strongly suggest that acquiring and integrating Structural Data helps improve LSI text retrieval performance.
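The abstract does not define versions A, B and B' or the exact form of SVR, so the following is only a minimal sketch of a generic LSI query under stated assumptions. It uses a truncated SVD of a term-by-document matrix, projects the query onto the latent axes, and ranks documents by cosine similarity. The function name `lsi_scores`, the toy matrix, and the exponent `alpha` applied to the singular values are all hypothetical illustrations of what "rescaling singular values" could look like, not the author's method.

```python
import numpy as np


def lsi_scores(A, query, k, alpha=1.0):
    """Cosine scores of a query against documents in a rank-k LSI space.

    A     : term-by-document matrix, shape (n_terms, n_docs)
    query : term-frequency vector, shape (n_terms,)
    k     : number of retained singular triplets
    alpha : hypothetical rescaling exponent (an assumption, not taken
            from the source); singular values s become s**alpha.
            alpha=1.0 gives plain truncated-SVD scaling.
    """
    A = np.asarray(A, dtype=float)
    query = np.asarray(query, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T
    q_hat = U_k.T @ query        # query projected onto the latent axes
    docs = V_k * (s_k ** alpha)  # one row of latent coordinates per document
    denom = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    return docs @ q_hat / np.maximum(denom, 1e-12)  # guard against zero norms


# Toy corpus (rows: car, auto, engine, flower, petal; columns: 4 documents).
A = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
])
query = np.array([1, 0, 0, 0, 0])  # the single term "car"

scores = lsi_scores(A, query, k=2, alpha=0.5)
ranking = np.argsort(-scores)      # document indices, best match first
print(scores.round(3), ranking)
```

In this toy corpus the second document contains "auto" and "engine" but not "car", yet it scores about as high as the first document because both lie on the same latent axis. With only two latent axes and one topic per axis, `alpha` does not change the ranking here; it only matters when documents mix several retained axes.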


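Bibliographic record

Author: Yan, Hua
Author affiliation: Wayne State University
Degree-granting institution: Wayne State University
Discipline: Computer Science
Degree: Ph.D.
Year: 2006
Page: p.1535
Total pages: 220
Original format: PDF
Language of text: English (eng)
Chinese Library Classification: Automation technology, computer technology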

