A term co-occurrence based framework for understanding LSI: Theory and practice.


Abstract

Automatic methods for searching textual collections have been developed since the early 1960s, but a global solution to the problem remains elusive. Latent Semantic Indexing (LSI) is a well-known information retrieval algorithm. LSI is based on a linear algebraic technique, Singular Value Decomposition (SVD).

The primary goal of this dissertation is the development of a theoretical framework for understanding LSI. In particular, we study the values produced by the SVD process and determine their impact on LSI performance. We use two approaches to this analysis of values, and develop two practical applications based on our improved knowledge of the relationship between the values in the truncated matrices and the performance of LSI.

The focus in the first part of this dissertation is the development of a theoretical framework for understanding LSI. Our framework is based on the concept of term co-occurrences, and we prove that LSI encapsulates term co-occurrence information. We also show a strong correlation between the retrieval quality of LSI and the distribution of the term co-occurrence weights.

In the second part of this document, we extend our study of the values produced by SVD by implementing several practical applications. First, we determine the most critical values of the LSI matrices by reducing the density of the matrices by up to 70% without impacting retrieval quality. This reduction results in a 55% decrease in memory requirements at query run time. We also develop a term clustering algorithm that is based on the LSI term matrix. This algorithm is shown to develop effective clusters for use in an emerging trend detection application. Our emerging trend detection system achieved an F-measure of 0.81–0.89 (β = 1) on several collections.
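The ideas summarized in the abstract can be illustrated with a minimal sketch. This is not the dissertation's code: the toy term-document matrix, the function names `truncated_svd`, `term_term` and `sparsify`, and the magnitude-based thresholding rule are all assumptions made for illustration. The sketch shows two things: a rank-reduced SVD can assign a nonzero association to two terms that never appear in the same document but share a neighbor (a second-order co-occurrence), and small-magnitude entries of a truncated matrix can be zeroed to reduce its density.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# "car" and "automobile" never share a document, but both co-occur with "engine".
terms = ["car", "automobile", "engine", "flower"]
A = np.array([
    [1, 0, 0],  # car:        doc 0
    [0, 1, 0],  # automobile: doc 1
    [1, 1, 0],  # engine:     docs 0 and 1
    [0, 0, 1],  # flower:     doc 2
], dtype=float)


def truncated_svd(A, k):
    """Return the rank-k factors (U_k, s_k, Vt_k) of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def term_term(U_k, s_k):
    """Term-term association matrix in the reduced space: (U_k S_k)(U_k S_k)^T."""
    T = U_k * s_k  # scale each retained dimension by its singular value
    return T @ T.T


def sparsify(M, fraction):
    """Zero roughly `fraction` of the entries of M, smallest magnitudes first.

    Hypothetical rule for illustration only; ties at the threshold are all zeroed.
    """
    flat = np.abs(M).ravel()
    n_zero = int(fraction * flat.size)
    if n_zero == 0:
        return M.copy()
    threshold = np.partition(flat, n_zero - 1)[n_zero - 1]
    out = M.copy()
    out[np.abs(out) <= threshold] = 0.0
    return out


raw = A @ A.T                      # first-order co-occurrence counts
U_k, s_k, Vt_k = truncated_svd(A, k=1)
lsi = term_term(U_k, s_k)          # associations after rank reduction

i, j = terms.index("car"), terms.index("automobile")
print("raw car/automobile:", raw[i, j])
print("LSI car/automobile:", round(float(lsi[i, j]), 3))
```

In this toy case the raw co-occurrence between "car" and "automobile" is zero, while the rank-1 association is positive because both terms co-occur with "engine". The unrelated term "flower" stays at zero association with both.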


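Bibliographic record

Author: Kontostathis, April
Author affiliation: Lehigh University
Degree-granting institution: Lehigh University
Subject: Computer Science
Degree: Ph.D.
Year: 2003
Pages: 110 p.
Total pages: 110
Original format: PDF
Language of text: English
Chinese Library Classification: Automation technology, computer technology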

