Topic Modeling for Mediated Access to Very Large Document Collections

Gheorghe Muresan; David J. Harper

Abstract

Clear and precise queries are a necessity when searching very large document collections, especially when query-based retrieval is the only means of exploration. We propose system-mediated information access as a solution to users' well-documented inability to formulate good queries. Our approach is based on two main assumptions: first, on the ability of document clustering to reveal the topical, semantic structure of a problem domain represented by a specialized "source collection," and, second, on the capacity of statistical language models to convey content. Taking the role of the human mediator or intermediary searcher, a mediation system interacts with the user and supports her exploration of a relatively small source collection, chosen to be representative of the problem domain. Based on the user's selection of relevant "exemplary" documents and clusters from this source collection, the system builds a language model of her information need. This model is subsequently used to derive "mediated queries," which are expected to convey precisely and comprehensively the user's information need, and can be submitted by the user to search any large and heterogeneous "target collections." We present results of experiments that simulated various mediation strategies and compared the effect on mediation effectiveness of a variety of parameters, such as the similarity measure, the weighting scheme, and the clustering method. They provide both upper bounds on the performance that can potentially be reached by real end users and a comparison between the effectiveness of these strategies. The experimental evidence suggests that information retrieval mediated through a clustered specialized collection has the potential to improve effectiveness significantly.
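
The step from exemplary documents to a mediated query can be illustrated with a minimal sketch. The abstract does not say how the language model is estimated or how query terms are chosen, so everything below is an assumption made for illustration: a maximum-likelihood unigram model, a term score equal to each term's contribution to the KL divergence between the need model and the source-collection model, and the hypothetical function names `tokenize`, `unigram_model`, and `mediated_query`. It is not the authors' method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs; a deliberately crude tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def unigram_model(docs):
    """Maximum-likelihood unigram model: term -> relative frequency."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def mediated_query(exemplary_docs, source_collection, k=5):
    """Return up to k terms that best separate the need from the background.

    Assumption (not from the abstract): each term is scored by its
    contribution to KL(need || background), p * log(p / q).
    """
    need = unigram_model(exemplary_docs)
    background = unigram_model(source_collection)
    floor = 1e-9  # guards against terms missing from the background model
    scores = {
        term: p * math.log(p / background.get(term, floor))
        for term, p in need.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, score in ranked[:k] if score > 0]


if __name__ == "__main__":
    source = [
        "cats chase mice",
        "dogs chase cats",
        "stock markets fell",
        "bond markets rose",
    ]
    # The user marks two documents of the source collection as exemplary.
    exemplary = source[2:]
    print(mediated_query(exemplary, source, k=3))
```

In this sketch the exemplary documents are taken from the source collection, as in the abstract, so every need term also has a background probability; the resulting term list is what would be submitted to a target collection.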


Bibliographic details

Source
Journal of the American Society for Information Science and Technology, 2004, Issue 10, pp. 892-910 (19 pages)
Authors
Gheorghe Muresan; David J. Harper
Author affiliation

Department of Library and Information Science, Rutgers University, New Brunswick, NJ 08901

Original format: PDF
Text language: English
Chinese Library Classification: Science, scientific research

