Journal of the American Society for Information Science and Technology

Topic Modeling for Mediated Access to Very Large Document Collections

Abstract

Clear and precise queries are essential when searching very large document collections, especially when query-based retrieval is the only means of exploration. We propose system-mediated information access as a solution to users' well-documented inability to formulate good queries. Our approach rests on two main assumptions: first, that document clustering can reveal the topical, semantic structure of a problem domain represented by a specialized "source collection"; and second, that statistical language models can convey content. Taking the role of a human mediator or intermediary searcher, a mediation system interacts with the user and supports her exploration of a relatively small source collection chosen to be representative of the problem domain. From the "exemplary" documents and clusters that the user selects as relevant in this source collection, the system builds a language model of her information need. This model is then used to derive "mediated queries," which are expected to convey the user's information need precisely and comprehensively, and which the user can submit to search any large, heterogeneous "target collection." We present results of experiments that simulated various mediation strategies and compared how a variety of parameters, such as the similarity measure, the weighting scheme, and the clustering method, affect mediation effectiveness. The results provide both upper bounds on the performance that real end users could potentially reach and a comparison of the effectiveness of these strategies. The experimental evidence suggests that information retrieval mediated through a clustered, specialized collection has the potential to improve effectiveness significantly.
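The mediation pipeline described above (cluster a small source collection, let the user pick exemplary documents or clusters, build a language model of the need, derive a mediated query) can be illustrated with a minimal sketch. Every concrete choice here is an assumption made for illustration, not the authors' method: single-pass clustering with cosine similarity stands in for whichever clustering method and similarity measure they evaluated, and a maximum-likelihood unigram model with Jelinek-Mercer smoothing, ranked by each term's pointwise KL-divergence contribution against the source collection, stands in for their weighting scheme. All function names (`tokenize`, `cosine`, `single_pass_clusters`, `unigram_model`, `mediated_query`) and parameter values (`threshold`, `k`, `lam`) are hypothetical.

```python
# Hypothetical sketch, not the authors' implementation: the clustering method,
# similarity measure, smoothing, and term weighting are illustrative assumptions.
import math
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return [t for t in tokens if t]

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def single_pass_clusters(docs, threshold=0.2):
    """Single-pass (leader) clustering of tokenized documents.

    Each document joins the most similar existing cluster if its cosine
    similarity to that cluster's centroid is at least `threshold`;
    otherwise it starts a new cluster. Returns lists of document indices.
    """
    clusters, centroids = [], []
    for i, doc in enumerate(docs):
        vec = Counter(doc)
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            clusters[best].append(i)
            # Summed counts are proportional to the mean vector; cosine is scale-invariant.
            centroids[best].update(vec)
        else:
            clusters.append([i])
            centroids.append(Counter(vec))
    return clusters

def unigram_model(docs):
    """Maximum-likelihood unigram language model over tokenized documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()}

def mediated_query(selected_docs, source_docs, k=5, lam=0.5):
    """Return the k terms that best distinguish the user's selection.

    The information-need model is the selection's unigram model, smoothed
    (Jelinek-Mercer, weight `lam`) with the source-collection background
    model. Terms are ranked by their pointwise contribution to the KL
    divergence from the background. Assumes selected_docs come from
    source_docs, so every selected term has nonzero background probability.
    """
    need = unigram_model(selected_docs)
    background = unigram_model(source_docs)
    scores = {}
    for term, p in need.items():
        p_smoothed = lam * p + (1 - lam) * background[term]
        scores[term] = p_smoothed * math.log(p_smoothed / background[term])
    # Descending score; ties broken alphabetically for deterministic output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]

# Usage on a toy source collection (hypothetical data).
source = [
    "solar panels convert sunlight into electricity",
    "wind turbines generate electricity from wind",
    "solar cells photovoltaic panels efficiency",
    "football match results league tables",
    "league football transfer news",
]
docs = [tokenize(d) for d in source]
clusters = single_pass_clusters(docs)
print(clusters)  # [[0, 2], [1], [3, 4]]

# The user marks the first cluster (solar energy) as exemplary.
selected = [docs[i] for i in clusters[0]]
print(mediated_query(selected, docs, k=3))  # ['panels', 'solar', 'cells']
```

In this toy run, "electricity" ranks below the solar-specific terms because it also occurs in the wind-turbine document, so it discriminates the selection less well from the rest of the source collection. The resulting term list is what would be submitted to a large target collection; weighting, expanding, or formatting it for a particular search engine is outside this sketch.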