Conference: ACM International Conference on Information and Knowledge Management

TEXplorer: Keyword-based Object Search and Exploration in Multidimensional Text Databases

Abstract

We propose TEXplorer, a novel system that integrates keyword-based object ranking with the aggregation and exploration power of OLAP in a text database with rich structured attributes, e.g., a product review database. TEXplorer can be implemented within a multidimensional text database, where each row is associated with structural dimensions (attributes) and text data (e.g., a document). The system uses the text cube data model, in which a cell aggregates the set of documents with matching values in a subset of dimensions. Cells in a text cube capture different levels of summarization of the documents and can represent objects at different conceptual levels. Users query the system by submitting a set of keywords. Instead of returning a ranked list of all the cells, we propose a keyword-based interactive exploration framework that offers flexible OLAP navigational guides and helps users identify the levels and objects they are interested in. We propose a novel significance measure for dimensions based on the distribution of the IR relevance of cells. During each interaction stage, dimensions are ranked by their significance scores to guide drilling down, and cells in the same cuboid are ranked by their relevance to guide exploration. We propose efficient algorithms and materialization strategies for ranking the top-k dimensions and cells. Finally, extensive experiments on real datasets demonstrate the efficiency and effectiveness of our approach.