Conference: ACM International Conference on Information and Knowledge Management

TEXplorer: Keyword-based Object Search and Exploration in Multidimensional Text Databases


Abstract

We propose TEXplorer, a novel system that integrates keyword-based object ranking with the aggregation and exploration power of OLAP in a text database with rich structured attributes, e.g., a product review database. TEXplorer can be implemented within a multidimensional text database, where each row is associated with structural dimensions (attributes) and text data (e.g., a document). The system uses the text cube data model, in which a cell aggregates a set of documents with matching values in a subset of dimensions. Cells in a text cube capture different levels of summarization of the documents and can represent objects at different conceptual levels. Users query the system by submitting a set of keywords. Instead of returning a ranked list of all the cells, we propose a keyword-based interactive exploration framework that offers flexible OLAP navigational guides and helps users identify the levels and objects they are interested in. We propose a novel significance measure of dimensions based on the distribution of IR relevance of cells. During each interaction stage, dimensions are ranked by their significance scores to guide drilling down, and cells in the same cuboid are ranked by their relevance to guide exploration. We propose efficient algorithms and materialization strategies for ranking top-k dimensions and cells. Finally, extensive experiments on real datasets demonstrate the efficiency and effectiveness of our approach.
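
The text cube idea in the abstract can be illustrated with a small sketch. The abstract does not define the relevance function or the significance measure, so both are stand-ins here: relevance is assumed to be the average keyword count per document in a cell, and significance is assumed to be the variance of cell relevance within a one-dimension cuboid. The toy rows and the function names (`cuboid`, `relevance`, `rank_cells`, `rank_dimensions`) are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import pvariance

# Toy multidimensional text database: structured dimensions plus a text field.
# The data is invented purely for illustration.
ROWS = [
    {"brand": "Acme", "type": "laptop", "text": "long battery life and light weight"},
    {"brand": "Acme", "type": "phone", "text": "battery drains fast"},
    {"brand": "Bolt", "type": "laptop", "text": "bright screen but heavy"},
    {"brand": "Bolt", "type": "phone", "text": "great camera and screen"},
]


def cuboid(rows, dims):
    """Group documents into cells keyed by their values on the chosen dimensions."""
    cells = defaultdict(list)
    for row in rows:
        cells[tuple(row[d] for d in dims)].append(row["text"])
    return dict(cells)


def relevance(docs, keywords):
    """Assumed stand-in: average number of keyword occurrences per document."""
    hits = sum(doc.split().count(k) for doc in docs for k in keywords)
    return hits / len(docs)


def rank_cells(rows, dims, keywords):
    """Rank the cells of one cuboid by relevance, highest first."""
    cells = cuboid(rows, dims)
    scored = [(cell, relevance(docs, keywords)) for cell, docs in cells.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def rank_dimensions(rows, candidate_dims, keywords):
    """Assumed stand-in for significance: variance of cell relevance per dimension."""
    scored = []
    for d in candidate_dims:
        scores = [score for _, score in rank_cells(rows, (d,), keywords)]
        scored.append((d, pvariance(scores)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(rank_dimensions(ROWS, ["brand", "type"], ["battery"]))
    print(rank_cells(ROWS, ("brand",), ["battery"]))
```

In this toy data, the keyword "battery" separates the brands but not the product types, so under the assumed variance-based measure `brand` would be suggested as the dimension to drill down on first. The paper's actual measure and its top-k algorithms and materialization strategies may differ substantially from this sketch.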
