Journal: Data Science and Engineering

Efficient Indexing of Top-k Entities in Systems of Engagement with Extensions for Geo-tagged Entities



Abstract

Next-generation enterprise management systems are beginning to be developed based on the Systems of Engagement (SOE) model. We view an SOE as a set of entities. Each entity is modeled by a single parent document with dynamic embedded links (i.e., child documents) that contain multi-modal information about the entity from various networks. Since entities in an SOE are generally queried using keywords, our goal is to efficiently retrieve the top-k entities related to a given keyword-based query by considering the relevance scores of both their parent and child documents. Furthermore, we extend this problem to the case where the entities are geo-tagged. The main contributions of this work are four-fold. First, it proposes an efficient bitmap-based approach for quickly identifying the candidate set of entities, i.e., those whose parent documents contain all queried keywords. A variant of this approach is also proposed to reduce memory consumption by exploiting skew in keyword popularity. Second, it proposes the two-tier HI-tree index, which uses both hashing and inverted indexes, for efficient lookups of document relevance scores. Third, it proposes an R-tree-based approach that extends the aforementioned approaches to the case where the entities are geo-tagged. Fourth, it reports comprehensive experiments with both real and synthetic datasets, which demonstrate that the proposed schemes provide good top-k result recall within acceptable query response times.
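
The bitmap-based candidate identification and the combined parent/child scoring described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the index layout or the scoring function, so the function names (`build_keyword_bitmaps`, `candidate_entities`, `top_k`), the use of Python integers as bitmaps, and the weighted-sum score with the hypothetical parameter `alpha` are all assumptions made for illustration. The HI-tree and R-tree components are not modeled here.

```python
import heapq
from collections import defaultdict


def build_keyword_bitmaps(parent_docs):
    """Map each keyword to an integer bitmap.

    Bit i of a keyword's bitmap is set when the parent document of
    entity i contains that keyword. (Illustrative layout, assumed.)
    """
    bitmaps = defaultdict(int)
    for entity_id, text in enumerate(parent_docs):
        for word in set(text.lower().split()):
            bitmaps[word] |= 1 << entity_id
    return bitmaps


def candidate_entities(bitmaps, query_keywords):
    """Return ids of entities whose parent document has ALL query keywords."""
    if not query_keywords:
        return []
    result = -1  # all bits set; narrowed by each AND below
    for kw in query_keywords:
        result &= bitmaps.get(kw.lower(), 0)
    ids = []
    entity_id = 0
    while result:
        if result & 1:
            ids.append(entity_id)
        result >>= 1
        entity_id += 1
    return ids


def top_k(candidates, parent_score, child_scores, k, alpha=0.5):
    """Rank candidates by an ASSUMED weighted sum of parent and child scores.

    parent_score: dict entity_id -> relevance of the parent document
    child_scores: dict entity_id -> list of child-document relevances
    alpha: hypothetical weight of the parent score (not from the paper)
    """
    scored = []
    for e in candidates:
        total = alpha * parent_score[e] + (1 - alpha) * sum(child_scores.get(e, []))
        scored.append((total, e))
    return heapq.nlargest(k, scored)  # list of (score, entity_id), best first


parent_docs = [
    "acme cloud storage",
    "acme payroll software",
    "globex cloud payroll",
]
bitmaps = build_keyword_bitmaps(parent_docs)
print(candidate_entities(bitmaps, ["acme", "cloud"]))  # [0]
print(candidate_entities(bitmaps, ["payroll"]))        # [1, 2]
```

Because the candidate step is a bitwise AND over one bitmap per query keyword, its cost grows with the number of keywords rather than with the length of posting lists; the memory-saving variant mentioned in the abstract, which exploits keyword popularity skew, is not represented in this sketch.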
