
TableRank: A Ranking Algorithm for Table Search and Retrieval



Abstract

Tables are ubiquitous in web pages and scientific documents. With the explosive growth of the web, tables have become a valuable information repository, so searching them effectively and efficiently has become a challenge. Existing search engines do not provide satisfactory results, largely because current ranking schemes are inadequate for table search and because automatic table understanding and extraction are difficult in general. In this work, we design and evaluate a novel table ranking algorithm, TableRank, to improve the performance of our table search engine, TableSeer. Given a keyword-based table query, TableRank enables TableSeer to return the most relevant tables by tailoring the classic vector space model. TableRank adopts an innovative term weighting scheme that aggregates multiple weighting factors from three levels: term, table, and document. The experimental results show that our table search engine outperforms existing search engines on table search. In addition, incorporating multiple weighting factors can significantly improve the ranking results.
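The abstract does not give TableRank's actual weighting formulas, so the following is only a minimal sketch of the general idea: a vector space ranking in which a term-level score is combined with table-level and document-level factors. The function name `rank_tables`, the fields `table_factor` and `doc_factor`, the smoothed TF-IDF weighting, and the choice to multiply the three levels together are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


def rank_tables(query, tables):
    """Rank tables for a keyword query.

    `tables` is a list of dicts with hypothetical keys:
      'id'           - table identifier
      'text'         - table text (caption, headers, cells) as one string
      'table_factor' - assumed table-level weight (e.g. table quality)
      'doc_factor'   - assumed document-level weight (e.g. document importance)
    Returns a list of (id, score) pairs sorted by descending score.
    """
    n = len(tables)
    term_counts = [Counter(tokenize(t["text"])) for t in tables]

    # Number of tables containing each term.
    df = Counter()
    for counts in term_counts:
        df.update(counts.keys())

    def idf(term):
        # Smoothed inverse frequency; defined even for unseen terms.
        return math.log((1 + n) / (1 + df[term])) + 1.0

    # Term-level query vector.
    q_vec = {term: tf * idf(term) for term, tf in Counter(tokenize(query)).items()}
    q_norm = math.sqrt(sum(w * w for w in q_vec.values()))

    scored = []
    for table, counts in zip(tables, term_counts):
        # Term-level table vector.
        t_vec = {term: tf * idf(term) for term, tf in counts.items()}
        t_norm = math.sqrt(sum(w * w for w in t_vec.values()))
        if q_norm == 0.0 or t_norm == 0.0:
            cosine = 0.0
        else:
            dot = sum(w * t_vec.get(term, 0.0) for term, w in q_vec.items())
            cosine = dot / (q_norm * t_norm)
        # Assumed aggregation: term-level similarity scaled by the
        # table-level and document-level factors.
        score = cosine * table["table_factor"] * table["doc_factor"]
        scored.append((table["id"], score))

    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The table-level and document-level factors are applied after the cosine similarity is computed. If they were multiplied into every term weight of a table vector instead, cosine normalization would cancel them and they would have no effect on the ranking.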
