Journal: Future Generation Computer Systems

Benchmarking top-k keyword and top-k document processing with T²K² and T²K²D²


Abstract

Top-k keyword and top-k document extraction are very popular text analysis techniques. Top-k keywords and documents are often computed on-the-fly, but they exploit weighted vocabularies that are costly to build. To compare competing weighting schemes and database implementations, benchmarking is customary. To the best of our knowledge, no benchmark currently addresses these problems. Hence, in this paper, we present T²K², a top-k keywords and documents benchmark, and its decision support-oriented evolution T²K²D². Both benchmarks feature a real tweet dataset and queries with various complexities and selectivities. They help evaluate weighting schemes and database implementations in terms of computing performance. To illustrate our benchmarks’ relevance and genericity, we successfully ran performance tests on the TF-IDF and Okapi BM25 weighting schemes, on one hand, and on different relational (Oracle, PostgreSQL) and document-oriented (MongoDB) database implementations, on the other hand.
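
As a rough illustration of the kind of computation the abstract describes, the sketch below builds a weighted vocabulary with textbook forms of TF-IDF and Okapi BM25, then extracts top-k keywords and top-k documents from it. It is not the benchmark's implementation: the abstract does not give the exact formula variants, the queries, or the database schemas, so the function names, the parameter defaults (`k1=1.2`, `b=0.75`), the smoothed BM25 IDF, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def tf_idf_weights(docs):
    """One {term: weight} dict per document, using tf * log(N / df) (textbook form)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: f * math.log(n / df[t]) for t, f in Counter(doc).items()}
        for doc in docs
    ]

def bm25_weights(docs, k1=1.2, b=0.75):
    """One {term: weight} dict per document, using a common Okapi BM25 form.

    The IDF uses the smoothed variant log(1 + (N - df + 0.5) / (df + 0.5)),
    which is an assumption here; it keeps every weight positive.
    """
    n = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        norm = k1 * (1 - b + b * len(doc) / avgdl)  # document-length normalisation
        weights.append({
            t: math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5)) * f * (k1 + 1) / (f + norm)
            for t, f in Counter(doc).items()
        })
    return weights

def top_k_keywords(weights, k):
    """Sum each term's weight over all documents and return the k heaviest terms."""
    totals = defaultdict(float)
    for doc_weights in weights:
        for term, w in doc_weights.items():
            totals[term] += w
    # Sort by descending weight, then alphabetically so ties are deterministic.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

def top_k_documents(weights, query_terms, k):
    """Score each document by the summed weights of the query terms; return k indices."""
    scores = [
        (sum(doc_weights.get(t, 0.0) for t in query_terms), i)
        for i, doc_weights in enumerate(weights)
    ]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))[:k]]

# Hypothetical toy corpus of tokenised "tweets".
docs = [["a", "b"], ["a", "c", "c"], ["a", "d"]]
tfidf = tf_idf_weights(docs)
bm25 = bm25_weights(docs)
```

Calling `top_k_keywords(tfidf, 2)` or `top_k_documents(bm25, ["c"], 1)` on this corpus shows why the vocabulary is the costly part: the weights require a pass over the whole collection, while the top-k step is a cheap aggregation over precomputed weights.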
