
Back to Our Roots for Retrieving Very Short Passages


Abstract

This article tackles the task of retrieving very short documents via even shorter queries. The problem at hand may relate to the retrieval of tweets, image and table captions, short text messages (SMS), and sponsored retrieval, among others. In such cases, document and/or query expansion using thesauri and other external resources (e.g., Wikipedia) usually available on the World Wide Web (WWW) have proven to be effective approaches. However, the focus of this paper is on documents written in lesser-known languages for which the WWW is of limited use. Our experiments are based on two main corpora extracted from historical manuscripts written in Latin and Middle High German. We found that, when retrieving very short documents of quite similar lengths via short queries, and given that no external enrichment resources are available, the classical tf-idf model performs as satisfactorily as the more complex models do, if not sometimes better.
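To make the classical tf-idf baseline mentioned in the abstract concrete, the following is a minimal sketch of tf-idf ranking with cosine similarity over very short passages. It is not the authors' implementation: the whitespace tokenizer, the smoothed idf formula, the function names, and the three toy passages are all assumptions chosen for illustration, and the passages are invented strings rather than excerpts from the paper's corpora.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, with no stemming or
    # external resources, in keeping with the no-enrichment setting.
    return text.lower().split()


def tfidf_vector(tokens, idf):
    # Raw term frequency times idf; terms unseen in the collection are dropped.
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def build_index(docs):
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per doc
    # Assumption: a smoothed idf variant, log((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [tfidf_vector(toks, idf) for toks in tokenized]
    return idf, vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    # Returns (score, doc_index) pairs, best match first; ties keep doc order.
    idf, vectors = build_index(docs)
    q = tfidf_vector(tokenize(query), idf)
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Hypothetical toy passages, invented for this sketch.
toy_docs = [
    "in nomine domini amen",
    "der kunic sprach zuo dem ritter",
    "domini nostri gratia",
]

for score, idx in rank("domini gratia", toy_docs):
    print(f"{score:.3f}  {toy_docs[idx]}")
```

Because the passages are of similar length, cosine normalization changes the ranking very little here; the idf weights do most of the work, which is one plausible reading of why a plain tf-idf model can remain competitive in this setting.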
