Published in: ACM SIGMOD International Conference on Management of Data

Automatic Discovery of Language Models for Text Databases

Abstract

The proliferation of text databases within large organizations and on the Internet makes it difficult for a person to know which databases to search. Given language models that describe the contents of each database, a database selection algorithm such as GlOSS can provide assistance by automatically selecting appropriate databases for an information need. Current practice is that each database provides its language model upon request, but this cooperative approach has important limitations. This paper demonstrates that cooperation is not required. Instead, the database selection service can construct its own language models by sampling database contents via the normal process of running queries and retrieving documents. Although random sampling is not possible, it can be approximated with carefully selected queries. This sampling approach avoids the limitations that characterize the cooperative approach, and also enables additional capabilities. Experimental results demonstrate that accurate language models can be learned from a relatively small number of queries and documents.
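
The sampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not say how queries are chosen or when sampling stops, so the choices here (a single seed term, later query terms drawn at random from the vocabulary learned so far, a fixed document budget) are assumptions made for illustration. `ToyDatabase`, `tokenize`, and `sample_language_model` are hypothetical names, and the in-memory database stands in for a remote one that exposes only a search interface.

```python
import random
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; keeps purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


class ToyDatabase:
    """Hypothetical stand-in for a text database reachable only via queries."""

    def __init__(self, docs):
        self.docs = docs

    def search(self, term, limit):
        # Return ids of up to `limit` documents containing the query term.
        hits = [i for i, d in enumerate(self.docs) if term in tokenize(d)]
        return hits[:limit]

    def fetch(self, doc_id):
        return self.docs[doc_id]


def sample_language_model(db, seed_term, max_docs, docs_per_query=2,
                          max_queries=50, rng=None):
    """Build a term-frequency model from documents retrieved by queries.

    Assumed strategy: start from `seed_term`, then pick each later query
    term at random from the terms already seen in sampled documents.
    """
    rng = rng or random.Random(0)
    model = Counter()   # learned term -> frequency in sampled documents
    seen = set()        # ids of documents already sampled
    used_terms = set()  # query terms already issued
    term = seed_term
    for _ in range(max_queries):
        used_terms.add(term)
        for doc_id in db.search(term, docs_per_query):
            if doc_id not in seen and len(seen) < max_docs:
                seen.add(doc_id)
                model.update(tokenize(db.fetch(doc_id)))
        if len(seen) >= max_docs:
            break
        candidates = sorted(set(model) - used_terms)
        if not candidates:
            break  # no unused learned term left to query with
        term = rng.choice(candidates)
    return model, seen


if __name__ == "__main__":
    docs = [
        "database selection needs a language model",
        "queries retrieve documents from a text database",
        "sampling approximates the database vocabulary",
        "language model accuracy grows with more documents",
    ]
    model, seen = sample_language_model(ToyDatabase(docs), "database", max_docs=3)
    print(len(seen), "documents sampled")
    print(model.most_common(5))
```

The learned `Counter` plays the role of the language model that a selection service would otherwise have to request from the database. A more realistic version would replace `ToyDatabase` with calls to the database's real search interface.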
