Source: ACM SIGMOD International Conference on Management of Data

Automatic discovery of language models for text databases

Abstract

The proliferation of text databases within large organizations and on the Internet makes it difficult for a person to know which databases to search. Given language models that describe the contents of each database, a database selection algorithm such as GlOSS can provide assistance by automatically selecting appropriate databases for an information need. Current practice is that each database provides its language model upon request, but this cooperative approach has important limitations.

This paper demonstrates that cooperation is not required. Instead, the database selection service can construct its own language models by sampling database contents via the normal process of running queries and retrieving documents. Although random sampling is not possible, it can be approximated with carefully selected queries. This sampling approach avoids the limitations that characterize the cooperative approach, and also enables additional capabilities. Experimental results demonstrate that accurate language models can be learned from a relatively small number of queries and documents.
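The sampling idea described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical: `search` is a toy stand-in for a database's ordinary query interface, `sample_language_model` is an illustrative loop, and the learned model is reduced to simple term and document frequency counts. The abstract does not say how queries are chosen; picking the next query term at random from the terms seen so far is an assumption made only for this example.

```python
import random
from collections import Counter


def search(database, term, top_n=4):
    """Toy search interface (hypothetical): return the indices of up to
    top_n documents containing `term`, ranked by how often it occurs."""
    hits = [(doc.lower().split().count(term), i) for i, doc in enumerate(database)]
    ranked = sorted(((c, i) for c, i in hits if c > 0), key=lambda x: (-x[0], x[1]))
    return [i for _, i in ranked[:top_n]]


def sample_language_model(database, seed_term, max_docs=5, max_queries=50, rng=None):
    """Build term statistics using only queries and retrieved documents."""
    rng = rng or random.Random(0)   # fixed seed keeps the sketch repeatable
    term_freq = Counter()           # term -> occurrences in sampled documents
    doc_freq = Counter()            # term -> number of sampled documents containing it
    seen = set()                    # indices of documents already sampled
    used = set()                    # query terms already issued
    query = seed_term
    for _ in range(max_queries):
        used.add(query)
        for doc_id in search(database, query):
            if doc_id in seen:
                continue
            seen.add(doc_id)
            tokens = database[doc_id].lower().split()
            term_freq.update(tokens)
            doc_freq.update(set(tokens))
            if len(seen) >= max_docs:
                break
        if len(seen) >= max_docs:
            break
        # Assumption: choose the next query term at random from the learned model.
        candidates = sorted(set(term_freq) - used)
        if not candidates:
            break
        query = rng.choice(candidates)
    return term_freq, doc_freq, seen


docs = [
    "text databases store documents",
    "language models describe database contents",
    "queries retrieve documents from text databases",
    "sampling approximates random selection with queries",
]
tf, df, sampled = sample_language_model(docs, seed_term="documents", max_docs=3)
print(len(sampled), "documents sampled;", len(tf), "distinct terms learned")
```

The loop stops once `max_docs` documents have been examined, which mirrors the claim that a relatively small sample can suffice. A real service would tune the stopping point and the query selection strategy against the target databases.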

