Automatic discovery of language models for text databases

Abstract

The proliferation of text databases within large organizations and on the Internet makes it difficult for a person to know which databases to search. Given language models that describe the contents of each database, a database selection algorithm such as GlOSS can provide assistance by automatically selecting appropriate databases for an information need. Current practice is that each database provides its language model upon request, but this cooperative approach has important limitations.

This paper demonstrates that cooperation is not required. Instead, the database selection service can construct its own language models by sampling database contents via the normal process of running queries and retrieving documents. Although random sampling is not possible, it can be approximated with carefully selected queries. This sampling approach avoids the limitations that characterize the cooperative approach, and also enables additional capabilities. Experimental results demonstrate that accurate language models can be learned from a relatively small number of queries and documents.
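The sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `ToyDatabase` class, the `sample_language_model` function, the random choice of the next query term, and the stopping limits are all assumptions made for illustration, since the abstract does not specify how queries are selected or when sampling stops. The learned "language model" here is simply a table of document frequencies.

```python
import random
from collections import Counter


def tokenize(text):
    """Lowercase and keep purely alphabetic tokens (a simplifying assumption)."""
    return [w for w in text.lower().split() if w.isalpha()]


class ToyDatabase:
    """Hypothetical stand-in for an uncooperative database: it only answers queries."""

    def __init__(self, docs):
        self._docs = docs

    def search(self, term, limit):
        # Rank documents by how often they contain the term; ties go to the lower id.
        hits = [(tokenize(d).count(term), i) for i, d in enumerate(self._docs)]
        hits = [(count, i) for count, i in hits if count > 0]
        hits.sort(key=lambda h: (-h[0], h[1]))
        return [i for _, i in hits[:limit]]

    def fetch(self, doc_id):
        return self._docs[doc_id]


def sample_language_model(db, seed_term, max_docs, docs_per_query=2,
                          max_queries=50, rng=None):
    """Build a document-frequency table from documents reached by single-term queries."""
    rng = rng or random.Random(0)
    seen = set()          # ids of documents already sampled
    doc_freq = Counter()  # learned model: term -> number of sampled docs containing it
    tried = set()         # query terms already issued
    term = seed_term
    for _ in range(max_queries):
        if term is None or len(seen) >= max_docs:
            break
        tried.add(term)
        for doc_id in db.search(term, docs_per_query):
            if doc_id in seen:
                continue
            seen.add(doc_id)
            doc_freq.update(set(tokenize(db.fetch(doc_id))))
            if len(seen) >= max_docs:
                break
        # Assumed strategy: pick the next query term at random from the learned vocabulary.
        candidates = sorted(set(doc_freq) - tried)
        term = rng.choice(candidates) if candidates else None
    return doc_freq, len(seen)
```

Calling `sample_language_model(ToyDatabase(docs), "database", max_docs=3)` on a small list of strings returns the learned frequency table and the number of documents sampled. A selection service could compare such tables across databases without any of them exporting statistics.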

Bibliographic details

Source: ACM SIGMOD International Conference on Management of Data, 1999, pp. 479-490 (12 pages)
Authors: Jamie Callan; Margaret Connell; Aiqun Du
Original format: PDF
Chinese Library Classification: various special-purpose databases

Similar literature

1. Human Immunodeficiency Virus Reverse Transcriptase and Protease Sequence Database: an expanded data model integrating natural language text and sequence analysis programs [J]. Kantor R, Machekano R, Gonzales MJ. Nucleic Acids Research. 2001, No. 1.

2. Applying automatic text-based detection of deceptive language to police reports: Extracting behavioral patterns from a multi-step classification model to understand how we lie to the police [J]. Quijano-Sanchez Lara, Liberatore Federico, Camacho-Collados Jose. Knowledge-Based Systems. 2018, No. JUNa1.

3. Text Summarization and Discovery of Frames and Relationship from Natural Language Text - A R&D Methodology [J]. P. Chakrabarti, J.K. Basu. International Journal on Computer Science and Engineering. 2010, No. 3.

4. Automatic Discovery of Language Models for Text Databases [C]. Jamie Callan, Margaret Connell, Aiqun Du. ACM SIGMOD International Conference on Management of Data. 1999.

5. Automatic discovery of significant events from databases [D]. Bharadwaj, Avinash Shankar. 2011.

6. Natural Language Processing and Automatic SNOMED-Encoding of Free Text: An Analysis of Free Text Data from a Routine Electronic Patient Record Application with a Parsing Tool Using the German SNOMED II [O]. Joerg H. Hohnloser, Matthias Holzer, Martin R.G. Fischer. 1996.

7. Automatic Discovery of Language Models for Text Databases [O]. Jamie Callan et al. 2008.
