Automatic Discovery of Language Models for Text Databases

Abstract

The proliferation of text databases within large organizations and on the Internet makes it difficult for a person to know which databases to search. Given language models that describe the contents of each database, a database selection algorithm such as GlOSS can provide assistance by automatically selecting appropriate databases for an information need. Current practice is that each database provides its language model upon request, but this cooperative approach has important limitations. This paper demonstrates that cooperation is not required. Instead, the database selection service can construct its own language models by sampling database contents via the normal process of running queries and retrieving documents. Although random sampling is not possible, it can be approximated with carefully selected queries. This sampling approach avoids the limitations that characterize the cooperative approach, and also enables additional capabilities. Experimental results demonstrate that accurate language models can be learned from a relatively small number of queries and documents.
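
The sampling loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how queries are chosen, so the sketch assumes each new one-word query is drawn at random from the vocabulary learned so far. The names `search`, `seed_term`, `max_docs`, `docs_per_query`, and `max_queries`, and the toy in-memory database, are all hypothetical.

```python
import random
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, keep alphabetic tokens only."""
    return [w for w in text.lower().split() if w.isalpha()]


def sample_language_model(search, seed_term, max_docs=300,
                          docs_per_query=4, max_queries=1000, rng=None):
    """Build term and document frequencies by querying a database.

    `search(term, limit)` is a hypothetical interface standing in for the
    database's normal query mechanism; it returns (doc_id, text) pairs.
    """
    rng = rng or random.Random(0)
    term_freq = Counter()   # total occurrences of each term in the sample
    doc_freq = Counter()    # number of sampled documents containing the term
    seen = set()            # ids of documents already sampled
    query = seed_term
    for _ in range(max_queries):
        if len(seen) >= max_docs:
            break
        for doc_id, text in search(query, docs_per_query):
            if doc_id in seen or len(seen) >= max_docs:
                continue
            seen.add(doc_id)
            tokens = tokenize(text)
            term_freq.update(tokens)
            doc_freq.update(set(tokens))
        if not term_freq:
            break  # the seed term retrieved nothing; no vocabulary to draw from
        # Assumption: the next query is a random term from the learned vocabulary.
        query = rng.choice(sorted(term_freq))
    return term_freq, doc_freq, len(seen)


def make_toy_search(docs):
    """Return a search function over an in-memory list of strings (toy stand-in)."""
    def search(term, limit):
        hits = [(i, d) for i, d in enumerate(docs) if term in tokenize(d)]
        return hits[:limit]
    return search


toy_docs = ["apple banana", "banana cherry", "cherry date", "date apple"]
tf, df, n_docs = sample_language_model(make_toy_search(toy_docs), "apple", max_docs=4)
```

The selection service only ever calls `search`, which is why no cooperation from the database is needed. The learned `term_freq` and `doc_freq` tables play the role of the language model that a selection algorithm such as GlOSS would consume.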

Bibliographic details

Source: ACM SIGMOD International Conference on Management of Data, 1999, 12 pages
Authors: Jamie Callan; Margaret Connell; Aiqun Du
Original format: PDF
Chinese Library Classification: TP3-532

Similar literature

1. Human Immunodeficiency Virus Reverse Transcriptase and Protease Sequence Database: an expanded data model integrating natural language text and sequence analysis programs [J]. Kantor R, Machekano R, Gonzales MJ. Nucleic Acids Research, 2001, No. 1

2. Applying automatic text-based detection of deceptive language to police reports: Extracting behavioral patterns from a multi-step classification model to understand how we lie to the police [J]. Quijano-Sanchez Lara, Liberatore Federico, Camacho-Collados Jose. Knowledge-Based Systems, 2018, No. JUNa1

3. Text Summarization and Discovery of Frames and Relationship from Natural Language Text - A R&D Methodology [J]. P. Chakrabarti, J.K. Basu. International Journal on Computer Science and Engineering, 2010, No. 3

4. Automatic discovery of language models for text databases [C]. Jamie Callan, Margaret Connell, Aiqun Du. ACM SIGMOD International Conference on Management of Data, 1999

5. Automatic discovery of significant events from databases [D]. Bharadwaj, Avinash Shankar. 2011

6. Natural Language Processing and Automatic SNOMED-Encoding of Free Text: An Analysis of Free Text Data from a Routine Electronic Patient Record Application with a Parsing Tool Using the German SNOMED II [O]. Joerg H. Hohnloser, Matthias Holzer, Martin R.G. Fischer. 1996

7. Automatic Discovery of Language Models for Text Databases [O]. Jamie Callan et al. 2008
