
Topic specific language models built from large numbers of documents



Abstract

A language model is formed and/or improved using data from a large collection of documents, such as web data. The collection is queried using queries formed from the language model, and the language model is then improved using the information obtained. The improved model is in turn used to improve the queries. As data is received from the collection, it is compared to a rejection model, which models what rejected documents typically look like; any document that matches the rejection model is rejected. The remaining documents are characterized to determine whether they add information to the language model, whether they are relevant, and whether they should be independently rejected. Rejected documents are used to update the rejection model; accepted documents are used to update the language model. Each iteration improves the language model, and the documents may be analyzed again using the improved language model.
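The loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the model type, the query construction, or the acceptance test, so everything concrete here is an assumption made for illustration: add-one smoothed unigram models, queries built from the terms most characteristic of the topic model relative to the rejection model, and acceptance when the topic model scores a document at least as high as the rejection model. The names `UnigramModel`, `make_query`, `search`, and `refine` are hypothetical, and `search` is a stand-in for a real web search.

```python
import math
from collections import Counter


class UnigramModel:
    """Add-one smoothed unigram model (assumed; the abstract names no model type)."""

    def __init__(self, seed_tokens=()):
        self.counts = Counter()
        self.total = 0
        self.update(list(seed_tokens))

    def update(self, tokens):
        self.counts.update(tokens)
        self.total += len(tokens)

    def prob(self, token, vocab_size):
        return (self.counts[token] + 1) / (self.total + vocab_size)

    def avg_logprob(self, tokens, vocab_size):
        # Average per-token log-probability of a document under this model.
        if not tokens:
            return float("-inf")
        return sum(math.log(self.prob(t, vocab_size)) for t in tokens) / len(tokens)


def make_query(topic, reject, vocab_size, k=3):
    # Assumed heuristic: pick the k terms that best separate topic from rejection model.
    def score(t):
        p, q = topic.prob(t, vocab_size), reject.prob(t, vocab_size)
        return p * math.log(p / q)

    return sorted(topic.counts, key=lambda t: (-score(t), t))[:k]


def search(collection, query):
    # Stand-in for a web search: documents sharing at least one term with the query.
    return [d for d in collection if set(query) & set(d.split())]


def refine(topic, reject, collection, iterations=2):
    """Iteratively query, filter, and update both models; returns accepted documents."""
    vocab = {t for d in collection for t in d.split()}
    vocab_size = len(vocab | set(topic.counts) | set(reject.counts))
    accepted, seen = [], set()
    for _ in range(iterations):
        query = make_query(topic, reject, vocab_size)
        for doc in search(collection, query):
            if doc in seen:  # simplification: documents are not re-analyzed here
                continue
            seen.add(doc)
            tokens = doc.split()
            if reject.avg_logprob(tokens, vocab_size) > topic.avg_logprob(tokens, vocab_size):
                reject.update(tokens)  # rejected documents refine the rejection model
            else:
                topic.update(tokens)   # accepted documents refine the language model
                accepted.append(doc)
    return accepted
```

Seeding `topic` with a few in-domain words and `reject` with a few spam-like words, then calling `refine(topic, reject, documents)`, grows both models from the retrieved documents. The sketch skips the re-analysis of earlier documents that the abstract mentions as an option, and it folds the separate relevance and novelty checks into a single score comparison.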


Bibliographic details

Publication number: US7739286B2

Publication date: 2010-06-15

Original format: PDF
Applicant/patentee: ABHINAV SETHY; PANAYIOTIS GEORGIOU; SHRIKANTH NARAYANAN

Application number: US20060384226
Inventors: ABHINAV SETHY; PANAYIOTIS GEORGIOU; SHRIKANTH NARAYANAN

Filing date: 2006-03-17
Classification: G06F7; G06F17/28; G06F17/30; G10L15
Country: US
Date added to database: 2022-08-21 18:50:20
