Journal: ACM Transactions on Asian Language Information Processing

A Statistical Framework for Query Translation Disambiguation

Abstract

Resolving ambiguity in the process of query translation is crucial to cross-language information retrieval (CLIR), given the short length of queries. This problem is even more challenging when only a bilingual dictionary is available, which is the focus of our work described here. In this paper, we will present a statistical framework for dictionary-based CLIR that estimates the translation probabilities of query words based on the monolingual word co-occurrence statistics. In addition, we will present two realizations of the proposed framework, i.e., the "maximum coherence model" and the "spectral query-translation model," that exploit different metrics for the coherence measurement between a translation of a query word and the theme of the entire query. Compared to previous work on dictionary-based CLIR, the proposed framework is advantageous in three aspects: (1) Translation probabilities are calculated explicitly to capture the uncertainty in translating queries; (2) translations of all query words are estimated simultaneously rather than independently; and (3) the formulated problem can be solved efficiently with a unique optimal solution. Empirical studies with Chinese-English cross-language information retrieval using TREC datasets have shown that the proposed models achieve a relative 10%-50% improvement, compared to other approaches that also exploit word co-occurrence statistics for query translation disambiguation.
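
To make the idea of estimating all translation probabilities jointly from co-occurrence statistics more concrete, here is a minimal sketch. It is not the paper's maximum coherence model or spectral query-translation model, which the abstract describes only as problems with a unique optimal solution; this sketch substitutes a simple iterative re-weighting heuristic. The function name, the placeholder query words `w1` and `w2`, the candidate translations, and the co-occurrence scores are all hypothetical and chosen purely for illustration.

```python
def estimate_translation_probs(candidates, cooc, iterations=20):
    """Illustrative heuristic, not the paper's optimization.

    candidates: dict mapping each query word to its dictionary translations.
    cooc: dict mapping (term_a, term_b) to a non-negative monolingual
          co-occurrence score; lookups are treated as symmetric.
    Returns: dict mapping query word -> {translation: probability}.
    """
    def sim(a, b):
        # Symmetric lookup; unseen pairs count as zero co-occurrence.
        return cooc.get((a, b), cooc.get((b, a), 0.0))

    # Start from a uniform distribution over each word's candidates.
    probs = {w: {t: 1.0 / len(ts) for t in ts} for w, ts in candidates.items()}

    for _ in range(iterations):
        updated = {}
        for w, ts in candidates.items():
            scores = {}
            for t in ts:
                # Coherence of t with the current (soft) translations
                # of every other query word.
                s = 0.0
                for other, other_ts in candidates.items():
                    if other == w:
                        continue
                    for t2 in other_ts:
                        s += probs[other][t2] * sim(t, t2)
                scores[t] = s + 1e-9  # small constant avoids division by zero
            total = sum(scores.values())
            updated[w] = {t: s / total for t, s in scores.items()}
        probs = updated  # all words are updated together, not one by one
    return probs


# Hypothetical two-word query with ambiguous dictionary entries.
candidates = {
    "w1": ["bank", "riverbank"],
    "w2": ["interest", "hobby"],
}
# Hypothetical co-occurrence scores from a target-language corpus.
cooc = {
    ("bank", "interest"): 0.9,
    ("bank", "hobby"): 0.05,
    ("riverbank", "interest"): 0.05,
    ("riverbank", "hobby"): 0.02,
}
probs = estimate_translation_probs(candidates, cooc)
for word, dist in probs.items():
    print(word, dist)
```

With these made-up scores, the probability mass for each query word shifts toward the translation that co-occurs most strongly with the likely translations of the other word, which mirrors points (1) and (2) of the abstract: probabilities are explicit, and they are estimated for all query words together rather than independently.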
