Workshop of the Cross-Language Evaluation Forum

UTACLIR @ CLEF 2001: Effects of Compound Splitting and N-Gram Techniques


Abstract

The Tampere University CLEF research group participated in CLEF 2001 with four automated bilingual runs. Our cross-lingual software, UTACLIR, uses an automated method of query construction for cross-language information retrieval (CLIR). The method automatically extracts topical information from request sentences written in one of the source languages and builds a target-language query from the translations given by a translation dictionary. The new features of the CLIR process from Finnish, Swedish and German to English focus on translating and matching compound words, together with a new n-gram-based technique for translating and matching proper names and other non-translatable words. Non-translatable words can also occur as components of compounds. The n-gram-based method is clearly effective at matching inflected proper names and spelling variants. However, applying it to all unidentified and non-translatable words adds noise to the query. For German-English we tested two types of dictionaries (two runs). The first included all translations from the standard dictionary. The second contained the same data, except that all direct translations of compounds were excluded. The test with two dictionaries for the German runs indicates that the new compound-processing features work well even with a limited dictionary.
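
To make the n-gram matching idea concrete, here is a minimal sketch of how an inflected source-language proper name could be matched against words in a target-language index. The abstract does not state which n-gram length or similarity measure UTACLIR uses, so the character digrams, the Dice coefficient, the 0.5 threshold, and the function names `char_ngrams`, `dice_similarity` and `best_matches` are all assumptions chosen for illustration, not the authors' implementation.

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams of a lowercased word."""
    w = word.lower()
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def dice_similarity(a, b, n=2):
    """Dice coefficient over the two words' n-gram sets (0.0 to 1.0).

    Assumption: Dice over digram sets; the abstract names no measure.
    """
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def best_matches(source_word, target_index, k=3, threshold=0.5):
    """Return up to k target-index words most similar to source_word.

    Words scoring below the (hypothetical) threshold are dropped, which
    limits the noise that loose matches would add to the query.
    """
    scored = [(dice_similarity(source_word, t), t) for t in target_index]
    scored = [(s, t) for s, t in scored if s >= threshold]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]


if __name__ == "__main__":
    # "Jeltsinin" is a Finnish genitive form of the name spelled "Yeltsin" in English.
    index_words = ["Yeltsin", "Helsinki", "Clinton"]
    print(best_matches("Jeltsinin", index_words))
```

With this toy index, only "Yeltsin" passes the threshold for the query word "Jeltsinin". Lowering the threshold admits weaker candidates such as "Helsinki", which illustrates the abstract's remark that applying n-gram matching to every unidentified word adds noise to the query.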
