
DCU@FIRE2010: term conflation, blind relevance feedback, and cross-language IR with manual and automatic query translation


Abstract

For the first participation of Dublin City University (DCU) in the FIRE 2010 evaluation campaign, information retrieval (IR) experiments on English, Bengali, Hindi, and Marathi documents were performed to investigate term conflation (different stemming approaches and indexing word prefixes), blind relevance feedback, and manual and automatic query translation. The experiments are based on BM25 and on language modeling (LM) for IR. Results show that term conflation always improves mean average precision (MAP) compared to indexing unprocessed word forms, but different approaches seem to work best for different languages. For example, in monolingual Marathi experiments indexing 5-prefixes outperforms our corpus-based stemmer; in Hindi, the corpus-based stemmer achieves a higher MAP. For Bengali, the LM retrieval model achieves a much higher MAP than BM25 (0.4944 vs. 0.4526). In all experiments using BM25, blind relevance feedback yields considerably higher MAP in comparison to experiments without it. Bilingual IR experiments (English→Bengali and English→Hindi) are based on query translations obtained from native speakers and the Google translate web service. For the automatically translated queries, MAP is slightly (but not significantly) lower compared to experiments with manual query translations. The bilingual English→Bengali (English→Hindi) experiments achieve 81.7%-83.3% (78.0%-80.6%) of the best corresponding monolingual experiments.
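
As a rough illustration of two ideas the abstract names, the following Python sketch shows prefix-based term conflation (keeping the first 5 characters of each word) and the computation of mean average precision. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy document IDs, and the choice to count characters as Unicode code points are all hypothetical. The abstract does not say how a "character" is defined for Bengali, Hindi, or Marathi text.

```python
from typing import Dict, Iterable, List, Set


def prefix_conflate(tokens: Iterable[str], n: int = 5) -> List[str]:
    """Map each token to its first n characters (n-prefix indexing).

    Assumption: a character is one Unicode code point. For Indic scripts,
    a grapheme-based definition may be more appropriate.
    """
    return [tok[:n] for tok in tokens]


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one query.

    Sums precision at each rank where a relevant document appears,
    then divides by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-query average precision over all judged queries."""
    if not qrels:
        return 0.0
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Word forms sharing a 5-character prefix collapse to one index term.
    print(prefix_conflate(["retrieval", "retrieving", "cat"]))

    # Toy example with hypothetical document IDs and judgments.
    runs = {"q1": ["d1", "d2", "d3"], "q2": ["d5", "d4"]}
    qrels = {"q1": {"d1", "d3"}, "q2": {"d4"}}
    print(mean_average_precision(runs, qrels))
```

Prefix truncation needs no linguistic resources, which is one reason it is a common baseline against corpus-based stemmers. Its effectiveness depends on the prefix length and the morphology of the language.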
