A Case Study in Decompounding for Bengali Information Retrieval

Abstract

Decompounding has been found to improve information retrieval (IR) effectiveness for compounding languages such as Dutch, German, or Finnish. No previous studies, however, exist on the effect of decomposing compounds in IR for Indian languages. In this case study, we investigate the effect of decompounding for Bengali, a highly agglutinative Indian language. The standard approach to decompounding for IR, i.e. indexing compound parts (constituents) in addition to compound words, has proven beneficial for European languages. The experiments reported in this paper show that this standard approach does not work particularly well for Bengali IR. Some unique characteristics of Bengali compounds are: i) only one compound constituent may be a valid word, in contrast to the stricter requirement of both being so; and ii) the first character of the right constituent can be modified by the rules of Sandhi, in contrast to simple concatenation. As a solution, we firstly propose a more relaxed decompounding, where a compound word is decomposed into only one constituent if the other constituent is not a valid word, and secondly we perform selective decompounding by ensuring that constituents often co-occur with the compound word, which indicates how related the constituents and the compound are. We perform experiments on Bengali ad-hoc IR collections from FIRE 2008 to 2012. Our experiments show that both the relaxed decomposition and the co-occurrence-based constituent selection prove more effective than the standard frequency-based decomposition method, improving mean average precision (MAP) by up to 2.72% and recall by up to 1.8%, compared to not decompounding words.
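
The two ideas in the abstract, relaxed decompounding and co-occurrence-based constituent selection, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `min_len` and `threshold` parameters, the co-occurrence ratio used as the relatedness measure, and the toy English-like vocabulary and postings are all assumptions made for illustration. Sandhi changes at the constituent boundary are not handled.

```python
from typing import Dict, List, Set


def split_candidates(word: str, vocabulary: Set[str], min_len: int = 3,
                     relaxed: bool = True) -> List[List[str]]:
    """For each split point, return the constituents that are valid words.

    Strict mode keeps a split only if both halves are in the vocabulary.
    Relaxed mode also keeps a split when exactly one half is valid.
    Assumption: plain concatenation, no Sandhi handling.
    """
    candidates = []
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        valid = [part for part in (left, right) if part in vocabulary]
        if len(valid) == 2 or (relaxed and len(valid) == 1):
            candidates.append(valid)
    return candidates


def cooccurrence_ratio(compound: str, constituent: str,
                       postings: Dict[str, Set[int]]) -> float:
    """Fraction of the compound's documents that also contain the constituent.

    Hypothetical relatedness measure; the abstract does not specify one.
    """
    compound_docs = postings.get(compound, set())
    if not compound_docs:
        return 0.0
    shared = compound_docs & postings.get(constituent, set())
    return len(shared) / len(compound_docs)


def select_constituents(compound: str, vocabulary: Set[str],
                        postings: Dict[str, Set[int]],
                        threshold: float = 0.5,
                        relaxed: bool = True) -> List[str]:
    """Keep only constituents that co-occur often enough with the compound."""
    selected: List[str] = []
    for parts in split_candidates(compound, vocabulary, relaxed=relaxed):
        for part in parts:
            ratio = cooccurrence_ratio(compound, part, postings)
            if part not in selected and ratio >= threshold:
                selected.append(part)
    return selected


# Toy data (hypothetical): term -> set of document ids containing it.
vocabulary = {"book", "shelf", "rain"}
postings = {
    "bookshelf": {1, 2, 3, 4},
    "book": {1, 2, 3, 7},
    "shelf": {4, 9},
    "rainish": {5, 6},
    "rain": {5, 6, 8},
}

print(select_constituents("bookshelf", vocabulary, postings))
print(select_constituents("rainish", vocabulary, postings))
print(select_constituents("rainish", vocabulary, postings, relaxed=False))
```

In this toy setup, "rainish" has only one valid constituent, so strict decompounding leaves it whole while relaxed decompounding can still index "rain". The co-occurrence filter then drops constituents such as "shelf" that rarely appear in the same documents as the compound.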

Bibliographic details

Source
International Conference of the Cross-Language Evaluation Forum | 2013 | 12 pages
Authors
Debasis Ganguly; Johannes Leveling; Gareth J. F. Jones
Original format: PDF
CLC classification: TP312-53
