A Dictionary- and Corpus-Independent Statistical Lemmatizer for Information Retrieval in Low Resource Languages

Abstract

We present StaLe, a dictionary- and corpus-independent statistical lemmatizer that addresses the out-of-vocabulary (OOV) problem of dictionary-based lemmatization by generating candidate lemmas for any inflected word form. StaLe can be applied with little effort to languages lacking linguistic resources. We evaluate StaLe both on lemmatization tasks alone and as a component of an IR system, using several datasets and query types in four high-resource languages. StaLe is competitive, reaching 88–108% of the gold-standard performance of a commercial lemmatizer in IR experiments. Despite its competitive performance, it is compact and efficient, and it is fast to apply to new languages.
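The abstract describes StaLe only at a high level: it generates candidate lemmas for inflected word forms, including out-of-vocabulary ones, without consulting a dictionary. As a rough illustration of that general idea, here is a minimal sketch of rule-based candidate generation. It assumes suffix-replacement rules with confidence scores; the `SuffixRule` type, the confidence values, the example English rules and `generate_candidates` are all hypothetical and are not StaLe's actual rule format or ranking method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuffixRule:
    """Hypothetical rule: replace an inflectional ending with a lemma ending."""
    inflected_suffix: str
    lemma_suffix: str
    confidence: float  # assumed score in [0, 1]; not StaLe's actual metric


def generate_candidates(word, rules, max_candidates=3, min_stem_len=2):
    """Return up to max_candidates (lemma, confidence) pairs, best first.

    Every rule whose inflected suffix matches the end of the word proposes a
    candidate, so unseen (OOV) words still get candidates without a dictionary.
    """
    scores = {}
    for rule in rules:
        if not word.endswith(rule.inflected_suffix):
            continue
        stem = word[: len(word) - len(rule.inflected_suffix)]
        if len(stem) < min_stem_len:
            continue  # avoid stripping the word down to almost nothing
        candidate = stem + rule.lemma_suffix
        # If several rules yield the same lemma, keep the highest confidence.
        scores[candidate] = max(scores.get(candidate, 0.0), rule.confidence)
    if not scores:
        # No rule applies: fall back to the word form itself.
        return [(word, 0.0)]
    # Sort by descending confidence, then alphabetically for stable ties.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:max_candidates]


# Hypothetical English rules, for illustration only.
EXAMPLE_RULES = [
    SuffixRule("ies", "y", 0.9),
    SuffixRule("s", "", 0.7),
    SuffixRule("es", "", 0.4),
]

print(generate_candidates("studies", EXAMPLE_RULES))
# → [('study', 0.9), ('studie', 0.7), ('studi', 0.4)]
```

Returning a ranked list rather than a single lemma keeps alternative analyses available to a downstream component such as an IR system; how many candidates StaLe itself passes on is not stated in this record.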

Bibliographic details

Source
Multilingual and Multimodal Information Access Evaluation, 2010, pp. 3-14 (12 pages)
Conference location: Padua, Italy
Authors
Aki Loponen; Kalervo Järvelin
Affiliations

Department of Information Studies and Interactive Media, FI-33014 University of Tampere, Finland (both authors)
Original format: PDF
Text language: English
Chinese Library Classification: TP311.13

Similar literature

1. Using Dictionary and Lemmatizer to Improve Low Resource English-Malay Statistical Machine Translation System [J]. Yin-Lai Yeong, Tien-Ping Tan, Siti Khaotijah Mohammad. Procedia Computer Science, 2016, no. 1.
2. Leveraging Additional Resources for Improving Statistical Machine Translation on Asian Low-Resource Languages [J]. Hai-Long Trieu, Duc-Vu Tran, Ittoo Ashwin. ACM Transactions on Asian Language Information Processing, 2019, no. 3.
3. A Linguistics-Driven Approach to Statistical Parsing for Low-Resourced Languages [J]. Prachya Boonkwan, Thepchai Supnithi. IEICE Transactions on Information and Systems, 2015, no. 5.
4. A Dictionary- and Corpus-Independent Statistical Lemmatizer for Information Retrieval in Low Resource Languages [C]. Aki Loponen, Kalervo Järvelin. International Conference of the Cross-Language Evaluation Forum, 2010.
5. Cross language information retrieval for languages with scarce resources [D]. Christian E. Loza, 2009.
6. Overcoming Terminology Barrier Using Web Resources for Cross-Language Medical Information Retrieval [O]. Wen-Hsiang Lu, Ray Shih-Jui Lin, Yi-Che Chan, 2006.
7. A Dictionary- and Corpus-Independent Statistical Lemmatizer for Information Retrieval in Low Resource Languages [O]. Aki Loponen, Kalervo Järvelin, 2010.

A Dictionary-and Corpus-Independent Statistical Lemmatizer for Information Retrieval in Low Resource Languages

摘要

著录项

相似文献

相关主题

期刊订阅