An Improved Stemming Approach Using HMM for a Highly Inflectional Language

Abstract

Stemming is a common method for morphological normalization of natural language texts. Modern information retrieval systems rely on such normalization techniques for automatic document processing tasks. High-quality stemming is difficult in highly inflectional Indic languages, and little research has been done on designing stemming algorithms for texts in these languages. In this study, we focus on the problem of stemming texts in Assamese, a low-resource Indic language spoken in the north-eastern part of India by approximately 30 million people. Stemming is hard in Assamese because morphological inflections commonly appear as single-letter suffixes: more than 50% of the inflections in Assamese take this form. Such single-letter inflections cause ambiguity when predicting the underlying root word. Therefore, we propose a new method that combines a rule-based algorithm for predicting multiple-letter suffixes with an HMM-based algorithm for predicting single-letter suffixes. The combined approach can predict morphologically inflected words with 92% accuracy.
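
The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the suffix rules, the HMM states, or any parameters, so everything below is an assumption. The sketch uses toy English-like data, a hypothetical suffix list `MULTI_SUFFIXES`, and a hypothetical two-state HMM (`ROOT` = the final letter belongs to the root, `SFX` = the final letter is an inflection) whose observations are the final letters of the words in a sentence. Viterbi decoding then picks the most likely tag sequence.

```python
import math

# All names and numbers here are hypothetical toy values, NOT taken from the paper.
MULTI_SUFFIXES = ("ness", "ing", "ed")          # assumed multi-letter suffix rules
STATES = ("ROOT", "SFX")                        # assumed HMM states
START_P = {"ROOT": 0.6, "SFX": 0.4}
TRANS_P = {"ROOT": {"ROOT": 0.7, "SFX": 0.3},
           "SFX": {"ROOT": 0.6, "SFX": 0.4}}
EMIT_P = {"ROOT": {"s": 0.1, "t": 0.5, "n": 0.4},
          "SFX": {"s": 0.9, "t": 0.05, "n": 0.05}}
UNSEEN = 1e-6                                   # floor for letters missing from EMIT_P


def strip_multi_letter_suffix(word, min_stem=2):
    """Stage 1: strip the longest matching multi-letter suffix, if any."""
    for suffix in sorted(MULTI_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)], suffix
    return word, ""


def viterbi(observations):
    """Return the most likely state sequence for the observed final letters."""
    def emit(state, letter):
        return math.log(EMIT_P[state].get(letter, UNSEEN))

    scores = {s: math.log(START_P[s]) + emit(s, observations[0]) for s in STATES}
    backpointers = []
    for letter in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS_P[p][s]))
            new_scores[s] = scores[prev] + math.log(TRANS_P[prev][s]) + emit(s, letter)
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


def stem_sentence(words, min_stem=2):
    """Stage 1 (rules) followed by stage 2 (HMM) over non-empty words."""
    if not words:
        return []
    stage1 = [strip_multi_letter_suffix(w, min_stem) for w in words]
    tags = viterbi([stem[-1] for stem, _ in stage1])
    stems = []
    for (stem, suffix), tag in zip(stage1, tags):
        # Only words untouched by the rules are candidates for single-letter stripping.
        if not suffix and tag == "SFX" and len(stem) - 1 >= min_stem:
            stem = stem[:-1]
        stems.append(stem)
    return stems


print(stem_sentence(["cats", "paint", "walking"]))  # ['cat', 'paint', 'walk']
```

In this toy run the rule stage handles `walking`, while the HMM decides that the final `s` of `cats` is an inflection and the final `t` of `paint` is part of the root. A real system would estimate the transition and emission probabilities from annotated Assamese text rather than hard-coding them.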

Bibliographic details

Source: Annual International Conference on Intelligent Text Processing and Computational Linguistics, 2013, 10 pages
Authors: Navanath Saharia; Kishori M. Konwar; Utpal Sharma; Jugal K. Kalita
Original format: PDF
Chinese Library Classification: TP312-53

Similar literature

1. Information Retrieval From Historical Newspaper Collections in Highly Inflectional Languages: A Query Expansion Approach [J]. Anni Jaervelin, Heikki Keskustalo, Eero Sormunen. Journal of the American Society for Information Science and Technology. 2016, no. 12.
2. To stem or lemmatize a highly inflectional language in a probabilistic IR environment? [J]. Kettunen K, Kunttu T, Jarvelin K. The Journal of Documentation. 2005, no. 4.
3. On knowledge-poor methods for person name matching and lemmatization for highly inflectional languages [J]. Jakub Piskorski, Karol Wieloch, Marcin Sydow. Information retrieval. 2009, no. 3.
4. An Improved Stemming Approach Using HMM for a Highly Inflectional Language [C]. Navanath Saharia, Kishori M. Konwar, Utpal Sharma. International conference on intelligent text processing and computational linguistics. 2013.
5. A system for inducing the phonology and inflectional morphology of a natural language [D]. McClure, Scott Nathanael. 2011.
6. Non-Invasive Mapping for Effective Preoperative Guidance to Approach Highly Language-Eloquent Gliomas: A Large Scale Comparative Cohort Study Using a New Classification for Language Eloquence [O]. Sebastian Ille, Axel Schroeder, Lucia Albers. 2021.
7. A Unification-based Approach to Morpho-syntactic Parsing of Agglutinative and Other (Highly) Inflectional Languages [O]. Prószéky Gábor, Kis Balázs. 1999.
