A corpus based unsupervised Bangla word stemming using N-gram language model

Abstract

In this paper, we propose a contextual-similarity-based approach for identifying the stems, or root forms, of Bangla words using an N-gram language model. The core purpose of our work is to build a large corpus of Bangla stems together with their corresponding inflectional forms. Identifying the stem of a word is generally called stemming, and a tool that identifies stems is called a stemmer. Stemmers are important mainly in information retrieval systems, recommender systems, spell checkers, search engines, and other Natural Language Processing applications. We selected the N-gram model for stem detection on the assumption that if two words exhibit a certain percentage of similarity in spelling and a certain percentage of contextual similarity across many sentences, then they have a higher probability of originating from the same root. We implemented a 6-gram model for the stem identification procedure and achieved 40.18% accuracy on our corpus.
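The assumption stated in the abstract (similar spelling plus similar context suggests a shared root) can be illustrated with a minimal Python sketch. The abstract does not say how either similarity is measured or which thresholds are used, so everything here is an assumption made for illustration: the common-prefix ratio for spelling, the Jaccard overlap of N-gram windows for context, the threshold values, the function names, and the toy English corpus that stands in for Bangla text. It is not the authors' implementation.

```python
from collections import defaultdict


def contexts(sentences, n=3):
    """Map each word to the set of (left, right) neighbour tuples seen around it.

    Each context is the n-gram window centred on the word, minus the word itself.
    """
    half = n // 2
    ctx = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            left = tuple(tokens[max(0, i - half):i])
            right = tuple(tokens[i + 1:i + 1 + half])
            ctx[word].add((left, right))
    return ctx


def spelling_similarity(a, b):
    """Common-prefix length divided by the longer word's length (assumed measure)."""
    prefix = 0
    for x, y in zip(a, b):
        if x != y:
            break
        prefix += 1
    return prefix / max(len(a), len(b))


def context_similarity(ctx, a, b):
    """Jaccard overlap of the two words' context sets (assumed measure)."""
    union = ctx[a] | ctx[b]
    return len(ctx[a] & ctx[b]) / len(union) if union else 0.0


def same_stem_candidates(sentences, n=3, min_spelling=0.5, min_context=0.3):
    """Return word pairs that pass both (hypothetical) similarity thresholds."""
    ctx = contexts(sentences, n)
    words = sorted(ctx)
    pairs = []
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if (spelling_similarity(a, b) >= min_spelling
                    and context_similarity(ctx, a, b) >= min_context):
                pairs.append((a, b))
    return pairs


# Toy corpus: "walks" and "walked" share a prefix and identical neighbours.
toy_corpus = [
    "the boy walks home",
    "the boy walked home",
    "the girl walks home",
    "the girl walked home",
]
print(same_stem_candidates(toy_corpus))  # [('walked', 'walks')]
```

In this toy run, "boy" and "girl" share every context but no spelling, so they are rejected; only the pair that passes both tests survives. The window size `n` defaults to 3 to suit the tiny corpus, whereas the abstract reports a 6-gram model.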

Bibliographic record

Source
International Conference on Informatics, Electronics and Vision | 2016 | 624p | 5 pages
Authors
Tapashee Tabassum Urmi; Jasmine Jahan Jammy; Sabir Ismail
Original format PDF
Chinese Library Classification TP3-53
Keywords
Automobiles; Computational modeling; Text processing; Natural language processing; Algorithm design and analysis; Computers; Jamming
