Comparative Analysis of Rule-Based, Dictionary-Based and Hybrid Stemmers for Gujarati Language

Abstract

Gujarati is an Indo-Aryan language spoken widely by the people of the Indian state of Gujarat. It is actively used for communication in the Gujarat government's educational institutes and offices, in local industries and businesses, and in media such as newspapers, magazines, radio and television programs. In all these areas, Internet access is now an essential requirement. Internet usage will increase if web content is provided in the regional language with the help of Natural Language Processing (NLP). In NLP, stemming plays a vital role in retrieving accurate content and producing effective results for web search queries: it identifies the root word from the morphological variants of a word. There are three typical approaches to stemming: the rule-based approach, the dictionary-based approach and the hybrid approach. In this paper, we present a comparative empirical study of these three approaches for the Gujarati language. The aim of the study is to evaluate the effectiveness of the different types of stemmers for Gujarati. First, we discuss the rule-based algorithm and present its evaluation with 152 different suffix-stripping rules. Next, we illustrate a stemming mechanism built on a Gujarati dictionary that contains around 20,000 root words. Lastly, we discuss the hybrid approach, which combines the rule-based and dictionary-based approaches. Experimental results reveal that the hybrid approach retrieves more accurate stemmed words than the rule-based and dictionary-based approaches.
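The three approaches named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the suffix list and root set are tiny toy samples (not the paper's 152 rules or its dictionary of about 20,000 roots), the function names are hypothetical, and the way the hybrid stemmer combines the two methods (dictionary check first, then dictionary-validated suffix stripping, then plain suffix stripping) is an assumption, since the abstract does not describe the combination.

```python
# Toy data for illustration only: NOT the paper's rules or dictionary.
SUFFIXES = ["માં", "થી", "ને", "ઓ"]   # a few common Gujarati suffixes
ROOTS = {"ઘર", "શહેર"}               # ઘર = house, શહેર = city


def rule_based_stem(word, suffixes=SUFFIXES):
    """Strip the longest matching suffix, keeping at least one character."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def dictionary_stem(word, roots=ROOTS):
    """Return the longest known root that prefixes the word, else the word."""
    matches = [root for root in roots if word.startswith(root)]
    return max(matches, key=len) if matches else word


def hybrid_stem(word, suffixes=SUFFIXES, roots=ROOTS):
    """Assumed combination: trust the dictionary, fall back to the rules."""
    if word in roots:
        return word  # already a root, so do not strip anything
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix):
            candidate = word[: -len(suffix)]
            if candidate in roots:
                return candidate  # stripped form is confirmed by the dictionary
    return rule_based_stem(word, suffixes)  # unknown word: rules only
```

With this toy data, calling any of the three functions on a word built from the root ઘર plus the suffix માં ("in the house") returns ઘર. The stemmers only diverge on inputs where the rules over-strip a word that is already a root, or where the word is missing from the dictionary, which is the kind of case a comparative evaluation would be probing.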

Bibliographic details

Source: International Conference on Big Data Analytics, 2019, pp. 140-155 (16 pages)
Authors: Nakul R. Dave; Mayuri A. Mehta
Original format: PDF
Keywords: Gujarati; Indian language; Natural Language Processing; Rule-based stemmer; Dictionary-based stemmer; Hybrid stemmer
