International Journal of Engineering Science and Technology

STEMMING OF PUNJABI WORDS BY USING BRUTE FORCE TECHNIQUE

Abstract

Stemming is an operation that conflates morphologically similar terms into a single term without performing a complete morphological analysis. Its common goal is to standardize words by reducing each word to its base form. Stemming is required in information retrieval systems, where it is used to improve retrieval performance, and natural language processing systems also require a stemmer. Porter's stemmer has served as the standard for English. Well-known stemming techniques include suffix removal, the brute force technique, rule-based techniques, and hybrid approaches. Creating a stemmer for a language like Punjabi is not easy, and the need for good stemming algorithms for such languages has grown with the spread of search and retrieval systems. In this paper we describe a method for obtaining the stem of a given word that combines two approaches, the brute force technique and suffix stripping, to get the maximum accuracy from the stemmer. We have created a large database of words and a list of suffixes, and with the help of this large database we obtain higher accuracy than other stemmers. We also reduce over-stemming and under-stemming errors by identifying the words that cause these errors and adding them to our database.
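The abstract combines a lookup in a large word database (the brute force technique) with suffix stripping, and fixes over- and under-stemming by adding the problem words to the database. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' implementation. It assumes the lookup is tried first and suffix stripping is only a fallback. Everything in it is hypothetical: the class name `BruteForceStemmer`, the `min_stem_len` guard, and the romanized Punjabi entries and suffixes. A real stemmer would operate on Gurmukhi text with the paper's own database and suffix list.

```python
from typing import Dict, Iterable, Optional


class BruteForceStemmer:
    """Lookup-first stemmer with a suffix-stripping fallback (illustrative sketch)."""

    def __init__(self, table: Dict[str, str], suffixes: Iterable[str],
                 min_stem_len: int = 2) -> None:
        # Brute-force "database": inflected form -> stem (hypothetical contents).
        self.table = dict(table)
        # Drop empty suffixes and try longer suffixes first, so the most
        # specific matching suffix is the one that gets stripped.
        self.suffixes = sorted({s for s in suffixes if s}, key=len, reverse=True)
        # Assumption: never strip a word down below this many characters.
        self.min_stem_len = min_stem_len

    def stem(self, word: str) -> str:
        # 1. Brute force: an exact hit in the table wins.
        if word in self.table:
            return self.table[word]
        # 2. Fallback: strip the longest suffix that leaves a long-enough stem.
        for suffix in self.suffixes:
            if word.endswith(suffix) and len(word) - len(suffix) >= self.min_stem_len:
                return word[: -len(suffix)]
        # 3. No rule applies: the word is its own stem.
        return word

    def add_exception(self, word: str, stem: Optional[str] = None) -> None:
        # Store a word the suffix rules mishandle, so the lookup step answers
        # it before any suffix is stripped (defaults to mapping it to itself).
        self.table[word] = word if stem is None else stem


# Hypothetical romanized data, for illustration only.
stemmer = BruteForceStemmer(
    table={"munde": "munda"},  # "boys" -> "boy": vowel change, not a plain suffix
    suffixes=["an", "e"],
)
print(stemmer.stem("munde"))   # munda (table lookup)
print(stemmer.stem("kurian"))  # kuri  (suffix "an" stripped)
print(stemmer.stem("makaan"))  # maka  (over-stemmed: "an" is not a suffix here)
stemmer.add_exception("makaan")
print(stemmer.stem("makaan"))  # makaan (exception entry now wins)
```

Without the `munde` table entry, suffix stripping would return `mund`, so the lookup table covers irregular forms the suffix rules cannot. The `makaan` case shows how an exception entry removes an over-stemming error without changing the suffix list.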