International Journal of Artificial Intelligence and Soft Computing

Effect of stop word removal on the performance of naive Bayesian methods for text classification in the Kannada language

Abstract

Stop words are high-frequency words in a document that impose an unrealistic requirement on the classifier in terms of both time and space complexity. A considerable amount of work has been done on information retrieval in English, but information retrieval in the Kannada language is a new concept. Identifying and removing stop words in Kannada could be an important piece of work, because eliminating stop words reduces the feature space, which in turn helps reduce time and space complexity. There is no standard stop word list for the Kannada language, which motivates us to develop an algorithm for removing structurally similar stop words. Although stop word removal reduces the feature space, our results show that it may not improve the performance of the classifiers.
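The link between stop word removal and feature space size can be illustrated with a minimal Python sketch. This is not the paper's algorithm for structurally similar stop words, which the abstract does not describe. It assumes a simple document-frequency heuristic instead, and the toy English corpus, the function names, and the `min_doc_ratio` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each token, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so each document counts a token once
    return df


def find_stop_words(docs, min_doc_ratio=0.8):
    """Flag tokens that occur in at least min_doc_ratio of the documents.

    Assumption: a document-frequency cutoff stands in for a real
    stop word list; the 0.8 threshold is arbitrary.
    """
    df = document_frequency(docs)
    cutoff = min_doc_ratio * len(docs)
    return {token for token, count in df.items() if count >= cutoff}


def remove_stop_words(docs, stop_words):
    """Return the documents with every stop word filtered out."""
    return [[t for t in tokens if t not in stop_words] for tokens in docs]


def vocabulary(docs):
    """The feature space of a bag-of-words model: all distinct tokens."""
    return {t for tokens in docs for t in tokens}


# Toy placeholder corpus (English, whitespace-tokenized), not Kannada data.
corpus = [
    "the match was won by the home team".split(),
    "the election was won by the new party".split(),
    "the film was praised by the critics".split(),
    "the budget was passed by the council".split(),
]

stop_words = find_stop_words(corpus)
filtered = remove_stop_words(corpus, stop_words)

print("stop words:", sorted(stop_words))
print("features before:", len(vocabulary(corpus)))
print("features after:", len(vocabulary(filtered)))
```

The vocabulary shrinks once the flagged tokens are dropped, which is the feature space reduction the abstract refers to. Whether that reduction changes classification accuracy is a separate question that this sketch does not test.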