
Bengali Stop Phrase Detection Mechanism using Corpus Based Method



Abstract

This paper discusses a corpus-based method for detecting stop phrases. Such phrases must be detected and eliminated during NLP preprocessing to achieve efficient indexing in modern Information Retrieval (IR) systems. A complete set of stop phrases for the Bengali language has not yet been developed. In this paper, a corpus-based approach is introduced for recognizing and extracting Bengali stop phrases. In the proposed technique, an input paragraph is tokenized in several ways, and stop phrases are then identified by checking the resulting tokens against the corpus. Accepted stop phrases are then checked for uniqueness. The approach is evaluated in terms of accuracy, precision, and recall, and the reported results are notable. Eliminating these stop phrases further reduces the time complexity of algorithms used in text summarization and IR systems.
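The pipeline described in the abstract (tokenize, look up in the corpus, keep unique matches) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the tokenization schemes, so word n-grams of length 2 to `max_len` are assumed here, and the function names, the `max_len` parameter, and the corpus entries in the usage note are all hypothetical.

```python
from typing import Iterable, List, Set


def word_ngrams(tokens: List[str], n: int) -> Iterable[str]:
    """Yield every run of n consecutive tokens, joined by single spaces."""
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n])


def detect_stop_phrases(paragraph: str, corpus: Set[str], max_len: int = 3) -> List[str]:
    """Return the unique corpus phrases found in the paragraph.

    Assumption: "tokenized in several ways" is modeled as word n-grams
    of length 2..max_len; the paper may use different schemes.
    """
    # Replace the Bengali danda and common punctuation with spaces.
    # Simplification: n-grams may therefore span sentence boundaries.
    cleaned = paragraph
    for mark in "।,;?!":
        cleaned = cleaned.replace(mark, " ")
    tokens = cleaned.split()

    found: List[str] = []
    seen: Set[str] = set()
    for n in range(2, max_len + 1):
        for candidate in word_ngrams(tokens, n):
            # Accept a candidate only if the corpus lists it,
            # and keep each accepted phrase once (uniqueness step).
            if candidate in corpus and candidate not in seen:
                seen.add(candidate)
                found.append(candidate)
    return found
```

For example, with a hypothetical corpus `{"এবং তারপর", "তা না হলে"}`, calling `detect_stop_phrases("সে এল এবং তারপর চলে গেল। এবং তারপর বৃষ্টি এল।", corpus)` reports the repeated phrase "এবং তারপর" only once. Storing the corpus as a set keeps each lookup at average constant time, so the cost is dominated by the number of candidate n-grams.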
