Bengali Stop Phrase Detection Mechanism using Corpus Based Method

Abstract

This paper presents a corpus-based method for detecting stop phrases. Such phrases must be detected and removed during natural language processing (NLP) to achieve efficient indexing in modern Information Retrieval (IR) systems. A complete set of stop phrases for the Bengali language has not yet been developed. The paper introduces a corpus-based approach for recognizing and extracting Bengali stop phrases. In the proposed technique, an input paragraph is tokenized in several ways, and stop phrases are then identified by checking the resulting tokens against the corpus. Accepted stop phrases are then passed through a uniqueness check. The paper reports accuracy, precision, and recall for the approach and describes the outcomes as notable. Eliminating these stop phrases further reduces the time complexity of algorithms used for text summarization and IR systems.
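
The pipeline described in the abstract (tokenize, check against the corpus, keep unique matches) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say which tokenization schemes or corpus format are used, so the sketch assumes sentence splitting followed by word n-grams, and a corpus held as a plain set of strings. The names `ngrams` and `detect_stop_phrases`, the `max_len` parameter, and the English placeholder phrases are all hypothetical.

```python
import re


def ngrams(words, n):
    """Return all contiguous n-word phrases from a list of words."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def detect_stop_phrases(paragraph, corpus, max_len=3):
    """Return the unique corpus phrases found in the paragraph,
    in the order in which they are discovered."""
    found = []
    seen = set()
    # Sentence-level tokenization; "।" is the Bengali danda (full stop).
    for sentence in re.split(r"[।.!?]+", paragraph):
        # Word-level tokenization: split on whitespace, trim punctuation.
        words = [w.strip(",;:\"'()") for w in sentence.lower().split()]
        words = [w for w in words if w]
        # Phrase-level tokenization: candidates of 2..max_len words
        # (an assumed choice of n-gram sizes).
        for n in range(2, max_len + 1):
            for phrase in ngrams(words, n):
                # Corpus lookup plus uniqueness check.
                if phrase in corpus and phrase not in seen:
                    seen.add(phrase)
                    found.append(phrase)
    return found


# Hypothetical toy corpus; English placeholders are used for readability.
toy_corpus = {"as a result", "in order to", "of the"}
text = ("As a result of the test, we stopped. "
        "In order to save time, we reused the output of the test.")
print(detect_stop_phrases(text, toy_corpus))
# prints ['of the', 'as a result', 'in order to']
```

Holding the corpus in a set keeps each lookup at constant expected time, and the `seen` set ensures that a phrase occurring several times is reported once. Candidate phrases never span a sentence boundary because n-grams are built per sentence.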

Bibliographic details

Source
International Conference on Informatics, Electronics & Vision; International Conference on Imaging, Vision & Pattern Recognition, 2019, pp. 178-183 (6 pages)
Authors
Rakib ul Haque; Parisa Mehera; M. F. Mridha; Md. Abdul Hamid
Original format: PDF
Keywords
Compounds; Asia; Information retrieval; Tokenization; Libraries; Indexing

Similar literature

1. Bengali Stop Word and Phrase Detection Mechanism [J]. Rakib Ul Haque, M. F. Mridha, Md. Abdul Hamid. Arabian Journal for Science and Engineering. Section A, Sciences. 2020, No. 4
2. Deep Learning Based Sentiment Analysis in a Code-Mixed English-Hindi and English-Bengali Social Media Corpus [J]. Jamatia Anupam, Swamy Steve Durairaj, Gamback Bjorn. International Journal of Artificial Intelligence Tools: Architectures, Languages, Algorithms. 2020, No. 5
3. A web-based Bengali news corpus for named entity recognition [J]. Asif Ekbal, Sivaji Bandyopadhyay. Computers and the Humanities. 2008, No. 2
4. Bengali Stop Phrase Detection Mechanism using Corpus Based Method [C]. Rakib ul Haque, Parisa Mehera, M. F. Mridha. International Conference on Informatics, Electronics & Vision. 2019
5. A corpus-based analysis of 'I' and 'me' variation in coordinate noun phrases [D]. Turley, Nancy Romans. 2009
6. Corpus-based Statistical Screening for Phrase Identification [O]. Won Kim, W. John Wilbur. 2000
7. Corpus-based evaluation of prosodic phrase break prediction using nltk_lite's chunk parser to detect prosodic phrase boundaries in the Aix-MARSEC corpus of spoken English [O]. Brierley C, Atwell ES. 2007
