Conference: International Conference on Information Reuse and Integration for Data Science

Development of Sentiment Lexicon in Bengali utilizing Corpus and Cross-lingual Resources


Abstract

Bengali, one of the most widely spoken languages, lacks tools and resources for sentiment analysis. To date, Bengali has no sentiment lexicon of its own; only translated versions of English lexica are available. In this work, we therefore develop a Bengali sentiment lexicon from a large Bengali review corpus using a cross-lingual approach. To build the lexicon, we first created a Bengali corpus of around 42,000 drama reviews, of which we manually annotated around 12,000. Using a machine translation system, the labeled and unlabeled Bengali review corpora, English sentiment lexica, pointwise mutual information (PMI), and supervised machine learning (ML) classifiers in different phases, we develop a Bengali sentiment lexicon of around 1,000 sentiment words. We compare the coverage of our lexicon with that of the translated English lexica on two evaluation datasets. The proposed lexicon achieves 70%-74% coverage at the document level and around 65% coverage at the word level, an improvement of approximately 30%-100% over the translated lexica at the word level and 30%-50% at the document level. The results demonstrate that the developed lexicon is highly effective at recognizing sentiment in Bengali text.
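The abstract names PMI as one ingredient but does not give the exact scoring formula, so the following is only a minimal sketch of one common way to turn PMI into a word polarity score from labeled reviews: PMI(word, positive) minus PMI(word, negative), which reduces to a smoothed log-ratio of per-class document frequencies. The function name `pmi_polarity`, the parameters `min_count` and `alpha`, the `"pos"`/`"neg"` labels, and the toy English tokens are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def pmi_polarity(labeled_docs, min_count=2, alpha=1.0):
    """Return {word: PMI(word, pos) - PMI(word, neg)} in bits.

    labeled_docs: iterable of (tokens, label) pairs, label in {"pos", "neg"}.
    Counts are per review: a word is counted at most once per document.
    alpha: additive smoothing so a word absent from one class stays finite.
    """
    docs_per_class = Counter()   # label -> number of reviews
    word_in_class = Counter()    # (word, label) -> reviews containing word
    word_total = Counter()       # word -> reviews containing word

    for tokens, label in labeled_docs:
        docs_per_class[label] += 1
        for word in set(tokens):
            word_in_class[(word, label)] += 1
            word_total[word] += 1

    scores = {}
    for word, total in word_total.items():
        if total < min_count:    # skip rare words with unreliable estimates
            continue
        # Smoothed estimates of P(word | pos) and P(word | neg).
        p_pos = (word_in_class[(word, "pos")] + alpha) / (docs_per_class["pos"] + 2 * alpha)
        p_neg = (word_in_class[(word, "neg")] + alpha) / (docs_per_class["neg"] + 2 * alpha)
        # PMI(w, pos) - PMI(w, neg) = log2(P(w | pos) / P(w | neg)).
        scores[word] = math.log2(p_pos / p_neg)
    return scores


# Toy data (hypothetical); Bengali tokens would be handled the same way.
toy_reviews = [
    (["good", "plot"], "pos"),
    (["good", "acting"], "pos"),
    (["bad", "plot"], "neg"),
    (["bad", "acting"], "neg"),
]
toy_scores = pmi_polarity(toy_reviews)
```

Words with clearly positive or negative scores would be candidates for a lexicon, while words scoring near zero (such as "plot" in the toy data) carry little polarity. Any thresholds, and the way this step would combine with the translated lexica and the ML classifiers, are left open here because the abstract does not specify them.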
