Sentiment Analysis of Sinhala News Comments

Abstract

Sinhala is a low-resource language for which basic language and linguistic tools have not been properly defined. This affects the development of NLP-based end-user applications for Sinhala. Thus, when implementing NLP tools such as sentiment analyzers, we have to rely solely on language-independent techniques. This article presents the use of such language-independent techniques in implementing a sentiment analysis system for Sinhala news comments. We demonstrate that, for low-resource languages such as Sinhala, using recently introduced word embedding models as semantic features can compensate for the lack of well-developed language-specific linguistic or language resources, and that text classification with acceptable accuracy is indeed possible using both traditional statistical classifiers and deep learning models. The developed classification models, a corpus of 8.9 million tokens extracted from Sinhala news articles and user comments, and Sinhala Word2Vec and fastText word embedding models are now available for public use; 9,048 news comments annotated with POSITIVE/NEGATIVE/NEUTRAL polarities have also been released.
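The idea of using word embeddings as language-independent features can be illustrated with a minimal sketch. It is not the authors' system: the three-dimensional vectors, the English placeholder tokens, and the helper names (`comment_vector`, `fit_centroids`, `predict`) are all hypothetical, and a simple nearest-centroid rule stands in for the statistical and deep learning classifiers the abstract mentions. Each comment is represented by the average of its token vectors, and a new comment receives the label of the closest class centroid.

```python
import math

# Toy 3-dimensional "embeddings" (hypothetical values). In practice these
# would come from a trained Word2Vec or fastText model.
EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "bad": [-0.9, 0.1, 0.0],
    "awful": [-0.8, 0.2, 0.1],
    "report": [0.0, 0.9, 0.1],
    "today": [0.0, 0.8, 0.2],
}
DIM = 3


def comment_vector(tokens):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0] * DIM
    return [sum(v[i] for v in known) / len(known) for i in range(DIM)]


def fit_centroids(labelled_comments):
    """Compute one mean feature vector (centroid) per polarity label."""
    sums, counts = {}, {}
    for tokens, label in labelled_comments:
        vec = comment_vector(tokens)
        acc = sums.setdefault(label, [0.0] * DIM)
        for i in range(DIM):
            acc[i] += vec[i]
        counts[label] = counts.get(label, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}


def predict(centroids, tokens):
    """Return the label whose centroid is nearest in Euclidean distance."""
    vec = comment_vector(tokens)
    return min(centroids, key=lambda lab: math.dist(vec, centroids[lab]))


# Tiny hand-made training set (illustrative only, not the released corpus).
TRAIN = [
    (["good", "report"], "POSITIVE"),
    (["great"], "POSITIVE"),
    (["bad", "report"], "NEGATIVE"),
    (["awful"], "NEGATIVE"),
    (["report", "today"], "NEUTRAL"),
    (["today"], "NEUTRAL"),
]

centroids = fit_centroids(TRAIN)
print(predict(centroids, ["great", "today"]))
print(predict(centroids, ["awful", "today"]))
print(predict(centroids, ["report"]))
```

Nothing in this pipeline depends on a stemmer, part-of-speech tagger, or sentiment lexicon, which is the sense in which such features are language-independent; only the embedding table would need to be trained on Sinhala text.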