Journal: ACM Transactions on Asian Language Information Processing

Role of Discourse Information in Urdu Sentiment Classification: A Rule-based Method and Machine-learning Technique



Abstract

In computational linguistics, sentiment analysis refers to the classification of opinions into a positive class or a negative class. Many methods exist for sentiment analysis of English, but the literature offers few methods and techniques for Urdu, which is widely spoken in the South Asian subcontinent and is the national language of Pakistan. The currently available techniques, such as the adjective count method known as Bag of Words (BoW), are not sufficient for classifying complex sentiment written in Urdu. Also, the performance of available machine-learning techniques (with legacy features) on Urdu sentiment classification is not comparable with the accuracy achieved for other languages. For English, discourse information (sub-sentence-level information) boosts the performance of both the BoW method and machine-learning techniques, but very few works have tested context-level information for sentiment analysis of Urdu. This research aims to extract discourse information from Urdu sentiments and to use it to improve the performance and reduce the error rate of existing techniques for Urdu sentiment classification. The proposed solution extracts the discourse information, suggests a new set of features for machine-learning techniques, and introduces a set of rules to extend the capabilities of the BoW model. The results show a significant improvement: recall, precision, and accuracy increase by 31.25%, 8.46%, and 21.6%, respectively. In future, the proposed technique can be extended to sentiments with more than two sub-opinions, such as those found in blogs, reviews, and TV talk shows.
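To make the idea of extending BoW with discourse information concrete, the following is a minimal sketch, not the authors' rule set, which the abstract does not spell out. It assumes a tiny hypothetical lexicon of romanized Urdu words, assumes that a contrastive connective such as "lekin" ("but") splits a sentence into sub-opinions, and assumes a single illustrative rule: the last sub-opinion carrying any polarity decides the label. All names (`POSITIVE`, `NEGATIVE`, `CONTRAST`, `bow_score`, `discourse_score`) are invented for illustration.

```python
# Hypothetical, tiny romanized-Urdu lexicon (an assumption for illustration only).
POSITIVE = {"acha", "mazedar"}   # "good", "tasty"
NEGATIVE = {"kharab", "bura"}    # "bad", "bad"
CONTRAST = {"lekin", "magar"}    # contrastive connectives, roughly "but"


def bow_score(tokens):
    """Plain bag-of-words polarity: positive hits minus negative hits."""
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


def split_on_contrast(tokens):
    """Split a token list into sub-opinions at each contrastive connective."""
    segments = [[]]
    for t in tokens:
        if t in CONTRAST:
            segments.append([])
        else:
            segments[-1].append(t)
    return segments


def discourse_score(tokens):
    """Assumed rule: the last sub-opinion with a non-zero score decides."""
    for segment in reversed(split_on_contrast(tokens)):
        score = bow_score(segment)
        if score != 0:
            return score
    return 0


def label(score):
    """Map a numeric score to a class name."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "undecided"


# "The food was tasty but the service was bad."
tokens = "khana mazedar tha lekin service kharab thi".split()
print(label(bow_score(tokens)))        # plain word counting
print(label(discourse_score(tokens)))  # counting with the assumed discourse rule
```

In this example the plain count sees one positive and one negative word and cannot decide, while the assumed discourse rule lets the clause after the connective settle the label. This is the kind of two-sub-opinion sentence the abstract describes as too complex for counting alone.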
