Journal: Knowledge-Based Systems

A multilingual semi-supervised approach in deriving Singlish sentic patterns for polarity detection



Abstract

Due to the huge volume and linguistic variation of data shared online, accurate detection of the sentiment of a message (polarity detection) can no longer rely on human assessors or on simple lexicon keyword matching. This paper presents a semi-supervised approach to constructing essential toolkits for analysing the polarity of a localised scarce-resource language, Singlish (Singaporean English). Corpus-based bootstrapping using a multilingual, multifaceted lexicon was applied to construct an annotated testing dataset, while unsupervised methods such as lexicon polarity detection, frequent item extraction through association rules, and latent semantic analysis were used to identify the polarity of Singlish n-grams before human assessment was done to isolate misleading terms and remove concept ambiguity. The findings suggest that this multilingual approach outperforms polarity analysis using only the English language. In addition, a hybrid combination of the Support Vector Machine and a proposed Singlish Polarity Detection algorithm, which incorporates unigram and n-gram Singlish sentic patterns with other multilingual polarity sentic patterns such as negation and adversative, is able to outperform the other approaches compared. The promising results on a pooled testing dataset generated from the vast amount of unannotated Singlish data show that the multilingual Singlish sentic pattern approach has the potential to be adopted in real-world polarity detection. (C) 2016 Elsevier B.V. All rights reserved.
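The abstract mentions lexicon polarity detection combined with negation and adversative patterns but gives no implementation details. The following is only a minimal sketch of how such rules might interact, not the paper's Singlish Polarity Detection algorithm. The lexicon entries, their scores, the negator and adversative word lists, and the function names `clause_score` and `polarity` are all illustrative assumptions.

```python
# Toy lexicon: words and scores are illustrative assumptions, not taken from the paper.
LEXICON = {
    "shiok": 1.0, "good": 1.0, "nice": 1.0,
    "sian": -1.0, "jialat": -1.0, "bad": -1.0,
}
NEGATORS = {"not", "no", "never"}      # assumed negation cues
ADVERSATIVES = {"but", "however"}      # assumed adversative cues


def clause_score(tokens):
    """Sum lexicon scores, flipping the sign of the first polar word after a negator."""
    score = 0.0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        if tok in LEXICON:
            value = LEXICON[tok]
            score += -value if negate else value
            negate = False
    return score


def polarity(text):
    """Return 'positive', 'negative', or 'neutral' for a short message."""
    tokens = [t.strip(".,!?;:") for t in text.lower().split()]
    # Adversative rule (simplified): only the clause after the last
    # adversative word determines the overall polarity.
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in ADVERSATIVES:
            tokens = tokens[i + 1:]
            break
    score = clause_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

In this sketch, a message such as "the queue sian but the food shiok" is scored only on the clause after "but". A real system would need n-gram patterns, a much larger multilingual lexicon, and a trained classifier such as the SVM the abstract describes.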
