Journal: Information Processing & Management

Multi-lingual opinion mining on YouTube

Abstract

To apply opinion mining (OM) successfully to the large amounts of user-generated content produced every day, we need robust models that handle noisy input well yet can easily be adapted to a new domain or language. Here we focus on opinion mining for YouTube by (ⅰ) modeling classifiers that predict the type of a comment and its polarity, while distinguishing whether the polarity is directed towards the product or the video; (ⅱ) proposing a robust shallow syntactic structure (STRUCT) that adapts well when tested across domains; and (ⅲ) evaluating the effectiveness of the proposed structure on two languages, English and Italian. We rely on tree kernels to automatically extract and learn features with better generalization power than the traditionally used bag-of-words models. Our extensive empirical evaluation shows that (ⅰ) STRUCT outperforms the bag-of-words model within the same domain (up to 2.6% and 3% absolute improvement for Italian and English, respectively); (ⅱ) it is particularly useful when tested across domains (up to more than 4% absolute improvement for both languages), especially when little training data is available (up to 10% absolute improvement); and (ⅲ) the proposed structure is also effective in a lower-resource language scenario, where only less accurate linguistic processing tools are available.
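
To make the contrast between tree kernels and bag-of-words features concrete, the following is a minimal sketch of a Collins-Duffy style subset tree kernel next to a plain word-overlap kernel. It is an illustration only: the abstract does not say which tree kernel the authors use, and the tuple-based tree encoding, the toy parse trees, and all function names here are assumptions made for this example, not the paper's STRUCT representation.

```python
from collections import Counter
from itertools import product

# Assumed encoding: a tree is (label, child, child, ...); a leaf word is a plain str.

def production(node):
    """Label of a node plus the labels of its children (words are marked as leaves)."""
    return (node[0], tuple((c, True) if isinstance(c, str) else (c[0], False)
                           for c in node[1:]))

def internal_nodes(tree):
    """All non-leaf nodes of the tree, root first."""
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes

def common_fragments(n1, n2, decay):
    """Weighted count of tree fragments rooted at both n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):  # recurse only into non-leaf children
            score *= 1.0 + common_fragments(c1, c2, decay)
    return score

def tree_kernel(t1, t2, decay=1.0):
    """Subset tree kernel: sum of shared fragments over all node pairs."""
    return sum(common_fragments(a, b, decay)
               for a, b in product(internal_nodes(t1), internal_nodes(t2)))

def leaves(tree):
    """Words of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in leaves(child)]

def bow_kernel(t1, t2):
    """Bag-of-words kernel: dot product of word-count vectors."""
    c1, c2 = Counter(leaves(t1)), Counter(leaves(t2))
    return sum(c1[w] * c2[w] for w in c1)

# Hypothetical toy parses of two comments that differ in one word.
t1 = ("S", ("NP", ("DT", "the"), ("NN", "phone")), ("VP", ("VB", "rocks")))
t2 = ("S", ("NP", ("DT", "the"), ("NN", "camera")), ("VP", ("VB", "rocks")))

print("tree kernel:", tree_kernel(t1, t2))
print("bag-of-words kernel:", bow_kernel(t1, t2))
```

The word-overlap kernel only sees the two shared words, whereas the tree kernel also credits shared structure such as the S, NP and VP productions. A kernel matrix built this way could be passed to any kernel machine that accepts precomputed kernels.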
