AAAI Conference on Artificial Intelligence

Relating Romanized Comments to News Articles by Inferring Multi-Glyphic Topical Correspondence

Abstract

Commenting is a popular feature of news sites, and analyzing such user-generated content has recently attracted research interest. However, in multilingual societies such as India, analyzing this content is hard for several reasons: (1) There are more than 20 official languages, but linguistic resources are available mainly for Hindi. People frequently write romanized text because it is quick and easy to type on an English keyboard. This produces multi-glyphic comments, in which texts are in the same language but in different scripts. Such romanized text has so far remained almost unexplored in machine learning. (2) In many cases, comments address a specific part of the article rather than the topic of the article as a whole. Off-the-shelf methods such as correspondence LDA are insufficient to model these relationships between articles and comments. In this paper, we extend the notion of correspondence to model multilingual, multi-script, and inter-lingual topics in a unified probabilistic model called the Multi-glyphic Correspondence Topic Model (MCTM). Using several metrics, we validate our approach and show that it improves over the state of the art.
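The "correspondence" that MCTM extends comes from correspondence LDA, where each comment word is tied to the topic of one specific position in the article instead of to the article's overall topic mix. The sketch below illustrates only that baseline generative process for a single article and comment. It is not the authors' MCTM: it assumes one language and one script, and the function names, the toy vocabularies, and the `beta_article` / `beta_comment` topic-word tables are hypothetical, chosen for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return an index drawn with the (unnormalized) weights in `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_article_with_comment(alpha, beta_article, beta_comment,
                                  n_article_words, n_comment_words, seed=0):
    """Generate one (article, comment) pair under a corr-LDA-style process.

    Hypothetical illustration only: beta_article[k] and beta_comment[k] are
    word distributions for topic k over toy article and comment vocabularies.
    Returns the article topic assignments z, article word ids, comment word ids.
    """
    rng = random.Random(seed)
    theta = sample_dirichlet(alpha, rng)  # article-level topic proportions

    # Article: each word position n gets its own topic z_n drawn from theta.
    z = [sample_categorical(theta, rng) for _ in range(n_article_words)]
    article = [sample_categorical(beta_article[k], rng) for k in z]

    # Comment: each word picks a uniformly random article position y and
    # reuses that position's topic, so the comment is anchored to specific
    # parts of the article rather than to theta directly.
    comment = []
    for _ in range(n_comment_words):
        y = rng.randrange(n_article_words)
        comment.append(sample_categorical(beta_comment[z[y]], rng))
    return z, article, comment
```

Because comment topics can only be topics that actually occur in the article, a comment focused on one passage inherits that passage's topic. Per the abstract, MCTM generalizes this idea to multiple languages and scripts.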