International Workshop for Computational Linguistics of Uralic Languages

Learning multilingual topics through aspect extraction from monolingual texts

Abstract

Texts rating products and services of all kinds are omnipresent on the internet. They come in various languages, and often in such large numbers that getting an overview of all reviews is very time-consuming. The goal of this work is to facilitate the summarization of opinions written in multiple languages, exemplified on a corpus of English and Finnish reviews. To this end, we propose a framework that extracts aspect terms from reviews and groups them into multilingual topic clusters. For aspect extraction, we work on the texts of each language separately. We evaluate three methods, all based on neural networks: one is supervised, one is unsupervised and based on an attention mechanism, and one is a rule-based hybrid method. We then group the extracted aspect terms into multilingual clusters. Here we evaluate three different clustering methods and compare a method that creates clusters from multilingual word embeddings with a method that first creates monolingual clusters for each language separately and then merges them. We report results from a variety of experiments and observe the best results when clustering aspect terms extracted by the supervised method, using the k-means algorithm on multilingual embeddings.
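
The clustering step named in the abstract (k-means over aspect terms embedded in a shared multilingual space) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the six aspect terms, their 2-D vectors (real multilingual embeddings have hundreds of dimensions), the helper names `kmeans` and `cluster_terms`, and the farthest-point initialisation. It is not the authors' implementation and covers only clustering, not aspect extraction.

```python
from math import dist

# Hypothetical aspect terms with made-up 2-D vectors standing in for
# multilingual word embeddings (translations lie close to each other).
ASPECT_TERMS = [
    ("en", "battery", (0.90, 0.10)),
    ("fi", "akku", (0.85, 0.15)),
    ("en", "screen", (0.10, 0.90)),
    ("fi", "näyttö", (0.15, 0.85)),
    ("en", "price", (-0.80, -0.70)),
    ("fi", "hinta", (-0.75, -0.80)),
]


def kmeans(points, k, iterations=20):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def cluster_terms(terms, k):
    """Group (language, word, vector) triples into k multilingual topics."""
    labels = kmeans([vec for _, _, vec in terms], k)
    clusters = {}
    for (lang, word, _), label in zip(terms, labels):
        clusters.setdefault(label, []).append(f"{lang}:{word}")
    return list(clusters.values())


topics = cluster_terms(ASPECT_TERMS, k=3)
for topic in topics:
    print(topic)
```

With these toy vectors each resulting topic pairs an English term with its Finnish counterpart, which is the behaviour one would hope for from a shared embedding space. The alternative the abstract mentions, clustering each language separately and merging the clusters afterwards, would need an additional cross-lingual matching step that this sketch does not show.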