Conference: International Workshop for Computational Linguistics of Uralic Languages

Learning multilingual topics through aspect extraction from monolingual texts

Abstract

Texts rating products and services of all kinds are omnipresent on the internet. They come in various languages and often in such large numbers that getting an overview of all reviews is very time-consuming. The goal of this work is to facilitate the summarization of opinions written in multiple languages, exemplified on a corpus of English and Finnish reviews. To this end, we propose a framework that extracts aspect terms from reviews and groups them into multilingual topic clusters. For aspect extraction, we work on the texts of each language separately. We evaluate three methods, all based on neural networks: one is supervised, one is unsupervised and based on an attention mechanism, and one is a rule-based hybrid method. We then group the extracted aspect terms into multilingual clusters. Here we evaluate three different clustering methods and juxtapose a method that creates clusters from multilingual word embeddings with a method that first creates monolingual clusters for each language separately and then merges them. We report results from a variety of experiments, observing the best results when clustering aspect terms extracted by the supervised method, using the k-means algorithm on multilingual embeddings.
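The clustering step mentioned in the abstract (k-means over multilingual embeddings of extracted aspect terms) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 2-D vectors are invented toy values standing in for real multilingual word embeddings, the terms are hypothetical English and Finnish aspect terms, and `kmeans` and `cluster_terms` are hypothetical helper names. The sketch only shows the general idea that terms from both languages, once placed in a shared vector space, can fall into the same topic cluster.

```python
import math

# Toy 2-D "multilingual embeddings" (invented values, not from the paper).
# In a shared embedding space, translations are expected to lie close together.
EMBEDDINGS = {
    "battery": (0.90, 0.10),  # English
    "akku": (0.85, 0.15),     # Finnish: battery
    "laturi": (0.80, 0.20),   # Finnish: charger
    "screen": (0.10, 0.90),   # English
    "ruutu": (0.15, 0.85),    # Finnish: screen
    "display": (0.20, 0.80),  # English
}


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest already-chosen centroid.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def cluster_terms(embeddings, k):
    """Group aspect terms by the k-means label of their embedding vector."""
    terms = list(embeddings)
    labels = kmeans([embeddings[t] for t in terms], k)
    clusters = {}
    for term, label in zip(terms, labels):
        clusters.setdefault(label, []).append(term)
    return list(clusters.values())


clusters = cluster_terms(EMBEDDINGS, k=2)
for members in clusters:
    print(members)
```

With real data, the toy vectors would be replaced by vectors from a pretrained multilingual embedding model, and the number of clusters `k` would have to be chosen or tuned; neither choice is specified in the abstract.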
