Cross lingual opinion holder extraction based on multi-kernel SVMs and transfer learning

Abstract

Fine-grained opinion analysis places much higher demands on annotated corpora, which makes high-quality analysis difficult when resources are insufficient. In this paper we explore the use of cross-lingual resources for opinion mining in resource-poor languages. We present a novel approach to cross-lingual opinion holder extraction that selectively leverages a finely annotated opinion corpus in a source language as supplementary training samples for the target language. First, the opinion corpus in the source language, with its fine-grained annotations, is translated and projected into the target language to generate training samples. Then, a classifier based on multi-kernel Support Vector Machines (SVMs) is developed to identify opinion holders in the target language; it uses a tree kernel based on syntactic features and a polynomial kernel based on semantic features. Both kernels are further improved by incorporating a pivot function based on word-pair similarity. To reduce the noise introduced by low-quality translated samples, a transfer learning algorithm iteratively selects high-quality translated samples for training the multi-kernel classifier on the target language. Evaluations transferring MPQA, an English opinion corpus (the source language), to Chinese opinion analysis (the target language) show improved opinion holder extraction performance on the NTCIR-7 MOAT dataset, exceeding a Conditional Random Fields (CRFs) based approach and most systems reported in the NTCIR-7 MOAT evaluation.
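Since the abstract describes two concrete mechanisms (combining a syntactic tree kernel with a semantic polynomial kernel, and iteratively admitting confidently classified translated samples), a minimal sketch may help. The sketch below is not the authors' implementation: it assumes the tree-kernel matrix K_syntactic is precomputed externally (scikit-learn provides no tree kernel), that labels are encoded as +1/-1, and that the kernel weight alpha, the number of rounds, and the confidence threshold are illustrative values rather than the paper's settings.

```python
# Minimal sketch (assumptions noted above) of a multi-kernel SVM with a
# precomputed syntactic kernel plus a polynomial semantic kernel, and an
# iterative selection loop over translated training samples.
import numpy as np
from sklearn.svm import SVC


def polynomial_kernel(X_a, X_b, degree=2, coef0=1.0):
    """Semantic kernel: polynomial kernel over dense semantic feature vectors."""
    return (X_a @ X_b.T + coef0) ** degree


def combined_kernel(K_syntactic, K_semantic, alpha=0.5):
    """Weighted sum of the (precomputed) tree kernel and the polynomial kernel."""
    return alpha * K_syntactic + (1.0 - alpha) * K_semantic


def train_multi_kernel_svm(K_train, y_train):
    """Fit an SVM on a precomputed combined kernel matrix."""
    clf = SVC(kernel="precomputed", C=1.0)
    clf.fit(K_train, y_train)
    return clf


def iterative_sample_selection(K_all, labels, target_idx, translated_idx,
                               rounds=3, threshold=0.5):
    """Train on target-language seeds, then add translated samples whose
    projected label (+1 holder / -1 non-holder) the current model agrees
    with at a confident margin -- a simplified view of the selection step.

    K_all:  combined kernel matrix over all samples (target + translated)
    labels: numpy array of +1/-1 labels (projected labels for translated samples)
    """
    train_idx = list(target_idx)       # seed set is assumed to contain both classes
    pool = list(translated_idx)
    clf = None
    for _ in range(rounds):
        K_train = K_all[np.ix_(train_idx, train_idx)]
        clf = train_multi_kernel_svm(K_train, labels[train_idx])
        if not pool:
            break
        K_pool = K_all[np.ix_(pool, train_idx)]
        scores = clf.decision_function(K_pool)
        agree = (np.sign(scores) == labels[pool]) & (np.abs(scores) >= threshold)
        train_idx += [p for p, ok in zip(pool, agree) if ok]
        pool = [p for p, ok in zip(pool, agree) if not ok]
    return clf
```

In a fuller implementation the pivot function over word-pair similarity would be folded into both kernels before they are combined; here the combination is shown only as a fixed weighted sum.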
