Journal: World Wide Web

A multi-granularity semantic space learning approach for cross-lingual open domain question answering

Abstract

Cross-lingual Open Domain Question Answering (Cross-lingual Open-QA) has been studied since it was proposed in the mid-1990s. It can be divided into two mainstream tasks according to the training corpus used in the answer extraction stage. In the first, both the training and the testing data are in the target language. In the second, the training data is in the source language and the testing data is in the target language. For a long time, the former was studied mainly through translation-based approaches. The latter did not appear until 2019, when the multilingual BERT model made non-translation-based approaches available. As a result, the two tasks have been discussed separately, which motivates our investigation of whether both can be handled simultaneously without any additional transformation. The multilingual BERT model makes a unified framework possible, but using it directly raises two problems. First, in the document retrieval stage, directly applying the multilingual pretrained model to similarity calculation yields insufficient retrieval accuracy. Second, in the answer extraction stage, the answers involve different levels of abstraction relative to the retrieved documents, which calls for deeper exploration. This paper puts forward a multi-granularity semantic space learning approach for cross-lingual Open-QA. It consists of a Match-Retrieval module and a Multi-granularity-Extraction module. The matching network in the retrieval module heuristically adjusts and expands the learned features to improve retrieval quality. In the answer extraction module, deep semantic features are reused at the network structure level through cross-layer concatenation, which enables the model to learn a multi-granularity semantic space. Experimental results on two public cross-lingual Open-QA datasets show the superiority of the proposed approach over state-of-the-art methods.
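
The abstract does not specify how the cross-layer concatenation is wired, so the following is only a minimal sketch of the general idea, not the authors' architecture. It assumes each encoder layer produces one vector per token, and it joins the vectors from a chosen set of layers so that a downstream answer extractor could see shallow and deep features together. The function name `concat_layers`, the choice of layers, and the toy values are all hypothetical.

```python
from typing import List

Vector = List[float]


def concat_layers(hidden_states: List[List[Vector]], layers: List[int]) -> List[Vector]:
    """Concatenate, for every token, the vectors taken from the selected layers.

    hidden_states[l][t] is the vector of token t at layer l (an assumed layout).
    """
    num_tokens = len(hidden_states[0])
    fused: List[Vector] = []
    for t in range(num_tokens):
        token_vec: Vector = []
        for layer in layers:
            # Append this layer's features after those of the earlier layers.
            token_vec.extend(hidden_states[layer][t])
        fused.append(token_vec)
    return fused


# Toy input: 3 layers, 2 tokens, hidden size 2 (values are made up).
toy = [
    [[0.1, 0.2], [0.3, 0.4]],
    [[1.1, 1.2], [1.3, 1.4]],
    [[2.1, 2.2], [2.3, 2.4]],
]

# Reuse a shallow layer (0) alongside the deepest layer (2).
fused = concat_layers(toy, layers=[0, 2])
```

Each fused token vector has length equal to the hidden size times the number of selected layers, so any layer that consumes it would need its input width set accordingly.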
