Journal: Intelligenza Artificiale

Enabling deep learning for large scale question answering in Italian



Abstract

Recent breakthroughs in deep learning have led to state-of-the-art results in several NLP tasks, such as Question Answering (QA). Unfortunately, such neural QA systems have very strict requirements because of the size of the training datasets involved. In cross-linguistic settings these requirements are often not met, as training datasets for QA over non-English texts are frequently unavailable. This is the major barrier to widespread adoption of neural QA methods in NLP applications. This paper discusses the acquisition of a large-scale dataset for an open-domain factoid question answering system in Italian. The dataset is obtained by automatic translation and linguistic elicitation of an existing English dataset, namely the SQuAD question-answer pair corpus. Even though the quality of the resulting Italian corpus might not be completely satisfactory, our work made it possible to generate more than 60 thousand question-answer pairs. The paper studies the impact of this resource on the QA process over the Italian Wikipedia under different training conditions and architectural constraints. A comparative evaluation against the English version, in line with standards in the SQuAD literature, is carried out. The outcomes show that the results achievable for Italian are below the state of the art for English, but the ability to learn not to respond (that is, the adoption of techniques for detecting questions whose answers are simply not available, i.e. an EMPTY set of answers) allows the system to reach reasonable levels of precision. This makes it already usable in realistic application scenarios. Finally, an error analysis is presented that suggests possible future research directions on still critical but highly beneficial enhancements, in view of concrete QA applications in Italian.
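The idea of "learning not to respond" can be illustrated with a minimal sketch. The abstract does not say how the system detects unanswerable questions, so the score threshold used here is an assumption, and the names `answer_or_abstain`, `precision_over_answered`, and `EMPTY` are hypothetical and not taken from the paper. The sketch only shows why abstaining on low-confidence questions can raise precision over the questions that are actually answered.

```python
from typing import List, Optional, Tuple

# Hypothetical marker for the EMPTY set of answers (the system abstains).
EMPTY = None


def answer_or_abstain(
    candidates: List[Tuple[str, float]], threshold: float
) -> Optional[str]:
    """Return the best-scoring candidate, or EMPTY if none reaches the threshold.

    Assumption: each candidate is an (answer_text, confidence_score) pair.
    """
    if not candidates:
        return EMPTY
    best_text, best_score = max(candidates, key=lambda c: c[1])
    return best_text if best_score >= threshold else EMPTY


def precision_over_answered(
    predictions: List[Optional[str]], gold: List[str]
) -> float:
    """Fraction of correct answers among the questions the system answered."""
    answered = [(p, g) for p, g in zip(predictions, gold) if p is not EMPTY]
    if not answered:
        return 0.0
    correct = sum(1 for p, g in answered if p == g)
    return correct / len(answered)


# Toy data, invented for illustration only.
questions = [
    [("Roma", 0.9), ("Milano", 0.3)],
    [("1861", 0.4)],  # low-confidence and wrong
    [("Dante", 0.8)],
]
gold = ["Roma", "1870", "Dante"]

always_answer = [answer_or_abstain(c, threshold=0.0) for c in questions]
with_abstention = [answer_or_abstain(c, threshold=0.5) for c in questions]

print(precision_over_answered(always_answer, gold))
print(precision_over_answered(with_abstention, gold))
```

In this toy setting the system answers fewer questions when it abstains, but the answers it does give are more often correct, which is the trade-off the abstract describes.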
