Conference: International Conference on Computational Processing of Portuguese

When, Where, Who, What or Why? A Hybrid Model to Question Answering Systems



Abstract

Question Answering is a field of Information Retrieval and Natural Language Processing concerned with systems that automatically answer questions posed by humans in natural language. One of the main steps in these systems is Question Classification, where the system tries to identify the type of question (e.g., whether it relates to a person, a time, or a location) in order to facilitate the generation of a precise answer. Machine learning techniques are commonly employed for this task, with the text represented as a vector of features such as bag-of-words, Term Frequency-Inverse Document Frequency (TF-IDF), or word embeddings. However, the quality of the results produced by supervised algorithms depends on the existence of a large, domain-dependent training dataset, which is sometimes unavailable because manual annotation of datasets is labor-intensive. Normally, word embeddings perform relatively better on small training sets, while bag-of-words and TF-IDF give better results on large training sets. In this work, we propose a hybrid model that combines TF-IDF and word embeddings in order to predict the answer type of text questions using both small and large training sets. Our experiments on the Portuguese language, using several different training set sizes, showed that the proposed hybrid model statistically outperforms the bag-of-words, TF-IDF, and word embedding approaches.
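
The abstract does not say how the TF-IDF and word embedding representations are combined, so the following is only a minimal sketch under stated assumptions: the two feature vectors are concatenated, and a nearest-centroid rule stands in for whatever classifier the authors used. The training questions, the 3-dimensional embeddings in `EMB`, and all function names are hypothetical and exist purely for illustration.

```python
import math
from collections import Counter

# Toy labelled questions, invented for illustration (not the paper's data).
TRAIN = [
    ("quem descobriu o brasil", "PERSON"),
    ("quem escreveu os lusiadas", "PERSON"),
    ("quando terminou a guerra", "TIME"),
    ("quando nasceu camoes", "TIME"),
    ("onde fica lisboa", "LOCATION"),
    ("onde nasceu pessoa", "LOCATION"),
]

# Hypothetical 3-dimensional embeddings; a real system would load pretrained vectors.
EMB = {"quem": [1.0, 0.0, 0.0], "quando": [0.0, 1.0, 0.0], "onde": [0.0, 0.0, 1.0]}
DIM = 3


def tokenize(text):
    return text.lower().split()


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def fit_idf(token_lists):
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    n_docs = len(token_lists)
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}


IDF = fit_idf([tokenize(question) for question, _ in TRAIN])
VOCAB = sorted(IDF)


def hybrid_features(text):
    # Assumption: the hybrid vector is the TF-IDF vector concatenated with
    # the mean embedding of the question's known words.
    tokens = tokenize(text)
    counts = Counter(tokens)
    tfidf = l2_normalize([counts[tok] * IDF[tok] for tok in VOCAB])
    known = [EMB[tok] for tok in tokens if tok in EMB]
    mean = [sum(col) / len(known) for col in zip(*known)] if known else [0.0] * DIM
    return tfidf + l2_normalize(mean)


def train_centroids(samples):
    # Average the hybrid vectors of each class (nearest-centroid stand-in classifier).
    sums, counts = {}, Counter()
    for text, label in samples:
        feats = hybrid_features(text)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


CENTROIDS = train_centroids(TRAIN)


def classify(text):
    # Pick the class whose centroid has the largest dot product with the question vector.
    feats = hybrid_features(text)
    return max(CENTROIDS, key=lambda lab: sum(a * b for a, b in zip(feats, CENTROIDS[lab])))


print(classify("quem pintou a guernica"))
```

In this sketch the TF-IDF half of the vector captures exact word overlap with the training questions, while the embedding half still carries a signal when most words of a question were never seen in training. Both halves are L2-normalized so that neither dominates purely because of its scale; that weighting is a design choice of the sketch, not something taken from the paper.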
