Machine learning for query formulation in question answering

Abstract

Research on question answering dates back to the 1960s but has more recently been revisited as part of TREC's evaluation campaigns, where question answering is addressed as a subarea of information retrieval that focuses on specific answers to a user's information need. Whereas document retrieval systems aim to return the documents that are most relevant to a user's query, question answering systems aim to return actual answers to a user's question. Despite this difference, question answering systems rely on information retrieval components to identify documents that contain an answer to a user's question. The computationally more expensive answer extraction methods are then applied only to this subset of documents that are likely to contain an answer. Since information retrieval methods are used to filter the documents in the collection, the performance of this component is critical: documents that are not retrieved are never analyzed by the answer extraction component. The formulation of the queries used to retrieve those documents therefore has a strong impact on the effectiveness of the retrieval component. In this paper, we focus on predicting the importance of terms from the original question. We use model tree machine learning techniques to assign weights to query terms according to their usefulness for identifying documents that contain an answer. Term weights are learned by inspecting a large number of query formulation variations and their respective accuracy in identifying documents containing an answer. Several linguistic features are used for building the models, including part-of-speech tags, degree of connectivity in the dependency parse tree of the question, and ontological information. All of these features are extracted automatically by using several natural language processing tools.
Incorporating the learned weights into a state-of-the-art retrieval system results in statistically significant improvements in identifying answer-bearing documents.
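The term-weighting idea the abstract describes can be illustrated with a small sketch. This is not the paper's implementation: the weights below are hard-coded stand-ins for the values a model tree would predict from the linguistic features, the example question and documents are invented, and the scoring function is a simple weighted term overlap rather than a full retrieval model.

```python
def score(doc_tokens, term_weights):
    """Weighted term overlap: sum the weights of query terms found in the document."""
    present = set(doc_tokens)
    return sum(w for term, w in term_weights.items() if term in present)

def rank(docs, term_weights):
    """Order document ids by descending weighted-overlap score."""
    return sorted(docs, key=lambda d: score(docs[d], term_weights), reverse=True)

# Hypothetical question: "What year did the Titanic sink?"
uniform = {"year": 1.0, "titanic": 1.0, "sink": 1.0}   # every term weighted equally
learned = {"year": 0.2, "titanic": 1.0, "sink": 0.6}   # stand-in "learned" weights
                                                       # boosting the discriminative term

docs = {
    "d1": "the titanic sank in april 1912".split(),          # answer-bearing
    "d2": "this year many ships sink during storms".split(), # off-topic
}

print(rank(docs, uniform))  # ['d2', 'd1'] -- the off-topic document wins
print(rank(docs, learned))  # ['d1', 'd2'] -- weighted query retrieves the answer
```

With uniform weights the off-topic document matches more query terms and is ranked first; down-weighting common, non-discriminative terms such as "year" lets the answer-bearing document surface, which is the effect the learned weights are meant to achieve at scale.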

Bibliographic details

  • Source
    Natural Language Engineering | 2011, Issue 4 | pp. 425-454 (30 pages)
  • Author
    CHRISTOF MONZ
  • Affiliation
    Informatics Institute, University of Amsterdam, Science Park 107, 1098 XG Amsterdam, The Netherlands
  • Format: PDF
  • Language: English
