ダイアログ ナビ : ジツセカイ テキスト シュウゴウ ニ モトズク バクゼン ト シタ シツモン カラ グタイテキナ カイトウ エノ ナビゲーション システム

English translation: Dialog Navi: A Navigation System from Vague Questions to Specific Answers Based on Real-World Text Collections

Abstract

As computers and their networks continue to develop, our day-to-day lives are surrounded by increasingly complex instruments, and we often have to ask questions about how to use them. At the same time, large collections of texts that answer these questions are being gathered. Potential answers to many of our questions therefore exist as texts somewhere. However, various gaps between our questions and the texts prevent us from accessing the appropriate texts. These gaps consist mainly of expression gaps and vagueness gaps. When we look for answers using conventional keyword-based text retrieval systems, we often have trouble locating them. In contrast, when we ask experts on the instruments or operators at call centers, they can resolve the gaps by interpreting our questions flexibly and by asking questions back (ask-backs). The problem with experts and call centers is that they are not always available. Two approaches have been studied to resolve the gaps: extending keyword-based text retrieval systems, and applying artificial intelligence techniques. Each has its limitations. The former uses texts or keywords as the means of asking back, but these are not always suitable. The latter requires a specialized knowledge base described in formal languages, so it cannot be applied to existing large text collections. This thesis targets the large real-world text collections provided by Microsoft Corporation and proposes a novel methodology for resolving the gaps between user questions and the texts. The methodology consists of two key solutions: precise and flexible methods of matching user questions with texts based on natural language processing (NLP) techniques, and ask-back methods that use these matching methods.
First, the matching methods, including sentence structure analysis and expression-gap resolution, are described. These methods are then extended to matching through metonymy, which is frequently observed in natural language. Next, a way of producing ask-backs based on these matching methods is proposed, using two kinds of ask-backs that complement each other. Both navigate users from vague questions to specific answers. Finally, the methodology is evaluated through the real-world operation of a dialog system, Dialog Navigator, in which all the proposed methods are implemented.

Chapter 1 discusses issues in information retrieval and identifies which of them are to be solved. It examines the question logs of a real-world natural-language-based text retrieval system and organizes the types and factors of the gaps. The examination indicates that some gaps between user questions and texts cannot be resolved well by the methods used in previous studies, and suggests that both interaction with users and applicability to real-world text collections are needed. Based on this discussion, a solution is proposed that extends the approach employed in open-domain question-answering systems, namely the use of recent NLP techniques, to resolving the various gaps.

Chapter 2 proposes several methods of matching user questions with texts based on NLP techniques. Of these techniques, sentence structure analysis through full parsing is essential for two reasons: first, it enables expression gaps to be resolved beyond the keyword level; second, it is indispensable for resolving vagueness gaps by providing ask-backs. The methods include sentence structure analysis using the Japanese parser KNP, expression-gap resolution based on two kinds of dictionaries, text-collection selection through question-type estimation, and score calculation based on sentence structures.
An experimental evaluation on test sets shows that these methods significantly improve performance.

Chapter 3 proposes a novel method of processing metonymy, as an extension of the matching methods proposed in Chapter 2. Metonymy is a figure of speech in which the name of one thing is substituted for that of something else to which it is related, and it occurs frequently in both user questions and texts. Specifically, this chapter addresses the automatic acquisition of pairs of metonymic expressions and their interpretative expressions from large corpora, and applies the acquired pairs to resolving structural gaps caused by metonymy. Unlike previous studies on metonymy, the method targets both the recognition and the interpretation of metonymy. The method acquired 1,126 pairs from corpora, and over 80% of the pairs were correct as interpretations of metonymy. Furthermore, an experimental evaluation on the test sets demonstrated that introducing the acquired pairs significantly improves matching.

Chapter 4 presents a strategy for navigating users from vague questions to specific texts, based on the matching methods discussed previously. Achieving this requires ask-backs, and the strategy involves two approaches: description extraction as a bottom-up approach, and dialog cards as a top-down approach. The former uses the matching methods to extract, from each text, the neighborhood of the part that matches the user question. Such neighborhoods are mostly suitable as ask-backs that clarify vague user questions. However, if a user's question is too vague, this approach often fails. The latter covers vague questions using the know-how of the call center; dialog cards systematize ask-back procedures for clarifying frequently asked questions that are vague. The matching methods are also applied to match user questions with the cards.
Finally, a comparison with the approaches used in related work demonstrates the novelty of these approaches.

Chapter 5 describes the architecture of Dialog Navigator, a dialog system in which all the proposed methods are implemented. The system uses the large real-world text collections provided by Microsoft Corporation and has been open to the public on a website since April 2002. The methods were evaluated on the real-world operational results of the system, because the gaps to be resolved should reflect those in the real world. The evaluation demonstrated the effectiveness of the methods: more than 70% of all user questions were answered with relevant texts, the behaviors of both users and the system were reasonable in most dialogs, and most of the descriptions extracted for ask-backs were suitably matched.

Chapter 6 concludes the thesis.
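
To illustrate why matching on sentence structure can resolve gaps that keyword matching cannot (the idea summarized for Chapter 2), the following is a minimal sketch and not the thesis's implementation. It assumes sentences have already been parsed into (modifier, head) dependency pairs; the synonym dictionary, the function names, and the 2:1 weighting of dependency matches over word matches are all hypothetical choices made for this example.

```python
from typing import Dict, List, Set, Tuple

Dependency = Tuple[str, str]  # (modifier, head), assumed to come from a parser

# Hypothetical synonym dictionary mapping surface variants to a canonical form.
SYNONYMS: Dict[str, str] = {
    "erase": "delete",
    "remove": "delete",
    "mail": "email",
    "e-mail": "email",
}


def canon(word: str) -> str:
    """Map a word to its canonical form (expression-gap resolution)."""
    w = word.lower()
    return SYNONYMS.get(w, w)


def normalize(deps: List[Dependency]) -> Set[Dependency]:
    """Canonicalize both ends of every dependency pair."""
    return {(canon(m), canon(h)) for m, h in deps}


def words_of(deps: Set[Dependency]) -> Set[str]:
    """Collect every word that appears in any dependency pair."""
    return {w for pair in deps for w in pair}


def match_score(question: List[Dependency], text: List[Dependency]) -> float:
    """Score in [0, 1]: word overlap plus dependency overlap.

    The weighting (a matched dependency counts twice as much as a
    matched word) is an illustrative assumption.
    """
    q, t = normalize(question), normalize(text)
    if not q:
        return 0.0
    q_words = words_of(q)
    word_hits = len(q_words & words_of(t))
    dep_hits = len(q & t)
    return (word_hits + 2 * dep_hits) / (len(q_words) + 2 * len(q))


# Question: "erase mail"
question = [("mail", "erase")]
# Text A: "delete email from folder" (same structure, different wording)
text_a = [("email", "delete"), ("folder", "delete")]
# Text B: "send email with the delete key" (same keywords, different structure)
text_b = [("email", "send"), ("delete", "key")]

score_a = match_score(question, text_a)
score_b = match_score(question, text_b)
```

In this sketch both texts contain the question's keywords after synonym normalization, but only text A preserves the dependency between them, so it receives the higher score.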

Bibliographic record

  • Author

    Kiyota Yoji;

  • Author affiliation
  • Year 2004
  • Total pages
  • Original format PDF
  • Text language eng
  • Chinese Library Classification
