The importance of using domain knowledge in solving information distillation problems.

Abstract

This thesis is an inquiry into the importance of incorporating domain knowledge into emerging information distillation tasks, which are in principle similar to text summarization but in practice require techniques that previous work has not adequately addressed. The tasks analyzed are headline generation, biography creation, online discussion summarization, and automatic evaluation of summaries. This thesis shows empirically that traditional text summarization techniques, designed for generic summarization tasks, cannot be readily applied to these four tasks. Each task requires prior knowledge of the operating domain, data type, task structure, and output structure. Techniques and algorithms designed with this knowledge perform significantly better than those designed without it.

This thesis explores solutions to headline generation, that is, the generation of very short summaries. By identifying features specific to headlines, a keyword selection model was designed to select headline-worthy words. The context surrounding these headline words is then extracted to produce phrase-based headlines.

Typical question-answering systems target definition questions and produce factoid answers. However, when questions require complex answers, as "who is x" questions do, a biography creation engine is needed. By categorizing a person's life into multiple classes of information, the engine becomes a classification engine; coupled with extraction and re-ranking algorithms, it produces biographies covering every aspect of a person's life.

The emergence of multi-party conversations recorded in text, such as online discussions, has prompted the development and analysis of methods for summarizing such input. Recognizing the speech aspect of this type of information, including modeling subtopic structure and the exchanges among multiple speakers, yields summaries of significantly better quality, whose construction also agrees with what human summary writers do.

Text summarization evaluation had previously been limited to manual annotation or comparison based on lexical identity. What separates manual from automatic matching is the ability to recognize paraphrases, which leaves automatic metrics extremely vulnerable. This thesis bridges the gap with a large paraphrase collection acquired by applying statistical phrase-based machine translation (MT) algorithms to parallel data. This procedure produces a significantly higher correlation with human judgments and can serve as an objective function within a summarization system.
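The evaluation idea above, crediting a candidate summary for paraphrases of the reference rather than only for identical words, can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes a tiny hand-written paraphrase table (`PARAPHRASES`, hypothetical) in place of the large table learned from parallel data, and a simple greedy two-tier matcher (paraphrase phrases first, then exact unigrams). The names `paraphrase_recall` and `lexical_recall` are illustrative only.

```python
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]

# HYPOTHETICAL toy paraphrase table: reference phrase -> candidate phrases
# accepted as equivalent. The thesis learns a large table from parallel
# data with phrase-based MT; these entries are hand-written for illustration.
# All phrases are assumed to be non-empty.
PARAPHRASES: Dict[Phrase, List[Phrase]] = {
    ("chief", "executive"): [("ceo",)],
    ("passed", "away"): [("died",)],
}


def _find(seq: List[str], used: List[bool], phrase: Phrase) -> int:
    """Start index of the first occurrence of `phrase` in `seq` whose
    tokens are all still unused, or -1 if there is none."""
    m = len(phrase)
    for j in range(len(seq) - m + 1):
        if tuple(seq[j:j + m]) == phrase and not any(used[j:j + m]):
            return j
    return -1


def paraphrase_recall(reference: List[str], candidate: List[str],
                      table: Dict[Phrase, List[Phrase]]) -> float:
    """Share of reference tokens covered by the candidate: first through
    paraphrase matches from `table`, then through exact unigram matches.
    Each token on either side is matched at most once (greedy)."""
    if not reference:
        return 0.0
    ref_used = [False] * len(reference)
    cand_used = [False] * len(candidate)

    # Tier 1: greedy paraphrase matching, in table order.
    for ref_phrase, alternatives in table.items():
        i = _find(reference, ref_used, ref_phrase)
        while i != -1:
            for alt in alternatives:
                j = _find(candidate, cand_used, alt)
                if j != -1:
                    ref_used[i:i + len(ref_phrase)] = [True] * len(ref_phrase)
                    cand_used[j:j + len(alt)] = [True] * len(alt)
                    break
            else:
                break  # no unused paraphrase left in the candidate
            i = _find(reference, ref_used, ref_phrase)

    # Tier 2: exact unigram matching on whatever is still unmatched.
    for i, token in enumerate(reference):
        if not ref_used[i]:
            j = _find(candidate, cand_used, (token,))
            if j != -1:
                ref_used[i] = cand_used[j] = True

    return sum(ref_used) / len(reference)


def lexical_recall(reference: List[str], candidate: List[str]) -> float:
    """Baseline: the same matcher with no paraphrases (lexical identity only)."""
    return paraphrase_recall(reference, candidate, {})


if __name__ == "__main__":
    ref = "the chief executive passed away on monday".split()
    cand = "the ceo died monday".split()
    print(round(lexical_recall(ref, cand), 3))                  # 0.286
    print(round(paraphrase_recall(ref, cand, PARAPHRASES), 3))  # 0.857
```

On this toy pair, lexical identity credits only "the" and "monday" (2 of 7 reference tokens), while the paraphrase table also lets "ceo" and "died" cover "chief executive" and "passed away" (6 of 7). A practical metric would need a learned table and a better-than-greedy matching step; the greedy order here is a simplification.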

Bibliographic record

Author: Zhou, Liang
Author affiliation: University of Southern California
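Degree-granting institution: University of Southern California
Subject: Computer Science
Degree: Ph.D.
Year: 2006
Pages: 155
Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology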

Similar documents

1. Applying Internalised Source-culture Knowledge to Solve Cultural Translation Problems. A Quasi-experimental Study on the Translator's Acquisition of Cultural Competence [J] . Olalla-Soler Christian, Nature reviews neuroscience . 2019, No. 2

2. Knowledge distillation methods for efficient unsupervised adaptation across multiple domains [J] . Nguyen-Meidine Le Thanh, Belal Atif, Kiran Madhu, Image and Vision Computing . 2021, No. Apra

3. Knowledge organization in intelligent tutoring systems for diagnostic problem solving in complex dynamic domains [J] . Vasandani V., Govindaraj T., IEEE Transactions on Systems, Man, and Cybernetics . 1995, No. 7

4. Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains [C] . Haojie Pan, Chengyu Wang, Minghui Qiu, Annual Meeting of the Association for Computational Linguistics; International Joint Conference on Natural Language Processing . 2021

5. A domain decomposition method for solving electrically large electromagnetic problems. [D] . Zhao, Kezhong. 2007

6. Analysis of precurrent skills in solving mathematics story problems. [O] . Nancy A Neef, Diane E Nelles, Brian A Iwata, 2003

7. Applying Internalised Source-culture Knowledge to Solve Cultural Translation Problems. A Quasi-experimental Study on the Translator's Acquisition of Cultural Competence [O] . Christian Olalla-Soler 2019

