Conference: International Conference on Language Resources and Evaluation

Extraction of Unmarked Quotations in Newspapers: A Study Based on Direct Speech Extraction Systems

Abstract

This paper presents work in progress on automatically extracting quotation sentences from newspaper articles, with a focus on the extraction and annotation of unmarked quotation sentences. A linguistic study shows that unmarked quotation sentences can be formalised into 16 patterns, which can be used to develop an extraction grammar. The paper also raises the problem of identifying the boundaries of unmarked quotations, which are often ambiguous. An annotation scheme is defined that can describe all the elements that may occur in a quotation sentence. The paper then presents the creation of two resources required by the system. First, a dictionary of verbs that introduce quotations was built automatically, using a grammar of marked quotation sentences to identify the verbs able to introduce quotations. Second, a grammar formalising the patterns of unmarked quotation sentences was developed with Unitex, a tool based on finite-state machines. A short experiment on two of the patterns shows promising results.
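The two-resource idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the paper uses Unitex finite-state grammars, whereas this sketch substitutes Python regular expressions. The function names `harvest_verbs` and `is_unmarked_candidate`, the English example sentences, and both patterns are assumptions made for illustration; neither pattern is claimed to be one of the 16 described in the paper.

```python
import re
from collections import Counter

# Toy pattern for a MARKED quotation (assumption, not from the paper):
#   "<quoted text>," <verb> <speaker...>
# The word right after the closing quote is taken as the introducing verb.
MARKED = re.compile(r'"[^"]+"\s*,?\s+(\w+)\s+\w+')

# Toy pattern for an UNMARKED quotation (assumption, not from the paper):
#   <clause>, <verb> <speaker...>.
UNMARKED_TAIL = re.compile(r',\s+(\w+)\s+[\w\s]+\.?$')


def harvest_verbs(sentences):
    """Count the words that directly follow a quoted span in each sentence."""
    counts = Counter()
    for sentence in sentences:
        for match in MARKED.finditer(sentence):
            counts[match.group(1).lower()] += 1
    return counts


def is_unmarked_candidate(sentence, verbs):
    """Flag a sentence without quotation marks whose final clause starts with a harvested verb."""
    if '"' in sentence:
        return False  # marked quotations are handled by the other pattern
    match = UNMARKED_TAIL.search(sentence.strip())
    return bool(match) and match.group(1).lower() in verbs


if __name__ == "__main__":
    marked = [
        '"We will appeal," said the lawyer.',
        '"Prices are stable", announced the bank.',
    ]
    verbs = harvest_verbs(marked)
    for s in [
        "The reform will go ahead, said the minister.",
        "The minister visited Paris, Lyon and Nice.",
    ]:
        print(s, "->", is_unmarked_candidate(s, verbs))
```

The harvesting pattern is deliberately naive: any word that follows a quoted span is counted, so a quoted book title followed by an ordinary word would add noise to the dictionary. A real system would need filtering, for example by part of speech or frequency, and would also have to address the boundary ambiguity that the abstract mentions.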
