Annotation syntaxico-semantique des actants en corpus specialise.

Machine translation: Syntactic-semantic annotation of actants in a specialized corpus.

Abstract

Semantic role annotation is a process that aims to assign labels such as Agent, Patient, Instrument, Location, etc. to the actants or circumstants (also called arguments or adjuncts) of predicative lexical units. This process often requires rich lexical resources or corpora in which sentences have been annotated manually by linguists. Automatic approaches (statistical or machine learning) rely on such corpora.

Most previous work was carried out on English, which has rich resources such as PropBank, VerbNet and FrameNet. These resources have served as the basis for automated annotation systems. For other languages, for which no corpora of annotated sentences are available, this type of annotation often uses FrameNet by projection. A resource such as FrameNet is necessary for automated annotation systems, yet the manual annotation of a large number of sentences by linguists is tedious and time-consuming work. We propose an automated system to help linguists in this task, so that they only have to validate the proposed annotations.

Our work focuses on verbs, which are more likely than other predicative units (adjectives and nouns) to be accompanied by actants realized in sentences. These verbs are specialized terms from the computer science and Internet domains (e.g. access, configure, browse, download) whose actantial structures have been annotated manually with semantic roles. The actantial structure is based on the principles of Explanatory and Combinatory Lexicology (LEC) of Mel'čuk and appeals in part (with regard to semantic roles) to the notion of Frame Element as described in Fillmore's theory of Frame Semantics (FS). What these two theories have in common is that they lead to the construction of dictionaries different from those resulting from traditional theories. These verbal units, manually annotated in several contexts, constitute the specialized corpus used in our work.

Our system, designed to assign semantic roles to actants automatically, is based on rules and classifiers trained on more than 2300 contexts. We restrict ourselves to a limited list of roles, because some roles in our corpus do not have enough manually annotated examples. In our system, we addressed the roles Patient, Agent and Destination, each of which has more than 300 examples. We created a class called Autre, in which we group the other roles, each of which has fewer than 100 annotated examples.

We divided the annotation task into two parts: the identification of the participants (actants and circumstants), and the assignment of semantic roles to the actants that contribute to the sense of the verbal lexical unit. We parsed the sentences of the corpus with Syntex to extract syntactic information describing the participants of the verbal lexical unit in the sentence. This information is used as features in our learning model. We proposed two techniques for the participant detection task: a rule-based technique and a machine learning technique. The same techniques are used to classify these participants into actants and circumstants. For the task of assigning semantic roles to the actants, we proposed a semi-supervised partitioning (clustering) method over instances, which we compared with the semantic role classification method. We used CHAMELEON, an agglomerative (bottom-up) hierarchical algorithm.

Keywords: actant, circumstant, semantic roles, syntactic features, classification, clustering, CHAMELEON algorithm, Explanatory and Combinatory Lexicology (LEC), Frame Semantics (FS), DicoInfo, FrameNet
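The abstract's grouping of infrequent roles into a single Autre class can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function name `collapse_rare_roles`, the `threshold` parameter, and the toy labels are all assumptions made for illustration, and the toy counts are far smaller than the corpus counts mentioned above.

```python
from collections import Counter

def collapse_rare_roles(labels, threshold, other_label="Autre"):
    """Keep roles with more than `threshold` examples; map the rest to `other_label`.

    Hypothetical helper: the name and signature are illustrative only.
    """
    counts = Counter(labels)
    return [role if counts[role] > threshold else other_label for role in labels]

# Toy labels invented for this example (not data from the thesis corpus).
toy_labels = (
    ["Agent"] * 4
    + ["Patient"] * 5
    + ["Destination"] * 3
    + ["Instrument"] * 1
    + ["Location"] * 2
)

# With threshold=2, Instrument (1 example) and Location (2 examples) are merged.
collapsed = collapse_rare_roles(toy_labels, threshold=2)
print(Counter(collapsed))
```

Keeping the threshold as a parameter leaves open how roles with intermediate counts are treated, which the abstract does not specify.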

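Bibliographic record

  • Author

    Hadouche, Fadila.;

  • Author affiliation

    Universite de Montreal (Canada).;

  • Degree-granting institution Universite de Montreal (Canada).;
  • Subject Artificial Intelligence.;Computer Science.
  • Degree Ph.D.
  • Year 2011
  • Pages 165 p.
  • Total pages 165
  • Original format PDF
  • Language of text eng
  • Chinese Library Classification Oncology;
  • Keywords

  • Date added 2022-08-17 11:44:54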
