Linguistic Issues in Language Technology

Modality annotation for Portuguese: from manual annotation to automatic labeling



Abstract

We investigate modality in Portuguese, combining a linguistic perspective with an application-oriented one. We design an annotation scheme that reflects theoretical linguistic concepts and apply it to a small corpus sample to show how the scheme handles real-world language use. We present two schemes for Portuguese: one for spoken Brazilian Portuguese and one for written European Portuguese. Furthermore, we use the annotated data not only to study the linguistic phenomena of modality but also to train a practical text mining tool that detects modality in text automatically. The modality tagger uses a machine learning classifier trained on features extracted automatically from the output of a syntactic parser. Because only a small annotated sample is available, the tagger was evaluated on 11 modal verbs that are frequent in our corpus and that can express more than one modal meaning. Finally, we discuss several insights into the complexity of the semantic concept of modality that derive from the manual annotation of the corpus and from the analysis of the automatic labeling results: ambiguity, the semantic and syntactic properties typically associated with one modal meaning in context, and the interaction of modality with negation and focus. The knowledge gained from the manual annotation task leads us to propose a new unified modality scheme that applies to both Portuguese varieties and covers both written and spoken data.
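The abstract does not name the classifier or list the features. As a minimal sketch, assuming a multinomial Naive Bayes classifier over a few hypothetical parser-derived features (verb lemma, negation, subject person, complement type), disambiguating the modal meaning of one modal verb occurrence might look like this. The labels `epistemic` and `deontic`, the feature names, and the toy training examples are all invented for illustration; they are not the paper's tag set, features, or data.

```python
import math
from collections import Counter, defaultdict


def extract_features(token):
    """Turn one modal-verb occurrence into string features.

    `token` is a dict with HYPOTHETICAL fields standing in for information
    a syntactic parser could provide; the paper's real features may differ.
    """
    return [
        f"lemma={token['lemma']}",
        f"neg={token['negated']}",
        f"subj_person={token['subj_person']}",
        f"compl={token['complement']}",
    ]


class NaiveBayesTagger:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        # examples: iterable of (feature_list, label) pairs
        for feats, label in examples:
            self.label_counts[label] += 1
            for f in feats:
                self.feature_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, feats):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / total)
            n_label = sum(self.feature_counts[label].values())
            for f in feats:
                numerator = self.feature_counts[label][f] + 1
                score += math.log(numerator / (n_label + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples for the verb "poder"; NOT data from the paper.
TRAIN = [
    ({"lemma": "poder", "negated": False, "subj_person": 3, "complement": "stative"}, "epistemic"),
    ({"lemma": "poder", "negated": False, "subj_person": 3, "complement": "stative"}, "epistemic"),
    ({"lemma": "poder", "negated": False, "subj_person": 1, "complement": "agentive"}, "deontic"),
    ({"lemma": "poder", "negated": True, "subj_person": 2, "complement": "agentive"}, "deontic"),
]

tagger = NaiveBayesTagger().fit([(extract_features(t), y) for t, y in TRAIN])

query = {"lemma": "poder", "negated": True, "subj_person": 2, "complement": "agentive"}
print(tagger.predict(extract_features(query)))
```

Naive Bayes is used here only because it trains well on very small samples, which matches the small annotated corpus the abstract describes; any other classifier over the same feature lists would fit the same interface.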
