Portable language technology: A resource-light approach to morpho-syntactic tagging.

Abstract

Morpho-syntactic tagging is the process of assigning part of speech (POS), case, number, gender, and other morphological information to each word in a corpus. Morpho-syntactic tagging is an important step in natural language processing. Corpora that have been morphologically tagged are very useful both for linguistic research, e.g. finding instances or frequencies of particular constructions in large corpora, and for further computational processing, such as syntactic parsing, speech recognition, stemming, and word-sense disambiguation, among others. Despite the importance of morphological tagging, there are many languages that lack annotated resources. This is almost inevitable because these resources are costly to create. But, as described in this thesis, it is possible to avoid this expense.

This thesis describes a method for transferring annotation from a morphologically annotated corpus of a source language to a corpus of a related target language. Unlike unsupervised approaches that do not require annotated data at all and, as a consequence, lack precision, the approach proposed in this dissertation relies on linguistic knowledge, but avoids large-scale grammar engineering. The approach needs neither a parallel corpus nor a bilingual lexicon, and requires much less linguistic labor than the standard technology.

This dissertation describes experiments with Russian, Czech, Polish, Spanish, Portuguese, and Catalan. However, the general method proposed can be applied to any fusional language.

Bibliographic details

  • Author: Feldman, Anna
  • Author affiliation: The Ohio State University
  • Degree-granting institution: The Ohio State University
  • Subject: Language, Linguistics; Computer Science
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 298 p.
  • Original format: PDF
  • Language of text: English
