Conference: Association for Computational Linguistics Annual Meeting

A Framework for Unsupervised Natural Language Morphology Induction

Abstract

Many natural language processing tasks, including parsing and machine translation, frequently require a morphological analysis of the language(s) at hand. The task of a morphological analyzer is to identify the lexeme, citation form, or inflection class of surface word forms in a language. Striving to bypass the time-consuming, labor-intensive task of constructing a morphological analyzer by hand, unsupervised morphology induction techniques seek to automatically discover the morphological structure of a natural language through the analysis of corpora. This paper presents a framework for automatic natural language morphology induction inspired by the traditional and linguistic concept of inflection classes. Monson et al. (2004) use the framework discussed in this paper and present results using an intuitive baseline search strategy. This paper discusses the candidate inflection class framework as a generalization of the corpus tries used in early work (Harris, 1955; Harris, 1967; Hafer and Weiss, 1974) and outlines an as-yet-unimplemented, statistically motivated search strategy. It uses English to illustrate its main conjectures and a Spanish newswire corpus of 40,011 tokens and 6,975 types for concrete examples.
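To make the idea of a candidate inflection class concrete, the following is a minimal sketch, not the paper's implementation. It assumes a candidate class can be modeled as a set of suffixes together with the stems that occur in the corpus with every one of those suffixes. The function names `stems_by_suffix` and `candidate_class`, the `max_suffix_len` parameter, and the toy English word list are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def stems_by_suffix(words, max_suffix_len=3):
    """Map each candidate suffix to the set of stems it attaches to.

    Assumption: every split of a word into a non-empty stem and a suffix
    of at most max_suffix_len characters is a candidate analysis.
    """
    table = defaultdict(set)
    for word in set(words):
        first_cut = max(1, len(word) - max_suffix_len)
        # The cut at len(word) yields the empty suffix "" (the bare word).
        for cut in range(first_cut, len(word) + 1):
            table[word[cut:]].add(word[:cut])
    return table


def candidate_class(table, suffixes):
    """Return the stems that occur with every suffix in the given set."""
    stem_sets = [table.get(suffix, set()) for suffix in suffixes]
    return set.intersection(*stem_sets) if stem_sets else set()


# Toy word types (hypothetical data, not drawn from the paper's corpus).
words = ["walk", "walks", "walked", "talk", "talks", "talked", "jump", "jumps"]
table = stems_by_suffix(words)

# Stems seen bare, with "s", and with "ed".
three_way = candidate_class(table, ["", "s", "ed"])
# Dropping "ed" admits one more stem.
two_way = candidate_class(table, ["", "s"])
print(sorted(three_way), sorted(two_way))
```

In this sketch, adding a suffix to the set can only shrink the set of supporting stems, which is the kind of trade-off a search strategy over candidate classes would have to weigh.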
