Published in: Fifth Conference on Applied Natural Language Processing

Automatic Acquisition of Two-Level Morphological Rules

Abstract

We describe and experimentally evaluate a complete method for the automatic acquisition of two-level rules for morphological analyzers/generators. The input to the system is sets of source-target word pairs, where the target is an inflected form of the source. There are two phases in the acquisition process: (1) segmentation of the target into morphemes and (2) determination of the optimal two-level rule set with minimal discerning contexts. In phase one, a minimal acyclic finite state automaton (AFSA) is constructed from string edit sequences of the input pairs. Segmentation of the words into morphemes is achieved through viewing the AFSA as a directed acyclic graph (DAG) and applying heuristics using properties of the DAG as well as the elementary edit operations. For phase two, the determination of the optimal rule set is made possible with a novel representation of rule contexts, with morpheme boundaries added, in a new DAG. We introduce the notion of a delimiter edge. Delimiter edges are used to select the correct two-level rule type as well as to extract minimal discerning rule contexts from the DAG. Results are presented for English adjectives, Xhosa noun locatives and Afrikaans noun plurals.
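As a rough illustration of the "string edit sequences" that phase one starts from, the sketch below aligns a source word with its inflected target and emits character pairs in the usual two-level notation, with "0" standing for the null symbol. The abstract does not say how the edit sequences are computed, so the plain Levenshtein alignment used here is an assumption rather than the authors' procedure. The names `edit_sequence`, `format_pairs` and `NULL`, and the pair happy/happier, are hypothetical and chosen only for this example.

```python
NULL = "0"  # hypothetical choice: "0" as the null symbol of two-level notation


def edit_sequence(source: str, target: str) -> list[tuple[str, str]]:
    """Align source with target and return a list of (source_char, target_char) pairs.

    Assumption: a standard unit-cost Levenshtein alignment, which may differ
    from the edit sequences used in the paper.
    """
    n, m = len(source), len(target)
    # cost[i][j] = minimum number of edits turning source[:i] into target[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitute = cost[i - 1][j - 1] + (source[i - 1] != target[j - 1])
            cost[i][j] = min(substitute, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the end; insertions are tried first so that added
    # material (for example a suffix) ends up at the right edge of the word.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if j > 0 and cost[i][j] == cost[i][j - 1] + 1:
            pairs.append((NULL, target[j - 1]))  # insertion
            j -= 1
        elif i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (
            source[i - 1] != target[j - 1]
        ):
            pairs.append((source[i - 1], target[j - 1]))  # match or replacement
            i -= 1
            j -= 1
        else:
            pairs.append((source[i - 1], NULL))  # deletion
            i -= 1
    pairs.reverse()
    return pairs


def format_pairs(pairs: list[tuple[str, str]]) -> str:
    """Render the pairs as a space-separated string of source:target symbols."""
    return " ".join(f"{s}:{t}" for s, t in pairs)


if __name__ == "__main__":
    print(format_pairs(edit_sequence("happy", "happier")))
```

In a full system along the lines the abstract describes, sequences like these from many word pairs would be merged into a minimal acyclic automaton before any segmentation heuristics run; that step is not attempted here.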
