Journal of Functional Programming

A functional toolkit for morphological and phonological processing, application to a Sanskrit tagger

Abstract

We present the Zen toolkit for morphological and phonological processing of natural languages. This toolkit is presented in literate programming style, in the Pidgin ML subset of the Objective Caml functional programming language. This toolkit is based on a systematic representation of finite state automata and transducers as decorated lexical trees. All operations on the state space data structures use the zipper technology, and a uniform sharing functor permits systematic maximal sharing as DAGs. A particular case of lexical maps is especially convenient for building invertible morphological operations, such as dictionaries of inflected forms, using a notion of differential word. As a particular application, we describe a general method for tagging a natural language text given as a phoneme stream by analysing possible euphonic liaisons between words belonging to a lexicon of inflected forms. The method uses the toolkit methodology by constructing a non-deterministic transducer, implementing rational rewrite rules, by mechanical decoration of a trie representation of the lexicon index. The algorithm is linear in the size of the lexicon. A coroutine interpreter is given, and its correctness and completeness are formally proved. An application to the segmentation of Sanskrit by sandhi analysis is demonstrated.
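
The notion of differential word mentioned above can be illustrated with a small sketch. It assumes, as an interpretation not spelled out in this abstract, that a differential word is a pair (n, u) meaning "drop the last n letters of a word, then append u". The toolkit itself is written in Objective Caml; the Python below, the names `diff` and `patch`, and the simplified ASCII transliterations are illustrative choices only, not the toolkit's API.

```python
from typing import Tuple

# Assumed representation (hypothetical): (letters to drop from the end, suffix to append).
Delta = Tuple[int, str]


def diff(w: str, w2: str) -> Delta:
    """Return a differential word that rewrites w into w2."""
    # Length of the longest common prefix of the two words.
    k = 0
    while k < len(w) and k < len(w2) and w[k] == w2[k]:
        k += 1
    # Drop everything in w after the common prefix, then append the rest of w2.
    return (len(w) - k, w2[k:])


def patch(d: Delta, w: str) -> str:
    """Apply a differential word to w."""
    n, u = d
    if n > len(w):
        raise ValueError("cannot drop more letters than the word contains")
    return w[: len(w) - n] + u


# Inflected form -> stem, using simplified ASCII transliterations.
delta = diff("ramena", "rama")   # (3, "a")
stem = patch(delta, "ramena")    # "rama"
```

Storing such a pair with each inflected form is enough to recover its stem. In this sketch, forms inflected the same way yield the same pair: `diff("devena", "deva")` is also `(3, "a")`.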
