Conference: International Workshop on Finite-State Methods and Natural Language Processing

Scaling an Irish FST Morphology Engine for Use on Unrestricted Text


Abstract

This paper details the steps involved in scaling up a lexicalised finite-state morphology transducer for use on unrestricted text. Our starting point was a baseline inflectional morphology engine [1] with 81% token coverage, measured against a 15-million-word corpus of Irish texts [2]. Manually scaling the FST lexicon component of a morphology transducer is time-consuming, expensive and rarely, if ever, complete. To scale up the engine, we used a combination of strategies: semi-automatic population of the finite-state lexicon from machine-readable dictionary resources and from printed resources using optical character recognition, the addition of derivational morphology, and the development of morphological guessers. We report the coverage increase contributed by each step. The full system achieves token coverage of 93%, which is extended to 100% through the use of morphological guessers.
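As a rough illustration of the token-coverage measure used in the abstract, the sketch below counts the share of corpus tokens that receive at least one analysis, first from a lexicon alone and then with a guesser as fallback. It is not the authors' system: the toy lexicon, the tag strings, and the names `token_coverage`, `lexicon_lookup`, `guesser` and `with_guesser` are all hypothetical, and a plain dictionary stands in for the finite-state transducer.

```python
from typing import Callable, Iterable, List

# Hypothetical toy lexicon standing in for the lexicalised transducer.
# The tag strings are illustrative only, not the engine's real tag set.
LEXICON = {
    "an": ["an+Art"],
    "fear": ["fear+Noun"],
    "teach": ["teach+Noun"],
}


def lexicon_lookup(token: str) -> List[str]:
    """Return the analyses found in the lexicon (empty list if unknown)."""
    return LEXICON.get(token.lower(), [])


def guesser(token: str) -> List[str]:
    """Hypothetical guesser: always proposes one analysis for any token."""
    return [f"{token}+Guess"]


def with_guesser(token: str) -> List[str]:
    """Use the lexicon first and fall back to the guesser for unknown tokens."""
    return lexicon_lookup(token) or guesser(token)


def token_coverage(tokens: Iterable[str],
                   analyse: Callable[[str], List[str]]) -> float:
    """Fraction of tokens for which analyse() returns at least one analysis."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    covered = sum(1 for t in tokens if analyse(t))
    return covered / len(tokens)


# Five tokens; "agus" is deliberately missing from the toy lexicon.
tokens = "an fear agus an teach".split()
lexicon_only = token_coverage(tokens, lexicon_lookup)
with_fallback = token_coverage(tokens, with_guesser)
print(f"lexicon only: {lexicon_only:.0%}, with guesser: {with_fallback:.0%}")
```

Because the guesser in this sketch proposes an analysis for every unknown token, coverage with the fallback is complete by construction; this says nothing about whether the guessed analyses are correct.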
