Conference of the European Chapter of the Association for Computational Linguistics

From Segmentation to Analyses: A Probabilistic Model for Unsupervised Morphology Induction

Abstract

A major motivation for unsupervised morphological analysis is to reduce the sparse data problem in under-resourced languages. Most previous work focuses on segmenting surface forms into their constituent morphs (e.g., taking: tak +ing), but surface form segmentation does not solve the sparse data problem as the analyses of take and taking are not connected to each other. We extend the MorphoChains system (Narasimhan et al., 2015) to provide morphological analyses that can abstract over spelling differences in functionally similar morphs. These analyses are not required to use all the orthographic material of a word (stopping: stop +ing), nor are they limited to only that material (acidified: acid +ify +ed). On average across six typologically varied languages our system has a similar or better F-score on EMMA (a measure of underlying morpheme accuracy) than three strong baselines; moreover, the total number of distinct morphemes identified by our system is on average 12.8% lower than for Morfessor (Virpioja et al., 2013), a state-of-the-art surface segmentation system.
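
The difference between surface segmentation and morphological analysis can be illustrated with a minimal sketch. This is not the MorphoChains model or the authors' extension of it: the toy lexicon, the suffix list, and the function names (`surface_segment`, `candidate_stems`, `analyze`) are hypothetical, and the three hand-written spelling rules (e-deletion, consonant doubling, y-to-i) only stand in for the kind of spelling differences the abstract mentions. A real system would learn such behaviour probabilistically rather than hard-code it.

```python
# Toy illustration only: a hand-written lexicon and rules, not a learned model.
LEXICON = {"take", "stop", "acid", "walk"}   # hypothetical known stems
SUFFIXES = ["ing", "ed", "ify"]              # hypothetical suffix inventory


def surface_segment(word):
    """Strip suffixes by pure string matching; the morphs must spell the word."""
    morphs = []
    stem = word
    changed = True
    while changed:
        changed = False
        for suf in SUFFIXES:
            if stem.endswith(suf) and len(stem) > len(suf) + 1:
                morphs.insert(0, "+" + suf)
                stem = stem[: -len(suf)]
                changed = True
                break
    return [stem] + morphs


def candidate_stems(base):
    """Undo three common English spelling changes (an assumed, incomplete set)."""
    cands = [base]                       # no change: walk +ing
    cands.append(base + "e")             # e-deletion: tak -> take
    if len(base) >= 2 and base[-1] == base[-2]:
        cands.append(base[:-1])          # consonant doubling: stopp -> stop
    if base.endswith("i"):
        cands.append(base[:-1] + "y")    # y-to-i: acidifi -> acidify
    return cands


def analyze(word):
    """Return a LEXICON stem plus underlying morphemes, or None if none is found."""
    if word in LEXICON:
        return [word]
    for suf in SUFFIXES:
        if word.endswith(suf):
            base = word[: -len(suf)]
            for cand in candidate_stems(base):
                sub = analyze(cand)
                if sub is not None:
                    return sub + ["+" + suf]
    return None


for w in ["take", "taking", "stopping", "acidified"]:
    print(w, surface_segment(w), analyze(w))
```

In this sketch, surface segmentation gives `take` and `taking` different stems (`take` and `tak`), whereas the analyses share the stem `take`, which is the connection needed to reduce data sparsity. The analysis of `stopping` drops the doubled `p`, and the analysis of `acidified` restores `+ify`, which is not spelled out in the surface form.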
